One aspect of cloud computing platforms like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) is that it’s easier to create infrastructure resources than it is to destroy them. Even more challenging is maintaining full visibility over all of your cloud resources. Corey Quinn once said, and I’m paraphrasing, “the only way to see everything you have running in your AWS account is to look at your AWS bill.”
Not surprisingly, cloud customers wind up running, and paying for, more cloud infrastructure than they actually need. It’s not uncommon for enterprises operating at scale in the cloud to be unable to account for 20% or more of the cloud resources they have running. While some of those untracked resources may still serve legitimate business uses (which is itself concerning if you aren’t tracking them), many are “orphaned infrastructure”: idle cloud resources in your environment that serve no business purpose.
Predictably, entire categories of consultants and product vendors have emerged to help customers get their cloud bills under control by helping find and terminate orphaned resources. Cloud “sprawl” has been a major problem since the beginning of the cloud era and it shows no signs of abating, which is a testament to just how challenging the problem really is.
But while orphaned resources are widely recognized as a cost problem that must be managed, few realize the significant security threat these untracked and unmanaged resources pose. From a security perspective, these costly orphans are actually dangerous zombies.
From Orphans to Zombies: Why untracked cloud resources are a security risk
The number one cause of data breaches in the cloud is misconfiguration, which is the responsibility of the cloud customer. According to Neil MacDonald at Gartner, “nearly all successful attacks on cloud services are the result of customer misconfiguration, mismanagement and mistakes.”
Eliminating cloud misconfiguration for the legitimate infrastructure we know about challenges the best DevOps and cloud security engineers. Large-scale environments can involve tens of thousands of resources that span hundreds of accounts and involve effectively infinite configuration possibilities.
Now consider zombie cloud resources, which, by definition, cloud and security teams aren’t tracking or managing. These can include resources such as EC2 instances, S3 buckets, EBS snapshots, AMIs, IAM users, VPC networks, and security groups. I’m using AWS examples here, but each cloud platform has its analogs and similar risks.
The misconfiguration risk posed by zombie cloud infrastructure isn’t just theoretical. It’s commonplace for cloud customers to discover zombie resources when they use visualization tools that generate cloud infrastructure diagrams. Humans are visual, and when they can visualize their cloud environment, zombies jump out from where they previously hid undetected in resource lists and cloud bills.
By definition, zombie cloud resources are:
- not included in your management and security tools
- not scanned for misconfiguration vulnerabilities
- not patched with the latest security updates
- not validated for compliance
- not cycled out via immutable infrastructure practices
And if a bad actor exploits zombie resources in your environment to breach your data, would you notice? Misconfiguration attacks involving infrastructure you know about are notoriously difficult to detect, even after the fact. If an attack involves resources you don’t know about, you’re probably not going to find out.
Addressing zombie cloud resource risk
If you have AWS, Azure, or GCP environments running at scale, you should assume the presence of zombie resources. Eliminating the zombies you have and preventing them in the future will improve your security posture and save you money. A great side effect of cloud security is that, when done correctly up front, it usually improves your bottom line.
Here are some tips on how to effectively address the problem:
1. Establish full and continuous visibility into your environment
Getting visibility into cloud environments is possible because every attribute of every resource you have running is discoverable via cloud provider APIs. There are tools that can provide this for you (Fugue is one of them). Prioritize visual diagrams over resource lists in a table to make it easy to identify zombies that don’t belong by seeing the layout and relationships of the resources.
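To make the "discoverable via cloud provider APIs" point concrete, here is a minimal Python sketch that walks AWS regions and flattens EC2 instances into simple records. It assumes the `boto3` SDK and working AWS credentials; the function names and the record layout are illustrative choices, not part of any product. EC2 is only one resource type, so a real inventory would repeat the pattern for each service you use.

```python
from typing import Any, Dict, Iterable, List


def summarize_instances(pages: Iterable[Dict[str, Any]], region: str) -> List[Dict[str, Any]]:
    """Flatten DescribeInstances response pages into one record per instance."""
    records = []
    for page in pages:
        for reservation in page.get("Reservations", []):
            for inst in reservation.get("Instances", []):
                # Tags arrive as a list of {"Key": ..., "Value": ...} pairs.
                tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
                records.append({
                    "region": region,
                    "id": inst["InstanceId"],
                    "state": inst["State"]["Name"],
                    "launched": inst.get("LaunchTime"),
                    "tags": tags,
                })
    return records


def inventory_all_regions() -> List[Dict[str, Any]]:
    """List EC2 instances in every enabled region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above works without the SDK

    # The bootstrap region is an assumption; any enabled region works here.
    bootstrap = boto3.client("ec2", region_name="us-east-1")
    regions = [r["RegionName"] for r in bootstrap.describe_regions()["Regions"]]

    inventory = []
    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)
        pages = ec2.get_paginator("describe_instances").paginate()
        inventory.extend(summarize_instances(pages, region))
    return inventory
```

Calling `inventory_all_regions()` returns a list you can sort by launch date or group by tag, which is often enough to surface instances nobody recognizes.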
Fugue generates visual, interactive diagrams of your cloud environment like the one shown here:
2. Identify and terminate zombie resources
Visual diagrams of your environments using a product like Fugue can help you identify zombies, but there are other tools available to help you determine which ones are true zombies:
- Here’s a script that can find unused AMI snapshots using the AWS CLI (developed by Dave Williams at New Light Technologies).
- Another script that can help you find unused AWS Security Groups using the AWS CLI (again from Dave at New Light Technologies—thanks Dave!)
- Use the AWS Console to find unattached EBS volumes
- Use a product like Fugue to find security groups with RDP or SSH ports open to the Internet
- Use the AWS Console to find orphaned IAM accounts.
Make sure you terminate zombies once you’ve identified them.
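If you would rather script some of these checks than click through the console, the sketch below shows one possible approach in Python. Each function takes records shaped like the responses from the EC2 `DescribeVolumes`, `DescribeSecurityGroups`, and `DescribeNetworkInterfaces` APIs (for example, as returned by `boto3`). The function names are hypothetical, and the checks are deliberately simple: a security group with no network interface attached may still be referenced by another group's rules or by a launch template, so treat the output as a list of candidates to review, not a kill list.

```python
from typing import Any, Dict, List

ADMIN_PORTS = {22: "SSH", 3389: "RDP"}
WORLD = {"0.0.0.0/0", "::/0"}


def unattached_volumes(volumes: List[Dict[str, Any]]) -> List[str]:
    """EBS volumes in the 'available' state are not attached to any instance."""
    return [v["VolumeId"] for v in volumes
            if v.get("State") == "available" and not v.get("Attachments")]


def unused_security_groups(groups: List[Dict[str, Any]],
                           interfaces: List[Dict[str, Any]]) -> List[str]:
    """Security groups that no network interface currently uses."""
    in_use = {g["GroupId"] for eni in interfaces for g in eni.get("Groups", [])}
    # Default groups cannot be deleted, so they are left out of the report.
    return sorted(g["GroupId"] for g in groups
                  if g["GroupId"] not in in_use and g.get("GroupName") != "default")


def world_open_admin_ports(group: Dict[str, Any]) -> List[str]:
    """Names of admin protocols (SSH, RDP) this group exposes to the Internet."""
    exposed = set()
    for perm in group.get("IpPermissions", []):
        cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & WORLD:
            continue  # rule is not open to the whole Internet
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            exposed.update(ADMIN_PORTS.values())
            continue
        low, high = perm.get("FromPort"), perm.get("ToPort")
        if protocol not in ("tcp", "6") or low is None or high is None:
            continue
        for port, name in ADMIN_PORTS.items():
            if low <= port <= high:
                exposed.add(name)
    return sorted(exposed)
```

With `boto3`, the inputs would come from calls such as `ec2.describe_volumes()["Volumes"]`, ideally through paginators so large accounts are fully covered.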
3. Enforce resource tags and establish effective tagging conventions
Using tags is one of the best ways to help you track and manage your cloud resources, but you need to establish tagging conventions and enforce them. Give resources human-readable names, and tag each one with a point of contact, project name, and deployment date.
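A tagging convention is only useful if something checks it. Here is a small Python sketch of such a check, assuming tags arrive in the AWS-style list of `Key`/`Value` pairs. The required tag keys (`owner`, `project`, `deployed-on`) are a made-up convention for illustration; substitute whatever your organization settles on.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical convention: point of contact, project name, deployment date.
REQUIRED_TAGS = ("owner", "project", "deployed-on")


def missing_tags(tags: Optional[List[Dict[str, str]]],
                 required: Sequence[str] = REQUIRED_TAGS) -> List[str]:
    """Return the required tag keys that are absent or have an empty value."""
    present = {t["Key"].lower() for t in (tags or [])
               if (t.get("Value") or "").strip()}
    return [key for key in required if key not in present]


def tagging_report(resources: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ID to the tags it is missing.

    Each resource is a dict with an "id" and an optional "Tags" list.
    """
    report = {}
    for res in resources:
        gaps = missing_tags(res.get("Tags"))
        if gaps:
            report[res["id"]] = gaps
    return report
```

Running a report like this on a schedule, or as a gate in your deployment pipeline, turns the convention from a wiki page into something enforced.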
4. Embrace Infrastructure as Code and automated pipelines
If you’re operating at some scale, you’ll want to adopt an infrastructure as code tool like HashiCorp Terraform or AWS CloudFormation, and an automated CI/CD toolchain like Jenkins or AWS CodePipeline. It will be easier to keep track of what you’re doing and identify resources created or modified outside of your pipeline.
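One way to spot resources created outside your pipeline is to compare what the cloud API reports against what your infrastructure as code tool says it manages. The Python sketch below does this for Terraform, assuming the JSON layout produced by `terraform show -json` (a `values.root_module` object with `resources` and nested `child_modules`). The function names are hypothetical, and matching on the `id` attribute alone is a simplification, since not every resource type uses the cloud resource ID there.

```python
import json
from typing import Any, Dict, Iterable, List, Set


def managed_ids(module: Dict[str, Any]) -> Set[str]:
    """Collect the 'id' attribute of every managed resource in a module tree."""
    ids = set()
    for res in module.get("resources", []):
        if res.get("mode", "managed") != "managed":
            continue  # data sources are read, not managed, by Terraform
        resource_id = (res.get("values") or {}).get("id")
        if resource_id:
            ids.add(resource_id)
    for child in module.get("child_modules", []):
        ids |= managed_ids(child)
    return ids


def unmanaged(discovered_ids: Iterable[str], state_json: str) -> List[str]:
    """IDs that exist in the cloud account but not in the Terraform state."""
    state = json.loads(state_json)
    root = state.get("values", {}).get("root_module", {})
    return sorted(set(discovered_ids) - managed_ids(root))
```

Anything this returns was either created by hand, created by another tool, or left behind after a state change, and each case deserves a look.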
5. Include dev environments in your security plan
Few realize that dev environments can pose a security risk to production infrastructure and data. An attacker can learn a lot from exploited dev resources and leverage it elsewhere, so while we don’t want to slow developers down with too many restrictions, we want to give them the tools to do their work securely and clean up leftover resources when they’re done.
6. Conduct routine audits and compliance reporting
Audits and compliance reporting are probably already happening at your organization, and following steps one through five will make that process easier. Make sure you include orphaned/zombie resource identification and elimination as part of your audit process.
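As one example of a zombie check you could fold into a routine audit, the Python sketch below flags IAM users that look dormant. It assumes user records shaped like the IAM `ListUsers` response, where `PasswordLastUsed` is only present if the user has signed in to the console. The 90-day threshold and the function name are arbitrary choices for illustration, and the check ignores access key activity, so confirm with the owner before removing anyone.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def stale_iam_users(users: List[Dict[str, Any]], now: datetime,
                    max_idle_days: int = 90) -> List[str]:
    """Users whose last console sign-in (or creation, if none) is too old."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for user in users:
        # Fall back to the creation date for users who never signed in.
        last_seen = user.get("PasswordLastUsed") or user["CreateDate"]
        if last_seen < cutoff:
            stale.append(user["UserName"])
    return sorted(stale)


# Example with hand-written records; real ones would come from the IAM API.
example_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
example_users = [
    {"UserName": "old-contractor",
     "CreateDate": datetime(2020, 1, 1, tzinfo=timezone.utc),
     "PasswordLastUsed": datetime(2021, 1, 1, tzinfo=timezone.utc)},
    {"UserName": "active-admin",
     "CreateDate": datetime(2020, 1, 1, tzinfo=timezone.utc),
     "PasswordLastUsed": datetime(2023, 12, 20, tzinfo=timezone.utc)},
]
print(stale_iam_users(example_users, example_now))
```

The IAM API returns timezone-aware timestamps, so pass a timezone-aware `now` (as in the example) to keep the comparison valid.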
While you’re here…
Fugue is cloud security for developers, by developers. We make tools that bake security into the entire system lifecycle on the cloud. We’d love to show you how.
With Fugue, you can:
- Validate cloud infrastructure compliance for a number of policy frameworks like CIS Foundations Benchmark, HIPAA, PCI, SOC 2, NIST 800-53, ISO 27001, and GDPR.
- Get complete visibility into your cloud environment and configurations with dynamic visualization tools.
- Protect against cloud misconfiguration with baseline enforcement to make security-critical cloud infrastructure self-healing.
- Shift Left on cloud security and compliance with CI/CD integration to help your developers move fast and safely.
- Get continuous compliance visibility and reporting across your entire enterprise cloud footprint.