Cloud computing platforms like Microsoft Azure and Amazon Web Services (AWS) are powerful because we can program them to respond to our application requirements automatically. Engineers can innovate really fast, spinning resources up and down on demand, and we only pay for what we use.
But constant change brings risk of misconfiguration that frequently results in compliance violations, security incidents, or major data breaches. That’s where Cloud Security Posture Management (CSPM) comes in to help ensure that cloud environments are configured securely and in accordance with various compliance policies.
“Nearly all successful attacks on cloud services are the result of customer misconfiguration, mismanagement and mistakes. Security and risk management leaders should invest in cloud security posture management processes and tools to proactively and reactively identify and remediate these risks.”
- Neil MacDonald, distinguished VP analyst at Gartner (Innovation Insight for Cloud Security Posture Management, Gartner 2019)
CSPM is focused on detecting and remediating cloud misconfiguration vulnerabilities that can lead to compliance violations and data breaches—and doing so before the bad guys can find and exploit them using their own discovery automation tools. Even if you follow best practices in configuring your cloud resources securely upon provisioning, configuration drift between approved deployments is inevitable—drift that often results in dangerous misconfigurations.
Traditional security analysis and alerting tools can’t detect and prevent modern cloud misconfiguration attacks, which don’t traverse traditional networks and don’t typically leave any noticeable trace. The key here is prevention, and that means using automated remediation for security-critical resources to correct misconfigurations before they can be exploited.
While there are likely a number of different security-critical cloud resources in your environment for which misconfiguration poses serious risk, it’s generally a good rule to start with your network (Azure Virtual Network) and the cloud equivalent of firewalls (Network Security Groups) when implementing automated remediation.
A dynamically-generated visual diagram of a simple Azure environment using Fugue.
The Risk of Azure Network Security Group Misconfiguration
Azure Network Security Groups (NSG) enable you to control a resource’s inbound and outbound traffic, and they can undergo a considerable amount of change. The reason for this is simple: manually adding NSG rules is the easiest and fastest way to grant oneself access to a resource in order to perform a task, such as maintenance on an instance. It’s critical to identify and correct these misconfigurations because bad actors use automation to detect them within minutes.
Some examples of NSG misconfigurations that can lead to serious security incidents include:
- Open ports for 0.0.0.0/0. This is a common NSG misconfiguration that exposes a resource to the entire Internet. It’s typically caused when someone on your team adds the rule to perform a task, such as troubleshooting an instance or obtaining logs, and then forgets to delete it once finished.
- Someone opens port 22, 3389, or a database port to perform maintenance on an instance, leaving behind an open door for hackers to discover.
- An NSG is modified by closing an open port, but that port is relied upon by another user of the same NSG, taking down the application.
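As a rough illustration of how the first two misconfigurations above could be detected automatically, here is a minimal Python sketch. It does not call the Azure API: `NsgRule` is a simplified, hypothetical stand-in for a real NSG security rule, the list of database ports is an assumption, and rule priority is ignored for brevity.

```python
from dataclasses import dataclass

# Ports 22 and 3389 come from the list above; the database ports are an assumption.
SENSITIVE_PORTS = {22, 3389, 1433, 3306, 5432}
# Source values treated here as "anyone on the Internet" (an assumption).
ANY_SOURCE = {"0.0.0.0/0", "*", "Internet"}


@dataclass(frozen=True)
class NsgRule:
    """Simplified, hypothetical stand-in for an NSG security rule."""
    name: str
    direction: str   # "Inbound" or "Outbound"
    access: str      # "Allow" or "Deny"
    source: str      # CIDR block, "*", or a tag such as "Internet"
    port_range: str  # "22", "1000-2000", or "*"


def ports_in_range(port_range: str) -> range:
    """Expand a port expression into a range of port numbers."""
    if port_range == "*":
        return range(0, 65536)
    if "-" in port_range:
        low, high = port_range.split("-", 1)
        return range(int(low), int(high) + 1)
    return range(int(port_range), int(port_range) + 1)


def find_risky_rules(rules: list[NsgRule]) -> list[tuple[str, list[int]]]:
    """Return (rule name, exposed sensitive ports) for world-open inbound allows."""
    findings = []
    for rule in rules:
        if rule.direction != "Inbound" or rule.access != "Allow":
            continue
        if rule.source not in ANY_SOURCE:
            continue
        covered = ports_in_range(rule.port_range)
        exposed = sorted(p for p in SENSITIVE_PORTS if p in covered)
        if exposed:
            findings.append((rule.name, exposed))
    return findings
```

A check like this only finds the problem. Whether closing the port is safe for the application is a separate question, as the third example above shows.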
The Risk of Azure Virtual Network and Subnet Misconfiguration
Azure Virtual Networks (VNet) are dedicated networks for your Azure accounts that are logically isolated from other VNets on Azure. A VNet can include one or more subnets.
Some examples of VNet and subnet misconfigurations that can lead to serious security incidents include:
- A change to route tables or a network access control list (ACL) that leaves a network exposed to unauthorized traffic.
- Lost or modified resource tags that render the VNet hidden to management and security tools that track cloud resources using tags.
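Both examples above are forms of drift from a known-good configuration. The sketch below shows one way such drift could be surfaced: a recursive comparison of a baseline snapshot against the current state. The dictionary layout and key names are hypothetical and do not reflect the exact shape of any Azure API response.

```python
def diff_config(baseline: dict, current: dict, path: str = "") -> list[str]:
    """List human-readable differences between two nested config dicts."""
    changes = []
    for key in sorted(set(baseline) | set(current)):
        where = f"{path}.{key}" if path else key
        if key not in current:
            changes.append(f"removed: {where}")
        elif key not in baseline:
            changes.append(f"added: {where}")
        elif isinstance(baseline[key], dict) and isinstance(current[key], dict):
            # Recurse into nested sections such as tags or routes.
            changes.extend(diff_config(baseline[key], current[key], where))
        elif baseline[key] != current[key]:
            changes.append(
                f"changed: {where} ({baseline[key]!r} -> {current[key]!r})"
            )
    return changes


# Hypothetical snapshots of one VNet: approved baseline vs. what is running now.
baseline = {
    "tags": {"env": "prod", "owner": "payments"},
    "routes": {"default": {"address_prefix": "0.0.0.0/0",
                           "next_hop_type": "VirtualAppliance"}},
}
current = {
    "tags": {"env": "prod"},
    "routes": {"default": {"address_prefix": "0.0.0.0/0",
                           "next_hop_type": "Internet"}},
}

for change in diff_config(baseline, current):
    print(change)
```

Here the diff would report both the rerouted default route and the missing `owner` tag, the two kinds of change described in the list above.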
Automated Remediation is Mandatory for Critical VNets and NSGs
Any action you take in the cloud can be accomplished via APIs—such as provisioning, updating, monitoring, and discovering cloud resources. This means everything is programmable and can be automated. But one area where automation has been slow to catch on is security, often due to a historic lack of trust in security automation.
But this is starting to change as teams and organizations realize the risks involved and the complexity of manually managing cloud security at scale. Here are some of the primary drivers behind automated remediation:
- Bad actors have automation, and they’re faster than you. Take a look at the small handful of misconfiguration examples listed earlier. Hackers use tools that scan the internet looking for these, and others, in order to identify opportunities to exploit. It takes mere minutes for these tools to find your misconfiguration and swing into action and exploit your data. No human can outrun automation. Evaluate your effectiveness against misconfiguration by measuring your Mean Time to Remediation (MTTR) for security-critical resources.
- Humans are error-prone and shouldn’t be remediating misconfiguration. If your cloud misconfiguration response plan involves humans looking at screens full of alerts, prioritizing which misconfiguration events are critical, and then remediating them, you risk human error in miscategorizing them and again when remediating them.
- The cloud at scale is too vast and complex for humans to keep track of, let alone secure. Modern cloud environments can involve thousands of resources that span multiple regions and availability zones. Multiply that by the number of possible configuration attributes, and it’s effectively infinite. Even if you have an army of cloud security experts pointed at remediating misconfiguration, you’ll still be too slow and too error-prone to sleep well at night.
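The MTTR measurement mentioned in the first point can be computed from a simple event log. The sketch below assumes each event is a pair of timestamps, one for when the misconfiguration was detected and one for when it was remediated. That data shape is an assumption, and measuring from the moment the misconfiguration was introduced would be stricter if you can capture it.

```python
from datetime import datetime, timedelta
from typing import Optional


def mean_time_to_remediation(
    events: list[tuple[datetime, Optional[datetime]]],
) -> Optional[timedelta]:
    """Average (remediated_at - detected_at) over events that have been fixed.

    Events whose second element is None are still open and are skipped.
    Returns None when nothing has been remediated yet.
    """
    durations = [fixed - found for found, fixed in events if fixed is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical events for security-critical resources.
events = [
    (datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 9, 2)),   # fixed in 2 minutes
    (datetime(2020, 1, 6, 14, 0), datetime(2020, 1, 6, 14, 4)),  # fixed in 4 minutes
    (datetime(2020, 1, 7, 8, 30), None),                         # still open
]
print(mean_time_to_remediation(events))
```

Tracking the number of open events alongside the average is worthwhile, since a long-lived unremediated misconfiguration does not show up in the mean at all.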
Considerations When Implementing Automated Remediation
When it comes to selecting a solution for auto-remediation of cloud misconfiguration, it’s important to evaluate it based on your needs and team. Does the solution require you to code and maintain automation scripts (or bots), and how many will you need? Does it cover all of your security-critical resources and configuration attributes, or only some of them? Will it be “context aware” regarding system requirements to prevent destructive changes that take down your application?
These are just some questions you should ask. You’ll likely have more to add based on your environment, processes, and use case.
No team should set out to implement automated remediation for their entire cloud infrastructure environment. And it may not be necessary to do so for every single VNet and Network Security Group. Here’s how to know where to focus:
- Discover your resources. Identify where sensitive data resides (such as Azure Blob Storage, Azure Cosmos DB, and SQL Database). Then identify the VNets and NSGs associated with those resources—these are your ideal first candidate resources for automated remediation.
- Assess the current security posture of those resources. Compliance frameworks such as SOC 2, PCI, and HIPAA all include controls for how VNets and NSGs should be configured, so if you operate under one of them, assess against its controls. If none apply, use the Azure CIS Foundations Benchmark. Make any necessary changes to address any policy violations.
- Implement your automated remediation solution for these resources. But before you deploy your solution in production, test it first! Deploy your solution to a test or staging environment that resembles production, and manually introduce misconfigurations to test how your solution handles the event, including whether or not it breaks the application. Once it’s passed your tests, then deploy to production.
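The staging test described in the last step can be rehearsed in miniature. The sketch below models baseline enforcement on plain dictionaries: it plans the actions needed to return an NSG to its approved baseline, applies them, and checks that ports the application depends on are still allowed. The rule layout and all function names are hypothetical, and a real solution would act through the Azure API rather than on in-memory data.

```python
def plan_remediation(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Return (action, rule name) pairs that would restore the baseline."""
    actions = []
    for name in sorted(set(baseline) | set(current)):
        if name not in baseline:
            actions.append(("delete", name))    # rule was added out of band
        elif current.get(name) != baseline[name]:
            actions.append(("restore", name))   # rule was removed or altered
    return actions


def apply_plan(baseline: dict, current: dict, actions: list) -> dict:
    """Apply planned actions to a copy of the current rules."""
    result = dict(current)
    for action, name in actions:
        if action == "delete":
            result.pop(name, None)
        else:
            result[name] = baseline[name]
    return result


def blocked_required_ports(rules: dict, required_ports: set) -> list[int]:
    """Return required ports that no Allow rule covers (empty list means OK)."""
    allowed = {r["port"] for r in rules.values() if r["access"] == "Allow"}
    return sorted(required_ports - allowed)


# Approved baseline for a hypothetical staging NSG.
baseline = {
    "https": {"access": "Allow", "source": "*", "port": 443},
    "db": {"access": "Allow", "source": "10.0.1.0/24", "port": 5432},
}

# Manually introduce misconfigurations, as the testing step suggests.
drifted = dict(baseline)
drifted["temp-ssh"] = {"access": "Allow", "source": "0.0.0.0/0", "port": 22}
drifted["db"] = {"access": "Allow", "source": "*", "port": 5432}

plan = plan_remediation(baseline, drifted)
remediated = apply_plan(baseline, drifted, plan)

assert remediated == baseline                              # drift was reverted
assert blocked_required_ports(remediated, {443, 5432}) == []  # app still reachable
```

Checking required ports after remediation is one simple way to approximate the “context aware” behavior discussed earlier: if the check fails, the change should be escalated to a person instead of being applied.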
Once you’ve successfully implemented automated remediation for your security-critical VNets and NSGs, you’re ready to expand its use to your other security-critical resources.
Side Note: Nearly every enterprise cloud environment contains resources that shouldn’t be there. It’s not uncommon for 20-30% of a cloud environment to be made up of idle “orphaned” resources, which typically aren’t patched with the latest security updates or scanned for misconfigurations. The first step in locking down the security of your Azure environment is establishing clear visibility into everything you have running, how it’s all configured, and how the resources relate to each other. Get rid of the orphans before they turn into dangerous zombie resources in your cloud.