At Fugue, we are obsessed with infrastructure baselines, and especially with how they are used to correct cloud resource misconfiguration and drift, the leading cause of cloud-based data breaches. Baselines are a relatively new concept, so this post introduces what they are, why organizations need them, and how organizations can start using them. Let’s dive in.
What is an Infrastructure Baseline?
A baseline is a snapshot of a “known-good” configuration of cloud infrastructure. It is a complete picture of a cloud environment and defines every resource with all of its attributes. This is more detailed than an infrastructure as code file, which typically defines a resource and a small set of its attributes but leaves out the attributes that take default values. A baseline contains every detail. For example, a baseline of an AWS VPC specifies all of its ACLs, subnets, and route tables.
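To make the difference concrete, here is a minimal Python sketch. The resource, attribute names, and default values are illustrative assumptions rather than the schema of any real provider or tool. A real baseline would also be captured from the provider's API, not computed; the merge below only simulates what the deployed resource ends up looking like.

```python
# Hypothetical example: attribute names and defaults are illustrative only.

# What an infrastructure as code template typically declares for a storage bucket.
declared = {"name": "app-logs", "versioning": True}

# Values the provider would fill in at deploy time (assumed for this sketch).
provider_defaults = {
    "versioning": False,
    "encryption": "none",
    "public_access": False,
    "logging": False,
}


def to_baseline(declared_attrs, defaults):
    """Return the full attribute set: defaults overridden by declared values."""
    baseline = dict(defaults)
    baseline.update(declared_attrs)
    return baseline


baseline = to_baseline(declared, provider_defaults)

# Attributes that appear in the baseline but were never written in the template.
implicit = sorted(set(baseline) - set(declared))
print(implicit)  # -> ['encryption', 'logging', 'public_access']
```

The three implicit attributes are exactly the kind of detail a reviewer cannot see in the template but can see in a baseline.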
Baseline as a Complete Picture
The concept of a baseline as a complete picture of infrastructure has only become possible because of cloud computing. It’s a lot like a map versus a photograph. A map is incomplete and only focuses on certain features such as the exit numbers or street names. But a photograph shows everything with all of its details.
Before the cloud, a traditional data center was more of a map than a photograph. You could see the boxes and even how they were connected, but the data center was still full of mystery. For example, to understand what a switch was doing, you had to read the procedural code that configured it. But with the cloud, the infrastructure configuration is exposed and configured via an API. Everything is discoverable and can be understood. Because of this, a baseline is a full-resolution picture of a cloud infrastructure environment, something the industry has never had before.
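Because the full configuration is discoverable, a baseline can be compared attribute by attribute against the current state of an environment to find drift. The following is a minimal sketch of that comparison, assuming both snapshots have already been collected into plain dictionaries; the resource IDs and attributes are hypothetical.

```python
def find_drift(baseline, current):
    """Compare two {resource_id: {attribute: value}} snapshots.

    Returns a list of (resource_id, attribute, description) tuples.
    """
    drift = []
    for rid in sorted(set(baseline) | set(current)):
        if rid not in current:
            drift.append((rid, None, "resource removed"))
        elif rid not in baseline:
            drift.append((rid, None, "resource added"))
        else:
            old, new = baseline[rid], current[rid]
            for attr in sorted(set(old) | set(new)):
                if old.get(attr) != new.get(attr):
                    change = f"{old.get(attr)!r} -> {new.get(attr)!r}"
                    drift.append((rid, attr, change))
    return drift


# Hypothetical snapshots: the known-good baseline and the environment today.
baseline_snapshot = {
    "sg-web": {"ingress_ports": [443]},
    "db-main": {"encrypted": True},
}
current_snapshot = {
    "sg-web": {"ingress_ports": [22, 443]},
    "db-main": {"encrypted": True},
    "bucket-tmp": {"public": True},
}

for item in find_drift(baseline_snapshot, current_snapshot):
    print(item)
```

Here the sketch reports an unexpected new resource and a security group that has had port 22 opened since the baseline was taken.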
Security Teams Can Be Viewed as “The Bad Guy”
In smaller or less developed organizations creating applications, the Security team is often working from a big binder or Excel spreadsheet of required security controls. That Excel spreadsheet contains a list of rules that serves as loose guidance and may be applied in ambiguous ways, or not at all.
For example, the NIST 800-53 standard specifies more than 1,000 security controls that are all subject to interpretation. Applying all of the controls may even be compared to memorizing a phone book – it is a manual and tedious process that is highly subject to error.
This entire process is highly inefficient. In the agile development and deployment process, where teams are releasing frequent updates, rejecting their work late in the process because it violates security or policy controls wastes a lot of development time and delays innovation. Because the Security team is enforcing compliance controls only after significant development work has been completed, they will be inadvertently portrayed as “the bad guy” putting up roadblocks at the end of the development process.
In more sophisticated organizations, this adversarial relationship between DevOps and Security is exacerbated when security engineers write remediation scripts that programmatically enforce security or policy controls, such as closing ports that shouldn’t be open. It’s not uncommon for these automated changes to be destructive, for example by closing a port that an application needs in order to work properly, resulting in system downtime.
In one case, of a kind that is unfortunately all too common, a Security team implemented a script that disabled the enterprise’s production DNS servers because the script perceived the servers to violate security policy. Since DNS changes take time to propagate throughout a network, this created an extended outage. These types of incidents create distrust and motivate application development teams to oppose any attempts by Security to automate the enforcement of security controls.
How do organizations address the distrust?
Infrastructure as Code Helps, But Is Incomplete
Infrastructure as code (IaC) helps to address this organizational distrust between teams, but it only provides part of the solution. Developers can use a tool such as Terraform, CloudFormation, or Ansible to define templates for their cloud environments. In theory, these templates should inform stakeholders what cloud resources will actually look like and whether they conform with enterprise security policies.
Infrastructure as code templates, however, are not deterministic. Because a template leaves many attributes to provider defaults, it is impossible to know exactly what infrastructure deployed via CloudFormation or Terraform will look like until the resources are actually running in the cloud.
Baselines are the Answer
Only an approach like baselining provides the detail and fidelity that different teams need to collaborate effectively. Because every configuration attribute is spelled out, there is no ambiguity about whether a specific resource complies with enterprise security policy, such as requiring logging to be enabled or requiring all network traffic to use HTTPS instead of HTTP. Every relevant compliance control can be run against the baseline to verify whether the control will pass or fail.
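As a rough illustration of running controls against a baseline, the sketch below expresses two of the policies mentioned above (logging enabled, HTTPS instead of HTTP) as small Python functions. The baseline layout, resource names, and control names are assumptions made for this example, not Fugue's format.

```python
# Hypothetical baseline: every resource with its attributes spelled out.
baseline = {
    "lb-public": {"type": "load_balancer", "protocol": "HTTP", "logging": True},
    "db-main": {"type": "database", "encrypted": True, "logging": False},
}


def logging_enabled(resource):
    """Pass only if logging is explicitly turned on."""
    return resource.get("logging") is True


def https_only(resource):
    """Pass if the resource uses HTTPS or exposes no protocol at all."""
    return resource.get("protocol", "HTTPS") == "HTTPS"


CONTROLS = {"logging-enabled": logging_enabled, "https-only": https_only}


def evaluate(baseline, controls):
    """Return a sorted list of (resource_id, control_name) failures."""
    return sorted(
        (rid, name)
        for rid, attrs in baseline.items()
        for name, check in controls.items()
        if not check(attrs)
    )


failures = evaluate(baseline, CONTROLS)
print(failures)  # -> [('db-main', 'logging-enabled'), ('lb-public', 'https-only')]
```

Because each control is a pure function of the recorded attributes, the result is the same for every team that runs it, which is what removes the ambiguity.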
How to Get Started with Baselining
The most straightforward way for enterprises to get started with secure infrastructure baselines may seem counterintuitive: give developers complete freedom to innovate in their development process. As they work, they may create environments that do not adhere strictly to all security policies but that help them accomplish business goals. For example, they may set overly permissive IAM or network policies to quickly create functioning environments.
As soon as developers have created something tangible in their development environment, they should use automated tools provided or approved by the Security or Compliance team to determine whether the infrastructure meets policy. Security and compliance are necessities that are not going away. It is better to empower developers to automate policy checks early on in the software development life cycle instead of at the very end to avoid potentially rearchitecting or rewriting certain components that violate policy.
Start with a Few Security Requirements
The simplest starting point is to begin with one or a few security requirements and gradually enforce more as application functionality evolves. For example, the security team could start immediately by ensuring that databases encrypt data at rest with customer-managed keys. As developers add logging functionality to various components, the security team can enforce that logging is always enabled on each one. And as multiple components or services communicate with one another, the security team can enforce that HTTPS is used instead of HTTP.
Developers should use automated tools to scan infrastructure for policy violations as a step in the development process, ideally in a CI/CD pipeline. Any violations are logged as errors that can potentially abort the infrastructure deployment process.
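A pipeline step of this kind can be very small. The sketch below assumes that some scanning tool has already produced a list of violations; the `gate` function, the violation tuples, and the rule name are hypothetical. It logs each violation as an error and returns an exit code, and most CI/CD systems treat a non-zero exit code as a failed step.

```python
import sys


def gate(violations, out=sys.stderr):
    """Log each violation as an error and return an exit code.

    Returns 0 when there are no violations and 1 otherwise.
    """
    for resource_id, rule in violations:
        print(f"ERROR: {resource_id} violates {rule}", file=out)
    return 1 if violations else 0


# Hypothetical scan results, standing in for the output of a real scanner.
violations = [("db-main", "encryption-at-rest")]

exit_code = gate(violations)
print(f"exit code: {exit_code}")

# In a real pipeline step, the script would end with sys.exit(exit_code)
# so that a failed gate aborts the deployment.
```

Whether a violation should abort the deployment or only produce a warning is a policy decision; returning the code rather than exiting inside `gate` keeps that choice with the caller.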
A simple introduction to baselining is to scan your development environment for policy violations first. Scans of your cloud environment can be completed in as little as 10 minutes. If a scan reveals alarming violations in the development sandboxes, scan your production environments soon afterward. Inevitably, some of the same violations will exist in production, where they can expose your organization to unforeseen risks.
Visit Fugue to learn more about how our solution uses baselines and codeless auto-remediation to enforce continuous compliance.