In the last part of this series, we're going to look at the final stages of the software development life cycle (SDLC)—deployment and operations. As a reminder, in parts one and two, we discussed the overall concept of shifting left for security and compliance, and laid out some best practices for how to do so during the development and testing phases of the SDLC. In this post, we'll cover how using policy as code and baselines allows you to leverage all the work done in the earlier phases to prevent deployment of misconfigurations and ensure that your deployed infrastructure remains functional and compliant over time.
Deployment: Preventing cloud misconfiguration
In the development and testing phases, policy as code checks generate error messages for security issues so that they can be addressed. At final deployment, we can use the same checks to fail a build outright if they don't pass. For example, if our GDPR controls require encryption in transit and at rest, integrating infrastructure as code and policy as code with the CI/CD pipeline lets us check this in staging. If a DynamoDB table isn't encrypted, the error from Fugue tells the CI/CD tool that a critical violation occurred, and the tool fails the build. This forms an automated gating mechanism for ensuring that the policies are met.
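To make the gating idea concrete, here is a minimal sketch of a pipeline step that turns policy violations into a build failure. It is not Fugue's implementation: the `STAGING` snapshot, the `sse_enabled` field, and the rule name are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    resource_id: str
    rule: str
    severity: str


def check_dynamodb_encryption(resources: dict[str, dict]) -> list[Violation]:
    """Flag every DynamoDB table that does not have encryption at rest enabled."""
    return [
        Violation(resource_id, "dynamodb-encrypted-at-rest", "critical")
        for resource_id, config in resources.items()
        if config.get("type") == "dynamodb_table"
        and not config.get("sse_enabled", False)
    ]


def gate(violations: list[Violation]) -> int:
    """Print each violation and return an exit code; non-zero fails the build."""
    for v in violations:
        print(f"[{v.severity}] {v.rule}: {v.resource_id}")
    return 1 if any(v.severity == "critical" for v in violations) else 0


# Hypothetical staging snapshot; the field names are assumptions for this sketch.
STAGING = {
    "orders": {"type": "dynamodb_table", "sse_enabled": True},
    "sessions": {"type": "dynamodb_table", "sse_enabled": False},
    "assets": {"type": "s3_bucket"},
}
```

A CI/CD step would end with something like `sys.exit(gate(check_dynamodb_encryption(STAGING)))`, since most CI/CD tools treat a non-zero exit code as a failed stage.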
If you've done your job well during development and integration with policy as code, there should be no surprises when you are headed to staging and production, as these errors should have already been caught. However, it's always good to do a final check prior to making changes to production. Deployment checking is very important, but is really only the beginning of ensuring that the production environment is secure. Drift happens for many reasons, so we must continue our security automation into the deployed environment.
Post-Deployment: Finding and fixing drift
Once a system has been deployed to the cloud, it is critical to keep checking it for compliance and security over time. You may think that checking at deploy time is adequate, but in every cloud environment I've ever seen, unknown and unexpected drift occurs. People often tell me that their processes and procedures keep this under control, but in every case, once we look for drift, we find a lot of it. The causes include maintenance windows where people leave ports open, bug fixes that create new exposures, and even bad actors. You'll also want a historical record of your security and compliance posture for auditing and incident response purposes.
Let's cover the three main things you'll want to do in production: ongoing compliance validation, baselining and drift detection, and automated remediation of drift.
Just as you scanned your development, test, and staging environments for compliance with policy, you'll want to do it on an ongoing basis once you've deployed to production. There are two main reasons for this: to make sure you are constantly aware of your compliance posture, and to have a record over time of compliance for audits and investigations.
The good news is that this is very easy to do once you've shifted policy left. Simply run the same policy as code against what is running in production on a regular schedule. At Fugue, we often see customers run these compliance validations hourly or daily as a regular way of doing business. There are no new rules to define, because production uses the same policies as dev and test. This also means that there is no disagreement at any stage of the SDLC about which policies need to be in effect or how to interpret them.
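As a rough illustration of reusing one policy set on a schedule and keeping the results for audits, consider the sketch below. The resource shape, the `no_world_open_ssh` policy, and the in-memory `history` list are assumptions for this example; a real setup would persist the records and trigger the scan from a scheduler.

```python
from datetime import datetime, timezone
from typing import Callable

Resources = dict[str, dict]
Policy = Callable[[Resources], list[str]]


def no_world_open_ssh(resources: Resources) -> list[str]:
    """Return IDs of security groups that allow SSH (port 22) from anywhere."""
    return sorted(
        rid
        for rid, cfg in resources.items()
        if cfg.get("type") == "security_group"
        and (22, "0.0.0.0/0") in cfg.get("ingress", [])
    )


# One policy set, shared by dev, test, staging, and production.
POLICIES: dict[str, Policy] = {"no-world-open-ssh": no_world_open_ssh}


def scan(environment: str, resources: Resources, when: datetime) -> dict:
    """Evaluate every policy and return a timestamped compliance record."""
    failures = {}
    for name, policy in POLICIES.items():
        failing = policy(resources)
        if failing:
            failures[name] = failing
    return {
        "environment": environment,
        "scanned_at": when.isoformat(),
        "compliant": not failures,
        "failures": failures,
    }


# Two hourly scans of a hypothetical production snapshot.
history: list[dict] = []
prod = {"sg-web": {"type": "security_group", "ingress": [(443, "0.0.0.0/0")]}}
history.append(scan("production", prod, datetime(2020, 1, 1, 9, tzinfo=timezone.utc)))
prod["sg-web"]["ingress"].append((22, "0.0.0.0/0"))  # someone opens SSH
history.append(scan("production", prod, datetime(2020, 1, 1, 10, tzinfo=timezone.utc)))
```

Because each record carries a timestamp, the `history` list doubles as the audit trail: it shows not just that production is out of compliance, but between which two scans it became so.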
Baselining and Drift Detection
Immediately upon deployment to production, you'll want to establish a baseline for your new system configuration. The baseline is separate from compliance checking in that it is a complete model of the configuration of the cloud infrastructure, whether or not a particular setting is related to any policy. For example, you might have a security group that allows ingress on the HTTP/S ports only. The baseline will capture that complete configuration. A policy might say that security groups may not allow open access to the SSH port. Let's say someone disables HTTP/S and opens SSH to pull an instance out of service and do some maintenance (we're not recommending this practice, but it's a simple example), then forgets to put things back the way they were. A security solution that only checks policy might detect that the SSH port is open, but it won't know that the HTTP/S ports are supposed to be open. The result is a multi-team dance on Slack or email to figure out what the correct configuration is: security will know to close the SSH port, but will need DevOps to work out which ports should be open. With a baseline that captures the entire configuration, both changes are reported together in a single drift notification.
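The security group scenario can be sketched as a simple comparison between the baseline and the current state. This is only an illustration of the idea, not how Fugue models resources: the `(port, CIDR)` rule shape and the `sg-web` group ID are assumptions.

```python
Rule = tuple[int, str]  # (port, source CIDR)
Snapshot = dict[str, frozenset[Rule]]


def detect_drift(baseline: Snapshot, current: Snapshot) -> dict[str, dict[str, list[Rule]]]:
    """Compare every security group against the baseline.

    Returns, per drifted group, the rules that disappeared ("missing")
    and the rules that were added ("unexpected").
    """
    drift = {}
    for group_id in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(group_id, frozenset())
        actual = current.get(group_id, frozenset())
        if expected != actual:
            drift[group_id] = {
                "missing": sorted(expected - actual),
                "unexpected": sorted(actual - expected),
            }
    return drift


# Baseline: HTTP/S open. Current: someone closed HTTP/S and opened SSH.
BASELINE = {"sg-web": frozenset({(80, "0.0.0.0/0"), (443, "0.0.0.0/0")})}
CURRENT = {"sg-web": frozenset({(22, "0.0.0.0/0")})}

print(detect_drift(BASELINE, CURRENT))
```

A policy check alone would surface only the open SSH rule. The comparison reports the open SSH rule and the two missing HTTP/S rules in one result, which is the information both security and DevOps need.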
Another advantage of baselining and drift detection is that it drastically limits or even eliminates false positive security reports. The baseline has already been approved from a policy and security perspective, so correct configuration is known to the system. For example, you might generally have a policy that there may be no open S3 buckets, but for a particular application that serves public web pages you need an open bucket. This is captured in the baseline, so you won't get notifications that there is an open bucket—you already know that, and it's intentional!
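A small sketch shows why comparing against an approved baseline cuts down on false positives. The bucket names and the boolean "is public" model are assumptions chosen to keep the example short.

```python
def policy_findings(current: dict[str, bool]) -> set[str]:
    """Policy-only view: every public bucket is reported."""
    return {name for name, is_public in current.items() if is_public}


def drift_findings(baseline: dict[str, bool], current: dict[str, bool]) -> set[str]:
    """Baseline view: report only buckets whose exposure differs from the approved state."""
    return {
        name
        for name in baseline.keys() | current.keys()
        if baseline.get(name) != current.get(name)
    }


# "public-site" is intentionally open and was approved as part of the baseline.
BASELINE = {"public-site": True, "invoices": False}
CURRENT = {"public-site": True, "invoices": True}

print(sorted(policy_findings(CURRENT)))           # both buckets
print(sorted(drift_findings(BASELINE, CURRENT)))  # only the bucket that changed
```

The policy-only view flags the intentionally public bucket on every scan, while the baseline view stays quiet about it and reports only the bucket that actually changed.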
We recommend doing drift detection at least hourly, and keeping a permanent record of all baselines and drifts, which Fugue does by default. You want to know quickly when a drift occurs so you can remediate it, and you'll want a record of the history of the system in case something goes wrong and you need to do forensic analysis of what happened.
Finally, for the most sensitive resources, you'll want to have automated remediation of drift events. In a baselined system, this means that when a drift occurs, you should have automated tooling to revert that drift back to the baseline. Note that this is much more powerful than having scripts or "bots" that look for security problems and attempt to fix them without knowledge of the known-good state of the system.
In the example above, our security group was modified to allow SSH from the world and to close the needed HTTP/S ports. With baseline enforcement, the automated remediation fixes both of these drifts, closing the security gap as well as restoring the correct ports. If you were to use scripts or bots, you might close the SSH port automatically, but the HTTP/S ports would remain closed until a ticket was filed and several teams got involved. Scripts and bots can also have unintended consequences by damaging correct infrastructure: if an S3 bucket is supposed to be open to the world and a security script closes or deletes it, the script breaks the application.
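The difference between the two remediation styles can be sketched in a few lines. Everything here is a simplified assumption: ports stand in for full security group rules, and `policy_bot_plan` is a stand-in for a typical "close SSH" script rather than any particular tool.

```python
Action = tuple[str, int]  # ("open" or "close", port)


def policy_bot_plan(current: set[int]) -> list[Action]:
    """A policy-only bot: it knows SSH must not be open, and nothing else."""
    return [("close", 22)] if 22 in current else []


def baseline_plan(baseline: set[int], current: set[int]) -> list[Action]:
    """Baseline enforcement: revert every difference from the known-good state."""
    closes = [("close", port) for port in sorted(current - baseline)]
    opens = [("open", port) for port in sorted(baseline - current)]
    return closes + opens


def apply(current: set[int], plan: list[Action]) -> set[int]:
    """Return the set of open ports after applying a remediation plan."""
    result = set(current)
    for verb, port in plan:
        if verb == "open":
            result.add(port)
        else:
            result.discard(port)
    return result


BASELINE = {80, 443}  # approved state: HTTP/S only
DRIFTED = {22}        # after the maintenance mishap: SSH open, HTTP/S closed

print(apply(DRIFTED, policy_bot_plan(DRIFTED)))          # no ports open at all
print(apply(DRIFTED, baseline_plan(BASELINE, DRIFTED)))  # HTTP/S restored
```

The bot leaves the system secure but broken, with no open ports at all. The baseline plan needs no knowledge of any specific rule, only of the approved state, and it ends with HTTP/S restored.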
But What About When I Want to Make a Change?
You might now be thinking that with baselining and baseline enforcement, it'll be a hassle to make intentional changes to your environment. This isn't true. By integrating with the CI/CD pipeline, you can simply use Fugue's API to pause enforcement prior to deployment, and then once the deployment is done, you can create a new baseline and turn enforcement back on. Your CI/CD tool can automatically do these steps, so you'll never have to even think about it.
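The pause, deploy, rebaseline, and resume sequence might look something like the sketch below in a pipeline script. The client class and its method names are hypothetical stand-ins that only record calls; they are not Fugue's actual API, so check the API documentation for the real endpoints.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeEnforcementClient:
    """Hypothetical stand-in for an enforcement API client (NOT Fugue's real API)."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def pause_enforcement(self, env_id: str) -> None:
        self.calls.append(f"pause:{env_id}")

    def create_baseline(self, env_id: str) -> None:
        self.calls.append(f"baseline:{env_id}")

    def resume_enforcement(self, env_id: str) -> None:
        self.calls.append(f"resume:{env_id}")


@contextmanager
def enforcement_paused(client: FakeEnforcementClient, env_id: str) -> Iterator[None]:
    """Pause enforcement around a deployment.

    A new baseline is taken only if the deployment succeeds; enforcement is
    always resumed, even if the deployment raises.
    """
    client.pause_enforcement(env_id)
    try:
        yield
        client.create_baseline(env_id)  # skipped when the deployment raised
    finally:
        client.resume_enforcement(env_id)


client = FakeEnforcementClient()
with enforcement_paused(client, "prod"):
    client.calls.append("deploy")  # the real deployment step would run here

print(client.calls)
```

One design choice in this sketch is that a failed deployment does not produce a new baseline. Enforcement resumes against the previous one, so a half-applied change is treated as drift rather than accepted as the new known-good state.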
In this series we described how you can shift left on security for cloud infrastructure, while improving your post-deployment monitoring and enforcement. If you'd like to try this with your own infrastructure, we can get you going in less than an hour with a workshop, which we're happy to do for free.