In an earlier blog post, we discussed at a high level how security can shift left for cloud infrastructure. In this post, we'll look in more detail at how to do this in each phase of the Software Development Life Cycle (SDLC), beginning with development, continuing through testing, and ending with deployment and ongoing operations.
The approach looks like this at a high level:
- Development Phase: Perform security unit tests in the dev/sandbox environments where you are building your infrastructure-as-code.
- Testing Phase: Perform security integration tests in the test/stage environment to catch new security flaws that weren't found during unit testing.
- Production Phase: Baseline the production system's infrastructure, continuously monitor it for drift, and enable auto-remediation for critical resources.
Here, we’ll focus on the development and testing phases, and we’ll discuss the production phase in a later post. All of these can be carried out in an automated fashion, and in a way that fits in well with how developers, security engineers, and compliance analysts already work and think.
Development Phase: Unit Testing Your Infrastructure
Just as developers are expected to unit test their application code before merging it into the build, DevOps teams and developers who write infrastructure-as-code should unit test their modules for security before merging them into the stage environment. We generally already test whether the infrastructure-as-code works by doing test deployments (hopefully via a CI/CD toolchain) and checking that the outcome is as intended in terms of functionality and errors. If you're using Terraform or CloudFormation, the only way to really know if your module is going to work is to try it out in a sandbox deployment. Often, "does it work" is the end of the unit test process, but this is inadequate from a security and compliance perspective. It means the security feedback arrives later in the SDLC, which generates rework and slows down development.
Fugue pioneered the concept of policy-as-code, and the reason we like the approach is that well-conceived code can be used in automation and produces consistent results. During the development phase, Fugue can be used as part of the CI/CD toolchain to catch security and compliance errors in the unit of infrastructure-as-code that is in development. For example, let's say I'm working on a new VPC network module that incorporates a new cloud feature. My intention is to offer this module to the rest of the organization, so they won't have to recreate a correct VPC network for each new project. I've written my Terraform or CloudFormation template, and my Jenkins implementation runs it on every commit to the source code repository. I make some changes, use the validation tools available with Terraform and CloudFormation, and then commit those changes, checking that they work as intended in my development sandbox account on the cloud. If you're working in the cloud/DevOps world, this should sound pretty familiar.
Security can shift left here by providing the development teams with collections of policy-as-code (Fugue generally includes these out of the box for frameworks like SOC 2, GDPR, PCI, HIPAA, NIST 800-53, ISO 27001, etc.) that can be used in a CI/CD integration step to provide immediate feedback to the developer. In our VPC example, while the module I wrote might build successfully, I could have easily missed some big security issues, such as not turning off all access for the default security group, or leaving VPC Flow Logs off. When Jenkins fires a Fugue scan after the build, I get immediate feedback on where I'm not meeting policy. I don't need to remember the policies, or wait for an approval process, to get my module right from a security perspective.
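To make the idea of a security unit test concrete, here is a minimal sketch in Python of the two VPC checks mentioned above. It is not Fugue's policy language or API. The `Vpc` and `SecurityGroup` classes and the `check_vpc` function are hypothetical names invented for illustration, and the sketch assumes something upstream has already turned the deployed sandbox resources into these simple objects.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    # Hypothetical, simplified view of a security group.
    name: str
    is_default: bool
    ingress_rules: list = field(default_factory=list)
    egress_rules: list = field(default_factory=list)


@dataclass
class Vpc:
    # Hypothetical, simplified view of a VPC built by the module under test.
    vpc_id: str
    flow_logs_enabled: bool
    security_groups: list = field(default_factory=list)


def check_vpc(vpc):
    """Return a list of human-readable policy violations for one VPC."""
    violations = []
    if not vpc.flow_logs_enabled:
        violations.append(f"{vpc.vpc_id}: VPC Flow Logs are not enabled")
    for sg in vpc.security_groups:
        # The default security group should carry no rules at all.
        if sg.is_default and (sg.ingress_rules or sg.egress_rules):
            violations.append(
                f"{vpc.vpc_id}: default security group '{sg.name}' still allows traffic"
            )
    return violations
```

A module that builds cleanly but leaves Flow Logs off, or leaves a rule on the default security group, would come back with one message per problem, which is the kind of feedback a developer can act on right after the sandbox build.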
While this is very powerful for catching a large number of issues, it won't catch every issue that can arise when the team integrates my new module into an overall infrastructure build, so we'll want to do integration testing of the infrastructure-as-code security posture during the testing phase.
Testing Phase: Integration Testing Your Infrastructure
Often, several infrastructure-as-code modules are combined when building a real application on the cloud. You might have an approved module for the network, another for the IAM roles, a third for your data persistence services, and a fourth for your compute resources. When these are combined in a staging or test environment, they need to be looked at as an integration from a security perspective, just the same as integration tests are performed for the application code.
A simple example of how interactions between different modules of infrastructure code can yield new security problems is assigning IAM roles that are defined in one module to compute resources defined in another. During the development phase for the compute resources, we might have used overly permissive IAM policies and roles to expedite development. We know these aren't up to par for security, but we want to keep things open while we're making changes, and we know that once we move to testing, we'll leverage the work of the IAM module authors. But perhaps the IAM role and its policies turn out to be a mismatch for a particular component of the compute infrastructure, or come with a problem such as an inadequate password policy. This won't be caught in unit testing, but it will be caught in integration testing, when all the modules are combined.
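The cross-module check described above could be sketched as follows. This is an illustration in Python, not Fugue's implementation. The function names and the shape of the `iam_roles` and `compute_resources` inputs are assumptions made for the example. The policy documents follow the standard IAM JSON layout (`Statement`, `Effect`, `Action`), and the only rule checked is whether a role attached to a compute resource allows every action.

```python
def allows_all_actions(policy_document):
    """Return True if any Allow statement grants the wildcard action '*'."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):
        # IAM permits a single statement object instead of a list.
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "*" in actions:
            return True
    return False


def check_role_bindings(iam_roles, compute_resources):
    """Check compute resources against the roles they reference.

    iam_roles: hypothetical dict of role name -> list of policy documents,
               as produced by the IAM module.
    compute_resources: hypothetical list of dicts with 'id' and 'role' keys,
               as produced by the compute module.
    """
    findings = []
    for resource in compute_resources:
        role_name = resource.get("role")
        if role_name not in iam_roles:
            findings.append(
                f"{resource['id']}: role '{role_name}' is not defined by the IAM module"
            )
            continue
        if any(allows_all_actions(doc) for doc in iam_roles[role_name]):
            findings.append(
                f"{resource['id']}: role '{role_name}' allows all actions"
            )
    return findings
```

Neither module can detect these findings on its own, because each one only appears once the compute module's role reference is resolved against what the IAM module actually defines.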
This can also be accomplished using policy-as-code and CI/CD integration with Fugue. In the above example, having inadequate IAM password policies would trigger CIS controls 1-5, 1-6, and more, as well as NIST controls such as NIST-800-53 IA-5 (1)(a). In development, we used policies as information and feedback for the developer. In staging/testing, we might prefer that a violation trigger a build failure in our CI/CD tool instead. In this way, security policies become first-class citizens in the overall build, and policy enforcement becomes a fully automated, shared responsibility.
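The difference between advisory feedback in development and a hard failure in staging/testing can be sketched as a small gate function. This is a hypothetical Python example, not a Fugue or Jenkins API. The phase names, the `BLOCKING_PHASES` set, and the `gate` function are all assumptions made for illustration.

```python
# Phases in which a policy violation should fail the build (assumed names).
BLOCKING_PHASES = {"stage", "test"}


def gate(findings, phase):
    """Report findings and return an exit code for the CI step.

    In a blocking phase, any finding yields exit code 1 (build failure).
    In any other phase, findings are printed as feedback and the code is 0.
    """
    for finding in findings:
        print(f"[{phase}] policy violation: {finding}")
    if findings and phase in BLOCKING_PHASES:
        return 1
    return 0
```

A CI step could pass the return value to `sys.exit`, so that a non-zero code marks the build as failed in the stage pipeline while the same findings stay informational in the dev sandbox.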
By extending security left to development and testing phases, we can have a high degree of certainty that the production environment meets policy when deployed. Since this post is about shifting left, we'll leave how to use automated policy in production environments for a future post. If you'd like to try this approach to shifting security left, reach out to us and we can schedule a workshop to get one of your environments configured for evaluation – usually in less than an hour.