By following the guidance in the framework, you can align your infrastructure with industry best practices and ensure your application and workload architecture is secure, resilient, high-performing, efficient, and cost-optimized.
This blog post dives into several AWS Well-Architected Framework controls, along with why they're important and how to comply with them.
What is the AWS Well-Architected Framework?
The AWS Well-Architected Framework contains a series of foundational questions to ask your team so you can understand whether your architecture meets industry best practices. It provides recommendations sourced from AWS Solutions Architects, drawn from years of experience architecting cloud infrastructure for thousands of customers.
The framework is based on 5 pillars:
- Operational Excellence: Deliver business value and improve processes and procedures
- Security: Protect information and systems
- Reliability: Ensure your workload performs correctly and consistently
- Performance Efficiency: Use IT and computing resources efficiently
- Cost Optimization: Avoid unnecessary costs
Imagine designing the blueprint for a house. You need to ensure the house has a solid foundation before you can start building! Likewise, when architecting cloud infrastructure, it's key to have a solid foundation.
We're going to focus on two of the 5 pillars to help you lay that foundation: Security and Reliability.
The Security pillar is divided into 6 best practices:
- Security Foundations
- Identity and Access Management
- Detection
- Infrastructure Protection
- Data Protection
- Incident Response
Securely operating a cloud workload is key to upholding the security pillar of the Well-Architected Framework. Designing secure infrastructure is only half the battle; the rest is implementing and maintaining it. How do you ensure the security of your application on an ongoing basis?
One way is to follow recommendation SEC1.2: "Secure AWS account: Secure access to your accounts, for example by enabling MFA and restrict use of the root user, and configure account contacts."
By restricting use of the root user, you apply the principle of least privilege and reduce the risk of exposing highly privileged credentials. Should someone gain access to your root user, they'd have full access to all AWS services, including billing. AWS recommends locking away your root user access keys: use the root user only for tasks that require it, and otherwise delete the root user's access key or make it inactive. For instructions, see Fugue's docs site.
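To make the root user check concrete, here's a minimal sketch in Python. It assumes you've already fetched the "SummaryMap" from IAM's GetAccountSummary API (for example with boto3's iam.get_account_summary()); the function name and the finding messages are made up for illustration.

```python
def root_account_findings(summary_map):
    """Return a list of root-user issues found in an IAM account summary.

    `summary_map` is assumed to be the "SummaryMap" dict returned by IAM's
    GetAccountSummary API. The function name is hypothetical.
    """
    findings = []
    # AccountAccessKeysPresent is 1 when the root user has an access key.
    if summary_map.get("AccountAccessKeysPresent", 0) != 0:
        findings.append("root user has an access key")
    # AccountMFAEnabled is 1 when MFA is enabled for the root user.
    if summary_map.get("AccountMFAEnabled", 0) != 1:
        findings.append("root user does not have MFA enabled")
    return findings
```

An empty list means neither issue was found; anything else is worth a follow-up.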
Another way to secure your account is to configure a CloudWatch metric filter and alarm for usage of the root account. By creating a metric filter for applicable CloudTrail log events and creating a CloudWatch alarm for the filter, you can be alerted to root login attempts, which gives you visibility into the use of a fully privileged account. For instructions, see Fugue's docs site.
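As a rough sketch of what that filter and alarm could look like, the Python below builds the two sets of parameters. It assumes your trail already delivers events to a CloudWatch Logs log group and that you have an SNS topic for notifications. The filter pattern follows the style used in the CIS AWS Foundations Benchmark; the metric, namespace, filter, and alarm names are placeholders, not anything from Fugue's docs.

```python
# Matches CloudTrail events made directly by the root user
# (pattern in the style of the CIS AWS Foundations Benchmark).
ROOT_USAGE_PATTERN = (
    '{ $.userIdentity.type = "Root" '
    "&& $.userIdentity.invokedBy NOT EXISTS "
    '&& $.eventType != "AwsServiceEvent" }'
)


def root_usage_monitoring(log_group_name, sns_topic_arn):
    """Build parameters for a root-usage metric filter and alarm.

    The returned dicts are shaped for boto3's logs.put_metric_filter(**...)
    and cloudwatch.put_metric_alarm(**...). All names are hypothetical.
    """
    metric_name = "RootAccountUsage"   # placeholder metric name
    namespace = "SecurityMetrics"      # placeholder namespace
    metric_filter = {
        "logGroupName": log_group_name,
        "filterName": "root-account-usage",
        "filterPattern": ROOT_USAGE_PATTERN,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",  # count one per matching event
            }
        ],
    }
    alarm = {
        "AlarmName": "root-account-usage",
        "MetricName": metric_name,
        "Namespace": namespace,
        "Statistic": "Sum",
        "Period": 300,               # evaluate over 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 1,              # alarm on any root activity
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }
    return metric_filter, alarm
```

Keeping the parameters separate from the API calls makes it easy to review them before anything is created in your account.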
Identity and Access Management
Authentication is handled differently for the two types of identities: humans and machines. Humans include administrators, developers, and end users; machines include service applications, workloads, and operational tools. Both types of identities need to access the right resources under the right conditions in order to carry out their tasks securely.
Recommendation SEC2.1 says to "Use strong sign-in mechanisms. Enforce minimum password length, and educate users to avoid common or re-used passwords. Enforce multi-factor authentication (MFA) with software or hardware mechanisms to provide an additional layer."
Use of a custom password policy can greatly improve the security of the login process for human users. For instance, you can prevent a user from reusing any of their previous passwords. This helps protect against credential stuffing, where an attacker automates login attempts using a large number of leaked credentials: a password that was exposed in the past can't simply be put back into service. Setting the number of previously used passwords that can't be repeated to the maximum of 24 can help ensure a secure login process. For instructions, see Fugue's docs site.
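Here's a small sketch of how you might check an existing password policy against these settings. It assumes the input is the "PasswordPolicy" dict returned by IAM's GetAccountPasswordPolicy API (boto3's iam.get_account_password_policy()). The minimum length of 14 is an assumed threshold for illustration; the source only specifies the reuse limit of 24.

```python
def password_policy_gaps(policy, min_length=14, reuse_prevention=24):
    """Return a list of ways an IAM password policy falls short.

    `policy` is assumed to be the "PasswordPolicy" dict from IAM's
    GetAccountPasswordPolicy API. `min_length=14` is an assumption,
    not a value taken from the article.
    """
    gaps = []
    if policy.get("MinimumPasswordLength", 0) < min_length:
        gaps.append(f"minimum password length is below {min_length}")
    # PasswordReusePrevention is absent when reuse is not restricted at all.
    if policy.get("PasswordReusePrevention", 0) < reuse_prevention:
        gaps.append(f"fewer than {reuse_prevention} previous passwords are blocked")
    return gaps
```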
Machines, on the other hand, must gain access to an AWS account programmatically through the use of an access key ID and secret access key. These credentials are used to sign programmatic requests to AWS via the CLI or APIs. Access keys should be rotated frequently; the longer an access key is out there, the higher the chance it could be compromised. A good rule of thumb is to generate a new key and deactivate and delete the old one every 90 days or less. For instructions, see Fugue's docs site.
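The 90-day rule is easy to check in code. The sketch below assumes you pass in the "AccessKeyMetadata" list returned by IAM's ListAccessKeys API (boto3's iam.list_access_keys()), where each entry carries an AccessKeyId, a Status, and a timezone-aware CreateDate. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(access_keys, max_age_days=90, now=None):
    """Return the IDs of active access keys older than `max_age_days`.

    `access_keys` is assumed to be the "AccessKeyMetadata" list from IAM's
    ListAccessKeys API, with timezone-aware "CreateDate" values.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        key["AccessKeyId"]
        for key in access_keys
        # Inactive keys can't sign requests, so only active ones are flagged.
        if key["Status"] == "Active" and key["CreateDate"] < cutoff
    ]
```

The optional `now` argument is there so the check can be tested against a fixed date.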
The Reliability pillar is divided into 4 best practices:
- Foundations
- Workload Architecture
- Change Management
- Failure Management
Change management is an important part of ensuring the reliability of your workload's operation. Your application must be able to anticipate and withstand changes, whether imposed from outside, as with demand spikes, or made from within, as with security patches. By monitoring workload resources with logs and metrics, you gain insight into your application's health and can enable it to respond quickly to failure or low performance.
Recommendation REL6.1 says to "Monitor all components for the workload (Generation): Monitor the components of the workload with Amazon CloudWatch or third-party tools. Monitor AWS services with Personal Health Dashboard."
One of the best ways to monitor your workload components is with AWS CloudTrail. It records API calls for your account, including the caller's identity and source IP, the time of the API call, the request parameters, and the response returned by the API service. You can use the logs generated by CloudTrail to set up notifications when a reliability threshold is crossed or a failure occurs. Enabling CloudTrail in all regions also facilitates resource change tracking and security analysis. For instructions, see Fugue's docs site.
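One way to verify the all-regions setting is sketched below. It assumes you've collected the "trailList" from CloudTrail's DescribeTrails API and, for each trail, the "IsLogging" flag from GetTrailStatus (boto3's cloudtrail.describe_trails() and cloudtrail.get_trail_status()). The function name and the shape of the `logging_status` mapping are choices made for this example.

```python
def has_multi_region_trail(trails, logging_status):
    """Return True if at least one multi-region trail is actively logging.

    `trails` is assumed to be the "trailList" from CloudTrail's DescribeTrails
    API. `logging_status` is a hypothetical helper mapping of trail name to the
    "IsLogging" value from GetTrailStatus.
    """
    return any(
        bool(trail.get("IsMultiRegionTrail")) and logging_status.get(trail["Name"], False)
        for trail in trails
    )
```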
Protecting these logs — and monitoring who accesses them — is equally important. It's wise to enable bucket access logging for S3 buckets that store CloudTrail log data. Server access logging offers detailed information about the requests made to the bucket containing the CloudTrail logs. Tracking access requests in this way is handy for security and incident response workflows. For instructions, see Fugue's docs site.
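For a sense of what this looks like in practice, here's a hedged sketch with two helpers. The first reads the response of S3's GetBucketLogging API (boto3's s3.get_bucket_logging()), and the second builds the BucketLoggingStatus value you would pass to PutBucketLogging. Both function names are invented for this example.

```python
def bucket_logging_target(logging_response):
    """Return (target_bucket, target_prefix), or None if logging is off.

    `logging_response` is assumed to be the dict returned by S3's
    GetBucketLogging API; "LoggingEnabled" is absent when logging is disabled.
    """
    enabled = logging_response.get("LoggingEnabled")
    if not enabled:
        return None
    return enabled["TargetBucket"], enabled.get("TargetPrefix", "")


def logging_status_for(log_bucket, prefix):
    """Build a BucketLoggingStatus dict for S3's PutBucketLogging API."""
    return {"LoggingEnabled": {"TargetBucket": log_bucket, "TargetPrefix": prefix}}
```

The target bucket should be a separate bucket from the one holding the CloudTrail logs, so access records don't mix with the data they describe.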
Component failure should be expected and anticipated. As a result, your workload architecture needs to be designed to mitigate failures when they occur. This way you can meet your recovery time objectives (RTO) and recovery point objectives (RPO). An RTO defines the maximum acceptable length of time your service can be unavailable, and an RPO defines how much data loss is acceptable, measured as the time between the most recent recovery point and the service interruption.
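To see how the two objectives are measured, consider this small illustrative calculation. The function name and the example timestamps are made up; the point is only that the RPO is compared against the gap between the last recovery point and the failure, while the RTO is compared against the time from failure to restoration.

```python
from datetime import datetime, timedelta


def meets_objectives(last_recovery_point, failure_time, restored_time, rpo, rto):
    """Compare one incident against RPO and RTO targets (illustrative only)."""
    data_loss_window = failure_time - last_recovery_point  # measured against RPO
    downtime = restored_time - failure_time                # measured against RTO
    return {"rpo_met": data_loss_window <= rpo, "rto_met": downtime <= rto}


# Hypothetical incident: backup at 00:00, failure at 00:45, service back at 03:00.
result = meets_objectives(
    last_recovery_point=datetime(2024, 1, 1, 0, 0),
    failure_time=datetime(2024, 1, 1, 0, 45),
    restored_time=datetime(2024, 1, 1, 3, 0),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=2),
)
```

In this made-up incident, 45 minutes of data loss fits within a 1-hour RPO, but 2 hours and 15 minutes of downtime exceeds a 2-hour RTO.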
Recommendation REL9.2 says to "Secure and encrypt backups: Detect access using authentication and authorization, such as AWS IAM, and detect data integrity compromise by using encryption."
When you secure and encrypt data backups, you improve your likelihood of meeting your RPO and RTO goals. For instance, you should encrypt your RDS database instances with AWS-managed or customer-managed KMS customer master keys (CMKs). If you encrypt an RDS instance, you encrypt its automated backups (in addition to the underlying storage, read replicas, and snapshots). This can help prevent data exfiltration and data loss, which is also a security issue. For instructions, see Fugue's docs site.
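A quick way to spot unencrypted instances is sketched below. It assumes you pass in the "DBInstances" list from RDS's DescribeDBInstances API (boto3's rds.describe_db_instances()), where each entry has a DBInstanceIdentifier and a StorageEncrypted flag. The function name is hypothetical.

```python
def unencrypted_db_instances(db_instances):
    """Return identifiers of RDS instances without storage encryption.

    `db_instances` is assumed to be the "DBInstances" list from RDS's
    DescribeDBInstances API.
    """
    return [
        db["DBInstanceIdentifier"]
        for db in db_instances
        if not db.get("StorageEncrypted", False)
    ]
```

Note that encryption is chosen when an RDS instance is created, so an instance flagged here generally has to be recreated from an encrypted copy of a snapshot rather than switched in place.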
Likewise, it's a good idea to enable server-side encryption (SSE) on S3 buckets. With S3 SSE, AWS encrypts data at the object level as it's written to disk, and decrypts it when you access it. This promotes integrity of data, a key facet of reliability. You can use one of three methods of encrypting your objects: Amazon S3-managed keys (SSE-S3), KMS keys stored in AWS KMS (SSE-KMS), or customer-provided keys (SSE-C). For instructions, see Fugue's docs site.
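As a sketch, the helper below builds a default-encryption configuration for a bucket, shaped for S3's PutBucketEncryption API (boto3's s3.put_bucket_encryption(), via its ServerSideEncryptionConfiguration parameter). The function name is made up. It covers SSE-S3 and SSE-KMS only, since SSE-C keys are supplied with each request rather than set as a bucket default.

```python
def default_encryption_config(kms_key_id=None):
    """Build a ServerSideEncryptionConfiguration for S3's PutBucketEncryption.

    With no key ID, objects default to SSE-S3 (AES256). With a key ID or ARN,
    they default to SSE-KMS using that key. The function name is hypothetical.
    """
    if kms_key_id is None:
        default = {"SSEAlgorithm": "AES256"}
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}
```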
With the AWS Well-Architected Framework, AWS has provided its customers with valuable guidance on building and managing secure and resilient cloud infrastructure environments. By following this guidance, you can adhere to industry best practices and operate more securely and efficiently.