If you are following best practices, you have adopted a multi-account strategy for your cloud applications, with different workloads spread across different accounts. Users log in to one account and assume roles in other accounts as needed. Even your build system lives in a tools account separate from all the applications it deploys.
This is a great start. It limits the blast radius of a breach in one workload account, keeping it away from workloads in other accounts. But if someone is able to compromise the tools account, they potentially gain the ability to make malicious deployments into any of the other accounts. A central build and deployment system that pushes changes out to all your separate accounts may allow a minor account compromise to escalate quickly into a breach of every connected account. It’s time to take a deep look at the blast radius of the tools account.
Single Account Deploy System
When you only had one place to put everything, whether that was a data center or a single AWS account, your deployment system had all the privileges it needed everywhere it needed to go. Since there were no separate accounts, the build server made whatever changes were needed for its deployments.
Multi Account Deployment System
While breaking out the environments into individual accounts can help prevent a breach in the QA account from reaching prod, a breach in the tools account can quickly jump to every other account through the deployment system, because the deployment tools have permission to take action in all accounts.
In order to ensure that every instruction issued by the build system is valid, we need to change how deployments happen. Instead of the deployment server pushing the change into the target account, the deployment server should ask the target account, “Will you please deploy this update for me?”
Asking for a deployment to happen
Having the deployment system ask the target account to deploy something results in a massive deprivileging of the deployment system. What was previously a large grant to create, update, and delete resources in multiple accounts is reduced to a single permission to start a build in the target account: codebuild:StartBuild.
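To make the size of that reduction concrete, here is a minimal sketch of what the remaining grant might look like, built as a Python dictionary and printed as JSON. The IAM action that starts a CodeBuild build is `codebuild:StartBuild`. The account ID, region, and project name (`deploy-agent`) are placeholders, and the assumption here is that the policy is attached to a role in the target account that the orchestration engine assumes.

```python
import json


def start_build_policy(account_id: str, region: str, project: str) -> dict:
    """Return an IAM policy document that allows only starting one CodeBuild project."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "codebuild:StartBuild",
                # Scope the grant to a single project rather than "*".
                "Resource": f"arn:aws:codebuild:{region}:{account_id}:project/{project}",
            }
        ],
    }


# Placeholder values, for illustration only.
policy = start_build_policy("111122223333", "us-east-1", "deploy-agent")
print(json.dumps(policy, indent=2))
```

Everything else the deployment needs (creating, updating, and deleting resources) stays with the CodeBuild project's own service role inside the target account.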
The Build Engine (BE) no longer makes these deployments. That job moves to an Orchestration Engine (OE), which asks the target account to make the deployment. This requires some additional infrastructure in the accounts. The CodeBuild agent in the target account is responsible for validating the request before taking any action on it. Once the request has been completed, a message is sent back to the Orchestration Engine via an SQS queue in the orchestration account.
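The control flow inside the target account can be sketched roughly as follows. This is an illustration under assumptions, not the actual agent: `DeployRequest`, `handle_request`, and the status strings are hypothetical names, and `deploy` and `notify` are passed in as plain callables so the sketch stays self-contained. In practice `notify` would likely wrap a call that sends the message to the SQS queue in the orchestration account.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class DeployRequest:
    request_id: str
    artifact_uri: str


# A validator returns (ok, reason); reason explains a rejection.
Validator = Callable[[DeployRequest], Tuple[bool, str]]


def handle_request(
    request: DeployRequest,
    validators: Iterable[Validator],
    deploy: Callable[[DeployRequest], None],
    notify: Callable[[dict], None],
) -> bool:
    """Validate first, deploy only if every check passes, then report back."""
    for check in validators:
        ok, reason = check(request)
        if not ok:
            # Nothing in the account is touched when validation fails.
            notify({"request_id": request.request_id, "status": "REJECTED", "reason": reason})
            return False
    deploy(request)
    notify({"request_id": request.request_id, "status": "SUCCEEDED", "reason": ""})
    return True
```

The point of the ordering is that a request that fails any check never reaches the `deploy` step, and the orchestration engine still receives a message explaining why.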
Validating the Deployment
A critical part of this change is that the destination accounts must be able to verify that any deployment has passed all pre-deployment validation steps needed. Eric Kasic discusses Pipeline Services in his blog post Deployment Pipeline Compliance and Control – a Service-based Approach.
Breaking out the deployment process into this additional step allows us to validate the digitally signed artifacts from the pipeline services before any updates are made within the application account. The CodeBuild job must execute validation logic to ensure that the request is valid and safe to deploy into this account. Several checks can be performed here.
For Infrastructure as Code, re-run your compliance checks with local copies of the tools. This ensures that if a rule has been updated between when the build was created and when you are making your deployment, the code is still compliant.
Verify Build Signature
After your build system creates the artifact, it needs to create a manifest with both MD5 and SHA checksums of the build artifact. It then uses its private key to create a signed record for the build that includes the file location and the MD5 and SHA checksums. After the CodeBuild job downloads the artifact, it can recompute the checksums and compare them against the signed record to validate that the file downloaded is the file that the build system produced.
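The checksum half of that check can be sketched with the Python standard library. This is a minimal illustration, assuming SHA-256 as the "SHA" checksum and hypothetical function names (`build_manifest`, `artifact_matches`). Signing and verifying the record itself is left out here; the manifest should only be trusted after its signature has been verified.

```python
import hashlib
import hmac


def checksums(data: bytes) -> dict:
    """Compute the MD5 and SHA-256 hex digests of an artifact."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def build_manifest(location: str, data: bytes) -> dict:
    """Build side: record where the artifact lives and what it hashes to."""
    return {"location": location, **checksums(data)}


def artifact_matches(manifest: dict, downloaded: bytes) -> bool:
    """Deploy side: recompute both digests and compare them to the manifest."""
    actual = checksums(downloaded)
    return all(
        hmac.compare_digest(manifest[name], actual[name])
        for name in ("md5", "sha256")
    )


# Placeholder location and contents, for illustration only.
manifest = build_manifest("s3://example-bucket/app.zip", b"example artifact bytes")
print(artifact_matches(manifest, b"example artifact bytes"))
print(artifact_matches(manifest, b"tampered artifact bytes"))
```

The first call prints `True` and the second prints `False`, since any change to the downloaded bytes changes both digests.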
Verify Deployment Request Signature
After the build system finishes building the artifact, it creates a signed record. That record, together with the details of the deployment (date, time, how the deployment was requested, and the artifacts to be deployed), forms a manifest for the deployment. This manifest should be signed by the orchestration engine and validated by the CodeBuild job.
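A rough sketch of signing and validating such a manifest is shown below. Two things are assumptions rather than the article's method. First, the field names in the example manifest are hypothetical. Second, to stay within the standard library, the sketch uses HMAC-SHA256 with a shared key as a stand-in for the private-key signature described above. A real setup would use an asymmetric signature so that the target account holds only the public key and a compromised target cannot forge requests.

```python
import hashlib
import hmac
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(manifest: dict, key: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 over the canonical manifest bytes."""
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify(manifest: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(manifest, key), signature)


# Hypothetical manifest fields and key, for illustration only.
deployment = {
    "requested_at": "2020-01-01T00:00:00Z",
    "requested_by": "pipeline-trigger",
    "artifacts": [{"location": "s3://example-bucket/app.zip", "sha256": "placeholder"}],
}
key = b"demo-key-not-for-production"
signature = sign(deployment, key)

tampered = dict(deployment, requested_by="someone-else")
print(verify(deployment, signature, key))
print(verify(tampered, signature, key))
```

The first check prints `True` and the second prints `False`. Serializing with sorted keys matters: without a canonical form, two semantically identical manifests could produce different signatures.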
Visualizing the New Process
Our build and release process is now separated into multiple steps to ensure that breaching any single account doesn’t result in an immediate ability to escalate to multiple accounts. The blocks above are:
- OE: Orchestration Engine – Responsible for receiving updates from repositories and managing the process through production deployment.
- BE: Build Engine – Responsible for building from source updates. Downloads source code from the repository, performs static analysis including third-party library scanning, builds the software, and ensures it passes unit tests.
- CE: Compliance Engine – Responsible for ensuring that the finished package meets all corporate requirements. Re-runs static analysis on the artifact and runs virus scanning. If the artifact includes an operating system (e.g. containers and machine images), it also checks for known updates and known vulnerabilities.
After the build and compliance engines have completed their work, the target account is asked to make the deployment. The validation in the Lambda and CodePipeline jobs now gives us increased confidence that this build is a legitimate deployment request and not the result of a malicious action taken in our tools account.
Changing your deployments to an ask model increases your security and decreases the blast radius of any incident in the account your tools live in. Start this process by implementing pipeline services, and then work to update your deployment process to include validation of the signatures. Once those are done, you can safely decouple your deployment automation into the destination account and significantly limit the blast radius of your deployment account.