CloudFormation Development Process and Privilege

As a developer works through the development of a CloudFormation template, they are likely working in a “sandbox” account where they have significant “power user” privileges.  This is convenient because it lets the developer focus on the business needs, but what happens when the same template is converged in a production environment with more restricted privileges?

This article introduces cfn-leaprog (cfn-LEAst-Privilege-ROle-Generator), an experimental tool that lets developers generate least-privilege IAM roles for CloudFormation templates and Service Catalog products while still working in their sandbox environment, so that the roles align better with the privileges available in a production environment.

Deployment System Privileges in Production

The deployment system may very well have similar “power user” privileges in production that a developer has in a sandbox.  This solves any disconnect from the developer’s perspective, but it is an anti-pattern.  It is risky to afford too many privileges to the deployment system within production.  First, errant software with too much privilege can wreak havoc against production infrastructure.  Second, if the deployment system is compromised, the attacker inherits all of its privileges.  Consider a deployment system built upon the Jenkins platform. Historically, there have been many exploits for Jenkins, so attaching a godlike AWS instance profile/role to a Jenkins worker is a mistake.

Alternatives for Deployment System Privileges

There are alternatives to provisioning a deployment system with godlike privileges.  How to set up a secure deployment system is far beyond the scope of this article, but at a very high level:

  • The deployment system can be provisioned to only have permission to assume other specific roles.  Then these other roles can be properly reduced down to the least privilege necessary to deploy a particular (CloudFormation) stack.
  • The deployment system or end users can provision products from a Service Catalog portfolio.  Each product in the portfolio can define launch constraints.  These constraints are an IAM role with the privileges necessary to deploy the product.  The rest of this article discusses CloudFormation specifically, but every point is applicable to creating Service Catalog products (since they are CloudFormation templates).
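
As a rough illustration of the first alternative, the sketch below builds a policy document that allows the deployment system nothing except assuming named deployment roles. The helper name `assume_only_policy`, the account ID, and the role name are hypothetical and chosen only for this example.

```python
import json


def assume_only_policy(role_arns):
    """Build an IAM policy document that permits nothing except
    sts:AssumeRole on the listed deployment roles."""
    if not role_arns:
        raise ValueError("at least one role ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": sorted(role_arns),
            }
        ],
    }


# Hypothetical account ID and role name, for illustration only.
policy = assume_only_policy(
    ["arn:aws:iam::123456789012:role/deploy-dynamodb-stack"]
)
print(json.dumps(policy, indent=2))
```

Each role named in the Resource list would then carry only the privileges needed for its own stack, and its trust policy would need to allow the deployment system to assume it.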

Least Privileged IAM Roles

This raises the question of how to develop the IAM role with the least privilege needed to deploy a CloudFormation template or Service Catalog product.  A complex stack for a cloud-native application may need a variety of permissions across S3, DynamoDB, or EC2. Beyond the privileges necessary to create the stack in the first place, there are the privileges necessary to delete the stack and to update the stack.

AWS provides a collection of “Managed Policies” to help simplify this creation process.  BEWARE: many of these policies contain wildcard actions and resources that are in no way helpful for designing an IAM role with the least privilege.  For example, the managed policy AWSLambdaFullAccess contains permission for all S3 operations against all buckets.
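
One quick way to spot this problem is to scan a policy document for wildcards before attaching it to a role. The following is a minimal sketch; the function names and the sample policy are made up for illustration and do not reproduce any real managed policy.

```python
def as_list(value):
    """IAM allows a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy):
    """Return (statement index, field, value) for every wildcard found
    in the Allow statements of a policy document."""
    findings = []
    for index, statement in enumerate(as_list(policy.get("Statement", []))):
        if statement.get("Effect") != "Allow":
            continue
        for action in as_list(statement.get("Action", [])):
            if "*" in action:
                findings.append((index, "Action", action))
        for resource in as_list(statement.get("Resource", [])):
            if resource == "*":
                findings.append((index, "Resource", resource))
    return findings


# Hypothetical policy, not a copy of any AWS managed policy.
sample = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:*", "dynamodb:CreateTable"],
            "Resource": "*",
        }
    ],
}
print(find_wildcards(sample))
```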

Ideally, the AWS CloudFormation service would provide an authoritative operation to analyze a CloudFormation template and return which privileges it is going to require in order to deploy it.  At the time of this writing, there isn’t a known way to solicit this information from an AWS API.

Therefore, the least privileged role must be computed, which is where cfn-leaprog can help.  Before diving into cfn-leaprog, it is worth considering the possible approaches for tooling.  In the most general sense, there are two approaches for computation: static and dynamic.

Static Approach

A static approach parses a CloudFormation template and attempts to map the resources to the policy with the least privilege.  For an example of this approach, see Ian McKay’s tool cfnlp.

This approach is powerful in that it can run offline from AWS and in a matter of seconds.  This approach also has drawbacks to consider:

  • The tool has to “keep up” with mappings for new services supported by AWS/CloudFormation as they are released
  • Static analysis of a CloudFormation template can be difficult and ultimately incomplete.  Important data required to make decisions may not be available at the time of analysis (e.g. Parameter values, macros, and the behavior of custom resources).
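
To make the static idea concrete, here is a minimal sketch that is not taken from cfnlp or any other tool. The `CREATE_ACTIONS` table is a hypothetical, hand-maintained and deliberately incomplete mapping, which is exactly the kind of data such a tool has to keep current. Resource types it does not know about are reported rather than guessed.

```python
# Hypothetical, incomplete mapping from resource type to the actions
# assumed necessary to create it. A real tool must maintain this by hand.
CREATE_ACTIONS = {
    "AWS::DynamoDB::Table": ["dynamodb:CreateTable", "dynamodb:DescribeTable"],
    "AWS::S3::Bucket": ["s3:CreateBucket"],
}


def static_actions(template):
    """Map each resource in a parsed template to known actions.

    Returns (sorted actions, sorted resource types with no mapping)."""
    actions, unmapped = set(), set()
    for resource in template.get("Resources", {}).values():
        resource_type = resource.get("Type", "<missing>")
        if resource_type in CREATE_ACTIONS:
            actions.update(CREATE_ACTIONS[resource_type])
        else:
            unmapped.add(resource_type)
    return sorted(actions), sorted(unmapped)


# Hypothetical template, already parsed from JSON or YAML into a dict.
template = {
    "Resources": {
        "Table": {"Type": "AWS::DynamoDB::Table", "Properties": {}},
        "Hook": {"Type": "Custom::Example", "Properties": {}},
    }
}
actions, unmapped = static_actions(template)
print(actions, unmapped)
```

The custom resource ends up in the unmapped list, which illustrates the second drawback: its behavior cannot be known from the template alone.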

Dynamic Approaches

A dynamic approach involves submitting a template to the CloudFormation service for creation, deletion, or update, letting the service converge the stack, and then querying the results in order to compute the policy.

This approach is more “authoritative” than a static analysis approach but involves a more complex orchestration of AWS services to obtain the necessary information to compute the policy.

Within the dynamic approach, there are at least two sub-approaches:

  • CloudFormation failure event scraping
  • CloudTrail event filtering

After much experimentation, the CloudTrail event filtering approach has been found to be more reliable and performant at the cost of interacting with more complex AWS infrastructure.

CloudFormation Failure Event Scraping

This approach involves scraping the failure event messages from a CloudFormation stack for Actions to incrementally compute the least privilege policy.

  • Given a CloudFormation template, call CreateStack with an IAM role containing no Actions and wildcard Resource “*”.
  • After CreateStack fails, retrieve the “stack events” for the failed stack (via DescribeStackEvents).
  • Search through the stack events for a failure event and parse the “reason” field.  
    • For example, one regex pattern for a failure reason is: /API: (.*) User: (.*) is not authorized to perform: (.*) on resource: (.*)/. 
    • The third grouping is the action to add to the IAM role.
  • Call DeleteStack on the failed stack.
  • To capture the missing delete permissions, follow the same procedure to scrape the reasons from the failure events.
  • Add the missing actions to the role, and start the process over.
  • Continue the process until the stack converges without an error.
  • Clean up all the intermediate failed stacks.
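
The core of the steps above can be sketched as follows. This is a simplified illustration rather than the cfn-leaprog implementation: the `attempt` callable stands in for the create/delete cycle plus the retrieval of stack event reasons, and `fake_attempt` with its account ID and ARNs is invented so the loop can run without AWS.

```python
import re

# Pattern quoted in the steps above; other services word errors differently.
DENIED = re.compile(
    r"API: (.*) User: (.*) is not authorized to perform: (.*) on resource: (.*)"
)


def missing_actions(reasons):
    """Return the actions named in failure reasons that match DENIED."""
    found = set()
    for reason in reasons:
        match = DENIED.search(reason or "")
        if match:
            found.add(match.group(3))  # third group is the denied action
    return found


def converge(attempt, max_rounds=50):
    """Call attempt(actions) until it returns no failure reasons.

    attempt is caller-supplied: it should try the stack operation with a
    role limited to `actions` and return the failure reasons it observed."""
    actions = set()
    for _ in range(max_rounds):
        reasons = attempt(frozenset(actions))
        if not reasons:
            return actions
        new = missing_actions(reasons) - actions
        if not new:
            raise RuntimeError(f"unrecognized failure reason: {reasons[0]!r}")
        actions |= new
    raise RuntimeError("gave up before the stack converged")


# Stand-in for CloudFormation: fails on the first action not yet granted.
REQUIRED = ["dynamodb:CreateTable", "dynamodb:DescribeTable", "dynamodb:DeleteTable"]


def fake_attempt(actions):
    for action in REQUIRED:
        if action not in actions:
            return [
                f"API: {action} User: arn:aws:sts::123456789012:assumed-role/"
                f"example/session is not authorized to perform: {action} "
                "on resource: arn:aws:dynamodb:us-east-1:123456789012:table/example"
            ]
    return []


print(sorted(converge(fake_attempt)))
```

Note that each missing action costs a full round, which hints at why the real process is slow, and that a failure reason the pattern does not recognize stops the loop entirely.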

The Resource for the policy is always set to “*” because AWS services support resource-level permissions inconsistently, and even for those that do, the patterns for specifying resources can vary.  The algorithm can track specific resources and include them as comments in the final policy for a human to evaluate.

This algorithm covers the privileges necessary to call CreateStack and DeleteStack for a particular template.  The algorithm for UpdateStack is slightly more complex.  

UpdateStack

When UpdateStack is called, the privileges needed are a function of the difference between the baseline created stack and the new version of the template, i.e. the change set.  For example, if a baseline stack contains a vanilla S3 bucket and another version of the template for that bucket adds a Lifecycle policy, the least-privilege policy for that UpdateStack call would only include the actions to update the Lifecycle policy.  The policy would not include actions to update or change other unmentioned properties of the bucket.

To restate, a policy that covers all possible updates to a stack isn’t going to be the least privilege policy in the context of a specific update.

This actually simplifies the generation logic in that there is no need to generate all possible permutations of how a given resource can be updated.  The algorithm can then follow the same process as for CreateStack, but then add the step to call UpdateStack against a specific version of the template with any changed Parameters, and in turn scrape the update events.

Analysis

While experimenting, this was the first approach considered and implemented.  It was considered first because it is “simple” in that it only requires interacting with the CloudFormation service; it’s self-contained without dependencies on other infrastructure.  Unfortunately, it has a number of fatal drawbacks.  

The logic itself is fairly complex and the iterative process can take a LONG time, but the most important drawback is the dependency on the format of the stack event reason field.

  • THE FORMAT OF THE STACK EVENT REASON FIELD IS NOT CONSISTENT ACROSS AWS SERVICES.
    • With almost every new test template, a new format for a failure message in the event reason was discovered.  Initially, a registry was implemented to encapsulate this complexity, but after the Nth time, the whole approach was abandoned due to this inconsistency.
  • NOT ALL FAILURE MESSAGES HAVE A ONE-TO-ONE RELATIONSHIP WITH API CALLS
    • For example, when the permission to create a public bucket website is missing, the failure reason only indicates the bucket couldn’t be created, i.e. CreateBucket failed instead of PutBucketWebsite.

Implementation

The code that implements this algorithm is available at cfn-leaprog.

To experiment with the event scraping approach, first be sure to have Ruby installed.  For more information, see Installing Ruby.

To run against a simple CloudFormation template that creates a DynamoDB table:

Once the process completes, the rake task will emit a policy document to stdout with all discovered actions.  The policy will contain:

  • One Statement per service with all the actions for that service
  • Wildcard Resource: * for each Statement
  • Comments with concrete resource ARNs per Statement
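
The shape of that document can be sketched as below. This is an assumption-laden illustration, not the tool's actual output code: `build_policy` is a hypothetical helper, and the resource ARN comments are left out because plain JSON has no comment syntax.

```python
import json
from collections import defaultdict


def build_policy(actions):
    """Group discovered actions into one Allow statement per service,
    each with a wildcard Resource."""
    by_service = defaultdict(set)
    for action in actions:
        service, _, _ = action.partition(":")
        by_service[service].add(action)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(by_service[service]),
                "Resource": "*",
            }
            for service in sorted(by_service)
        ],
    }


# Hypothetical set of discovered actions.
generated = build_policy(
    ["s3:CreateBucket", "dynamodb:DescribeTable", "dynamodb:CreateTable"]
)
print(json.dumps(generated, indent=2))
```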

CloudTrail Event Filtering

This approach involves filtering CloudTrail events for operations that are a result of calling CreateStack, DeleteStack, or UpdateStack.  It is sort of the “opposite” approach from scraping. In scraping, the role is incrementally built up from nothing. Here an “administrator role” with godlike power is used to converge the stack and then the events are captured (i.e. paring down from an administrator role).

  • Setup infrastructure
    • Provision a CloudTrail Trail to emit events to a CloudWatchLogs Log Group
    • Provision a Lambda function subscribed to the Log Group with a filter that matches events where userIdentity/type is AssumedRole
  • Create an administrative IAM role with a unique name, e.g. cfn-leaprog-134235
  • The Lambda function parses the CloudWatch Logs events and processes those from the CloudFormation service where the assumed role name matches cfn-leaprog*
  • The function writes the relevant events to a DynamoDB table keyed by the IAM Role ARN.
    • IAM Role ARN => { eventName, eventSource, resources }
  • Call CreateStack with the administrative role.
  • Call DeleteStack with the administrative role.
  • Delete the administrative role.
  • CloudTrail events can take up to ten minutes to arrive, so wait before querying.
  • Query the DynamoDB table for the events for the administrative role.
  • Generate the IAM policy from those actions and resources.
  • Optionally tear down infrastructure
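
The filtering step inside the Lambda function might look roughly like the sketch below. It assumes the standard CloudWatch Logs subscription payload (base64-encoded, gzip-compressed JSON with a logEvents list) and the userIdentity fields that CloudTrail records for assumed roles. The function names and demo records are hypothetical, the DynamoDB write is omitted, and this is not the cfn-leaprog source.

```python
import base64
import gzip
import json

ROLE_PREFIX = "cfn-leaprog"


def decode_subscription(event):
    """Unpack the payload CloudWatch Logs delivers to a subscribed Lambda."""
    raw = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(raw))


def relevant_records(event):
    """Keep CloudTrail records made under an assumed role whose name
    starts with ROLE_PREFIX."""
    records = []
    for log_event in decode_subscription(event)["logEvents"]:
        record = json.loads(log_event["message"])
        identity = record.get("userIdentity", {})
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        role_name = issuer.get("userName", "")
        if identity.get("type") == "AssumedRole" and role_name.startswith(ROLE_PREFIX):
            records.append(record)
    return records


def pack_for_demo(records):
    """Build a payload in the subscription format, for local testing only."""
    body = {
        "logEvents": [
            {"id": str(i), "timestamp": 0, "message": json.dumps(r)}
            for i, r in enumerate(records)
        ]
    }
    data = base64.b64encode(gzip.compress(json.dumps(body).encode("utf-8")))
    return {"awslogs": {"data": data.decode("ascii")}}


def assumed(role_name, event_name):
    """Hypothetical, heavily trimmed CloudTrail record."""
    return {
        "eventName": event_name,
        "userIdentity": {
            "type": "AssumedRole",
            "sessionContext": {"sessionIssuer": {"userName": role_name}},
        },
    }


demo_event = pack_for_demo(
    [
        assumed("cfn-leaprog-134235", "CreateTable"),
        assumed("some-other-role", "CreateBucket"),
        {"eventName": "ListTables", "userIdentity": {"type": "IAMUser"}},
    ]
)
kept = relevant_records(demo_event)
print([record["eventName"] for record in kept])
```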

CloudTrail Event

A CloudTrail Event is a complex JSON document capturing all relevant information around the invocation of an AWS API.  For the purposes of this filtering, the following subset is parsed and stored in a database for future reference:

From this event, it is known that CloudFormation invoked dynamodb:CreateTable against myDynamoDBTable while assuming the role cfn-leaprog-134235.  A correlation between the role ARN and the events can be stored in a database and then queried later for the sum total of operations. 
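
As an illustration of that reduction, the sketch below pulls the role ARN, eventName, eventSource, and resource ARNs out of a record and derives a candidate IAM action. The record is a hypothetical, heavily trimmed example consistent with the description above, and `summarize` is an invented helper. Deriving the action from the eventSource prefix plus the eventName is an assumption that holds for this case but not for every service, so the result still needs human review.

```python
def summarize(record):
    """Reduce a CloudTrail record to the fields worth storing, plus a
    candidate IAM action derived from eventSource and eventName."""
    source = record["eventSource"]  # e.g. "dynamodb.amazonaws.com"
    service = source.split(".")[0]
    issuer = record["userIdentity"]["sessionContext"]["sessionIssuer"]
    return {
        "roleArn": issuer["arn"],
        "eventName": record["eventName"],
        "eventSource": source,
        "resources": [r["ARN"] for r in record.get("resources", []) if "ARN" in r],
        # Assumption: the IAM prefix equals the first label of eventSource.
        "action": f"{service}:{record['eventName']}",
    }


# Hypothetical, trimmed record; the account ID and region are made up.
record = {
    "eventSource": "dynamodb.amazonaws.com",
    "eventName": "CreateTable",
    "userIdentity": {
        "type": "AssumedRole",
        "invokedBy": "cloudformation.amazonaws.com",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "arn": "arn:aws:iam::123456789012:role/cfn-leaprog-134235",
                "userName": "cfn-leaprog-134235",
            }
        },
    },
    "resources": [
        {
            "type": "AWS::DynamoDB::Table",
            "ARN": "arn:aws:dynamodb:us-east-1:123456789012:table/myDynamoDBTable",
        }
    ],
}
summary = summarize(record)
print(summary["roleArn"], summary["action"], summary["resources"])
```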

Analysis

The main drawback of this approach is the complexity in provisioning all the necessary AWS infrastructure and interacting with it.  On the other hand, it is much more performant than the other dynamic approach, as a given template only needs to be converged once (or twice in the case of UpdateStack).  Most importantly, the events in CloudTrail are effectively a “controlled interface” and are mostly reliable for parsing (but not without subtle variations).

Implementation

The code that implements this algorithm is available at cfn-leaprog.

To experiment with the CloudTrail filtering approach, first, be sure to have Ruby installed.  For more information, see Installing Ruby.

The output for the policy document is identical to the “scraping” approach.

To run against a simple CloudFormation template that creates a DynamoDB table:

Conclusion

Determining the least privileged IAM role for a CloudFormation template or a Service Catalog Launch Constraint has historically been a manual and painful process.  AWS doesn’t seem to provide much help in this area, but it is an important part of securing AWS resources. The CloudTrail Event Filtering approach within cfn-leaprog can be used to generate the first cut of such a role in a short amount of time with a few commands.  Given the variety of support for resource-level permissions across the AWS ecosystem, a human should still inspect a generated policy to judge that it has the least privilege, but cfn-leaprog can reduce the amount of work from hours (days?) down to a few minutes.

Featured Image by Andy Beales on Unsplash
