Stelligent

Generating Least Privileged IAM Roles for CloudFormation and Service Catalog with cfn-leaprog

CloudFormation Development Process and Privilege

While developing a CloudFormation template, a developer is likely working in a “sandbox” account with significant “power user” privileges. This is convenient because it lets the developer focus on business needs, but what happens when the same template is converged in a production environment with more restricted privileges?

This article introduces cfn-leaprog (cfn-LEAst-Privilege-ROle-Generator), an experimental tool that lets developers generate least privileged IAM roles for CloudFormation templates and Service Catalog products in their sandbox environment, so that the roles align better with the privileges available in a production environment.

Deployment System Privileges in Production

The deployment system may very well have the same “power user” privileges in production that a developer has in a sandbox. This removes any disconnect from the developer’s perspective, but it is an anti-pattern: it is risky to grant the deployment system too many privileges in production. First, errant software with too much privilege can wreak havoc on production infrastructure. Second, if the deployment system is compromised, the attacker inherits those privileges. Consider a deployment system built upon the Jenkins platform. Historically, there have been many exploits for Jenkins, so attaching a godlike AWS instance profile/role to a Jenkins worker is a mistake.

Alternatives for Deployment System Privileges

There are alternatives to provisioning a deployment system with godlike privileges.  How to set up a secure deployment system is far beyond the scope of this article, but at a very high level:

Least Privileged IAM Roles

This raises the question of how to develop an IAM role with the least privilege needed to deploy a CloudFormation template or Service Catalog product. A complex stack for a cloud-native application may need a variety of permissions across S3, DynamoDB, or EC2. Beyond the privileges necessary to create the stack in the first place, there are the privileges necessary to update the stack and to delete it.

AWS provides a collection of “Managed Policies” to help simplify this creation process. BEWARE: many of these policies contain wildcard permissions that are in no way helpful for designing an IAM role with the least privilege. For example, the managed policy AWSLambdaFullAccess contains permission for all S3 operations against all buckets.
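One way to spot this kind of over-broad grant is to scan a policy document for wildcard actions. The following is a minimal sketch, not part of cfn-leaprog: the function name `find_wildcard_actions` is hypothetical, and the sample policy is invented for illustration (it is not the actual text of any AWS managed policy).

```python
import json


def find_wildcard_actions(policy):
    """Return (action, resources) pairs for Allow statements with wildcard actions."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Action and Resource may each be a string or a list of strings
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            if "*" in action:
                findings.append((action, resources))
    return findings


# Invented sample policy for illustration only
sample_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["s3:*", "dynamodb:GetItem"], "Resource": "*"},
    {"Effect": "Allow", "Action": "sqs:SendMessage",
     "Resource": "arn:aws:sqs:us-east-1:111122223333:my-queue"}
  ]
}
""")

for action, resources in find_wildcard_actions(sample_policy):
    print(f"wildcard action {action} on {resources}")
```

A check like this only flags candidates for review. Whether a given wildcard is acceptable still depends on the role's purpose.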

Ideally, the AWS CloudFormation service would provide an authoritative operation to analyze a CloudFormation template and return which privileges it is going to require in order to deploy it.  At the time of this writing, there isn’t a known way to solicit this information from an AWS API.

Therefore, the least privileged role must be computed, which is where cfn-leaprog can help. Before diving into cfn-leaprog, it is worth considering the possible approaches for tooling. In the most general sense, there are two approaches to the computation: static and dynamic.

Static Approach

A static approach parses a CloudFormation template and attempts to map the resources to the policy with the least privilege.  For an example of this approach, see Ian McKay’s tool cfnlp.
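To make the idea concrete, here is a minimal sketch of a static mapping. It is not how cfnlp works internally. The lookup table `ACTIONS_BY_RESOURCE_TYPE` is a hypothetical, hand-written, and deliberately incomplete example, and the template is invented for illustration.

```python
import json

# Hypothetical, hand-maintained mapping; a real tool would need far more coverage
ACTIONS_BY_RESOURCE_TYPE = {
    "AWS::DynamoDB::Table": [
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable",
        "dynamodb:DeleteTable",
    ],
    "AWS::S3::Bucket": ["s3:CreateBucket", "s3:DeleteBucket"],
}


def static_actions(template):
    """Return (sorted actions, sorted resource types the mapping does not know)."""
    actions, unknown = set(), set()
    for resource in template.get("Resources", {}).values():
        resource_type = resource["Type"]
        if resource_type in ACTIONS_BY_RESOURCE_TYPE:
            actions.update(ACTIONS_BY_RESOURCE_TYPE[resource_type])
        else:
            unknown.add(resource_type)
    return sorted(actions), sorted(unknown)


# Invented template for illustration
sample_template = json.loads("""
{
  "Resources": {
    "myDynamoDBTable": {"Type": "AWS::DynamoDB::Table", "Properties": {}},
    "myTopic": {"Type": "AWS::SNS::Topic", "Properties": {}}
  }
}
""")

actions, unknown = static_actions(sample_template)
print("actions:", actions)
print("unmapped resource types:", unknown)
```

Reporting the unmapped types separately matters: a static tool is only as good as its mapping, and silently skipping a resource type would produce a policy that fails at deploy time.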

This approach is powerful in that it can run offline from AWS and in a matter of seconds.  This approach also has drawbacks to consider:

Dynamic Approaches

A dynamic approach involves submitting a template for creation, deletion or update to the CloudFormation service to converge and querying for results in order to compute the policy.

This approach is more “authoritative” than a static analysis approach but involves a more complex orchestration of AWS services to obtain the necessary information to compute the policy.

Within the dynamic approach, there are at least two sub-approaches, both described below: CloudFormation failure event scraping and CloudTrail event filtering.

After much experimentation, the CloudTrail event filtering approach has been found to be more reliable and performant at the cost of interacting with more complex AWS infrastructure.

CloudFormation Failure Event Scraping

This approach involves scraping the failure event messages from a CloudFormation stack for Actions to incrementally compute the least privilege policy.
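As a rough illustration of the scraping step, the sketch below pulls a denied action out of a failure reason string. The two message shapes are assumptions based on commonly seen access-denied text, not a documented format, and `scrape_denied_action` is a hypothetical name rather than a cfn-leaprog function.

```python
import re

# Assumed message shapes; the reason field is free-form text, so these may not cover every service
PATTERNS = [
    re.compile(
        r"is not authorized to perform: (?P<action>[A-Za-z0-9-]+:[A-Za-z0-9]+)"
        r"(?: on resource: (?P<resource>\S+))?"
    ),
    re.compile(r"API: (?P<action>[A-Za-z0-9-]+:[A-Za-z0-9]+) "),
]


def scrape_denied_action(reason):
    """Return (action, resource or None) if the reason names a denied action, else None."""
    for pattern in PATTERNS:
        match = pattern.search(reason)
        if match:
            return match.group("action"), match.groupdict().get("resource")
    return None


# Invented sample reasons for illustration
sample_reasons = [
    "API: s3:CreateBucket Access Denied",
    "User: arn:aws:sts::111122223333:assumed-role/example-role/AWSCloudFormation "
    "is not authorized to perform: dynamodb:CreateTable "
    "on resource: arn:aws:dynamodb:us-east-1:111122223333:table/myDynamoDBTable",
    "Resource creation cancelled",
]

for reason in sample_reasons:
    print(scrape_denied_action(reason))
```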

The Resource for the policy is always set to “*” because AWS services support resource-level permissions inconsistently, and even for those that do, the patterns for specifying resources can vary.  The algorithm can track specific resources and include them as comments in the final policy for a human to evaluate.
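The incremental loop can be sketched as follows. This is a simulation under stated assumptions, not the cfn-leaprog implementation: `make_fake_converge` is a hypothetical stand-in for creating and deleting a real stack with the role under construction, and the failure message format is assumed.

```python
import json
import re

# Assumed failure message shape
DENIED = re.compile(r"is not authorized to perform: ([A-Za-z0-9-]+:[A-Za-z0-9]+)")


def make_fake_converge(required_actions):
    """Build a stand-in for converging a stack: fails on the first missing action."""
    def converge(allowed):
        for action in required_actions:
            if action not in allowed:
                return (
                    "User: arn:aws:sts::111122223333:assumed-role/example-role/"
                    f"AWSCloudFormation is not authorized to perform: {action}"
                )
        return None  # no failure: the stack converged
    return converge


def compute_policy(converge, max_rounds=50):
    """Grow the allowed actions one failure at a time until the stack converges."""
    allowed = []
    for _ in range(max_rounds):
        reason = converge(allowed)
        if reason is None:
            break
        match = DENIED.search(reason)
        if match is None:
            raise ValueError(f"unrecognized failure reason: {reason}")
        allowed.append(match.group(1))
    else:
        raise RuntimeError("stack did not converge within max_rounds")
    return {
        "Version": "2012-10-17",
        # Resource is always "*", as described above
        "Statement": [{"Effect": "Allow", "Action": sorted(allowed), "Resource": "*"}],
    }


converge = make_fake_converge(
    ["dynamodb:CreateTable", "dynamodb:DescribeTable", "dynamodb:DeleteTable"]
)
policy = compute_policy(converge)
print(json.dumps(policy, indent=2))
```

Each round discovers at most one action, so a template that needs N actions requires N + 1 converge attempts. Against a real stack, every attempt is a full create and rollback, which is why this process can be slow.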

This algorithm covers the privileges necessary to call CreateStack and DeleteStack for a particular template.  The algorithm for UpdateStack is slightly more complex.  

UpdateStack

When UpdateStack is called, the privileges needed are a function of the difference between the baseline created stack and the new version of the template, that is, the change set. For example, if a baseline stack contains a vanilla S3 bucket and another version of the template adds a Lifecycle policy to that bucket, the least privilege policy for the UpdateStack would only include the actions to update the Lifecycle policy. The policy would not include actions to update or change other unmentioned properties of the bucket.

To restate, a policy that covers all possible updates to a stack isn’t going to be the least privilege policy in the context of a specific update.

This actually simplifies the generation logic in that there is no need to generate all possible permutations of how a given resource can be updated.  The algorithm can then follow the same process as for CreateStack, but then add the step to call UpdateStack against a specific version of the template with any changed Parameters, and in turn scrape the update events.

Analysis

While experimenting, this was the first approach considered and implemented.  It was considered first because it is “simple” in that it only requires interacting with the CloudFormation service; it’s self-contained without dependencies on other infrastructure.  Unfortunately, it has a number of fatal drawbacks.  

The logic itself is fairly complex and the iterative process can take a LONG time, but the most important drawback is the dependency on the format of the stack event reason field, which is free-form text rather than a controlled interface.

Implementation

The code that implements this algorithm is available at cfn-leaprog.

To experiment with the event scraping approach, first be sure to have Ruby installed.  For more information, see Installing Ruby.

To run against a simple CloudFormation template that creates a DynamoDB table:

See the gist on github.

Once the process completes, the rake task will emit a policy document to stdout with all discovered actions.  The policy will contain:

See the gist on github.

CloudTrail Event Filtering

This approach involves filtering CloudTrail events for operations that are a result of calling CreateStack, DeleteStack, or UpdateStack. It is roughly the opposite of scraping. In scraping, the role is incrementally built up from nothing. Here, an “administrator role” with godlike power is used to converge the stack, and the resulting events are captured (i.e. the policy is pared down from an administrator role).

CloudTrail Event

A CloudTrail Event is a complex JSON document capturing all relevant information around the invocation of an AWS API.  For the purposes of this filtering, the following subset is parsed and stored in a database for future reference:

See the gist on github.

From this event, it is known that CloudFormation invoked dynamodb:CreateTable against myDynamoDBTable while assuming the role cfn-leaprog-134235.  A correlation between the role ARN and the events can be stored in a database and then queried later for the sum total of operations. 
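The correlation step might look like the sketch below. The field names (`eventSource`, `eventName`, `userIdentity.sessionContext.sessionIssuer.arn`) come from the standard CloudTrail record format, but the exact subset that cfn-leaprog stores is not shown here, so treat the event shape as an assumption. The sample events are invented, and an in-memory dictionary stands in for the database.

```python
from collections import defaultdict


def iam_action(event):
    """Derive an IAM-style action name from a CloudTrail record.

    Assumes the IAM prefix equals the first label of eventSource, which holds
    for dynamodb.amazonaws.com but not for every AWS service.
    """
    service = event["eventSource"].split(".")[0]
    return f"{service}:{event['eventName']}"


def actions_by_role(events):
    """Group observed actions by the ARN of the assumed role that issued them."""
    result = defaultdict(set)
    for event in events:
        issuer = (
            event.get("userIdentity", {})
            .get("sessionContext", {})
            .get("sessionIssuer", {})
        )
        role_arn = issuer.get("arn")
        if role_arn:
            result[role_arn].add(iam_action(event))
    return result


def make_event(role_arn, source, name):
    """Build a trimmed, invented CloudTrail record for illustration."""
    return {
        "eventSource": source,
        "eventName": name,
        "userIdentity": {"sessionContext": {"sessionIssuer": {"arn": role_arn}}},
    }


ROLE = "arn:aws:iam::111122223333:role/cfn-leaprog-134235"
OTHER = "arn:aws:iam::111122223333:role/some-other-role"

sample_events = [
    make_event(ROLE, "dynamodb.amazonaws.com", "CreateTable"),
    make_event(ROLE, "dynamodb.amazonaws.com", "DescribeTable"),
    make_event(ROLE, "dynamodb.amazonaws.com", "DescribeTable"),
    make_event(OTHER, "s3.amazonaws.com", "CreateBucket"),
]

observed = actions_by_role(sample_events)
print(sorted(observed[ROLE]))
```

Using a uniquely named role per run is what makes the filter reliable: only events issued under that role's ARN are attributed to the stack being analyzed.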

Analysis

The main drawback of this approach is the complexity in provisioning all the necessary AWS infrastructure and interacting with it.  On the other hand, it is much more performant than the other dynamic approach, as a given template only needs to be converged once (or twice in the case of UpdateStack).  Most importantly, the events in CloudTrail are effectively a “controlled interface” and are mostly reliable for parsing (but not without subtle variations).

Implementation

The code that implements this algorithm is available at cfn-leaprog.

To experiment with the CloudTrail filtering approach, first be sure to have Ruby installed. For more information, see Installing Ruby.

The output for the policy document is identical to the “scraping” approach.

To run against a simple CloudFormation template that creates a DynamoDB table:

See the gist on github.

Conclusion

Determining the least privileged IAM role for a CloudFormation template or a Service Catalog Launch Constraint has historically been a manual and painful process. AWS seemingly doesn’t provide much help in this area, but it is an important part of securing AWS resources. The CloudTrail Event Filtering approach within cfn-leaprog can be used to generate the first cut of such a role in a short amount of time with a few clicks. Given how unevenly resource-level permissions are supported across the AWS ecosystem, a human should still inspect a generated policy to judge that it has the least privilege, but cfn-leaprog can reduce the amount of work from hours (days?) down to a few minutes.

Featured Image by Andy Beales on Unsplash
