Finding Security Problems Early in the Development Process of a CloudFormation Template with "cfn-nag"

This is an older post. For newer information on cfn_nag and DevSecOps, please check out these posts:

Development Acceleration Through VS Code Remote Containers: How We Leverage VS Code Remote Containers For Rapid Development of cfn_nag
Custom Rule Distribution Enhancements for cfn_nag
Is My Container Image Secure? CI/CD Container Scanning using Trend Micro Deep Security Smart Check and AWS CodePipeline
AWS re:Invent 2019 DevOps and Security re:Cap

Continuous Security: Security in the Continuous Delivery Pipeline is a series of articles addressing security concerns and testing in the Continuous Delivery pipeline. This is the second article in the series.

CloudFormation Background

CloudFormation templates are a great way to provision AWS resources. They allow an infrastructure developer to declare what resources are to be created instead of worrying about the imperative and potentially complex AWS API calls necessary to make it so.
While CloudFormation simplifies the process of creating and managing AWS resources, it doesn’t protect a developer from specifying resources in a way that is potentially insecure. In other words, a developer can quickly and easily create AWS resources, but can just as quickly create resources that are insecure. “Insecure” might mean exposing more “attack surface” by leaving TCP ports open to the world, or creating IAM permissions that afford too much power to a given user.
This article expects that the reader has prior experience with developing CloudFormation templates. To learn more about CloudFormation, please see Getting Started with CloudFormation.

Infrastructure as Code and Mitigating Security Risks

Given that a CloudFormation template is effectively a piece of “code”, it invites the traditional techniques used to develop higher-quality software, such as static analysis and automated integration testing. These techniques can be employed to mitigate the risk of developing insecure CloudFormation templates.
Static analysis attempts to analyse a body of code to predict or warn about potential faults, or anything undesirable. This is done by parsing the code and trying to understand its meaning without actually executing the code. In the case of CloudFormation, a static analysis tool would parse a given template in order to look for undesirable patterns without sending the template to the CloudFormation service for convergence.
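As a rough illustration of the idea (and not a description of how cfn-nag itself is implemented), the following Python sketch treats a template purely as data and reports security groups that declare no inline egress rule. The function name security_groups_without_egress is hypothetical, and the check deliberately ignores standalone AWS::EC2::SecurityGroupEgress resources, which a real tool would need to account for.

```python
import json


def security_groups_without_egress(template_text):
    """Return logical ids of security groups with no inline egress rule.

    Hypothetical helper for illustration: it only inspects the template
    text and never contacts the CloudFormation service.
    """
    template = json.loads(template_text)
    offenders = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        properties = resource.get("Properties", {})
        # A missing or empty SecurityGroupEgress counts as "no egress rule".
        if not properties.get("SecurityGroupEgress"):
            offenders.append(logical_id)
    return offenders


# Minimal sample template, written by hand for this sketch.
sample_template = """
{
  "Resources": {
    "sg": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {"GroupDescription": "some_group_desc"}
    }
  }
}
"""

print(security_groups_without_egress(sample_template))
```

Nothing here talks to AWS, which is why a check of this kind can run in seconds on a developer machine.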
Automated integration testing involves executing the code in a realistic environment and then verifying that actual outcomes match expectations. In the case of CloudFormation, this would involve sending a template to the CloudFormation service for convergence to actually create the resources. After convergence, the AWS API is employed to query these resources to assert that their configuration matches expectations.
There are trade-offs between these two approaches:

Static analysis typically executes lightning fast and can be run in a local “developer” environment. On the other hand, the quality and depth of the feedback can be rather limited given that it is trying to predict behaviour instead of actually measuring real behaviour.
Integration testing can take quite a bit of time because it involves running the actual code. In the case of CloudFormation, it can take quite a bit of time to create all of the AWS resources. Once the resources are created, the feedback can be very realistic and trustworthy because the actual resources are being verified.

In short, one provides quick, not-so-realistic feedback, while the other provides slower but more realistic feedback. In the context of a software delivery pipeline, both should be employed with the static analysis running first in order to provide quick feedback and to fail fast if necessary.
The rest of this article will consider static analysis of CloudFormation Templates.

Static Analysis of CloudFormation Templates with “cfn-nag”

Stelligent recently put together a proof-of-concept tool to demonstrate static analysis against CloudFormation templates with a focus on security issues.
This tool “cfn-nag” parses a collection of CloudFormation templates and applies rules to find code patterns that could lead to insecure infrastructure. The results of the tool include the logical resource identifiers for violating resources and an explanation of what rule has been violated.
While there are quite a number of particular rules the tool will attempt to match, the rough categories are:

IAM and resource policies (S3 Bucket, SQS, etc.)
- Matches policies that are overly permissive in some way (e.g. wildcards in actions or principals)
Security Group ingress and egress rules
- Matches rules that are overly liberal (e.g. an ingress rule open to 0.0.0.0/0, or an open port range of 1-65535)
Access Logs
- Looks for access logs that are not enabled for applicable resources (e.g. Elastic Load Balancers and CloudFront Distributions)
Encryption
- Looks for (server-side) encryption that is not enabled or enforced for applicable resources (e.g. EBS volumes or for PutObject calls on an S3 bucket)
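
To give a feel for what a rule in the first category might look like, here is a minimal Python sketch that flags Allow statements whose Action contains a wildcard. It is an assumption-laden illustration rather than cfn-nag's own rule logic: the function name wildcard_action_statements is hypothetical, and the sketch skips actions expressed through intrinsic functions.

```python
def wildcard_action_statements(policy_document):
    """Return Allow statements whose Action contains a "*" wildcard."""
    statements = policy_document.get("Statement", [])
    # IAM permits a single statement object in place of a list.
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # Action may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        # Non-string entries (e.g. intrinsic functions) are ignored here.
        if any(isinstance(a, str) and "*" in a for a in actions):
            flagged.append(statement)
    return flagged


# Hand-written policy document for illustration.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["sqs:SendMessage"], "Resource": "*"},
    ],
}

flagged = wildcard_action_statements(policy)
print([statement["Action"] for statement in flagged])
```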

All the rules are considered either warnings or failures. Any discovered failures will result in a non-zero exit code, while warnings will not. In the context of a delivery pipeline, “cfn-nag” should likely stop the pipeline in the case it finds failures, but perhaps not stop the pipeline in the case of just warnings.
Which rules are considered warnings and which are considered failures is somewhat subjective. Philosophically, there are two rough criteria for deciding whether a rule is a warning or a failure:
How sure can the tool be that the rule is being violated?
Generally speaking, the tool is trying to understand the CloudFormation template without executing it. Sometimes all the information to make a decision about a violation isn’t available in a convenient form (or even in the same CloudFormation template), but some evidence exists to suggest a problem. In this case a warning is issued.
How bad is it if the rule is being violated?
Again, this can be subjective, but the more dangerous issues will be failures. As a concrete example, any time there is a wildcard action specified in a permissions policy, it is a failure. Even if the policy means to afford a given principal the permission to do all things, as AWS adds new actions, the wildcard will grow to include these actions without a review of any kind.
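
The relationship between warnings, failures and the exit code can be sketched in a few lines of Python. The Violation class and exit_code_for function are hypothetical names chosen for this example. The sketch returns 1 when any failure is present, but the article only states that the real exit code is non-zero, so the exact value is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    kind: str        # "FAIL" or "WARN"
    resources: list  # logical resource ids that triggered the rule
    message: str


def exit_code_for(violations):
    """Return a non-zero code only when at least one failure exists."""
    failures = [v for v in violations if v.kind == "FAIL"]
    return 1 if failures else 0


warning_only = [
    Violation("WARN", ["sg"], "Security Groups found with cidr open to world on egress"),
]
with_failure = warning_only + [
    Violation("FAIL", ["sg"], "Missing egress rule means all traffic is allowed outbound."),
]

print(exit_code_for(warning_only), exit_code_for(with_failure))
```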

Using cfn-nag

Installation

Before using cfn-nag, it is necessary to install it and some prerequisites.
First, ensure that Ruby 2.1 or greater is installed. For more information on installing Ruby, see https://www.ruby-lang.org/en/documentation/installation/
Next, install the cfn-nag gem from https://rubygems.org:

gem install cfn-nag

Create a simple CloudFormation Template

Start with just a simple security group and a secure ingress rule. Save this as a file named: single_security_group.json

{
  "Resources": {
    "sg": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "GroupDescription": "some_group_desc",
        "SecurityGroupIngress": {
          "CidrIp": "10.1.2.3/32",
          "FromPort": 34,
          "ToPort": 34,
          "IpProtocol": "tcp"
        },
        "VpcId": "vpc-12345678"
      }
    }
  }
}

Run cfn-nag against this new template

cfn_nag --input-json-path single_security_group.json

Review the results.

The results will indicate that this security group is actually not secure because no egress rule is specified. If no egress rule is specified, the default is to open all outbound traffic to the world. Note that the results specify the logical resource id, sg in this case.

 ---
| FAIL
|
| Resources: ["sg"]
|
| Missing egress rule means all traffic is allowed outbound.  Make this explicit if it is desired configuration
Failures count: 1
Warnings count: 0

Additionally, the results can be emitted as JSON to support further processing via the flag `--output-format json`.
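
One way a script might consume such JSON is sketched below in Python. The payload shape used here (a list of per-file entries, each with file_results.violations carrying type, message and logical_resource_ids) is an assumption based on more recent cfn_nag releases and may not match the version you have installed, so inspect real output before relying on these keys. The sample_output string is hand-written for illustration, not captured from the tool, and failing_resources is a hypothetical helper.

```python
import json

# ASSUMED output shape; verify against the cfn_nag version in use.
sample_output = """
[
  {
    "filename": "single_security_group.json",
    "file_results": {
      "failure_count": 1,
      "violations": [
        {
          "type": "FAIL",
          "message": "Missing egress rule means all traffic is allowed outbound.",
          "logical_resource_ids": ["sg"]
        }
      ]
    }
  }
]
"""


def failing_resources(report_text):
    """Map each filename to the logical ids named in FAIL violations."""
    report = json.loads(report_text)
    result = {}
    for entry in report:
        violations = entry.get("file_results", {}).get("violations", [])
        ids = []
        for violation in violations:
            if violation.get("type") == "FAIL":
                ids.extend(violation.get("logical_resource_ids", []))
        result[entry.get("filename")] = ids
    return result


print(failing_resources(sample_output))
```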

Add an egress rule and run cfn-nag again

Add the SecurityGroupEgress property shown below. This will allow outbound HTTP traffic to any target IP address.

{
  "Resources": {
    "sg": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "GroupDescription": "some_group_desc",
        "SecurityGroupIngress": {
          "CidrIp": "10.1.2.3/32",
          "FromPort": 34,
          "ToPort": 34,
          "IpProtocol": "tcp"
        },
        "SecurityGroupEgress": {
          "CidrIp": "0.0.0.0/0",
          "FromPort": 80,
          "ToPort": 80,
          "IpProtocol": "tcp"
        },
        "VpcId": "vpc-12345678"
      }
    }
  }
}

Review the results

With the addition of an explicit egress rule, the failure will disappear. A new warning is introduced because the egress rule is open to the world. Since it is just a warning, cfn-nag will return a successful exit code.

---
| WARN
|
| Resources: ["sg"]
|
| Security Groups found with cidr open to world on egress
Failures count: 0
Warnings count: 1

This warning may be a bad thing, or it may be totally fine. It is up to the developer to decide. For example, a NAT instance may have a fairly wide open egress like this. On the other hand, a garden variety instance hosting an application can and should typically have an egress rule that is better locked down.
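
To make that trade-off concrete, the Python sketch below shows a check for egress rules open to the world and how a tighter rule stops matching. It is illustrative only: open_egress_rules is a hypothetical function, not cfn-nag code, and the 10.1.0.0/16 range is an arbitrary example of a locked-down destination.

```python
def open_egress_rules(security_group_properties):
    """Return inline egress rules whose CidrIp is open to the world."""
    rules = security_group_properties.get("SecurityGroupEgress", [])
    # Accept either a single rule object (as in this article) or a list.
    if isinstance(rules, dict):
        rules = [rules]
    return [rule for rule in rules if rule.get("CidrIp") == "0.0.0.0/0"]


wide_open = {
    "SecurityGroupEgress": {
        "CidrIp": "0.0.0.0/0",
        "FromPort": 80,
        "ToPort": 80,
        "IpProtocol": "tcp",
    }
}
# Same rule, restricted to an example internal range (assumed value).
locked_down = {
    "SecurityGroupEgress": {
        "CidrIp": "10.1.0.0/16",
        "FromPort": 80,
        "ToPort": 80,
        "IpProtocol": "tcp",
    }
}

print(len(open_egress_rules(wide_open)), len(open_egress_rules(locked_down)))
```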

Integrating cfn-nag into the Pipeline

Referring back to Figure 1 in the previous post in this series, the first stage of a deployment pipeline is Commit. This stage provides very rapid feedback to the developer, as the code is analysed before building. For security, this means making a decision about the resources and changes that would be triggered by executing the code, before actually triggering resource creation or modification. Not only does this provide rapid feedback (literally seconds), but it also has powerful security implications: infrastructure code that would otherwise expose a vulnerability can be vetted and prevented from being executed.
A recommended approach is to call cfn-nag early in the pipeline, during the Commit stage, and to stop the deployment pipeline whenever cfn-nag finds any failures (as opposed to warnings). Since cfn-nag returns a non-zero exit code whenever failures are found, this is just a matter of examining the exit code to decide whether to stop the pipeline. In other words:

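#!/bin/bash
# Run cfn_nag; a non-zero exit code means at least one failure was found.
if ! cfn_nag --input-json-path important_infrastructure.json; then
  # stop the pipeline by failing this step
  echo "cfn_nag reported failures; stopping the pipeline" >&2
  exit 1
fi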

Conclusion

While the feedback provided by cfn-nag won’t guarantee security for the enterprise, it can help find some obvious problems quickly and very early on in the development process. Even for patterns that it cannot say for certain are security problems, it can draw attention to them so that a developer can consider them further. In addition, cfn-nag can easily be integrated into a deployment pipeline so that it stops the pipeline when security problems are discovered, before any infrastructure has actually been modified or created. This can be a channel for security organisations to automatically enforce security standards upon (infrastructure and application) developers without forcing them to wait for manual reviews.
As it stands, cfn-nag is a proof-of-concept, so it is fairly immature and there are a number of improvements that can be made. If you have any feedback or would like to contribute, ping me on LinkedIn or check out the source code at cfn-nag.
