OVERVIEW

Administering AWS infrastructure through CloudFormation is one way to use Infrastructure as Code to simplify and replicate an environment. Here at Stelligent, we encourage using automation to apply CloudFormation templates. An early hurdle with CloudFormation is a mistake that breaks the initial creation of a stack. When CloudFormation fails during initial stack creation, it automatically rolls back the stack, and the stack cannot be recreated without deleting it first. I added two steps to my pipelines to prevent this:

  1. Validating the template before applying.
  2. Creating an empty stack that holds no infrastructure, then populating the empty stack.

THE NATURE OF THE PROBLEM

CloudFormation automatically rolls back when the creation or update of a resource fails. Because a stack has no state before it is created, rolling back a failed creation leaves behind a stack that exists but has no state (CloudFormation reports it as ROLLBACK_COMPLETE). A stack with no state cannot be updated, and a new stack cannot be created because one with the same name already exists. The only way forward is to fix the errors, delete the no-state stack, and create a new one.
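
The situation described above can be summarized as a small decision function. This is a minimal sketch in Python and not part of the original pipeline: the function name next_action and its return labels are hypothetical, and every status other than ROLLBACK_COMPLETE is treated as updatable, which is a simplification (in-progress and failed states would need their own handling).

```python
from typing import Optional


def next_action(stack_status: Optional[str]) -> str:
    """Return the pipeline step needed for a stack in the given status.

    stack_status is None when no stack with that name exists.
    """
    if stack_status is None:
        # Nothing exists yet, so the stack can be created.
        return "create"
    if stack_status == "ROLLBACK_COMPLETE":
        # A failed creation left a stack with no state: it cannot be
        # updated, and its name blocks a new create until it is deleted.
        return "delete-then-create"
    # Simplification: assume any other status can be updated.
    return "update"


for status in (None, "ROLLBACK_COMPLETE", "CREATE_COMPLETE"):
    print(status, "->", next_action(status))
```

The middle branch is the one a pipeline has to pay for: it requires both extra error handling and permission to delete stacks.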

When a CloudFormation stack is created through a CI/CD pipeline, a lot of error handling has to be added to account for a failed creation. The pipeline role also needs permission to delete stacks, which runs against security best practices. Below is an example of a CloudFormation stack that failed its creation because two subnets had the same CIDR block; the follow-up update with a corrected template is rejected.

$ aws cloudformation create-stack --region us-west-1 --stack-name malformed-vpc --template-body file://malformed-vpc.yml
{
    "StackId": "arn:aws:cloudformation:us-west-1:123456789012:stack/malformed-vpc/18338b50-d5cb-11ea-8cbb-06dc7b91611d"
}

$ aws cloudformation update-stack --region us-west-1 --stack-name malformed-vpc --template-body file://corrected-vpc.yml
An error occurred (ValidationError) when calling the UpdateStack operation: Stack:arn:aws:cloudformation:us-west-1:123456789012:stack/malformed-vpc/18338b50-d5cb-11ea-8cbb-06dc7b91611d is in ROLLBACK_COMPLETE state and can not be updated.

NEGATING ERRORS IN ROLLBACKS OF STACK CREATION

The best way to ensure the initial launch of a stack is always successful is to create a stack that does not contain any resources. Then update the stack to create resources. If that update fails, the stack has a valid state to roll back to.

Creating an empty stack 

The CloudFormation template below uses a condition and a custom resource so that nothing is created. The custom resource NullResource is created only when the condition HasNot is met. For the condition to be met, 'a' must equal 'b', which will never happen. Therefore, no resources are created.

empty-stack.yml

AWSTemplateFormatVersion: "2010-09-09"
Conditions:
  HasNot:
    Fn::Equals: [ 'a', 'b' ]

Resources:
  NullResource:
    Type: 'Custom::NullResource'
    Condition: HasNot

empty-stack.json

{
  "AWSTemplateFormatVersion" : "2010-09-09",

  "Conditions" : {
    "HasNot": { "Fn::Equals" : [ "a", "b" ] }
  },

  "Resources" : {
    "NullResource" : {
      "Type" : "Custom::NullResource",
      "Condition" : "HasNot"
    }
  }
}
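
To see why this template creates nothing, the condition can be evaluated by hand. The sketch below is illustrative only: it rebuilds the template as a Python dictionary and uses a hypothetical helper, resources_to_create, that understands only the Fn::Equals form with two literal strings used here. It is not how CloudFormation evaluates templates internally.

```python
import json

EMPTY_STACK = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Conditions": {"HasNot": {"Fn::Equals": ["a", "b"]}},
    "Resources": {
        "NullResource": {"Type": "Custom::NullResource", "Condition": "HasNot"}
    },
}


def resources_to_create(template: dict) -> list:
    """List the resources whose condition holds (or that have none).

    Hypothetical helper: handles only Fn::Equals on two literal values.
    """
    conditions = {
        name: body["Fn::Equals"][0] == body["Fn::Equals"][1]
        for name, body in template.get("Conditions", {}).items()
    }
    return [
        name
        for name, resource in template["Resources"].items()
        # A resource without a Condition key is always created.
        if conditions.get(resource.get("Condition"), True)
    ]


print(resources_to_create(EMPTY_STACK))  # prints []
print(json.dumps(EMPTY_STACK, indent=2))  # the same template as JSON text
```

Generating the template with json.dumps also avoids quoting mistakes, since the output is always valid JSON.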

First create a stack with no resources, then update it to create the resources to be used. This simplifies pipeline logic to a simple if statement for creating the stack, instead of a more complex method with error handling to delete the stack if the creation fails.

# describe-stacks exits non-zero when the stack does not exist
if ! aws cloudformation describe-stacks \
        --stack-name "$STACK_NAME" > /dev/null 2>&1; then
    # Create empty CloudFormation stack
    aws cloudformation create-stack \
        --template-body file://empty-stack.yml \
        --stack-name "$STACK_NAME"
    aws cloudformation wait stack-create-complete \
        --stack-name "$STACK_NAME"
fi

# Update CloudFormation stack
aws cloudformation update-stack \
    --template-body file://vpc.yml \
    --stack-name "$STACK_NAME"
aws cloudformation wait stack-update-complete \
    --stack-name "$STACK_NAME"

Using syntax checking tools

While CloudFormation does not create stacks when the template is not valid JSON or YAML, many other errors can be caught with linting tools. I use two tools to check my CloudFormation templates: cfn-lint for syntax and linting, and cfn_nag for best practices. I highly recommend adding both to your pipelines and encouraging engineers to use them locally.

Adding these locally and in pipelines

I added a function to my ~/.bash_profile file in order to easily incorporate both of these for my local development. I call it slt for Scan and Lint Template.

function slt() {
    # Lint first; run the cfn_nag scan only if linting passes
    cfn-lint "$1" && cfn_nag_scan --input-path "$1"
}

The same command can be put in a bash script and dropped on the PATH for easy execution on a remote server:

slt.sh

#!/usr/bin/env bash
cfn-lint "$1" && cfn_nag_scan --input-path "$1"
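
For pipelines that are already driven from Python, the same check could be sketched as follows. This is an assumption-laden illustration rather than part of my setup: the function name scan_and_lint and its run parameter are hypothetical, and both tools are assumed to be installed and on the PATH.

```python
import subprocess
import sys


def scan_and_lint(template_path, run=subprocess.call):
    """Run cfn-lint, then cfn_nag_scan only if linting passed.

    `run` takes an argument list and returns an exit code; it is a
    parameter so the function can be exercised without either tool.
    """
    commands = [
        ["cfn-lint", template_path],
        ["cfn_nag_scan", "--input-path", template_path],
    ]
    for command in commands:
        exit_code = run(command)
        if exit_code != 0:
            # Mirror the shell's && by stopping at the first failure.
            return exit_code
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(scan_and_lint(sys.argv[1]))
```

Returning the first non-zero exit code lets the pipeline step fail the same way the shell version does.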

Adding the empty stack functionality is not as straightforward. I wrote a Python script to create and then update a stack. Feel free to add it to your CI/CD tool of choice. If no template file is passed to it, the Python script will only create the empty stack, making it easy to add to existing pipelines. View the empty_stack.py file here: https://github.com/stelligent/empty-stack
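
For readers who want the shape of such a script before opening the repository, here is a minimal sketch of the create-then-update flow. It is not the contents of empty_stack.py: the function names stack_exists and deploy are hypothetical, the cfn argument is assumed to behave like a boto3 CloudFormation client, and detecting a missing stack by the text "does not exist" in the error message is an assumption about how that client reports it.

```python
from typing import Optional


def stack_exists(cfn, stack_name: str) -> bool:
    """Return True if describe_stacks finds the stack."""
    try:
        cfn.describe_stacks(StackName=stack_name)
    except Exception as error:
        # Assumption: a missing stack is reported with a message that
        # contains "does not exist"; any other error is re-raised.
        if "does not exist" in str(error):
            return False
        raise
    return True


def deploy(cfn, stack_name: str, empty_body: str,
           template_body: Optional[str] = None) -> None:
    """Create the empty stack if needed, then optionally update it."""
    if not stack_exists(cfn, stack_name):
        cfn.create_stack(StackName=stack_name, TemplateBody=empty_body)
        cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    if template_body is None:
        # No template passed: only the empty stack is created.
        return
    cfn.update_stack(StackName=stack_name, TemplateBody=template_body)
    cfn.get_waiter("stack_update_complete").wait(StackName=stack_name)
```

With boto3 installed, cfn would be boto3.client("cloudformation"), and empty_body and template_body would be the text of empty-stack.yml and the real template. Passing the client in as an argument keeps the flow easy to test with a stand-in object.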

Share these two techniques with your stack developers to improve their experience immediately. Then add them to your pipeline to reduce complexity and eliminate unneeded permissions.
