

A primary motivator for building continuous delivery pipelines is to decrease the time it takes to get new software into the hands of your users. Designing these pipelines involves identifying the manual steps in your existing process and replacing them with automation. The end result is an automated system that performs all the tasks necessary to take your software through build, test, and finally deployment.

What happens, though, when someone makes a manual change to the deployed software? This is referred to as configuration drift, and it represents a major source of technical debt. This debt can actually increase the delivery time for your software in the following ways:

  • Nondeterministic – The system is now in an unknown state. We have no confidence that a subsequent invocation of the pipeline will leave the system in a healthy state. This can affect the availability of the application, impacting the user experience and forcing the product team to stop developing new software in order to repair the system.
  • Inconsistent – We can no longer be confident that the system in its current state can be recreated, because automation alone can no longer rebuild or replicate it. This can increase the lead time for new changes by requiring the product team to diagnose inconsistencies and apply manual changes.

In this post, we will look at using AWS CloudFormation for automation and discuss how configuration drift can occur.  Then we’ll learn about a new CloudFormation capability called Drift Detection.  Finally, we will walk through an approach to protect ourselves from drift by leveraging drift detection in our continuous delivery pipeline.

Using CloudFormation for Automation

CloudFormation is a tool commonly used to automate the provisioning of infrastructure in AWS. It allows you to declare the intended state of your infrastructure in a YAML or JSON file called a CloudFormation Template. You then use the template to create or update a CloudFormation Stack, which manages the creation, updating, and deletion of your infrastructure Resources. This relationship can be seen in the diagram below:

Any time a change is needed, the flow should start with someone editing the CloudFormation Template file and then creating a Change Set for the CloudFormation Stack. The Change Set represents the differences between the current state of the CloudFormation Stack and the desired state defined in the newly updated CloudFormation Template file. The Change Set is then executed against the stack and the Resources are updated, at which point everything is back in sync.
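The same edit, change set, execute flow can be driven from the AWS CLI. The function below is a minimal sketch, assuming the stack already exists (a brand-new stack would also need `--change-set-type CREATE`); the function name and the stack, template, and change set names you pass in are placeholders, not names from the drifter project.

```shell
#!/usr/bin/env bash
# Sketch: drive the edit -> change set -> execute flow from the AWS CLI.
# Assumes the stack already exists; all names passed in are placeholders.
# Example: deploy_via_change_set my-stack app.yml my-change-set

deploy_via_change_set() {
  local stack_name="$1" template_file="$2" change_set_name="$3"

  # Ask CloudFormation to compute the differences between the stack and the template.
  aws cloudformation create-change-set \
    --stack-name "${stack_name}" \
    --change-set-name "${change_set_name}" \
    --template-body "file://${template_file}" || return 1

  # Block until the change set has been calculated (non-zero exit if it FAILED).
  aws cloudformation wait change-set-create-complete \
    --stack-name "${stack_name}" \
    --change-set-name "${change_set_name}" || return 1

  # Show the proposed resource changes for review.
  aws cloudformation describe-change-set \
    --stack-name "${stack_name}" \
    --change-set-name "${change_set_name}" \
    --query 'Changes[].ResourceChange.[Action, LogicalResourceId, ResourceType]' \
    --output table || return 1

  # Apply the change set so the stack's resources match the template again.
  aws cloudformation execute-change-set \
    --stack-name "${stack_name}" \
    --change-set-name "${change_set_name}" || return 1

  # Wait for the stack update to finish.
  aws cloudformation wait stack-update-complete --stack-name "${stack_name}"
}
```

For a SAM template such as app.yml you would also pass `--capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND` to create-change-set. Note that CloudFormation marks a change set FAILED when the template contains no changes, so the first wait command returns a non-zero exit code in that case.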

Configuration drift occurs when a manual change is made to your resources outside of the CloudFormation stack that created them. As shown in the diagram below, a manual change to a Lambda function would cause the CloudFormation Stack to drift from its original state.

Introducing CloudFormation Drift Detection

CloudFormation now offers drift detection for stacks and stack resources, which detects configuration changes made outside of CloudFormation. A resource is considered drifted if its actual configuration does not match the expected configuration in the CloudFormation stack. A stack that has any drifted resources is itself considered drifted.

Examples of resource drift include the following:

  • Changing any configuration parameter of the resource.
  • Removing a value for a configuration parameter that was defined in the original CloudFormation template.
  • Deleting the entire resource.
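Drift can also be checked for a single resource. Unlike `detect-stack-drift`, which starts an asynchronous run, the `detect-stack-resource-drift` command is synchronous and returns the resource's drift status in its response. This is a minimal sketch; the function name and the logical ID in the example call are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check a single resource for drift.
# detect-stack-resource-drift is synchronous, so no polling is needed.
# Example (hypothetical logical ID): detect_resource_drift my-stack MyFunction

detect_resource_drift() {
  local stack_name="$1" logical_id="$2"
  # Prints the resource's drift status, e.g. IN_SYNC, MODIFIED or DELETED.
  aws cloudformation detect-stack-resource-drift \
    --stack-name "${stack_name}" \
    --logical-resource-id "${logical_id}" \
    --query StackResourceDrift.StackResourceDriftStatus \
    --output text
}
```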

It is important to note that drift detection is only performed on resources that support it. Resources that do not currently support drift detection are given a status of NOT_CHECKED rather than IN_SYNC, MODIFIED, or DELETED.
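To find the NOT_CHECKED resources from the command line, one option is the per-resource `DriftInformation` summary returned by `list-stack-resources`. As we read the API reference, `describe-stack-resource-drifts` only reports resources that were actually checked, so it is not the right call for this. A minimal sketch; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list resources that drift detection did not check (NOT_CHECKED).
# Uses the DriftInformation summary that list-stack-resources returns per resource.
# Example: list_unchecked_resources my-stack

list_unchecked_resources() {
  local stack_name="$1"
  # Prints one "LogicalResourceId<TAB>ResourceType" line per unchecked resource.
  aws cloudformation list-stack-resources \
    --stack-name "${stack_name}" \
    --query "StackResourceSummaries[?DriftInformation.StackResourceDriftStatus=='NOT_CHECKED'].[LogicalResourceId, ResourceType]" \
    --output text
}
```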

The following screenshot from the CloudFormation console shows a stack in the DRIFTED status due to a single resource in the MODIFIED status:

You can then drill into the resource to see exactly which parameters changed for the MODIFIED resource:
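The same property-level detail is available from the CLI. Each entry in `PropertyDifferences` carries a `PropertyPath`, the `ExpectedValue` and `ActualValue`, and a `DifferenceType` of ADD, REMOVE, or NOT_EQUAL. This sketch filters the results of the most recent detection run down to a logical resource ID you supply; the function name and example ID are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the property-level differences recorded for one resource
# by the most recent drift detection run on the stack.
# Example (hypothetical logical ID): show_property_differences my-stack MyFunction

show_property_differences() {
  local stack_name="$1" logical_id="$2"
  # DifferenceType is ADD, REMOVE or NOT_EQUAL.
  aws cloudformation describe-stack-resource-drifts \
    --stack-name "${stack_name}" \
    --query "StackResourceDrifts[?LogicalResourceId=='${logical_id}'].PropertyDifferences[].[PropertyPath, DifferenceType, ExpectedValue, ActualValue]" \
    --output table
}
```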

Add Drift Detection to CodePipeline

Let’s demonstrate checking for drift as a part of a continuous delivery pipeline by using AWS CodePipeline along with AWS CodeBuild and CloudFormation drift detection. We have created a GitHub repository named drifter that contains a simple Lambda function written in Golang and the pipeline for managing build and deployment.

Components:

    • CodePipeline – Triggered by commits to the GitHub repository; orchestrates the steps of the pipeline.
    • Build – A CodeBuild project that compiles the Golang source code into an executable and packages it into a zip file for deployment as a Lambda function.
    • Drift Detection – A CodeBuild project that runs a shell script to check the CloudFormation stack for any drift.
    • Deploy – A CloudFormation action that creates and executes a change set.
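The Build step essentially comes down to a cross-compile and a zip. The sketch below assumes a Go toolchain and `zip` are available in the build image, that the function runs on x86_64 Linux, and that the binary name matches the handler declared in the template; the `main` binary name, `deployment.zip`, and the function name are illustrative assumptions, not taken from the drifter repository.

```shell
#!/usr/bin/env bash
# Sketch of the Build step: cross-compile a Go Lambda handler and zip it.
# Binary and archive names are illustrative; the binary name must match
# the Handler declared in the template.
# Example: build_lambda_zip main.go main deployment.zip

build_lambda_zip() {
  local source_file="$1" binary_name="$2" zip_file="$3"
  # Target Linux x86_64 (assumed Lambda architecture) regardless of the build host.
  # CGO_ENABLED=0 yields a statically linked binary with no libc dependency.
  GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o "${binary_name}" "${source_file}" || return 1
  # -j stores the binary at the root of the archive, where Lambda looks for it.
  zip -j "${zip_file}" "${binary_name}"
}
```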

The source for the Lambda function is found in main.go:

The Lambda function is deployed with the app.yml SAM (Serverless Application Model) template:

Drift detection is handled by a script named check-drift.sh that uses the AWS CLI to check for drift on the CloudFormation stack.  First, the script initiates drift detection:

### Initiate drift detection
set -euo pipefail
: "${STACK_NAME:?STACK_NAME must be set}"
DRIFT_DETECTION_ID=$(aws cloudformation detect-stack-drift --stack-name "${STACK_NAME}" --query StackDriftDetectionId --output text)

Because detect-stack-drift runs asynchronously and returns only a detection ID, we next poll until drift detection completes. Once it does, we read the stack's drift status from the detection result:

### Wait for detection to complete
echo -n "Waiting for drift detection to complete..."
while true; do
    DETECTION_STATUS=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query DetectionStatus --output text)
    if [ "DETECTION_IN_PROGRESS" = "${DETECTION_STATUS}" ]; then
        echo -n "."
        sleep 1
    elif [ "DETECTION_FAILED" = "${DETECTION_STATUS}" ]; then
        # Detection failed for at least one resource; results for the others are still reported
        DETECTION_STATUS_REASON=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query DetectionStatusReason --output text)
        STACK_DRIFT_STATUS=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query StackDriftStatus --output text)
        echo "${STACK_DRIFT_STATUS}"
        echo "WARNING: ${DETECTION_STATUS_REASON}"
        break
    else
        # DETECTION_COMPLETE
        STACK_DRIFT_STATUS=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query StackDriftStatus --output text)
        echo "${STACK_DRIFT_STATUS}"
        break
    fi
done
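One limitation of the loop above is that it polls forever if detection never leaves DETECTION_IN_PROGRESS. A hedged variant, assuming you want a bounded wait, reads all three fields in a single call and gives up after a configurable number of attempts. The function name, the defaults of 60 attempts and 5 seconds, and exit code 2 for a timeout are our own choices.

```shell
#!/usr/bin/env bash
# Sketch: bounded polling for a drift detection run.
# Prints StackDriftStatus on completion; returns 2 on timeout, 1 on CLI errors.
# Example: STACK_DRIFT_STATUS=$(wait_for_drift_detection "${DRIFT_DETECTION_ID}" 60 5)

wait_for_drift_detection() {
  local detection_id="$1" max_attempts="${2:-60}" interval="${3:-5}"
  local attempt result status drift_status reason
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # A single call returns all three fields as one tab-separated line.
    result=$(aws cloudformation describe-stack-drift-detection-status \
      --stack-drift-detection-id "${detection_id}" \
      --query '[DetectionStatus, StackDriftStatus, DetectionStatusReason]' \
      --output text) || return 1
    IFS=$'\t' read -r status drift_status reason <<< "${result}"
    case "${status}" in
      DETECTION_IN_PROGRESS)
        sleep "${interval}"
        ;;
      DETECTION_FAILED)
        # Some resources could not be checked; results for the rest are still reported.
        echo "WARNING: ${reason}" >&2
        echo "${drift_status}"
        return 0
        ;;
      *)
        # DETECTION_COMPLETE
        echo "${drift_status}"
        return 0
        ;;
    esac
  done
  echo "Timed out waiting for drift detection ${detection_id}" >&2
  return 2
}
```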

Finally, we check whether the stack is in the DRIFTED status. If it is, we print details about every resource that is not IN_SYNC to stderr and exit with a non-zero code to signal an error:

### Describe the drift details
if [ "DRIFTED" = "${STACK_DRIFT_STATUS}" ]; then
    aws cloudformation describe-stack-resource-drifts \
        --stack-name "${STACK_NAME}" \
        --query 'StackResourceDrifts[?StackResourceDriftStatus!=`IN_SYNC`].{Type:ResourceType, Resource:LogicalResourceId, Status:StackResourceDriftStatus, Diff:PropertyDifferences}' >&2
    # A non-zero exit fails the CodeBuild action, which stops the pipeline
    exit 1
fi

Try it out!

To try it out, let’s manually update our Lambda function in the console, changing the FEATURE_FLAG environment variable to true:
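The same out-of-band change can be made from the CLI, which is handy for rehearsing this scenario. This sketch assumes you know the logical ID of the function in app.yml; the `DrifterFunction` ID in the example call and the function name are hypothetical. Note that `--environment` replaces the function's entire set of environment variables, not just FEATURE_FLAG.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the manual FEATURE_FLAG change from the CLI.
# The logical ID is an assumption; use the one your template declares.
# Example (hypothetical IDs): simulate_manual_change my-stack DrifterFunction

simulate_manual_change() {
  local stack_name="$1" logical_id="$2" function_name
  # Resolve the physical function name CloudFormation generated for the resource.
  function_name=$(aws cloudformation describe-stack-resource \
    --stack-name "${stack_name}" \
    --logical-resource-id "${logical_id}" \
    --query StackResourceDetail.PhysicalResourceId \
    --output text) || return 1
  # Caution: --environment replaces ALL of the function's environment variables.
  aws lambda update-function-configuration \
    --function-name "${function_name}" \
    --environment 'Variables={FEATURE_FLAG=true}'
}
```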

When we rerun the pipeline, we’ll see it fails on the “Drift Detection” action:

We can consult the logs in CodeBuild to see what caused the pipeline to fail:
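The same log can be pulled from the CLI. This sketch assumes the Drift Detection CodeBuild project writes to CloudWatch Logs (the CodeBuild default); the function name and the project name in the example call are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print the log of the most recent build of a CodeBuild project.
# Assumes the project logs to CloudWatch Logs; the project name is a placeholder.
# Example: show_latest_build_log my-drift-detection-project

show_latest_build_log() {
  local project_name="$1" build_id locations group stream
  # Newest build first.
  build_id=$(aws codebuild list-builds-for-project \
    --project-name "${project_name}" \
    --sort-order DESCENDING \
    --query 'ids[0]' --output text) || return 1
  # Each build records the log group and stream it wrote to.
  locations=$(aws codebuild batch-get-builds \
    --ids "${build_id}" \
    --query 'builds[0].logs.[groupName, streamName]' --output text) || return 1
  read -r group stream <<< "${locations}"
  # One log message per line, oldest first.
  aws logs get-log-events \
    --log-group-name "${group}" \
    --log-stream-name "${stream}" \
    --start-from-head \
    --query 'events[].[message]' --output text
}
```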

After reverting the manual change on the Lambda function and rerunning the pipeline, we see that the pipeline is back to green:
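Rerunning and checking the pipeline can also be scripted. A minimal sketch using `start-pipeline-execution` and `get-pipeline-state`; the function names and the pipeline name in the example call are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: rerun a pipeline and summarize the state of each stage.
# The pipeline name is a placeholder.
# Example: rerun_pipeline my-pipeline; show_stage_status my-pipeline

rerun_pipeline() {
  # Starts a new execution with the latest source revision; prints its ID.
  aws codepipeline start-pipeline-execution \
    --name "$1" \
    --query pipelineExecutionId --output text
}

show_stage_status() {
  # Prints "stageName<TAB>status" for each stage, e.g. Succeeded or Failed.
  aws codepipeline get-pipeline-state \
    --name "$1" \
    --query 'stageStates[].[stageName, latestExecution.status]' \
    --output text
}
```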

Conclusion

Drift detection is a powerful tool that supports the adoption of continuous delivery pipelines and automation. It provides a mechanism to identify, and protect against, manual changes made outside of the pipeline that could compromise the reliability of the automation. By adding drift detection to our pipeline, we gain confidence that our deployments will produce predictable results.

To learn more, check out the AWS CloudFormation User Guide.