Stelligent

Drift Detection in Continuous Delivery Pipelines

A primary motivator for building continuous delivery pipelines is to decrease the time it takes to get new software into the hands of your users. The approach to designing these pipelines is to identify the manual steps in your existing process and replace them with automation. The end result is an automated system that performs all the tasks necessary to take your software through build, test, and finally deployment.

What happens, though, when a change is made manually to the deployed software?  This is referred to as configuration drift and it represents a major source of technical debt.  This debt has the potential to actually increase the delivery time for your software in the following ways:

In this post, we will look at using AWS CloudFormation for automation and discuss how configuration drift can occur.  Then we’ll learn about a new CloudFormation capability called Drift Detection.  Finally, we will walk through an approach to protect ourselves from drift by leveraging drift detection in our continuous delivery pipeline.

Using CloudFormation for Automation

CloudFormation is a tool commonly used to automate the provisioning of infrastructure in AWS. It allows you to declare the intended state of your infrastructure in a YAML or JSON file called a CloudFormation Template. You then use the template to create or update a CloudFormation Stack, which manages the creation, update, and deletion of your infrastructure Resources. This relationship can be seen in the diagram below:

Anytime a change is needed, the flow should start with someone editing the CloudFormation Template file and then creating a Change Set for the CloudFormation Stack.  The Change Set represents the differences between the current state of the CloudFormation Stack and the desired state that is defined in the newly updated CloudFormation Template file.  The Change Set is then applied to the stack and the Resources are updated, at which point everything is back in sync.
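
As a rough sketch of this flow using the AWS CLI: the function name, stack name, template path, and change set name below are placeholders chosen for illustration and are not taken from the drifter repository.

```shell
#!/usr/bin/env bash
# Sketch: update a stack through a change set rather than a direct update.
# All argument values are hypothetical; adjust them for your own stack.

deploy_with_change_set() {
    local stack_name="$1" template_file="$2" change_set_name="$3"

    # Compute the difference between the current stack and the edited template.
    # CAPABILITY_IAM is only needed if the template creates IAM resources.
    aws cloudformation create-change-set \
        --stack-name "${stack_name}" \
        --change-set-name "${change_set_name}" \
        --template-body "file://${template_file}" \
        --capabilities CAPABILITY_IAM || return 1

    # Block until the change set is ready (this fails if there are no changes).
    aws cloudformation wait change-set-create-complete \
        --stack-name "${stack_name}" \
        --change-set-name "${change_set_name}" || return 1

    # Show which resources would be added, modified, or removed.
    aws cloudformation describe-change-set \
        --stack-name "${stack_name}" \
        --change-set-name "${change_set_name}" \
        --query 'Changes[].ResourceChange.{Action:Action,Resource:LogicalResourceId,Type:ResourceType}' \
        --output table || return 1

    # Apply the change set and wait for the stack to settle.
    aws cloudformation execute-change-set \
        --stack-name "${stack_name}" \
        --change-set-name "${change_set_name}" || return 1
    aws cloudformation wait stack-update-complete --stack-name "${stack_name}"
}
```

It could be invoked as `deploy_with_change_set my-stack app.yml my-change-set`. In a real pipeline the review step would typically be a manual approval or an automated policy check rather than a printed table.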

Configuration drift occurs when a manual change is made to your resources outside the CloudFormation stack that created them.  As shown in the diagram below, a change to a Lambda function would cause the CloudFormation Stack to drift from the original state.

Introducing CloudFormation Drift Detection

CloudFormation now offers a drift detection service for your stacks and stack resources to detect configuration changes made outside of CloudFormation.  Resources are considered drifted if their actual configurations do not match the expected configurations in the CloudFormation stack. A stack that has any drifted resources is then itself considered to be drifted.

Examples of resource drift include the following:

It is important to note that drift detection is only performed on resources that support drift detection.  Those resources that currently do not support drift detection are given a status of NOT_CHECKED rather than MODIFIED or IN_SYNC.
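
Because a stack's drift status only reflects the resources that were actually checked, it can be useful to list the ones that were skipped. The following is a minimal sketch, assuming drift detection has already been run on the stack; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list resources whose drift status is NOT_CHECKED.
# Run this after drift detection has completed; before that, every
# resource in the stack reports NOT_CHECKED.

list_unchecked_resources() {
    local stack_name="$1"

    # describe-stack-resources reports a DriftInformation block per resource;
    # the JMESPath filter keeps only those with a NOT_CHECKED status.
    aws cloudformation describe-stack-resources \
        --stack-name "${stack_name}" \
        --query 'StackResources[?DriftInformation.StackResourceDriftStatus==`NOT_CHECKED`].[LogicalResourceId,ResourceType]' \
        --output text
}
```

Calling `list_unchecked_resources my-stack` prints one line per unchecked resource, with its logical ID followed by its resource type.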

The following screen from the CloudFormation console shows a stack that is in the DRIFTED status due to a single resource that is in the MODIFIED status:

You can then drill into the resource to see exactly which parameters changed for the MODIFIED resource:

Add Drift Detection to CodePipeline

Let’s demonstrate checking for drift as part of a continuous delivery pipeline by using AWS CodePipeline along with AWS CodeBuild and CloudFormation drift detection. We have created a GitHub repository named drifter that contains a simple Lambda function written in Go and the pipeline that manages its build and deployment.

Components:

The source for the Lambda function is found in main.go:

The Lambda function is deployed with the app.yml SAM (Serverless Application Model) template:

Drift detection is handled by a script named check-drift.sh that uses the AWS CLI to check for drift on the CloudFormation stack.  First, the script initiates drift detection:

### Initiate drift detection
DRIFT_DETECTION_ID=$(aws cloudformation detect-stack-drift --stack-name "${STACK_NAME}" --query StackDriftDetectionId --output text)

Next, we wait for the CloudFormation drift detection to complete.  Once complete, we query the stack to determine the drift status for the stack:

### Wait for detection to complete
echo -n "Waiting for drift detection to complete..."
while true; do
    DETECTION_STATUS=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query DetectionStatus --output text)
    if [ "DETECTION_IN_PROGRESS" = "${DETECTION_STATUS}" ]; then
        # Still running: print a progress dot and poll again
        echo -n "."
        sleep 1
    elif [ "DETECTION_FAILED" = "${DETECTION_STATUS}" ]; then
        # Detection failed for at least one resource: report the reason,
        # but still capture the drift status of the resources that were checked
        DETECTION_STATUS_REASON=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query DetectionStatusReason --output text)
        STACK_DRIFT_STATUS=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query StackDriftStatus --output text)
        echo "${STACK_DRIFT_STATUS}"
        echo "WARNING: ${DETECTION_STATUS_REASON}"
        break
    else
        # Detection completed: capture the overall drift status of the stack
        STACK_DRIFT_STATUS=$(aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "${DRIFT_DETECTION_ID}" --query StackDriftStatus --output text)
        echo "${STACK_DRIFT_STATUS}"
        break
    fi
done
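
The loop above polls until detection finishes, however long that takes. In a build environment it may be preferable to give up after a fixed number of attempts. The following is one possible bounded variant, not the version used in check-drift.sh; the function name and the MAX_ATTEMPTS and POLL_SECONDS variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll drift detection with an upper bound on attempts.
# MAX_ATTEMPTS and POLL_SECONDS are hypothetical tuning knobs.

wait_for_drift_detection() {
    local detection_id="$1"
    local max_attempts="${MAX_ATTEMPTS:-60}"
    local poll_seconds="${POLL_SECONDS:-5}"
    local attempt=1
    local detection_status

    while [ "${attempt}" -le "${max_attempts}" ]; do
        # Return 2 if the AWS CLI call itself fails
        detection_status=$(aws cloudformation describe-stack-drift-detection-status \
            --stack-drift-detection-id "${detection_id}" \
            --query DetectionStatus --output text) || return 2

        # Any status other than "in progress" is final: print it and stop
        if [ "${detection_status}" != "DETECTION_IN_PROGRESS" ]; then
            echo "${detection_status}"
            return 0
        fi

        sleep "${poll_seconds}"
        attempt=$((attempt + 1))
    done

    # Still in progress after the last attempt
    echo "TIMED_OUT"
    return 1
}
```

A caller could capture the result with `DETECTION_STATUS=$(wait_for_drift_detection "${DRIFT_DETECTION_ID}")` and treat a non-zero return code as a failed check.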

Finally, we check whether the stack is in the DRIFTED status. If so, we print the details of the resources that are not IN_SYNC and exit with a non-zero code to signal an error:

### Describe the drift details
if [ "DRIFTED" = "${STACK_DRIFT_STATUS}" ]; then
    # Print the out-of-sync resources to stderr, then fail the build
    aws cloudformation describe-stack-resource-drifts \
        --stack-name "${STACK_NAME}" \
        --query 'StackResourceDrifts[?StackResourceDriftStatus!=`IN_SYNC`].{Type:ResourceType, Resource:LogicalResourceId, Status:StackResourceDriftStatus, Diff:PropertyDifferences}' >&2
    exit 1
fi
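
The check above fails the build only when the stack is DRIFTED. A stack can also report UNKNOWN or NOT_CHECKED, and a stricter pipeline might want to tell those cases apart. The following sketch shows one way to do that; the function name and the specific exit codes are arbitrary choices made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: map a stack drift status to an exit code.
# The codes are an arbitrary convention, not something CloudFormation defines.

exit_code_for_drift_status() {
    case "$1" in
        IN_SYNC)             return 0 ;;  # no drift found on checked resources
        DRIFTED)             return 1 ;;  # at least one resource has drifted
        NOT_CHECKED|UNKNOWN) return 2 ;;  # no reliable answer available
        *)                   return 3 ;;  # unexpected value
    esac
}
```

For example, a script could call `exit_code_for_drift_status "${STACK_DRIFT_STATUS}"` and pass the resulting code to `exit`, so the build log distinguishes real drift from an inconclusive check.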

Try it out!

To try it out, let’s make a manual update of our Lambda function in the console to change the FEATURE_FLAG environment variable to true:
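
The same out-of-band change could also be made from the AWS CLI instead of the console. This is a minimal sketch; the helper name is hypothetical, and the function name passed to it must be replaced with the name of the deployed Lambda function.

```shell
#!/usr/bin/env bash
# Sketch: change the FEATURE_FLAG environment variable outside CloudFormation.
# Note: --environment replaces the function's whole set of environment
# variables, so any other variables would need to be listed here as well.

set_feature_flag() {
    local function_name="$1" value="$2"

    aws lambda update-function-configuration \
        --function-name "${function_name}" \
        --environment "Variables={FEATURE_FLAG=${value}}"
}
```

Running `set_feature_flag my-function true` introduces the drift, and running it again with the original value reverts it.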

When we rerun the pipeline, we’ll see it fails on the “Drift Detection” action:

We can consult the logs in CodeBuild to see what caused the pipeline to fail:

After reverting the manual change on the Lambda function and rerunning the pipeline, we see that the pipeline is back to green:

Conclusion

Drift detection is a powerful tool that supports the adoption of continuous delivery pipelines and automation. It provides a mechanism to identify and protect against manual changes made outside of the pipeline that could compromise the reliability of the automation. By adding drift detection to our pipeline, we have gained confidence that our deployments will have predictable results.

To learn more, check out the user guide.
