We often overlook similarities between new CD Pipeline code we’re writing today and code we’ve already written. In addition, we might sometimes rely on the ‘copy, paste, then modify’ approach to make quick work of CD Pipelines supporting similar application architectures. Despite the short-term gains, what often results is code sprawl and maintenance headaches. This blog post series is aimed at helping to reduce that sprawl and improve code re-use across CD Pipelines.
This mini-series references two real but simple applications, hosted in Github, representing the kind of organic growth phenomenon we often encounter and sometimes even cause, despite our best intentions. We’ll refactor them each a couple of times over the course of this series, with this post covering CloudFormation template reuse.

Chef’s a great tool… let’s misuse it!

During the deployment of an AWS AutoScaling Group, we typically rely on the user data section of the Launch Configuration to configure new instances in an automated, repeatable way. A common automation practice is to use chef-solo to accomplish this. We carefully chose chef-solo as a great tool for immutable infrastructure approaches. Both applications have CD Pipelines that leverage it as a scale-time configuration tool by reading a JSON document describing the actions and attributes to be applied.

It’s all roses

It’s a great approach: we sprinkle in a handful or two of CloudFormation parameters to support our Launch Configuration, embed the chef-solo JSON in the user data and decorate it with references to the CloudFormation parameters. Voila, we’re done! The implementation hardly took any time (probably less than an hour per application if you could find good examples in the internet), and each time we need a new CD Pipeline, we can just stamp out a new CloudFormation template.
Figure 1: Launch Configuration user data (as plain text)

Figure 2: CloudFormation parameters (corresponding to Figure 1)

Well, it’s mostly roses…

Why is it, then, that a few months and a dozen or so CD Pipelines later, we’re spending all our time debugging and doing maintenance on what should be minor tweaks to our application configurations? New configuration parameters take hours of trial and error, and new application pipelines can be copied and pasted into place, but even then it takes hours to scrape out the previous application’s specific needs from its CloudFormation template and replace them.

Fine, it’s got thorns, and they’re slowing us down

Maybe our great solution could have been better? Let’s start with the major pitfall to our original approach: each application we support has its own highly-customized CloudFormation template.

lots of application-specific CFN parameters exist solely to shuttle values to the chef-solo JSON
fairly convoluted user data, containing an embedded JSON structure and parameter references, is a bear to maintain
tracing parameter values from the CD Pipeline, traversing the CFN parameters into the user data… that’ll take some time to debug when it goes awry

One path to code reuse

Since we’re referencing two real GitHub application repositories that demonstrate our current predicament, we’ll continue using those repositories to present our solution via a code branch named Phase1 in each repository. At this point, we know our applications share enough of a common infrastructure approach that they should be sharing that part of the CloudFormation template.
The first part of our solution will be to extract the ‘differences’ from the CloudFormation templates between these two application pipelines. That should leave us with a common skeleton to work with, minus all the Chef specific items and user data, which will allow us to push the CFN template into an S3 bucket to be shared by both application CD pipelines.
The second part will be to add back the required application specificity, but in a way that migrates those differences from the CloudFormation templates to external artifacts stored in S3.

Taking it apart

Our first tangible goal is to make the user data generic enough to support both applications. We start by moving the inline chef-solo JSON to its own plain JSON document in each application’s pipeline folder (/pipelines/config/app-config.json). Later, we’ll modify our CD pipelines so they can make application and deployment-specific versions of that file and upload it to an S3 bucket.

Figure 3: Before/after comparison (diff) of our Launch Configuration User Data

Left: original user data; Right: updated user data

The second goal is to make a single, vanilla CloudFormation template. Since we orphaned these Chef only CloudFormation parameters by removing the parts of the user data referencing them, we can remove them. The resulting template’s focus can now be on meeting the infrastructure concerns of our applications.

Figure 4: Before/after comparison (diff) of the CloudFormation parameters required

Left: original CFN parameters; Right: pared-down parameters

At this point, we have eliminated all the differences between the CloudFormation templates, but now they can’t configure our application! Let’s fix that.

Reassembling it for reuse

Our objective now is to make our Launch Configuration user data truly generic so that we can actually reuse our CloudFormation template across both applications. We do that by scripting it to download the JSON that Chef needs from a specified S3 bucket. At the same time, we enhance the CD Pipelines by scripting them to create application and deploy-specific JSON, and to push that JSON to our S3 bucket.

Figure 5: Chef JSON stored as a deploy-specific object in S3

The S3 key is unique to the deployment.

To stitch these things together we add back one CloudFormation parameter, ChefJsonKey, required by both CD Pipelines – its value at execution time will be the S3 key where the Chef JSON will be downloaded from. (Since our CD Pipeline has created that file, it’s primed to provide that parameter value when it executes the CloudFormation stack.)
Two small details left. First, we give our AutoScaling Group instances the ability to download from that S3 bucket. Now that we’re convinced our CloudFormation template is as generic as it needs to be, we upload it to S3 and have our CD Pipelines reference it as an S3 URL.

Figure 6: Our S3 bucket structure ‘replaces’ the /pipeline/config folder

The templates can be maintained in GitHub.

That’s a wrap

We now have a vanilla CloudFormation template that supports both applications. When an AutoScaling group scales up, the new servers will now download a Chef JSON document from S3 in order to execute chef-solo. We were able to eliminate that template from both application pipelines and still get all the benefits of Chef based server configuration.
See these GitHub repositories referenced throughout the article:

In Part 2 of this series, we’ll continue our refactoring effort with a focus on the CD Pipeline code itself.
Authors: Jeff Dugas and Matt Adams
Interested in working with and sometimes misusing configuration management tools like Chef, Puppet, and Ansible ? Stelligent is hiring!

Stelligent Amazon Pollycast