Designing Applications for Failure

I recently had the opportunity to attend an AWS bootcamp at their Herndon, VA office, which included a short presentation by their team on Designing for Failure. It opened my eyes to the reality of application design when dealing with failure or even basic exception handling.

One of the defining differences between a good developer and a great one is how they deal with failure. Good developers will handle the obvious cases in their code – checking for unexpected input, catching library exceptions, and sometimes covering edge cases. Great developers go a step further, asking why we build resilient applications in the first place and what failure means for the end user.

In this blog post, I’ll share with you the key points that a great developer follows when designing resilient applications.

Why build resilient applications?


There are two main reasons that we design applications for failure. As you can probably guess from the horrifying image above, the first reason is User Experience. It’s no secret that you will have user attrition and lost revenue if you cannot shield your end users from issues outside their control. The second reason is Business Services. All business critical systems require resiliency and the difference between a 99.7% uptime and 99.99% could be hours of lost revenue or interrupted business services.

Given an application load of 1 billion requests per month, 99.7% uptime translates to more than 2 hours of downtime per month, versus just over 4 minutes at 99.99%. Ouch!
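The arithmetic behind those numbers is easy to verify (assuming an average month of 730 hours):

```python
# Downtime per month implied by an availability percentage.
HOURS_PER_MONTH = 730  # average month length in hours

def downtime_minutes_per_month(availability_pct: float) -> float:
    """Minutes of expected downtime per month at a given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_MONTH * 60

print(round(downtime_minutes_per_month(99.7)))   # about 131 minutes (2+ hours)
print(round(downtime_minutes_per_month(99.99)))  # about 4 minutes
```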

Werner Vogels, the CTO of Amazon Web Services, once said at re:Invent, “Everything fails, all the time.” It’s a devastating reality and one we all must accept. No matter how mathematically improbable, we simply cannot eliminate all failures. How we reduce the impact of those failures is what improves the overall resiliency of our applications.

Graceful Degradation

The way we reduce the impact of failure on our users and business is through graceful degradation. Conceptually it’s very simple – we want to continue operating in the event of a failure, in some degraded capacity. Keeping with the premise that applications fail all the time, you’ve probably experienced degraded services without even realizing it – and that is the ultimate goal.


Caching is the first layer of defense when dealing with a failure. Unless your application truly depends on bleeding-edge, up-to-the-second information, you should consider caching everything. It’s easy for developers to reject caching because they always want the freshest information for their users. However, when the difference between a happy customer and a sad one is serving data that’s a few minutes old, serve the cached data.

As an example, imagine you have a fairly advanced web application. What can you cache?

  • Full HTML pages with CloudFront
  • Database records with ElastiCache
  • Page Fragments with tools such as Varnish
  • Remote API calls from your backend with ElastiCache
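Here’s a minimal sketch of the serve-stale-on-failure idea behind that list. A plain dict stands in for ElastiCache, and the function names and TTL are illustrative, not from a real API:

```python
import time

CACHE = {}  # stands in for ElastiCache/Varnish; {key: (value, fetched_at)}
TTL_SECONDS = 300

def fetch_profile_from_db(user_id):
    # Placeholder for a real database call that may fail.
    raise ConnectionError("database unavailable")

def get_profile(user_id):
    key = f"profile:{user_id}"
    cached = CACHE.get(key)
    if cached and time.time() - cached[1] < TTL_SECONDS:
        return cached[0]  # fresh enough: skip the backend entirely
    try:
        value = fetch_profile_from_db(user_id)
        CACHE[key] = (value, time.time())
        return value
    except ConnectionError:
        if cached:
            return cached[0]  # degrade gracefully: serve stale data
        raise  # nothing cached at all: let an outer layer handle it
```

Even when the database is down, users with any cached copy still get a response; only completely cold keys surface the error.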


As applications get more complex, we rely on more external services than ever before. Whether it’s a 3rd-party service provider or your own microservices architecture at work, failures are common and often transient. A common pattern for dealing with transient failures on these types of requests is retry logic. Using exponential backoff or a Fibonacci sequence, you can retry for some time before eventually throwing an exception. It’s important to fail fast and avoid triggering rate limiting on the upstream source, so don’t retry indefinitely.
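A hedged sketch of that retry pattern (the exception type, attempt count, and delays are illustrative; a real client would catch whatever transient errors its library actually raises):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry a transient operation, doubling the delay each attempt.

    Gives up after max_attempts so we never hammer a struggling
    dependency indefinitely or trip its rate limits.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # Exponential backoff with jitter: ~0.1s, ~0.2s, ~0.4s, ...
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```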

Rate Limiting

In the case of denial of service attacks, self-imposed or otherwise, your primary defense is rate limiting based on context. You can limit the number of requests to your application based on user data, source address, or both. By imposing a limit on requests, you can improve performance during a failure by reducing both the actual load and the load imposed by your retry logic. Also consider using an exponential or Fibonacci back-off in your limits to help mitigate particularly demanding services.

For example, during a peak in demand that cannot be met immediately, a reduction in load gives your application’s infrastructure time to respond (think auto scaling) before it fails completely.
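A token bucket is one common way to implement that kind of per-context limit. This is an illustrative sketch rather than production code; in practice you would keep one bucket per user or source address:

```python
import time

class TokenBucket:
    """Per-context rate limiter: allow `rate` requests per second,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill tokens based on elapsed time, up to the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject (or queue) the request
```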

Fail Fast

When your application is running out of memory, threads, or other resources, you can help recovery time by failing fast. You should return an error as soon as the problem is detected. Not only will your users be happier not waiting on your application to respond, you will also avoid cascading the delay into dependent services.
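One way to sketch fail-fast behavior is to bound in-flight work and reject immediately once the bound is hit. The capacity and names here are illustrative assumptions:

```python
import threading

class ServiceOverloaded(Exception):
    """Raised immediately instead of queueing work we can't serve."""

MAX_IN_FLIGHT = threading.BoundedSemaphore(2)  # capacity is illustrative

def handle_request(work):
    # Fail fast: if no capacity is free, reject now rather than making
    # the caller (and every dependent service) wait on a doomed request.
    if not MAX_IN_FLIGHT.acquire(blocking=False):
        raise ServiceOverloaded("try again later")
    try:
        return work()
    finally:
        MAX_IN_FLIGHT.release()
```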

Static Fallback

Whether you’re rate limiting or simply cannot fail silently, you’ll need something to fall back to. A static fallback is a way to provide at least some response to your end users rather than leaving them with a cryptic error message or no response at all. It’s always better to return content that makes sense in the context of the user, and you’ve probably seen this before if you’re a frequent user of sites like Reddit or Twitter.


In the case of our example web application, you can configure Route53 to fall back to HTML pages and assets served from Amazon S3 with very little headache. You could set this up today!

Fail Silently

When all of your layers of protection have failed to preserve your service, it’s time to fail silently. Failing silently means relying on your logging, monitoring, and other infrastructure to respond to errors with the least possible impact on the end user. It’s better to return a 200 OK with no content and log the error on the backend than to return a 500 Internal Server Error (or similar HTTP status code), or worse yet, a nasty stack trace/log dump.
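As a rough sketch of that idea, a wrapper can log the backend error and hand the client an empty 200 instead of a 500. The handler shape here is an illustrative assumption, not a real framework API:

```python
import logging

logger = logging.getLogger("api")

def fail_silently(handler):
    """Wrap a request handler: log unexpected errors on the backend
    and return an empty 200 instead of leaking a 500 or stack trace."""
    def wrapped(request):
        try:
            return 200, handler(request)
        except Exception:
            logger.exception("handler failed for %r", request)
            return 200, ""  # empty but well-formed response for the client
    return wrapped
```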

Failing Fast and You

There are two patterns that you can implement to improve your ability to fail fast: Circuit Breaking and Load Shedding. Generally, you want to leverage your monitoring tools such as CloudWatch and your logs to detect failure early and begin mitigating the impact as soon as possible. At Stelligent, we strongly recommend automation in your infrastructure, and these two patterns are automation at its finest.

Circuit Breaking

Circuit breaking is purposefully degrading service in response to failure events surfaced by your logging or monitoring systems. You can utilize any of the degradation patterns mentioned above while the circuit is open. Finally, by implementing health checks into your service, you can restore normal operation as soon as possible.
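A minimal circuit breaker might look like the following sketch, with the failure threshold, reset window, and exception type as illustrative assumptions:

```python
import time

class CircuitBreaker:
    """Open the circuit after consecutive failures; after a cool-down,
    let one request through (a health check) to probe for recovery."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # circuit open: degrade immediately
            self.opened_at = None  # half-open: probe the service again
        try:
            result = fn()
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # trip the breaker
            return fallback()
        self.failures = 0  # healthy response: close the circuit
        return result
```

While the circuit is open, every call returns the fallback (a cached value, a static page) without touching the failing dependency at all.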

Load Shedding

Load shedding is a method of failing fast that occurs at the networking level. Like circuit breaking, you can rely on monitoring data to reroute traffic from your application to a Static Fallback that you have configured. For example, Route53 has failover support built right in that would allow you to use this pattern right away.


Provision a hosted Git repo with AWS CodeCommit using CloudFormation

Recently, AWS announced that you can now automate the provisioning of a hosted Git repository with AWS CodeCommit using CloudFormation. This means that in addition to the console, CLI, and SDK, you can use declarative code to provision a new CodeCommit repository – providing greater flexibility in versioning, testing, and integration.

In this post, I’ll describe how engineers can provision a CodeCommit Git repository in a CloudFormation template. Furthermore, you’ll learn how to automate the provisioning of a deployment pipeline that uses this repository as its Source action to deploy an application using CodeDeploy to an EC2 instance. You’ll see examples, patterns, and a short video that walks you through the process.


Here are the prerequisites for this solution:

These will be explained in greater detail in the Deployment Steps section.

Architecture and Implementation

In the figure below, you see the architecture for launching a pipeline that deploys software to an EC2 instance from code stored in a CodeCommit repository. You can click on the image to launch the template in CloudFormation Designer.

  • CloudFormation – All of the resource generation in this solution is described in CloudFormation, a declarative code language that can be written in JSON or YAML.
  • CodeCommit – With the addition of the AWS::CodeCommit::Repository resource, you can define your CodeCommit Git repositories in CloudFormation.
  • CodeDeploy – CodeDeploy automates the deployment to the EC2 instance that was provisioned by the nested stack.
  • CodePipeline – I’m defining CodePipeline’s stages and actions in CloudFormation code which includes using CodeCommit as a Source action and CodeDeploy for a Deploy action (For more information, see Action Structure Requirements in AWS CodePipeline).
  • EC2 – A nested CloudFormation stack is launched to provision a single EC2 instance on which the CodeDeploy agent is installed. The CloudFormation template called through the nested stack is provided by AWS.
  • IAM – An Identity and Access Management (IAM) Role is provisioned via CloudFormation which defines the resources that the pipeline can access.
  • SNS – A Simple Notification Service (SNS) Topic is provisioned via CloudFormation. The SNS topic is used by the CodeCommit repository for notifications.

CloudFormation Template

In this section, I’ll show code snippets from the CloudFormation template that provisions the entire solution. The focus of my samples is on the CodeCommit resources. There are several other resources defined in this template including EC2, IAM, SNS, CodePipeline, and CodeDeploy. You can find a link to the template at the bottom of  this post.


In the code snippet below, you see that I’m using the AWS::CodeCommit::Repository CloudFormation resource. The repository name is provided as a parameter to the template. I created a trigger to receive notifications when the master branch gets updated, using an SNS Topic as a dependent resource created in the same CloudFormation template. This is based on the sample code provided by AWS.

        "RepositoryDescription":"CodeCommit Repository",
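For reference, a minimal AWS::CodeCommit::Repository resource with an SNS trigger on the master branch might look like the following sketch; the logical resource name, trigger name, and topic reference are assumptions for illustration:

```json
"CodeCommitRepo": {
  "Type": "AWS::CodeCommit::Repository",
  "Properties": {
    "RepositoryName": { "Ref": "RepoName" },
    "RepositoryDescription": "CodeCommit Repository",
    "Triggers": [
      {
        "Name": "MasterTrigger",
        "DestinationArn": { "Ref": "MySNSTopic" },
        "Branches": [ "master" ],
        "Events": [ "all" ]
      }
    ]
  }
}
```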


In this CodePipeline snippet, you see how I’m using the CodeCommit repository resource as an input for the Source action in CodePipeline. In doing this, it polls the CodeCommit repository for any changes. When it discovers changes, it initiates an instance of the deployment pipeline in CodePipeline.
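A CodeCommit Source stage in a CodePipeline definition might look roughly like this; the stage/action names, branch, and artifact name are assumptions for illustration:

```json
{
  "Name": "Source",
  "Actions": [
    {
      "Name": "Source",
      "ActionTypeId": {
        "Category": "Source",
        "Owner": "AWS",
        "Provider": "CodeCommit",
        "Version": "1"
      },
      "Configuration": {
        "RepositoryName": { "Ref": "RepoName" },
        "BranchName": "master"
      },
      "OutputArtifacts": [ { "Name": "SourceOutput" } ],
      "RunOrder": 1
    }
  ]
}
```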



You can see an illustration of this pipeline in the figure below.



Since costs can vary widely in using certain AWS services and other tools, I’ve provided a cost breakdown and some sample scenarios to give you an idea of what your monthly spend might look like. The AWS Cost Calculator can assist in establishing cost projections.

  • CloudFormation – No additional cost
  • CodeCommit – If you’re using it on a small project with fewer than six users, there’s no additional cost. See AWS CodeCommit Pricing for more information.
  • CodeDeploy – No additional cost
  • CodePipeline – $1 a month per pipeline unless you’re using it as part of the free tier. For more information, see AWS CodePipeline pricing.
  • EC2 – Approximately $15/month if you’re running one t1.micro instance 24/7. See AWS EC2 Pricing for more information.
  • IAM – No additional cost
  • SNS – Considering you probably won’t have over 1 million Amazon SNS requests for this particular solution, there’s no cost. For more information, see AWS SNS Pricing.

So, for this particular sample solution, you’ll spend around $16/month if you run the EC2 instance for an entire month. If you just run it once and terminate it, you’ll spend a little over $1.


Here are some patterns to consider when using CodeCommit with CloudFormation.

  • CodeCommit Template – This solution embeds the CodeCommit creation in a single CloudFormation template, but it’s unlikely you’ll be updating the CodeCommit repository definition with every application change. Instead, you might create a template that focuses on CodeCommit creation and run it as part of an infrastructure pipeline that gets updated whenever new CloudFormation code is committed to it.
  • Centralized Repos – Most likely, you’ll want to host your CodeCommit repositories in a single AWS account and use cross-account IAM roles to share access across accounts in your organization. While you can create CodeCommit repos in any AWS account, it’ll likely lead to unnecessary complexity when engineers want to know where the code is located.

The last is more of a conundrum than a pattern. As one of my colleagues posted in Slack:

I’m stuck in a recursive loop…where do I store my CloudFormation template for my CodeCommit repo?

Good question. I don’t have a good answer for that one just yet. Anyone have thoughts on this one? It gets very “meta”.

Deployment Steps

There are three main steps in launching this solution: preparing an AWS account, launching the stack, and testing the deployment. Each is described in more detail in this section.

Step 1. Prepare an AWS Account

  1. If you don’t already have an AWS account, create one at by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad. Be sure you’ve signed up for the CloudFormation service.
  2. Use the region selector in the navigation bar of the console to choose the Northern Virginia (us-east-1) region
  3. Create a key pair. To do this, in the navigation pane of the Amazon EC2 console, choose Key Pairs, Create Key Pair, type a name, and then choose Create.

Step 2. Launch the Stack

Click on the Launch Stack button below to launch the CloudFormation stack. Before you launch the stack, review the architecture, configuration, security, and other considerations discussed in this post. To download the template, click here.

Time to deploy: Approximately 7 minutes

The template includes default settings that you can customize by following the instructions in this post.

Create Details

Here’s a listing of the key AWS resources that are created when this stack is launched:

  • IAM – InstanceProfile, Policy, and Role
  • CodeCommit Repository – Hosts the versioned code
  • EC2 instance – with CodeDeploy agent installed
  • CodeDeploy – application and deployment
  • CodePipeline – deployment pipeline with CodeCommit Integration

CLI Example

Alternatively, you can launch the same stack from the command line as shown in the samples below.

Base Command

From an instance that has the AWS CLI installed, you can use the following snippet as a base command prepended to one of two options described in the Parameters section below.

aws cloudformation create-stack --profile {AWS Profile Name} --stack-name {Stack Name} --capabilities CAPABILITY_IAM --template-url ""

I’ve provided two ways to run the command – from a custom parameters file or from the CLI.

Option 1 – Custom Parameters JSON File

By attaching the command below to the base command, you can pass parameters from a file as shown in the sample below.

--parameters file:///localpath/to/example-parameters-cpl-cfn.json
Option 2 – Pass Parameters on CLI

Another way to launch the stack from the command line is to provide custom parameters populated with parameter values as shown in the sample below.

--parameters ParameterKey=EC2KeyPairName,ParameterValue=stelligent-dev ParameterKey=EmailAddress,ParameterValue={EmailAddress} ParameterKey=RepoName,ParameterValue=my-cc-repo

Step 3. Test the Deployment

Click on the CodePipelineURL Output in your CloudFormation stack. You’ll see that the pipeline has failed on the Source action. This is because the Source action expects a populated repository and it’s empty. The way to resolve this is to commit the application files to the newly-created CodeCommit repository. First, you’ll need to clone the repository locally. To do this, get the CloneUrlSsh Output from the CloudFormation stack you launched in Step 2. A sample command is shown below. You’ll replace {CloneUrlSsh} with the value from the CloudFormation stack output. For more information on using SSH to interact with CodeCommit, see the Connect to the CodeCommit Repository section at: Create and Connect to an AWS CodeCommit Repository.

git clone {CloneUrlSsh}
cd {localdirectory}

Once you’ve cloned the repository locally, download the sample application files from and place the files directly into your local repository. Do not include the SampleApp_Linux folder. Go to the local directory and type the following to commit and push the new files to the CodeCommit repository:

git add .
git commit -am "add all files from the AWS sample linux codedeploy application"
git push

Once these files have been committed, the pipeline will discover the changes in CodeCommit and run a new pipeline instance, and both stages and actions should succeed as a result of this change.

Access the Application

Once the CloudFormation stack has successfully completed, go to CodeDeploy and select Deployments. For example, if you’re in the us-east-1 region, the URL might look like: (You can also find this link in the CodeDeployURL Output of the CloudFormation stack you launched). Next, click on the link for the Deployment Id of the deployment you just launched from CloudFormation. Then, click on the link for the Instance Id. From the EC2 instance, copy the Public IP value and paste into your browser and hit enter. You should see a page like the one below.


Commit Changes to CodeCommit

Make some visual changes to the index.html (look for background-color) and commit these changes to your CodeCommit repository to see these changes get deployed through your pipeline. You perform these actions from the directory where you cloned the local version of your CodeCommit repo (in the directory created by your git clone command). To push these changes to the remote repository, see the commands below.

git commit -am "change bg color to burnt orange"
git push

Once these changes have been committed, CodePipeline will discover the changes made to your CodeCommit repo and initiate a new pipeline. After the pipeline is successfully completed, follow the same instructions for launching the application from your browser. You’ll see that the color of the index page of the application has changed.


How-To Video

In this video, I walk through the deployment steps described above.

Additional Resources

Here are some additional resources you might find useful:


In this post, you learned how to define and launch a CloudFormation stack that provisions a CodeCommit Git repository in code. Additionally, the example included the automation of a CodePipeline deployment pipeline (including the CodeCommit integration) along with creating and running the deployment on an EC2 instance using CodeDeploy.

Furthermore, I described the prerequisites, architecture, implementation, costs, patterns and deployment steps of the solution.

Sample Code

The code for the examples demonstrated in this post are located at Let us know if you have any comments or questions @stelligent or @paulduvall.

Stelligent is hiring! Do you enjoy working on complex problems like figuring out ways to automate all the things as part of a deployment pipeline? Do you believe in the “one-button everything” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.

Microservices Platform with ECS

Architecting applications with microservices is all the rage with developers right now, but running them at scale with cost efficiency and high availability can be a real challenge. In this post, we will address this challenge by looking at an approach to building microservices with Spring Boot and deploying them with CloudFormation on AWS EC2 Container Service (ECS) and Application Load Balancers (ALB). We will start with describing the steps to build the microservice, then walk through the platform for running the microservices, and finally deploy our microservice on the platform.

Spring Boot was chosen for the microservice development as it is a very popular framework in the Java community for building “stand-alone, production-grade Spring based Applications” quickly and easily. However, since ECS is just running Docker containers, you can substitute your preferred development framework for Spring Boot and the platform described in this post will still be able to run your microservice.

This post builds upon a prior post called Automating ECS: Provisioning in CloudFormation that does an awesome job of explaining how to use ECS. If you are new to ECS, I’d highly recommend you review that before proceeding. This post will expand upon that by using the new Application Load Balancer that provides two huge features to improve the ECS experience:

  • Target Groups: Previously in a “Classic” Elastic Load Balancer (ELB), all targets had to be able to handle all possible types of requests that the ELB received. Now with target groups, you can route different URLs to different target groups, allowing heterogeneous deployments. Specifically, you can have two target groups that handle different URLs (e.g. /bananas and /apples) and use the ALB to route traffic appropriately.
  • Per Target Ports: Previously in an ELB, all targets had to listen on the same port for traffic from the ELB. In ECS, this meant that you had to manage the ports that each container listened on. Additionally, you couldn’t run multiple instances of a given container on a single ECS container instance since they would have different ports. Now, each container can use an ephemeral port (next available assigned by ECS) making port management and scaling up on a single ECS container instance a non-issue.

The infrastructure we create will look like the diagram below. Notice that there is a single shared ECS cluster and a single shared ALB with a target group, EC2 Container Registry (ECR) and ECS Service for each microservice deployed to the platform. This approach enables a cost efficient solution by using a single pool of compute resources for all the services. Additionally, high availability is accomplished via an Auto Scaling Group (ASG) for the ECS container instances that spans multiple Availability Zones (AZ).

Setup Your Development Environment

You will need to install the Spring Boot CLI to get started. The recommended way is to use SDKMAN! for the installation. First install SDKMAN! with:

 $ curl -s "" | bash

Then, install Spring Boot with:

$ sdk install springboot

Alternatively, you could install with Homebrew:

$ brew tap pivotal/tap
$ brew install springboot

Scaffold Your Microservice Project

For this example, we will be creating a microservice to manage bananas. Use the Spring Boot CLI to create a project:

$ spring init --build=gradle --package-name=com.stelligent --dependencies=web,actuator,hateoas -n Banana banana-service

This will create a new subdirectory named banana-service with the skeleton of a microservice in src/main/java/com/stelligent and a build.gradle file.

Develop the Microservice

Development of the microservice is a topic for an entire post of its own, but let’s look at a few important bits. First, the application is defined in BananaApplication:

@SpringBootApplication
public class BananaApplication {

  public static void main(String[] args) {
    args);
  }
}

The @SpringBootApplication annotation marks the location to start component scanning and enables configuration of the context within the class.

Next, we have the controller class, which contains the declaration of the REST routes.

@RestController
@RequestMapping("/bananas")
public class BananaController {

  @RequestMapping(method = RequestMethod.POST)
  public @ResponseBody BananaResource create(@RequestBody Banana banana) {
    // create a banana...
  }

  @RequestMapping(path = "/{id}", method = RequestMethod.GET)
  public @ResponseBody BananaResource retrieve(@PathVariable long id) {
    // get a banana by its id
  }
}

These sample routes handle a POST of JSON banana data to /bananas for creating a new banana, and a GET from /bananas/1234 for retrieving a banana by its id. To view a complete implementation of the controller including support for POST, PUT, GET, PATCH, and DELETE as well as HATEOAS for links between resources, check out

Additionally, to look at how to accomplish unit testing of the services, check out the tests created in using WebMvcTest, MockMvc and Mockito.

Create Microservice Platform

The platform will consist of a separate CloudFormation stack that contains the following resources:

  • VPC – To provide the network infrastructure to launch the ECS container instances into.
  • ECS Cluster – The cluster that the services will be deployed into.
  • Auto Scaling Group – To manage the ECS container instances that contain the compute resources for running the containers.
  • Application Load Balancer – To provide load balancing for the microservices running in containers. Additionally, this provides service discovery for the microservices.


The template is available at platform.template. The AMIs used by the Launch Configuration for the EC2 Container Instances must be the ECS optimized AMIs:

      AMIID: ami-2b3b6041
      AMIID: ami-ac6872cd
      AMIID: ami-03238b70
      AMIID: ami-fb2f1295
      AMIID: ami-43547120
      AMIID: ami-bfe095df
      AMIID: ami-c78f43a4
      AMIID: ami-e1e6f88d

Additionally, the EC2 Container Instances must have the ECS Agent configured to register with the newly created ECS Cluster:

    Type: AWS::AutoScaling::LaunchConfiguration
              command: !Sub |
                echo ECS_CLUSTER=${EcsCluster}  >> /etc/ecs/ecs.config

Next, an Application Load Balancer is created for the later stacks to register with:

    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Subnets:
      - !Ref PublicSubnetAZ1
      - !Ref PublicSubnetAZ2
      - !Ref PublicSubnetAZ3

    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref EcsElb
      DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref EcsElbDefaultTargetGroup
      Port: '80'
      Protocol: HTTP

Finally, we have a Gradle task in our build.gradle for upserting the platform CloudFormation stack, based on a custom task type named StackUpTask defined in buildSrc.

task platformUp(type: StackUpTask) {
    region project.region
    stackName "${project.stackBaseName}-platform"
    template file("ecs-resources/platform.template")
    waitForComplete true
    capabilityIam true
    if(project.hasProperty('keyName')) {
        stackParams['KeyName'] = project.keyName
    }
}

Simply run the following to create/update the platform stack:

$ gradle platformUp

Deploy Microservice

Once the platform stack has been created, there are two additional stacks to create for each microservice. First, there is a repo stack that creates the EC2 Container Registry (ECR) for the microservice. This stack also creates a target group for the microservice and adds the target group to the ALB with a rule for which URL path patterns should be routed to the target group.

The second stack is for the service and creates the ECS task definition based on the version of the docker image that should be run, as well as the ECS service which specifies how many tasks to run and the ALB to associate with.

The reason for the two stacks is that you must have the ECR provisioned before you can push a Docker image to it, and you must have a Docker image in ECR before creating the ECS service. Ideally, you would create the repo stack once, then configure a CodePipeline job to continuously push code changes to ECR as new images and update the service stack to reference the newly pushed image.


The entire repo template is available at repo.template. An important new resource to check out is the ALB Listener Rule, which provides the URL patterns that should be handled by the new target group:

    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
      - Type: forward
        TargetGroupArn: !Ref EcsElbTargetGroup
      Conditions:
      - Field: path-pattern
        Values: ["/bananas"]
      ListenerArn: !Ref EcsElbListenerArn
      Priority: 1

The entire service template is available at service.template, but notice that the ECS Task Definition uses port 0 for HostPort. This allows for ephemeral ports that are assigned by ECS to remove the requirement for us to manage container ports:

    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
      - Name: banana-service
        Cpu: '10'
        Essential: 'true'
        Image: !Ref ImageUrl
        Memory: '300'
        PortMappings:
        - HostPort: 0
          ContainerPort: 8080
      Volumes: []

Next, notice how the ECS Service is created and associated with the newly created Target Group:

    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref EcsCluster
      DesiredCount: 6
      DeploymentConfiguration:
        MaximumPercent: 100
        MinimumHealthyPercent: 0
      LoadBalancers:
      - ContainerName: microservice-exemplar-container
        ContainerPort: '8080'
        TargetGroupArn: !Ref EcsElbTargetGroupArn
      Role: !Ref EcsServiceRole
      TaskDefinition: !Ref MicroserviceTaskDefinition

Finally, we have a Gradle task in our service build.gradle for upserting the repo CloudFormation stack:

task repoUp(type: StackUpTask) {
 region project.region
 stackName "${project.stackBaseName}-repo-${}"
 template file("../ecs-resources/repo.template")
 waitForComplete true
 capabilityIam true

 stackParams['PathPattern'] = '/bananas'
 stackParams['RepoName'] =
}

And then another to upsert the service CloudFormation stack:

task serviceUp(type: StackUpTask) {
 region project.region
 stackName "${project.stackBaseName}-service-${}"
 template file("../ecs-resources/service.template")
 waitForComplete true
 capabilityIam true

 stackParams['ServiceDesiredCount'] = project.serviceDesiredCount
 stackParams['ImageUrl'] = "${project.repoUrl}:${project.revision}"

 mustRunAfter dockerPushImage
}

And finally, a task to coordinate the management of the stacks and the build/push of the image:

task deploy(dependsOn: ['dockerPushImage', 'serviceUp']) {
  description "Upserts the repo stack, pushes a docker image, then upserts the service stack"
}

dockerPushImage.dependsOn repoUp

This then provides a simple command to deploy new or update existing microservices:

$ gradle deploy

Define a similar build.gradle file in other microservices to deploy them to the same platform.

Blue/Green Deployment

When running gradle deploy, the existing service stack is updated to use a new task definition that references a new Docker image in ECR. This CloudFormation update causes ECS to do a rolling replacement of the containers, launching new containers with the new image and killing containers with the old image.

However, if you are looking for a more traditional blue/green deployment, this could be accomplished by creating a new service stack (the green stack) with the new docker image, rather than updating the existing. The new stack would attach to the existing ALB target group at which point you could update the existing service stack (the blue stack) to no longer reference the ALB target group, which would take it out of service without killing the containers.

Next Steps

Stay tuned for future blog posts that build on this platform, accomplishing service discovery in a more decoupled manner through the use of Eureka as a service registry, Ribbon as a service client, and Zuul as an edge router.

Additionally, this solution isn’t complete since there is no Continuous Delivery pipeline defined. Look for an additional post showing how to use CodePipeline to orchestrate the movement of changes to the microservice source code into production.

The code for the examples demonstrated in this post are located at Let us know if you have any comments or questions @stelligent.

Are you interested in building resilient applications in AWS? Stelligent is hiring!

Stelligent Bookclub: “Working Effectively with Legacy Code” by Michael Feathers

If you’re a member of the tech industry, then you’ve probably had to work with legacy code: those ancient systems that just hold everything together. Not every project is greenfield, and “Working Effectively with Legacy Code” by Michael Feathers has a reputation for providing good insight into how we can improve our relationship with these necessary systems and provide a better service to our customers. This blog post will share some of the key takeaways from the book.

What is Legacy Code?

In a world where continuous delivery is on every company’s radar, it’s important to accept that legacy code exists and that you’re going to be required to bring it into the fold. One of the key aspects of a successful continuous delivery model is the feedback loop: your developers and your business build confidence in your ability to make changes through testing. Static analysis, unit testing, integration testing, and performance testing are all the difference between a continuously deployed application and a high-risk, rarely-changed legacy codebase.

To quote the book, legacy code can be defined simply as ‘code without tests’. At its surface, this definition may seem too simple, lacking the nuance that comes with the anxiety of working with legacy code. Underneath, however, it provides immediate insight into an important way we can shed some of the negativity that comes with working on legacy code.

The transformation of untested legacy code into a low-risk well understood codebase is the focus of this book.

Making Changes & Minimizing Risk

“Anyone can open a text editor and spew the most arcane non-sense into it.”

Many changes to software can be reduced to: understand the design, make the change, then poke around to ensure you didn’t break anything. That is not a very effective way to manage the risk associated with code changes. As developers, we all want to follow best practice: write clean, well-structured, heavily tested code. Unfortunately, this Utopian culture is a rare find, and we must learn to live with real-world deliverable code.

Changes are made to code for one of four reasons: new features, bug fixes, design improvements or optimization. Working on any codebase involves risk, but poorly understood software makes it impossible to gauge the risk of any change. This leads to a mentality of risk avoidance, where developers follow the ‘If it ain’t broke, don’t fix it’ mantra. I’m not talking about developers letting obvious problems fester when working on a piece of code. Rather, when a change is being committed to a legacy codebase, developers tend to take the quick and painless approach: ‘let’s just hack it in right here’.

The author suggests three questions you can use to mitigate risk:

  1. What changes will we have to make?
  2. How will we know that we’ve done it correctly?
  3. How will we know that we haven’t broken anything?


It’s a concept of software development that is known by many but understood by few. Any developer will tell you that code should be tested, but for a variety of reasons testing is either never done or abandoned early. This comes from a pattern of poor understanding and laziness, by both managers and developers. It’s as much cultural as technical, if not more.

Tests are a way to detect change and improve the speed at which you receive feedback in your workflow. Software is complicated, and so is testing: have you ever wondered why there are so many types of testing? We’ve mentioned a few already, but try to imagine each type as a way of localizing problems. With each type of testing you focus the feedback on a different layer of your application, testing deeper and deeper, so you can detect changes in functionality at any level.

Take unit tests, for example. We know they’re supposed to run in isolated environments, be fast (really fast!), and provide very localized error information. In a situation where you’re continuously integrating (i.e. making frequent commits), your unit tests should be able to rapidly provide feedback on your changes. You’ll know exactly which change caused the error, and you’ll be able to integrate more quickly.
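As a concrete illustration, consider a minimal unit test. The class and tests below are hypothetical examples of my own, not taken from the book:

```ruby
require 'minitest/autorun'

# A tiny, hypothetical unit under test.
class PriceCalculator
  def total(quantity, unit_price)
    quantity * unit_price
  end
end

# Fast, isolated unit tests: no network, no database, millisecond feedback.
# A failure here points directly at PriceCalculator#total.
class PriceCalculatorTest < Minitest::Test
  def test_total_multiplies_quantity_by_price
    assert_equal 50, PriceCalculator.new.total(5, 10)
  end

  def test_total_is_zero_for_zero_quantity
    assert_equal 0, PriceCalculator.new.total(0, 10)
  end
end
```

Because a suite like this runs in milliseconds on every commit, a red test immediately identifies both the offending commit and the exact method that broke.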

Regression tests come in all shapes and sizes and go by plenty of names: integration tests, smoke tests, acceptance tests, to name a few. The key point is that with each layer of testing you’re moving further away from localized feedback and into the realm of testing for regression. It’s a hard concept, so the author gives an interesting anecdote describing a situation where your unit tests may pass but some higher-level functionality changes unintentionally: the code works correctly, but there are unintended consequences. With each layer of testing you move away from precise error localization and checking code functionality, toward testing whether your program as a whole is working correctly.


Working with legacy code presents a bit of a conundrum: how do you write tests for a codebase without first making changes to it? You cannot, and so the author proposes taking on the technical debt of adding tests at the same time as refactoring the code.

“We want to make functional changes that deliver value while bringing more of the system under test.”

The key to successful refactoring is picking off a chunk of work that you can manage. It’s very easy as a developer to go down a rabbit hole of refactoring and adding tests when you initially only needed to fix a simple bug. Sure, you may succeed in your headlong adventure of refactoring an entire interface. It’ll pass all the tests; then, suddenly, it’s failing in integration and you’re burning countless hours making it worse than before. When you’re working on a system that needs refactoring, try to limit your scope to only the functionality that requires change in the first place. Then, when you succeed and your colleagues are cheering, move on to the next.

“Let’s face it, working with legacy code is surgery and doctors never operate alone.”

The author made a small reference to Extreme Programming (XP) and I feel it deserves more attention. If you’re not already, you should try pair programming when working on difficult problems and especially when refactoring. It’s an excellent way to reduce risk and spread knowledge around.

There’s more

Overall, “Working Effectively with Legacy Code” was a great read. Every experienced developer has something to learn from the techniques illustrated. Additionally, the breadth of topics provides useful patterns for transforming a dreary legacy system into a world where Continuous Delivery is possible. It’s a fun topic that allows you to find enjoyment in refactoring and writing tests.

There’s a ton more information within the text, covering everything from object-oriented design and dependency management to cultural guidance.

Interested in working someplace that gives all employees an impressive book expense budget? We’re hiring.

Refactoring CD Pipelines – Part 2: Metadata driven pipeline

In the last post, we created two basic applications, each with a basic shell script to automate deploying them into AWS. In this post, we will continue refactoring those deploy scripts, getting them set up on a common pipeline. We aim to have the pipeline’s executable code configured through metadata, allowing us to customize the pipeline through configuration. Although we are not using a build server, one could easily be used to orchestrate the pipeline with the framework we create here.

Our previous deploy script focused on deploying the application to AWS. If we look at the scripts from each repository’s /pipeline folder side-by-side we notice that they are almost identical. This seems like a good place to practice some code reuse. Let’s build out a pipeline that can allow us to share common code across the two applications. Once we complete this pipeline we will have common code that is flexible enough to be used across multiple applications.

We start by defining the steps of a pipeline from the existing deploy scripts. By reading the scripts we can identify that we get the code and gather variables, run some tests, create an AWS CloudFormation (CFN) stack, and run a simple test against each deployed application.

Logical grouping of pipeline steps. Note the practical differences between the two pipelines.
Pipeline Step App: blog_refactor_php App: blog_refactor_nodejs
SCM Polling Variables, checkout code… Variables, checkout code…
Static Analysis foodcritic() foodcritic(), jslint()
App Prerequisites AWS Relational Database Service (RDS) creation, Chef runlist/attributes upload Chef runlist/attributes upload
App Deployment AWS Auto Scaling Group (ASG) creation/app deployment ASG creation/app deployment
Acceptance Testing curl endpoint – expect 200 curl endpoint – expect 200

Now that we have the steps laid out we need to decide on a technology to implement this pipeline.

(Rake + Ruby) > Bash

We could continue to use bash for our pipeline code by adding some structure rather than a flat script. Even though extracting the steps we have identified into functions gains us some code reuse, we are still lacking features. By switching to a more advanced language, we gain library support we can leverage to avoid reinventing the wheel. Ruby and Rake seem like a good combination for building the pipeline, since together they fulfill all of these requirements.

Rake is a well-established build tool that leverages the power of Ruby as a dynamic language. Besides defining tasks with prerequisites and supporting parallel task execution, it offers us the ability to define tasks dynamically. Rake is task-oriented, which mirrors our pipeline “steps” idea pretty well. We also get some flexibility out of Rake with the ability to run tasks directly from the command line or integrate them into a CI/CD system. Since Rake is just Ruby anyway, integrating any classes we create into the tasks should be pretty simple as well.
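To make the task model concrete, here is a minimal sketch (with illustrative step names of our own, not the real pipeline’s) of Rake tasks chained through prerequisites:

```ruby
require 'rake'

# Three "pipeline step" tasks, each depending on the previous one.
# The run_order array just records the execution order for demonstration.
run_order = []

Rake::Task.define_task(:checkout)                     { run_order << :checkout }
Rake::Task.define_task(:static_analysis => :checkout) { run_order << :static_analysis }
Rake::Task.define_task(:deploy => :static_analysis)   { run_order << :deploy }

# Invoking the last task pulls in its prerequisites first, in order.
Rake::Task[:deploy].invoke
run_order  # => [:checkout, :static_analysis, :deploy]
```

In a Rakefile you would normally write this with the `task` DSL method; `Rake::Task.define_task` is the underlying call, which is also what lets us define tasks dynamically later.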

To maximize code reuse in an easy, repeatable way you could create a Ruby gem to house common code. This is the approach we took, using metadata to dynamically define Rake tasks and wiring those tasks to reusable classes in a Ruby gem.

Not Your Parent’s Rakefile

Our approach uses Rake primarily as the connective tissue between a hypothetical CI/CD server and the underlying Ruby code that executes the pipeline step logic.

Normally, Rake tasks will be defined along with the code they execute, either in a Rakefile housing multiple tasks or split into separate .rake files. For our sample applications to leverage the pipeline gem we also use a Rakefile, but its job is mostly to read the application’s pipeline metadata and convey it to the gem’s Rakefile.

The gem’s Rakefile iterates over the steps array in the pipeline metadata, defining one Rake task per pipeline step. Each Rake task’s pipeline functionality is delegated to a dynamically-instantiated Ruby class (the ‘worker’ class in the code snippet below) assigned to that step in the metadata.

The @store variable is an instance of a parameter-store class; substitute with any parameter or credentials store in your implementations. Injected into each worker class, the store instance gives the worker access to any outputs from previous Rake tasks as well as the ability to create outputs for downstream Rake tasks.

Figure 1: An application’s pipeline metadata becomes Rake tasks!

The steps are just Ruby classes; your team codes them to match what your pipelines need to do, and the same goes for the store class. Because of this, what we’re showing here is best thought of as a framework to help your team maximize code reuse.

That seems like a sweet piece of tech, but what do we do when one of our applications has pipeline needs that don’t align perfectly with our gem’s worker class capabilities?

Down With Conformity!

For our post, we have refactored both our PHP and NodeJS applications to leverage the pipeline gem. While most of the pipeline gem’s worker classes are sufficient to support each application’s CD pipeline, our framework needed to be flexible enough to extend a worker class as well as to support fully-custom steps.

Figure 2: Worker classes can come from your application pipeline (left) or the pipeline gem (center) in order to support the corresponding Rake task (right).

Extending standard steps via worker-class inheritance

The gem defines a class that performs some static code analysis as part of the CD pipeline, namely running foodcritic against the application’s pipeline cookbook. To support linting for the NodeJS application, we can follow these steps so that our NodeJSApp application provides its own pipeline customization.

First, we create a Ruby class (we called it `ExtendedStaticAnalysis`) in the NodeJS application’s pipeline/lib folder that inherits from the gem’s StaticAnalysis class. This gives us access to the foodcritic tests provided by the base worker class.

Next we add a method to ExtendedStaticAnalysis that performs the jslint analysis.

Finally, we change our application’s pipeline metadata so that it instantiates this new class, instead of the gem’s standard worker, to perform that step. If we then run `rake --tasks` to show the steps our pipeline now supports, we’ll see a `build:commit:extended_static_analysis` task in the list! (see Figure 1)
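In Ruby, the extension boils down to ordinary inheritance. The classes below are simplified stand-ins for the gem’s real workers, which would actually shell out to foodcritic and jslint:

```ruby
# Stand-in for the gem's base worker: runs foodcritic against the cookbook.
class StaticAnalysis
  def execute
    [run_foodcritic]
  end

  private

  def run_foodcritic
    'foodcritic: ok' # placeholder for the real foodcritic invocation
  end
end

# The NodeJS app's pipeline/lib extension: reuse foodcritic, add jslint.
class ExtendedStaticAnalysis < StaticAnalysis
  def execute
    super + [run_jslint]
  end

  private

  def run_jslint
    'jslint: ok' # placeholder for the real jslint invocation
  end
end

ExtendedStaticAnalysis.new.execute  # => ["foodcritic: ok", "jslint: ok"]
```

Because the metadata names the worker class, swapping `StaticAnalysis` for `ExtendedStaticAnalysis` is a one-line configuration change; no gem code is touched.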

Adding custom steps

As with extending step classes, your application’s pipeline can implement its own steps instead of using one of the built-in step implementations. And if there’s a serious mismatch between what your application’s pipeline needs to do and what the gem provides, you can create new worker classes in the gem to support a whole new category of pipeline steps.

That’s a wrap

We now have a gem that can be reused and extended for our application pipelines and more. By encapsulating the common logic into Ruby classes in the gem, we’ve eliminated the repetitious code (and the temptation to copy/paste!) without taking away the flexibility to have custom logic in specific pipeline steps. When we expand our suite of applications, we rely on metadata and a small amount of custom code where necessary. Instead of executing Rake tasks from a wrapper script or by hand (great for step development), you can integrate this with your CI/CD server.

See these GitHub repositories referenced by the article:

Authors: Jeff Dugas and Matt Adams

Interested in building out cutting edge continuous delivery platforms and developing a deep love/hate relationship with Rake? Stelligent is hiring!

One-Button Everything in AWS

There’s an approach we pursue at Stelligent, known informally as “One-Button Everything”. At times it can be an elusive goal, but it’s something we often aim to achieve on behalf of our customers. The idea is that we want to be able to create the complete, functioning software system by clicking a single button. Since all of the work we do for our customers is on the Amazon Web Services (AWS) platform, this often results in a single Launch Stack button (as shown below).


The button is a nice metaphor while also being a literal thing in AWS CloudFormation. It’s also emblematic of the principles of simplicity, comprehensiveness, and consistency (discussed later). For example, while you might be able to do the same with a single command, you would need to take into account the setup and configuration required to run that command, which might reduce simplicity and consistency.
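Under the hood, a Launch Stack button is just a link into the CloudFormation console pointing at a template. A sketch of constructing one in Ruby follows; the bucket, template key, and stack name are hypothetical, and the exact console URL format shown here is an assumption that may vary between console versions:

```ruby
require 'erb'

# Hypothetical inputs: where the template lives and what to call the stack.
region       = 'us-east-1'
stack_name   = 'one-button-demo'
template_url = 'https://s3.amazonaws.com/example-bucket/demo/master.yml'

# Assemble a CloudFormation console deep link; clicking it opens the
# "create stack" wizard pre-populated with the template.
launch_url = "https://console.aws.amazon.com/cloudformation/home?region=#{region}" \
             "#/stacks/new?stackName=#{ERB::Util.url_encode(stack_name)}" \
             "&templateURL=#{ERB::Util.url_encode(template_url)}"
```

Wrap that URL in the familiar yellow Launch Stack image and anyone with an AWS account is one click away from the whole system.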

In this post, I describe the principles and motivation, user base, scope of a complete software system, assumptions and prerequisites, alternative scenarios, and documentation.


These are the three principles that the “one-button everything” mantra is based upon:

  • Comprehensive – The goal is to orchestrate the full solution stack, not partial implementations of the solution stack
  • Consistent – Works the same way every time, and documentation is similar across solution stacks. Once you require “one-off” implementations, the system becomes susceptible to errors
  • Simple – Few steps and dependencies. Make it difficult to make mistakes.

These three principles guide the design of these one-button systems.

The Users are Us

The users of your one-button systems are often other engineers within your organization. A tired argument you might hear is that you don’t need to create simple systems for other engineers because they’re technical too. I could not disagree more with this reasoning. As engineers, we should not be spending time on repetitive, failure-prone activities, nor putting that burden on others at scale. Most engineers should be spending their time providing features to the users who receive value from their work, and this belief serves the needs of the organization poorly.

What is the complete software system?

A common question we get is “what makes up a complete software system?” To us, the complete software system refers to all of the infrastructure and software that composes the system. For example, this includes:

  • Networks (e.g. VPC)
  • Compute (EC2, Containers, Serverless, etc.)
  • Storage (e.g. S3, EBS, etc.)
  • Database and Data (RDS, DynamoDB, etc.)
  • Version control repositories (e.g. CodeCommit)
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
    • Building and deploying application/service code
    • Test execution
    • Static Analysis
    • Security hardening, tests and analysis
    • Notification systems
  • Monitoring systems


Assumptions and Prerequisites

In order to create an effective single-button system, the following patterns are assumed:

  • Everything as Code – The application, configuration, infrastructure, data, tests, and the process to launch the system are all defined in code;
  • Everything is Versioned – All of this code must be versioned in a version-control repository;
  • Everything is Automated – The process for going from zero to working system including the workflow and the “glue code” that puts it all together is defined in code and automated;
  • Client configuration is not assumed – Ideally, you don’t want users to require a certain client-side configuration as it presents room for error and confusion.

As for prerequisites, there might be certain assumptions you document – as long as they are truly one-time only operations. For example, we included the following prerequisites in a demo system we open sourced:

Given a version-control repository, the bootstrapping and the application must be capable of launching from a single CloudFormation command and a CloudFormation button click – assuming that an EC2 Key Pair and Route 53 Hosted Zone have been configured. The demo should not be required to run from a local environment.

So, it assumes the user has created or, in this case, cloned the Git repository, and that they’ve established a valid EC2 Key Pair and a Route 53 Hosted Zone. Given that, users should be able to click the Launch Stack button in their AWS account and launch the complete working system with a running deployment pipeline. In this case, the working system includes a VPC network (and associated network resources), IAM, ENI, DynamoDB, a utility EC2 instance, a Jenkins server, and a running pipeline in CodePipeline.

This pipeline then uses CloudFormation to provision the application infrastructure (e.g. EC2 instances connected to DynamoDB). Then, as part of the pipeline and its integration with Jenkins, it runs tasks that use Chef to configure the environment to run Node.js, and runs automated infrastructure and application tests in RSpec, ServerSpec, Mocha, and Chai. All of this behavior has been provisioned, configured, and orchestrated so that anyone can click one Launch Stack button and go from zero to a fully working system in less than 30 minutes. Once the initial environment is up and running, the application provisioning, configuration, deployment and testing runs in less than 10 minutes. For more on this demo environment, see below.

Decompose system based on lifecycles

While you might create a system that’s capable of recreating itself from a single Launch Stack button, that doesn’t mean you apply all changes from the ground up for every change; building everything from scratch every time erodes fast feedback. So, while you might have a single button to launch the entire solution stack, you won’t be clicking it all that much. This might sound antithetical to everything I’ve said so far, but it’s really about viewing it as a single system whose logical architectural layers can be updated independently based on their change frequency.

…while you might have a single button to launch the entire solution stack, you won’t be clicking it all that much.

For example, because you want quick feedback, you won’t rebuild your deployment pipeline, environment images, or the network if there is only a change to the application/service code. Each of these layers might have its own deployment pipeline that gets triggered when there’s a code change to that layer (which is often less frequent than application code changes). The application deployment pipeline can then consume artifacts generated by the other pipelines.

This approach can take some time to get right, as you still want to rebuild your system whenever there’s a code commit; you just need to be judicious about what gets rebuilt for each change type. As illustrated in Figure 1, here are some examples of the different layers we often decompose our tech stacks into, along with their typical change frequency (your product may vary):

  • Network (VPC) – Once a week
  • Storage – Once a week
  • Routing – Once a week. For example, Route 53 changes.
  • Database – Once a week
  • Deployment Pipeline – Once a week. Apply a CloudFormation stack update.
  • Environment Images – Once per day
  • Application/Service – Many times a day
  • Data – Many times a day
Figure 1 – Stack Lifecycles

As illustrated, while a single solution can be launched from one button, it’s often decomposed into a series of other buttons and commands.


Launching Non-CloudFormation Solutions

As mentioned, one of the nice features of CloudFormation is the ability to provide a single link to launch a stack. While the stack launch is initially driven by CloudFormation, it can also orchestrate any other type of tool you might use via the AWS::CloudFormation::Init resource type. This way you get the benefits of launching from a single Launch Stack button in your AWS account while leveraging integration with many other types of tools.

For example, you might have implemented your solution using Ruby, Docker, Python, etc. In this scenario, let’s imagine you’re using CFNDSL (albeit a little circular, considering you’re using Ruby to generate CloudFormation). Users could still initiate the solution by clicking a single Launch Stack button, which launches a CloudFormation stack. That stack would configure a client that uses Ruby to run CFNDSL, which in turn generates the CloudFormation behavior. By driving this through CloudFormation, you don’t rely on the user to properly configure the client.

Alternatively, you can meet many of the design goals by implementing the same through a single command, but realize there can be a cost in simplicity and consistency as a result.


The documentation provided to users/engineers should be simple to understand and execute, and should make errors difficult to commit. Figure 2 shows an example in which an engineer can view the prerequisites, supported AWS regions, an active architecture illustration, a how-to video, and, finally, a Launch Stack button. You might use this as a sample for the documentation you write on behalf of the users of your AWS infrastructure.

Figure 2 – Launch Stack Documentation


You learned how to create one button (or command) to launch your entire software system from code, the prerequisites involved, what makes up that software system, how you might decompose subsystems based on lifecycles and, finally, a way of documenting the instructions for launching the solution.

Stelligent is hiring! Do you enjoy working on complex problems like figuring out ways to automate all the things as part of a deployment pipeline? Do you believe in the “everything-as-code” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.

Testing Nix Packages in Docker

In this blog post, we will cover developing Nix packages and testing them in Docker. We will set up a container with the proper environment to build Nix packages, and then we will test-build existing packages from the nixpkgs repo. This lays the foundation for using Docker to test your own Nix packages.

First, a quick introduction to Nix. Nix is a purely functional package manager. Nix expressions, a simple functional language, declaratively define each package.  A Nix expression describes everything that goes into a package build action, known as a derivation.  Because it’s a functional language, it’s easy to support building variants of a package: turn the Nix expression into a function and call it any number of times with the appropriate arguments. Due to the hashing scheme, variants don’t conflict with each other in the Nix store.  More details regarding Nix packages and NixOS can be found here.

To begin, we must write the Dockerfile that will be used to build the development environment. We need to declare the base image to be used (CentOS 7), install a few dependencies needed to install Nix, and clone the nixpkgs repo. We also need to set up the nix user and permissions:

FROM centos:latest

MAINTAINER “Fernando J Pando” <nando@********.com>

RUN yum -y install bzip2 perl-Digest-SHA git

RUN adduser nixuser && groupadd nixbld && usermod -aG nixbld nixuser

RUN mkdir -m 0755 /nix && chown nixuser /nix

USER nixuser

WORKDIR /home/nixuser

We clone the nixpkgs github repo (this can be set to any fork/branch for testing):

RUN git clone

We download the nix installer (latest stable 1.11.4):

RUN curl -o

We are now ready to set up the environment that will allow us to build Nix packages in this container. Several environment variables need to be set for this to work, and we can pull them from the environment script provided by the Nix installation (found under ~/.nix-profile/etc/profile.d/). For easy reference, here is that file:

# Set the default profile.
if ! [ -L "$NIX_LINK" ]; then
    echo "creating $NIX_LINK" >&2
    /nix/store/xmlp6pyxi6hi3vazw9821nlhhiap6z63-coreutils-8.24/bin/ln -s "$_NIX_DEF_LINK" "$NIX_LINK"
fi

export PATH=$NIX_LINK/bin:$NIX_LINK/sbin:$PATH

# Subscribe the user to the Nixpkgs channel by default.
if [ ! -e $HOME/.nix-channels ]; then
    echo " nixpkgs" > $HOME/.nix-channels
fi

# Append ~/.nix-defexpr/channels/nixpkgs to $NIX_PATH so that
# <nixpkgs> paths work when the user has fetched the Nixpkgs
# channel.
export NIX_PATH=${NIX_PATH:+$NIX_PATH:}nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs

# Set $SSL_CERT_FILE so that Nixpkgs applications like curl work.
if [ -e /etc/ssl/certs/ca-certificates.crt ]; then # NixOS, Ubuntu, Debian, Gentoo, Arch
    export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
elif [ -e /etc/ssl/certs/ca-bundle.crt ]; then # Old NixOS
    export SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt
elif [ -e /etc/pki/tls/certs/ca-bundle.crt ]; then # Fedora, CentOS
    export SSL_CERT_FILE=/etc/pki/tls/certs/ca-bundle.crt
elif [ -e "$NIX_LINK/etc/ssl/certs/ca-bundle.crt" ]; then # fall back to cacert in Nix profile
    export SSL_CERT_FILE="$NIX_LINK/etc/ssl/certs/ca-bundle.crt"
elif [ -e "$NIX_LINK/etc/ca-bundle.crt" ]; then # old cacert in Nix profile
    export SSL_CERT_FILE="$NIX_LINK/etc/ca-bundle.crt"
fi

Given a CentOS 7 base image, we pull out the relevant environment variables and set them in the Dockerfile:

ENV USER=nixuser

ENV HOME=/home/nixuser

ENV _NIX_DEF_LINK=/nix/var/nix/profiles/default

ENV SSL_CERT_FILE=/etc/pki/tls/certs/ca-bundle.crt

ENV NIX_LINK=$HOME/.nix-profile


ENV NIX_PATH=${NIX_PATH:+$NIX_PATH:}nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs

RUN echo “ nixpkgs” > $HOME/.nix-channels


Now that the environment has been set up, we are ready to install nix:

RUN bash /home/nixuser/

The last step is to set the working directory to the cloned github nixpkgs directory, so testing will execute from there when the container is run:

WORKDIR /home/nixuser/nixpkgs

At this point, we are ready to build the container:

docker build . -t nand0p/nixpkgs-devel

Conversely, you can pull this container image from docker hub:

docker pull nand0p/nixpkgs-devel

(NOTE: The Docker Hub image will contain a version of the nixpkgs repo from the time the container image was built. If you are testing on the bleeding edge, always build the container fresh before testing.)

Once the container is ready, we can now begin test building an existing nix package:

docker run -ti nand0p/nixpkgs-devel nix-build -A nginx

The above command will test-build the nginx package inside your new Docker container, along with all of its dependencies. In order to test the package interactively, to ensure the resulting binary works as expected, the container can be launched with bash:

docker run -ti nand0p/nixpkgs-devel bash
[nixuser@aa2c8e29c5e9 nixpkgs]$

Once you have shell inside the container, you can build and test run the package:

[nixuser@aa2c8e29c5e9 nixpkgs]$ nix-build -A nginx

[nixuser@aa2c8e29c5e9 nixpkgs]$ /nix/store/7axviwfzbsqy50zznfxb7jzfvmg9pmwx-nginx-1.10.1/bin/nginx -V
nginx version: nginx/1.10.1
built by gcc 5.4.0 (GCC)
built with OpenSSL 1.0.2h 3 May 2016
TLS SNI support enabled
configure arguments: --prefix=/nix/store/7axviwfzbsqy50zznfxb7jzfvmg9pmwx-nginx-1.10.1 --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-ipv6 --with-file-aio --add-module=/nix/store/vbfb8z3hgymbaz59wa54qf33yl84jii7-nginx-rtmp-module-v1.1.7-src --add-module=/nix/store/ps5s8q9v91l03972gw0h4l9iazx062km-nginx-dav-ext-module-v0.0.3-src --add-module=/nix/store/46sjsnbkx7rgwgn3pgigcfaygrs2cx30-headers-more-nginx-module-v0.26-src --with-cc-opt='-fPIE -fstack-protector-all --param ssp-buffer-size=4 -O2 -D_FORTIFY_SOURCE=2' --with-ld-opt='-pie -Wl,-z,relro,-z,now'

Thanks for reading, and stay tuned for part II, where we will modify a nix package for our custom needs, and test it against our own fork of the nixpkgs repo.

Refactoring CD Pipelines – Part 1: Chef-Solo in AWS AutoScaling Groups

We often overlook similarities between new CD Pipeline code we’re writing today and code we’ve already written. In addition, we might sometimes rely on the ‘copy, paste, then modify’ approach to make quick work of CD Pipelines supporting similar application architectures. Despite the short-term gains, what often results is code sprawl and maintenance headaches. This blog post series is aimed at helping to reduce that sprawl and improve code re-use across CD Pipelines.

This mini-series references two real but simple applications, hosted on GitHub, representing the kind of organic growth phenomenon we often encounter and sometimes even cause, despite our best intentions. We’ll refactor them each a couple of times over the course of this series, with this post covering CloudFormation template reuse.

Chef’s a great tool… let’s misuse it!

During the deployment of an AWS AutoScaling Group, we typically rely on the user data section of the Launch Configuration to configure new instances in an automated, repeatable way. A common automation practice is to use chef-solo to accomplish this. We carefully chose chef-solo as a great tool for immutable infrastructure approaches. Both applications have CD Pipelines that leverage it as a scale-time configuration tool by reading a JSON document describing the actions and attributes to be applied.

It’s all roses

It’s a great approach: we sprinkle in a handful or two of CloudFormation parameters to support our Launch Configuration, embed the chef-solo JSON in the user data, and decorate it with references to the CloudFormation parameters. Voila, we’re done! The implementation hardly took any time (probably less than an hour per application if you could find good examples on the internet), and each time we need a new CD Pipeline, we can just stamp out a new CloudFormation template.
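In sketch form, the “roses” version looks something like this Launch Configuration fragment, with the chef-solo JSON inlined into the user data (the parameter and recipe names here are illustrative, not taken from the actual repositories):

```json
"LaunchConfig": {
  "Type": "AWS::AutoScaling::LaunchConfiguration",
  "Properties": {
    "UserData": { "Fn::Base64": { "Fn::Join": ["", [
      "#!/bin/bash -ex\n",
      "cat > /etc/chef/node.json <<EOF\n",
      "{ \"my_app\": { \"version\": \"", { "Ref": "AppVersion" }, "\",\n",
      "               \"port\": \"", { "Ref": "AppPort" }, "\" },\n",
      "  \"run_list\": [ \"recipe[my_app]\" ] }\n",
      "EOF\n",
      "chef-solo -j /etc/chef/node.json\n"
    ]]}}
  }
}
```

Every new `{ "Ref": … }` sprinkled into that heredoc needs a matching CloudFormation parameter, which is exactly the coupling the rest of this post unwinds.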

Figure 1: Launch Configuration user data (as plain text)


Figure 2: CloudFormation parameters (corresponding to Figure 1)

Well, it’s mostly roses…

Why is it, then, that a few months and a dozen or so CD Pipelines later, we’re spending all our time debugging and doing maintenance on what should be minor tweaks to our application configurations? New configuration parameters take hours of trial and error, and new application pipelines can be copied and pasted into place, but even then it takes hours to scrape out the previous application’s specific needs from its CloudFormation template and replace them.

Fine, it’s got thorns, and they’re slowing us down

Maybe our great solution could have been better? Let’s start with the major pitfall of our original approach: each application we support has its own highly customized CloudFormation template.

  • lots of application-specific CFN parameters exist solely to shuttle values to the chef-solo JSON
  • fairly convoluted user data, containing an embedded JSON structure and parameter references, is a bear to maintain
  • tracing parameter values from the CD Pipeline, traversing the CFN parameters into the user data… that’ll take some time to debug when it goes awry

One path to code reuse

Since we’re referencing two real GitHub application repositories that demonstrate our current predicament, we’ll continue using those repositories to present our solution via a code branch named Phase1 in each repository. At this point, we know our applications share enough of a common infrastructure approach that they should be sharing that part of the CloudFormation template.

The first part of our solution will be to extract the ‘differences’ from the CloudFormation templates between these two application pipelines. That should leave us with a common skeleton to work with, minus all the Chef-specific items and user data, which will allow us to push the CFN template into an S3 bucket to be shared by both application CD pipelines.

The second part will be to add back the required application specificity, but in a way that migrates those differences from the CloudFormation templates to external artifacts stored in S3.

Taking it apart

Our first tangible goal is to make the user data generic enough to support both applications. We start by moving the inline chef-solo JSON to its own plain JSON document in each application’s pipeline folder (/pipelines/config/app-config.json). Later, we’ll modify our CD pipelines so they can make application and deployment-specific versions of that file and upload it to an S3 bucket.
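As a sketch, the extracted document might look like the following (the attribute names and placeholders are hypothetical; the real files live at /pipelines/config/app-config.json in each repository):

```json
{
  "my_app": {
    "version": "@APP_VERSION@",
    "port": "8080",
    "environment": "@ENVIRONMENT@"
  },
  "run_list": [ "recipe[my_app]" ]
}
```

The placeholders stand in for the values the pipeline will substitute per deployment, replacing the CloudFormation parameter references that used to live in the user data.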

Figure 3: Before/after comparison (diff) of our Launch Configuration User Data

Left: original user data; Right: updated user data

The second goal is to make a single, vanilla CloudFormation template. Removing the user data references orphaned the Chef-only CloudFormation parameters, so we can delete those as well. The resulting template can now focus on meeting the infrastructure concerns of our applications.

Figure 4: Before/after comparison (diff) of the CloudFormation parameters required

Left: original CFN parameters; Right: pared-down parameters


At this point, we have eliminated all the differences between the CloudFormation templates, but now they can’t configure our application! Let’s fix that.

Reassembling it for reuse

Our objective now is to make our Launch Configuration user data truly generic so that we can actually reuse our CloudFormation template across both applications. We do that by scripting it to download the JSON that Chef needs from a specified S3 bucket. At the same time, we enhance the CD Pipelines by scripting them to create application and deploy-specific JSON, and to push that JSON to our S3 bucket.
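The now-generic user data reduces to a fragment roughly like this, where the only application-specific input is a single parameter reference (the bucket name and file paths are illustrative):

```json
"UserData": { "Fn::Base64": { "Fn::Join": ["", [
  "#!/bin/bash -ex\n",
  "aws s3 cp s3://my-pipeline-bucket/", { "Ref": "ChefJsonKey" },
  " /etc/chef/node.json\n",
  "chef-solo -j /etc/chef/node.json\n"
]]}}
```

Because nothing application-specific remains inline, this same fragment serves both applications unchanged.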

Figure 5: Chef JSON stored as a deploy-specific object in S3

The S3 key is unique to the deployment.

To stitch these things together we add back one CloudFormation parameter, ChefJsonKey, required by both CD Pipelines – its value at execution time will be the S3 key where the Chef JSON will be downloaded from. (Since our CD Pipeline has created that file, it’s primed to provide that parameter value when it executes the CloudFormation stack.)
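The pipeline-side half can be sketched as a short script that renders a deploy-specific copy of the Chef JSON and composes the S3 key it will hand to CloudFormation as ChefJsonKey. The application name, deploy ID, bucket, and @APP_VERSION@ placeholder below are all hypothetical:

```shell
#!/bin/bash
set -eu

APP_NAME="hello-world"     # hypothetical application name
DEPLOY_ID="20160830-001"   # unique per pipeline execution

# Stand-in for the pipelines/config/app-config.json checked into the repo;
# in the real pipeline this file already exists.
cat > /tmp/app-config.json <<'EOF'
{ "my_app": { "version": "@APP_VERSION@" }, "run_list": [ "recipe[my_app]" ] }
EOF

# Render a deploy-specific copy by substituting the placeholder
sed "s/@APP_VERSION@/${DEPLOY_ID}/" /tmp/app-config.json \
  > "/tmp/${APP_NAME}-${DEPLOY_ID}.json"

# The S3 key is unique to the deployment; CloudFormation receives it as ChefJsonKey
CHEF_JSON_KEY="${APP_NAME}/${DEPLOY_ID}/app-config.json"
echo "${CHEF_JSON_KEY}"   # → hello-world/20160830-001/app-config.json

# Push the rendered JSON (needs AWS credentials, so it's commented out here)
# aws s3 cp "/tmp/${APP_NAME}-${DEPLOY_ID}.json" "s3://my-pipeline-bucket/${CHEF_JSON_KEY}"
```

Because the pipeline both creates the object and launches the stack, it always knows the exact ChefJsonKey value to pass as a stack parameter.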

Two small details left. First, we give our AutoScaling Group instances the ability to download from that S3 bucket. Now that we’re convinced our CloudFormation template is as generic as it needs to be, we upload it to S3 and have our CD Pipelines reference it as an S3 URL.
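The first detail amounts to a small IAM policy statement on the AutoScaling Group’s instance role, roughly like this (the bucket name is illustrative):

```json
{
  "Effect": "Allow",
  "Action": [ "s3:GetObject" ],
  "Resource": "arn:aws:s3:::my-pipeline-bucket/*"
}
```

Scoping the resource to the pipeline bucket keeps the instances from reading anything else in S3 while still letting the user data fetch its Chef JSON.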

Figure 6: Our S3 bucket structure ‘replaces’ the /pipeline/config folder 

The templates can be maintained in GitHub.

That’s a wrap

We now have a vanilla CloudFormation template that supports both applications. When an AutoScaling group scales up, the new servers download a Chef JSON document from S3 and then execute chef-solo. We were able to eliminate the application-specific templates from both pipelines and still get all the benefits of Chef-based server configuration.

See these GitHub repositories referenced throughout the article:

In Part 2 of this series, we’ll continue our refactoring effort with a focus on the CD Pipeline code itself.

Authors: Jeff Dugas and Matt Adams

Interested in working with and sometimes misusing configuration management tools like Chef, Puppet, and Ansible ? Stelligent is hiring!


Containerized CI Solutions in AWS – Part 1: Jenkins in ECS

In this first post of a series exploring containerized CI solutions, I’m going to be addressing the CI tool with the largest market share in the space: Jenkins. Whether you’re already running Jenkins in a more traditional virtualized or bare metal environment, or if you’re using another CI tool entirely, I hope to show you how and why you might want to run your CI environment using Jenkins in Docker, particularly on Amazon EC2 Container Service (ECS). If I’ve done my job right and all goes well, you should have run a successful Jenkins build on ECS well within a half hour from now!

For more background information on ECS and provisioning ECS resources using CloudFormation, please feel free to check out Stelligent’s two-part blog series on the topic.

An insanely quick background on Jenkins

Jenkins is an open source CI tool written in Java. One of its strengths is the very large collection of plugins available, including one for ECS. The Amazon EC2 Container Service Plugin can launch containers on your ECS cluster that automatically register themselves as Jenkins slaves, execute the appropriate Jenkins job on the container, and then automatically remove the container/build slave afterwards.

But, first…why?

But before diving into the demo, why would you want to run your CI builds in containers? First, containers are portable, which, especially when also utilizing Docker for your development environment, will give you a great deal of confidence that if your application builds in a Dockerized CI environment, it will build successfully locally and vice-versa. Next, even if you’re not using Docker for your development environment, a containerized CI environment will give you the benefit of an immutable build infrastructure where you can be sure that you’re building your application in a new ephemeral environment each time. And last but certainly not least, provisioning containers is very fast compared to virtual machines, which is something that you will notice immediately if you’re used to spinning up VMs/cloud instances for build slaves like with the Amazon EC2 Plugin.

As for running the Jenkins master on ECS, one benefit is fast recovery if the Jenkins EC2 instance goes down. When using EFS for Jenkins state storage and a multi-AZ ECS cluster like in this demo, the Jenkins master will recover very quickly in the event of an EC2 container instance failure or AZ outage.

Okay, let’s get down to business…

Let’s begin: first launch the provided CloudFormation stack by clicking the button below:


You’ll have to enter these parameters:

  • AvailabilityZone1: an AZ that your AWS account has access to
  • AvailabilityZone2: another accessible AZ in the same region as AvailabilityZone1
  • InstanceType: EC2 instance type for ECS container instances (must be at least t2.small for this demo)
  • KeyPair: a key pair that will allow you to SSH into the ECS container instances, if necessary
  • PublicAccessCIDR: a CIDR block that will have access to view the public Jenkins proxy and SSH into container instances (ex: 0.0.0.0/0)
    • NOTE: Jenkins will not automatically be secured by a user and password, so this parameter can be used to secure your Jenkins master by limiting network access to the provided CIDR block.  If you’d like to limit access to Jenkins to only your public IP address, enter “[YOUR_PUBLIC_IP_ADDRESS]/32” here, or if you’d like to allow access to the world (and then possibly secure Jenkins yourself afterwards), enter “0.0.0.0/0”.
It’s almost this easy, but just click Launch Stack once

Okay, the stack is launching—so what’s going on here?

Ohhh, now I get it

In a nutshell, this CloudFormation stack provisions a VPC containing a multi-AZ ECS cluster and a Jenkins ECS service that uses Amazon Elastic File System (Amazon EFS) storage to persist Jenkins data. For ease of use, the stack also contains a basic NGINX reverse proxy that exposes Jenkins via a public endpoint. Jenkins and NGINX each consist of an ECS service, an ECS task definition, and a classic ELB (internal for Jenkins, Internet-facing for the proxy).

In actuality, I think a lot of organizations would choose to keep Jenkins internal, in a private subnet, and rely on a VPN for outside access. To keep things relatively simple, though, this stack only creates public subnets and relies on security groups for network access control.


There are a couple of reasons why running a Jenkins master on ECS is a bit complicated. One is an ECS limitation that allows you to associate only one load balancer with an ECS service, while Jenkins runs as a single Java application that listens for web traffic on one port and for JNLP connections from build slaves on another (defaults are 8080 and 50000, respectively). So when launching the workload in ECS, using an Elastic Load Balancer for service discovery as I’m doing in this example, and provisioning with CloudFormation, you need a Classic Load Balancer that listens on both Jenkins ports (listening on multiple ports is not currently possible with the recently released Application Load Balancer).
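The dual-listener requirement translates to a Classic Load Balancer fragment roughly like this sketch (the ports are the Jenkins defaults; TCP is used so the JNLP traffic passes through untouched):

```json
"Listeners": [
  { "LoadBalancerPort": "8080",  "InstancePort": "8080",  "Protocol": "TCP" },
  { "LoadBalancerPort": "50000", "InstancePort": "50000", "Protocol": "TCP" }
]
```

An Application Load Balancer cannot express this today, which is why the stack sticks with the classic ELB.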

Another complication is that Jenkins stores its state in XML on disk, as opposed to some other CI tools that allow you to use an external database to store state (examples coming later in this blog series). This is why I chose to use EFS in this stack—when requiring persistent data in an ECS container, you must be able to sync Docker volumes between your ECS container instances because a container for your service can run on any container instance in the cluster. EFS provides a valuable solution to this issue by allowing you to mount an NFS file system that is shared amongst all the container instances in your cluster.
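In the task definition, this shows up as a host volume whose source path sits on the EFS mount, so whichever container instance runs the Jenkins task sees the same data. A sketch (the paths and names are illustrative):

```json
"Volumes": [
  { "Name": "jenkins-home",
    "Host": { "SourcePath": "/mnt/efs/jenkins_home" } }
],
"ContainerDefinitions": [
  { "Name": "jenkins",
    "MountPoints": [
      { "SourceVolume": "jenkins-home",
        "ContainerPath": "/var/jenkins_home" } ] }
]
```

Each container instance mounts the EFS file system at /mnt/efs at boot, so the Docker volume is effectively shared across the whole cluster.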

Coffee break!

Depending on how long you took to digest that fancy diagram and my explanation, feel free to grab a cup of coffee; the stack took about 7-8 minutes to complete successfully during my testing. When you see that beautiful CREATE_COMPLETE in the stack status, continue on.

Jenkins configuration

One of the CloudFormation stack outputs is PublicJenkinsURL; navigate to that URL in your browser and you should see the Jenkins home page (at least within a minute, once the instance is in service):


To make things easier, let’s click ENABLE AUTO REFRESH (in the upper-right) right off the bat.

Then click Manage Jenkins > Manage Plugins, navigate to the Available tab, and select these two plugins (you can filter the plugins by each name in the Filter text box):

  • Amazon EC2 Container Service Plugin
  • Git plugin
    • NOTE: there are a number of “Git” plugins, but you’ll want the one that’s just named “Git plugin”

And click Download now and install after restart.

Select the Restart Jenkins when installation is complete and no jobs are running checkbox at the bottom, and Jenkins will restart after the plugins are downloaded.


When Jenkins comes back after restarting, go back to the Jenkins home screen, and navigate to Manage Jenkins > Configure System.

Scroll down to the Cloud section, click Add a new cloud > Amazon EC2 Container Service Cloud, and enter the following configuration (substituting the CloudFormation stack output where indicated):

  • Name: ecs
  • Amazon ECS Credential: – none –  (because we’re using the IAM role of the container instance instead)
  • Amazon ECS Region Name: us-east-1  (or the region you launched your stack in)
  • ECS Cluster: [CloudFormation stack output: JenkinsConfigurationECSCluster]
  • Click Advanced…
  • Alternative Jenkins URL: [CloudFormation stack output: JenkinsConfigurationAlternativeJenkinsURL]
  • Click ECS slave templates > Add
    • Label: jnlp-slave-with-java-build-tools
    • Docker Image: cloudbees/jnlp-slave-with-java-build-tools:latest
    • Filesystem root: /home/jenkins
    • Memory: 512
    • CPU units: 512

And click Save at the bottom.

That should take you back to the Jenkins home page again. Now click New Item, and enter an item name of aws-java-sample, select Freestyle project, and click OK.


Enter the following configuration:

  • Make sure Restrict where this project can be run is selected and set:
    • Label Expression: jnlp-slave-with-java-build-tools
  • Under Source Code Management, select Git and enter:


  • Under Build, click Add build step > Execute shell, and set:
    • Command: mvn package

Click Save.

That’s it for the Jenkins configuration. Now click Build Now on the left side of the screen.


Under Build History, you’re going to see a “pending – waiting for next available executor” message, which will switch to a progress bar when the ECS container starts.  When the progress bar appears (it might take a couple of minutes for the first build while ECS downloads the Docker build slave image, but after this it should only take a few seconds when the image is cached on your ECS container instance), click it and you’ll see the console output for the build:


And Success!

Okay, Maven is downloading a bunch of dependencies…and more dependencies…and more dependencies…and finally building…and do you see that “Finished: SUCCESS”? Congratulations, you just ran a build in an ECS Jenkins build slave container!

Next Steps

One thing that you may have noticed is that we used a Docker image provided by CloudBees (the enterprise backers of Jenkins). For your own projects, you might need to build and use a custom build slave Docker image. You’ll probably want to set up a pipeline for each of these Docker builds (and possibly publish to Amazon ECR), and configure an ECS slave template that uses this custom image. One caveat: Jenkins slaves need to have Java installed, which, depending on your build dependencies, may increase the size of your Docker image somewhat significantly (well, relatively so for a Docker image). For reference, check out the Dockerfile of a bare-bones Jenkins build slave provided by the Jenkins project on Docker Hub.
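A custom slave image might be sketched like this, starting from the Jenkins project’s JNLP slave base image (the Maven dependency is just an example; swap in whatever your builds actually need):

```dockerfile
# Hypothetical custom build slave image
FROM jenkinsci/jnlp-slave:latest

USER root
# Add build dependencies on top of the base (which already includes Java,
# required by all Jenkins slaves)
RUN apt-get update && apt-get install -y --no-install-recommends maven \
    && rm -rf /var/lib/apt/lists/*
USER jenkins
```

Publish the result to a registry such as Amazon ECR, then point a new ECS slave template’s Docker Image field at it.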

Next Next Steps

Pretty cool, right? Well, while it’s the most popular, Jenkins isn’t the only player in the game—stay tuned for a further exploration and comparison of containerized CI solutions on AWS in this blog series!

Interested in Docker, Jenkins, and/or working someplace where your artful use of monkey GIFs will finally be truly appreciated? Stelligent is hiring!

DevOps in AWS Radio: Orchestrating Docker containers with AWS ECS, ECR and CodePipeline (Episode 4)

In this episode, Paul Duvall and Brian Jakovich from Stelligent cover recent DevOps in AWS news and speak about the AWS EC2 Container Service (ECS), AWS EC2 Container Registry (ECR), HashiCorp Consul, AWS CodePipeline, and other tools in providing Docker-based solutions for customers. Here are the show notes:

DevOps in AWS News

Episode Topics

  1. Benefits of using ECS, ECR, Docker, etc.
  2. Components of ECS, ECR and Service Discovery
  3. Orchestrating and automating the deployment pipeline using CloudFormation, CodePipeline, Jenkins, etc. 

Blog Posts

  1. Automating ECS: Provisioning in CloudFormation (Part 1)
  2. Automating ECS: Orchestrating in CodePipeline and CloudFormation (Part 2)

About DevOps in AWS Radio

On DevOps in AWS Radio, we cover topics around applying DevOps principles and practices such as Continuous Delivery in the Amazon Web Services cloud. This is what we do at Stelligent for our customers. We’ll bring listeners into our roundtables and speak with engineers who’ve recently published on our blog and we’ll also be reaching out to the wider DevOps in AWS community to get their thoughts and insights.

The overall vision of this podcast is to describe how listeners can create a one-click (or “no click”) implementation of their software systems and infrastructure in the Amazon Web Services cloud so that teams can deliver software to users whenever there’s a business need to do so. The podcast will delve into the cultural, process, tooling, and organizational changes that can make this possible including:

  • Automation of
    • Networks (e.g. VPC)
    • Compute (EC2, Containers, Serverless, etc.)
    • Storage (e.g. S3, EBS, etc.)
    • Database and Data (RDS, DynamoDB, etc.)
  • Organizational and Team Structures and Practices
  • Team and Organization Communication and Collaboration
  • Cultural Indicators
  • Version control systems and processes
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
  • Application/service Architectures – e.g. Microservices
  • Automation of Build and deployment processes
  • Automation of testing and other verification approaches, tools and systems
  • Automation of security practices and approaches
  • Continuous Feedback systems
  • Many other Topics…