Containerized CI Solutions in AWS – Part 1: Jenkins in ECS

In this first post of a series exploring containerized CI solutions, I’m going to be addressing the CI tool with the largest market share in the space: Jenkins. Whether you’re already running Jenkins in a more traditional virtualized or bare metal environment, or if you’re using another CI tool entirely, I hope to show you how and why you might want to run your CI environment using Jenkins in Docker, particularly on Amazon EC2 Container Service (ECS). If I’ve done my job right and all goes well, you should have run a successful Jenkins build on ECS well within a half hour from now!

For more background information on ECS and provisioning ECS resources using CloudFormation, please feel free to check out Stelligent’s two-part blog series on the topic.

An insanely quick background on Jenkins

Jenkins is an open source CI tool written in Java. One of its strengths is the very large collection of plugins available, including one for ECS. The Amazon EC2 Container Service Plugin can launch containers on your ECS cluster that automatically register themselves as Jenkins slaves, execute the appropriate Jenkins job on the container, and then automatically remove the container/build slave afterwards.

But, first…why?

But before diving into the demo, why would you want to run your CI builds in containers? First, containers are portable, which, especially when also utilizing Docker for your development environment, will give you a great deal of confidence that if your application builds in a Dockerized CI environment, it will build successfully locally and vice-versa. Next, even if you’re not using Docker for your development environment, a containerized CI environment will give you the benefit of an immutable build infrastructure where you can be sure that you’re building your application in a new ephemeral environment each time. And last but certainly not least, provisioning containers is very fast compared to virtual machines, which is something that you will notice immediately if you’re used to spinning up VMs/cloud instances for build slaves like with the Amazon EC2 Plugin.

As for running the Jenkins master on ECS, one benefit is fast recovery if the Jenkins EC2 instance goes down. When using EFS for Jenkins state storage and a multi-AZ ECS cluster like in this demo, the Jenkins master will recover very quickly in the event of an EC2 container instance failure or AZ outage.

Okay, let’s get down to business…

Let’s begin: first launch the provided CloudFormation stack by clicking the button below:


You’ll have to enter these parameters:

  • AvailabilityZone1: an AZ that your AWS account has access to
  • AvailabilityZone2: another accessible AZ in the same region as AvailabilityZone1
  • InstanceType: EC2 instance type for ECS container instances (must be at least t2.small for this demo)
  • KeyPair: a key pair that will allow you to SSH into the ECS container instances, if necessary
  • PublicAccessCIDR: a CIDR block that will have access to view the public Jenkins proxy and to SSH into the container instances
    • NOTE: Jenkins will not automatically be secured with a user and password, so this parameter can be used to secure your Jenkins master by limiting network access to the provided CIDR block. If you’d like to limit access to only your public IP address, enter “[YOUR_PUBLIC_IP_ADDRESS]/32” here; if you’d like to allow access to the world (and then possibly secure Jenkins yourself afterwards), enter “”.
It’s almost this easy, but just click Launch Stack once

Okay, the stack is launching—so what’s going on here?

Ohhh, now I get it

In a nutshell, this CloudFormation stack provisions a VPC containing a multi-AZ ECS cluster, and a Jenkins ECS service that uses Amazon Elastic File System (Amazon EFS) storage to persist Jenkins data. For ease of use, the stack also contains a basic NGINX reverse proxy that allows you to view Jenkins via a public endpoint. Jenkins and NGINX each consist of an ECS service, an ECS task definition, and a Classic ELB (internal for Jenkins, Internet-facing for the proxy).

In actuality, I think that a lot of organizations would choose to keep Jenkins internal in a private subnet and rely on a VPN for outside access to Jenkins. Instead, to keep things relatively simple, this stack only creates public subnets and relies on security groups for network access control.


There are a couple of reasons why running a Jenkins master on ECS is a bit complicated. One is that ECS only allows you to associate a single load balancer with a service, while Jenkins runs as a single Java application that listens for web traffic on one port and for JNLP connections from build slaves on another (8080 and 50000 by default, respectively). When launching a workload in ECS, using an Elastic Load Balancer for service discovery as I’m doing in this example, and provisioning with CloudFormation, you need a Classic Load Balancer that listens on both Jenkins ports (listening on multiple ports is not currently possible with the recently released Application Load Balancer).
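As a rough sketch, the two listeners might look like this on a Classic Load Balancer in CloudFormation (the resource name and scheme here are illustrative, not taken from the demo stack):

```yaml
JenkinsElb:
  Type: AWS::ElasticLoadBalancing::LoadBalancer
  Properties:
    Scheme: internal
    Listeners:
      # Jenkins web UI traffic
      - LoadBalancerPort: '8080'
        InstancePort: '8080'
        Protocol: TCP
      # JNLP connections from build slaves
      - LoadBalancerPort: '50000'
        InstancePort: '50000'
        Protocol: TCP
```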

Another complication is that Jenkins stores its state as XML on disk, as opposed to some other CI tools that can use an external database to store state (examples coming later in this blog series). This is why I chose EFS for this stack: when an ECS container requires persistent data, that data must be available on every ECS container instance, because a container for your service can run on any instance in the cluster. EFS solves this by providing an NFS file system that can be mounted by all of the container instances in your cluster.
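For illustration, assuming each container instance mounts the EFS file system at /mnt/efs at boot, the task definition can map a host path on that mount into the Jenkins container. The paths and names below are assumptions for the sketch, not the stack’s actual values:

```json
{
  "volumes": [
    {
      "name": "jenkins-home",
      "host": { "sourcePath": "/mnt/efs/jenkins_home" }
    }
  ],
  "containerDefinitions": [
    {
      "name": "jenkins",
      "mountPoints": [
        {
          "sourceVolume": "jenkins-home",
          "containerPath": "/var/jenkins_home"
        }
      ]
    }
  ]
}
```

Because the host path lives on the shared NFS mount, whichever instance ECS schedules the Jenkins task onto sees the same Jenkins home directory.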

Coffee break!

Depending on how long you took to digest that fancy diagram and my explanation, feel free to grab a cup of coffee; the stack took about 7-8 minutes to complete successfully during my testing. When you see that beautiful CREATE_COMPLETE in the stack status, continue on.

Jenkins configuration

One of the CloudFormation stack outputs is PublicJenkinsURL; navigate to that URL in your browser and you should see the Jenkins home page (at least within a minute, once the instance is in service):


To make things easier, let’s click ENABLE AUTO REFRESH (in the upper-right) right off the bat.

Then click Manage Jenkins > Manage Plugins, navigate to the Available tab, and select these two plugins (you can filter the plugins by each name in the Filter text box):

  • Amazon EC2 Container Service Plugin
  • Git plugin
    • NOTE: there are a number of “Git” plugins, but you’ll want the one that’s just named “Git plugin”

And click Download now and install after restart.

Select the Restart Jenkins when installation is complete and no jobs are running checkbox at the bottom, and Jenkins will restart after the plugins are downloaded.


When Jenkins comes back after restarting, go back to the Jenkins home screen, and navigate to Manage Jenkins > Configure System.

Scroll down to the Cloud section, click Add a new cloud > Amazon EC2 Container Service Cloud, and enter the following configuration (substituting the CloudFormation stack output where indicated):

  • Name: ecs
  • Amazon ECS Credential: – none –  (because we’re using the IAM role of the container instance instead)
  • Amazon ECS Region Name: us-east-1  (or the region you launched your stack in)
  • ECS Cluster: [CloudFormation stack output: JenkinsConfigurationECSCluster]
  • Click Advanced…
  • Alternative Jenkins URL: [CloudFormation stack output: JenkinsConfigurationAlternativeJenkinsURL]
  • Click ECS slave templates > Add
    • Label: jnlp-slave-with-java-build-tools
    • Docker Image: cloudbees/jnlp-slave-with-java-build-tools:latest
    • Filesystem root: /home/jenkins
    • Memory: 512
    • CPU units: 512

And click Save at the bottom.

That should take you back to the Jenkins home page again. Now click New Item, and enter an item name of aws-java-sample, select Freestyle project, and click OK.


Enter the following configuration:

  • Make sure Restrict where this project can be run is selected and set:
    • Label Expression: jnlp-slave-with-java-build-tools
  • Under Source Code Management, select Git and enter:


  • Under Build, click Add build step > Execute shell, and set:
    • Command: mvn package

Click Save.

That’s it for the Jenkins configuration. Now click Build Now on the left side of the screen.


Under Build History, you’re going to see a “pending – waiting for next available executor” message, which will switch to a progress bar when the ECS container starts.  When the progress bar appears (it might take a couple of minutes for the first build while ECS downloads the Docker build slave image, but after this it should only take a few seconds when the image is cached on your ECS container instance), click it and you’ll see the console output for the build:


And Success!

Okay, Maven is downloading a bunch of dependencies…and more dependencies…and more dependencies…and finally building…and see that “Finished: SUCCESS”? Congratulations, you just ran a build in an ECS Jenkins build slave container!

Next Steps

One thing that you may have noticed is that we used a Docker image provided by CloudBees (the enterprise backers of Jenkins). For your own projects, you might need to build and use a custom build slave Docker image. You’ll probably want to set up a pipeline for each of these Docker builds (and possibly publish to Amazon ECR), and configure an ECS slave template that uses this custom image. One caveat: Jenkins slaves need to have Java installed, which, depending on your build dependencies, may increase the size of your Docker image somewhat significantly (well, relatively so for a Docker image). For reference, check out the Dockerfile of a bare-bones Jenkins build slave provided by the Jenkins project on Docker Hub.

Next Next Steps

Pretty cool, right? Well, while it’s the most popular, Jenkins isn’t the only player in the game—stay tuned for a further exploration and comparison of containerized CI solutions on AWS in this blog series!

Interested in Docker, Jenkins, and/or working someplace where your artful use of monkey GIFs will finally be truly appreciated? Stelligent is hiring!

DevOps in AWS Radio: Orchestrating Docker containers with AWS ECS, ECR and CodePipeline (Episode 4)

In this episode, Paul Duvall and Brian Jakovich from Stelligent cover recent DevOps in AWS news and speak about the AWS EC2 Container Service (ECS), AWS EC2 Container Registry (ECR), HashiCorp Consul, AWS CodePipeline, and other tools in providing Docker-based solutions for customers. Here are the show notes:

DevOps in AWS News

Episode Topics

  1. Benefits of using ECS, ECR, Docker, etc.
  2. Components of ECS, ECR and Service Discovery
  3. Orchestrating and automating the deployment pipeline using CloudFormation, CodePipeline, Jenkins, etc. 

Blog Posts

  1. Automating ECS: Provisioning in CloudFormation (Part 1)
  2. Automating ECS: Orchestrating in CodePipeline and CloudFormation (Part 2)

About DevOps in AWS Radio

On DevOps in AWS Radio, we cover topics around applying DevOps principles and practices such as Continuous Delivery in the Amazon Web Services cloud. This is what we do at Stelligent for our customers. We’ll bring listeners into our roundtables and speak with engineers who’ve recently published on our blog and we’ll also be reaching out to the wider DevOps in AWS community to get their thoughts and insights.

The overall vision of this podcast is to describe how listeners can create a one-click (or “no click”) implementation of their software systems and infrastructure in the Amazon Web Services cloud so that teams can deliver software to users whenever there’s a business need to do so. The podcast will delve into the cultural, process, tooling, and organizational changes that can make this possible including:

  • Automation of
    • Networks (e.g. VPC)
    • Compute (EC2, Containers, Serverless, etc.)
    • Storage (e.g. S3, EBS, etc.)
    • Database and Data (RDS, DynamoDB, etc.)
  • Organizational and Team Structures and Practices
  • Team and Organization Communication and Collaboration
  • Cultural Indicators
  • Version control systems and processes
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
  • Application/service Architectures – e.g. Microservices
  • Automation of Build and deployment processes
  • Automation of testing and other verification approaches, tools and systems
  • Automation of security practices and approaches
  • Continuous Feedback systems
  • Many other Topics…

Introduction to ServerSpec: What is ServerSpec and How do we use it at Stelligent? (Part 1)

In this three-part series, I’ll explain ServerSpec infrastructure testing and how we use it at Stelligent, provide some concrete examples demonstrating how we use it at Stelligent, and discuss how Stelligent has extended Serverspec with custom AWS resources and matchers. We will also walk through writing some new matchers for an AWS resource. If you’re new to Serverspec, don’t worry, I’ll be covering the basics!

What is Serverspec?

Serverspec is an integration testing framework built on top of the Ruby RSpec DSL, with custom resources and matchers that form expectations targeted at infrastructure. Serverspec tests verify the actual state of your infrastructure (bare-metal servers, virtual machines, cloud resources) and answer the question: is it configured correctly? Tests can be driven by many of the popular configuration management tools, like Puppet, Ansible, CFEngine, and Itamae.

Serverspec allows infrastructure code to be written using Test-Driven Development (TDD): you express the state that the infrastructure must provide, then write the infrastructure code that implements those expectations. The biggest benefit is that with a suite of Serverspec expectations in place, developers can refactor infrastructure code with a high degree of confidence that a change does not produce any undesirable side effects.

Anatomy of a Serverspec test

Since Serverspec is an extension on top of RSpec, it shares the same DSL syntax and language features as RSpec. The commonly used features of the DSL are outlined below. For suggestions on writing better specs, see the appropriately named Better Specs project.

describe 'testGroupName' do
  it 'testName' do
    expect(resource).to matcher 'matcher_parameter'
  end
end

Concrete example:

describe 'web site' do
  it 'responds on port 80' do
    expect(port 80).to be_listening 'tcp'
  end
end

Describe blocks can be nested to provide a hierarchical description of behavior, such as:

describe 'DevopsEngineer' do
  describe 'that test infrastructure' do
    it 'can refactor easily' do
    end
  end
end

However, the context block is an alias for describe that provides richer semantics. The above example can be rewritten using context instead of nested describes:

describe 'DevopsEngineer' do
  context 'that test infrastructure' do
    it 'can refactor easily' do
    end
  end
end

There are two syntaxes that can be used for expressing expectations: should and expect. The should form was deprecated with RSpec 3.0 because it can produce unexpected results in some edge cases, which are detailed in this RSpec blog post. Better Specs advises that new projects always use the expect syntax (see expect vs. should). The Serverspec website’s examples use the should syntax as a matter of preference, but Serverspec works fine with expect. Even though the should syntax is more concise and arguably more readable, this post adheres to the expect syntax, since should is deprecated and has the potential for edge-case failures. An example of each is provided because you are likely to see both in practice.

Expect Syntax:

describe 'web site' do
  it 'responds on port 80' do
    expect(port 80).to be_listening
  end
end

Should Syntax:

describe port(80) do
  it { should be_listening }
end

A resource, typically represented as a Ruby class, is the first argument to expect and represents the unit under test, about which the assertions must be true or false. Matchers form the basis of positive and negative expectations/assertions as predicates on a resource. They are implemented as custom classes or methods on the resource and can take parameters, so that matchers can be generalized and configurable. In the case of the port resource, its be_listening matcher can be passed any one of the following parameters: tcp, udp, tcp6, or udp6, like this: be_listening 'tcp'.

Positive and negative cases are expressed by combining matchers with either expect(…).to or expect(…).not_to. Serverspec provides custom resources and matchers for integration testing of infrastructure code, and Stelligent has created some custom resources for AWS resources, called serverspec-aws-resources. The following shows the basic expectation form and a concrete example.

Basic form:

expect(resource).to matcher optional_arguments

Example using port resource:

expect(port 80).to be_listening 'tcp'

Test driven development with Serverspec

At Stelligent, we treat infrastructure as code and that code should be written using the same design principles as traditional application software. In that spirit we employ test driven development (TDD) when writing infrastructure code. TDD helps focus our attention on what the infrastructure code needs to do and sets a clear definition of done. It promotes a lean and agile development process where we only build enough infrastructure code to make the tests pass.

A great example of how Serverspec supports test-driven development is when writing the Chef cookbooks that we use to build AMIs on some projects at Stelligent. Locally, a developer can use Serverspec, Test Kitchen, Vagrant, and a virtualization provider to write the Chef cookbook. A great tutorial on getting set up with these tools and writing your first test is available in Learning Chef: A Guide to Configuration Management and Automation.

Using this toolchain allows the developer to adhere to the four pillars of test-driven infrastructure: writing, running, provisioning, and feedback. These pillars are necessary to accomplish the red-green-refactor technique, the basis of TDD.

First, we make it RED by writing a failing Serverspec test and running kitchen verify. Test Kitchen then takes care of running the targeted platform on the virtualization provider, provisioning it, and providing feedback. At this point we have not written the Chef cookbook to pass the Serverspec test, so Test Kitchen will report a failure.

Then, we make it GREEN by writing a cookbook that minimally satisfies the test requirements. After the node is converged with the Chef cookbook we can run kitchen verify and see that the test passes. We are now safe to refactor with the confidence that even a trivial change will not produce an unintended side-effect.

The same Serverspec tests that were used locally are then run in the build pipeline to verify the actual state of the infrastructure resources in AWS. On every commit, the continuous delivery pipeline executes the Serverspec integration tests against the running provisioned resources and fails the build stage if an error occurs. This automated testing provides short iteration cycles and a high degree of confidence on each commit, supporting an agile methodology for infrastructure development and an evolutionary architecture process.

How we utilize Serverspec at Stelligent

Provisioning of AMIs

Provisioning AMIs in the continuous delivery pipeline is done with a combination of Packer and Chef cookbooks. Packer is a tool used in our continuous delivery pipelines to build AWS machine images from a declarative JSON document that describes the provisioning. The Chef provisioner is declared in the Packer config file along with the Chef run list. Building the AMI is typically done in three stages: build, test, promote. During the build stage, Packer launches an instance from the base AMI and executes each of the provisioners in the Packer config file; in the case of the Chef provisioner, it converges the running EC2 node. When Packer completes successfully, the AMI is stored in AWS and its ID is passed to the test stage.

The test stage launches a new EC2 instance with the AMI ID from the build stage and executes the Serverspec tests against the running instance. If the tests pass, the AMI is promoted and stored in AWS, available for other pipelines to use as an artifact. To promote code reuse and speed up AMI pipelines, we have broken AMI creation into multiple pipelines, each depending on the results of another. It looks something like this: base-ami > hardened-ami > application-ami. Each of these AMI pipelines has its own Serverspec tests that ensure the AMI’s requirements are met. We do the same thing for our Jenkins servers, with additional tests to verify that the instance’s jobs are configured correctly.
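As a rough illustration of the build stage (the AMI ID, run list, and paths below are placeholders, not our actual configuration), a Packer template pairing the amazon-ebs builder with the chef-solo provisioner might look like:

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-XXXXXXXX",
    "instance_type": "t2.small",
    "ssh_username": "ec2-user",
    "ami_name": "application-ami-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "chef-solo",
    "cookbook_paths": ["cookbooks"],
    "run_list": ["recipe[application::default]"]
  }]
}
```

Packer handles launching the instance, running the Chef converge, snapshotting the AMI, and tearing everything down; the resulting AMI ID is what gets handed to the test stage.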

Documenting existing servers

We document existing infrastructure before moving it to AWS. A pattern that we use is to elicit a client specification of the existing infrastructure and then translate it to an executable Serverspec definition. This allows us to test the spec against the existing infrastructure to see if there are any errors in the client provided specification.

For example, we may be told the client has a CentOS instance running a Tomcat web server to serve their web application. The specification is translated into a suite of Serverspec tests and executed against the existing infrastructure, resulting in a failure for the test that Tomcat is installed. After debugging the instance, we find that it is actually running the JBoss application server. We then update the test to match reality and execute it again. After this exploratory process is done, we have an executable specification that we can use to test the existing infrastructure, and also use as a guide when building the new infrastructure in AWS.

In conclusion

Serverspec is an invaluable tool in your toolkit to support test-driven infrastructure. It’s built on top of the widely adopted Ruby RSpec DSL, it can be executed both locally to support development and as part of your continuous delivery pipeline, and it allows you to take a test-driven development approach to writing infrastructure code. The resulting suite of tests forms a specification that, when automated as part of a CD pipeline, provides a high degree of confidence in the infrastructure code.

Deploying Kubernetes on CoreOS to AWS

Linux containers are a relatively new abstraction framework with exciting implications for Continuous Integration and Continuous Delivery patterns. They allow appropriately designed applications to be tested, validated, and deployed in an immutable fashion at much greater speed than with traditional virtual machines. When it comes to production use, however, an orchestration framework is desirable to maintain a minimum number of container workers, load balance between them, schedule jobs, and the like. An extremely popular way of doing this is to use AWS EC2 Container Service (ECS) with the Amazon Linux distribution; however, if you find yourself making the “how do we run containers” decision, it pays to explore other technology stacks as well.

In this demo, we’ll launch a Kubernetes cluster of CoreOS on AWS. Kubernetes is a container orchestration platform that utilizes Docker to provision, run, and monitor containers on Linux. It is developed primarily by Google, and is similar to container orchestration projects they run internally. CoreOS is a lightweight Linux distribution optimized for container hosting. Related projects are Docker Inc’s “Docker Datacenter,” RedHat’s Atomic, and RancherOS.

1) Download the kubernetes package. Releases are available from the project’s GitHub releases page. This demo assumes version 1.1.8.

2) Download the coreos-kubernetes package. Releases are available from the project’s GitHub releases page. This demo assumes version 0.4.1.

3) Extract both. coreos-kubernetes provides a kube-aws binary used for provisioning a Kubernetes cluster in AWS (using CloudFormation), while the kubernetes package is used for its kubectl binary:

tar xzf kubernetes.tar.gz
tar xzf kube-aws-PLATFORM-ARCH.tar.gz   # e.g. kube-aws-darwin-amd64.tar.gz for Macs

4) Set up your AWS credentials and profiles.

5) Generate a KMS key to use for the cluster (change the region from us-west-1 if desired, but you will need to change it everywhere). Make a note of the ARN of the generated key; it will be used in the cluster.yaml later:

aws kms create-key --region us-west-1 --description "kube-aws assets"

6) Generate a sample cluster.yaml. This configuration file is later used to generate the AWS CloudFormation templates and associated resources that launch the cluster:

mkdir my-cluster; cd my-cluster
~/darwin-amd64/kube-aws init --cluster-name=YOURCLUSTERNAME \
 --external-dns-name=FQDNFORCLUSTER --region=us-west-1 \
 --availability-zone=us-west-1c --key-name=VALIDEC2KEYNAME \
 --kms-key-arn=YOURKMSKEYARN

7) Modify the cluster.yaml with appropriate settings. “externalDNSName” wants an FQDN that will either be configured automatically (if you provide a Route53 zone ID for “hostedZoneId”) or that you will configure AFTER provisioning has completed. This becomes the kube controller endpoint used by the Kubernetes control tooling.

Note that a new VPC is created for the Kubernetes cluster unless you configure it to use an existing VPC. You specify a region in the cluster.yaml, and if you don’t specify an Availability Zone then the “A” AZ will be used by default.
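For illustration, a minimal cluster.yaml covering the settings discussed above might look like the following (every value here is a placeholder for your own):

```yaml
clusterName: my-cluster
externalDNSName: kube.example.com
# Optional: let kube-aws manage the DNS record for you
hostedZoneId: ZXXXXXXXXXXXXX
keyName: VALIDEC2KEYNAME
region: us-west-1
availabilityZone: us-west-1c
kmsKeyArn: "arn:aws:kms:us-west-1:123456789012:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```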

8) Render the CloudFormation templates, validate them, then launch the cluster:

~/darwin-amd64/kube-aws render
~/darwin-amd64/kube-aws validate
~/darwin-amd64/kube-aws up


This will set up a short-term Certificate Authority (365 days) and SSL certs (90 days) for communication, and then launch the cluster via CloudFormation. It will also store data about the cluster locally for use with kubectl.

9) After the cluster has come up, an EIP will be output. Assign this EIP to the FQDN you used for externalDNSName in cluster.yaml if you did not allow kube-aws to configure this automatically via Route53. This is important, as it’s how the tools will try to control the cluster.

10) You can then start playing with the cluster. My sample session:

# Display active Kubernetes nodes
~/kubernetes/platforms/darwin/amd64/kubectl --kubeconfig=kubeconfig get nodes
NAME                 STATUS                     AGE
<controller-node>    Ready,SchedulingDisabled   19m
<worker-node-1>      Ready                      19m
<worker-node-2>      Ready                      19m
# Display name and EIP of the cluster
~/darwin-amd64/kube-aws status
Controller IP: a.b.c.d
# Launch the "nginx" Docker image as container instance "my-nginx"
# 2 replicas, wire port 80
~/kubernetes/platforms/darwin/amd64/kubectl --kubeconfig=kubeconfig run my-nginx --image=nginx --replicas=2 --port=80
deployment "my-nginx" created
# Show process list
~/kubernetes/platforms/darwin/amd64/kubectl --kubeconfig=kubeconfig get po
my-nginx-2494149703-2dhrr 1/1 Running 0 2m
my-nginx-2494149703-joqb5 1/1 Running 0 2m
# Expose port 80 on the my-nginx instances via an Elastic Load Balancer
~/kubernetes/platforms/darwin/amd64/kubectl --kubeconfig=kubeconfig expose deployment my-nginx --port=80 --type=LoadBalancer
service "my-nginx" exposed
# Show result for the service
~/kubernetes/platforms/darwin/amd64/kubectl --kubeconfig=kubeconfig get svc my-nginx -o wide
my-nginx 10.x.0.y 80/TCP 3m run=my-nginx
# Describe the my-nginx service. This will show the CNAME of the ELB that
# was created and which exposes port 80
~/kubernetes/platforms/darwin/amd64/kubectl --kubeconfig=kubeconfig describe service my-nginx
Name: my-nginx
Namespace: default
Labels: run=my-nginx
Selector: run=my-nginx
Type: LoadBalancer
IP: 10.x.0.y
LoadBalancer Ingress:
Port: <unset> 80/TCP
NodePort: <unset> 31414/TCP
Endpoints: 10.a.b.c:80,10.d.e.f:80
Session Affinity: None
 FirstSeen LastSeen Count From SubobjectPath Type Reason Message
 --------- -------- ----- ---- ------------- -------- ------ -------
 4m 4m 1 {service-controller } Normal CreatingLoadBalancer Creating load balancer
 4m 4m 1 {service-controller } Normal CreatedLoadBalancer Created load balancer

Thus we have created a three-node Kubernetes cluster (one controller, two workers) running two copies of the nginx container. We then set up an ELB to balance traffic between the instances.

Kubernetes certainly has operational complexity to trade off against features and robustness. There are a lot of moving parts to maintain, and at Stelligent we tend to recommend AWS ECS for situations in which it will suffice.


Stelligent is hiring! Do you enjoy working on complex problems like figuring out ways to automate all the things as part of a deployment pipeline? Do you believe in the “everything-as-code” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.

Stelligent Bookclub: “Building Microservices” by Sam Newman

At Stelligent, we put a strong focus on education and so I wanted to share some books that have been popular within our team. Today we explore the world of microservices with “Building Microservices” by Sam Newman.

Microservices are an approach to distributed systems that promotes the use of small, independent services within a software solution. By adopting microservices, teams can achieve better scaling and gain autonomy that allows them to choose their technologies and iterate independently from other teams.

Microservices are an alternative to the development of a monolithic codebase in many organizations – a codebase that contains your entire application and where new code piles on at alarming rates. Monoliths become difficult to work with as interdependencies within the code begin to develop.

As a result, a change to one part of the system could unintentionally break a different part, which in turn might lead to hard-to-predict outages. This is where Newman’s argument about the benefits of microservices really comes into play.

  • Reasons to split the monolith
    • Increase pace of change
    • Security
    • Smaller team structure
    • Adopt the proper technology for a problem
    • Remove tangled dependencies
    • Remove dependency on databases for integration
    • Less technical debt

By splitting monoliths at their seams, we can slowly transform a monolithic codebase into a group of microservices. Each service is loosely coupled and highly cohesive; as a result, changes within a microservice do not change its function with respect to other parts of the system. Each element works as a black box where only the inputs and outputs matter. When splitting a monolith, databases pose some of the greatest challenges; as a result, Newman devotes a significant chunk of the book to explaining various useful techniques to reduce these dependencies.

Ways to reduce dependencies

  • Clear, well-documented APIs
  • Loose coupling and high cohesion within a microservice
  • Enforce standards on how services can interact with each other

Though Newman’s argument for the adoption of microservices is spot-on, his treatment of continuous delivery and scaling microservices is shallow. For anyone who has a background in CD or has read “Continuous Delivery,” these sections do not deliver. For example, he discusses machine images at great length but only lightly brushes over build pipelines. The issue I ran into with scaling microservices is that Newman suggests each microservice should ideally be put on its own instance, where it exists independently of all other services. Though this would be nice to have, it is highly unlikely in a production environment where cost is a consideration. He does talk about using traditional virtualization, Vagrant, Linux containers, and Docker to host multiple services on a single host, but he remains platform-agnostic and general. As a result, he misses the opportunity to talk about services like Amazon ECS, Kubernetes, or Docker Swarm. Combining these technologies with reserved cloud capacity would be a real-world example that I feel would have added a lot to this section.

Overall, Newman’s presentation of microservices is a comprehensive introduction for IT professionals. Some of the concepts covered are basic, but there are many nuggets of insight that make it worth reading. If you are looking to get a good idea of how microservices work, pick it up. If you’re looking to advance your microservice patterns or suggest some, feel free to comment below!

Interested in working someplace that gives all employees an impressive book expense budget? We’re hiring.

Automating Habitat with AWS CodePipeline

This article outlines a proof-of-concept (POC) for automating Habitat operations from AWS CodePipeline. Habitat is Chef’s new application automation platform that provides a packaging system that results in apps that are “immutable and atomically deployed, with self-organizing peer relationships.”  Habitat is an innovative technology for packaging applications, but a Continuous Delivery pipeline is still required to automate deployments.  For this exercise I’ve opted to build a lightweight pipeline using CodePipeline and Lambda.

An in-depth analysis of how to use Habitat is beyond the scope of this post, but you can get a good introduction by following their tutorial. This POC essentially builds a CD pipeline that automates the steps described in the tutorial, and it builds the same demo app (mytutorialapp). It covers the “pre-artifact” stages of the pipeline (Source, Commit, Acceptance), but keep an eye out for a future post which will flesh out the rest.

Also be sure to read the article “Continuous deployment with Habitat” which provides a good overview of how the developers of Habitat intend it to be used in a pipeline, including links to some repos to help implement that vision using Chef Automate.

Technology Overview


The application we’re automating is called mytutorialapp. It is a simple “hello world” web app that runs on nginx. The application code can be found in the hab-demo repository.


The pipeline is provisioned by a CloudFormation stack and implemented with CodePipeline. The pipeline uses a Lambda function as an action executor. This Lambda function delegates command execution to an EC2 instance via an SSM Run Command (aws:runShellScript). The pipeline code can be found in the hab-demo-pipeline repository. Here is a simplified diagram of the execution mechanics:



The CloudFormation stack that provisions the pipeline also creates several supporting resources.  Check out the pipeline.json template for details, but here is a screenshot to show what’s included:


Pipeline Stages

Here’s an overview of the pipeline structure. For the purpose of this article I’ve only implemented the Source, Commit, and Acceptance stages. This portion of the pipeline will get the source code from a git repo, build a Habitat package, build a Docker test environment, deploy the Habitat package to the test environment, run tests on it and then publish it to the Habitat Depot. All downstream pipeline stages can then source the package from the Depot.

  • Source
    • Clone the app repo
  • Commit
    • Stage-SourceCode
    • Initialize-Habitat
    • Test-StaticAnalysis
    • Build-HabitatPackage
  • Acceptance
    • Create-TestEnvironment
    • Test-HabitatPackage
    • Publish-HabitatPackage

Action Details

Here are the details for the various pipeline actions. These action implementations are defined in a “pipeline-runner” Lambda function and invoked by CodePipeline. Upon invocation, the scripts are executed on an EC2 box that gets provisioned at the same time as the code pipeline.
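For illustration, a Lambda-backed action in the pipeline definition looks roughly like the sketch below. The function name matches the “pipeline-runner” Lambda described above; the UserParameters payload is a hypothetical example of how an action might tell the function which script to execute.

```json
{
  "Name": "Build-HabitatPackage",
  "ActionTypeId": {
    "Category": "Invoke",
    "Owner": "AWS",
    "Provider": "Lambda",
    "Version": "1"
  },
  "Configuration": {
    "FunctionName": "pipeline-runner",
    "UserParameters": "{\"script\": \"build-habitat-package.sh\"}"
  },
  "RunOrder": 4
}
```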

Commit Stage


Pulls down the source code artifact from S3 and unzips it.


Sets Habitat environment variables and generates/uploads a key to access my Origin on the Habitat Depot.


Runs static analysis using bash -n.
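The code for this action did not survive formatting; the gist of the check is a parse-only pass over the shell scripts, sketched below (the helper name and scratch-directory demo are illustrative, not the POC's actual script):

```shell
# Parse-check each shell script without executing it. bash -n exits
# non-zero on a syntax error, which fails the pipeline action.
check_scripts() {
  local script
  for script in "$1"/*.sh; do
    [ -e "$script" ] || continue   # no scripts to check
    bash -n "$script" || return 1
  done
}

# Demo against a scratch directory (the real action runs against the
# unpacked source artifact instead).
workdir=$(mktemp -d)
echo 'echo "hello"' > "$workdir/good.sh"
check_scripts "$workdir" && echo "static analysis passed"
```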


Builds the Habitat package.

Acceptance Stage


Creates a Docker test environment by running a Habitat package export command inside the Habitat Studio.


Runs a Bats test suite which verifies that the webserver is running and the “hello world” page is displayed.


Uploads the Habitat package to the Depot. In a later pipeline stage, a package deployment can be sourced directly from the Depot.

Wrapping up

This post provided an early look at a mechanism for automating Habitat deployments from AWS CodePipeline. There is still a lot of work to be done on this POC project so keep an eye out for later posts that describe the mechanics of the rest of the pipeline.

Do you love Chef and Habitat? Do you love AWS? Do you love automating software development workflows to create CI/CD pipelines? If you answered “Yes!” to any of these questions then you should come work at Stelligent. Check out our Careers page to learn more.


DevOps in AWS Radio: Automating Compliance with AWS Config and Lambda (Episode 3)

In this episode, Paul Duvall and Brian Jakovich from Stelligent cover recent DevOps in AWS news and speak about Automating Compliance using AWS Config, Config Rules and AWS Lambda. Here are the show notes:

About DevOps in AWS Radio

On DevOps in AWS Radio, we cover topics around applying DevOps principles and practices such as Continuous Delivery in the Amazon Web Services cloud. This is what we do at Stelligent for our customers. We’ll bring listeners into our roundtables and speak with engineers who’ve recently published on our blog and we’ll also be reaching out to the wider DevOps in AWS community to get their thoughts and insights.

The overall vision of this podcast is to describe how listeners can create a one-click (or “no click”) implementation of their software systems and infrastructure in the Amazon Web Services cloud so that teams can deliver software to users whenever there’s a business need to do so. The podcast will delve into the cultural, process, tooling, and organizational changes that can make this possible including:

  • Automation of
    • Networks (e.g. VPC)
    • Compute (EC2, Containers, Serverless, etc.)
    • Storage (e.g. S3, EBS, etc.)
    • Database and Data (RDS, DynamoDB, etc.)
  • Organizational and Team Structures and Practices
  • Team and Organization Communication and Collaboration
  • Cultural Indicators
  • Version control systems and processes
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
  • Application/service Architectures – e.g. Microservices
  • Automation of build and deployment processes
  • Automation of testing and other verification approaches, tools and systems
  • Automation of security practices and approaches
  • Continuous Feedback systems
  • Many other Topics…

ChefConf 2016

Highlights and Announcements

ChefConf 2016 was held in Austin, TX last week, and as usual, Stelligent sent a few engineers to take advantage of the learning and networking opportunities. This year’s conference was a 4 day event with a wide variety of workshops, lectures, keynote speeches, and social events. Here is a summary of the highlights and announcements.

Chef Automate

Chef Automate was one of the big announcements at ChefConf 2016. It is billed as “One platform that delivers DevOps workflow, automated compliance, and end-to-end pipeline visibility.” It brings together Chef for infrastructure automation, InSpec for compliance automation, and Habitat for application automation, and delivers a full-stack deployment pipeline along with customizable dashboards for comprehensive visibility. Chef Compliance and Chef Delivery have been phased out as stand-alone products and replaced by Chef Automate as the company’s flagship commercial offering.



Habitat

Habitat was actually announced back in June, but it was a big focus of ChefConf 2016. There were five info sessions and a Habitat Zone for networking with and learning from other community members. Habitat is an open source project that focuses on application automation and provides a packaging system that results in apps that are “immutable and atomically deployed, with self-organizing peer relationships.” Here are the key features listed on the project website:

  • Habitat is unapologetically app-centric. It’s designed with best practices for the modern application in mind.
  • Habitat gives you a packaging format and a runtime supervisor with deployment coordination and service discovery built in.
  • Habitat packages contain everything the app needs to run with no outside dependencies. They are isolated, immutable, and auditable.
  • The Habitat supervisor knows the packaged app’s peer relationships, upgrade strategy, and policies for restart and security. The supervisor is also responsible for runtime configuration and connecting to management services, such as monitoring.

Habitat packages have the following attributes:


Chef Certification

A new Chef Certification program was also announced at the conference. It is a badge-based program where passing an exam for a particular competency earns you a badge. A certification is achieved by earning all the required badges for that learning track. The program is in an early adopter phase and not all badges are available yet. Here’s what those tracks look like right now:

Chef Certified Developer



Chef Certified Windows Developer



Chef Certified Architect



Join Us

Do you love Chef? Do you love AWS? Do you love automating software development workflows to create CI/CD pipelines? If you answered “Yes!” to any of these questions then you should come work at Stelligent. Check out our Careers page to learn more.

Cross-Account Access Control with Amazon STS for DynamoDB

In this post, we’ll be talking about creating cross-account access for DynamoDB. DynamoDB is a NoSQL Database in the cloud provided by Amazon Web Services.

Whether you’re creating a production deployment pipeline that leverages a shared Keystore or deploying an application in multiple accounts with shared resources, you may find yourself wondering how to provide access to your AWS resources from multiple AWS accounts.

Keystore is an open source pipeline secret management tool from Stelligent that is backed by DynamoDB and encrypted with Amazon’s Key Management System. Check it out on Github.

Although we will focus on DynamoDB, the concepts discussed in this post are not necessarily limited to DynamoDB and have many other uses for a variety of AWS services where multi-account access control is desired.

DynamoDB does not provide its own user-based access control; however, it integrates with IAM to offer fine-grained access control. If you’re looking to provide access to DynamoDB from a web app, mobile app, or federated user, check out the AWS documentation to get started with AWS’ Identity and Access Management (IAM).

This post will focus on leveraging IAM Roles and AWS’ Security Token Service (STS) to provide the more advanced access control to our DynamoDB tables.

In our use-case, we needed to provide a second account access to our DynamoDB tables for a Keystore implementation. The goal was to provide this second account access to our secrets without duplicating the data or the storage costs.

The plan is to leverage the features of IAM and STS to provide the access control. This works by creating two roles:

  • The role created on Account A will provide access to DynamoDB and KMS, and allow Account B to assume it.
  • The role created on Account B will provide access to STS’ AssumeRole action against our role in Account A. Any host or user with this role will be able to acquire temporary API credentials from STS for Account A.

For more information on how this works under the hood, check out the AWS Documentation on Cross-Account Access Delegation.

As a security best practice, you’ll want to ensure the access provided is as specific as possible. You should limit access to specific actions, DynamoDB tables and keys in KMS.

When creating resources in your account, it’s always a good idea to use a configuration management tool; for our examples, we will be using CloudFormation to configure and provision the IAM and STS resources.

Step 1: Create a Role in Account A

  • Allow STS to assume it from Account B
  • Attach a policy to allow access to DynamoDB and KMS
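A minimal sketch of such a role in CloudFormation is shown below. The account IDs, table name, and key ID are placeholders; the real template should restrict actions and resources as tightly as possible, per the best practice noted earlier.

```json
"AccountARole": {
  "Type": "AWS::IAM::Role",
  "Properties": {
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": { "AWS": "arn:aws:iam::ACCOUNT_B_ID:root" },
        "Action": "sts:AssumeRole"
      }]
    },
    "Policies": [{
      "PolicyName": "KeystoreAccess",
      "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNT_A_ID:table/keystore"
          },
          {
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": "arn:aws:kms:us-east-1:ACCOUNT_A_ID:key/KEY_ID"
          }
        ]
      }
    }]
  }
}
```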


Use the button above to review the CloudFormation template.

Step 2: Create a Role in Account B

  • Allow STS AssumeRole from Amazon EC2
  • Allow access to only your Account A assumable ARN
    • You’ll need the ARN from Step 1
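A matching sketch for the Account B role, again with placeholder identifiers, trusts EC2 and allows AssumeRole against only the Account A role’s ARN:

```json
"AccountBRole": {
  "Type": "AWS::IAM::Role",
  "Properties": {
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": { "Service": "ec2.amazonaws.com" },
        "Action": "sts:AssumeRole"
      }]
    },
    "Policies": [{
      "PolicyName": "AssumeAccountARole",
      "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [{
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Resource": "arn:aws:iam::ACCOUNT_A_ID:role/AccountARole"
        }]
      }
    }]
  }
}
```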


Use the button above to review the CloudFormation template.

Step 3: Try it out!

  • You can use the AWS CLI to retrieve a temporary set of credentials from STS to use for access to Account A’s DynamoDB!

Our very own Jeff Bachtel has adapted a snippet for acquiring and implementing temporary STS credentials into your shell. Here it is:

#!/bin/bash -e
# Adapted from
# Clear out existing AWS session environment, or the awscli call will fail
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_SECURITY_TOKEN

ROLE_ARN="${1:?Usage: $0 <role-arn> [duration-seconds] [session-name]}"
DURATION="${2:-3600}"
NAME="${3:-$LOGNAME@$(hostname -s)}"

# KST=access*K*ey, *S*ecretkey, session*T*oken
KST=($(aws sts assume-role --role-arn "${ROLE_ARN}" \
                           --role-session-name "${NAME}" \
                           --duration-seconds "${DURATION}" \
                           --query '[Credentials.AccessKeyId,Credentials.SecretAccessKey,Credentials.SessionToken]' \
                           --output text))

echo 'export AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION:-us-east-1}'
echo "export AWS_ACCESS_KEY_ID='${KST[0]}'"
echo "export AWS_SECRET_ACCESS_KEY='${KST[1]}'"
echo "export AWS_SESSION_TOKEN='${KST[2]}'"
echo "export AWS_SECURITY_TOKEN='${KST[2]}'"  # older variable name; some tools still expect it

From an EC2 instance launched with the role created in Step 2, we can use this script to test our cross-account access.

$ eval $(./ arn:aws:iam::123456789:role/AccountARole)

When you’re ready to go back to your own Instance Profile credentials, you can unset the temporary token:
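The snippet that belonged here was lost in formatting; it is simply the inverse of the script's exports:

```shell
# Drop the temporary STS credentials; the AWS CLI/SDK then falls back
# to the instance profile served by the EC2 metadata service.
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_SECURITY_TOKEN
```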


Wrapping Up

As you can see, using the power of IAM and STS to bridge the gap between two accounts and share resources is quite easy and secure. There are many possibilities here, beyond DynamoDB and KMS, that can help you reduce costs and technical debt.

Automate CodePipeline Manual Approvals in CloudFormation

Recently, AWS announced that it added manual approval actions to AWS CodePipeline. In doing so, you can now model your entire software delivery process – whether it’s entirely manual or a hybrid of automated and manual approval actions.


In this post, I describe how you can add manual approvals to an existing pipeline – manually or via CloudFormation – to minimize your CodePipeline costs.


The AWS CodePipeline pricing model is structured to incentivize two things:

  • Frequent Code Commits
  • Long-lived Pipelines

This is because AWS charges $1 per active pipeline per month. Therefore, if you were to treat pipelines as ephemeral resources, you’d likely pay more than the value you’d otherwise consume. While in experimentation mode, you might regularly launch and terminate pipelines as you determine the appropriate stages and actions for an application or service; once the pipeline is established, its rate of change is likely to be much lower.

Since CodePipeline uses compute resources, AWS had to decide whether to incentivize frequent code commits or to treat pipelines ephemerally, as they do with other resources like EC2. Had they chosen to charge by frequency of activity, you could end up paying more for committing more code, which would be a very bad thing since you want developers to be committing code many times a day.


While we tend to prefer an immutable approach for most of our infrastructure, the fact is that different parts of your system change at different frequencies. This is the case with your pipelines: once they have been established, you might occasionally add, edit, or remove stages and actions, but probably not every day.

Our “workaround” is to use CloudFormation’s update capability to modify our pipeline’s stages and actions without incurring the additional $1 that we’d get charged if we were to launch a new active pipeline.

The best way to apply these changes is to make the minimum required changes in the template, so that any errors that do occur are easy to spot.

Manual Approvals

There are many reasons your software delivery workflow might require manual approvals including exploratory testing, visual inspection, change advisory boards, code reviews, etc.

Some other reasons for manual approvals include canary and blue/green deployments – where you might make final deployment decisions once some user or deployment testing is complete.

With manual approvals in CodePipeline, you can now make the approval process a part of a fully automated software delivery process.

Create and Connect to a CodeCommit Repository

Follow these instructions for creating and connecting to an AWS CodeCommit repository: Create and Connect to an AWS CodeCommit Repository. Take note of the repository name, as you’ll be using it as a CloudFormation user parameter later. The default I use in the lab is codecommit-demo, but you can modify this CloudFormation parameter.

Launch a Pipeline

Click the button below to launch a CloudFormation stack that provisions AWS CodePipeline with some default Lambda Invoke actions.

Once the CloudFormation has launched successfully, click on the link next to the PipelineUrl Output from your CloudFormation stack. This launches your pipeline. You should see a pipeline similar to the one in the figure below.


Update a Pipeline

To update your pipeline, click on the Edit button at the top of the pipeline in CodePipeline. Then, click the (+) Stage link between the Staging and Production stages. Enter the name ExploratoryTesting for the stage name, then click the (+) Action link. The add action window displays. Choose the new Approval action category from the drop-down and enter the other required and optional fields, as appropriate. Finally, click the Add action button.


Once you’ve done this, click on the Release change button. Once it goes through the pipeline stages and actions, it transitions to the Exploratory Testing stage where your pipeline should look similar to the figure below.


At this time, if your SNS Topic registered with the pipeline is linked to an email address, you’ll receive an email message that looks similar to the one below.


As you can see, you can click on the link to be brought to the same pipeline where you can approve or reject the “stage”.

Applying Changes in CloudFormation

You can apply the same updates to CodePipeline that you previously performed manually, in code, using CloudFormation update-stack. We recommend keeping each incremental CloudFormation change small and specific to the CodePipeline changes, because limiting your change sets limits the amount of time you spend troubleshooting any problems.

Once you’ve manually added the new manual approval stage and action, you can use your AWS CLI to get the JSON configuration that you can use in your CloudFormation update template. To do this, run the following command substituting {YOURPIPELINENAME} with the name of your pipeline.

aws codepipeline get-pipeline --name {YOURPIPELINENAME} >pipeline.json

You’ll also notice that this command redirects the output to a file, which you can use as a basis for copying and formatting the stage and action configuration in CodePipeline. For example, the difference between the initial pipeline and the updated pipeline is shown in the JSON configuration below.



                  "CustomData":"Approval or Reject this change after running Exploratory Tests"

You can take this code and add it to a new CloudFormation template so that it’s between the Staging and Production stages. Once you’ve done this, go back to your command line and run the update-stack command from your AWS CLI. An example is shown below. You’ll replace the {CFNSTACKNAME} with your stack name. If you want to make additional changes to the new stack, you can download the CloudFormation template and update it to an S3 location you control.

aws cloudformation update-stack --stack-name {CFNSTACKNAME} --template-url --region us-east-1 --capabilities="CAPABILITY_IAM" --parameters ParameterKey=RepositoryBranch,UsePreviousValue=true ParameterKey=RepositoryName,UsePreviousValue=true ParameterKey=S3BucketLambdaFunction,UsePreviousValue=true ParameterKey=SNSTopic,UsePreviousValue=true

By running this command against the initial stack, you’ll see the same updates that you’d manually defined previously. The difference is that it’s defined in code which means you can version, test and deploy changes.

An alternative approach is to apply the changes manually using Update Stack from your CloudFormation stack in the console. You enter the new CloudFormation template as an input, and CloudFormation determines which changes to apply to your infrastructure. You can see a screenshot of the change that CloudFormation will apply below.



By incorporating manual approvals into your software delivery process, you can model its entire workflow in CodePipeline. You also learned how to apply changes to your pipeline using CloudFormation, minimizing your costs while providing a repeatable, reliable update process through code.

Sample Code

The code for the examples demonstrated in this post is located at . Let us know if you have any comments or questions @stelligent or @paulduvall.

Stelligent is hiring! Do you enjoy working on complex problems like figuring out ways to automate all the things as part of a deployment pipeline? Do you believe in the “everything-as-code” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.



My colleagues at Stelligent including Eric Kascic and Casey Lee provided some use cases for manual approvals.