Blog

Big Data at AWS re:Invent 2016

AWS re:Invent 2016 has kicked off for me in the realm of Big Data. It’s a challenging topic and one of great interest to companies around the globe, so it was a no-brainer to be hanging around with folks at The Mirage for the Big Data talks. This blog post is a quick write-up of some interesting topics, announcements and features of the various tools covered today.

Big Data in AWS

The Big Data Mini Con had no announcements for new services. However, Amazon’s ecosystem of Big Data tools is growing rapidly and we got a sweet introduction to what is currently available. Here are some of the more interesting ones:

  • Import/Export Snowball – A nearly indestructible, petabyte-scale means of importing or exporting data into or out of Amazon S3.
  • Kinesis – AWS’s flavor of data streaming and real-time analytics processing.
  • Redshift – A petabyte-scale data warehouse solution as a service.
  • EMR – The Apache Hadoop ecosystem as a service.
  • Data Pipeline – A data orchestration service for inter-AWS and on-premises workflows.
  • S3 – Durable, infinitely scalable, distributed object storage in the cloud.
  • Direct Connect – Up to 10 Gbps of direct connectivity between your VPC and your on-premises network.
  • Machine Learning – A real-time predictive modeling service.
  • QuickSight – A business intelligence, data visualization and analytics tool.

Announcement: Data Transformation with Lambda on Kinesis Streams

A new feature is coming to Kinesis — the ability to transform your streaming data with Lambda. The idea is to have a Lambda function transform your data as it comes in instead of relying on an application running in EC2 to process the stream. It’s another very effective way of controlling costs and reducing the overhead of dealing with scaling yourself. In order to kickstart adoption, Amazon intends to provide a library of templates for common transformation use cases.
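
To make this concrete, here’s a minimal sketch of what such a transformation could look like today with a stream-triggered Lambda function written in Python. The target stream name and the enrichment applied are placeholder assumptions, and this is generic Kinesis-plus-Lambda plumbing rather than the new feature itself, which wasn’t yet available at the time of writing.

import base64
import json

import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    """Decode each incoming Kinesis record, enrich it, and forward it onward."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        payload["processed"] = True  # placeholder transformation

        kinesis.put_record(
            StreamName="transformed-stream",  # hypothetical output stream
            Data=json.dumps(payload),
            PartitionKey=record["kinesis"]["partitionKey"],
        )
    return "OK"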

Amazon EMR

Recently, autoscaling in Amazon Elastic MapReduce (EMR) became generally available. You can now configure your EMR clusters to scale based on metrics in AWS CloudWatch. It’s no dummy either: not only will it perform scaling operations based on your actual processing throughput, but it will also optimize your instance time.

  • The number of metrics available in CloudWatch for EMR clusters is staggering, and it’s this level of integration that makes autoscaling super intelligent. Instead of relying on abstract information about CPU and memory — which can be hit or miss based on your workloads — you can configure scaling events to happen on REAL throughput metrics such as MapSlotsOpen, ReduceSlotsOpen or AppsPending, based on which tools you’re running (see the configuration sketch after this list).
  • Instance time optimization is built into your EMR cluster’s autoscaling. It will automatically give you full utilization of your instance before terminating due to a scale-down event. So when you scale up and purchase an hour of EC2 capacity, you will get the entire hour of extra horsepower before it scales down. This way you get all of the capacity you paid for, versus paying for the full hour and only utilizing a few minutes of it.
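
If you want to experiment with this outside of the console, a hedged boto3 sketch of attaching such a policy might look like the following. The cluster ID, instance group ID, capacity limits, and threshold are placeholders, and you would tune the rule to whichever metric matters for your workload.

import boto3

emr = boto3.client("emr")

# Add one node whenever YARN reports pending applications for two
# consecutive five-minute periods. All IDs and limits below are placeholders.
emr.put_auto_scaling_policy(
    ClusterId="j-XXXXXXXXXXXXX",
    InstanceGroupId="ig-XXXXXXXXXXXXX",
    AutoScalingPolicy={
        "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},
        "Rules": [
            {
                "Name": "ScaleOutOnPendingApps",
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": 1,
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "GREATER_THAN",
                        "EvaluationPeriods": 2,
                        "MetricName": "AppsPending",
                        "Namespace": "AWS/ElasticMapReduce",
                        "Period": 300,
                        "Statistic": "AVERAGE",
                        "Threshold": 0,
                        "Unit": "COUNT",
                    }
                },
            }
        ],
    },
)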

Announcement: Advanced Spot Provisioning & Spot Block support

A feature coming soon to EMR is Advanced Spot Provisioning, an extension to spot instance requests specifically tailored for distributed systems in AWS. This new feature will allow you to configure spot instance requests for a list of instance types, so you can have a range of instance sizes running in your cluster and have spot instances requested differently for your core node fleet and your task node fleet. The provisioning tool will select the optimal instance type and Availability Zone based on the capacity and price you have configured.

In addition to the optimizations of spot provisioning I mentioned above, EMR will also take advantage of Spot Instance Blocks. With traditional spot instances, you can reserve instances with great discounts at the risk of losing that capacity when normal demand increases. With Spot Instance Blocks, you can block off 1 to 6 hours of spot instance capacity. Spot Instance Blocks are priced differently than Spot Instances, but can be a big source of cost reduction in your data processing architecture for larger workloads.
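
For a sense of the underlying mechanics, here is a hedged sketch of requesting a Spot Block directly through the EC2 API with boto3; EMR will presumably manage this on your behalf once the feature lands. The price, AMI, instance type, and key name are placeholder assumptions.

import boto3

ec2 = boto3.client("ec2")

# Request two spot instances held for a fixed two-hour block. Values below
# are placeholders; BlockDurationMinutes accepts 60 to 360 in hourly steps.
response = ec2.request_spot_instances(
    SpotPrice="0.10",
    InstanceCount=2,
    BlockDurationMinutes=120,
    LaunchSpecification={
        "ImageId": "ami-12345678",
        "InstanceType": "m4.large",
        "KeyName": "my-key-pair",
    },
)
print([r["SpotInstanceRequestId"] for r in response["SpotInstanceRequests"]])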

Compute & Storage Decoupling

The final concept with EMR that was really driven home today was the decoupling of your compute and storage resources. In a traditional setup, storage and compute are bound together — meaning when you need more storage you end up with more compute, and vice versa.

With the latest iteration of Amazon EMR (5.2.0, recently released), HBase can now be fully integrated with EMRFS, which uses Amazon S3 for storage. By moving your storage onto an S3-backed solution, you no longer have to scale your cluster to meet storage demands, you get effectively infinite scalability, and you take advantage of S3’s eleven 9s of durability. Traditional HDFS is still installed on EMR so you can take advantage of a distributed local data store as needed.
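
As a rough idea of what launching such a cluster looks like, here is a hedged boto3 sketch. The release label, bucket path, instance settings, and the exact configuration classifications are assumptions on my part, so treat this as a starting point rather than a verified recipe.

import boto3

emr = boto3.client("emr")

# Launch an EMR 5.2.0 cluster with HBase storing its data in S3 via EMRFS.
# The S3 path, instance types, and counts are placeholders.
emr.run_job_flow(
    Name="hbase-on-s3",
    ReleaseLabel="emr-5.2.0",
    Applications=[{"Name": "HBase"}],
    Configurations=[
        {"Classification": "hbase",
         "Properties": {"hbase.emr.storageMode": "s3"}},
        {"Classification": "hbase-site",
         "Properties": {"hbase.rootdir": "s3://my-bucket/hbase"}},
    ],
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)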

Amazon Redshift

Amazon Redshift is an exceptional tool for Data Warehousing and it’s one of my favorite services offered by AWS. If you haven’t already, take some time to dive deep into the documentation and understand the complexities behind maximizing your architecture. This session was an excellent starter into concepts like data processing, distribution keys, sort keys, and compression.

Keys

Optimizing your queries in Redshift, like any other database, is going to be based around your keys. Since this is a lengthy topic, I’ll give you a quick overview of the important keys in Redshift. I recommend watching this session online or reviewing the documentation to fully understand how to architect them.

  • Distribution keys determine how rows are physically distributed and co-located across nodes. Using the distribution key in all your JOINs is a best practice for performance — even though it may seem redundant (see the DDL sketch after this list).
  • Sort Keys are columns you specify for Redshift to optimize queries with. Redshift can skip entire blocks of data by referencing header data internally, thus dramatically improving query performance.
  • Interleaved Sort Keys are designed for very large data sets and provide more performance as the table size increases. You can create interleaved sort keys with up to eight columns.
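
To make those keys concrete, here is a hedged sketch of table DDL that applies a distribution key and a sort key, issued through a standard PostgreSQL driver since Redshift speaks that protocol. The table, columns, and connection details are made up for illustration.

import psycopg2

# Connection details are placeholders.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="example",
)

ddl = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12,2)
)
DISTKEY (customer_id)   -- rows for the same customer land on the same slice
SORTKEY (sale_date);    -- range-restricted scans on date can skip blocks
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)

For very large tables queried on several columns, you could swap the compound sort key above for an interleaved one, e.g. INTERLEAVED SORTKEY (sale_date, customer_id).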

AWS Schema Conversion Tool

The schema conversion tool can be pointed to an existing database to copy its schema into a Redshift cluster and recommend schema changes for compatibility. Currently, it works with Oracle, Netezza, Greenplum, Teradata and more recently Redshift itself.

The schema conversion tool now supports Redshift-to-Redshift conversion as a means of optimizing your existing data structure. By analyzing your existing Redshift cluster, the tool can provide recommendations for distribution keys and sort keys. Since the optimization service only provides recommendations based on existing usage and a very flat view of your data, you’ll want to test extensively before making a hard switchover.

Wrap Up

Today’s talks were enlightening and I eagerly await the new big-data-as-a-service products to be announced in the coming days. If you want more information about these topics or want to hear the case studies yourself, I recommend you watch the sessions when they are released:

  • BDM205 – Big Data Mini Con State of the Union
  • BDM401 – Deep Dive: Amazon EMR Best Practices & Design Patterns
  • BDM402 – Best Practices for Data Warehousing with Amazon Redshift

For those of you at re:Invent, some of these sessions will be repeated and I highly recommend them.

If you see my ugly mug over the next couple of days, be sure to say hello — I’d love to know what you’re doing with Big Data, DevOps or AWS.

Going to AWS re:Invent is one of many excellent perks for Stelligent engineers. Good news! We’re hiring, check out our Careers page.

Stelligent CTO and Co-Founder, Paul Duvall named an AWS Community Hero

Paul Duvall, Stelligent CTO and Co-Founder, has been named an AWS Community Hero. The AWS Community Heroes program is designed to recognize and honor individuals who have had a real impact within the AWS community. Among those are AWS experts who go above and beyond to share their experience and knowledge through social media, blogs, events, user groups, and workshops.

“To be first nominated and ultimately selected as an AWS Community Hero is a great honor,” said Duvall. “I have to add that it is very natural for me to promote AWS, which we have recommended to all who know Stelligent since we first gained exposure to it in 2009, and to which we have exclusively focused our efforts since 2013. AWS is, without a doubt, the best cloud service provider.”

Duvall is constantly exploring how to build solutions in AWS that are in line with continuous delivery principles, and he is currently working on a new book, Enterprise DevOps in AWS, a spiritual successor to his critically acclaimed Continuous Integration released in 2007.

“Paul has established crucial tenets at Stelligent that have shaped everyone here, and we couldn’t be more pleased that AWS has recognized his impact, not just here at Stelligent, but within the AWS community-at-large,” noted Rob Daly, Stelligent CEO and Co-Founder. Two of Stelligent’s core values are “sharing” and “continuous improvement,” and all Stelligent employees are held to those values, with Duvall leading through example. For instance, he shares his AWS expertise and experience through frequent posts on Stelligent’s blog and helps his colleagues craft theirs, as well. Duvall also leads efforts with various AWS product teams, offering insight and R&D effort to explore and contribute feedback regarding various AWS services, both in beta and general availability.

Along with Paul, Stelligent engineers have been responsible for dozens of AWS- and DevOps-related blog posts in the last year alone. “Every post on our blog is dedicated to achieving continuous delivery while adhering to AWS best practices,” added Jonny Sywulak, Stelligent’s Blog Czar and Senior DevOps Automation Engineer. “We obsess over customers, and we dedicate ourselves to applying what we believe are essential practices to achieve the aims of continuous delivery. This acknowledgement of Paul underscores the value of sharing experiences to advance technology and to create awareness about how Stelligent can help.”

More details are available at our press release.

About Stelligent
Stelligent is a technology services company that provides DevOps Automation in the Amazon Web Services (AWS) cloud. We aim for “one-click deployment.” Our reason for being is to help our customers gain the ability to continuously deploy their software, when they want to, and with confidence. We’ve been providing Continuous Delivery solutions in AWS since 2009. Follow @Stelligent on twitter.com/Stelligent. Learn more at http://www.stelligent.com.

Designing Applications for Failure

I recently had the opportunity to attend an AWS bootcamp at their Herndon, VA office, along with a short presentation given by their team on Designing for Failure. It opened my eyes to the reality of application design when dealing with failure or even basic exception handling.

One of the defining characteristics between a good developer and a great one is how they deal with failures. Good developers will handle the obvious cases in their code – checking for unexpected input, catching library exceptions, and sometimes edge cases. But why do we build resilient applications in the first place, and what about the end user?

In this blog post, I’ll share with you the key points that a great developer follows when designing resilient applications.

Why build resilient applications?

[Image: an unhappy end-user experience during an application failure]

There are two main reasons that we design applications for failure. As you can probably guess from the horrifying image above, the first reason is User Experience. It’s no secret that you will have user attrition and lost revenue if you cannot shield your end users from issues outside their control. The second reason is Business Services. All business critical systems require resiliency and the difference between a 99.7% uptime and 99.99% could be hours of lost revenue or interrupted business services.

Given an application handling 1 billion requests per month, 99.7% availability translates to more than 2 hours of downtime, versus just about 4 minutes at 99.99%. Ouch!
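
If you want to sanity-check that claim, the arithmetic is just the allowed downtime fraction applied to a 30-day month:

# Allowed downtime per 30-day month at a given availability level.
def monthly_downtime_minutes(availability):
    minutes_per_month = 30 * 24 * 60
    return (1 - availability) * minutes_per_month

print(monthly_downtime_minutes(0.997))   # ~129.6 minutes, a bit over 2 hours
print(monthly_downtime_minutes(0.9999))  # ~4.3 minutes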

Werner Vogels, the CTO of Amazon Web Services, once said at a previous re:Invent, “Everything fails, all the time.” It’s a devastating reality and it’s something we all must accept. No matter how mathematically improbable, we simply cannot eliminate all failures. It’s how we reduce the impact of those failures that improves the overall resiliency of our applications.

Graceful Degradation

The way we reduce the impact of failure on our users and business is through graceful degradation. Conceptually it’s very simple – we want to continue operating in the event of a failure, in some degraded capacity. Keeping with the premise that applications fail all the time, you’ve probably experienced degraded services without even realizing it – and that is the ultimate goal.

Caching

Caching is the first layer of defense when dealing with a failure. Depending on your application’s reliance on up-to-date, bleeding-edge information, you should consider caching everything. It’s very easy for developers to reject caching because they always want the freshest information for their users. However, when the difference between a happy customer and a sad one is serving data that’s a few minutes old, serve the slightly stale data.

As an example, imagine you have a fairly advanced web application. What can you cache? (A minimal caching sketch follows the list.)

  • Full HTML pages with CloudFront
  • Database records with ElastiCache
  • Page Fragments with tools such as Varnish
  • Remote API calls from your backend with ElastiCache
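
As a trivial illustration of the backend side of this, here is a hedged sketch that memoizes a slow remote call in Redis, the protocol ElastiCache speaks. The cache endpoint, API URL, key name, and TTL are all placeholder assumptions.

import json

import redis
import requests

# The ElastiCache endpoint below is a placeholder.
cache = redis.StrictRedis(host="my-cache.abc123.cache.amazonaws.com", port=6379)

def get_exchange_rates():
    """Return cached data when present; otherwise call the remote API and cache it."""
    cached = cache.get("exchange-rates")
    if cached is not None:
        return json.loads(cached)

    rates = requests.get("https://api.example.com/rates", timeout=2).json()
    cache.setex("exchange-rates", 300, json.dumps(rates))  # a few minutes of staleness is fine
    return rates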

Retry

As applications get more complex, we rely on more external services than ever before. Whether it’s a 3rd party service provider or your microservices architecture at work, failures are common and often transient. A common pattern for dealing with transient failures on these types of requests is to implement retry logic. Using exponential backoff or a Fibonacci sequence, you can retry for some time before eventually throwing an exception. It’s important to fail fast and not trigger rate limiting on your source, so don’t retry indefinitely.
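
A bare-bones version of that retry logic might look like the sketch below; the attempt budget and backoff base are placeholders you would tune per dependency.

import random
import time

def call_with_retries(func, max_attempts=5, base_delay=0.2):
    """Retry a transient-failure-prone call with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # fail fast once the retry budget is exhausted
            # 0.2s, 0.4s, 0.8s, ... plus jitter to avoid a thundering herd
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1))

# Example usage: result = call_with_retries(lambda: flaky_client.fetch())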

Rate Limiting

In the case of denial of service attacks, self-imposed or otherwise, your primary defense is rate limiting based on a context. You can limit the number of requests to your application based on user data, source address or both. By imposing a limit on requests, you can improve your performance during a failure by reducing the actual load and the load imposed by your retry logic. Also consider using exponential backoff or a Fibonacci increase to help mitigate particularly demanding services.

For example, during a peak in capacity that cannot be met immediately, a reduction in load would allow your application’s infrastructure to respond to the demand (think auto scaling) before completely failing.
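
For context, a simple in-process token bucket is one way to impose such a limit per user or source address; the capacity and refill rate below are arbitrary, and a shared store would be needed across multiple servers.

import time

class TokenBucket:
    """Allow up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity=10, rate=5.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Keep one bucket per user or source IP, e.g. buckets[request_ip].allow()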

Fail Fast

When your application is running out of memory, threads or other resources, you can help recovery time by failing fast. You should return an error as soon as the problem is detected. Not only will your users be happier not waiting on your application to respond, but you will also avoid cascading the delay into dependent services.

Static Fallback

Whether you’re rate limiting or simply cannot fail silently, you’ll need something to fall back to. A static fallback is a way to provide at least some response to your end users without leaving them to the wind with erroneous error output or no response at all. It’s always better to return content that makes sense in the context of the user, and you’ve probably seen this before if you’re a frequent user of sites like Reddit or Twitter.

[Screenshot: an example static fallback page]

In the case of our example web application, you can configure Route53 to fallback to HTML pages and assets served from Amazon S3 with very little headache. You could set this up today!
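
Here is a hedged sketch of the DNS side of that setup with boto3: a secondary failover record that aliases to an S3 static-website endpoint, with the primary record and its health check configured separately. The hosted zone IDs, domain, and bucket endpoint are placeholders (each S3 website region has its own alias hosted zone ID).

import boto3

route53 = boto3.client("route53")

# Secondary (failover) record pointing at an S3 static website endpoint.
# Zone IDs, the domain, and the bucket endpoint are placeholders.
route53.change_resource_record_sets(
    HostedZoneId="Z1234567890ABC",
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com.",
                "Type": "A",
                "SetIdentifier": "static-fallback",
                "Failover": "SECONDARY",
                "AliasTarget": {
                    "HostedZoneId": "Z3AQBSTGFYJSTF",  # S3 website zone for the bucket's region
                    "DNSName": "www.example.com.s3-website-us-east-1.amazonaws.com.",
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)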

Fail Silently

When all of your layers of protection have failed to preserve your service, it’s time to fail silently. Failing silently is when you rely on your logging, monitoring and other infrastructure to respond to your errors with the least impact to the end user. It’s better to return a 200 OK with no content and log your errors on the backend than to return a 500 Internal Server Error, a similar HTTP status code, or, worse yet, a nasty stack trace/log dump.

Failing Fast and You

There are two patterns that you can implement to improve your ability to fail fast: Circuit Breaking and Load Shedding. Generally, you want to leverage your monitoring tools such as CloudWatch and your logs to detect failure early and begin mitigating the impact as soon as possible. At Stelligent, we strongly recommend automation in your infrastructure, and these two patterns are automation at its finest.

Circuit Breaking

Circuit breaking is purposefully degrading performance in response to failure events detected in your logging or monitoring system. You can utilize any of the degradation patterns mentioned above inside the circuit. Finally, by implementing health checks in your service, you can restore normal operation as soon as possible.
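
Here is a toy circuit breaker to illustrate the idea. The failure threshold and reset timeout are arbitrary, and in practice the decision to open the circuit would more likely come from your monitoring system than from in-process counters.

import time

class CircuitBreaker:
    """Stop calling a failing dependency for a while, then probe it again."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback()                  # degraded response while open
            self.opened_at = None                  # half-open: let one probe through
        try:
            result = func()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()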

Load Shedding

Load shedding is a method of failing fast that occurs at the networking level. Like circuit breaking, you can rely on monitoring data to reroute traffic from your application to a Static Fallback that you have configured. For example, Route53 has failover support built right in that would allow you to use this pattern right away.

 

Provision a hosted Git repo with AWS CodeCommit using CloudFormation

Recently, AWS announced that you can now automate the provisioning of a hosted Git repository with AWS CodeCommit using CloudFormation. This means that in addition to the console, CLI, and SDK, you can use declarative code to provision a new CodeCommit repository – providing greater flexibility in versioning, testing, and integration.

In this post, I’ll describe how engineers can provision a CodeCommit Git repository in a CloudFormation template. Furthermore, you’ll learn how to automate the provisioning of a deployment pipeline that uses this repository as its Source action to deploy an application using CodeDeploy to an EC2 instance. You’ll see examples, patterns, and a short video that walks you through the process.

Prerequisites

Here are the prerequisites for this solution:

These will be explained in greater detail in the Deployment Steps section.

Architecture and Implementation

In the figure below, you see the architecture for launching a pipeline that deploys software to an EC2 instance from code stored in a CodeCommit repository. You can click on the image to launch the template in CloudFormation Designer.

  • CloudFormation – All of the resource generation for this solution is described in CloudFormation, which is a declarative code language that can be written in JSON or YAML.
  • CodeCommit – With the addition of the AWS::CodeCommit::Repository resource, you can define your CodeCommit Git repositories in CloudFormation.
  • CodeDeploy – CodeDeploy automates the deployment to the EC2 instance that was provisioned by the nested stack.
  • CodePipeline – I’m defining CodePipeline’s stages and actions in CloudFormation code which includes using CodeCommit as a Source action and CodeDeploy for a Deploy action (For more information, see Action Structure Requirements in AWS CodePipeline).
  • EC2 – A nested CloudFormation stack is launched to provision a single EC2 instance on which the CodeDeploy agent is installed. The CloudFormation template called through the nested stack is provided by AWS.
  • IAM – An Identity and Access Management (IAM) Role is provisioned via CloudFormation which defines the resources that the pipeline can access.
  • SNS – A Simple Notification Service (SNS) Topic is provisioned via CloudFormation. The SNS topic is used by the CodeCommit repository for notifications.

CloudFormation Template

In this section, I’ll show code snippets from the CloudFormation template that provisions the entire solution. The focus of my samples is on the CodeCommit resources. There are several other resources defined in this template including EC2, IAM, SNS, CodePipeline, and CodeDeploy. You can find a link to the template at the bottom of this post.

CodeCommit

In the code snippet below, you see that I’m using the AWS::CodeCommit::Repository CloudFormation resource. The repository name is provided as a parameter to the template. I created a trigger to receive notifications when the master branch gets updated, using an SNS Topic as a dependent resource that is created in the same CloudFormation template. This is based on the sample code provided by AWS.

    "MyRepo":{
      "Type":"AWS::CodeCommit::Repository",
      "DependsOn":"MySNSTopic",
      "Properties":{
        "RepositoryName":{
          "Ref":"RepoName"
        },
        "RepositoryDescription":"CodeCommit Repository",
        "Triggers":[
          {
            "Name":"MasterTrigger",
            "CustomData":{
              "Ref":"AWS::StackName"
            },
            "DestinationArn":{
              "Ref":"MySNSTopic"
            },
            "Events":[
              "all"
            ]
          }
        ]
      }
    },

CodePipeline

In this CodePipeline snippet, you see how I’m using the CodeCommit repository resource as an input for the Source action in CodePipeline. In doing this, it polls the CodeCommit repository for any changes. When it discovers changes, it initiates an instance of the deployment pipeline in CodePipeline.

        "Stages":[
          {
            "Name":"Source",
            "Actions":[
              {
                "InputArtifacts":[

                ],
                "Name":"Source",
                "ActionTypeId":{
                  "Category":"Source",
                  "Owner":"AWS",
                  "Version":"1",
                  "Provider":"CodeCommit"
                },
                "OutputArtifacts":[
                  {
                    "Name":"MyApp"
                  }
                ],
                "Configuration":{
                  "BranchName":{
                    "Ref":"RepositoryBranch"
                  },
                  "RepositoryName":{
                    "Ref":"RepoName"
                  }
                },
                "RunOrder":1
              }
            ]
          },

You can see an illustration of this pipeline in the figure below.

[Figure: deployment pipeline with a CodeCommit Source action and a CodeDeploy Deploy action]

Costs

Since costs can vary widely in using certain AWS services and other tools, I’ve provided a cost breakdown and some sample scenarios to give you an idea of what your monthly spend might look like. The AWS Cost Calculator can assist in establishing cost projections.

  • CloudFormation – No additional cost
  • CodeCommit – If you’re using it on a small project with fewer than six users, there’s no additional cost. See AWS CodeCommit Pricing for more information.
  • CodeDeploy – No additional cost
  • CodePipeline – $1 a month per pipeline unless you’re using it as part of the free tier. For more information, see AWS CodePipeline pricing.
  • EC2 – Approximately $15/month if you’re running one t1.micro instance 24/7. See AWS EC2 Pricing for more information.
  • IAM – No additional cost
  • SNS – Considering you probably won’t have over 1 million Amazon SNS requests for this particular solution, there’s no cost. For more information, see AWS SNS Pricing.

So, for this particular sample solution, you’ll spend around $16/month if you run the EC2 instance for an entire month. If you just run it once and terminate it, you’ll spend a little over $1.

Patterns

Here are some patterns to consider when using CodeCommit with CloudFormation.

  • CodeCommit Template – While this solution embeds the CodeCommit creation as part of a single CloudFormation template, it’s unlikely you’ll be updating the CodeCommit repository definition with every application change, so you might create a template that focuses on the CodeCommit creation and run it as part of an infrastructure pipeline that gets updated when new CloudFormation is committed to it.
  • Centralized Repos – Most likely, you’ll want to host your CodeCommit repositories in a single AWS account and use cross-account IAM roles to share access across accounts in your organization. While you can create CodeCommit repos in any AWS account, it’ll likely lead to unnecessary complexity when engineers want to know where the code is located.

The last is more of a conundrum than a pattern. As one of my colleagues posted in Slack:

I’m stuck in a recursive loop…where do I store my CloudFormation template for my CodeCommit repo?

Good question. I don’t have a good answer for that one just yet. Anyone have thoughts on this one? It gets very “meta”.

Deployment Steps

There are three main steps in launching this solution: preparing an AWS account, launching the stack, and testing the deployment. Each is described in more detail in this section.

Step 1. Prepare an AWS Account

  1. If you don’t already have an AWS account, create one at http://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad. Be sure you’ve signed up for the CloudFormation service.
  2. Use the region selector in the navigation bar of the console to choose the Northern Virginia (us-east-1) region.
  3. Create a key pair. To do this, in the navigation pane of the Amazon EC2 console, choose Key Pairs, Create Key Pair, type a name, and then choose Create.

Step 2. Launch the Stack

Click on the Launch Stack button below to launch the CloudFormation stack. Before you launch the stack, review the architecture, configuration, security, and other considerations discussed in this post. To download the template, click here.

Time to deploy: Approximately 7 minutes

The template includes default settings that you can customize by following the instructions in this post.

Create Details

Here’s a listing of the key AWS resources that are created when this stack is launched:

  • IAM – InstanceProfile, Policy, and Role
  • CodeCommit Repository – Hosts the versioned code
  • EC2 instance – with CodeDeploy agent installed
  • CodeDeploy – application and deployment
  • CodePipeline – deployment pipeline with CodeCommit Integration

CLI Example

Alternatively, you can launch the same stack from the command line as shown in the samples below.

Base Command

From an instance that has the AWS CLI installed, you can use the following snippet as a base command prepended to one of two options described in the Parameters section below.

aws cloudformation create-stack --profile {AWS Profile Name} --stack-name {Stack Name} --capabilities CAPABILITY_IAM --template-url "https://s3.amazonaws.com/stelligent-public/cloudformation-templates/github/labs/codecommit/codecommit-cpl-cfn.json"
Parameters

I’ve provided two ways to run the command – from a custom parameters file or from the CLI.

Option 1 – Custom Parameters JSON File

By attaching the command below to the base command, you can pass parameters from a file as shown in the sample below.

--parameters file:///localpath/to/example-parameters-cpl-cfn.json
Option 2 – Pass Parameters on CLI

Another way to launch the stack from the command line is to provide custom parameters populated with parameter values as shown in the sample below.

--parameters ParameterKey=EC2KeyPairName,ParameterValue=stelligent-dev ParameterKey=EmailAddress,ParameterValue=jsmith@example.com ParameterKey=RepoName,ParameterValue=my-cc-repo

Step 3. Test the Deployment

Click on the CodePipelineURL Output in your CloudFormation stack. You’ll see that the pipeline has failed on the Source action. This is because the Source action expects a populated repository and it’s empty. The way to resolve this is to commit the application files to the newly-created CodeCommit repository. First, you’ll need to clone the repository locally. To do this, get the CloneUrlSsh Output from the CloudFormation stack you launched in Step 2. A sample command is shown below. You’ll replace {CloneUrlSsh} with the value from the CloudFormation stack output. For more information on using SSH to interact with CodeCommit, see the Connect to the CodeCommit Repository section at: Create and Connect to an AWS CodeCommit Repository.

git clone {CloneUrlSsh}
cd {localdirectory}

Once you’ve cloned the repository locally, download the sample application files from SampleApp_Linux.zip and place the files directly into your local repository. Do not include the SampleApp_Linux folder. Go to the local directory and type the following to commit and push the new files to the CodeCommit repository:

git add .
git commit -am "add all files from the AWS sample linux codedeploy application"
git push

Once these files have been committed, the pipeline will discover the changes in CodeCommit and run a new pipeline instance, and both stages and their actions should succeed as a result of this change.

Access the Application

Once the CloudFormation stack has successfully completed, go to CodeDeploy and select Deployments. For example, if you’re in the us-east-1 region, the URL might look like: https://console.aws.amazon.com/codedeploy/home?region=us-east-1#/deployments (You can also find this link in the CodeDeployURL Output of the CloudFormation stack you launched). Next, click on the link for the Deployment Id of the deployment you just launched from CloudFormation. Then, click on the link for the Instance Id. From the EC2 instance, copy the Public IP value and paste into your browser and hit enter. You should see a page like the one below.

[Screenshot: the sample application page before the change]

Commit Changes to CodeCommit

Make some visual changes to the index.html (look for background-color) and commit these changes to your CodeCommit repository to see these changes get deployed through your pipeline. You perform these actions from the directory where you cloned the local version of your CodeCommit repo (in the directory created by your git clone command). To push these changes to the remote repository, see the commands below.

git commit -am "change bg color to burnt orange"
git push

Once these changes have been committed, CodePipeline will discover the changes made to your CodeCommit repo and initiate a new pipeline. After the pipeline is successfully completed, follow the same instructions for launching the application from your browser. You’ll see that the color of the index page of the application has changed.

[Screenshot: the sample application page after the change]

How-To Video

In this video, I walk through the deployment steps described above.

Additional Resources

Here are some additional resources you might find useful:

Summary

In this post, you learned how to define and launch a CloudFormation stack that provisions a CodeCommit Git repository in code. Additionally, the example included the automation of a CodePipeline deployment pipeline (which included the CodeCommit integration) along with creating and running the deployment on an EC2 instance using CodeDeploy.

Furthermore, I described the prerequisites, architecture, implementation, costs, patterns and deployment steps of the solution.

Sample Code

The code for the examples demonstrated in this post is located at https://github.com/stelligent/cloudformation_templates/blob/master/labs/codecommit/. Let us know if you have any comments or questions @stelligent or @paulduvall.

Stelligent is hiring! Do you enjoy working on complex problems like figuring out ways to automate all the things as part of a deployment pipeline? Do you believe in the “one-button everything” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.

Microservices Platform with ECS

Architecting applications with microservices is all the rage with developers right now, but running them at scale with cost efficiency and high availability can be a real challenge. In this post, we will address this challenge by looking at an approach to building microservices with Spring Boot and deploying them with CloudFormation on AWS EC2 Container Service (ECS) and Application Load Balancers (ALB). We will start with describing the steps to build the microservice, then walk through the platform for running the microservices, and finally deploy our microservice on the platform.

Spring Boot was chosen for the microservice development as it is a very popular framework in the Java community for building “stand-alone, production-grade Spring based Applications” quickly and easily. However, since ECS is just running Docker containers, you can substitute your preferred development framework for Spring Boot and the platform described in this post will still be able to run your microservice.

This post builds upon a prior post called Automating ECS: Provisioning in CloudFormation that does an awesome job of explaining how to use ECS. If you are new to ECS, I’d highly recommend you review that before proceeding. This post will expand upon that by using the new Application Load Balancer that provides two huge features to improve the ECS experience:

  • Target Groups: Previously, in a “Classic” Elastic Load Balancer (ELB), all targets had to be able to handle all possible types of requests that the ELB received. Now with target groups, you can route different URLs to different target groups, allowing heterogeneous deployments. Specifically, you can have two target groups that handle different URLs (e.g. /bananas and /apples) and use the ALB to route traffic appropriately.
  • Per Target Ports: Previously, in an ELB, all targets had to listen on the same port for traffic from the ELB. In ECS, this meant that you had to manage the ports that each container listened on. Additionally, you couldn’t run multiple instances of a given container on a single ECS container instance since they couldn’t share the same host port. Now, each container can use an ephemeral port (the next available port, assigned by ECS), making port management and scaling up on a single ECS container instance a non-issue.

The infrastructure we create will look like the diagram below. Notice that there is a single shared ECS cluster and a single shared ALB with a target group, EC2 Container Registry (ECR) and ECS Service for each microservice deployed to the platform. This approach enables a cost efficient solution by using a single pool of compute resources for all the services. Additionally, high availability is accomplished via an Auto Scaling Group (ASG) for the ECS container instances that spans multiple Availability Zones (AZ).

[Diagram: shared ECS cluster and ALB, with a target group, ECR repository, and ECS service per microservice]
Setup Your Development Environment

You will need to install the Spring Boot CLI to get started. The recommended way is to use SDKMAN! for the installation. First install SDKMAN! with:

 $ curl -s "https://get.sdkman.io" | bash

Then, install Spring Boot with:

$ sdk install springboot

Alternatively, you could install with Homebrew:

$ brew tap pivotal/tap
$ brew install springboot

Scaffold Your Microservice Project

For this example, we will be creating a microservice to manage bananas. Use the Spring Boot CLI to create a project:

$ spring init --build=gradle --package-name=com.stelligent --dependencies=web,actuator,hateoas -n Banana banana-service

This will create a new subdirectory named banana-service with the skeleton of a microservice in src/main/java/com/stelligent and a build.gradle file.

Develop the Microservice

Development of the microservice is a topic for an entire post of its own, but let’s look at a few important bits. First, the application is defined in BananaApplication:

@SpringBootApplication
public class BananaApplication {

  public static void main(String[] args) {
    SpringApplication.run(BananaApplication.class, args);
  }
}

The @SpringBootApplication annotation marks the location to start component scanning and enables configuration of the context within the class.

Next, we have the controller class, which contains the declaration of the REST routes.

@RequestMapping("/bananas")
@RestController
public class BananaController {

  @RequestMapping(method = RequestMethod.POST)
  public @ResponseBody BananaResource create(@RequestBody Banana banana)
  {
    // create a banana...
  }

  @RequestMapping(path = "/{id}", method = RequestMethod.GET)
  public @ResponseBody BananaResource retrieve(@PathVariable long id)
  {
    // get a banana by its id
  }

}

These sample routes handle a POST of JSON banana data to /bananas for creating a new banana, and a GET from /bananas/1234 for retrieving a banana by its id. To view a complete implementation of the controller including support for POST, PUT, GET, PATCH, and DELETE as well as HATEOAS for links between resources, check out BananaController.java.

Additionally, to look at how to accomplish unit testing of the services, check out the tests created in BananaControllerTest.java using WebMvcTest, MockMvc and Mockito.

Create Microservice Platform

The platform will consist of a separate CloudFormation stack that contains the following resources:

  • VPC – To provide the network infrastructure to launch the ECS container instances into.
  • ECS Cluster – The cluster that the services will be deployed into.
  • Auto Scaling Group – To manage the ECS container instances that contain the compute resources for running the containers.
  • Application Load Balancer – To provide load balancing for the microservices running in containers. Additionally, this provides service discovery for the microservices.

[Diagram: platform stack with VPC, ECS cluster, Auto Scaling Group, and Application Load Balancer]

The template is available at platform.template. The AMIs used by the Launch Configuration for the EC2 Container Instances must be the ECS optimized AMIs:

Mappings:
  AWSRegionToAMI:
    us-east-1:
      AMIID: ami-2b3b6041
    us-west-2:
      AMIID: ami-ac6872cd
    eu-west-1:
      AMIID: ami-03238b70
    ap-northeast-1:
      AMIID: ami-fb2f1295
    ap-southeast-2:
      AMIID: ami-43547120
    us-west-1:
      AMIID: ami-bfe095df
    ap-southeast-1:
      AMIID: ami-c78f43a4
    eu-central-1:
      AMIID: ami-e1e6f88d

Additionally, the EC2 Container Instances must have the ECS Agent configured to register with the newly created ECS Cluster:

  ContainerInstances:
    Type: AWS::AutoScaling::LaunchConfiguration
    Metadata:
      AWS::CloudFormation::Init:
        config:
          commands:
            01_add_instance_to_cluster:
              command: !Sub |
                #!/bin/bash
                echo ECS_CLUSTER=${EcsCluster}  >> /etc/ecs/ecs.config

Next, an Application Load Balancer is created for the later stacks to register with:

 EcsElb:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Subnets:
      - !Ref PublicSubnetAZ1
      - !Ref PublicSubnetAZ2
      - !Ref PublicSubnetAZ3
  EcsElbListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref EcsElb
      DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref EcsElbDefaultTargetGroup
      Port: '80'
      Protocol: HTTP

Finally we have a Gradle task in our build.gradle for upserting the platform CloudFormation stack based on a custom task named StackUpTask defined in buildSrc.

task platformUp(type: StackUpTask) {
    region project.region
    stackName "${project.stackBaseName}-platform"
    template file("ecs-resources/platform.template")
    waitForComplete true
    capabilityIam true
    if(project.hasProperty('keyName')) {
        stackParams['KeyName'] = project.keyName
    }
}

Simply run the following to create/update the platform stack:

$ gradle platformUp

Deploy Microservice

Once the platform stack has been created, there are two additional stacks to create for each microservice. First, there is a repo stack that creates the EC2 Container Registry (ECR) for the microservice. This stack also creates a target group for the microservice and adds the target group to the ALB with a rule for which URL path patterns should be routed to the target group.

The second stack is for the service and creates the ECS task definition based on the version of the docker image that should be run, as well as the ECS service which specifies how many tasks to run and the ALB to associate with.

The reason for the two stacks is that you must have the ECR provisioned before you can push a Docker image to it, and you must have a Docker image in the ECR before creating the ECS service. Ideally, you would create the repo stack once, then configure a CodePipeline job to continuously push code changes to ECR as new images and update the service stack to reference the newly pushed image.

[Diagram: per-microservice repo and service stacks]

The entire repo template is available at repo.template. An important new resource to check out is the ALB Listener Rule, which provides the URL patterns that should be handled by the new target group that is created:

EcsElbListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
      - Type: forward
        TargetGroupArn: !Ref EcsElbTargetGroup
      Conditions:
      - Field: path-pattern
        Values: ["/bananas"]
      ListenerArn: !Ref EcsElbListenerArn
      Priority: 1

The entire service template is available at service.template, but notice that the ECS Task Definition uses port 0 for HostPort. This allows for ephemeral ports that are assigned by ECS to remove the requirement for us to manage container ports:

 MicroserviceTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
      - Name: banana-service
        Cpu: '10'
        Essential: 'true'
        Image: !Ref ImageUrl
        Memory: '300'
        PortMappings:
        - HostPort: 0
          ContainerPort: 8080
      Volumes: []

Next, notice how the ECS Service is created and associated with the newly created Target Group:

 EcsService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref EcsCluster
      DesiredCount: 6
      DeploymentConfiguration:
        MaximumPercent: 100
        MinimumHealthyPercent: 0
      LoadBalancers:
      - ContainerName: microservice-exemplar-container
        ContainerPort: '8080'
        TargetGroupArn: !Ref EcsElbTargetGroupArn
      Role: !Ref EcsServiceRole
      TaskDefinition: !Ref MicroserviceTaskDefinition

Finally, we have a Gradle task in our service build.gradle for upserting the repo CloudFormation stack:

task repoUp(type: StackUpTask) {
 region project.region
 stackName "${project.stackBaseName}-repo-${project.name}"
 template file("../ecs-resources/repo.template")
 waitForComplete true
 capabilityIam true

 stackParams['PathPattern'] ='/bananas'
 stackParams['RepoName'] = project.name
}

And then another to upsert the service CloudFormation stack:

task serviceUp(type: StackUpTask) {
 region project.region
 stackName "${project.stackBaseName}-service-${project.name}"
 template file("../ecs-resources/service.template")
 waitForComplete true
 capabilityIam true

 stackParams['ServiceDesiredCount'] = project.serviceDesiredCount
 stackParams['ImageUrl'] = "${project.repoUrl}:${project.revision}"

 mustRunAfter dockerPushImage
}

And finally, a task to coordinate the management of the stacks and the build/push of the image:

task deploy(dependsOn: ['dockerPushImage', 'serviceUp']) {
  description "Upserts the repo stack, pushes a docker image, then upserts the service stack"
}

dockerPushImage.dependsOn repoUp

This then provides a simple command to deploy new or update existing microservices:

$ gradle deploy

Define a similar build.gradle file in other microservices to deploy them to the same platform.

Blue/Green Deployment

When you run gradle deploy, the existing service stack is updated to use a new task definition that references a new Docker image in ECR. This CloudFormation update causes ECS to do a rolling replacement of the containers, launching new containers with the new image and killing containers with the old image.

However, if you are looking for a more traditional blue/green deployment, this could be accomplished by creating a new service stack (the green stack) with the new docker image, rather than updating the existing. The new stack would attach to the existing ALB target group at which point you could update the existing service stack (the blue stack) to no longer reference the ALB target group, which would take it out of service without killing the containers.

Next Steps

Stay tuned for future blog posts that build on this platform by accomplishing service discovery in a more decoupled manner through the use of Eureka as a service registry, Ribbon as a service client, and Zuul as an edge router.

Additionally, this solution isn’t complete since there is no Continuous Delivery pipeline defined. Look for an additional post showing how to use CodePipeline to orchestrate the movement of changes to the microservice source code into production.

The code for the examples demonstrated in this post is located at https://github.com/stelligent/microservice-exemplar. Let us know if you have any comments or questions @stelligent.

Are you interested in building resilient applications in AWS? Stelligent is hiring!

Stelligent Bookclub: “Working Effectively with Legacy Code” by Michael Feathers

If you’re a member of the tech industry then you’ve probably had to work with legacy code — those ancient systems that just hold everything together. Not every project is greenfield, and “Working Effectively with Legacy Code” by Michael Feathers has a reputation for providing good insight into how we can improve our relationship with these necessary systems and provide a better service to our customers. This blog post will share some of the key takeaways from the book.

What is Legacy Code?

In a world where continuous delivery is on every company’s radar, it’s important to accept that legacy code exists and you’re going to be required to bring it into the fold. One of the key aspects of a successful continuous delivery model is the feedback loop. Your developers and your businesses build confidence in your ability to make changes through testing. Static analysis, unit testing, integration testing, and performance testing are all the difference between having a continuously deployed application and a high risk, rarely changed legacy codebase.

To quote the book, legacy code can be simply defined as ‘code without tests’. This definition may at its surface seem too simple, lacking the nuance that comes with the anxiety of working with legacy code. Underneath, however, it provides immediate insight into an important way we can shed some of the negativity that comes with working with legacy code.

The transformation of untested legacy code into a low-risk well understood codebase is the focus of this book.

Making Changes & Minimizing Risk

“Anyone can open a text editor and spew the most arcane non-sense into it.”

Many of the changes to software can be simplified to: understanding the design, making changes and poking around to ensure you didn’t break anything. It’s not a very effective way to manage the risk that is associated with code changes. As developers, we all want to follow best practice: to write clean, well-structured and heavily tested code. Unfortunately, this Utopian culture is a rare find and we must learn to live with real-world deliverable code.

Changes are made to code for one of four reasons: new features, bug fixing, design improvements or optimization. Working on any codebase involves risk, but poorly understood software makes it impossible to understand the risk of any change. This leads to a mentality of risk mitigation where developers follow the ‘If it ain’t broke, don’t fix it’ mantra. I’m not talking about developers letting obvious problems fester when working on a piece of code. Rather, when a change is being committed to a legacy codebase, developers tend to take the quick and painless approach — the ‘let’s just hack it in right here’.

There are three questions the author suggests that you can use to mitigate risk:

  1. What changes will we have to make?
  2. How will we know that we’ve done it correctly?
  3. How will we know that we haven’t broken anything?

Testing

It’s that concept of software development that is known by many but understood by few. Any developer will tell you that code should be tested, but for a variety of reasons it’s either never done or abandoned early. This comes from a pattern of poor understanding and laziness — and I mean both by managers and developers. It’s as much cultural as technical, if not more.

Tests are a way to detect change and improve the speed at which you receive feedback in your workflow. Software is complicated and so is testing — have you ever wondered why there are so many types of testing? We’ve mentioned a few already, but try to imagine each type as a way of localizing problems. With each type of testing you focus the feedback on a different layer of your application, testing deeper and deeper so you can detect changes in functionality at any level.

Take unit tests for example. We know they’re supposed to run in isolated environments, be fast (really fast!), and provide very localized error information. In a situation where you’re continuously integrating (i.e. making frequent commits) your unit tests should be able to rapidly provide feedback on your changes. You’ll know exactly which change caused the error and you’ll be able to integrate more quickly.

Regression tests come in all shapes and sizes, and they go by plenty of names: integration tests, smoke tests, acceptance tests, and so on. The key point to understand is that with each layer of testing you move further away from localized feedback into a realm of testing for regression. It’s a hard concept, so the author gives an interesting anecdote to describe a situation where your unit tests may pass but some higher-level functionality changes unintentionally — imagine the code works correctly, but there are unintended consequences. With each layer of testing you move away from precise error localization and checks of code functionality toward testing whether your program as a whole is working correctly.

Refactoring

Working with legacy code presents a bit of a conundrum — how do you write tests into a codebase without first making changes? You cannot, and so the author proposes taking on the technical debt of adding tests at the same time as performing code refactoring.

“We want to make functional changes that deliver value while bringing more of the system under test.”

The key to successful refactoring is picking off a chunk of work that you can manage. It’s very easy as a developer to go down a rabbit hole of refactoring and adding tests when you initially only needed to fix a simple bug. Sure, you may succeed in your harebrained adventure of refactoring an entire interface. It’ll pass all the tests; then suddenly it’s failing in integration and you’re burning countless hours making it worse than before. When you’re working on a system that needs refactoring, try to limit your scope to only the functionality that requires change in the first place. Then, when you succeed and your colleagues are cheering, move on to the next.

“Let’s face it, working with legacy code is surgery and doctors never operate alone.”

The author made a small reference to Extreme Programming (XP) and I feel it deserves more attention. If you’re not already, you should try pair programming when working on difficult problems and especially when refactoring. It’s an excellent way to reduce risk and spread knowledge around.

There’s more

Overall, “Working Effectively with Legacy Code” was a great read. Every experienced developer has something to learn from the techniques illustrated. Additionally, the breadth of topics provides useful patterns for transforming a dreary legacy system into a world where Continuous Delivery is possible. It’s a fun topic to cover that allows you to find enjoyment in refactoring and writing tests.

There’s a ton more information within the text that relates specifically to how to implement changes, ranging from object-oriented design and dependency management to cultural guidance.

Interested in working someplace that gives all employees an impressive book expense budget? We’re hiring.

Refactoring CD Pipelines – Part 2: Metadata driven pipeline

In the last post, we created two basic applications, each with a basic shell script to automate deploying them into AWS. In this post, we will continue refactoring those deploy scripts to get them set up on a common pipeline. We are aiming to have the pipeline’s executable code configured through metadata, allowing us to customize the pipeline through configuration. Although we are not using a build server, one could easily be used to orchestrate the pipeline with the framework that we create here.

Our previous deploy script focused on deploying the application to AWS. If we look at the scripts from each repository’s /pipeline folder side-by-side we notice that they are almost identical. This seems like a good place to practice some code reuse. Let’s build out a pipeline that can allow us to share common code across the two applications. Once we complete this pipeline we will have common code that is flexible enough to be used across multiple applications.

We start by defining the steps of a pipeline from the existing deploy scripts. By reading the scripts, we can identify that we get the code and gather variables, run some tests, create an AWS CloudFormation (CFN) stack, and run a simple test against each deployed application.

Logical grouping of the pipeline steps (note the practical differences between the two pipelines):

  • SCM Polling – both apps: gather variables, checkout code
  • Static Analysis – blog_refactor_php: foodcritic(); blog_refactor_nodejs: foodcritic(), jslint()
  • App Prerequisites – blog_refactor_php: AWS Relational Database Service (RDS) creation plus Chef runlist/attributes upload; blog_refactor_nodejs: Chef runlist/attributes upload
  • App Deployment – both apps: AWS Auto Scaling Group (ASG) creation and app deployment
  • Acceptance Testing – both apps: curl the endpoint and expect a 200

Now that we have the steps laid out we need to decide on a technology to implement this pipeline.

(Rake + Ruby) > Bash

We could continue to use bash for our pipeline code by adding in some structure rather than a flat script. Even though extracting the steps we have identified into functions gains us some code reuse, we are still lacking features. By switching to a more advanced language, we gain library support we can leverage to avoid reinventing the wheel. Ruby and Rake seem like a good combination for building the pipeline since they fulfill all of these requirements.

Rake is a well-established build tool that leverages the power of Ruby as a dynamic language. Besides defining tasks with prerequisites and parallel task execution, it offers us the ability to define tasks dynamically. Rake is task-oriented, which mirrors our pipeline “steps” idea pretty well. We can also get some flexibility out of Rake with the ability to run tasks directly from the command line or integrate the Rake tasks into a CI/CD system. Since Rake is just Ruby anyway, integrating any classes we create into the tasks should be pretty simple as well.

To maximize code reuse in an easy, repeatable way you could create a Ruby gem to house common code. This is the approach we took, using metadata to dynamically define Rake tasks and wiring those tasks to reusable classes in a Ruby gem.

Not Your Parent’s Rakefile

Our approach uses Rake primarily as the connective tissue between a hypothetical CI/CD server and the underlying Ruby code that executes the pipeline step logic.

Normally, Rake tasks will be defined along with the code they execute, either in a Rakefile housing multiple tasks or split into separate .rake files. For our sample applications to leverage the pipeline gem we also use a Rakefile, but its job is mostly to read the application’s pipeline metadata and convey it to the gem’s Rakefile.

The gem’s Rakefile iterates over the steps array in the pipeline metadata, defining one Rake task per pipeline step. Each Rake task’s pipeline functionality is delegated to a dynamically-instantiated Ruby class (the ‘worker’ class in the code snippet below) assigned to that step in the metadata.

The @store variable is an instance of a parameter-store class; substitute with any parameter or credentials store in your implementations. Injected into each worker class, the store instance gives the worker access to any outputs from previous Rake tasks as well as the ability to create outputs for downstream Rake tasks.
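As a rough sketch (the metadata file name, keys, and class names here are hypothetical, not the exact ones from the repositories), the gem's Rakefile might look something like this:

# pipeline.rake (inside the gem): a minimal sketch
require 'yaml'

# Application-supplied metadata, e.g. pipeline/pipeline.yml:
#   steps:
#     - name: static_analysis
#       worker: Pipeline::StaticAnalysis
#     - name: app_deployment
#       worker: Pipeline::AppDeployment
metadata = YAML.load_file(ENV.fetch('PIPELINE_METADATA', 'pipeline/pipeline.yml'))

@store = Pipeline::ParameterStore.new # swap in your own parameter/credentials store

namespace :build do
  namespace :commit do
    metadata['steps'].each do |step|
      desc "Run the #{step['name']} pipeline step"
      task step['name'] do
        worker_class = Object.const_get(step['worker']) # the 'worker' class named in the metadata
        worker_class.new(@store).execute                # the store is injected into each worker
      end
    end
  end
end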

Figure 1: An application’s pipeline metadata becomes Rake tasks!


The steps are just Ruby classes that your team codes to match what your pipelines need to do; the same goes for the store class. Because of this, what we're showing here is more of a framework to help you maximize code reuse.
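For illustration only, a base worker class and a bare-bones store might look like the following (names are hypothetical; a real store would likely wrap SSM Parameter Store, Consul, or similar):

module Pipeline
  # Simplest possible parameter store: an in-memory hash shared across tasks.
  class ParameterStore
    def initialize
      @params = {}
    end

    def get(key)
      @params[key]
    end

    def set(key, value)
      @params[key] = value
    end
  end

  # Base class that each pipeline step ('worker') inherits from.
  class Worker
    def initialize(store)
      @store = store # access to upstream outputs, and a place to publish our own
    end

    def execute
      raise NotImplementedError, 'pipeline steps must implement #execute'
    end
  end
end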

That seems like a sweet piece of tech, but what do we do when one of our applications has pipeline needs that don’t align perfectly with our gem’s worker class capabilities?

Down With Conformity!

For this post, we have refactored both our PHP and NodeJS applications to leverage the pipeline gem. While most of the pipeline gem's worker classes are sufficient to support each application's CD pipeline, our framework needed to be flexible enough to extend a worker class as well as to support fully custom steps.

Figure 2: Worker classes can come from your application pipeline (left) or the pipeline gem (center) in order to support the corresponding Rake task (right).

Extending standard steps via worker-class inheritance

The gem defines a class that performs some static code analysis as part of the CD pipeline, namely running foodcritic against the application's pipeline cookbook. To support JavaScript linting for the NodeJS application, we can follow these steps so that the NodeJSApp application provides its own pipeline customization.

First, we create a Ruby class (we called it `ExtendedStaticAnalysis`) in the NodeJS application’s pipeline/lib folder that inherits from the gem’s StaticAnalysis class. This gives us access to execute the foodcritic tests provided by the base worker class.

Next we add a method to ExtendedStaticAnalysis that performs the jslint analysis.

Finally, we change our application's pipeline metadata so that it instantiates this new class instead of the gem's standard worker for that step. If we then run `rake --tasks` to show the steps our pipeline now supports, we'll see a `build:commit:extended_static_analysis` task in the list! (see Figure 1)
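A sketch of that extension might look like this (file paths, the Pipeline namespace, and the jslint invocation are illustrative assumptions, not the gem's exact API):

# pipeline/lib/extended_static_analysis.rb (in the NodeJS application's repo)
require 'pipeline/static_analysis'

class ExtendedStaticAnalysis < Pipeline::StaticAnalysis
  def execute
    super       # run the foodcritic checks provided by the base worker class
    run_jslint  # then layer on the JavaScript linting
  end

  private

  def run_jslint
    # Assumes a jslint-style CLI is available on the build host
    system('jslint app/**/*.js') || raise('jslint reported errors')
  end
end

Once the metadata's static-analysis step points at ExtendedStaticAnalysis, the dynamically generated Rake task picks it up without any other changes.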

Adding custom steps

As with extending step classes, your application's pipeline can implement fully custom steps instead of using one of the built-in implementations. And if there's a serious mismatch between what your application's pipeline needs to do and what the gem provides, new worker classes can be added to the gem to support a whole new category of pipeline steps.

That’s a wrap

We now have a gem that can be reused and extended for our application pipelines and more. By encapsulating the common logic into Ruby classes in the gem, we've eliminated the repetitious code (and the temptation to copy/paste!) without taking away the flexibility to have custom logic in specific pipeline steps. When we expand our suite of applications, we rely on metadata and a small amount of custom code where necessary. And instead of executing Rake tasks from a wrapper script or by hand (great for step development), you can integrate them with your CI/CD server.

See these GitHub repositories referenced by the article:

Authors: Jeff Dugas and Matt Adams

Interested in building out cutting edge continuous delivery platforms and developing a deep love/hate relationship with Rake? Stelligent is hiring!

One-Button Everything in AWS

There’s an approach that we seek at Stelligent known informally as “One-Button Everything”. At times, it can be an elusive goal but it’s something that we often aim to achieve on behalf of our customers. The idea is that we want to be able to create the complete, functioning software system by clicking a single button. Since all of the work we do for our customers is on the Amazon Web Services (AWS) platform, this often results in a single Launch Stack button (as shown below).

[Image: the CloudFormation "Launch Stack" button]

The button is a nice metaphor while also being a literal thing in AWS CloudFormation. It's also emblematic of the principles of simplicity, comprehensiveness, and consistency (discussed later). For example, while you might be able to do the same with a single command, you would need to account for the setup and configuration required to run that command, which can reduce simplicity and consistency.

In this post, I describe the principles and motivation, user base, scope of a complete software system, assumptions and prerequisites, alternative scenarios, and documentation.

Principles

These are the three principles that the “one-button everything” mantra is based upon:

  • Comprehensive – The goal is to orchestrate the full solution stack, not partial implementations of the solution stack
  • Consistent – Works the same way every time. Documentation is similar across solution stacks. Once you require "one-off" implementations, the system becomes susceptible to errors
  • Simple – Few steps and dependencies. Make it difficult to make mistakes.

These three principles guide the design of these one-button systems.

The Users are Us

The users of your one-button systems are often other engineers within your organization. A tired argument you might hear is that you don't need to create simple systems for other engineers because they're technical too. I could not disagree more with this reasoning. As engineers, we should not be spending time on repetitive, failure-prone activities, nor should we put that burden on others at scale. That approach doesn't serve the needs of the organization, since most engineers should be spending their time providing features to the users who derive value from their work.

What is the complete software system?

A common question we get is “what makes up a complete software system?” To us, the complete software system refers to all of the infrastructure and software that composes the system. For example, this includes:

  • Networks (e.g. VPC)
  • Compute (EC2, Containers, Serverless, etc.)
  • Storage (e.g. S3, EBS, etc.)
  • Database and Data (RDS, DynamoDB, etc.)
  • Version control repositories (e.g. CodeCommit)
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
    • Building and deploying application/service code
    • Test execution
    • Static Analysis
    • Security hardening, tests and analysis
    • Notification systems
  • Monitoring systems


Assumptions and Prerequisites

In order to create an effective single-button system, the following patterns are assumed:

  • Everything as Code – The application, configuration, infrastructure, data, tests, and the process to launch the system are all defined in code;
  • Everything is Versioned – All of this code must be versioned in a version-control repository;
  • Everything is Automated – The process for going from zero to working system including the workflow and the “glue code” that puts it all together is defined in code and automated;
  • Client configuration is not assumed – Ideally, you don't want to require users to have a specific client-side configuration, as it leaves room for error and confusion.

As for prerequisites, there might be certain assumptions you document – as long as they are truly one-time only operations. For example, we included the following prerequisites in a demo system we open sourced:

Given a version-control repository, the bootstrapping and the application must be capable of launching from a single CloudFormation command and a CloudFormation button click – assuming that an EC2 Key Pair and Route 53 Hosted Zone have been configured. The demo should not be required to run from a local environment.

So, it assumes the user has created or, in this case, cloned the Git repository and that they’ve established a valid EC2 Key Pair and a Route 53 Hosted Zone. Given that, users should be able to click the Launch Stack button in their AWS account and launch the complete working system with a running deployment pipeline. In this case, the working system includes a VPC network (and associated network resources), IAM, ENI, DynamoDB, a utility EC2 instance, a Jenkins server and a running pipeline in CodePipeline. This pipeline then uses CloudFormation to provision the application infrastructure (e.g. EC2 instances, connect to DynamoDB), etc. Then, as part of the pipeline and its integration with Jenkins, it runs tasks to use Chef to configure the environment to run Node.js, and run automated infrastructure and application tests in RSpec, ServerSpec, Mocha, and Chai. All of this behavior has been provisioned, configured and orchestrated in a way so that anyone can initially click one Launch Stack button to go from zero to fully working system in less than 30 minutes. Once the initial environment is up and running, the application provisioning, configuration, deployment and testing runs in less than 10 minutes. For more on this demo environment, see below.
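For teams that prefer a command over the console button, the same launch can be driven through the CloudFormation API. Here is a minimal sketch using the AWS SDK for Ruby; the bucket, template, stack name, and parameter keys are placeholders rather than the demo's actual values:

require 'aws-sdk' # AWS SDK for Ruby

cfn = Aws::CloudFormation::Client.new(region: 'us-east-1')

# Launch the complete system from a versioned template stored in S3.
cfn.create_stack(
  stack_name: 'one-button-demo',
  template_url: 'https://s3.amazonaws.com/my-bucket/templates/master.json',
  parameters: [
    { parameter_key: 'KeyName',        parameter_value: 'my-ec2-keypair' },
    { parameter_key: 'HostedZoneName', parameter_value: 'example.com.' }
  ],
  capabilities: ['CAPABILITY_IAM'] # the stack creates IAM resources
)

# Block until the entire solution stack is up and running.
cfn.wait_until(:stack_create_complete, stack_name: 'one-button-demo')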

Decompose system based on lifecycles

While you might create a system that can be recreated from a single Launch Stack button, that doesn't mean you apply every change from the ground up, because building everything from "scratch" every time erodes fast feedback. So, while you might have a single button to launch the entire solution stack, you won't be clicking it all that much. This might sound antithetical to everything I've been saying so far, but it's really about viewing it as a single system whose logical architectural layers are updated based on how frequently they change.

…while you might have a single button to launch the entire solution stack, you won’t be clicking it all that much.

For example, because you want quick feedback, you won't rebuild your deployment pipeline, environment images, or the network if there is only a change to the application/service code. Each of these layers might have its own deployment pipeline that gets triggered when there's a code change to that layer (which is often less frequent than application code changes). The application deployment pipeline can then consume artifacts generated by the other pipelines.

This approach can take some time to get right, as you still want to rebuild your system whenever there's a code commit; you just need to be judicious in terms of what gets rebuilt with each change type. As illustrated in Figure 1, here are some examples of different layers that we often decompose our tech stacks into, along with their typical change frequency (your product may vary):

  • Network (VPC) – Once a week
  • Storage – Once a week
  • Routing – Once a week. For example, Route 53 changes.
  • Database – Once a week
  • Deployment Pipeline – Once a week. Apply a CloudFormation stack update.
  • Environment Images – Once per day
  • Application/Service – Many times a day
  • Data – Many times a day
Figure 1 – Stack Lifecycles

As illustrated, while a single solution can be launched from one button, it’s often decomposed into a series of other buttons and commands.


Launching Non-CloudFormation Solutions

As mentioned, one of the nice features of CloudFormation is the ability to provide a single link to launch a stack. While the stack launch is initially driven by CloudFormation, it can also orchestrate any other type of tool you might use, for example via AWS::CloudFormation::Init metadata and cfn-init. This way you get the benefits of launching from a single Launch Stack button in your AWS account while leveraging the integration with many other types of tools.

For example, you might have implemented your solution using Ruby, Docker, Python, etc. In this scenario, let's imagine you're using CFNDSL (albeit a little circular, considering you're using Ruby to generate CloudFormation). Users could still initiate the solution by clicking a single Launch Stack button, which launches a CloudFormation stack. That stack configures a client instance that uses Ruby to run CFNDSL, which in turn generates the CloudFormation that defines the rest of the solution. By driving this through CloudFormation, you don't rely on the user to properly configure the client.
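To give a feel for that flow, here is a minimal, hypothetical cfndsl template (resource names and the AMI ID are placeholders) that the bootstrap instance could render into plain CloudFormation JSON with the cfndsl command:

# app_stack.rb, rendered with: cfndsl app_stack.rb > app_stack.json
CloudFormation do
  Description 'Application stack generated from Ruby via cfndsl'

  Parameter('InstanceType') do
    Type 'String'
    Default 't2.micro'
  end

  Resource('AppInstance') do
    Type 'AWS::EC2::Instance'
    Property('ImageId', 'ami-12345678')           # placeholder AMI
    Property('InstanceType', Ref('InstanceType')) # wired to the parameter above
  end
end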

Alternatively, you can meet many of the design goals by implementing the same thing through a single command, but realize there can be a cost in simplicity and consistency as a result.

Documentation

The documentation provided to users/engineers should be simple to understand and execute, and should make it difficult to commit errors. Figure 2 shows an example from https://github.com/stelligent/cloudformation_templates in which an engineer can view the prerequisites, supported AWS regions, an active architecture illustration, a how-to video and, finally, a Launch Stack button. You might use this as a sample for the documentation you write on behalf of the users of your AWS infrastructure.

Figure 2 – Launch Stack Documentation

Summary

You learned how to create one button (or command) that launches your entire software system from code, the assumptions and prerequisites involved, what makes up that software system, how you might decompose subsystems based on their lifecycles and, finally, a way of documenting the instructions for launching the solution.

Stelligent is hiring! Do you enjoy working on complex problems like figuring out ways to automate all the things as part of a deployment pipeline? Do you believe in the “everything-as-code” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.

Testing Nix Packages in Docker

In this blog post, we will cover developing Nix packages and testing them in Docker. We will set up a container with the proper environment to build Nix packages, and then we will test-build existing packages from the nixpkgs repo. This lays the foundation for using Docker to test your own Nix packages.

First, a quick introduction to Nix. Nix is a purely functional package manager. Each package is declaratively defined by a Nix expression, written in a simple functional language. A Nix expression describes everything that goes into a package build action, known as a derivation. Because it's a functional language, it's easy to support building variants of a package: turn the Nix expression into a function and call it any number of times with the appropriate arguments. Due to the hashing scheme, variants don't conflict with each other in the Nix store. More details regarding Nix packages and NixOS can be found on the NixOS website.

To begin, we write the Dockerfile that will be used to build the development environment. We need to declare the base image to be used (CentOS 7), install a few dependencies required to install Nix, and clone the nixpkgs repo. We also need to set up the nix user and permissions:

FROM centos:latest

MAINTAINER "Fernando J Pando" <nando@********.com>

RUN yum -y install bzip2 perl-Digest-SHA git

RUN adduser nixuser && groupadd nixbld && usermod -aG nixbld nixuser

RUN mkdir -m 0755 /nix && chown nixuser /nix

USER nixuser

WORKDIR /home/nixuser

We clone the nixpkgs GitHub repo (this can be set to any fork/branch for testing):

RUN git clone https://github.com/nixos/nixpkgs.git

We download the nix installer (latest stable 1.11.4):

RUN curl https://nixos.org/nix/install -o nix.install.sh

We are now ready to set up the environment that will allow us to build Nix packages in this container. There are several environment variables that need to be set for this to work, and we can pull them from the environment script provided by the Nix installation (~/.nix-profile/etc/profile.d/nix.sh). For easy reference, here is that file:


# Set the default profile.
if ! [ -L "$NIX_LINK" ]; then
    echo "creating $NIX_LINK" >&2
    _NIX_DEF_LINK=/nix/var/nix/profiles/default
    /nix/store/xmlp6pyxi6hi3vazw9821nlhhiap6z63-coreutils-8.24/bin/ln -s "$_NIX_DEF_LINK" "$NIX_LINK"
fi

export PATH=$NIX_LINK/bin:$NIX_LINK/sbin:$PATH

# Subscribe the user to the Nixpkgs channel by default.
if [ ! -e $HOME/.nix-channels ]; then
    echo "https://nixos.org/channels/nixpkgs-unstable nixpkgs" > $HOME/.nix-channels
fi

# Append ~/.nix-defexpr/channels/nixpkgs to $NIX_PATH so that
# <nixpkgs> paths work when the user has fetched the Nixpkgs
# channel.
export NIX_PATH=${NIX_PATH:+$NIX_PATH:}nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs

# Set $SSL_CERT_FILE so that Nixpkgs applications like curl work.
if [ -e /etc/ssl/certs/ca-certificates.crt ]; then # NixOS, Ubuntu, Debian, Gentoo, Arch
    export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
elif [ -e /etc/ssl/certs/ca-bundle.crt ]; then # Old NixOS
    export SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt
elif [ -e /etc/pki/tls/certs/ca-bundle.crt ]; then # Fedora, CentOS
    export SSL_CERT_FILE=/etc/pki/tls/certs/ca-bundle.crt
elif [ -e "$NIX_LINK/etc/ssl/certs/ca-bundle.crt" ]; then # fall back to cacert in Nix profile
    export SSL_CERT_FILE="$NIX_LINK/etc/ssl/certs/ca-bundle.crt"
elif [ -e "$NIX_LINK/etc/ca-bundle.crt" ]; then # old cacert in Nix profile
    export SSL_CERT_FILE="$NIX_LINK/etc/ca-bundle.crt"
fi

We pull out the relevant environment variables for our CentOS 7 base image:

ENV USER=nixuser

ENV HOME=/home/nixuser

ENV _NIX_DEF_LINK=/nix/var/nix/profiles/default

ENV SSL_CERT_FILE=/etc/pki/tls/certs/ca-bundle.crt

ENV NIX_LINK=$HOME/.nix-profile

ENV PATH=$NIX_LINK/bin:$NIX_LINK/sbin:$PATH

ENV NIX_PATH=${NIX_PATH:+$NIX_PATH:}nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs

RUN echo "https://nixos.org/channels/nixpkgs-unstable nixpkgs" > $HOME/.nix-channels

RUN ln -sfv $_NIX_DEF_LINK $NIX_LINK

Now that the environment has been set up, we are ready to install nix:

RUN bash /home/nixuser/nix.install.sh

The last step is to set the working directory to the cloned github nixpkgs directory, so testing will execute from there when the container is run:

WORKDIR /home/nixuser/nixpkgs

At this point, we are ready to build the container:

docker build . -t nand0p/nixpkgs-devel

Alternatively, you can pull this container image from Docker Hub:

docker pull nand0p/nixpkgs-devel

(NOTE: The Docker Hub image will contain a version of the nixpkgs repo from the time the container image was built. If you are testing on the bleeding edge, always build the container fresh before testing.)

Once the container is ready, we can now begin test building an existing nix package:

docker run -ti nand0p/nixpkgs-devel nix-build -A nginx

The above command test-builds the nginx package inside your new Docker container, along with all of its dependencies. In order to test the package interactively and ensure the resulting binary works as expected, the container can be launched with bash:

docker run -ti nand0p/nixpkgs-devel bash
[nixuser@aa2c8e29c5e9 nixpkgs]$

Once you have shell inside the container, you can build and test run the package:

[nixuser@aa2c8e29c5e9 nixpkgs]$ nix-build -A nginx
/nix/store/7axviwfzbsqy50zznfxb7jzfvmg9pmwx-nginx-1.10.1

[nixuser@aa2c8e29c5e9 nixpkgs]$ /nix/store/7axviwfzbsqy50zznfxb7jzfvmg9pmwx-nginx-1.10.1/bin/nginx -V
nginx version: nginx/1.10.1
built by gcc 5.4.0 (GCC)
built with OpenSSL 1.0.2h 3 May 2016
TLS SNI support enabled
configure arguments: --prefix=/nix/store/7axviwfzbsqy50zznfxb7jzfvmg9pmwx-nginx-1.10.1 --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-ipv6 --with-file-aio --add-module=/nix/store/vbfb8z3hgymbaz59wa54qf33yl84jii7-nginx-rtmp-module-v1.1.7-src --add-module=/nix/store/ps5s8q9v91l03972gw0h4l9iazx062km-nginx-dav-ext-module-v0.0.3-src --add-module=/nix/store/46sjsnbkx7rgwgn3pgigcfaygrs2cx30-headers-more-nginx-module-v0.26-src --with-cc-opt='-fPIE -fstack-protector-all --param ssp-buffer-size=4 -O2 -D_FORTIFY_SOURCE=2' --with-ld-opt='-pie -Wl,-z,relro,-z,now'

Thanks for reading, and stay tuned for part II, where we will modify a Nix package for our custom needs and test it against our own fork of the nixpkgs repo.

Refactoring CD Pipelines – Part 1: Chef-Solo in AWS AutoScaling Groups

We often overlook similarities between new CD Pipeline code we’re writing today and code we’ve already written. In addition, we might sometimes rely on the ‘copy, paste, then modify’ approach to make quick work of CD Pipelines supporting similar application architectures. Despite the short-term gains, what often results is code sprawl and maintenance headaches. This blog post series is aimed at helping to reduce that sprawl and improve code re-use across CD Pipelines.

This mini-series references two real but simple applications, hosted in Github, representing the kind of organic growth phenomenon we often encounter and sometimes even cause, despite our best intentions. We’ll refactor them each a couple of times over the course of this series, with this post covering CloudFormation template reuse.

Chef’s a great tool… let’s misuse it!

During the deployment of an AWS AutoScaling Group, we typically rely on the user data section of the Launch Configuration to configure new instances in an automated, repeatable way. A common automation practice is to use chef-solo to accomplish this. We carefully chose chef-solo as a great tool for immutable infrastructure approaches. Both applications have CD Pipelines that leverage it as a scale-time configuration tool by reading a JSON document describing the actions and attributes to be applied.

It’s all roses

It's a great approach: we sprinkle in a handful or two of CloudFormation parameters to support our Launch Configuration, embed the chef-solo JSON in the user data, and decorate it with references to the CloudFormation parameters. Voila, we're done! The implementation hardly took any time (probably less than an hour per application if you could find good examples on the internet), and each time we need a new CD Pipeline, we can just stamp out a new CloudFormation template.

Figure 1: Launch Configuration user data (as plain text)


Figure 2: CloudFormation parameters (corresponding to Figure 1)

Well, it’s mostly roses…

Why is it, then, that a few months and a dozen or so CD Pipelines later, we’re spending all our time debugging and doing maintenance on what should be minor tweaks to our application configurations? New configuration parameters take hours of trial and error, and new application pipelines can be copied and pasted into place, but even then it takes hours to scrape out the previous application’s specific needs from its CloudFormation template and replace them.

Fine, it’s got thorns, and they’re slowing us down

Maybe our great solution could have been better? Let's start with the major pitfall of our original approach: each application we support has its own highly customized CloudFormation template.

  • lots of application-specific CFN parameters exist solely to shuttle values to the chef-solo JSON
  • fairly convoluted user data, containing an embedded JSON structure and parameter references, is a bear to maintain
  • tracing parameter values from the CD Pipeline, traversing the CFN parameters into the user data… that’ll take some time to debug when it goes awry

One path to code reuse

Since we’re referencing two real GitHub application repositories that demonstrate our current predicament, we’ll continue using those repositories to present our solution via a code branch named Phase1 in each repository. At this point, we know our applications share enough of a common infrastructure approach that they should be sharing that part of the CloudFormation template.

The first part of our solution will be to extract the ‘differences’ from the CloudFormation templates between these two application pipelines. That should leave us with a common skeleton to work with, minus all the Chef specific items and user data, which will allow us to push the CFN template into an S3 bucket to be shared by both application CD pipelines.

The second part will be to add back the required application specificity, but in a way that migrates those differences from the CloudFormation templates to external artifacts stored in S3.

Taking it apart

Our first tangible goal is to make the user data generic enough to support both applications. We start by moving the inline chef-solo JSON to its own plain JSON document in each application’s pipeline folder (/pipelines/config/app-config.json). Later, we’ll modify our CD pipelines so they can make application and deployment-specific versions of that file and upload it to an S3 bucket.

Figure 3: Before/after comparison (diff) of our Launch Configuration User Data

Left: original user data; Right: updated user data

The second goal is to make a single, vanilla CloudFormation template. Since we orphaned the Chef-only CloudFormation parameters by removing the parts of the user data that referenced them, we can remove those parameters as well. The resulting template can now focus on the infrastructure concerns of our applications.

Figure 4: Before/after comparison (diff) of the CloudFormation parameters required

Left: original CFN parameters; Right: pared-down parameters


At this point, we have eliminated all the differences between the CloudFormation templates, but now they can’t configure our application! Let’s fix that.

Reassembling it for reuse

Our objective now is to make our Launch Configuration user data truly generic so that we can actually reuse our CloudFormation template across both applications. We do that by scripting it to download the JSON that Chef needs from a specified S3 bucket. At the same time, we enhance the CD Pipelines by scripting them to create application and deploy-specific JSON, and to push that JSON to our S3 bucket.
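The pipeline-side half of that handoff might look something like this sketch using the AWS SDK for Ruby (the bucket name, key layout, and file paths are hypothetical placeholders):

require 'aws-sdk' # AWS SDK for Ruby
require 'json'

s3 = Aws::S3::Client.new(region: 'us-east-1')

# Render a deploy-specific Chef JSON from the application's base config.
config = JSON.parse(File.read('pipeline/config/app-config.json'))
deploy_id = ENV.fetch('DEPLOY_ID', Time.now.to_i.to_s)
chef_json_key = "chef-json/#{deploy_id}/app-config.json"

# Push it to the shared bucket; the Launch Configuration user data
# downloads this exact key before running chef-solo.
s3.put_object(
  bucket: 'my-pipeline-artifacts',
  key: chef_json_key,
  body: JSON.pretty_generate(config)
)

# The same key is later handed to CloudFormation as the ChefJsonKey parameter.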

Figure 5: Chef JSON stored as a deploy-specific object in S3

The S3 key is unique to the deployment.

To stitch these things together we add back one CloudFormation parameter, ChefJsonKey, required by both CD Pipelines – its value at execution time will be the S3 key where the Chef JSON will be downloaded from. (Since our CD Pipeline has created that file, it’s primed to provide that parameter value when it executes the CloudFormation stack.)

Two small details are left. First, we give our AutoScaling Group instances the ability to download from that S3 bucket. Second, now that we're convinced our CloudFormation template is as generic as it needs to be, we upload it to S3 and have our CD Pipelines reference it as an S3 URL.

Figure 6: Our S3 bucket structure ‘replaces’ the /pipeline/config folder 

The templates can be maintained in GitHub.

That’s a wrap

We now have a vanilla CloudFormation template that supports both applications. When an AutoScaling Group scales up, the new instances download a Chef JSON document from S3 and execute chef-solo. We were able to eliminate the per-application template from both pipelines and still get all the benefits of Chef-based server configuration.

See these GitHub repositories referenced throughout the article:

In Part 2 of this series, we’ll continue our refactoring effort with a focus on the CD Pipeline code itself.

Authors: Jeff Dugas and Matt Adams

Interested in working with (and sometimes misusing) configuration management tools like Chef, Puppet, and Ansible? Stelligent is hiring!