Application Auto Scaling with Amazon ECS

In this blog post, you’ll see an example of Application Auto Scaling for Amazon ECS (EC2 Container Service). Automatic scaling of the container instances in your ECS cluster has been a feature for quite some time, but until recently you were not able to scale the tasks in your ECS service with built-in AWS technology. In May 2016, Automatic Scaling with Amazon ECS was announced, which lets us configure elasticity into our deployed container services in Amazon’s cloud.

Developer Note: Jump to the “CloudFormation Examples” section to go right to the code!

Why should you auto scale your container services?

Automatic scaling of your containers lets you scale your microservices efficiently and effectively. If your primary goals include fault tolerance or elastic workloads, then combining cloud autoscaling with infrastructure as code is the key to success. With AWS Application Auto Scaling, you can quickly configure elasticity into your architecture in a repeatable and testable way.

Introducing CloudFormation Support

For the first few months, this feature was not available in AWS CloudFormation; configuration was either a manual process in the AWS Console or a series of API calls made from the CLI or one of Amazon’s SDKs. As of August 2016, we can finally manage this configuration easily using CloudFormation.

The resource types you’re going to need to work with are:

  • AWS::ApplicationAutoScaling::ScalableTarget
  • AWS::ApplicationAutoScaling::ScalingPolicy
  • AWS::CloudWatch::Alarm
  • AWS::IAM::Role

The ScalableTarget and ScalingPolicy are the new resources that configure how your ECS Service behaves when an Alarm is triggered. In addition, you will need to create a new Role that gives the Application Auto Scaling service access to describe your CloudWatch Alarms and to modify your ECS Service, such as increasing your Desired Count.

CloudFormation Examples

The below examples were written for AWS CloudFormation in the YAML format. You can plug these snippets directly into your existing templates with minimal adjustments necessary. Enjoy!

Step 1: Implement a Role

These permissions were gathered from various sources in the AWS documentation.

ApplicationAutoScalingRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Statement:
      - Effect: Allow
        Principal:
          Service:
          - application-autoscaling.amazonaws.com
        Action:
        - sts:AssumeRole
    Path: "/"
    Policies:
    - PolicyName: ECSBlogScalingRole
      PolicyDocument:
        Statement:
        - Effect: Allow
          Action:
          - ecs:UpdateService
          - ecs:DescribeServices
          - application-autoscaling:*
          - cloudwatch:DescribeAlarms
          - cloudwatch:GetMetricStatistics
          Resource: "*"

Step 2: Implement some alarms

The below alarm will initiate scaling based on container CPU Utilization.

AutoScalingCPUAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Containers CPU Utilization High
    MetricName: CPUUtilization
    Namespace: AWS/ECS
    Statistic: Average
    Period: '300'
    EvaluationPeriods: '1'
    Threshold: '80'
    AlarmActions:
    - Ref: AutoScalingPolicy
    Dimensions:
    - Name: ServiceName
      Value:
        Fn::GetAtt:
        - YourECSServiceResource
        - Name
    - Name: ClusterName
      Value:
        Ref: YourECSClusterName
    ComparisonOperator: GreaterThanOrEqualToThreshold

Step 3: Implement the ScalableTarget

This resource attaches Application Auto Scaling to your ECS Service and sets limits on how far it can scale. Other than your MinCapacity and MaxCapacity, these settings are fairly fixed when used with ECS.

AutoScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: 20
    MinCapacity: 1
    ResourceId:
      Fn::Join:
      - "/"
      - - service
        - Ref: YourECSClusterName
        - Fn::GetAtt:
          - YourECSServiceResource
          - Name
    RoleARN:
      Fn::GetAtt:
      - ApplicationAutoScalingRole
      - Arn
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs

Step 4: Implement the ScalingPolicy

This resource defines your exact scaling behavior: when to scale up or down, and by how much. Pay close attention to the StepAdjustments in the StepScalingPolicyConfiguration, as the documentation on this is very vague.

In the below example, we scale up by 2 containers when the metric is above the alarm threshold and scale down by 1 container when it is below the threshold. Take special note of how MetricIntervalLowerBound and MetricIntervalUpperBound work together: when unspecified, the upper bound is effectively positive infinity and the lower bound negative infinity. Finally, note that these bounds are evaluated against aggregated metrics, meaning the Average, Minimum, or Maximum across your combined fleet of containers.

AutoScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: ECSScalingBlogPolicy
    PolicyType: StepScaling
    ScalingTargetId:
      Ref: AutoScalingTarget
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 60
      MetricAggregationType: Average
      StepAdjustments:
      - MetricIntervalLowerBound: 0
        ScalingAdjustment: 2
      - MetricIntervalUpperBound: 0
        ScalingAdjustment: -1

Wrapping It Up

Amazon Web Services continues to provide excellent resources for automation, elasticity, and virtually unlimited scalability. As you can see, with a couple of solid examples in hand you can very quickly build in that on-demand elasticity and inherent fault tolerance. After you have your tasks auto scaling, I recommend you check out the documentation on scaling your container instances as well, to provide the same benefits to your ECS cluster itself.

Deploying Microservices? Let mu help!

Support for ECS Application Auto Scaling is coming soon to Stelligent mu, the fastest and most comprehensive platform for deploying microservices as containers.

Want to learn more about mu from its creators? Check out the DevOps in AWS Radio’s podcast or find more posts in our blog.

Additional Resources

Here are some of the supporting resources discussed in this post.

We’re Hiring!

Like what you’ve read? Would you like to join a team on the cutting edge of DevOps and Amazon Web Services? We’re hiring talented engineers like you. Click here to visit our careers page.

 

 

Enforcing Compliance with AWS Organizations

You have a large organization with several development teams that work on various software projects that support your business. A year ago, you brought in a consultant who told you to use multiple AWS accounts because there were benefits to be gained. For example, using multiple accounts we can contain the damage from a possible security breach and isolate work by teams so that others don’t inadvertently disrupt that work. But there are also issues that we must deal with.

When a company has more than one AWS account, and especially many AWS accounts, it becomes difficult to manage those accounts. How do we know that all teams are using good security policies? How do we take advantage of billing incentives for using more and more of an AWS resource? How do we manage the billing in general for all of those accounts? And if a company is in a business that requires it to comply with a set of standards such as PCI or HIPAA, how can we guarantee that teams are using only services that are certified compliant? And how can we automate the creation of new accounts so that they are properly configured from the start?

What Are AWS Organizations?

AWS Organizations allows companies with multiple AWS accounts to manage those accounts, from both a billing and an administrative perspective, from a single root account. Why is this important? Until Organizations came along, having multiple accounts was like the Wild West: each account was on its own and there was no way to manage all of them from one place. Users had no way to apply policies, manage permissions, or manage billing from a “company” perspective. AWS Organizations gives us the tools we need to bring these accounts together and control them all in a predictable way.

Service Control Policies (SCPs)

Service Control Policies allow us to define the services that an account can access. In our case, we know that we want to allow access to only the services that are HIPAA compliant. Any service that isn’t compliant should not be allowed to be used by the teams. Using the root account, we can push this policy out to all accounts that we have within our organization.

Organizational Units (OUs)

Most organizations have accounts that have different requirements. Using the example above, some accounts may have to be HIPAA compliant while others may be used for other purposes and do not have to follow any guidelines. AWS Organizations gives us the ability to group accounts into Organizational Units.

Organizational Units allow us to split our accounts into separate groups and apply different policies to those groups. Continuing with the example from above, we can have an OU for all accounts that must be HIPAA compliant and an OU for accounts that are general purpose. All accounts in the HIPAA OU will be restricted to only the services that are HIPAA certified while the accounts in the general purpose OU have access to all AWS services. The rules that are applied to an OU even overrule account administrators. If an admin accidentally logs into an account and specifically sets permissions in that account to allow access to a service that has been restricted at the OU level, the OU rule that was applied to the account will still block that access.

OUs can be up to 5 levels deep, and you can have multiple OUs inside of an OU. This allows even more granular control over accounts. As an example, let’s assume that some of our HIPAA accounts also handle patient transactional data. This means that we are dealing with both PCI and HIPAA data in those accounts. We can create an OU inside of our HIPAA OU that restricts access to only services that are PCI compliant. The result is that at the first level we have accounts that can only access HIPAA compliant services, while in the PCI OU under the HIPAA OU we have accounts that can only access services which are HIPAA compliant AND PCI compliant.

One thing that must be remembered is that the root or “master” account cannot be restricted. Even if it is placed within an OU, none of the AWS services will be restricted to this account. Therefore, it is essential that the root account is not used by anyone other than the administrator of all accounts.

Account Creation Automation

It is often the case that a company will grow and will add teams as they are needed. These new teams will sometimes need their own set of accounts to work in to avoid disrupting the work of other teams. AWS Organizations provides the ability to automate this task. We can create an account, attach policies to this account, and add this account to the appropriate group all through the Organizations API. Not only is this useful for new teams, but it is also useful when developers need test accounts that need to be created quickly, then deleted when work within that account is finished.

How Does All of That Help Me?

Let’s take a look at an example and apply the tools above to solve the problems that companies with multiple accounts face. Let’s assume we have a health care company with a wide range of systems under their control. Some systems house identifiable patient data, which requires those systems to be HIPAA compliant, and some systems simply house generic data that can be used to generate high-level reports. The latter systems do not require any special treatment. One other platform the company has allows patients to log in and make payments. This platform allows users to store their credit card data for future transactions, which means the services they use must be PCI compliant.

Where Do We Start?

Before we begin we need to gather our requirements. We know that our company must be both HIPAA and PCI compliant so we can start by breaking the teams down into groups of standards they must follow.

Compliance                                               Number of Teams
HIPAA                                                    9
PCI                                                      7
HIPAA and PCI (these overlap with the previous groups)   4
None                                                     3

Once we have our teams broken out into groups, we need to know how many accounts each team has. For this example, we are going to assume each team has 4 accounts: Dev, Test, QA, and Prod. Note that we have a group of 4 accounts that overlap in service restriction requirements. Unfortunately, Organizations will not allow an account to belong to 2 Organizational Units at the same hierarchical level. We will discuss the details of how to achieve this later when we create our OUs and begin adding accounts to them.

Once we have our accounts grouped we are ready to start planning our organization. The resulting Organization will have this overall structure:

[Diagram: the resulting AWS Organization structure]

LIMITATION ALERT:

It’s worth noting at this point that AWS Organizations treats accounts differently depending on how they were originally created. The Organizations API provides the ability to remove an account from the Organization, but only if that account was invited to join the organization. If the account was created by the organization, that account cannot be removed from the organization without deleting the account entirely. The Organizations API also does not provide the ability to delete an account, no matter how it was created; to delete an account, you must log into that account and do it manually. These limitations may influence how companies want to handle bringing accounts into an organization.
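If you later need to detach an account that was invited (not created), the call looks something like this, with a placeholder account ID:

aws organizations remove-account-from-organization --account-id 123456789012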

One other important fact we need to know is that the account that owns the user we use to create the Organization will become the master account. Make sure never to create an Organization from an account that needs to have policies applied to it. A master account will always have “root” access, even if it is moved to an Organizational Unit that restricts services. The services of the master account cannot be restricted and the wide-open policies will always override anything that is more restrictive.

Once we have our account information, let’s move on to creating the organization.

Creating an Organization

Before we begin, we need to make sure we have the AWS Command Line tools installed on the OS of your choice. Organizations can also be managed using the AWS SDK for your language of choice, but we’re going to use the command line tools for this example. Again, make sure we are using a user from the account we want to be the master. Make sure that user is configured with your CLI tools. Once our configs have been verified, we can issue the following command:

Minimum permissions for your user:

  • organizations:CreateOrganization
aws organizations create-organization --feature-set ALL

Notice that we are passing in a parameter to the create-organization command called “feature-set”. This tells AWS what control the organization will have over our accounts. There are 2 options we can pass in here:  ALL, CONSOLIDATED_BILLING. The ALL parameter value enables consolidated billing and also allows the organization to put policies in place that can restrict the services the account can access. This is the default value if this parameter is omitted. A value of CONSOLIDATED_BILLING will allow the new organization to consolidate the billing of all accounts under the master account. The Organization will not be allowed to restrict the services each account has access to. For our company, we need ALL functionality so we retain the ability to control access for some accounts to only HIPAA and PCI compliant services.

After running this command, we get back a response from AWS:

{
  "Organization": {
    "AvailablePolicyTypes": [{
      "Status": "ENABLED",
      "Type": "SERVICE_CONTROL_POLICY"
    }],
    "MasterAccountId": "111111111111",
    "MasterAccountArn": "arn:aws:organizations::111111111111:account/o-exampleorgid/111111111111",
    "MasterAccountEmail": "bill@example.com",
    "FeatureSet": "ALL",
    "Id": "o-exampleorgid",
    "Arn": "arn:aws:organizations::111111111111:organization/o-exampleorgid"
  }
}

We need to capture the “Id” value and keep that for future use.

Let’s Add Some Accounts

Inviting Accounts

Now that we have a newly created Organization, we can start adding our accounts to our organization. As mentioned above, there are 2 ways to add an account to an Organization. The first method and the one we’ll be using primarily for our example is to send an invitation to our accounts that already exist.

To reiterate: any account we invite to our Organization can be removed from it at any time. If we want accounts tied to this Organization without the option of being removed (as a way of ensuring our policies are always in place), we need to create those accounts from within the Organization, and any resources would have to be migrated from the existing account to the new account.

To send an invitation to an existing account, we can issue the following command:

Minimum permissions for your users:

  • organizations:DescribeOrganization
  • organizations:InviteAccountToOrganization
aws organizations invite-account-to-organization --target '{"Type": "ACCOUNT", "Id": "ACCOUNT_ID_NUMBER"}'

We are passing in a data structure to the target parameter of the command. In this example, we are passing in the account ID. The key Type can also have values of EMAIL or ORGANIZATION. In those cases, we would set the Id to the appropriate value.

Another optional parameter we could have passed is “notes”. If we want to include additional information in the email that is auto-generated by Organizations, we can pass that information using the “notes” parameter.

The response from this command should look like this:

{
  "Handshake": {
    "Action": "INVITE",
    "Arn": "arn:aws:organizations::111111111111:handshake/o-exampleorgid/invite/h-examplehandshakeid111",
    "ExpirationTimestamp": 1482952459.257,
    "Id": "h-examplehandshakeid111",
    "Parties": [{
      "Id": "o-exampleorgid",
      "Type": "ORGANIZATION"
    },
    {
      "Id": "juan@example.com",
      "Type": "EMAIL"
    }],
    "RequestedTimestamp": 1481656459.257,
    "Resources": [{
      "Type": "MASTER_EMAIL",
      "Value": "bill@amazon.com"
    },
    {
      "Type": "MASTER_NAME",
      "Value": "Org Master Account"
    },
    {
      "Type": "ORGANIZATION_FEATURE_SET",
      "Value": "FULL"
    },
    {
      "Type": "ORGANIZATION",
      "Value": "o-exampleorgid"
    },
    {
      "Type": "EMAIL",
      "Value": "juan@example.com"
    }],
    "State": "OPEN"
  }
}

Once again, we are interested in the “Id” value of the “Handshake” object. Each time we run the command to invite an account, we will receive this “Id” back in the response. We need to record that value for each account we invite so we can use it in the next step to accept the invitation.
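If you are scripting the invitations, one way to capture each handshake Id is the CLI’s --query option; a rough sketch with placeholder account IDs:

for account_id in 111111111111 222222222222 333333333333; do
  aws organizations invite-account-to-organization \
    --target "{\"Type\": \"ACCOUNT\", \"Id\": \"$account_id\"}" \
    --query 'Handshake.Id' --output text >> handshake_ids.txt
done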

Accepting Invitations

The process of inviting and adding an account to an organization is a “handshake” transaction. An invitation is sent to the account we want to add to our organization and the “owner” of that account must log in and accept that invitation. Fortunately for us, this can also be accomplished through the CLI. Again, we need to make sure our CLI is configured with a principal user that has the IAM permissions to accept that handshake. Once we have the CLI configured, we can issue the following command:

Minimum permissions for your user:

  • organizations:ListHandshakesForAccount
  • organizations:AcceptHandshake
  • organizations:DeclineHandshake
aws organizations accept-handshake --handshake-id HANDSHAKE_ID

The handshake ID that is being passed into this command was given to us in the response of the command to send the invitation.

Remember that we can also send and accept invitations through the console. For users with a few accounts, this may be acceptable. But if you are dealing with more than a few accounts you are definitely going to want to automate this process.

LIMITATION ALERT:

AWS has set a limit of 20 on the number of invitations that can be sent per day. If you need to send more than that, contact customer support and they can raise your limit.

Using Organizational Units

Here’s where the real power of Organizations starts to show. Now that we have our accounts added to the Organization we need to group them into OUs and restrict the services that can be used within those accounts. Before we started creating the Organization, we took the time to group our accounts by the compliance standard they needed to adhere to. We can use that information to help us create our OUs to move our accounts into. Looking at our chart we can see that we have four different types of accounts. We have HIPAA compliant, PCI compliant, HIPAA and PCI compliant, and accounts that require no restrictions at all. We are going to create three top-level OUs and one OU that is within either the PCI or the HIPAA OU. Because we are simply overlapping 2 sets of compliance standards, it really doesn’t matter which OU we use as a parent.

We’ll start by creating the three top-level OUs. We can issue the following commands to create those:

Minimum permissions for your user:

  • organizations:CreateOrganizationalUnit
aws organizations create-organizational-unit --parent-id PARENT_ORG_ID --name HipaaOU
aws organizations create-organizational-unit --parent-id PARENT_ORG_ID --name PciOU
aws organizations create-organizational-unit --parent-id PARENT_ORG_ID --name GeneralOU

We now have three top-level Organizational Units that we can add accounts to. We have already invited all existing accounts to our Organization. They reside at the top-level of our Org. To place those accounts into the proper OU we need to issue the “move” command on each account.

Minimum permissions for your user:

  • organizations:MoveAccount
aws organizations move-account --account-id ACCOUNT_ID --source-parent-id PARENT_ORG_ID --destination-parent-id OU_ID

We will need to issue this command for each account we need to move to an OU. We need to make sure we are using the correct destination ID to place the account into the proper OU.

We need to repeat the last 2 steps to create the sub OU for our overlapping HIPAA and PCI accounts. This time around the PARENT_ORG_ID will be changed from the ID of the organization itself to the ID of the organizational unit we want to create this sub OU in. We will create this OU within the HipaaOU that we created in the previous step.

And we can move those accounts that require both HIPAA and PCI compliance into this new OU using the same command we used to move the other accounts.
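Putting those two steps together, and assuming the overlapping accounts currently sit in the HIPAA OU, the commands might look like this (all IDs are placeholders for values returned by earlier commands):

aws organizations create-organizational-unit --parent-id HIPAA_OU_ID --name HipaaPciOU
aws organizations move-account --account-id ACCOUNT_ID --source-parent-id HIPAA_OU_ID --destination-parent-id HIPAA_PCI_OU_ID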

Service Control Policies

Simply moving accounts into OUs accomplishes nothing on its own. In order to take advantage of the power of these new OUs, we need to apply policies that will restrict the services that the accounts within the OU can access. At the time of this writing, Service Control Policies are the only policies that can be applied to an OU.

In order to apply a Service Control Policy to our account, we need to create a policy file that we can pass into the create-policy command. We could place this text within the command itself, but with the number of services we need to include and the fact that we have to escape characters, that approach is error-prone and very messy. Here’s what our policy file will look like:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:*",
      "rds:*",
      "dynamodb:*"
    ],
    "Resource": "*"
  }]
}

In the above policy file, we are explicitly allowing a few services. There are many more HIPAA compliant services, but for the sake of this example, we are going to limit the policy to these three services.

TRAP FOR YOUNG PLAYERS:

It needs to be mentioned here that Service Control Policies which are applied to an OU will not grant any user any rights. We are not pushing this policy as a way to give each user in the accounts in the OU access to these services. This policy is in place as a way to restrict the permissions that can be applied to a user. And they will apply to all users, including administrators.

It’s also worth noting that the policies we are putting in place to restrict services assume that the “Allow *” policies have been removed from the root, OU, and individual accounts. If “Allow *” is still in place in any of these locations, the above policy will have no effect on the account(s) it is applied to.

We need to create two additional policy files, one for each additional OU type. Because we removed the “Allow *” policy from all accounts, OUs, and the root Organization, we will need to create a policy file for our GeneralOU that allows all services for that OU. We will reuse the PCI policy file for the sub OU that allows both HIPAA and PCI services.
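For instance, the allow_all_policy.json file for the GeneralOU can be as simple as:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "*",
    "Resource": "*"
  }]
}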

Once we have our policy files in place, we can start creating those policies:

Minimum permissions for your user:

  • organizations:CreatePolicy
aws organizations create-policy --content file://allow_hipaa_policy.json --name AllowHipaaServices --type SERVICE_CONTROL_POLICY --description "This policy allows all HIPAA services"
aws organizations create-policy --content file://allow_pci_policy.json --name AllowPCIServices --type SERVICE_CONTROL_POLICY --description "This policy allows all PCI services"
aws organizations create-policy --content file://allow_all_policy.json --name AllowAllServices --type SERVICE_CONTROL_POLICY --description "This policy allows all services"

We have created three new policies that now need to be attached to our OUs.

Minimum permissions for your user:

  • organizations:AttachPolicy
aws organizations attach-policy --policy-id HIPAA_POLICY_ID --target-id HIPAA_OU_ID
aws organizations attach-policy --policy-id PCI_POLICY_ID --target-id PCI_OU_ID
aws organizations attach-policy --policy-id GENERAL_POLICY_ID --target-id GENERAL_OU_ID
aws organizations attach-policy --policy-id PCI_POLICY_ID --target-id HIPAA_PCI_OU_ID

Let’s take the time to examine what is happening here. We know that we have removed all permissions for all services for our root Organization, OUs, and accounts. We created policies that allow services that are compliant with HIPAA and PCI respectively. And we know that when we apply those policies to our OUs, the accounts within each OU will have access to those services. In the case of the sub OU that holds the overlapping accounts, the policies combine: the sub OU inherits the restriction to HIPAA compliant services from its parent, and applying the AllowPCIServices policy restricts it further, so those accounts can only access services that are both HIPAA and PCI compliant, as described earlier.

Conclusion

Success! We have created a new Organization, invited our accounts into that Organization, and grouped those accounts into OUs so we can ensure each group of accounts complies with the required standards. When dealing with a few accounts, working from the command line is fine; for larger numbers of accounts, it is highly recommended to script this process.

AWS Organizations helps companies manage multiple accounts from a billing and policy standpoint. The use of Organizations helps prevent accidental policy changes that violate the compliance requirements a company may have to follow. It also reduces the time and effort required to create new accounts by providing an API that allows the auto-creation of new accounts with the correct policies already attached. Users can be restricted to the accounts they need access to and blocked from the accounts they don’t. Any company that has multiple accounts can benefit from the features provided by Organizations.

About Stelligent
Stelligent is an APN Advanced Consulting Partner and holds the AWS DevOps Competency. As a technology services company that provides DevOps Automation on the Amazon Web Services (AWS) Cloud, we aim for “one-click deployment.” Our reason for being is to help our customers gain the ability to continuously deploy their software, when they want to, and with confidence. We’ve been providing DevOps Automation solutions on AWS since 2009. Follow @Stelligent on Twitter. Learn more at http://www.stelligent.com

Designing Applications for Failure

I recently had the opportunity to attend an AWS bootcamp at their Herndon, VA office, which included a short presentation by their team on Designing for Failure. It opened my eyes to the reality of application design when dealing with failure or even basic exception handling.

One of the defining characteristics between a good developer and a great one is how they deal with failures. Good developers will handle the obvious cases in their code: checking for unexpected input, catching library exceptions, and sometimes edge cases. But why do we build resilient applications at all, and what about the end user?

In this blog post, I’ll share with you the key points that a great developer follows when designing resilient applications.

Why build resilient applications?

[Screenshot: an application error page as seen by an end user]

There are two main reasons that we design applications for failure. As you can probably guess from the horrifying image above, the first reason is User Experience. It’s no secret that you will have user attrition and lost revenue if you cannot shield your end users from issues outside their control. The second reason is Business Services. All business critical systems require resiliency and the difference between a 99.7% uptime and 99.99% could be hours of lost revenue or interrupted business services.

For an application serving 1 billion requests per month, 99.7% uptime means more than 2 hours of downtime each month, versus roughly 4 minutes at 99.99%. Ouch!
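If you want to sanity-check those numbers, the arithmetic is simple; a quick Python sketch assuming a 30-day month:

hours_per_month = 30 * 24

for uptime in (0.997, 0.9999):
    downtime_minutes = (1 - uptime) * hours_per_month * 60
    print("%.2f%% uptime -> ~%.0f minutes of downtime per month" % (uptime * 100, downtime_minutes))

# 99.70% uptime -> ~130 minutes (2+ hours); 99.99% uptime -> ~4 minutes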

Werner Vogels, the CTO of Amazon Web Services once said at a previous re:Invent “Everything fails, all the time.” It’s a devastating reality and it’s something we all must accept. No matter how mathematically improbable, we simply cannot eliminate all failures. It’s how we reduce the impact of those failures that improves the overall resiliency of our applications.

Graceful Degradation

The way we reduce the impact of failure on our users and business is through graceful degradation. Conceptually it’s very simple: we want to continue operating in some degraded capacity in spite of a failure. Keeping with the premise that applications fail all the time, you’ve probably experienced degraded services without even realizing it, and that is the ultimate goal.

Caching

Caching is the first layer of defense when dealing with a failure. Depending on your application’s reliance on up-to-the-minute information, you should consider caching everything. It’s very easy for developers to reject caching because they always want the freshest information for their users. However, when the difference between a happy customer and a sad one is serving data that is a few minutes old, choose the cache.

As an example, imagine you have a fairly advanced web application. What can you cache?

  • Full HTML pages with CloudFront
  • Database records with ElastiCache
  • Page Fragments with tools such as Varnish
  • Remote API calls from your backend with ElastiCache

Retry

As applications get more complex we rely on more external services than ever before. Whether it’s a 3rd party service provider or your microservices architecture at work, failures are common and often transient. A common pattern for dealing with transient failures on these types of requests is to implement retry logic. Using exponential back off or a Fibonacci sequence you can retry for some time before eventually throwing an exception. It’s important to fail fast and not trigger rate limiting on your source, so don’t continue indefinitely.
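Here is a minimal sketch of that pattern in Python; the remote_call argument and the retry limits are assumptions you would tune for your own service:

import random
import time

def call_with_retries(remote_call, max_attempts=5, base_delay=0.5):
    """Retry a call that may fail transiently, using exponential backoff with jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return remote_call()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after a bounded number of attempts; fail fast
            # back off exponentially, with jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1))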

Rate Limiting

In the case of denial of service attacks, self-imposed or otherwise, your primary defense is rate limiting based on context. You can limit the number of requests to your application based on user data, source address, or both. By imposing a limit on requests you can improve your performance during a failure by reducing both the actual load and the load imposed by your retry logic. Also consider using an exponential or Fibonacci back off to help mitigate particularly demanding services.

For example, during a peak in capacity that cannot be met immediately, a reduction in load would allow your application’s infrastructure to respond to the demand (think auto scaling) before completely failing.
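A rough sketch of context-based rate limiting is below; it is in-memory and per-process, so a real deployment would likely back it with something shared such as ElastiCache, and the window and limit values are assumptions:

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100  # assumed limit per source

_request_log = defaultdict(deque)

def allow_request(source):
    """Return True if this source (user id, IP, etc.) is under its limit for the window."""
    now = time.time()
    window = _request_log[source]
    # drop timestamps that have aged out of the sliding window
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        return False
    window.append(now)
    return True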

Fail Fast

When your application is running out of memory, threads, or other resources, you can shorten recovery time by failing fast: return an error as soon as the problem is detected. Not only will your users be happier not waiting on your application to respond, you will also avoid cascading the delay into dependent services.

Static Fallback

Whether you’re rate limiting or simply cannot fail silently, you’ll need something to fall back to. A static fallback is a way to provide at least some response to your end users without leaving them in the wind with erroneous error output or no response at all. It’s always better to return content that makes sense in the context of the user, and you’ve probably seen this before if you’re a frequent user of sites like Reddit or Twitter.

[Screenshot: a static fallback page]

In the case of our example web application, you can configure Route53 to fall back to HTML pages and assets served from Amazon S3 with very little headache. You could set this up today!
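A rough CloudFormation sketch of that idea is below; the record names, health check, and S3 website endpoint are placeholders, and an apex domain would need an alias record instead of a CNAME:

PrimaryRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: example.com.
    Name: app.example.com.
    Type: CNAME
    TTL: '60'
    SetIdentifier: app-primary
    Failover: PRIMARY
    HealthCheckId:
      Ref: AppHealthCheck
    ResourceRecords:
    - origin.example.com
StaticFallbackRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: example.com.
    Name: app.example.com.
    Type: CNAME
    TTL: '60'
    SetIdentifier: app-fallback
    Failover: SECONDARY
    ResourceRecords:
    - app.example.com.s3-website-us-east-1.amazonaws.com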

Fail Silently

When all of your layers of protection have failed to preserve your service, it’s time to fail silently. Failing silently is when you rely on your logging, monitoring and other infrastructure to respond to your errors with the least impact to the end user. It’s better to return a 200 OK with no content and log the error on the backend than to return a 500 Internal Server Error (or similar status code), or worse yet, a nasty stack trace or log dump.
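As a minimal sketch of what that can look like, here is a Flask-style error handler; the logger configuration and what you do with the error are assumptions:

import logging

from flask import Flask

app = Flask(__name__)
logger = logging.getLogger(__name__)

@app.errorhandler(500)
def fail_silently(error):
    # log the full error for backend monitoring to pick up...
    logger.error("Unhandled error: %s", error)
    # ...but return an empty 200 so the end user never sees a stack trace
    return '', 200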

Failing Fast and You

There are two patterns you can implement to improve your ability to fail fast: circuit breaking and load shedding. Generally, you want to leverage your monitoring tools, such as CloudWatch, and your logs to detect failure early and begin mitigating the impact as soon as possible. At Stelligent, we strongly recommend automation in your infrastructure, and these two patterns are automation at its finest.

Circuit Breaking

Circuit breaking is purposefully degrading performance in light of failure events in your logging or monitoring system. You can utilize any of the degradation patterns mentioned above in the circuit. Finally, by implementing health checks into your service you can restore normal service as soon as possible.
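A bare-bones sketch of a circuit breaker follows; the thresholds are assumptions, and production implementations usually add a half-open state and tie into monitoring:

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        # while the circuit is open, skip the real call and degrade gracefully
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback()
            self.opened_at = None  # timeout elapsed; probe the real call again
            self.failures = 0
        try:
            result = func()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip the circuit
            return fallback()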

Load Shedding

Load shedding is a method of failing fast that occurs at the networking level. Like circuit breaking, you can rely on monitoring data to reroute traffic from your application to a Static Fallback that you have configured. For example, Route53 has failover support built right in that would allow you to use this pattern right away.

 

One-Button Everything in AWS

There’s an approach that we seek at Stelligent known informally as “One-Button Everything”. At times, it can be an elusive goal but it’s something that we often aim to achieve on behalf of our customers. The idea is that we want to be able to create the complete, functioning software system by clicking a single button. Since all of the work we do for our customers is on the Amazon Web Services (AWS) platform, this often results in a single Launch Stack button (as shown below).

[Image: CloudFormation Launch Stack button]

The button is a nice metaphor while also being a literal thing in AWS CloudFormation. It’s also emblematic of the principles of simplicity, comprehensiveness, and consistency (discussed later). For example, while you might be able to do the same with a single command, you would need to take into account the setup and configuration required to run that single command, which might reduce simplicity and consistency.

In this post, I describe the principles and motivation, user base, scope of a complete software system, assumptions and prerequisites, alternative scenarios, and documentation.

Principles

These are the three principles that the “one-button everything” mantra is based upon:

  • Comprehensive – The goal is to orchestrate the full solution stack, not partial implementations of the solution stack
  • Consistent – Works the same way every time. Documentation is similar across solution stacks. Once you require “one-off” implementations, the system becomes susceptible to errors
  • Simple – Few steps and dependencies. Make it difficult to make mistakes.

These three principles guide the design of these one-button systems.

The Users are Us

The users of your one-button systems are often other engineers within your organization. A tired argument you might hear is that you don’t need to create simple systems for other engineers since they’re technical too. I could not disagree more with this reasoning. As engineers, we should not be spending time on repetitive, failure-prone activities, nor putting that burden on others at scale. That belief doesn’t serve the needs of the organization, as most engineers should be spending their time providing features to the users who receive value from their work.

What is the complete software system?

A common question we get is “what makes up a complete software system?” To us, the complete software system refers to all of the infrastructure and software that composes the system. For example, this includes:

  • Networks (e.g. VPC)
  • Compute (EC2, Containers, Serverless, etc.)
  • Storage (e.g. S3, EBS, etc.)
  • Database and Data (RDS, DynamoDB, etc.)
  • Version control repositories (e.g. CodeCommit)
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
    • Building and deploying application/Service code
    • Test execution
    • Static Analysis
    • Security hardening, tests and analysis
    • Notification systems
  • Monitoring systems

 

Assumptions and Prerequisites

In order to create an effective single-button system, the following patterns are assumed:

  • Everything as Code – The application, configuration, infrastructure, data, tests, and the process to launch the system are all defined in code;
  • Everything is Versioned – All of this code must be versioned in a version-control repository;
  • Everything is Automated – The process for going from zero to working system including the workflow and the “glue code” that puts it all together is defined in code and automated;
  • Client configuration is not assumed – Ideally, you don’t want users to require a certain client-side configuration as it presents room for error and confusion.

As for prerequisites, there might be certain assumptions you document – as long as they are truly one-time only operations. For example, we included the following prerequisites in a demo system we open sourced:

Given a version-control repository, the bootstrapping and the application must be capable of launching from a single CloudFormation command and a CloudFormation button click – assuming that an EC2 Key Pair and Route 53 Hosted Zone has been configured. The demo should not be required to run from a local environment.

So, it assumes the user has created or, in this case, cloned the Git repository and that they’ve established a valid EC2 Key Pair and a Route 53 Hosted Zone. Given that, users should be able to click the Launch Stack button in their AWS account and launch the complete working system with a running deployment pipeline. In this case, the working system includes a VPC network (and associated network resources), IAM, ENI, DynamoDB, a utility EC2 instance, a Jenkins server and a running pipeline in CodePipeline. This pipeline then uses CloudFormation to provision the application infrastructure (e.g. EC2 instances, connect to DynamoDB), etc. Then, as part of the pipeline and its integration with Jenkins, it runs tasks to use Chef to configure the environment to run Node.js, and run automated infrastructure and application tests in RSpec, ServerSpec, Mocha, and Chai. All of this behavior has been provisioned, configured and orchestrated in a way so that anyone can initially click one Launch Stack button to go from zero to fully working system in less than 30 minutes. Once the initial environment is up and running, the application provisioning, configuration, deployment and testing runs in less than 10 minutes. For more on this demo environment, see below.

Decompose system based on lifecycles

While you might create a system that’s capable of recreating the system from a single launch stack button, it doesn’t mean that you apply all changes from the ground up for every change. This is because building everything from “scratch” every time erodes fast feedback. So, while you might have a single button to launch the entire solution stack, you won’t be clicking it all that much. This might sound antithetical to everything I’ve been saying so far but it’s really about viewing it as a single system that can be updated based on change frequency to logical architectural layers.

…while you might have a single button to launch the entire solution stack, you won’t be clicking it all that much.

For example, because you want quick feedback, you won’t rebuild your deployment pipeline, environment images, or the network if there is only a change to the application/service code. Each of these layers might have their own deployment pipelines that get triggered when there’s a code change (which is, often, less frequent than application code changes). As part of the application deployment pipeline, it can consume artifacts generated by the other pipelines.

This approach can take some time to get right as you still want to rebuild your system whenever there’s a code commit; you just need to be judicious in terms of what gets rebuilt depending with each change type. As illustrated in Figure 1, here are some examples of different layers that we often decompose our tech stacks into along with their typical change frequency (your product may vary):

  • Network (VPC) – Once a week
  • Storage – Once a week
  • Routing – Once a week. For example, Route 53 changes.
  • Database – Once a week
  • Deployment Pipeline – Once a week. Apply a CloudFormation stack update.
  • Environment Images – Once per day
  • Application/Service – Many times a day
  • Data – Many times a day

Figure 1 – Stack Lifecycles

As illustrated, while a single solution can be launched from one button, it’s often decomposed into a series of other buttons and commands.

 

Launching Non-CloudFormation Solutions

As mentioned, one of the nice features of CloudFormation is the ability to provide a single link to launch a stack. While the stack launch is initially driven by CloudFormation, it can also orchestrate any other type of tool you might use, for example via AWS::CloudFormation::Init metadata. This way you get the benefits of launching from a single Launch Stack button in your AWS account while leveraging integration with many other types of tools.

For example, you might have implemented your solution using Ruby, Docker, Python, etc. Let’s imagine you’re using CFNDSL (albeit a little circular, considering you’re using Ruby to generate CloudFormation). Users could still initiate the solution by clicking a single Launch Stack button, which launches a CloudFormation stack. That stack configures a client that runs CFNDSL, which uses the Ruby code to generate further CloudFormation behavior. By driving this through CloudFormation, you don’t rely on the user to properly configure the client.
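As a rough sketch of that pattern, the stack can declare AWS::CloudFormation::Init metadata on a bootstrap instance and run cfn-init from UserData; the AMI, template path, and commands below are placeholders, and a later step would apply the generated template:

BootstrapInstance:
  Type: AWS::EC2::Instance
  Metadata:
    AWS::CloudFormation::Init:
      config:
        commands:
          01_install_cfndsl:
            command: gem install cfndsl
          02_generate_template:
            command: cfndsl app.rb > /tmp/app.json
  Properties:
    ImageId: ami-12345678
    InstanceType: t2.micro
    UserData:
      Fn::Base64:
        Fn::Join:
        - ""
        - - "#!/bin/bash\n"
          - "/opt/aws/bin/cfn-init -v --stack "
          - Ref: AWS::StackName
          - " --resource BootstrapInstance --region "
          - Ref: AWS::Region
          - "\n"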

Alternatively, you can meet many of the design goals by implementing the same through a single command but, realize, there can be a potential cost in simplicity and consistency as a result.

Documentation

The documentation provided to users/engineers should be simple to understand and execute, and should make it difficult to commit errors. Figure 2 shows an example from https://github.com/stelligent/cloudformation_templates in which an engineer can view the prerequisites, supported AWS regions, an active architecture illustration, a how-to video, and finally, a Launch Stack button. You might use this as a sample for the documentation you write on behalf of the users of your AWS infrastructure.

Figure 2 – Launch Stack Documentation

Summary

You learned how you can create one button (or command) to launch your entire software system from code, the prerequisites for doing so, what makes up that software system, how you might decompose subsystems based on lifecycles, and finally, a way of documenting the instructions for launching the solution.

Stelligent is hiring! Do you enjoy working on complex problems like figuring out ways to automate all the things as part of a deployment pipeline? Do you believe in the “everything-as-code” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.

Automating Penetration Testing in a CI/CD Pipeline (Part 2)

Continuous Security: Security in the Continuous Delivery Pipeline is a series of articles addressing security concerns and testing in the Continuous Delivery pipeline. This is the sixth article in the series.

In the first post, we discussed what OWASP ZAP is, how it’s installed, and how to automate that installation with Ansible. This second article of three will drill down into how to use the ZAP server created in Part 1 for penetration testing your web-based application.

Penetration Test Script

If you recall the flow diagram (below) from the first post, we need a way to talk to ZAP so that it can trigger a test against our application. To do this we’ll use the ZAP API, wrapped in a Python script. The script will allow us to specify our ZAP server and target application server, trigger each phase of the penetration test, and report the results.

[Diagram: ZAP basic CI/CD flow]

The core ZAP API workflow is to open our proxy, access the target application, spider the application, run an automated scan against it, and fetch the results. This can be accomplished with just a handful of commands; however, our goal is to eventually get this bound into a CI/CD environment, so the script will have to be more versatile than a handful of commands.

The Python ZAP API can be easily installed via pip:

pip install python-owasp-zap-v2.4

We’ll start by breaking down what was outlined in the paragraph above. For learning purposes, these commands can easily be run from the Python interactive prompt.

from zapv2 import ZAPv2

# replace these with your own ZAP proxy and target application host:port
zap_hostname_or_ip = "192.168.1.5:8080"
target_application_url = "192.168.1.73:5000"

target = "http://%s" % target_application_url
zap = ZAPv2(proxies={'http': "http://%s" % zap_hostname_or_ip,
                     'https': "https://%s" % zap_hostname_or_ip})
# open the target through the ZAP proxy so ZAP learns about the site
zap.urlopen(target)
zap.spider.scan(target)
zap.spider.status()
# when status is >= 100, the spider has completed and we can run our scan
zap.ascan.scan(target)
zap.ascan.status()
# when status is >= 100, the scan has completed and we can fetch results
print(zap.core.alerts())

This snippet will print our results straight to STDOUT in a mostly human-readable format. To wrap all this up so that we can integrate it into an automated environment, we can change our output to JSON and accept incoming parameters for our ZAP host name and target URL. The following script takes the above commands and adds the features just mentioned.

The script can be called as follows:

./pen-test-app.py --zap-host zap_host.example.com:8080 --target app.example.com

Take note, the server that is launching our penetration test does not need to run ZAP itself, nor does it need to run the application we wish to run our pen test against.

Let’s set up a very simple web-based application that we can use to test against. This isn’t a real-world example but it works well for the scope of this article. We’ll utilize Flask, a simple Python-based HTTP server, and have it run a basic application that will simply display what was typed into the form field once submitted. The script can be downloaded here.

First Flask needs to be installed and the server started with the following:

pip install flask
python simple_server.py

The server will run on port 5000 over HTTP. Using the example command above, we’ll run our ZAP penetration test against it like so:

./pen-test-app.py --zap-host 192.168.1.5:8080 --target http://192.168.1.73:5000
Accessing http://192.168.1.73:5000
Spidering http://192.168.1.73:5000
Spider completed
Scanning http://192.168.1.73:5000
Info: Scan completed; writing results.

Please note that the ZAP host is simply a url and a port, while the target must specify the protocol, either ‘http’ or ‘https’.

The ‘pen-test-app.py’ script is just one example of the many ways OWASP ZAP can be used in an automated manner. Tests can also be written to integrate Firefox (with ZAP as its proxy) and Selenium to mimic user interaction with your application. These could be run from the same script in addition to the existing tests.
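For example, a Selenium test could route Firefox through the ZAP proxy like this; the proxy address and target URL reuse the placeholders from above:

from selenium import webdriver

zap_host, zap_port = "192.168.1.5", 8080

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # manual proxy configuration
profile.set_preference("network.proxy.http", zap_host)
profile.set_preference("network.proxy.http_port", zap_port)

driver = webdriver.Firefox(firefox_profile=profile)
driver.get("http://192.168.1.73:5000")  # this traffic is now visible to ZAP
driver.quit()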

Scan and Report the Results

The ZAP API will return results to the ‘pen-test-app.py’ script, which in turn will write them to a JSON file, ‘results.json’. These results could be scanned for risk severities with something like “grep -ie ‘high’ -e ‘medium’ results.json”, but that does not give us much granularity in determining which tests are reporting errors, nor whether they are critical enough to fail an entire build pipeline.

This is where a tool called Behave comes into play. Behave uses the Gherkin language, which allows the user to write test scenarios in a very human-readable format.

Behave can be easily installed with pip:

pip install behave

Once installed our test scenarios are placed into a feature file. For this example we can create a file called ‘pen_test.feature’ and create a scenario.

Feature: Pen test the Application
  Scenario: The application should not contain Cross Domain Scripting vulnerabilities
    Given we have valid json alert output
    When there is a cross domain source inclusion vulnerability
    Then none of these risk levels should be present
      | risk |
      | Medium |
      | High |

The above scenario gets broken down into steps. The ‘Given’, ‘When’ and ‘Then’ will each correlate to a portion of Python code that will test each statement. The ‘risk’ portion is a table that will be passed to our ‘Then’ statement. This can be read as: “If the scanner produced valid JSON, succeed if there are no cross-domain vulnerabilities, or only ones with ‘Low’ severity.”

With the feature file in place, each step must now be written. A directory must be created called ‘steps’. Inside the ‘steps’ directory we create a file with the same name as the feature file but with a ‘.py’ extension instead of a ‘.feature’ extension. The following example contains the code for each step above to produce a valid test scenario.

import json
import re
import sys

from behave import *

results_file = 'results.json'

@given('we have valid json alert output')
def step_impl(context):
    with open(results_file, 'r') as f:
        try:
            context.alerts = json.load(f)
        except Exception as e:
            sys.stdout.write('Error: Invalid JSON in %s: %s\n' %
                             (results_file, e))
            assert False

@when('there is a cross domain source inclusion vulnerability')
def step_impl(context):
    # collect alerts whose name indicates a cross-domain/cross-site issue
    pattern = re.compile(r'cross(?:-|\s+)(?:domain|site)', re.IGNORECASE)
    matches = list()

    for alert in context.alerts:
        if pattern.match(alert['alert']) is not None:
            matches.append(alert)
    context.matches = matches
    assert True

@then('none of these risk levels should be present')
def step_impl(context):
    high_risks = list()

    # risk levels to fail on, taken from the scenario's table
    risk_list = list()
    for row in context.table:
        risk_list.append(row['risk'])

    # record each unique matching alert at one of those risk levels
    for alert in context.matches:
        if alert['risk'] in risk_list:
            if not any(n['alert'] == alert['alert'] for n in high_risks):
                high_risks.append({'alert': alert['alert'],
                                   'risk': alert['risk']})

    if len(high_risks) > 0:
        sys.stderr.write("The following alerts failed:\n")
        for risk in high_risks:
            sys.stderr.write("\t%s: %s\n" % (risk['alert'], risk['risk']))
        assert False

    assert True

To run the above test simply type ‘behave’ from the command line.

behave
 
Feature: Pen test the Application # pen_test.feature:1

  Scenario: The application should not contain Cross Domain Scripting vulnerabilities # pen_test.feature:7
    Given we have valid json alert output # steps/pen_test.py:14 0.001s
    When there is a cross domain source inclusion vulnerability # steps/pen_test.py:25 0.000s
    Then none of these risk levels should be present # steps/pen_test.py:67 0.000s
      | risk |
      | Medium |
      | High |

1 feature passed, 0 failed, 0 skipped
1 scenario passed, 0 failed, 0 skipped
3 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.001s

We can clearly see what was run and each result. If this is run from a Jenkins server, the return code will be read and the job will succeed. If a step fails, behave will return non-zero, triggering Jenkins to fail the job. If the job fails, it’s up to the developer to investigate the pipeline, find the point where it failed, log in to the Jenkins server and view the console output to see which test failed. This may not be the most practical method. We can tell behave that we want our output in JSON so that another script can consume it, reformat it into something an existing reporting mechanism could use, and upload it to a central location.

To change behave’s behavior to dump JSON:

behave --no-summary --format json.pretty > behave_results.json

A reporting script can either read the behave_results.json file or read behave’s output directly from a pipe. We’ll discuss this further in the followup post.

Summary

If you’ve been following along since the first post, we have learned how to set up our own ZAP service, have the ZAP service penetration test a target web application and examine the results. This may be a suitable scenario for many systems. However, integrating this into a full CI/CD pipeline would be the optimal and most efficient use of this.

In part three we will delve into how to fully integrate ZAP so that not only will your application involve user, acceptance and capacity testing, it will now pass through security testing before reaching your end users.

Stelligent is hiring! Do you enjoy working on complex problems like security in the CD pipeline? Do you believe in the “everything-as-code” mantra? If your skills and interests lie at the intersection of DevOps automation and the AWS cloud, check out the careers page on our website.

Serverless Delivery: Orchestrating the Pipeline (Part 3)

In the second post of this series, we focused on how to get our serverless application running with Lambda, API Gateway and S3. Our application is now able to run on a serverless platform, but we still have not applied the fundamentals of continuous delivery that we talked about in the first part of this series.

In this third and final part of this series on serverless delivery, we will implement a continuous delivery pipeline using serverless technology. Our pipeline will be orchestrated by CodePipeline with actions implemented in Lambda functions. The CodePipeline resource, as well as the Lambda functions that support it, are defined in the same CloudFormation stack that we looked at last week.

Visualize The Goal

To help visualize what we are building, here is a picture of what the final pipeline looks like.

[Image: the completed CodePipeline pipeline]

If you’re new to CodePipeline, let’s go over a few important terms:

  • Job – An execution of the pipeline.
  • Stage – A group of actions in the pipeline. Stages never run in parallel to each other and only one job can actively be running in a stage at a time. If a stage is currently running and a new job arrives at the stage it will block until the prior job completes. If multiple new jobs arrive, only the newest will be run while the rest will be dropped.
  • Action – A step to be performed. Actions can be in parallel or series to each other. In this pipeline, all our actions will be implemented by Lambda invocations.
  • Artifact – Each action can declare input and output artifacts that will be stored in an S3 bucket. These are objects that it will either expect to have before it runs, or objects that it will produce and make available after it runs.

The pipeline we have built for our application consists of the following four stages:

  • Source – The source stage has only one action to acquire the source code for later stages.
  • Commit – The commit stage has two actions that are responsible for:
    • Resolving project dependencies
    • Processing (e.g., compile, minify, uglify) the source code
    • Static analysis of the code
    • Unit testing of the application
    • Packaging the application for subsequent deployment
  • Acceptance – The acceptance stage has actions that will:
    • Update the Lambda function from latest source
    • Update S3 bucket with latest artifacts
    • Update API Gateway
    • End-to-end testing of the application
  • Production – The production stage performs the same steps as the Acceptance stage but against the production Lambda, S3 and API Gateway resources

Here is a more detailed picture of the pipeline. We will spend the rest of this post breaking down each step of the pipeline.

pipeline-overview

Start with Source Stage

Diagram Step: 1

The source stage only has one action in it, a 3rd party action provided by GitHub. The action will register a hook with the repo that you provide to kick off a new job for the pipeline whenever code is pushed to the GitHub repository. Additionally, the action will pull the latest code from the branch you specified and zip it up into an object in an S3 bucket for later actions to reference.

{
  "Name": "Source",
  "Actions": [
    {
      "InputArtifacts": [],
      "Name": "Source",
      "ActionTypeId": {
        "Category": "Source",
        "Owner": "ThirdParty",
        "Version": "1",
        "Provider": "GitHub"
      },
      "Configuration": {
        "Owner": "stelligent",
        "Repo": "dromedary",
        "Branch": "serverless",
        "OAuthToken": "XXXXXX"
      },
      "OutputArtifacts": [
        {
          "Name": "SourceOutput"
        }
      ],
      "RunOrder": 1
    }
  ]
}


This approach helps solve a common challenge with source code management using Lambda. Obviously no one wants to upload code through the console, so many end up using CloudFormation to manage their Lambda functions. The challenge is that the CloudFormation Lambda resource expects your code to be zipped in an S3 bucket. This means you either need to use S3 as the “source of truth” for your source code, or have a process to keep it in sync from the real “source of truth”. By building a pipeline, you can keep your source in GitHub and use the next actions that we are about to go through to deploy the Lambda function.

Build from Commit Stage

Diagram Steps: 2,3,4

The commit stage of the pipeline consists of two actions that are implemented with Lambda invocations. The first action is responsible for resolving the application dependencies via NPM. This can be an expensive operation taking many minutes, and is needed by many downstream actions, so the dependencies are zipped up and become an output artifact of this first action. Here are the details of the action:

  • Download & Unzip – Get the source artifact from S3 and unzip into a temp directory
  • Run NPM – Run npm install in the extracted source folder
  • Zip & Upload – Zip up the source folder with its dependencies in node_modules and upload the artifact to S3

Downloading the input artifact is accomplished with the following code:

var artifact = null;
jobDetails.data.inputArtifacts.forEach(function (a) {
  if (a.name == artifactName && a.location.type == 'S3') {
    artifact = a;
  }
});

if (artifact != null) {
  var params = {
    Bucket: artifact.location.s3Location.bucketName,
    Key: artifact.location.s3Location.objectKey
  };
  return getS3Object(params, destDirectory);
} else {
  return Promise.reject("Unknown Source Type:" + JSON.stringify(sourceOutput));
}
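
The getS3Object helper referenced above is not shown in the snippet; a minimal sketch of what such a download helper might look like, assuming the AWS SDK for JavaScript (v2) and Node’s fs module, is below. It is only an illustration, not the project’s actual implementation.

// Hypothetical sketch of the download helper referenced above
var AWS = require('aws-sdk');
var fs = require('fs');
var path = require('path');

var s3 = new AWS.S3();

function getS3Object(params, destDirectory) {
  return new Promise(function (resolve, reject) {
    var destFile = path.join(destDirectory, path.basename(params.Key));
    s3.getObject(params).createReadStream()
      .on('error', reject)
      .pipe(fs.createWriteStream(destFile))
      .on('error', reject)
      .on('close', function () {
        resolve(destFile);   // caller can then unzip the downloaded artifact
      });
  });
}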

Likewise, the output artifact is uploaded with the following:

var artifact = null;
jobDetails.data.outputArtifacts.forEach(function (a) {
  if (a.name == artifactName && a.location.type == 'S3') {
    artifact = a;
  }
});

if (artifact != null) {
  var params = {
    Bucket: artifact.location.s3Location.bucketName,  
    Key: artifact.location.s3Location.objectKey
  };
  return putS3Object(params, zipfile);
} else {
  return Promise.reject("Unknown Source Type:" + JSON.stringify(sourceOutput));
}
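
And the putS3Object helper could be sketched along the same lines, again only as an illustration of the approach:

// Hypothetical sketch of the upload helper used above
var AWS = require('aws-sdk');
var fs = require('fs');

var s3 = new AWS.S3();

function putS3Object(params, zipfile) {
  return new Promise(function (resolve, reject) {
    params.Body = fs.createReadStream(zipfile);   // stream the zipped artifact from disk
    s3.upload(params, function (err, data) {      // upload handles multipart for large artifacts
      if (err) {
        reject(err);
      } else {
        resolve(data);
      }
    });
  });
}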


Diagram Steps: 5,6,7

The second action in the commit stage is responsible for acquiring the source and dependencies, processing the source code, performing static analysis, running unit tests and packaging the output artifacts. This is accomplished by a Lambda action that invokes a Gulp task on the project. This allows the details of these steps to be defined in Gulp alongside the source code, so they can change at a different pace than the pipeline. Here is the CloudFormation for this action:

{
  "InputArtifacts":[
    {
      "Name": "SourceInstalledOutput"
    }
  ],
  "Name":"TestAndPackage",
  "ActionTypeId":{
    "Category":"Invoke",
    "Owner":"AWS",
    "Version":"1",
    "Provider":"Lambda"
  },
  "Configuration":{
    "FunctionName":{
      "Ref":"CodePipelineGulpLambda"
    },
    "UserParameters": "task=package&DistSiteOutput=dist/site.zip&DistLambdaOutput=dist/lambda.zip"
  },
  "OutputArtifacts": [
    {
      "Name": "DistSiteOutput"
    },
    {
      "Name": "DistLambdaOutput"
    }
  ],
  "RunOrder":2
}

Notice the UserParameters setting defined in the resource above. CodePipeline treats it as an opaque string that is passed into the Lambda function. I chose a query string format to pass multiple values into the Lambda function. The task parameter defines which Gulp task to run, and the DistSiteOutput and DistLambdaOutput parameters tell the Lambda function where to find the artifacts to upload to S3.
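
For illustration, parsing that opaque string inside the Lambda function could be as simple as the following sketch using Node’s built-in querystring module (the handler shown here is a stub, not the project’s actual function):

var querystring = require('querystring');

exports.handler = function (event, context) {
  var jobData = event["CodePipeline.job"].data;

  // e.g. "task=package&DistSiteOutput=dist/site.zip&DistLambdaOutput=dist/lambda.zip"
  var userParams = querystring.parse(jobData.actionConfiguration.configuration.UserParameters);

  console.log('gulp task to run: ' + userParams.task);
  console.log('site artifact path: ' + userParams.DistSiteOutput);
  console.log('lambda artifact path: ' + userParams.DistLambdaOutput);

  // ... invoke the gulp task and upload the artifacts ...
  context.succeed('parsed user parameters');
};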

For more details on how to implement CodePipeline actions in Lambda, check out the entire source of these functions at index.js or read the post Mocking CodePipeline with Lambda.

Test in Acceptance Stage

Diagram Steps: 8,9,10,11

The Acceptance stage is responsible for acquiring the packaged application artifacts and deploying the application to a test environment and then running a Gulp task to execute the end-to-end tests against that environment. Let’s look at the details of each of these four actions in this stage:

  • Deploy App – The Lambda function for the application is updated with the code from the Commit stage as a new version. Additionally, the test alias is moved to this new version (a rough sketch of this step follows the list below). As you may recall from part 2, this alias is used by the test stage of the API Gateway to determine which version of the Lambda function to invoke.

LambdaVersionAliases

  • Deploy API – At this point, this is a no-op. My goal is to have this action use a Swagger file in the source code to update the API Gateway resources, methods, and integrations. That would allow API changes to take effect on each build, whereas the current solution requires updating the CloudFormation stack outside the pipeline to change the API Gateway.
  • Deploy Site – This action publishes all static content (HTML, CSS, JavaScript and images) to a test S3 bucket. Additionally, it publishes a config.json file to the bucket that the application uses to determine the endpoint for the APIs. Here’s a sample of the file that is created:
{
  "apiBaseurl":"https://rue1bmchye.execute-api.us-west-2.amazonaws.com/test/",
  "version":"20160324-231829"
}
  • End-to-end Testing – This action invokes a Gulp task to run the functional tests. Additionally, it sets an environment variable with the endpoint URL of the application for the Gulp process to test against.
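
As mentioned in the Deploy App item above, here is a rough sketch of how that first action might update the function code and move the test alias using the AWS SDK for JavaScript. The function, bucket and key names are placeholders, not the project’s actual values.

var AWS = require('aws-sdk');
var lambda = new AWS.Lambda();

// Hypothetical sketch: publish a new version from the packaged artifact and move the
// "test" alias to it. Names below are placeholders.
function deployTestVersion(functionName, bucket, key) {
  return lambda.updateFunctionCode({
    FunctionName: functionName,
    S3Bucket: bucket,
    S3Key: key,
    Publish: true                       // publish a new version as part of the update
  }).promise().then(function (updated) {
    return lambda.updateAlias({
      FunctionName: functionName,
      Name: 'test',                     // the alias the test API Gateway stage points at
      FunctionVersion: updated.Version
    }).promise();
  });
}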

Sidebar: Continuation Token

One challenge of using Lambda for actions is the current 300 second function execution timeout limit. If you have an action that will take longer than 300 seconds (e.g., launching a CloudFormation stack) you can utilize the continuation token. A continuation token is an opaque value that you can return to CodePipeline to indicate that you are not complete with your action yet. CodePipeline will then reinvoke your action, passing in the continuation token you provided in the prior invocation.

The following code uses UserParameters as the maximum number of attempts and continuationToken as the count of attempts so far. If the action needs more time, it compares maxAttempts with priorAttempts and, if there are attempts remaining, calls into CodePipeline to signal success, passing a continuation token to indicate that the action needs to be reinvoked.

var jobData = event["CodePipeline.job"].data;
var maxAttempts = parseInt(jobData.actionConfiguration.configuration.UserParameters) || 0;
var priorAttempts = parseInt(jobData.continuationToken) || 0;

if(priorAttempts < maxAttempts) {
    console.log("Retrying later.");

    var params = {
        jobId: event["CodePipeline.job"].id,
        continuationToken: (priorAttempts+1).toString()
    };
    // A callback is needed for the SDK to actually send the request
    codepipeline.putJobSuccessResult(params, function(err) {
        if(err) {
            context.fail(err);
        } else {
            context.succeed("Retrying later.");
        }
    });
}

Deploy from Production Stage

The Production stage uses the same action definitions from the Acceptance stage to deploy and test the application. The only difference is that it passes in the production S3 bucket name and Lambda ARN to deploy to.

I spent time considering how to do a Blue/Green deployment with this environment. Blue/Green deployment is an approach to reduce deployment risk by launching a duplicate environment for code changes (green environment) and then cutting over traffic from the existing (blue environment) to the new environment. This also affords a safe and quick rollback by switching traffic back to the old (blue) environment.

I looked into doing a DNS based Blue/Green using Route53 Resource Records. This would be accomplished by creating a new API Gateway and Lambda function for each job and using weighted routing to move traffic over from the old API Gateway to the new API Gateway.

I’m not convinced this level of complexity would provide much value, however, because given the way Lambda manages versions and API Gateway manages deployments, you can roll changes back very quickly by moving the Lambda version alias. One limitation, though, is that you cannot do a canary deployment with a single API Gateway and Lambda version aliases. I’m curious what your thoughts are on this; ping me on Twitter @Stelligent with #ServerlessDelivery.

Sidebar: Gulp + CloudFormation

You’ll also notice that there is a gulpfile.js in the dromedary-serverless repo to make it easier to launch and manage the CloudFormation stack. Specifically, you can run gulp pipeline:up to launch the stack, gulp pipeline:wait to wait for the pipeline creation to complete and gulp pipeline:status to see the status of the stack and its outputs. This code has been factored out into its own repo named serverless-pipeline if you’d like to add this type of integration between Gulp and CloudFormation in your own project.
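
Purely as an illustration of this style of Gulp-to-CloudFormation integration (the real implementation lives in the serverless-pipeline repo), a launch task might look roughly like this; the stack name and template path below are placeholders:

var gulp = require('gulp');
var fs = require('fs');
var AWS = require('aws-sdk');

var cloudformation = new AWS.CloudFormation();

// Rough equivalent of a "pipeline:up" style task: create the CloudFormation stack
gulp.task('pipeline:up', function () {
  return cloudformation.createStack({
    StackName: 'dromedary-serverless-pipeline',                        // placeholder stack name
    TemplateBody: fs.readFileSync('pipeline/cfn/main.json', 'utf8'),   // placeholder template path
    Capabilities: ['CAPABILITY_IAM']
  }).promise();
});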

Try it out!

Want to experiment with this stack in your own account? The CloudFormation templates are available for you to run with the link below. First, you’ll want to fork the dromedary repository into your GitHub account. Then you’ll need to provide the following parameters to the stack:

  • Hosted Zone – you’ll need to setup a Route53 hosted zone in your account before launching the stack for the Route53 resource records to be created in.
  • Test DNS Name – a fully qualified hostname (within the Hosted Zone you created) for the test resources (e.g., test.example.com).
  • Production DNS Name – a fully qualified hostname (within the Hosted Zone you created) for the production resources (e.g., prod.example.com).
  • OAuth2 Token – your OAuth2 token (see here for details)
  • User Name – your GitHub username

stack-parameters

Conclusion

In this series, we have addressed how to achieve the fundamentals of continuous delivery in a serverless environment. Let’s review those fundamentals and how we addressed them:

  • Continuous – We used CodePipeline to run a series of actions against every commit to our GitHub repository.
  • Quality – We built static analysis, unit tests and end-to-end tests into our pipeline and ran them for every commit.
  • Automated – The provisioning of the pipeline and the application was done from a single CloudFormation stack
  • Reproducible – Other than creation of a Route53 Hosted Zone, there were no prerequisites to running this CloudFormation stack
  • Serverless – All the tools chosen were AWS managed services, including Lambda, API Gateway, S3, Route53 and CodePipeline. No servers were harmed in the making of this series.

Please follow us on Twitter to be informed of future articles on Serverless Delivery and other exciting topics.  Also, keep your eye out for a new book set to be released later this year by Stelligent CTO, Paul Duvall, on Continuous Delivery in AWS – which will contain a chapter on serverless delivery.


Resources

Serverless Delivery: Bootstrapping the Pipeline (Part 2)

In the first of this three part series on Serverless Delivery, we took a look at the high level architecture of running a continuous delivery pipeline with CodePipeline + Lambda. Our objective is to run the Dromedary application in a serverless environment with a serverless continuous delivery pipeline.

Before we can build the pipeline, we need to have the platform in place to deploy our application to. In this second part of the series we will look at what changes need to be made to a Node.js Express application to run in Lambda and the CloudFormation templates needed to create the serverless resources to host our application.

Prepare Your Application

A Lambda function takes in an event object containing data elements mapped to it in the API Gateway Integration Request. An application using Express in Node.js, however, expects its requests to be initiated from an HTTP request on a socket. In order to run your Express application as a Lambda function, you’ll need some code to mediate between the two frameworks.

Although the best approach would be to have your application natively support the Lambda event, this may not always be feasible. Therefore, I have created a small piece of code to serve as a mediator and put it outside of the Dromedary application in its own module named lambda-express for others to leverage.

lambda-express

Install the module with npm install --save lambda-express and then use it in your Express application to define the Lambda handler:

var lambdaExpress = require('lambda-express');
var express = require('express');
var app = express();

// ... initialize the app as usual ...

// create a handler function that maps lambda inputs to express
exports.handler = lambdaExpress.appHandler(app);

In the dromedary application, this is available in a separate index.js file. You’ll also notice that it passes a callback function rather than the Express app to the appHandler function. This allows it to use information from the event to configure the application, in this case via environment variables:

exports.handler = lambdaExpress.appHandler(function(event, context) {
  process.env.DROMEDARY_DDB_TABLE_NAME = event.ddbTableName;

  var app = require('./app.js');
  return app;
});

You now have an Express application that will be able to respond to Lambda events that are generated from API Gateway. Now let’s look at what resources need to be defined in your CloudFormation templates. For the dromedary application, these templates are defined in a separate repository named dromedary-serverless in the pipeline/cfn directory.

Define Website Resources

Sample – site.json

Buckets need to be defined for test and production stages of the pipeline to host the static content of the application. This includes the HTML, CSS, images and any JavaScript that will run in the browser. Each bucket will need a resource like what you see below.

"TestBucket" : {
  "Type" : "AWS::S3::Bucket",
  "Properties" : {
    "AccessControl" : "PublicRead",
    "BucketName" : “www-test.mysite.com”,
    "WebsiteConfiguration" : {
      "IndexDocument" : "index.html"
    }
  }
}

Here are the important pieces to pay attention to:

  • AccessControl – set to PublicRead assuming this is a public website.
  • WebsiteConfiguration – add an IndexDocument entry to define the home page for the bucket.
  • BucketName – needs to match exactly the Name of the Route53 ResourceRecord you create. For example, if I’m going to set up a DNS record for www-test.mysite.com, then the bucket name should also be www-test.mysite.com.

We will also want Route53 resource records for each of the test and production buckets. Here is a sample record:

"TestSiteRecord": {
  "Type": "AWS::Route53::RecordSetGroup",
  "Properties": {
    "HostedZoneId": “Z00ABC123DEF”,
    "RecordSets": [{
      "Name": “www-test.mysite.com.”,
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": “Z3BJ6K6RIION7M”,
        "DNSName": “s3-website-us-west-2.amazonaws.com"
      }
    }]
  }
}

Make sure that your record does the following:

  • Name – must be the same as the bucket name above with a period at the end.
  • HostedZoneId – should be specific to your account and for the zone you are hosting (mysite.com in this example).
  • AliasTarget – references the zone id and endpoint for S3 in the region where you created the bucket. The zone ids and endpoints can be found in the AWS General Reference Guide.

Declare Lambda Functions

Sample – app.json

Lambda functions will need to be declared for test and production stages of the pipeline to serve the Express application. Each function will only be stubbed out in the CloudFormation template so the API Gateway resources can reference it. Each execution of a CodePipeline job will deploy the latest version of the code as a new version of the function. Here is a sample declaration of the Lambda function in CloudFormation:

"TestAppLambda": {
  "Type" : "AWS::Lambda::Function",
  "Properties" : {
    "Code" : {
      "ZipFile": { "Fn::Join": ["n", [
        "exports.handler = function(event, context) {",
        " context.fail(new Error(500));",
        "};"
      ]]}
    },
    "Description" : "serverless application",
    "Handler" : "index.handler",
    "MemorySize" : 384,
    "Timeout" : 10,
    "Role" : {“Ref”: “TestAppLambdaTrustRole”},
    "Runtime" : "nodejs"
  }
}

Notice the following about the Lambda resource:

  • ZipFile – the default implementation of the function is provided inline. Notice it just returns a 500 error. This will be replaced by real code when CodePipeline runs.
  • MemorySize – this is the only control you have over the system resources allocated to your function. CPU performance is determined by the amount of memory you allocate, so if you need more CPU, increase this number. Your cost is directly related to this number as is the duration of each invocation. There is a sweet spot you need to find where you get the shortest durations for the system resources.
  • Timeout – max time (in seconds) for a given invocation of the function to run before it is forcibly terminated. The maximum value for this is 300 seconds.
  • Role – reference the ARN of the IAM role that you want to assign to your function when it runs. You’ll want to have "Principal":{"Service":["lambda.amazonaws.com"]} in the AssumeRolePolicyDocument to grant the Lambda service access to the sts:AssumeRole action. You’ll also want to include in the policy access to CloudWatch Logs with "Action": ["logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents"]

Define API Gateway and Stages

Sample – api-gateway.json

Our Lambda function requires something to receive HTTP requests and deliver them as Lambda events. Fortunately, the AWS API Gateway is a perfect solution for this need. A single API Gateway definition with two stages, one for test and one for production will be defined to provide the public access to your Lambda function defined above. Unfortunately, CloudFormation does not have support yet for API Gateway. However, Andrew Templeton has created a set of custom resources that do a great job of filling the gap. For each package, you will need to create a Lambda function in your CloudFormation template, for example:

"ApiGatewayRestApi": {
  "Type" : "AWS::Lambda::Function",
  "Properties" : {
    "Code" : {
      "S3Bucket": “dromedary-serverless-templates”,
      "S3Key": "cfn-api-gateway-restapi.zip"
    },
    "Description" : "Custom CFN Lambda",
    "Handler" : "index.handler",
    "MemorySize" : 128,
    "Timeout" : 30,
    "Role" : { "Ref": "ApiGatewayCfnLambdaRole" },
    "Runtime" : "nodejs"
  }
}


Make sure the role that you create and reference from the Lambda function above contains policy statements giving it access to apigateway:* actions, as well as granting it iam:PassRole on the role defined for API integration.

{
  "Effect": "Allow",
  "Resource": [
    { "Fn::GetAtt": [ "ApiIntegrationCredentialsRole", "Arn" ] }
  ],
  "Action": [
    "iam:PassRole"
  ]
}


Defining the API Gateway above consists of the following six resources. For each one, I will highlight only the important properties to be aware of as well as a picture from the console of what the CloudFormation resource creates:

  1. cfn-api-gateway-restapi – this is the top level API definition.
    apigw-api
    • Name – although not required, you ought to provide a unique name for the API
  2. cfn-api-gateway-resource – a resource (or path) for the API. In the reference application, I’ve created just one root resource that is a wildcard and captures all sub paths.  The sub path is then passed into lambda-express as a parameter and mapped into a path that Express can handle.
     apigw-resource
    • PathPart – define a specific path of your API, or {subpath}  to capture all paths as a variable named subpath
    • ParentId – This must reference the RootResourceId from the restapi resource
      "ParentId": { "Fn::GetAtt": [ "RestApi", "RootResourceId" ] }
  3. cfn-api-gateway-method – defines the contract for a request to an HTTP method (e.g., GET or POST) on the path created above.
    apigw-method
    • HttpMethod – the method to support (GET)
    • RequestParameters – a map of parameters on the request to expect and pass down to the integration
      "RequestParameters": {
        "method.request.path.subpath": true
      }
  4. cfn-api-gateway-method-response – defines the contract for the response from an HTTP method defined above.
     apigw-method-response
    • HttpMethod – the method to support
    • StatusCode – the code to return for this response (e.g., 200)
    • ResponseParameters – a map of the headers to include in the response
      "ResponseParameters": {
        "method.response.header.Access-Control-Allow-Origin": true,
        "method.response.header.Access-Control-Allow-Methods": true,
        "method.response.header.Content-Type": true
      }
  5. cfn-api-gateway-integration – defines where to send the request for a given method defined above.
    apigw-integration
    • Type – for Lambda function integration, choose AWS
    • IntegrationHttpMethod – for Lambda function integration, choose POST
    • Uri – the AWS service URI to integrate with. For Lambda use the example below. Notice that the function name and version are not in the URI, but rather there are variables in their place. This way we can allow the different stages (test and production) to control which Lambda function and which version of that function to call:
      arn:aws:apigateway:us-west-2:lambda:path/2015-03-31/functions/arn:aws:lambda:us-west-2::function:${stageVariables.AppFunctionName}:${stageVariables.AppVersion}/invocations
    • Credentials – The role to run as when invoking the Lambda function
    • RequestParameters – The mapping of parameters from the request to integration request parameters
      "RequestParameters": {
        "integration.request.path.subpath": "method.request.path.subpath"
      }
    • RequestTemplates – The template for the JSON to pass to the Lambda function. This template captures all the context information from API Gateway that lambda-express will need to create the request that Express understands:
      "RequestTemplates": {
        "application/json": {
          "Fn::Join": ["n",[
            "{",
            " "stage": "$context.stage",",
            " "request-id": "$context.requestId",",
            " "api-id": "$context.apiId",",
            " "resource-path": "$context.resourcePath",",
            " "resource-id": "$context.resourceId",",
            " "http-method": "$context.httpMethod",",
            " "source-ip": "$context.identity.sourceIp",",
            " "user-agent": "$context.identity.userAgent",",
            " "account-id": "$context.identity.accountId",",
            " "api-key": "$context.identity.apiKey",",
            " "caller": "$context.identity.caller",",
            " "user": "$context.identity.user",",
            " "user-arn": "$context.identity.userArn",",
            " "queryString": "$input.params().querystring",",
            " "headers": "$input.params().header",",
            " "pathParams": "$input.params().path",",
            "}"
          ] ]
        }
      }
  6. cfn-api-gateway-integration-response – defines how to send the response back to the API client
     apigw-integration-response
    • ResponseParameters – a mapping from the integration response to the method response declared above. Notice the CORS headers that are necessary since the hostname for the API Gateway is different from the hostname provided in the Route53 resource record for the S3 bucket. Without these, the browser will deny AJAX requests from the site to these APIs. You can read more about CORS in the API Gateway Developer Guide. Also notice that the response Content-Type is pulled from the contentType attribute in the JSON object returned from the Lambda function:
      "ResponseParameters": {
        "method.response.header.Access-Control-Allow-Origin": "'*'",
        "method.response.header.Access-Control-Allow-Methods": "'GET, OPTIONS'",
        "method.response.header.Content-Type": "integration.response.body.contentType"
      }
    • ResponseTemplates – a template of how to create the response from the Lambda invocation. In the reference application, the Lambda function returns a JSON object with a payload attribute containing a Base64 encoding of the response payload:
      "ResponseTemplates": {
        "application/json": "$util.base64Decode( $input.path('$.payload') )"
      }


Additionally, there are two deployment resources created, one for test and one for production. Here is an example of one:

  • cfn-api-gateway-deployment – a deployment has a name that will be the prefix for all resources defined.
     apigw-deployment
    • StageName – “test” or “prod”
    • Variables – a list of variables for the stage. This is where the function name and version are defined for the integration URI defined above:
      "Variables": {
        "AppFunctionName": "MyFunctionName”,
        "AppVersion": "prod"
      }


Deploy via Swagger

I would prefer to replace most of the above API Gateway CloudFormation with a Swagger file in the application source repository and have the pipeline use the import tool to create API Gateway deployments from the Swagger. There are a few challenges with this approach. First, the creation of the Swagger file requires including the AWS extensions, which has a bit of a learning curve. This challenge is made easier by the fact that you can create the API Gateway via the console and then export the Swagger. The other challenge is that the import tool is a Java-based application that requires Maven to run. This may be difficult to get working in a Lambda invocation from CodePipeline, especially given the 300 second timeout. I will, however, be spending some time researching this option and will blog about the results.

Stay Tuned!

Now that we have all the resources in place to deploy our application to, we can build a serverless continuous delivery pipeline with CodePipeline and Lambda. Next week we conclude this series with the third and final part looking at the details of the CloudFormation template for the pipeline and each stage of the pipeline as well as the Lambda functions that support them. Be sure to check it out!

Resources

Serverless Delivery: Architecture (Part 1)

If your application tech stack doesn’t need servers, why should your continuous delivery pipeline? Serverless applications deserve serverless delivery!

The software development discipline of continuous delivery has had a tremendous impact on decreasing the cost and risk of delivering changes while simultaneously increasing code quality by ensuring that software systems are always in a releasable state. However, when applying the tools and techniques that exist for this practice to serverless application frameworks and platforms, sometimes existing toolsets do not align well with these new approaches. This post is the first in a three-part series that looks at how to implement the same fundamental tenets of continuous delivery while utilizing tools and techniques that complement the serverless architecture in Amazon Web Services (AWS).

Here are the requirements of the serverless delivery pipeline:

  • Continuous – The pipeline must be capable of taking any commit on master that passes all test cases to production
  • Quality – The pipeline must include unit testing, static code analysis and functional testing of the application
  • Automated – The provisioning of the pipeline and the application must be done from a single CloudFormation command
  • Reproducible – The CloudFormation template should be able to run on a new AWS account with no additional setup other than creation of a Route53 Hosted Zone
  • Serverless – All layers of the application must run on platforms that meet the definition of serverless as described in the next section

What is Serverless?

What exactly is a serverless platform? Obviously, there is still hardware at some layer in the stack to host the application. However, what Amazon has provided with Lambda is a platform where developers no longer need to think about the following:

  • Operating System – no need to select, secure, configure, administer or patch the OS
  • Servers – no cost risk of over-provisioning and no performance risk of under-provisioning
  • Capacity – no need to monitor utilization and scale capacity based on load
  • High Availability – compute resources are available across multiple AZs

In summary, a serverless platform is one on top of which an application can be deployed without having to provision or administer any of the resources within the platform. Just show up with some code and the platform handles all the ‘ilities’.

app-overview

The diagram above highlights the components that are used in this serverless application. Let’s look at each one individually:

  • Node.js + Express – In the demo application for this blog series, we will be deploying a Node.js JavaScript application that leverages the Express framework into Lambda. There is a small bit of “glue” code that we will highlight later in the series to adapt the Lambda contract to the Express contract.
  • Lambda – This is where your application logic runs. You deploy your code here but do not need to specify the number of servers or size of the servers to run the code on. You only pay for the number of requests and amount of time those requests take to execute.
  • API Gateway – The API Gateway exposes your Lambda function at an HTTP endpoint. It provides capabilities such as authorization, policy enforcement, rate limiting and data transformation as a service that is entirely managed by Amazon.
  • DynamoDB – Dynamic data is stored in a DynamoDB table. DynamoDB is a NoSQL datastore that has both infinitely scalable storage capacity and throughput capacity which is entirely managed by Amazon.
  • S3 – All static content including HTML, CSS, images and client-side JavaScript is stored in an S3 bucket and served as a static website directly from S3.

The diagram below compares the pricing for running a Node.js application with Lambda and API Gateway versus a pair of EC2 instances and an ELB. Notice that for the m4.large, the break even is around two million requests per day. It is important to mention that 98.8% of the cost of the serverless deployment is the API Gateway. The cost of running the application out of Lambda is insignificant relative to the costs in using API Gateway.

serverless-app-cost

This cost analysis shows that applications/environments with low transaction volume can realize cost savings by running on Lambda + API Gateway, but the cost of API Gateway will become cost prohibitive at higher scale.

What is Serverless Delivery?

Serverless delivery is just the application of serverless platforms to achieve continuous delivery fundamentals. Specifically, a serverless delivery pipeline does not include tools such as Jenkins or resources such as EC2, Autoscaling Groups, and ELBs.

pipeline

The diagram above shows the technology used to accomplish serverless delivery for the sample application. Let’s look at what each component provides:

  • AWS CodePipeline – Orchestrates various Lambda tasks to move code that was checked into GitHub forward towards production.
  • AWS S3 Artifact Bucket – Each action in the pipeline can create a new artifact. The artifact becomes an output from the action in the pipeline and is stored in an S3 bucket to become an input for the next action.
  • AWS Lambda – You create Lambda functions to do the work of individual actions in the pipeline. For example, running a gulp task on a repository is handled by a Lambda function.
  • NPM and Gulp – NPM is used for resolving all the dependencies of a given repository. Gulp is used for defining the tasks of the repository, such as running unit tests and packaging artifacts.
  • Testing tools – JSHint is used for performing static analysis of the code, Mocha is a unit test framework for JavaScript and Chai is a BDD assertion library for describing the unit tests (a small example follows this list).
  • AWS CloudFormation – CFN templates are used to create and update all these resources in a serverless stack.
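
For example, a unit test in the Mocha and Chai style used by this pipeline might look like the following sketch; the module and behavior shown are placeholders, not the actual dromedary tests:

var chai = require('chai');
var expect = chai.expect;

// Placeholder unit test illustrating the Mocha + Chai style used in the pipeline
describe('chart data', function () {
  it('includes a label for every data point', function () {
    var chart = { labels: ['red', 'green'], values: [3, 5] };  // stand-in object
    expect(chart.labels).to.have.length(chart.values.length);
  });
});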

Serverless delivery for traditional architectures?

Although this serverless delivery architecture could be applied to more traditional application architectures (e.g., a Java application on EC2 and ELB resources) the challenge might be having pipeline actions complete within the 300 second Lambda maximum. For example, running Maven phases within a single Lambda function invocation, including the resolving of dependencies, compilation, unit testing and packaging would likely be difficult. There may be opportunities to split up the goals into multiple invocations and persist state to S3, but that is beyond the scope of this series.

Pricing

The pricing model for Lambda is favorable for applications that have idle time, and the cost grows linearly with the number of executions. The diagram below compares the pricing for running the pipeline with Lambda and CodePipeline against a Jenkins server running on an EC2 instance. For best performance, the Jenkins server ought to run on an m4.large instance, but just to highlight the savings, m3.medium and t2.micro instances were evaluated as well. Notice that for the m4.large, the break-even happens only after you are doing over 600 builds per day, and even with a t2.micro, the break-even doesn’t happen until well over 100 builds per day.

serverless-delivery-cost

In conclusion, running a continuous delivery pipeline with CodePipeline + Lambda is very attractive based on the cost efficiency of utility pricing, the simplicity of using a managed services environment, and the tech stack parity of using Node.js for both the application and the pipeline.

Stay Tuned!

Next week we will dive into part two of this series, looking at what changes need to be made for an Express application to run in Lambda and the CloudFormation templates needed to create a serverless delivery pipeline. Finally, we will conclude with part three going into the details of each stage of the pipeline and the Lambda functions that support them.

Resources

Create a Pipeline using the AWS CodePipeline Console

In this demonstration, you’ll perform a simple walkthrough of a two-stage pipeline using the CodePipeline console. This is based on AWS’ Simple Pipeline Walkthrough. A screencast is provided below.

Using the AWS console is often the best way to get familiar with the concepts around any new AWS service. The console is also a good way to view the status of resource configuration and attributes. In posts on Provisioning AWS CodePipeline with CloudFormation and Mocking AWS CodePipeline pipelines with Lambda, you became familiar with how to automate the provisioning of pipelines in CodePipeline using CloudFormation – along with many other AWS resources. In this post, you’ll take a step back and go over the steps of manually creating a simple pipeline using the AWS console.

Create an application and deployment in CodeDeploy

To create an application and deployment, go to the AWS Console and launch the CodeDeploy service. Choose Sample Deployment and click Next Step.

create-pipeline-step-4

On the next page, choose Amazon Linux as the Operating System, then choose your EC2 Key Pair from the drop down and you’ll want to keep the Name and CodeDeployDemo defaults for Tag Key and Value. Finally, click Launch Instances. You’ll need to wait several minutes while CodeDeploy launches a CloudFormation stack that provisions three EC2 instances and installs the CodeDeploy agent on each of these instances. Once the instances have been launched, click Next step.

create-pipeline-step-5.jpg

Enter an Application Name (or just use the default name) and click Next step. Click Next step from the Revision page. From the Deployment Group page, accept the defaults and click Next step. From the Service Role page, select an IAM role with access to CodeDeploy resources (this should have been previously created when you launched the EC2 instances for CodeDeploy). Click Next step. Review the Deployment Configuration and click Next step. From the Review page, click Deploy Now.

create-pipeline-step-11

Access the application deployed by CodeDeploy

By going through these steps, you will launch a deployment of a simple web application across three EC2 instances. Check the CodeDeploy console for deployment status. Once everything is successful, click View All Instances and then click one of the links in the Instance ID column to bring up the EC2 console. From the details pane, copy the Public IP and paste it into your browser. You should see a page that says something similar to “Congratulations. This application was deployed using AWS CodeDeploy.”.

create-codedeploy-step-1

Upload an application distribution to S3

Next, you will download the CodePipeline/CodeDeploy application distribution using the link provided in the Resources section of this post. Upload the zip file to an S3 bucket in your AWS account and make note of your bucket name and key, as you will be using them when creating a pipeline in CodePipeline. Make sure you enable versioning on the S3 bucket by selecting Properties for the bucket, clicking Versioning and then Enable Versioning.

Create a Pipeline

Once this is complete, go to the CodePipeline console and click Create pipeline. Enter your pipeline name and click Next step. Choose Amazon S3 as the Source provider and enter the S3 location (e.g., s3://stelligent-tmp/aws-codepipeline-s3-aws-codedeploy_linux.zip, where stelligent-tmp is the bucket name and aws-codepipeline-s3-aws-codedeploy_linux.zip is the key) where you uploaded the zip file. Click Next step. Choose No Build as the Build provider and click Next step. From the Beta page, choose AWS CodeDeploy as the Deployment provider and enter the Application name and Deployment group that you defined when creating the sample CodeDeploy deployment. Click Next step.

create-pipeline-step-13
Choose AWS-CodePipeline-Service as your IAM Role name (or another role that has proper access to the CodePipeline services) on the AWS Service Role page and click Next step. Review your pipeline and click Create pipeline.

Access the application deployed by CodeDeploy and orchestrated by CodePipeline

From the CodePipeline console you can monitor the stages and actions for this simple two-stage pipeline. Once it successfully completes, refresh the browser page that you opened earlier with the instance’s Public IP. You should see a message similar to “Congratulations! You have successfully created a pipeline that retrieved this source application from an Amazon S3 bucket and deployed it to three Amazon EC2 instances using AWS CodeDeploy.” You’ve successfully orchestrated a CodeDeploy deployment that deploys a simple application from an S3 bucket.

create-pipeline-step-17

Delete your Pipeline

From the CodePipeline console, select the pipeline you created and click Edit. Then, click Delete and enter the name of the pipeline you’d like to delete. You can delete your CodeDeploy application and CloudFormation stacks as well.

Resources

Running AWS Lambda Functions in AWS CodePipeline using CloudFormation

Recently, AWS announced that they’ve added support for triggering AWS Lambda functions into AWS CodePipeline – AWS’ Continuous Delivery service. They also provided some great step-by-step documentation to describe the process for configuring a new stage in CodePipeline to run a Lambda function. In this article, I’ll describe how I codified the provisioning of all of the AWS resources in the documentation using CloudFormation.

aws_code_pipeline_lambda

This announcement is really big news as it opens up a whole realm of possibilities about what can be run from CodePipeline. Now, you can run event-driven functions any time you want from your pipelines. With this addition, CodePipeline added a new Invoke action category that adds to the list of other actions such as Build, Deploy, Test and Source.

NOTE: All of the CloudFormation examples in this article are defined in the codepipeline-lambda.json file.

tl;dr

If you’d rather not read the detailed explanation of the resources and code snippets of this solution, just click on the CloudFormation Launch Stack button below to automatically provision the AWS resources described herein. You will be charged for your AWS usage. 
Launch Stack

CloudFormation

I went through the 20+ pages of instructions which were easy to follow but, as I often do when going through this kind of documentation, I thought about how I’d make it easier for me and others to run it again without copying/pasting, clicking multiple buttons and so on. In other words, I’m lazy and don’t enjoy repeatedly going over the same thing again and again and I figured this would be something I’d (and others) like to use often in the future. Of course, this leads me to writing a template in CloudFormation since I can define everything in code and type a single command or click a button to reliably and repeatedly provision all the necessary resources to run Invoke actions within a Lambda stage in CodePipeline.

There are six core services that compose this infrastructure architecture. They are CloudFormation, CodePipeline, Lambda, IAM, EC2 and CodeDeploy.

To launch the infrastructure stacks that make up this solution, type the following from the command line. The command will only work if you’ve installed the AWS CLI.

Command for launching CodePipeline Lambda stacks

aws cloudformation create-stack \
--stack-name CodePipelineLambdaStack \
--template-body https://raw.githubusercontent.com/stelligent/cloudformation_templates/master/labs/codepipeline/codepipeline-cross-account-pipeline.json \
--region us-east-1 \
--disable-rollback --capabilities="CAPABILITY_IAM" \
--parameters ParameterKey=KeyName,ParameterValue=YOUREC2KEYPAIRNAME

EC2

From my CloudFormation template, I launched a single EC2 instance that installed a CodeDeploy agent onto it. I used the sample provided by AWS at http://s3.amazonaws.com/aws-codedeploy-us-east-1/templates/latest/CodeDeploy_SampleCF_Template.json and added one small modification to return the PublicIp of the EC2 instance after it’s launched as a CloudFormation Output. Because of this modification, I created a new template based on AWS’ sample.
CloudFormation JSON to define EC2 instance used by CodeDeploy

    "CodeDeployEC2InstancesStack":{
      "Type":"AWS::CloudFormation::Stack",
      "Properties":{
        "TemplateURL":"https://s3.amazonaws.com/stelligent-public/cloudformation-templates/github/labs/codepipeline/codedeploy-ec2.json",
        "TimeoutInMinutes":"60",
        "Parameters":{
          "TagValue":{
            "Ref":"AWS::StackName"
          },
          "KeyPairName":{
            "Ref":"KeyName"
          }
        }
      }
    },

When the stack is complete, you’ll see that one EC2 instance has been launched and automatically tagged with the name you entered when naming your CloudFormation stack. This name is used to run CodeDeploy operations on instance(s) with this tag.
codepipeline_lambda_ec2

CodeDeploy

AWS CodeDeploy automates code deployments to any instance. Previously, I had automated the steps of the Simple Pipeline Walkthrough which included the provisioning of AWS CodeDeploy resources as well so I used this CloudFormation template as a starting point. I uploaded the sample Linux app provided by CodePipeline in the walkthrough to Amazon S3 and used S3 as the Source action in the Source stage in my pipeline in CodePipeline. Below, you see a snippet of defining the CodeDeploy stack from the codepipeline-lambda.json. The nested stack defined in the TemplateURL property defines the CodeDeploy application and the deployment group.

CloudFormation JSON to define the CodeDeploy application and deployment group

    "CodeDeploySimpleStack":{
      "Type":"AWS::CloudFormation::Stack",
      "DependsOn":"CodeDeployEC2InstancesStack",
      "Properties":{
        "TemplateURL":"https://s3.amazonaws.com/stelligent-public/cloudformation-templates/github/labs/codepipeline/codedeploy-deployment.json",
        "TimeoutInMinutes":"60",
        "Parameters":{
          "TagValue":{
            "Ref":"AWS::StackName"
          },
          "RoleArn":{
            "Fn::GetAtt":[
              "CodeDeployEC2InstancesStack",
              "Outputs.CodeDeployTrustRoleARN"
            ]
          },
          "Bucket":{
            "Ref":"S3Bucket"
          },
          "Key":{
            "Ref":"S3Key"
          }
        }
      }
    },

The screenshot below is that of a CodeDeploy deployment that was generated from the CloudFormation stack launch.
codepipeline_lambda_codedeploy

The CodeDeploy provisioning of this is described in more detail in my article on this topic: Automating AWS CodeDeploy Provisioning in CloudFormation.

CodePipeline

I took a CodePipeline example that I’d written in CloudFormation that defines a simple three-stage pipeline (based on the Simple Pipeline Walkthrough) and added a new stage in the CloudFormation Resources block to invoke the Lambda function. If I were manually adding this stage, I’d go to my specific pipeline in AWS CodePipeline, click Add stage and then add an action to the stage. Below, you see a screenshot of what you’d do if you were manually defining this configuration within an AWS CodePipeline action. This is also what got generated from the CloudFormation stack.

codepipeline_lambda_stage

AWS::CodePipeline::Pipeline

At the beginning of the snippet below, you see the use of the AWS::CodePipeline::Pipeline CloudFormation resource type. It has dependencies on the CodeDeploySimpleStack and CodePipelineLambdaTest resources. One of the reasons for this is that there needs to be an EC2 instance defined already so that I can get access to the PublicIp that the Lambda function uses later when verifying the application is up and running. The other is that we need to set the FunctionName property of the Configuration of the Lambda stage in CodePipeline. This function name is generated by the AWS::Lambda::Function resource type that I’ll describe later. By using this approach, you don’t need to know the name of the Lambda function when defining the CloudFormation template.
CloudFormation JSON to define the CodePipeline pipeline with a Lambda Invoke stage

    "GenericPipeline":{
      "Type":"AWS::CodePipeline::Pipeline",
      "DependsOn":[
        "CodeDeploySimpleStack",
        "CodePipelineLambdaTest"
      ],
      "Properties":{
        "DisableInboundStageTransitions":[
          {
            "Reason":"Demonstration",
            "StageName":"Production"
          }
        ],
        "RoleArn":{
          "Fn::Join":[
            "",
            [
              "arn:aws:iam::",
              {
                "Ref":"AWS::AccountId"
              },
              ":role/AWS-CodePipeline-Service"
            ]
          ]
        },
        "Stages":[
...
          {
            "Name":"LambdaStage",
            "Actions":[
              {
                "InputArtifacts":[

                ],
                "Name":"MyLambdaAction",
                "ActionTypeId":{
                  "Category":"Invoke",
                  "Owner":"AWS",
                  "Version":"1",
                  "Provider":"Lambda"
                },
                "OutputArtifacts":[

                ],
                "Configuration":{
                  "FunctionName":{
                    "Ref":"CodePipelineLambdaTest"
                  },
                  "UserParameters":{
                    "Fn::Join":[
                      "",
                      [
                        "http://",
                        {
                          "Fn::GetAtt":[
                            "CodeDeployEC2InstancesStack",
                            "Outputs.PublicIp"
                          ]
                        }
                      ]
                    ]
                  }
                },
                "RunOrder":1
              }
            ]
          },...

Lambda

AWS Lambda lets you run event-based functions without provisioning or managing servers. That said, there’s still a decent amount of configuration you’ll need to define in running your Lambda functions. In the example provided by AWS, the Lambda function tests whether it can access a website without receiving an error. If it succeeds, the CodePipeline action and stage succeed, turn to green, and it automatically transitions to the next stage or completes the pipeline. If it fails, that pipeline instance fails, turns red, and ceases any further actions from occurring. It’s a very typical test you’d run to be sure your application was successfully deployed. In the example, AWS has you manually enter the URL for the application. Since this requires manual intervention, I needed to figure out a way to get this URL dynamically. I did this by setting the PublicIp of the EC2 instance that was launched earlier in the stack as an Output of the nested stack. Then I used this PublicIp as an input to the UserParameters property of the Invoke action within the Lambda stage that I defined in CloudFormation for my CodePipeline pipeline.

Once the function has been generated by the stack, you’ll be able to go to a list of Lambda functions in your AWS Console and see the function that was created from the stack.

codepipeline_lambda_function

AWS::IAM::Role

In the CloudFormation code snippet you see below, I’m defining an IAM role that’s capable of calling Lambda functions.
CloudFormation JSON to define IAM Role for Lambda function execution

    "CodePipelineLambdaRole":{
      "Type":"AWS::IAM::Role",
      "Properties":{
        "AssumeRolePolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Effect":"Allow",
              "Principal":{
                "Service":[
                  "lambda.amazonaws.com"
                ]
              },
              "Action":[
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "Path":"/"
      }
    },

AWS::IAM::Policy

The code snippet below depends on the creation of the IAM role I showed in the example above. The IAM policy that’s attached to the IAM role provides access to the AWS logs and the CodePipeline results so that it can signal success or failure to the CodePipeline action that I defined earlier.
CloudFormation JSON to define IAM Policy for IAM Role for Lambda function execution

    "LambdaCodePipelineExecutionPolicy":{
      "DependsOn":[
        "CodePipelineLambdaRole"
      ],
      "Type":"AWS::IAM::Policy",
      "Properties":{
        "PolicyName":"LambdaRolePolicy",
        "Roles":[
          {
            "Ref":"CodePipelineLambdaRole"
          }
        ],
        "PolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Effect":"Allow",
              "Action":[
                "logs:*"
              ],
              "Resource":[
                "arn:aws:logs:*:*:*"
              ]
            },
            {
              "Effect":"Allow",
              "Action":[
                "codepipeline:PutJobSuccessResult",
                "codepipeline:PutJobFailureResult"
              ],
              "Resource":[
                "*"
              ]
            }
          ]
        }
      }
    },

AWS::Lambda::Function

In the code snippet below, you see how I’m defining the Lambda function in CloudFormation. There are several things to point out here. I uploaded some JavaScript (Node.js) code to S3 with the name Archive.zip into a bucket specified by the S3Bucket parameter that I set when I launched the CloudFormation stack. This S3 bucket needs to have S3 Versioning enabled on it. Moreover, the Archive.zip file needs to have the .js file used by Lambda in the root of the Archive.zip. Keep in mind that I can call the .zip file whatever I want, but once I name the file and upload it then my CloudFormation template needs to refer to the correct name of the file.

Also, you see that I’ve defined a Handler named validateurl.handler. This means that the JavaScript file in the Archive.zip that contains the handler Lambda runs must be named validateurl.js. If I want to use a different name, I must change both the JavaScript filename and the CloudFormation template that references it.

CloudFormation JSON to define Lambda function execution

    "CodePipelineLambdaTest":{
      "Type":"AWS::Lambda::Function",
      "DependsOn":[
        "CodePipelineLambdaRole",
        "LambdaCodePipelineExecutionPolicy"
      ],
      "Properties":{
        "Code":{
          "S3Bucket":{
            "Ref":"S3Bucket"
          },
          "S3Key":"Archive.zip"
        },
        "Role":{
          "Fn::GetAtt":[
            "CodePipelineLambdaRole",
            "Arn"
          ]
        },
        "Description":"Validate a website URL",
        "Timeout":20,
        "Handler":"validateurl.handler",
        "Runtime":"nodejs",
        "MemorySize":128
      }
    },

Lambda Test Function

With all of this configuration to get something to run, sometimes it’s easy to overlook that we’re actually executing something useful and not just configuring the support infrastructure. The snippet below is the actual test that gets run as part of the Lambda action in the Lambda stage that I defined in the CloudFormation template for CodePipeline. This code is taken directly from the Integrate AWS Lambda Functions into Pipelines in AWS CodePipeline instructions from AWS. This JavaScript code verifies that it can access the supplied website URL of the deployed application. If it fails, it signals for CodePipeline to cease any further actions in the pipeline.

JavaScript to test access to a website

var assert = require('assert');
var AWS = require('aws-sdk');
var http = require('http');

exports.handler = function(event, context) {

    var codepipeline = new AWS.CodePipeline();

    // Retrieve the Job ID from the Lambda action
    var jobId = event["CodePipeline.job"].id;

    // Retrieve the value of UserParameters from the Lambda action configuration in AWS CodePipeline, in this case a URL which will be
    // health checked by this function.
    var url = event["CodePipeline.job"].data.actionConfiguration.configuration.UserParameters;

    // Notify AWS CodePipeline of a successful job
    var putJobSuccess = function(message) {
        var params = {
            jobId: jobId
        };
        codepipeline.putJobSuccessResult(params, function(err, data) {
            if(err) {
                context.fail(err);
            } else {
                context.succeed(message);
            }
        });
    };

    // Notify AWS CodePipeline of a failed job
    var putJobFailure = function(message) {
        var params = {
            jobId: jobId,
            failureDetails: {
                message: JSON.stringify(message),
                type: 'JobFailed',
                externalExecutionId: context.invokeid
            }
        };
        codepipeline.putJobFailureResult(params, function(err, data) {
            context.fail(message);
        });
    };

    // Validate the URL passed in UserParameters
    if(!url || url.indexOf('http://') === -1) {
        putJobFailure('The UserParameters field must contain a valid URL address to test, including http:// or https://');
        return;
    }

    // Helper function to make a HTTP GET request to the page.
    // The helper will test the response and succeed or fail the job accordingly
    var getPage = function(url, callback) {
        var pageObject = {
            body: '',
            statusCode: 0,
            contains: function(search) {
                return this.body.indexOf(search) > -1;
            }
        };
        http.get(url, function(response) {
            pageObject.body = '';
            pageObject.statusCode = response.statusCode;

            response.on('data', function (chunk) {
                pageObject.body += chunk;
            });

            response.on('end', function () {
                callback(pageObject);
            });

            response.resume();
        }).on('error', function(error) {
            // Fail the job if our request failed
            putJobFailure(error);
        });
    };

    getPage(url, function(returnedPage) {
        try {
            // Check if the HTTP response has a 200 status
            assert(returnedPage.statusCode === 200);
            // Check if the page contains the text "Congratulations"
            // You can change this to check for different text, or add other tests as required
            assert(returnedPage.contains('Congratulations'));

            // Succeed the job
            putJobSuccess("Tests passed.");
        } catch (ex) {
            // If any of the assertions failed then fail the job
            putJobFailure(ex);
        }
    });
};

Post-Commit Git Hook for Archiving and Uploading to S3

I’m in the process of figuring out how to add a post-commit hook that takes files committed to a specific directory in a Git repository, zips up the necessary artifacts and uploads them to a pre-defined directory in S3, so that I can remove this manual activity as well.
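
As a rough sketch of the idea (not a finished hook), a .git/hooks/post-commit script in Node.js could archive the committed tree and push it to the versioned source bucket; the bucket name, key and zip path below are placeholders:

#!/usr/bin/env node
// Hypothetical post-commit hook sketch: zip the committed tree and upload it to S3.
var execSync = require('child_process').execSync;
var fs = require('fs');
var AWS = require('aws-sdk');

var s3 = new AWS.S3();
var zipFile = '/tmp/Archive.zip';

// git archive produces a zip of the committed tree without the .git directory
execSync('git archive --format=zip --output=' + zipFile + ' HEAD');

s3.putObject({
  Bucket: 'my-codepipeline-source-bucket',   // placeholder; versioning must be enabled on this bucket
  Key: 'Archive.zip',
  Body: fs.readFileSync(zipFile)
}, function (err) {
  if (err) {
    console.error('Upload failed:', err);
    process.exit(1);
  }
  console.log('Uploaded ' + zipFile + ' to S3');
});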

Summary

By adding the ability to invoke Lambda functions directly from CodePipeline, AWS has opened a whole new world of what can be orchestrated into our software delivery processes in AWS. You learned how to automate the provisioning of not just the Lambda configuration, but dependent AWS resources including the automated provisioning of your pipelines in AWS CodePipeline. If you have any questions, reach out to us on Twitter @stelligent or @paulduvall.

Resources