One-Button Everything in AWS
There’s an approach that we seek at Stelligent known informally as “One-Button Everything”. At times, it can be an elusive goal but it’s something that we often aim to achieve on behalf of our customers. The idea is that we want to be able to create the complete, functioning software system by clicking a single button. Since all of the work we do for our customers is on the Amazon Web Services (AWS) platform, this often results in a single Launch Stack button (as shown below).
The button is a nice metaphor while also being a literal thing in AWS CloudFormation. It’s also emblematic of the principles of simplicity, comprehensiveness, and consistency (discussed later). For example, while you might be able to do the same with a single command, you would need to account for the setup and configuration required to run that command, which might reduce simplicity and consistency.
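As a rough illustration of what sits behind such a button, here is a minimal Python sketch that builds the kind of console link a Launch Stack button typically points to. The URL shape reflects the commonly used CloudFormation console link format; the function name, bucket name, and template file are hypothetical and only serve the example.

```python
from urllib.parse import quote, urlencode


def launch_stack_url(region: str, stack_name: str, template_url: str) -> str:
    """Build a console link that opens the CloudFormation create-stack wizard.

    Assumption: this follows the link shape commonly used by Launch Stack
    buttons. The template must be reachable by CloudFormation (e.g. in S3).
    """
    # Keep ':' and '/' readable so the template URL is easy to inspect.
    query = urlencode(
        {"stackName": stack_name, "templateURL": template_url}, safe=":/"
    )
    return (
        "https://console.aws.amazon.com/cloudformation/home"
        f"?region={quote(region)}#/stacks/new?{query}"
    )


if __name__ == "__main__":
    # Hypothetical bucket and template name, for illustration only.
    print(
        launch_stack_url(
            "us-east-1",
            "one-button-demo",
            "https://s3.amazonaws.com/example-bucket/demo.json",
        )
    )
```

Embedding a link like this in a README is what lets a user start the whole stack from their browser, with no client-side setup.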
In this post, I describe the principles and motivation, user base, scope of a complete software system, assumptions and prerequisites, alternative scenarios, and documentation.
Principles
These are the three principles that the “one-button everything” mantra is based upon:
- Comprehensive – The goal is to orchestrate the full solution stack, not partial implementations of the solution stack
- Consistent – Works the same way every time. Documentation is similar across solution stacks. Once you require “one-off” implementations, the system becomes susceptible to errors
- Simple – Few steps and dependencies. Make it difficult to make mistakes.
These three principles guide the design of these one-button systems.
The Users are Us
The users of your one-button systems are often other engineers within your organization. A tired argument you might hear is that you don’t need to create simple systems for other engineers since they’re technical too. I could not disagree more with this reasoning. As engineers, we should not be spending time on repetitive, failure-prone activities, nor should we push that burden onto others at scale. That reasoning doesn’t serve the needs of the organization, since most engineers should be spending their time delivering features to users who receive value from their work.
What is the complete software system?
A common question we get is “what makes up a complete software system?” To us, the complete software system refers to all of the infrastructure and software that composes the system. For example, this includes:
- Networks (e.g. VPC)
- Compute (EC2, Containers, Serverless, etc.)
- Storage (e.g. S3, EBS, etc.)
- Database and Data (RDS, DynamoDB, etc.)
- Version control repositories (e.g. CodeCommit)
- Deployment Pipelines
- Orchestration of software delivery workflows
- Execution of these workflows
- Building and deploying application/Service code
- Test execution
- Static Analysis
- Security hardening, tests and analysis
- Notification systems
- Monitoring systems
Assumptions and Prerequisites
In order to create an effective single-button system, the following patterns are assumed:
- Everything as Code – The application, configuration, infrastructure, data, tests, and the process to launch the system are all defined in code;
- Everything is Versioned – All of this code must be versioned in a version-control repository;
- Everything is Automated – The process for going from zero to working system including the workflow and the “glue code” that puts it all together is defined in code and automated;
- Client configuration is not assumed – Ideally, you don’t want users to require a certain client-side configuration as it presents room for error and confusion.
As for prerequisites, there might be certain assumptions you document – as long as they are truly one-time only operations. For example, we included the following prerequisites in a demo system we open sourced:
Given a version-control repository, the bootstrapping and the application must be capable of launching from a single CloudFormation command and a CloudFormation button click – assuming that an EC2 Key Pair and Route 53 Hosted Zone has been configured. The demo should not be required to run from a local environment.
So, it assumes that the user has created or, in this case, cloned the Git repository and has established a valid EC2 Key Pair and a Route 53 Hosted Zone. Given that, users should be able to click the Launch Stack button in their AWS account and launch the complete working system with a running deployment pipeline. In this case, the working system includes a VPC network (and associated network resources), IAM, ENI, DynamoDB, a utility EC2 instance, a Jenkins server and a running pipeline in CodePipeline. This pipeline then uses CloudFormation to provision the application infrastructure (e.g. EC2 instances and the connection to DynamoDB). Then, as part of the pipeline and its integration with Jenkins, it runs tasks that use Chef to configure the environment to run Node.js, and it runs automated infrastructure and application tests in RSpec, ServerSpec, Mocha, and Chai. All of this behavior has been provisioned, configured and orchestrated so that anyone can click one Launch Stack button to go from zero to a fully working system in less than 30 minutes. Once the initial environment is up and running, the application provisioning, configuration, deployment and testing run in less than 10 minutes. For more on this demo environment, see below.
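The “single CloudFormation command” side of that prerequisite can be sketched as follows. This is only an illustration: it assembles the arguments for the AWS CLI’s `aws cloudformation create-stack` command without running it. The parameter names (`KeyName`, `HostedZone`), the stack name, and the template URL are hypothetical stand-ins for the one-time prerequisites described above, not the demo’s actual inputs.

```python
import shlex


def create_stack_command(stack_name, template_url, parameters):
    """Return the argv for a single `aws cloudformation create-stack` call."""
    argv = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-url", template_url,
        # Required when the template creates IAM resources.
        "--capabilities", "CAPABILITY_IAM",
    ]
    if parameters:
        argv.append("--parameters")
        # Sorted so the generated command is the same on every run.
        for key, value in sorted(parameters.items()):
            argv.append(f"ParameterKey={key},ParameterValue={value}")
    return argv


if __name__ == "__main__":
    command = create_stack_command(
        "one-button-demo",
        "https://s3.amazonaws.com/example-bucket/demo.json",  # hypothetical
        {"KeyName": "my-key-pair", "HostedZone": "example.com."},  # hypothetical
    )
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
    print(shlex.join(command))
```

Note that even this one command presumes an installed and configured AWS CLI on the client, which is exactly the kind of setup the button avoids.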
Decompose system based on lifecycles
While you might create a system that’s capable of recreating everything from a single Launch Stack button, it doesn’t mean that you apply all changes from the ground up for every change. This is because building everything from “scratch” every time erodes fast feedback. So, while you might have a single button to launch the entire solution stack, you won’t be clicking it all that much. This might sound antithetical to everything I’ve been saying so far, but it’s really about viewing it as a single system whose logical architectural layers are updated according to how frequently each one changes.
For example, because you want quick feedback, you won’t rebuild your deployment pipeline, environment images, or the network if there is only a change to the application/service code. Each of these layers might have its own deployment pipeline that gets triggered when there’s a code change to that layer (which is often less frequent than application code changes). The application deployment pipeline can then consume artifacts generated by the other pipelines.
This approach can take some time to get right, as you still want to rebuild your system whenever there’s a code commit; you just need to be judicious about what gets rebuilt for each type of change. As illustrated in Figure 1, here are some examples of different layers that we often decompose our tech stacks into along with their typical change frequency (your product may vary):
- Network (VPC) – Once a week
- Storage – Once a week
- Routing – Once a week. For example, Route 53 changes.
- Database – Once a week
- Deployment Pipeline – Once a week. Apply a CloudFormation stack update.
- Environment Images – Once per day
- Application/Service – Many times a day
- Data – Many times a day
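One way to picture this decomposition is a small routing step that decides which layers a commit affects. The sketch below is illustrative only: the directory layout and layer names are assumptions made for the example, and in practice each layer’s own pipeline trigger would usually do this work.

```python
from fnmatch import fnmatch

# Hypothetical repository layout: each top-level directory maps to one layer.
LAYER_PATTERNS = [
    ("network/*", "network"),
    ("storage/*", "storage"),
    ("routing/*", "routing"),
    ("database/*", "database"),
    ("pipeline/*", "deployment-pipeline"),
    ("images/*", "environment-images"),
    ("app/*", "application"),
    ("data/*", "data"),
]


def layers_to_rebuild(changed_paths):
    """Return the set of layers touched by a list of changed file paths."""
    layers = set()
    for path in changed_paths:
        for pattern, layer in LAYER_PATTERNS:
            # fnmatch's '*' also matches '/', so nested files are covered.
            if fnmatch(path, pattern):
                layers.add(layer)
                break
    return layers


if __name__ == "__main__":
    # An application-only change should not trigger a network rebuild.
    print(layers_to_rebuild(["app/src/server.js", "app/test/server_test.js"]))
```

With a mapping like this, a change under `app/` rebuilds only the application layer, while the slower-moving layers keep their existing artifacts.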
Launching Non-CloudFormation Solutions
As mentioned, one of the nice features of CloudFormation is the ability to provide a single link to launch a stack. While the stack launch is initially driven by CloudFormation, it can also orchestrate any other type of tool you might use via the AWS::CloudFormation::Init metadata type, which the cfn-init helper script reads on an EC2 instance. This way you can get the benefits of launching from a single Launch Stack button in your AWS account while leveraging the integration with many other types of tools.
For example, you might have implemented your solution using Ruby, Docker, Python, etc. In this scenario, let’s imagine you’re using CFNDSL (albeit, a little circular considering you’re using Ruby to generate CloudFormation). Users could still initiate the solution by clicking on a single Launch Stack button, which would launch a CloudFormation stack. The CloudFormation stack would configure a client that uses Ruby to run CFNDSL, which uses the Ruby code to generate CloudFormation templates. By driving this through CloudFormation, you don’t rely on the user to properly configure the client.
Alternatively, you can meet many of the design goals by implementing the same through a single command, but be aware that this can come at a cost in simplicity and consistency.
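To make the CFNDSL scenario more concrete, here is a hedged sketch that generates a template fragment in Python. It describes an EC2 instance whose `AWS::CloudFormation::Init` metadata installs the `cfndsl` gem and runs it. The `AmiId` parameter, the working directory, the `stack.rb` file name, and the exact `cfndsl` invocation are all assumptions for illustration. A real template would also need UserData that calls `cfn-init`, plus a step that fetches the Ruby source onto the instance.

```python
import json


def bootstrap_instance_resource():
    """Return a CloudFormation resource that bootstraps a cfndsl client."""
    return {
        "Type": "AWS::EC2::Instance",
        "Metadata": {
            "AWS::CloudFormation::Init": {
                "config": {
                    # cfn-init can install Ruby gems via the rubygems key.
                    "packages": {"rubygems": {"cfndsl": []}},
                    "commands": {
                        "01_generate_template": {
                            # Assumed invocation: cfndsl writes JSON to stdout.
                            "command": "cfndsl stack.rb > stack.json",
                            "cwd": "/opt/one-button",  # hypothetical path
                        }
                    },
                }
            }
        },
        "Properties": {
            "ImageId": {"Ref": "AmiId"},  # hypothetical template parameter
            "InstanceType": "t2.micro",
        },
    }


if __name__ == "__main__":
    fragment = {"Resources": {"BootstrapInstance": bootstrap_instance_resource()}}
    print(json.dumps(fragment, indent=2))
```

The point of the design is that the tool setup lives in the template, so the person clicking the button never installs Ruby or CFNDSL themselves.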
Documentation
The documentation provided to users/engineers should be simple to understand and execute, and should make it difficult to commit errors. Figure 2 shows an example from https://github.com/stelligent/cloudformation_templates in which an engineer can view the prerequisites, supported AWS regions, an active architecture illustration, a how-to video, and finally, a Launch Stack button. You might use this as a sample for the documentation you write on behalf of the users of your AWS infrastructure.
Summary
You learned how you can create one button (or command) to launch your entire software system from code, prerequisites, what makes up that software system, how you might decompose subsystems based on lifecycles and, finally, a way of documenting the instructions for launching the solution.