Deployment Pipelines – Introduction
The software “deployment pipeline” has become a common mechanism in the modern enterprise. A deployment pipeline is a sequence of automated steps that produces or deploys a software artifact. This artifact can take many forms: for example, a programming library, a web application, or even automation to converge infrastructure and security controls. The sequence of actions includes some variation of static analysis, test automation, packaging, certification, and deployment. As a concrete example, consider a pipeline to deliver a Rails web application. The pipeline may run rubocop and RSpec tests, package a Docker image, push the image to ECR with a certification label, and then deploy to an ECS cluster on AWS.
A common pattern is for the specification of the pipeline to live with or near the source code of the artifact to be delivered. This makes perfect sense; it’s very similar to storing the build automation (a Makefile, build.xml, etc.) at the root of the repository for a given artifact. Anything necessary to build and deliver a product should ideally be stored alongside that product so that it’s clearer that they must evolve together. As a concrete example, the pipeline specification for the Stelligent static analysis tool cfn_nag is stored in the repository alongside the code that it packages and deploys.
Pipelines and Compliance Issues
While this storage pattern has value, it can create a compliance problem for organizations. Consider an enterprise with a compliance requirement that every software artifact is scanned with a security tool. For a mature two-pizza team at a startup, there’s likely nothing more to this story. The developers decide they want to run a security tool before release, so they bake it into the pipeline and it’s done. At a large enterprise with a diversity of teams and technical stacks, this is a challenge. If there are thousands of “snowflake” pipelines spread out across thousands of repositories, what is the scalable mechanism for enforcement?
Publishing compliance guidelines and trusting development teams to do the right thing is not a viable option, especially around security controls. It’s an invitation for negligence and even malevolence. At the very least, not all development teams have the maturity to carry out written compliance guidelines. At many enterprises where the stakes are high, proof of compliance is necessary, not just trust.
Centralization of Control to Support Compliance
A centralized compliance team that controls the definitions of pipelines can enforce compliance. While this chokepoint provides a solution for compliance issues, it is not scalable and would likely be ruinous for the velocity of any given development team depending on those pipelines.
- Can this centralized compliance team “keep up” with development by tracking, auditing and scanning each and every one of the potential thousands of repositories, hunting for pipeline definitions?
- Given the diverse nature of pipelines and the artifacts they can produce across platforms and languages (J2EE, Django, Rails, CloudFormation, Ruby, Python, Chef, etc.), having engineers outside of the development team edit or manage the pipelines is error-prone at best.
- Every time a development team wants to enhance the pipeline for their software artifact, they have to fill out a change request form that takes a day to a week or worse to go through review.
The important question then becomes how to enforce pipeline compliance while allowing decentralized control?
Patterns for Decentralization of Control with Compliance
There are a few distinct patterns to bridge the gap between the extremes of full centralization with compliance and full decentralization with no compliance:
- Production Deployment Gate is a pattern whereby the compliance team has control over the deployment gate to production and enforces compliance at that point. While this approach leaves the pipeline in the hands of development, by the time an artifact is being deployed, the feedback on compliance is coming far too late to be helpful for the development teams. For example, an application security scanning tool might find a fundamental design flaw in a web app that requires substantial rework.
- Pipeline Factory is a pattern whereby a centralized team controls the definitions of pipelines, but then provides an element of self-service to development teams to invoke the factory to create pipelines for themselves. For a great example of this approach using AWS CodePipeline, CloudFormation and Service Catalog, see Using AWS to Achieve Both Autonomy and Governance at 3M.
- Pipeline Extension Points is a pattern whereby a core portion of the pipeline definition is controlled by a centralized team in their own code repository, but development teams can extend the pipeline to call out to any extra customizations they have (but not change anything in the core).
- Pipeline Services is a pattern whereby the core compliance and inspection activities are realized via independent services that pipelines call out to, and obtain proof of invocation from. The proof can subsequently be recorded for audit or passed along in a distribution to interested deployment gates. Digital signatures can be used for proof of invocation.
The rest of this article will explore ideas and examples around the implementation of the Pipeline Services pattern.
Pipeline Services Pattern
The advantage of the Pipeline Services pattern is that control of the pipeline itself is decentralized and in the hands of development teams, while the compliance activities can still be enforced from a centralized authority. The compliance team has control over the services and the inspection rules they provide. The tradeoff is in the complexity of developing and operating such services. Understanding and applying the mechanisms of proof also involves added complexity.
As a simple example, consider the static analysis tool cfn_nag. It accepts a CloudFormation template as input and scans it for obvious security violations. A compliance organization at an enterprise might decree that all CloudFormation templates deployed to AWS must pass a cfn_nag scan without any failing violations. In order to satisfy this decree via the Pipeline Services pattern, the compliance team would deploy and operate a “cfn_nag service”.
High-level Description of cfn_nag Service
To support the Pipeline Services pattern, the cfn_nag service must provide a mechanism to prove that it was invoked. At a high level, the service:
- Exposes an endpoint that accepts a CloudFormation template in a request
- Runs cfn_nag against the template
- Aggregates the template, the violations and the applied rules into a document
- Using public-key cryptography, digitally signs the aggregated document and returns it in a response
In addition to the cfn_nag service, there must be some kind of deployment service or deployment gate that accepts the digitally signed response. This deployment service uses the cfn_nag service’s public key to validate the signature and deploys the template embedded in the document.
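The aggregation and signing steps above might be sketched as follows. This is a minimal illustration, not the prototype's code: the document layout, the `signature` field name, and the helper names are assumptions. Because the Python standard library has no Ed25519, an HMAC stands in for the signer here. HMAC is symmetric, so a real service would sign with a private key instead.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable

# Stand-in signer only (assumption): HMAC-SHA256 is symmetric and is used here
# because the standard library has no public-key signatures. A real service
# would sign with its private key, e.g. Ed25519.
DEMO_KEY = b"demo-key-not-for-production"


def demo_sign(message: bytes) -> bytes:
    return hmac.new(DEMO_KEY, message, hashlib.sha256).digest()


def build_signed_response(template: str, violations: list, rules: list,
                          sign: Callable[[bytes], bytes]) -> dict:
    """Aggregate template, violations and rules, then sign the encoded bytes."""
    document = {"template": template, "violations": violations, "rules": rules}
    # Serialize, then base64-encode, so whitespace cannot change between
    # signing and verification.
    serialized = json.dumps(document, sort_keys=True).encode("utf-8")
    results_encoded = base64.b64encode(serialized)
    # The signature covers the encoded results exactly as they are sent.
    signature = sign(results_encoded)
    return {
        "results_encoded": results_encoded.decode("ascii"),
        "signature": base64.b64encode(signature).decode("ascii"),  # hypothetical field name
    }


response = build_signed_response(
    template='{"Resources": {}}',
    violations=[],
    rules=[{"id": "W1", "message": "hypothetical rule"}],
    sign=demo_sign,
)
```

Passing the signer in as a function keeps the aggregation logic independent of the signature scheme, so the stand-in can be swapped for a real public-key signer without touching the rest.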
End-User Workflow
Given the existence of such a cfn_nag service, the flow of events in a compliant pipeline with decentralized control would include:
- A developer commits a change to code – in this case a CloudFormation template
- The pipeline, controlled by the development team, triggers and makes a request to the cfn_nag service, controlled by the compliance team, passing the template in the body of the request
- The service runs cfn_nag against the template
- The service returns a document that includes the template, the rules applied to the template, and the results, all signed together
- The pipeline does whatever else the development team wants
- When it comes time to deploy the template, the pipeline sends a request to a deployment service containing the template and the signed results
- The deployment service uses the cfn_nag service’s public key to verify the signature
- If the signature is invalid, deployment is rejected
- If the signature is valid, the template is deployed to AWS
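The final steps of this workflow, as seen from the deployment service, could look roughly like the sketch below. The `signature` field name, the document layout, and all function names are assumptions for illustration. The verifier is passed in as a function, and the demo verifier uses an HMAC only because the Python standard library lacks Ed25519. A real gate would verify with the cfn_nag service's public key.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable


class DeploymentRejected(Exception):
    """Raised when the gate refuses to deploy."""


def deployment_gate(response: dict,
                    verify: Callable[[bytes, bytes], bool]) -> str:
    """Return the template to deploy, or raise DeploymentRejected."""
    results_encoded = response["results_encoded"].encode("ascii")
    signature = base64.b64decode(response["signature"])
    # Verify before decoding: unverified content is never trusted.
    if not verify(results_encoded, signature):
        raise DeploymentRejected("invalid signature")
    document = json.loads(base64.b64decode(results_encoded))
    # Deploy the template embedded in the signed document, not a copy
    # supplied separately by the pipeline.
    return document["template"]


# Stand-in verifier only (assumption): symmetric HMAC instead of a public key.
DEMO_KEY = b"demo-key-not-for-production"


def demo_verify(message: bytes, signature: bytes) -> bool:
    expected = hmac.new(DEMO_KEY, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

Returning the template from the signed document, rather than accepting it as a separate argument, means a pipeline cannot present valid scan results for one template while deploying another.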
Details of an Example Conversation
The following demonstrates the actual request-response conversation between the client in the pipeline and the cfn_nag service. The client in the pipeline sends a CloudFormation template in the request to the cfn_nag service:
The service returns the JSON response:
This response is perhaps less than satisfying in its current form. There are two elements, the encoded results and the digital signature for those results, both encoded in base64. The results themselves are encoded in base64 to avoid any trouble with whitespace: the signature is computed over the encoded results exactly as they are transmitted. Decoding the “base64 mess” reveals a more interesting structure:
The decoded document specifies the results, the rules applied to the template, and the template itself. As the set of rules for an inspection tool evolves over time, it’s critical to sign the list of rules applied alongside the artifact under test and the results. Otherwise, at some future date, it will be impossible to tell whether the new set of rules or the old set of rules was applied. Additionally, rules must be immutable to some degree so that a rule executed in the past will yield the same result in the present or future. A deployment gate would likely reject this template given it has no encryption.
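To illustrate why the applied rules travel with the results, here is a rough sketch of a policy check a deployment gate might run on an already verified document. The document layout (`rules` and `violations` lists whose entries carry `id` and `type`) and the rule ids are assumptions made for this example, not the service's documented schema.

```python
import base64
import json


def decode_results(results_encoded: str) -> dict:
    """Decode the base64 results back into the aggregated document."""
    return json.loads(base64.b64decode(results_encoded))


def policy_problems(document: dict, required_rule_ids: set) -> list:
    """List the reasons a gate might refuse this (already verified) document."""
    problems = []
    # Because the applied rules are part of the signed document, the gate can
    # tell whether the rule set it currently requires was actually run.
    applied = {rule["id"] for rule in document["rules"]}
    missing = required_rule_ids - applied
    if missing:
        problems.append("rules not applied: " + ", ".join(sorted(missing)))
    # Warnings are tolerated in this sketch; failing violations are not.
    failing = [v for v in document["violations"] if v["type"] == "FAIL"]
    if failing:
        problems.append(f"{len(failing)} failing violation(s)")
    return problems


# Hypothetical decoded document with one failing violation.
sample = {
    "template": '{"Resources": {}}',
    "rules": [{"id": "F1"}, {"id": "W2"}],
    "violations": [{"id": "F1", "type": "FAIL"}],
}
```

A gate that later adds a rule to its required set would flag older signed results as missing that rule, which is the situation the signed rule list is meant to expose.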
Verification of Signatures
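For the sake of argument, consider the situation where the violations are only warnings or there are no violations. The next step for a deployment gate is to verify the signature. In the above example conversation, the signature is actually computed with the Ed25519 algorithm (but any asymmetric signature scheme can be used). Example code from rbnacl/libsodium below verifies the signature. Conceptually, the verifier takes the cfn_nag service’s public key, the signature, and results_encoded, and checks that the signature could only have been produced over exactly those bytes by the holder of the matching private key. In RSA-style schemes this amounts to recovering a hash from the signature with the public key and comparing it with the hash of results_encoded; Ed25519 instead checks an elliptic-curve equation rather than decrypting anything.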
cfn_nag Service Implementation
A working prototype implementation of the described cfn_nag service is available at https://github.com/stelligent/cfn-nag-service.
This implementation provides a Docker image that can be deployed in an ECS cluster or the like, or as an AWS Lambda function.
Further Research
The CloudFormation template and cfn_nag are a relatively simple demonstration of the Pipeline Services pattern. It’s “easy” to pass the template around as the distribution alongside proof of inspection. Theoretically, something similar could be done for a binary package like a JAR or a Ruby gem – signing the distribution with a native packaging mechanism or by including some kind of manifest with a signature.
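As a rough sketch of the manifest idea for a binary package, the inspection service could record a digest of the package next to the inspection results, and a deployment gate could recompute that digest before deploying. The manifest fields and function names here are assumptions. Signing the manifest bytes is deliberately left out and would be done with whatever signature scheme the service uses.

```python
import hashlib
import json


def build_manifest(package_name: str, package_bytes: bytes,
                   violations: list) -> bytes:
    """Bind inspection results to one exact package via its SHA-256 digest."""
    manifest = {
        "package": package_name,
        "sha256": hashlib.sha256(package_bytes).hexdigest(),
        "violations": violations,
    }
    # These bytes are what the inspection service would sign.
    return json.dumps(manifest, sort_keys=True).encode("utf-8")


def package_matches_manifest(package_bytes: bytes,
                             manifest_bytes: bytes) -> bool:
    """Check that the package presented for deployment is the one inspected."""
    manifest = json.loads(manifest_bytes)
    return hashlib.sha256(package_bytes).hexdigest() == manifest["sha256"]
```

Unlike the CloudFormation case, the package itself does not need to be embedded in the signed document; the digest is enough to tie the two together.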
On the other hand, there are some hurdles. The cfn_nag tool is open-source and free, but what of expensive scanning tools with oppressive licenses? Also, how might an “opaque” distribution like a machine image (AMI) or Docker image communicate proof? If a computing resource is created from a machine image, what are the transient details of the instantiation that need to be captured and proven? Some kind of system of record trusted by the inspection service and the deployment gates would likely be necessary. This system of record for proving the outcomes of inspection likely deserves further research. Existing CI/CD tools are notoriously transient and bad at being “systems of record”, so this possibly points to an important new element in the CI/CD ecosystem for when the number of pipelines scales up.
Conclusion
The proliferation of “snowflake” pipelines within the enterprise presents a problem for compliance, especially with respect to security controls. The extremes of control, centralized and decentralized, force undesirable trade-offs between compliance and development velocity. This article proposes that a service-based approach, powered by inspection services with digital signatures, can give development teams the power to control their own pipelines while still providing proof of compliance with enterprise-wide standards.