Firewalls, controlled by a Pipeline?

Is updating your firewall a painful, slow process? Does the communication gap between development teams and security teams cause frustration? If so, you’re not alone. In technology organizations, changes to firewalls tend to be slow and typically cause numerous headaches for development and security teams alike. However, controlling firewall and security settings with a pipeline managed with CloudFormation can give your organization faster turnaround, less risk, and, in turn, more business value.

So what might a firewall update process currently look like?

First, fill out a ticket in a tool far outside the normal realm of a developer’s typical workflow

The person requesting the firewall change flounders through a tedious workflow that is often poorly documented and unclear. They may input partial, unnecessary, or unwittingly incorrect information. The ticket in turn becomes unhelpful and uninformative for the security personnel implementing the change.
If the ticket says “Instances need ping access”, what does this mean? Do all instances need it? On which port? For all the message types, or just echo request? The ticket system gives no clear path for the communication needed to clarify these questions. In the worst situation, unruly developers like myself will become frustrated and come up with methods to work around the system, creating even more security holes and issues.

Next, wait, while the ticket falls into a black hole in the galaxy of the security team

In some organizations, this waiting game could take months, and in the case of unlucky forgotten tickets, perhaps a year. Delays in delivering business value ensue: features and products are delayed in their release. Some of these processes give no information to those relying on the firewall change. There is no way to see whether the ticket has been assigned, or to whom. There is little opportunity to follow up and escalate the importance of the changes.

Finally, enjoy the updates to the firewall!

On some dark, stormy night, the security team will work deep into the night applying a change set. If the team works in AWS, this could be a combination of updates to Security Groups, IAM Roles, Routing Tables, and more. Perhaps, these tweaks are even applied manually, using the AWS console, making them subject to human error via fat-fingering, lapses in attention and more. What’s more, these updates may be hard to test, resulting in unexpected glitches the next day followed by hours of debugging, if all has not gone smoothly.

Why is this process an issue?

The process described above is at odds with the ideas behind Continuous Delivery, where teams aim to push code into production frequently: perhaps as rarely as once a sprint, perhaps as often as multiple times per day. When delivering in such a way, the infrastructure has to be as agile and quick-moving as the code. When an infrastructure change takes weeks or even months, features are held back for long periods instead of being delivered as soon as they work. The development teams are stifled, unable to deliver new business value while they wait on the slow change process.

CloudFormation to the Rescue

CloudFormation is an AWS service that allows users to create, update, and destroy AWS resources by writing declarative files called templates. Each template creates a stack, which is the collection of resources defined by that template. CloudFormation allows engineers to create templates that manage security and network rules for different VPCs, subnets, and individual EC2 instances running on AWS. These templates can be stored in private repositories in source control, where the security folks can create, update, and test new security rules and features.

What could a future firewall update process look like?

To manage this, the developer team and the security team would be responsible for separate CloudFormation stacks. The security team’s stacks would manage resources like security groups and firewall settings. The dev team’s stack would manage EC2 instances and other resources that would use this security group or live behind this firewall.

When the developer team needs a security update, the developer makes a change to the firewall CloudFormation template and tests it in an isolated environment.
She then submits a pull request to the security team’s CloudFormation repository to implement her firewall change.
The security team can assign the pull request to one of their engineers, and the developer can watch the discussion and progress around the pull request.
If the security team approves the change, they merge the pull request and kick off the automated deployment process.
If the security team does not approve the change, the pull request is rejected, which notifies the developer and gives her a forum to discuss why it was denied and what changes would allow it to be approved.

What does this new process fix?

This new process provides major benefits for both teams.
First, it gives both teams more speed and eliminates the black hole of “wait”. The change has already been made; the security team simply needs to review it. The developer requesting the change can stay informed on the progress of the change as well, giving all parties a forum for discussion and clarity.
Second, it builds in quality. By handling the configuration of firewalls and security settings with CloudFormation, fat-fingered configuration commands and other manual errors are all but eliminated. The CloudFormation templates can be checked for correct syntax before they are run. Additionally, because CloudFormation makes creating resources easy and fast, spinning up dedicated test resources becomes viable, which encourages testing.

Example Implementation

Below is a simple way to achieve this, following the Red/Green approach from Extreme Programming. In this approach, first we write a failing test (the Red stage), then we make changes that make the test pass (the Green stage). First, we want to verify that, based on our security group setup, the EC2 instances do not have ping (ICMP) access. We do this by setting up our environment and running a test script. We want to see that our test fails. Next, we update our security group to allow ping access and re-run our test script. This time, we want to see that the test script passes.
First, we set up our templates to create our environment
Here is a file called security-group-stelligent-blog.template. It creates a security group that allows SSH access and no ICMP (ping) traffic. The template outputs the ID of the security group created.
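As a rough illustration of what such a template might contain, the sketch below builds one as a Python dictionary and prints it as JSON. The resource name is taken from the stack output quoted later in this post; the description text, the output key, and the choice to open SSH to 0.0.0.0/0 are assumptions made for this example, not the original listing.

```python
import json


def build_security_group_template():
    """Return a CloudFormation template (as a dict) with an SSH-only security group."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Security group allowing SSH only (illustrative sketch)",
        "Resources": {
            "PingAndSshSecurityGroupStelligentBlog": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow SSH; no ICMP yet",
                    "SecurityGroupIngress": [
                        # Assumption: SSH is open to any source address.
                        {
                            "IpProtocol": "tcp",
                            "FromPort": 22,
                            "ToPort": 22,
                            "CidrIp": "0.0.0.0/0",
                        }
                    ],
                },
            }
        },
        "Outputs": {
            # Hypothetical output key. Ref on a security group yields its
            # name or its ID, depending on whether a VpcId is set.
            "SecurityGroup": {
                "Value": {"Ref": "PingAndSshSecurityGroupStelligentBlog"}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_security_group_template(), indent=2))
```

Redirecting the printed JSON into security-group-stelligent-blog.template would give a file that CloudFormation can consume.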

Here is a file called ec2-instances-stelligent-blog.template. It creates two t1.micro EC2 Linux instances named InstanceOneStelligentBlog and InstanceTwoStelligentBlog, both of which reference the security group created in the template above.
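A template along these lines could be generated as sketched below. The instance names and instance type come from the description above; the AMI ID and key pair name are placeholders you would supply yourself, and referencing the group through the SecurityGroups property (which takes group names) is an assumption of this sketch.

```python
import json


def build_instances_template(security_group, image_id, key_name):
    """Return a CloudFormation template (as a dict) with two EC2 instances."""

    def instance(name):
        return {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": "t1.micro",
                "ImageId": image_id,      # placeholder: a Linux AMI of your choice
                "KeyName": key_name,      # placeholder: key pair used for ssh
                "SecurityGroups": [security_group],
                "Tags": [{"Key": "Name", "Value": name}],
            },
        }

    names = ["InstanceOneStelligentBlog", "InstanceTwoStelligentBlog"]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {name: instance(name) for name in names},
    }


if __name__ == "__main__":
    # All three arguments are placeholders to be replaced with real values.
    template = build_instances_template(
        "this-is-the-security-group-output", "ami-PLACEHOLDER", "my-key-pair"
    )
    print(json.dumps(template, indent=2))
```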

In my case, this-is-the-security-group-output looks like this “stelligent-blog-security-groups-PingAndSshSecurityGroupStelligentBlog-DGVKCW4AH7A4”.
Then, we test to see whether our current environment has the functionality we want. (It shouldn’t, as we have not implemented it yet.)
Here is a test script that can verify whether the given target endpoint can be pinged.
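One possible shape for such a script is sketched below in Python. It assumes a Linux ping binary on the instance (the -c and -W flags) and takes the target address as its only argument; the function names are illustrative.

```python
import subprocess
import sys


def can_ping(target, run=subprocess.run):
    """Return True if a single ICMP echo request to target gets a reply."""
    # -c 1: send one packet; -W 2: wait up to two seconds (Linux ping flags).
    result = run(
        ["ping", "-c", "1", "-W", "2", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def report(target, run=subprocess.run):
    """Print and return the test verdict for the given target."""
    message = "Test Passed" if can_ping(target, run) else "Test Failed"
    print(message)
    return message


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python test_ping.py <target-address>")
    sys.exit(0 if report(sys.argv[1]) == "Test Passed" else 1)
```

The run parameter exists only so the check can be exercised without sending real packets; on the instance you would call the script with the private address of the other instance.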

Create both the security group stack and ec2 instances’ stack, then copy the test file from your local machine onto one of the instances:

Then we log on to our instance via ssh:

And run the test script

Here, we should see the test output “Test Failed”
Next, we update the template to make the test pass
In security-group-stelligent-blog.template, update the SecurityGroupIngress as seen below. It now allows icmp (ping) traffic from anywhere.
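The change to the ingress rules might look like the sketch below, expressed as a Python helper that adds an ICMP rule to an existing rule list. For ICMP rules, CloudFormation reads FromPort as the ICMP type and ToPort as the ICMP code, with -1 meaning "all". Limiting the rule to echo requests (type 8) is an assumption of this example; setting both fields to -1 would allow every ICMP message type.

```python
import copy

SSH_RULE = {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"}

# FromPort = ICMP type (8 is echo request), ToPort = ICMP code (-1 is any).
ICMP_ECHO_RULE = {"IpProtocol": "icmp", "FromPort": 8, "ToPort": -1, "CidrIp": "0.0.0.0/0"}


def allow_ping(ingress_rules):
    """Return a new ingress list that also permits ICMP echo requests."""
    updated = copy.deepcopy(ingress_rules)
    if ICMP_ECHO_RULE not in updated:
        updated.append(dict(ICMP_ECHO_RULE))
    return updated


if __name__ == "__main__":
    for rule in allow_ping([SSH_RULE]):
        print(rule)
```

Because security groups are stateful, the echo reply is allowed back automatically, so no separate rule for it is needed.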

Now deploy (not create!) the changes to your security group stack and run the test script again.
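The distinction matters because the stack already exists: the AWS CLI uses create-stack for a new stack and update-stack to apply a changed template to an existing one. The sketch below only assembles the two command lines so the difference is visible. The stack name is inferred from the security group output shown earlier and should be treated as an assumption.

```python
import shlex


def stack_command(action, stack_name, template_path):
    """Build an AWS CLI command that creates or updates a CloudFormation stack."""
    if action not in ("create-stack", "update-stack"):
        raise ValueError(f"unsupported action: {action}")
    return [
        "aws", "cloudformation", action,
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
    ]


if __name__ == "__main__":
    # Assumed stack name, inferred from the output value quoted in this post.
    stack = "stelligent-blog-security-groups"
    template = "security-group-stelligent-blog.template"
    print(shlex.join(stack_command("create-stack", stack, template)))  # first deployment
    print(shlex.join(stack_command("update-stack", stack, template)))  # later changes
```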

Here, we should see the test output “Test Passed”

Now run with it…

Expanding on this approach in other ways can improve your current infrastructure delivery process. Instead of just managing security groups and firewalls, think about managing other security resources in AWS, like IAM Roles, S3 bucket access, or Route Tables for subnets. Once resources are managed with CloudFormation, you can run linting tools against your templates. For example, cfn_nag can flag overly permissive IAM rules. Furthermore, a process around required testing could blossom. Security teams could require that a corresponding test be submitted with each pull request, thereby building up a test suite that ensures their users get the security settings they need. Additionally, the security team will have confidence that new changes won’t break their regression suite of tests. Make your security process one that everyone involved is happy to participate in.
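To give a feel for what a template check in a pipeline could do, here is a deliberately tiny sketch that flags security group ingress rules open to the whole internet. It is not how cfn_nag works and is no substitute for it; the function name and the allowed_ports parameter are invented for this illustration.

```python
def find_open_ingress(template, allowed_ports=()):
    """List (resource, protocol, port) for ingress rules open to 0.0.0.0/0."""
    findings = []
    for name, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        for rule in rules:
            open_to_world = rule.get("CidrIp") == "0.0.0.0/0"
            # Ports listed in allowed_ports are deliberately exempt.
            if open_to_world and rule.get("FromPort") not in allowed_ports:
                findings.append((name, rule.get("IpProtocol"), rule.get("FromPort")))
    return findings


if __name__ == "__main__":
    sample = {
        "Resources": {
            "ExampleGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"},
                        {"IpProtocol": "icmp", "FromPort": 8, "ToPort": -1, "CidrIp": "10.0.0.0/16"},
                    ]
                },
            }
        }
    }
    for finding in find_open_ingress(sample):
        print("open to the world:", finding)
```

A check like this could run on every pull request, failing the build when it returns any findings the security team has not explicitly allowed.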

Resources:

cfn_nag – https://github.com/stelligent/cfn_nag
