
Measuring DevOps Success with Four Key Metrics

In the book Accelerate, Forsgren et al. describe four key software delivery metrics: deployment frequency, lead time for changes, time to restore service, and change failure rate.

Over the years at Stelligent, we’ve used many metrics and have debated which are best for our customers. It’s extremely beneficial to finally have a canonical source for the metrics that matter to organizations, backed by data and analysis – if for no other reason than to stop wasting time debating the finer points of particular metrics. Fortunately, there’s much more value here than simply ending debates. Focusing on only these metrics also empowers organizations by giving them objective measures for determining whether the changes they’re making have an actual impact.

When we describe the benefits of DevOps to enterprises, we focus on accelerating the speed and confidence of feedback between customers and developers. This notion is illustrated below.

DevOps Accelerates the Speed and Confidence of Delivery to Production

The results of this acceleration come in the form of shorter lead time for changes, increased deployment frequency, faster time to restore service, and a reduced change failure rate – all measured in production.

Ultimately, DevOps is any organizational, process, cultural, or tooling change that accelerates the speed and confidence of these feedback loops between users and engineers.

In this post, I’ll describe the four metrics in more detail and suggest ways you might instrument them using AWS-native services such as AWS CodePipeline. Keep in mind that you can also survey your teams to obtain this information. In our experience, people tend to have misperceptions about how often they actually deploy to production or what their lead time for changes really is. Therefore, it can be helpful to augment survey data by instrumenting your Continuous Delivery and other tools to get a more accurate picture of how teams are doing on the four key metrics. Even more important is having a consistent way of viewing the trends in these metrics rather than fixating on the absolute numbers. As always, we want to see these trends improving.

Deployment frequency

In Accelerate, the authors state: “Therefore, we settled on deployment frequency as a proxy for batch size since it is easy to measure and typically has low variability. By ‘deployment’ we mean a software deployment to production or to an app store.” The frequency of production deployments matters because it tells you how often you’re delivering something of value to end users and/or getting feedback from them.

The other essential point about these metrics is that they are all based on production. Not staging, not “what we’d intended to deploy to production”: only production. Production is the only environment that matters because it’s only in production that value can be realized. That value usually comes in the form of revenue, feedback, or a combination of both.

With an AWS-native approach, we can obtain this number by capturing the time at which CodePipeline successfully completes the last action in the pipeline and tracking how many deployments to production occur per time period.
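As a minimal sketch of this instrumentation (assuming Python with boto3, and a hypothetical pipeline named my-pipeline), you could count successful executions per day from CodePipeline’s execution history:

    import boto3
    from collections import Counter

    codepipeline = boto3.client("codepipeline")

    def deployments_per_day(pipeline_name):
        """Count successful pipeline executions per calendar day."""
        paginator = codepipeline.get_paginator("list_pipeline_executions")
        counts = Counter()
        for page in paginator.paginate(pipelineName=pipeline_name):
            for execution in page["pipelineExecutionSummaries"]:
                if execution["status"] == "Succeeded":
                    # lastUpdateTime approximates when the final action completed
                    counts[execution["lastUpdateTime"].date()] += 1
        return counts

    print(deployments_per_day("my-pipeline"))  # "my-pipeline" is illustrative

This assumes the pipeline’s final stage deploys to production, so a Succeeded execution equates to a production deployment.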

According to the DORA 2018 Report, Elite performers have a deployment frequency of multiple times per day and Low performers have a deployment frequency that is between 1 week and 1 month.

Lead time for changes

In the book, the authors describe lead time for changes as “the time it takes to go from code committed to code successfully running in production”.

You can obtain this metric by capturing the time at which each revision enters CodePipeline and then updating that record with the time at which the same revision successfully completes the last action of the deployment pipeline. The difference between these timestamps is the lead time for that change. By averaging these values over a period of time, you obtain the mean lead time for changes to production.
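Continuing the earlier sketch, the execution summaries that CodePipeline already records contain both timestamps, so a rough approximation (assuming the pipeline is triggered directly by each commit, so that the pipeline start time stands in for the commit time) might look like this:

    import boto3

    codepipeline = boto3.client("codepipeline")

    def mean_lead_time_seconds(pipeline_name):
        """Average time from pipeline start (a proxy for commit time when
        the pipeline triggers on each commit) to successful completion."""
        paginator = codepipeline.get_paginator("list_pipeline_executions")
        durations = []
        for page in paginator.paginate(pipelineName=pipeline_name):
            for execution in page["pipelineExecutionSummaries"]:
                if execution["status"] == "Succeeded":
                    delta = execution["lastUpdateTime"] - execution["startTime"]
                    durations.append(delta.total_seconds())
        return sum(durations) / len(durations) if durations else None

Note that this measures time spent in the pipeline; if commits wait before a pipeline run begins, the true lead time from commit is longer.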

According to the DORA 2018 Report, Elite performers have a lead time for changes of less than 1 hour and Low performers have a lead time for changes that is between 1 month and 6 months.  

Time to restore service

The time to restore service, or mean time to recover (MTTR), measures the average time it takes to restore service after a failure in production.

One way of instrumenting this in an AWS-native world is to run automated synthetic tests that continuously exercise key scenarios in production. If any of them fail, we capture the time at which the failure was discovered and enter it as a row in a database – such as DynamoDB – and then update that row once our synthetic tests begin reporting success again. This information can be reported back to CodePipeline so that we know when a failure occurred and what kind of test produced it.
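Here is a minimal sketch of that bookkeeping (assuming Python with boto3 and a hypothetical DynamoDB table named service-outages with an outage_id partition key; your synthetic-test harness would call record_failure and record_recovery at the appropriate moments):

    import time
    import uuid
    import boto3

    table = boto3.resource("dynamodb").Table("service-outages")  # illustrative name

    def record_failure():
        """Called when synthetic tests first report a failure in production."""
        outage_id = str(uuid.uuid4())
        table.put_item(Item={"outage_id": outage_id, "failed_at": int(time.time())})
        return outage_id

    def record_recovery(outage_id):
        """Called once synthetic tests begin reporting success again."""
        table.update_item(
            Key={"outage_id": outage_id},
            UpdateExpression="SET recovered_at = :t",
            ExpressionAttributeValues={":t": int(time.time())},
        )

    def mean_time_to_recover_seconds():
        """Average restore time across all completed outage records."""
        items = table.scan()["Items"]
        durations = [i["recovered_at"] - i["failed_at"] for i in items if "recovered_at" in i]
        return sum(durations) / len(durations) if durations else None

A full table scan is fine for illustration; a real implementation would more likely aggregate incrementally or query by date range.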

According to the DORA 2018 Report, Elite performers have an MTTR that is less than 1 hour and Low performers have an MTTR that is between 1 week and 1 month.  

Change failure rate

The change failure rate is a measure of how often deployments to production fail in a way that requires immediate remediation (in particular, rollbacks).

To obtain this information in an AWS-native manner, you track each deployment to production and record whether or not it was successful. The change failure rate is then the ratio of failed deployments to total deployments over a given period.
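As a minimal sketch (again assuming Python with boto3, and treating a pipeline execution’s own status as a stand-in for deployment success; a deployment that succeeds but later requires a rollback would need to be flagged separately, for example from the synthetic-test records above):

    import boto3

    codepipeline = boto3.client("codepipeline")

    def change_failure_rate(pipeline_name):
        """Fraction of completed pipeline executions that failed."""
        paginator = codepipeline.get_paginator("list_pipeline_executions")
        succeeded = failed = 0
        for page in paginator.paginate(pipelineName=pipeline_name):
            for execution in page["pipelineExecutionSummaries"]:
                if execution["status"] == "Succeeded":
                    succeeded += 1
                elif execution["status"] == "Failed":
                    failed += 1
        total = succeeded + failed
        return failed / total if total else None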

According to the DORA 2018 Report, Elite performers have a change failure rate between 0-15% and Low performers have a rate from 46-60%.  

Continuous Improvement

While we have captured some of these and other metrics for some of our customers, we haven’t collected them in a consistent manner over the years. We even open sourced a CloudWatch dashboard, called pipeline-dashboard, that captures some of these metrics (along with a few others) for deployment pipelines that use CodePipeline.

A way to obtain these metrics is through brief surveys combined with instrumentation of the deployment pipelines for certain applications and services. This way, you can verify that teams are seeing improvements and, if they are not, identify ways to remedy this. You might then anonymize this data and make it available across the enterprise. This helps ensure you are accelerating the speed and confidence of feedback between end users and developers.

Summary

In this post, we covered four key metrics for DevOps success. By tracking them, you can find ways to accelerate the speed and confidence of delivering features to production. All of these metrics ultimately measure time to value of features running in production, because it’s only when software is in production that your end users and engineers receive the value of the investment. A summary of the four key metrics, taken from the DORA 2018 State of DevOps Report, is listed in the table below.

Aspect of Software Delivery Performance | Elite | High | Medium | Low
Deployment frequency | Multiple deploys per day | Between once per hour and once per day | Between once per week and once per month | Between once per week and once per month
Lead time for changes | Less than one hour | Between one day and one week | Between one week and one month | Between one month and six months
Mean time to restore service | Less than one hour | Less than one day | Less than one day | Between one week and one month
Change failure rate | 0-15% | 0-15% | 0-15% | 46-60%
