Code Complexity Metrics

When a professional software developer writes code, they measure the complexity of the code they write.  The contrapositive holds as well: those who don’t measure the complexity of their code are not professional.  A variety of code complexity metrics are available to the professional developer.  The “better” metrics are vetted by empirical studies that attempt to correlate measured values with bug density.  For a given metric, some studies may appear to prove a correlation while others fail to do so definitively.

Regardless of the specific metric, there is a common-sense truth that a developer is more likely to make a mistake when developing complex code than when developing simpler code.  The question is whether a given metric actually measures complexity in a way that usefully demonstrates this reality.

Cyclomatic Complexity

One of the more famous (and most measured) complexity metrics is McCabe’s cyclomatic complexity metric (CCM).  A good rule of thumb is that as a function’s CCM approaches 10, the function becomes less readable, harder to reason about, and in turn more likely to include an error.
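
To make the counting concrete, consider a contrived Ruby function (Ruby, since the prototype discussed later is written in Ruby).  For structured code, CCM works out to the number of decision points plus one:

    # A contrived function: cyclomatic complexity = decision points + 1
    def shipping_rate(total:, international:, expedited:)
      return 0 if total > 100          # decision point 1
      if international                 # decision point 2
        expedited ? 25 : 15            # decision point 3
      elsif expedited                  # decision point 4
        10
      else
        5
      end
    end

    shipping_rate(total: 50, international: true, expedited: false)  # => 15

With four decision points, the function’s CCM is 5, comfortably below the rule-of-thumb threshold of 10.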

As CCM approaches 10, a developer knows to decompose the function into something simpler and easier to understand; it’s a great guide for directing improvement.  There does not appear to be a similar metric or tool available to the author of an IAM policy document.  What might such a metric look like?

Policy Document Complexity

It is quite obvious that some policy documents are more complex than others and therefore more likely not to do what the author originally intended.  What’s not apparent is how to measure that complexity in a way that helps the author write a “better” policy document.  The point isn’t to judge how secure or correct a policy is; the point is to quantify, comparatively, how easy a policy is to understand and reason about.

First Principles

Before throwing wild first-cut numbers at policies, it is important to enumerate the principles that make one policy more complex than another.  For a metric to have value, it will need to align with these principles.

  • The simplest policy document contains one Statement.  That one statement allows actions from one service to operate on resources within that same service.  This policy should have the lowest complexity metric (see the example after this list).
  • Policies that are “positive” are easier to understand than policies that are “negative”, especially when the positive and negative “compete”.  In other words, Allow is less complex than Deny, and “overlapping” Statements that both Allow and Deny are more complex. 
  • The NotAction and NotResource specifications can invert the meaning of a Statement in a confusing way.  Therefore NotAction and NotResource are more complex than Action and Resource.
  • Conditions are complex unto themselves.  They are powerful but hard to test and very easy to get wrong.  Conditions that include operations on multiple values, existence checks or policy variables are more complex than a singular condition.
  • The more statements a policy has, the more complex it is.  However, having multiple simple statements should be considered less complex than having a single “large” statement that jumbles together actions and resources from multiple services.
  • There is no penalty for a wildcard “*” in Resource or Action by itself.  It may not observe the principle of least privilege, but it is simple to understand.  If a full wildcard is mixed with other non-wildcards, however, it adds complexity in that it obscures what the original intention of the policy was.
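
To ground the first principle, here is a hypothetical policy of the simplest possible shape: one Allow statement whose actions and resources both belong to a single service (Amazon S3, chosen arbitrarily for illustration):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:ListBucket"],
          "Resource": [
            "arn:aws:s3:::example-bucket",
            "arn:aws:s3:::example-bucket/*"
          ]
        }
      ]
    }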

First Cut Scoring

Keeping these principles in mind, the Stelligent Policy Complexity Metric (SPCM) is initially defined as follows (a simplified code sketch follows the list):

  • For each Statement, +1
  • For each Deny, +1
  • For each NotAction, +1
  • For each NotResource, +1
  • Within the (Not)Resource and (Not)Action specifications combined, +2 for every service mentioned beyond the first
  • For each service mentioned in the (Not)Resource specification and not mentioned in the (Not)Action specification (and vice versa), +1
  • If (Not)Resource includes a full wildcard mixed with other resources, +1
  • If (Not)Action includes a full wildcard mixed with other actions, +1
  • For each Condition operator, +2
  • For each use of ForAllValues or ForAnyValue, +2
  • For each use of the IfExists suffix in a Condition operator, +1
  • For each use of the Null Condition operator, +1
  • For each value referenced in Condition that includes a policy variable, +1
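
To make the rules concrete, the following is a minimal Ruby sketch of the scoring, written for exposition only; it is not the prototype implementation linked under Implementation below.  It assumes a parsed policy document (a Hash), treats a lone full wildcard as mentioning no service, and assumes the service mismatch rule is skipped when either side is a lone wildcard:

    # spcm_sketch.rb: an illustrative (not the actual prototype) scoring of
    # the first-cut SPCM rules against a parsed IAM policy document (a Hash).

    # Service prefixes mentioned in a list of actions ("s3:GetObject" -> "s3")
    # or resource ARNs ("arn:aws:s3:::bucket" -> "s3").  A lone "*" is assumed
    # to mention no service.
    def services_in(values)
      values.map do |value|
        next if value == '*'
        value.start_with?('arn:') ? value.split(':')[2] : value.split(':').first
      end.compact.uniq
    end

    def statement_score(stmt)
      score = 1                                    # +1 per Statement
      score += 1 if stmt['Effect'] == 'Deny'       # +1 per Deny
      score += 1 if stmt.key?('NotAction')         # +1 per NotAction
      score += 1 if stmt.key?('NotResource')       # +1 per NotResource

      actions   = Array(stmt['Action'] || stmt['NotAction'])
      resources = Array(stmt['Resource'] || stmt['NotResource'])
      action_services   = services_in(actions)
      resource_services = services_in(resources)

      # +2 for every service beyond the first, actions and resources combined
      combined = (action_services + resource_services).uniq
      score += 2 * [combined.size - 1, 0].max

      # +1 per service mentioned on one side but not the other (assumed not
      # to apply when either side is a lone full wildcard)
      unless actions == ['*'] || resources == ['*']
        score += (action_services - resource_services).size
        score += (resource_services - action_services).size
      end

      # +1 when a full wildcard is mixed in with non-wildcard entries
      [actions, resources].each do |list|
        score += 1 if list.include?('*') && list.size > 1
      end

      (stmt['Condition'] || {}).each do |operator, pairs|
        score += 2                                 # +2 per Condition operator
        score += 2 if operator.start_with?('ForAllValues', 'ForAnyValue')
        score += 1 if operator.end_with?('IfExists')
        score += 1 if operator == 'Null'
        # +1 per condition value that references a policy variable
        pairs.values.flatten.each { |v| score += 1 if v.to_s.include?('${') }
      end

      score
    end

    def spcm(policy)
      statements = policy['Statement']
      statements = [statements] unless statements.is_a?(Array)
      statements.sum { |stmt| statement_score(stmt) }
    end

Applying this sketch to a one-statement, one-service Allow policy, such as the S3 example above, yields a score of 1, matching the first principle.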

Implementation

A prototype of this algorithm is available at https://github.com/stelligent/iam_complexity_metrics

The prototype provides two approaches for targeting the policies for measurement:

  • Static analysis of CloudFormation templates
  • Analysis of live IAM policy documents

Static Analysis

The first approach attempted was “static analysis” of CloudFormation templates à la cfn_nag.  The prototype scans a given directory for CloudFormation templates and picks out AWS::IAM::Policy and AWS::IAM::Role resources to apply the metric to their PolicyDocument properties.  The emitted result is a JSON document correlating each file path to a dictionary of policy documents and their complexity scores.
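
The exact output format is defined by the prototype; an output of the shape described might look something like this (file path, policy names and scores all illustrative):

    {
      "/var/tmp/templates/app.json": {
        "InstancePolicy": 3,
        "DeployRole": 7
      }
    }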

Measuring policy documents before they are ever converged in an AWS account is a great use case, but there are a number of complexities in dealing with “missing” values from intrinsic functions like Ref, Fn::GetAtt, Fn::FindInMap, etc.  The prototype “works” and includes a best-effort parser that mostly ignores these missing values.

To experiment with the static analysis, first be sure to have Ruby installed.  For more information, see Installing Ruby.  Then obtain a collection of CloudFormation templates; for example, AWS makes sample templates available for download.  Unzip the templates into /var/tmp and then run the metric analyzer against them.
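
The exact rake target for the static analysis is defined in the repository’s Rakefile; a hypothetical invocation (target name assumed, not confirmed) might look like:

    cd iam_complexity_metrics
    bundle install                                # install gem dependencies
    rake cfn_iam_metrics['/var/tmp/templates']    # hypothetical target name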

Live Analysis

The second approach is to select an AWS account and measure the complexity of the “live” roles and policies that already exist in it.  This approach avoids any unpleasantness with missing values, so for the purposes of the exploration below, this is the approach used.  The emitted result is a JSON document correlating policies to their complexity scores.

To experiment with the live analysis, first be sure to have Ruby installed.  For more information, see Installing Ruby.  Then, presuming the existence of an AWS CLI profile named “dev_account” to access a live account, run the live metric analyzer.
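
The live rake target is named live_iam_metrics (see the next section).  An invocation along the following lines should work, assuming the profile is picked up from the AWS_PROFILE environment variable (the prototype may accept the profile differently):

    cd iam_complexity_metrics
    bundle install
    AWS_PROFILE=dev_account rake live_iam_metrics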

Aggregate Statistics of SPCM

Running the live_iam_metrics rake target against a live AWS account returns the SPCM metric for all of the AWS managed policies (this target can easily be altered in account.rb).

The set of all AWS managed policies isn’t a “controlled” dataset and evolves all the time as new services come out, so there aren’t necessarily any categorical conclusions to be drawn from it.  That said, it’s available to anybody with an AWS account, and it is more interesting to experiment with than designing a new dataset from whole cloth.

At the time of writing, the aggregate statistics for SPCM applied to AWS Managed Policies are:

  • Mode: 1
  • Median: 8
  • Min: 1
  • Max: 295

In other words, the most common score is 1 and the median is 8, but the maximum score is a shockingly high 295.

There are 634 IAM policies in the data set.  The following scatter plots graph SPCM on the x-axis versus the number of AWS managed policies that have that score on the y-axis.  The graphs all reflect the same data but at different ranges of SPCM (1-300, 1-50, 1-25).

The plots show that the majority of policies are relatively simple, falling within the range of 1 to 20, and many within just 1 to 5.  Getting down to the actual numbers, out of 634 policies, 583 fall between 1 and 20 and 396 fall between 1 and 5 (with the mode, 1, being dominant).  However, there are indeed policies with greater complexity to investigate.

Examples of SPCM

Aggregate numbers aside, the more interesting exercise is to compare a policy with its complexity score to see if the score indicates anything.  The following four policies with low, medium, high and ludicrously high scores have been drawn at random from the set of policies.

Policy                                      SPCM
AmazonESReadOnlyAccess                         1
AWSDeepRacerCloudFormationAccessPolicy         9
AmazonECS_FullAccess                          47
SupportUser                                  135

AmazonESReadOnlyAccess

This policy is indeed simple, as the score of 1 indicates:

  • The actions are only defined against one service
  • The resources are full wildcard
  • The effect is positive (Allow)
  • There is only 1 statement
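
A policy of that shape looks like the following (the specific es: actions are illustrative and may not match the managed policy verbatim):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["es:Describe*", "es:List*"],
          "Resource": "*"
        }
      ]
    }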

AWSDeepRacerCloudFormationAccessPolicy

This policy is definitely more complex at 9, but it’s not overly so:

  • There are 6 statements, but each statement is confined to one service
  • There is a condition on one of the statements

AmazonECS_FullAccess

At 47, this policy should be enough to make heads spin.  The number is so high because:

  • There are 7 statements, but more importantly, one of the statements has actions for 16 different services
  • There are 5 statements with conditions
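
In fact, under the first-cut rules those two observations account for the entire score, assuming one condition operator per conditioned statement: the 7 statements contribute +7, the 15 services beyond the first in the large statement contribute +2 × 15 = +30, and the 5 condition operators contribute +2 × 5 = +10, for a total of 47.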

The policy is too large to embed, but you can view it here.

Given that the statement with actions for 16 services operates on a full wildcard for resources, one might argue the policy is not as complicated as the high score of 47 suggests.  However, wading through that list of actions to figure out whether a given action is or is not allowed is definitely not a trivial undertaking.

SupportUser

The SupportUser policy is actually “simple” in that it’s just one statement, and that one statement has a full wildcard resource, which is also simple.  However, it mentions just about every other AWS service in the ecosystem, which causes its score to be sky-high.  One can argue the metric is misleading in this case; given this is for a support user, the policy likely must be written this way to stay within policy length limitations.
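
For a sense of the arithmetic, a single Allow statement naming 68 services against a wildcard resource would score exactly 1 + 2 × 67 = 135 under the first-cut rules, which is consistent with the service count driving essentially the entire score.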

The policy is too large to embed, but you can view it here.

Outcomes

What does all this really mean?  A bunch of fancy numbers isn’t good for much unless it can be used to produce better “software”.

  • First and foremost, this metric can direct attention to the policy documents that most need testing.  Given there is only so much time to test software, including IAM policy documents, the metric can focus that limited time on the most complex documents, which are the most likely to have “bugs”.
  • Even if an organization isn’t interested in developing tests for its policies, this metric can at least bring awareness to which policies are more complex and therefore more likely to contain errors.
  • There are some situations where the metric can encourage a rewrite of a policy document.  A policy document with simpler Statements aligned to individual services will score better than one big Statement that jumbles together actions and resources from multiple services.  On the other hand, some policy documents cannot be rewritten given length restrictions, so the metric can’t serve the full purpose that something like CCM serves for “real code”.

Conclusion

The SPCM metric is a first-cut, experimental attempt to score the complexity of IAM policy documents.  A rough correlation between simple policies and low scores, and complex policies and high scores, has been demonstrated, with an interesting exception of sorts in SupportUser.  The hope is that this metric can draw attention to the “scariest” policies and help focus attention on which policies need the most testing.  While this metric appears to have value from a common-sense perspective, it doesn’t have the empirical backing of other software complexity metrics like CCM; perhaps that is an area for further research.  Additionally, if this metric gets some real-world usage, a next step is to consider expanding it, or creating a new metric, to score IAM roles and their trust policies.