Deleting a Stuck CloudFormation Stack
One problem I have run into many times over the years is trying to delete an AWS CloudFormation stack and getting an error like this:
Role arn:aws:iam::123456789012:role/CloudFormationTrustRole-2CDE9F7RUUTH is invalid or cannot be assumed
In this case, an IAM role used by the stack was deleted, either manually or by another stack when that stack was deleted.
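If you want to handle this error in a script, the first step is pulling the role out of the message. Here is a minimal sketch in Python. It assumes the message always has the shape shown above, and the function name parse_missing_role is hypothetical. An IAM role ARN ends with role/, an optional path, and the role name, so the name is the last path segment.

```python
import re
from typing import Optional, Tuple

# Assumed message shape (taken from the error above, not a documented format):
# "Role arn:aws:iam::<account>:role/<optional path/><name> is invalid or cannot be assumed"
_ROLE_ERROR = re.compile(
    r"Role (arn:aws[\w-]*:iam::(\d{12}):role/([\w+=,.@/-]+)) is invalid or cannot be assumed"
)


def parse_missing_role(message: str) -> Optional[Tuple[str, str, str]]:
    """Return (role_arn, account_id, role_name), or None if the message doesn't match."""
    match = _ROLE_ERROR.search(message)
    if match is None:
        return None
    role_arn, account_id, path_and_name = match.groups()
    # Roles can live under a path such as role/service/MyRole; the name is the last segment.
    role_name = path_and_name.rsplit("/", 1)[-1]
    return role_arn, account_id, role_name


msg = ("Role arn:aws:iam::123456789012:role/CloudFormationTrustRole-2CDE9F7RUUTH "
       "is invalid or cannot be assumed")
print(parse_missing_role(msg)[2])  # prints CloudFormationTrustRole-2CDE9F7RUUTH
```

The role name (not the ARN) is what you need if you recreate the role.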
A few years back, someone asked about this problem on Stack Overflow: Unable to delete cfn stack, role is invalid or cannot be assumed. I responded that the user should create an IAM role with the same name as the one listed in the error, delete the CloudFormation stack, and finally delete the role. The problem with this approach is that it involves several manual steps, and there is a good chance you will forget to remove the role after creating it. Over time, this leaves AWS accounts cluttered with IAM roles that exist but are never removed, which also weakens your overall security posture.
Another respondent suggested deleting the stack with a role that has permission to delete it, as shown below.
aws cloudformation delete-stack --role-arn arn:aws:iam::xxxx:role/anyrolewithpermissions --stack-name StuckStack
This works too, but it requires you to create a new role, or at least know of an existing role, with permission to remove the CloudFormation stack and its associated resources, which usually leads you back to my suggestion. AWS Support shares a similar solution here: How do I delete an AWS CloudFormation stack that’s stuck in the DELETE_FAILED status?
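For reference, the delete-stack command above has a direct boto3 equivalent: delete_stack accepts a RoleARN argument, and the stack_delete_complete waiter blocks until the deletion finishes. This is a sketch; the role ARN stands in for any role with permission to delete the stack's resources, and the client is a parameter so the function can be exercised without AWS credentials.

```python
from typing import Any, Optional


def delete_stack_with_role(stack_name: str, role_arn: str, client: Optional[Any] = None) -> None:
    """Delete a stack using an explicit service role, then wait for the deletion to finish."""
    if client is None:
        import boto3  # imported lazily so a stand-in client can be used for testing
        client = boto3.client("cloudformation")
    # Equivalent to: aws cloudformation delete-stack --role-arn ... --stack-name ...
    client.delete_stack(StackName=stack_name, RoleARN=role_arn)
    # Raises botocore.exceptions.WaiterError if the stack ends up in DELETE_FAILED.
    client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

Usage mirrors the CLI call: delete_stack_with_role("StuckStack", "arn:aws:iam::xxxx:role/anyrolewithpermissions"). It has the same drawback: the role has to exist first.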
Frustrated with these manual solutions, I decided to automate the creation and deletion of a temporary IAM role, along with the deletion of all associated CloudFormation stacks.
In this post, you will learn how to resolve this “stuck stack” issue so that you can continue to accelerate the speed of feedback between users and builders.
Resolving the “Stuck Stack” Scenario
In this scenario, I will launch a CloudFormation stack that provisions a pipeline that builds and deploys a simple Lambda function. As part of deploying the Lambda function, the pipeline provisioned by this “primary” CloudFormation stack creates its own CloudFormation stack. The primary stack provisions an IAM role that is shared with the Lambda stack, so if you delete the primary stack first, it also deletes the IAM role used by the stack it deployed. This leads to the error you saw at the beginning of this post.
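Because the failure comes from deleting the primary stack before the stack it deployed, one way to avoid it in the first place is to delete dependent stacks first. The sketch below orders deletions from a mapping of each stack to the stack that owns the role it uses. The mapping and the deletion_order function are hypothetical; you would have to build that mapping yourself, for example from the RoleARN that describe-stacks reports for each stack.

```python
from typing import Dict, List, Optional


def deletion_order(role_owner: Dict[str, Optional[str]]) -> List[str]:
    """Order stacks so each one is deleted before the stack that owns its service role.

    role_owner maps a stack name to the stack that created the IAM role it uses,
    or None if that role comes from outside the stacks in the mapping.
    """
    order: List[str] = []
    visited = set()

    def visit(stack: str) -> None:
        if stack in visited:
            return
        visited.add(stack)
        # Every stack that depends on a role owned by `stack` must be deleted first.
        for dependent, owner in role_owner.items():
            if owner == stack:
                visit(dependent)
        order.append(stack)

    for stack in role_owner:
        visit(stack)
    return order


# The scenario from this post: the primary stack owns the role the Lambda stack uses.
print(deletion_order({"lambda-pmd": None, "lambda-pmd-us-east-1": "lambda-pmd"}))
```

This is a depth-first ordering: a stack is appended only after every stack relying on its role, so the Lambda stack comes out ahead of the primary stack.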
To begin testing this scenario, run the commands in Listing 1. Alternatively, you can create your own simple CloudFormation stack that uses an IAM role, delete that role, and then attempt to delete the stack.
Listing 1 – Commands to Set Up a “Stuck Stack”
The commands in Listing 1 launch a stack that includes a CodePipeline pipeline resource and automatically start a pipeline execution. To see the pipeline in action and verify that it completed successfully, go to the CloudFormation console and select the primary stack created by those commands. Then select the Outputs tab and click the PipelineUrl output value.
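If you'd rather not click through the console, the same output value is returned by describe_stacks. A small sketch with boto3, assuming you pass in the name you gave the primary stack; the helper name get_stack_output is hypothetical.

```python
from typing import Any, Optional


def get_stack_output(stack_name: str, output_key: str, client: Optional[Any] = None) -> Optional[str]:
    """Return one Outputs value from a stack, or None if the stack has no such output."""
    if client is None:
        import boto3  # lazy import so a stand-in client can be used for testing
        client = boto3.client("cloudformation")
    stack = client.describe_stacks(StackName=stack_name)["Stacks"][0]
    # "Outputs" is absent from the response when a stack defines no outputs.
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    return None
```

For this scenario you would call get_stack_output("<your primary stack name>", "PipelineUrl").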
Once the pipeline is complete, select the primary CloudFormation stack and attempt to delete it. It should delete successfully. You can see me deleting CloudFormation stacks in the console in Figure 1.
Figure 1 – Deleting CloudFormation Stacks
After you have successfully deleted the primary stack, attempt to delete the secondary stack, which is named [PRIMARY-STACK-NAME]-AWS-REGION. When you attempt to delete this stack, you will get an error similar to the one in Figure 2.
Figure 2 – Missing IAM Role When Deleting CloudFormation Stack
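Before creating anything, you can confirm that the role named in the error really is gone. A sketch using boto3's IAM get_role, which raises NoSuchEntityException for a missing role; the function name role_exists is hypothetical, and the client is a parameter so it can be tested without AWS access.

```python
from typing import Any, Optional


def role_exists(role_name: str, iam: Optional[Any] = None) -> bool:
    """Return True if an IAM role with this name exists in the current account."""
    if iam is None:
        import boto3  # lazy import so a stand-in client can be used for testing
        iam = boto3.client("iam")
    try:
        iam.get_role(RoleName=role_name)
        return True
    except iam.exceptions.NoSuchEntityException:
        # The role the stack depends on has been deleted: the "stuck stack" case.
        return False
```

If this returns False for the role in the error, recreating a role with that exact name is what lets CloudFormation proceed.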
To fix this, I created a CloudFormation template that creates an IAM role, deletes the stack you were attempting to delete and then deletes itself so that you don’t have extraneous stacks and roles lingering in your AWS account. Part of this template is based on the CloudFormation template described in the blog post: Scheduling automatic deletion of AWS CloudFormation stacks.
To get this going, run the commands in Listing 2, replacing the ROLE_NAME, STACK_NAME, and TTL parameters with your own values. The ROLE_NAME argument is the role name shown in red in an error like the one in Figure 2 (just the IAM role name, not the full ARN). The STACK_NAME argument is the name of the secondary stack: [PRIMARY-STACK-NAME]-AWS-REGION. TTL is the number of minutes until the temporary stack is automatically deleted. For example, the last command I ran was:
./create-cfn-role.sh lambda-pmd-CloudFormationTrustRole-HG1TOKQSBT3T lambda-pmd-us-east-1 5
Listing 2 – Commands to Delete a “Stuck Stack”
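The STACK_NAME and TTL arguments described above are easy to derive. The sketch below builds the secondary stack name from the [PRIMARY-STACK-NAME]-AWS-REGION pattern and turns a TTL in minutes into a one-time Amazon EventBridge cron expression, one way to schedule a stack's self-deletion. Whether the template behind Listing 2 schedules deletion exactly this way is an assumption, and both function names are hypothetical. EventBridge cron expressions are evaluated in UTC and use ? for day-of-week when a day-of-month is given.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def secondary_stack_name(primary_stack_name: str, region: str) -> str:
    """Name of the stack the pipeline deploys: [PRIMARY-STACK-NAME]-AWS-REGION."""
    return f"{primary_stack_name}-{region}"


def ttl_to_cron(ttl_minutes: int, now: Optional[datetime] = None) -> str:
    """One-time EventBridge schedule expression that fires ttl_minutes from now (UTC)."""
    if ttl_minutes < 1:
        raise ValueError("TTL must be at least one minute")
    now = now or datetime.now(timezone.utc)
    when = now.astimezone(timezone.utc) + timedelta(minutes=ttl_minutes)
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({when.minute} {when.hour} {when.day} {when.month} ? {when.year})"


print(secondary_stack_name("lambda-pmd", "us-east-1"))  # prints lambda-pmd-us-east-1
print(ttl_to_cron(5, datetime(2024, 6, 3, 14, 0, tzinfo=timezone.utc)))  # prints cron(5 14 3 6 ? 2024)
```

Because the expression pins the year, the schedule can only match once, which is what a time-to-live needs.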
When you run this bash script, it waits for the creation and deletion of the CloudFormation stacks before it completes. You will see something like Figure 3 after running the command.
Figure 3 – Running bash script to create and delete an IAM Role and CloudFormation Stacks
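The waiting the script does can also be expressed with boto3's CloudFormation waiters. This is a sketch, not the script from Listing 2. It assumes the temporary stack has already been submitted with create_stack, that the stuck stack is deleted once the new role exists, that the temporary stack removes itself when its TTL expires, and that you pass both stack names in. The function name wait_for_cleanup is hypothetical.

```python
from typing import Any, Optional


def wait_for_cleanup(temp_stack: str, stuck_stack: str, client: Optional[Any] = None) -> None:
    """Block until the temporary stack is created, the stuck stack is gone,
    and the temporary stack has deleted itself."""
    if client is None:
        import boto3  # lazy import so a stand-in client can be used for testing
        client = boto3.client("cloudformation")
    steps = [
        ("stack_create_complete", temp_stack),   # temporary role stack is up
        ("stack_delete_complete", stuck_stack),  # stuck stack deleted using the recreated role
        ("stack_delete_complete", temp_stack),   # temporary stack removed itself after its TTL
    ]
    for waiter_name, stack in steps:
        # Each waiter polls describe_stacks and raises WaiterError on a failure state.
        client.get_waiter(waiter_name).wait(StackName=stack)
```

The stack_delete_complete waiter treats a stack that no longer exists as deleted, so the last step also succeeds if the temporary stack is already gone.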
In Listing 3, you see a CloudFormation snippet that provisions the IAM Role.
Listing 3 – IAM Role Provisioned So That the CloudFormation Stack Can Be Deleted
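The essential parts of a role like this are a trust policy that lets the CloudFormation service assume it and a name that matches the one in the error. The sketch below builds such an AWS::IAM::Role resource as a Python dict and prints it as JSON. The logical ID StuckStackRole, the RoleName template parameter, and the managed policy ARN are hypothetical. In keeping with the security concern above, attach only the permissions needed to delete the stuck stack's resources.

```python
import json
from typing import Dict, List


def stuck_stack_role_resource(managed_policy_arns: List[str]) -> Dict:
    """CloudFormation resource for a role that the CloudFormation service can assume."""
    return {
        "StuckStackRole": {  # hypothetical logical ID
            "Type": "AWS::IAM::Role",
            "Properties": {
                # Must equal the role name from the error, supplied as a template parameter.
                "RoleName": {"Ref": "RoleName"},
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Principal": {"Service": "cloudformation.amazonaws.com"},
                        "Action": "sts:AssumeRole",
                    }],
                },
                # Scope these to what deleting the stuck stack's resources requires.
                "ManagedPolicyArns": managed_policy_arns,
            },
        }
    }


# Hypothetical customer managed policy, for illustration only.
print(json.dumps(stuck_stack_role_resource(["arn:aws:iam::123456789012:policy/DeleteLambdaPmdResources"]), indent=2))
```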
In Listing 4, you see a Python snippet in the CloudFormation template that provisions a Lambda function that deletes the current stack (that is, the stack deletes itself).
Listing 4 – Lambda Function Provisioned in a CloudFormation Template to Delete a CloudFormation Stack
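A self-deleting function like this needs very little code. The sketch below is not the function from Listing 4. It assumes the stack name reaches the function through an environment variable named STACK_NAME (hypothetical; a template could set it from the AWS::StackName pseudo parameter) and that the function's execution role is allowed to delete the stack and its resources.

```python
import os
from typing import Any, Optional


def delete_own_stack(stack_name: str, client: Optional[Any] = None) -> dict:
    """Ask CloudFormation to delete the stack that contains this function."""
    if client is None:
        import boto3  # lazy import so a stand-in client can be used for testing
        client = boto3.client("cloudformation")
    # delete_stack only starts the deletion; CloudFormation finishes it asynchronously.
    client.delete_stack(StackName=stack_name)
    return {"deleted": stack_name}


def handler(event, context):
    """Lambda entry point, for example invoked by a scheduled rule once the TTL expires."""
    # STACK_NAME is a hypothetical environment variable set by the template.
    return delete_own_stack(os.environ["STACK_NAME"])
```

Because the call returns as soon as deletion starts, the function can finish before CloudFormation reaches the function itself.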
In Listing 5, you see a snippet in bash that deletes the CloudFormation stack that you were unable to delete before creating the temporary IAM Role.
Listing 5 – Delete the “Stuck” CloudFormation stack
Troubleshooting
Sometimes you will receive a DELETE_FAILED error while the delete-stack commands are running. If you look at the error in the CloudFormation events, you might see something like “The security token included in the request is invalid.” If this happens, you can usually run the script again without receiving the error.
Figure 4 – Delete Stack Failure
As you can see in Figure 4, the stack you launched is still deleted automatically, so no unnecessary resources are left in your account.
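Rerunning by hand usually clears the DELETE_FAILED error described above, and the same idea can be automated with a bounded retry. This is a sketch with hypothetical names. It assumes you supply start_delete and get_status callables, for example thin wrappers around boto3's delete_stack and describe_stacks, and that your get_status wrapper reports a stack that no longer exists as DELETE_COMPLETE (describe_stacks itself raises an error for it).

```python
import time
from typing import Callable


def delete_with_retries(
    start_delete: Callable[[], None],
    get_status: Callable[[], str],
    attempts: int = 3,
    poll_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a stack deletion that ends in DELETE_FAILED. Returns True once it succeeds."""
    for _ in range(attempts):
        start_delete()
        status = get_status()
        # Poll until CloudFormation leaves the in-progress state.
        while status == "DELETE_IN_PROGRESS":
            sleep(poll_seconds)
            status = get_status()
        if status == "DELETE_COMPLETE":
            return True
        # Anything else (typically DELETE_FAILED): start the deletion again,
        # just as rerunning the script by hand would.
    return False
```

Capping the attempts keeps a genuinely broken deletion from looping forever.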
Conclusion
In this post, you learned how to create an IAM role for a CloudFormation stack you are unable to delete. What’s more, you learned how to automate all of these steps so that you don’t end up with a bunch of unused resources in your AWS account.
Photo by Sidney Pearce on Unsplash