While OpsWorks gives you a lot of power and flexibility for configuring your infrastructure, its most powerful feature is the way it gets you through your Netflix queue: OpsWorks takes a really long time to do anything, so you're left twiddling your thumbs a lot. If you're trying to troubleshoot an issue, you can sometimes watch an entire episode of The Next Generation between kicking off a stack and seeing it finally fail. However, there are some tricks you can use to mitigate that, so let's talk about one today.
We've come up with a couple of techniques for working around OpsWorks slowness, and one of them is to re-run problematic cookbooks directly on the failed OpsWorks instance while you work out a fix. When OpsWorks fails to set up an instance, it doesn't destroy it; you can still SSH into it. From there, you can manually re-run the failed cookbook instead of waiting for OpsWorks to recreate everything.
(Of course, to do any of this, your OpsWorks instances need to be associated with an SSH key pair, or your user needs SSH permission on the stack and a public key associated with your account in the OpsWorks settings. So make sure one of those is set up, at least during development!)
Our typical workflow goes like this: we see a cookbook failure, bang our heads on our desks for making such a simple mistake, and go fix the cookbook. Once the change is committed to our repo, we log into the failed OpsWorks instance and run these commands:
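# Open a root login shell (Chef needs root to configure the system)
sudo -i
cd /opt/aws/opsworks/current/
# Save the attributes JSON from the most recent OpsWorks Chef run
opsworks-agent-cli get_json > attributes.json
# Run only the listed recipes; the list is quoted so the shell won't treat [ ] as a glob
bin/chef-solo -c conf/solo.rb -j attributes.json -o 'recipe[whatever],recipe[whatever_else::specific_recipe]'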
This kicks off Chef with the same attributes OpsWorks used, but the -o option overrides the run list, so only the recipe(s) you name are run. You can't get rid of waiting on OpsWorks entirely, but this technique will save you a bit of time (so you might not finish Season 3 today after all).
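If you go through this loop often, the steps can be wrapped in a small script. What follows is a minimal sketch, not an OpsWorks tool: the file name rerun-recipes.sh and the build_run_list helper are hypothetical, and it assumes bash is available and the agent is installed under /opt/aws/opsworks/current, as in the commands above. It turns plain cookbook or cookbook::recipe names into the recipe[...] list that chef-solo's -o option expects.

```shell
#!/usr/bin/env bash
# rerun-recipes.sh (hypothetical name): re-run selected Chef recipes on a
# failed OpsWorks instance. Assumes the agent lives in AGENT_DIR below and
# that the script is run as root (e.g. from a `sudo -i` shell).
set -euo pipefail

AGENT_DIR=/opt/aws/opsworks/current

# Hypothetical helper: turn "a b::c" into "recipe[a],recipe[b::c]".
build_run_list() {
  local out="" r
  for r in "$@"; do
    # ${out:+,} adds a comma separator only after the first entry
    out+="${out:+,}recipe[${r}]"
  done
  printf '%s\n' "$out"
}

main() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "run this as root (e.g. from a 'sudo -i' shell)" >&2
    exit 1
  fi
  cd "$AGENT_DIR"
  # Same steps as the manual commands: dump attributes, then run only these recipes
  opsworks-agent-cli get_json > attributes.json
  bin/chef-solo -c conf/solo.rb -j attributes.json -o "$(build_run_list "$@")"
}

if [ "$#" -gt 0 ]; then
  main "$@"
else
  echo "usage: $0 cookbook[::recipe] ..." >&2
fi
```

From the root shell opened with sudo -i, you might run: bash rerun-recipes.sh whatever whatever_else::specific_recipe. Running it with no arguments only prints a usage message, so nothing touches Chef by accident.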
Once you've fixed the issue, don't forget to commit your changes to source control! Also, you'll still need to create a whole new OpsWorks stack: the instance will still show as failed in the OpsWorks console, and OpsWorks features such as deployments and auto-healing won't be applied to it.