In part 1 of this series, I introduced the Continuous Delivery (CD) pipeline for the Manatee Tracking application. In part 2, I went over how we use this CD pipeline to deliver software from check-in to production. In part 3, we focused on how CloudFormation is used to script the virtual AWS components that create the Manatee infrastructure. Then in part 4, we focused on a “property file less” environment by dynamically setting and retrieving properties. Part 5 explained how we use Capistrano for scripting our deployment. A list of topics for each of the articles is summarized below:
Part 1: Introduction – Introduction to continuous delivery in the cloud and the rest of the articles;
Part 2: CD Pipeline – In-depth look at the CD Pipeline;
Part 3: CloudFormation – Scripted virtual resource provisioning;
Part 4: Dynamic Configuration – “Property file less” infrastructure;
Part 5: Deployment Automation – Scripted deployment orchestration;
Part 6: Infrastructure Automation – What you’re reading now;
In this part of the series, I am going to show how we use Puppet in combination with CloudFormation to script our target environment infrastructure, preparing it for a Manatee application deployment.
What is Puppet?
Puppet is a Ruby-based infrastructure automation tool, used primarily for provisioning environments and managing configuration. It supports multiple operating systems, making your infrastructure automation cross-platform.
How does Puppet work?
Puppet uses a library called Facter which collects facts about your system. Facter returns details such as the operating system, architecture, IP address, etc. Puppet uses these facts to make decisions for provisioning your environment. Below is an example of the facts returned by Facter.
# Facter
architecture => i386
...
ipaddress => 172.16.182.129
is_virtual => true
kernel => Linux
kernelmajversion => 2.6
...
operatingsystem => CentOS
operatingsystemrelease => 5.5
physicalprocessorcount => 0
processor0 => Intel(R) Core(TM)2 Duo CPU     P8800  @ 2.66GHz
processorcount => 1
productname => VMware Virtual Platform

Puppet uses the operating system fact to decide the service name, as shown below:

case $operatingsystem {
  centos, redhat: {
    $service_name = 'ntpd'
    $conf_file    = 'ntp.conf.el'
  }
}

With this case statement, if the operating system is either centos or redhat, the service name ntpd and the configuration file ntp.conf.el are used.
Puppet is declarative by nature. Inside a Puppet module you define the end state the environment should be in after the Puppet run, and Puppet enforces this state during the run. If at any point the environment does not conform to the desired state, the Puppet run fails.
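For example, building on the case statement above, declaring the end state of the NTP service might look like this (a minimal sketch, not from the Manatee code):

# Declare the desired end state: the service selected by the case
# statement must be running and enabled at boot.
service { 'ntp':
  name   => $service_name,
  ensure => running,
  enable => true,
}
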
Anatomy of a Puppet Module
To script the infrastructure, Puppet uses modules for organizing related code that performs a specific task. A Puppet module has multiple subdirectories that contain resources for performing the intended task. These are described below, followed by an example layout:
manifests/: Contains the manifest class files for defining how to perform the intended task
files/: Contains static files that the node can download during the installation
lib/: Contains plugins
templates/: Contains templates which can be used by the module’s manifests
tests/: Contains tests for the module
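A hypothetical layout for a module named tomcat6 might look like this (directory and file names are illustrative):

tomcat6/
  manifests/
    init.pp           # class tomcat6 { ... }
  files/
  lib/
  templates/
    server.xml.erb    # hypothetical template name
  tests/
    init.pp
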
Puppet also uses manifests outside of modules: site.pp manages multiple modules together, and default.pp defines what to install on each node.
How to run Puppet
Puppet can be run using either a master agent configuration or a solo installation (puppet apply).
Master Agent: With a master agent installation, you configure one main master Puppet node which manages and configures all of your agent nodes (target environments). The master initiates the installation of the agent and manages it throughout its lifecycle. This model enables you to push infrastructure changes to your agents in parallel from the master node (a sample agent invocation is sketched after these descriptions).
Solo: In a solo Puppet run, it’s up to the user to place the desired Puppet module on the target environment. Once the module is on the target environment, the user needs to run puppet apply --modulepath=/path/to/modules/ /path/to/site.pp. Puppet will then provision the server with the provided modules and site.pp without relying on another node.
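For instance, in the master agent model, a one-off agent run against a master can be triggered from an agent node like this (a sketch, not from the Manatee setup; the hostname is a placeholder):

# On an agent node: contact the master, apply the catalog once,
# and print verbose output (hostname is a placeholder).
puppet agent --test --server puppet.example.com
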
Why do we use Puppet?
We use Puppet to script and automate our infrastructure, making our environment provisioning repeatable, fully automated, and less error prone. Furthermore, scripting our environments gives us complete control over our infrastructure and the ability to terminate and recreate environments as often as we choose.
Puppet for Manatees
In the Manatee infrastructure, we use Puppet for provisioning our target environments. I am going to go through our manifests and modules while explaining their use and purpose. In our Manatee infrastructure, we create a new target environment as part of the CD pipeline – discussed in part 2 of the series, CD Pipeline. Below I provide a high-level summary of the environment provisioning process:
1. CloudFormation dynamically creates a params.pp manifest with AWS variables.
2. CloudFormation runs puppet apply as part of UserData (sketched after this list).
3. Puppet runs the modules defined in hosts/default.pp.
4. Cucumber acceptance tests are run to verify the infrastructure was provisioned correctly.
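To give a feel for step 2, the instance’s UserData might invoke Puppet roughly like this (a simplified sketch, not our actual CloudFormation template; the paths are assumptions):

"UserData": { "Fn::Base64": { "Fn::Join": ["", [
  "#!/bin/bash\n",
  "puppet apply --modulepath=/home/ec2-user/puppet/modules ",
  "/home/ec2-user/puppet/manifests/site.pp\n"
]]}}
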
Now that we know at a high-level what’s being done during the environment provisioning, let’s take a deeper look at the scripts in more detail. The actual scripts can be found here: Puppet
First we will start off with the manifests.
The site.pp (shown below) serves two purposes: it loads the other manifests, default.pp and params.pp, and it sets the stages pre, main, and post.

import "hosts/*"
import "classes/*"
stage { [pre, post]: }
Stage[pre] -> Stage[main] -> Stage[post]

These stages define the order in which Puppet modules should be run. If a Puppet module is assigned to the pre stage, it will run before modules assigned to main or post. If stages aren’t defined, Puppet determines the order of execution itself. The default.pp (referenced below) shows how stages are assigned when executing Puppet modules.

node default {
  class { "params": stage => pre }
  class { "java": stage => pre }
  class { "system": stage => pre }
  class { "tomcat6": stage => main }
  class { "postgresql": stage => main }
  class { "subversion": stage => main }
  class { "httpd": stage => main }
  class { "groovy": stage => main }
}

The default.pp manifest also defines which Puppet modules to use for provisioning the target environment.
params.pp (shown below), loaded from site.pp, is dynamically created using CloudFormation. params.pp is used for setting AWS property values that are used later in the Puppet modules.

class params {
  $s3_bucket = ''
  $application_name = ''
  $hosted_zone = ''
  $access_key = ''
  $secret_access_key = ''
  $jenkins_internal_ip = ''
}
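
Once CloudFormation substitutes the stack’s values, the generated manifest might look something like this (all values below are hypothetical placeholders):

class params {
  $s3_bucket = 'sea2shore'               # hypothetical
  $application_name = 'wildtracks'       # hypothetical
  $hosted_zone = 'example.com'           # hypothetical
  $access_key = 'AKIA...'                # hypothetical placeholder
  $secret_access_key = '...'             # hypothetical placeholder
  $jenkins_internal_ip = '10.0.0.10'     # hypothetical
}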

Now that we have an overview of the manifests used, let’s take a look at the Puppet modules themselves.
In our java module, which is run in the pre stage, we are running a simple installation using packages. This is easily dealt with in Puppet by using the package resource. This relies on Puppet’s knowledge of the operating system and the package manager. Puppet simply installs the package that is declared.

class java {
  package { "java-1.6.0-openjdk": ensure => "installed" }
}

The next module we’ll discuss is system. System is also run during the pre stage and is used for the setup of all the extra operations that don’t necessarily need their own module. These actions include setting up general packages (gcc, make, etc.), installing ruby gems (AWS sdk, bundler, etc.), and downloading custom scripts used on the target environment.

class system {
  include params
  $access_key = $params::access_key
  $secret_access_key = $params::secret_access_key
  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }
  package { "gcc": ensure => "installed" }
  package { "mod_proxy_html": ensure => "installed" }
  package { "perl": ensure => "installed" }
  package { "libxslt-devel": ensure => "installed" }
  package { "libxml2-devel": ensure => "installed" }
  package { "make": ensure => "installed" }
  package { "bundler":
    ensure => "1.1.4",
    provider => gem,
  }
  package { "trollop":
    ensure => "2.0",
    provider => gem,
  }
  package { "aws-sdk":
    ensure => "1.5.6",
    provider => gem,
    require => [
      Package["gcc"],
      Package["make"]],
  }
  file { "/home/ec2-user/aws.config":
    content => template("system/aws.config.erb"),
    owner => 'ec2-user',
    group => 'ec2-user',
    mode => '500',
  }
  define download_file($site="", $cwd="", $creates="") {
    exec { $name:
      command => "wget ${site}/${name}",
      cwd => $cwd,
      creates => "${cwd}/${name}",
    }
  }
  download_file { "database_update.rb":
    site => "https://s3.amazonaws.com/sea2shore",
    cwd => "/home/ec2-user",
    creates => "/home/ec2-user/database_update.rb",
  }
  download_file { "id_rsa.pub":
    site => "https://s3.amazonaws.com/sea2shore/private",
    cwd => "/tmp",
    creates => "/tmp/id_rsa.pub",
  }
  exec { "authorized_keys":
    command => "cat /tmp/id_rsa.pub >> /home/ec2-user/.ssh/authorized_keys",
    require => Download_file["id_rsa.pub"],
  }
}

First, I want to point out that at the top we include params. This gives the system module access to params.pp, so we can use the properties defined there.

include params
$access_key = $params::access_key
$secret_access_key = $params::secret_access_key

This enables us to define the parameters in one central location and then reference them from other modules.
As we move through the script, we use the package resource as in previous modules. For each Ruby gem we use the package resource and explicitly tell Puppet to use the gem provider; you can specify other providers like rpm and yum.
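For example (a minimal sketch), the provider can be set explicitly per resource:

# Force a specific provider instead of letting Puppet pick one
# based on the operating system facts.
package { 'httpd':
  ensure   => installed,
  provider => yum,
}
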
We use the file resource to create files from templates.

AWS.config(
  :access_key_id => "",
  :secret_access_key => ""
)

In the aws.config.erb template (referenced above) we are using the properties defined in params.pp for dynamically creating an aws.config credential file. This file is then used by our database_update.rb script for connecting to S3.
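The snippet above shows the rendered output with empty values. In the ERB source, the values presumably come from the class’s variables, along these lines (a sketch, not the actual template):

AWS.config(
  :access_key_id => "<%= access_key %>",
  :secret_access_key => "<%= secret_access_key %>"
)
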
Speaking of the database_update.rb script, we need to get it on the target environment. To do this, we define a download_file resource.

define download_file($site="",$cwd="",$creates=""){
  exec { $name:
    command => "wget ${site}/${name}",
    cwd => $cwd,
    creates => "${cwd}/${name}"
  }
}

This creates a new resource type for Puppet to use. Using it, we are able to download both the database_update.rb script and the id_rsa.pub public SSH key.
As a final step in setting up the system, we execute a bash line that appends the id_rsa.pub contents to the ec2-user’s authorized_keys file. This enables clients holding the corresponding id_rsa private key to SSH into the target environment as ec2-user.
The Manatee infrastructure uses Apache for the web server, Tomcat for the app server, and PostgreSQL for its database. Puppet sets these up as part of the main stage, meaning they run in order after the pre stage modules.
In our httpd module, we perform several of the steps discussed previously: the httpd package is installed and a new configuration file is created from a template.

class httpd {
  include params
  $application_name = $params::application_name
  $hosted_zone = $params::hosted_zone
  package { 'httpd':
    ensure => installed,
  }
  file { "/etc/httpd/conf/httpd.conf":
    content => template("httpd/httpd.conf.erb"),
    require => Package["httpd"],
    owner => 'ec2-user',
    group => 'ec2-user',
    mode => '664',
  }
  service { 'httpd':
    ensure => running,
    enable => true,
    require => [
      Package["httpd"],
      File["/etc/httpd/conf/httpd.conf"]],
    subscribe => Package['httpd'],
  }
}

The new piece of functionality used in our httpd module is service. service allows us to define the state the httpd service should be in at the end of our run; in this case, we declare that it should be running.
The Tomcat module again uses package to define what to install and service to declare the end state of the tomcat service.

class tomcat6 {
  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }
  package { "tomcat6":
    ensure => "installed"
  }
  $backup_directories = [
    "/usr/share/tomcat6/.sarvatix/",
    "/usr/share/tomcat6/.sarvatix/manatees/",
    "/usr/share/tomcat6/.sarvatix/manatees/wildtracks/",
    "/usr/share/tomcat6/.sarvatix/manatees/wildtracks/database_backups/",
    "/usr/share/tomcat6/.sarvatix/manatees/wildtracks/database_backups/backup_archive",
  ]
  file { $backup_directories:
    ensure => "directory",
    owner => "tomcat",
    group => "tomcat",
    mode => '777',
    require => Package["tomcat6"],
  }
  service { "tomcat6":
    enable => true,
    require => [
      File[$backup_directories],
      Package["tomcat6"]],
    ensure => running,
  }
}

Tomcat uses the file resource differently than previous modules: it uses file for creating directories, which is defined using ensure => "directory".
In the postgresql module, we use the package resource for installing PostgreSQL, build files from templates using the file resource, perform bash executions with exec, and declare the intended state of the PostgreSQL service using the service resource.

class postgresql {
  include params
  $jenkins_internal_ip = $params::jenkins_internal_ip
  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }
  define download_file($site="",$cwd="",$creates=""){
    exec { $name:
      command => "wget ${site}/${name}",
      cwd => $cwd,
      creates => "${cwd}/${name}"
    }
  }
  download_file {"wildtracks.sql":
    site => "https://s3.amazonaws.com/sea2shore",
    cwd => "/tmp",
    creates => "/tmp/wildtracks.sql"
  }
  download_file {"createDbAndOwner.sql":
    site => "https://s3.amazonaws.com/sea2shore",
    cwd => "/tmp",
    creates => "/tmp/createDbAndOwner.sql"
  }
  package { "postgresql8-server":
    ensure => installed,
  }
  exec { "initdb":
    command => "service postgresql initdb",
    require => Package["postgresql8-server"]   }
  file { "/var/lib/pgsql/data/pg_hba.conf":
    content => template("postgresql/pg_hba.conf.erb"),
    require => Exec["initdb"],
    owner => 'postgres',
    group => 'postgres',
    mode => '600',
  }
  file { "/var/lib/pgsql/data/postgresql.conf":
    content => template("postgresql/postgresql.conf.erb"),
    require => Exec["initdb"],
    owner => 'postgres',
    group => 'postgres',
    mode => '600',
  }
  service { "postgresql":
    enable => true,
    require => [
      Exec["initdb"],
      File["/var/lib/pgsql/data/postgresql.conf"],
      File["/var/lib/pgsql/data/pg_hba.conf"]],
    ensure => running,
  }
  exec { "create-user":
    command => "echo CREATE USER root | psql -U postgres",
    require => Service["postgresql"]   }
  exec { "create-db-owner":
    require => [
      Download_file["createDbAndOwner.sql"],
      Exec["create-user"],
      Service["postgresql"]],
    command => "psql [
      Download_file["wildtracks.sql"],
      Exec["create-user"],
      Service["postgresql"],
      Exec["create-db-owner"]],
    command => "psql -U manatee_user -d manatees_wildtrack -f /tmp/wildtracks.sql"
  }
}

In this module we are creating a new user on the PostgreSQL database:

exec { "create-user":
  command => "echo CREATE USER root | psql -U postgres",
  require => Service["postgresql"] }

In this next section we download the latest Manatee database SQL dump.

download_file {"wildtracks.sql":
  site => "https://s3.amazonaws.com/sea2shore",
  cwd => "/tmp",
  creates => "/tmp/wildtracks.sql"
}

In the section below, we load the database with the SQL file. This builds our target environments with the production database content giving developers an exact replica sandbox to work in.

exec { "load-database":
  require => [
    Download_file["wildtracks.sql"],
    Exec["create-user"],
    Service["postgresql"],
    Exec["create-db-owner"]],
  command => "psql -U manatee_user -d manatees_wildtrack -f /tmp/wildtracks.sql"
}

Lastly in our Puppet run, we install subversion and groovy on the target node. We could have just included these in our system module, but they seemed general purpose enough to create individual modules.
Subversion manifest:

class subversion {
  package { "subversion":
    ensure => "installed"
  }
}

Groovy manifest:

class groovy {
  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }
  define download_file($site="", $cwd="", $creates="") {
    exec { $name:
      command => "wget ${site}/${name}",
      cwd => $cwd,
      creates => "${cwd}/${name}",
    }
  }
  download_file {"groovy-1.8.2.tar.gz":
    site => "https://s3.amazonaws.com/sea2shore/resources/binaries",
    cwd => "/tmp",
    creates => "/tmp/groovy-1.8.2.tar.gz",
  }
  file { "/usr/bin/groovy-1.8.2/":
    ensure => "directory",
    owner => "root",
    group => "root",
    mode => '755',
    require => Download_file["groovy-1.8.2.tar.gz"],
  }
  exec { "extract-groovy":
    command => "tar -C /usr/bin/groovy-1.8.2/ -xvf /tmp/groovy-1.8.2.tar.gz",
    require => File["/usr/bin/groovy-1.8.2/"],
  }
}

The Subversion manifest is relatively straightforward, as it just uses the package resource. The Groovy manifest is slightly different: we download the Groovy tar, place it on the filesystem, and then extract it.
We’ve gone through how the target environment is provisioned. We do, however, have one more task: testing. It’s not enough to assume that everything was installed successfully just because Puppet didn’t error out. For this reason, we use Cucumber to run acceptance tests against our environment. Our tests check that services are running, configuration files are present, and the right packages have been installed.
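A hypothetical scenario along these lines (not our actual feature file) might read:

Feature: Provisioned target environment
  Scenario: httpd is installed and running
    Given I am connected to the target environment
    Then the "httpd" package should be installed
    And the "httpd" service should be running
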
Puppet allows us to completely script and version our target environments. Consequently, this enables us to treat environments as disposable entities. As a practice, we create a new target environment every time our CD pipeline is run. This way we are always deploying against a known state.
As our blog series comes to a close, let’s recap what we’ve gone through. In the Manatee infrastructure we use a combination of CloudFormation for scripting AWS resources, Puppet for scripting target environments, Capistrano for deployment automation, SimpleDB and CloudFormation for dynamic properties, and Jenkins for coordinating all of these resources into one cohesive unit that moves a Manatee application change from check-in to production in just a single click.