Stelligent

DataOps: Self-Service Databases with Amazon Aurora and Stelligent’s database-artifact-factory

Development against Legacy Databases

A “legacy” relational production database can be a challenge to develop new software against, whether that software is a report or a feature for a web application.  The schema sets a minimum for what to expect, but the actual data content may be “dirty” or inconsistent.  A longer-lived database can harbor any number of curiosities that the logic in new software must contend with. 

Exploratory Testing of Production Databases

When a developer doesn’t understand all the possible vagaries of a legacy database, exploratory testing against a recent copy of that database is a reasonable approach to take.

Given that software needs to be tested against a production database, a number of impediments can stand in the way, whether testing is done against the “live” database or against a copy of it. 

Self-Service Production Database with Aurora

Given that there aren’t any issues with sensitive information (or other methods are in place to solve these issues), an ideal solution for exploratory testing would be a self-service tool that, with a single click, can quickly make a copy of a production database directly available to the developer.  The developer can then explore and modify that copy freely, with no risk to the production database.

Amazon Aurora’s ability to rapidly “clone” databases in a matter of minutes provides a mechanism to implement such a self-service tool.  The magic of how the clone works is beyond the scope of this article.  For more information on the technical details of cloning Aurora databases, please see the Aurora User Guide.

Database Artifact Factory Pattern

Amazon Aurora provides the underlying technology to deliver a self-service tool, but the API to create the clone is not trivial to use.  Stelligent has implemented proof-of-concept tooling based upon the Database Artifact Factory pattern to provide a more user-friendly self-service cloning experience.

A “Database Artifact” is a bundle of all the metadata and automation necessary to create a database from a source database.  This metadata includes details such as the identity and networking configuration of the source database and the desired settings for the target database.

The “factory” in turn allows for building a Database Artifact and making it available to consumers for invocation.  The pattern involves a builder that produces the artifact, a registry where the artifact is published, and two roles: a DataOps Engineer who builds and publishes the artifact, and a Database Developer who invokes it.

To make this more concrete:

  1. The DataOps Engineer collects all the information about the source database and what specification the target database will have.
  2. The configuration information is saved in a file, and builder software is invoked with that configuration.
  3. The builder software emits a Docker Image with all the CloudFormation and boto automation necessary to clone the database; the entry point of the image is set to software that will clone the database.
  4. The DataOps Engineer can optionally push the Docker image to a registry like DockerHub.
  5. A Database Developer that wishes to clone the database does a docker pull and runs the image, which takes a matter of minutes to run.
  6. The logic within the image clones the source database to a new target database and supplies the endpoint location to the Database Developer.
  7. The Database Developer can make whatever changes they like to the new target database without affecting the source database.

Stelligent’s database-artifact-factory

The proof-of-concept tooling that Stelligent has developed to realize the Database Artifact Factory pattern for Aurora Postgres is available at https://github.com/stelligent/database-artifact-factory.  The following section walks through the necessary steps to build an artifact and invoke it to clone a database.

Installation 

Before installing database-artifact-factory, a Docker engine and Python 3.7 or greater must be installed.

For more information on how to install Docker, please see: https://docs.docker.com/engine/install/.

For more information on installing Python, please see: https://www.python.org/about/gettingstarted/

Once the prerequisites are in place, the tool can be installed via pip:

See the gist on github.
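The original post embeds the command as a gist; a likely form, assuming the PyPI package shares the repository’s name (confirm the exact name in the project README), is:

```shell
# Install the builder tooling from PyPI.
# NOTE: the package name below is assumed from the repository name;
# check the database-artifact-factory README for the published name.
pip install db-artifact-factory
```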

Building Database Artifacts

Source and Target Database Configuration INI File

The first task in building an artifact is to collect all the necessary configuration about the source database and decide upon the settings for the target database.  The builder requires an “INI” formatted configuration file to capture this information.

The INI file is broken down into a source_db and a target_db section for readability.  To start, copy the following example to a local file named artifact_config.ini:

See the gist on github.
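The example file itself lives in the gist; the following is only an illustrative sketch of its shape based on the description in this section.  The section names source_db and target_db come from the text, but the individual field names are hypothetical — copy the real example from the repository.

```ini
; artifact_config.ini -- illustrative sketch only; field names are hypothetical.
[source_db]
; Identifier of the production Aurora Postgres cluster to clone.
cluster_id = my-prod-aurora-cluster

[target_db]
; Desired settings for the cloned cluster.
instance_class = db.r5.large
```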

Record appropriate values for the three environment-specific fields in artifact_config.ini.

AWS Credentials

Next, ensure that there are default AWS credentials in the current execution environment.  For more information on configuring credentials, please see the AWS CLI documentation on environment variables (cli-configure-envvars).

The building process inspects the source database cluster for its configuration and runs some queries to determine networking configuration.  The ambient credentials therefore require authorization to invoke the API actions DescribeSubnets, DescribeDBClusters, and DescribeDBSubnetGroups.
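Those three actions map onto an IAM policy like the following (a minimal sketch; DescribeSubnets lives in the ec2 namespace, while the cluster and subnet-group calls live in rds):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DbArtifactBuilderReadOnly",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSubnets",
        "rds:DescribeDBClusters",
        "rds:DescribeDBSubnetGroups"
      ],
      "Resource": "*"
    }
  ]
}
```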

Building

Once the INI file is populated and AWS credentials are available within the environment, run the following command:

See the gist on github.
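The build command is embedded as a gist; a hedged sketch, assuming the tool installs a console script that takes the INI file as an argument (both the command name and the flag are assumptions — the real invocation is in the repository README):

```shell
# Build the Database Artifact image from the INI configuration.
# The command name and --config flag are assumptions; consult the
# database-artifact-factory README for the actual invocation.
dbaf build --config artifact_config.ini
```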

The builder will invoke docker and emit the progress of image creation to stdout.  When the build completes, the following message will appear in stdout:

See the gist on github.

From here, db-artifact:latest can be published to a registry like DockerHub or Artifactory.
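Publishing the image uses the standard Docker workflow; for example, pushing to a DockerHub organization (the myorg name below is a placeholder):

```shell
# Tag the locally built artifact image and push it to a registry.
docker tag db-artifact:latest myorg/db-artifact:latest
docker push myorg/db-artifact:latest
```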

Invoking the Database Artifact

AWS Credentials

First, ensure that there are AWS credentials defined for a profile in the current execution environment.  These credentials require a broader set of authorizations, as invoking the artifact creates RDS database clusters, instances, security groups, and so on.  A “not quite” least-privilege policy is available in the database-artifact-factory repository.

Invocation

Given the db-artifact Docker image is pulled locally, and AWS credentials are defined in the default profile, the command to clone the database is:

See the gist on github.
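The exact invocation is in the gist; a plausible sketch is a plain docker run that passes AWS credentials into the container — here by mounting the local ~/.aws directory, though the image may instead expect AWS_ACCESS_KEY_ID-style environment variables (the flags shown are standard Docker options; any artifact-specific arguments are assumptions):

```shell
# Run the artifact; it clones the source cluster and prints the new endpoint.
# Mounting ~/.aws exposes the default profile's credentials to the container.
docker run --rm \
  -v "$HOME/.aws:/root/.aws" \
  db-artifact:latest
```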

Note: the username is determined by the source database.  The password is reset here, but the username from the source database must be communicated to users of the artifact.

This command can take a while to run – usually about ten minutes depending on the DB instance class.  As it runs, the progress can be seen in the CloudFormation console under the stack named dbartifactXXXX-prod-clone where XXXX is an epoch timestamp.
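The stack’s status can also be checked from the AWS CLI (substitute the actual stack name printed by the artifact):

```shell
# Show the current status of the clone stack.
aws cloudformation describe-stacks \
  --stack-name dbartifactXXXX-prod-clone \
  --query 'Stacks[0].StackStatus' \
  --output text
```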

When execution is complete, the endpoint for the new cluster is emitted to stdout:

See the gist on github.

Verification

From here, there are two tests to perform to verify the clone worked as expected: connect to the new endpoint using the source database’s username and the reset password, and confirm that the expected schema and data are present.
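For an Aurora Postgres clone, a quick check with psql is to connect to the new endpoint and list the tables (the endpoint, username, and database name below are placeholders):

```shell
# Connect to the cloned cluster and list its tables.
psql -h <clone-endpoint> -U <source-db-username> -d postgres -c '\dt'
```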

Termination

When the source database is cloned, the name of the CloudFormation stack that created the clone is emitted alongside the endpoint.  When there is no further use for the cloned database, this stack can be deleted from the AWS CLI and all the resources will be released:

See the gist on github.
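With the AWS CLI, deleting the stack (and optionally waiting for the teardown to finish) looks like this, substituting the stack name emitted during the clone:

```shell
# Delete the clone's CloudFormation stack, releasing the cluster and
# its associated resources, then block until deletion completes.
aws cloudformation delete-stack --stack-name dbartifactXXXX-prod-clone
aws cloudformation wait stack-delete-complete --stack-name dbartifactXXXX-prod-clone
```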

Conclusion

Exploratory testing against potentially massive production databases has historically been a painful and expensive process.  With the advent of Aurora’s cloning technology, making copies of such databases becomes almost trivial.  Stelligent’s database-artifact-factory tooling, built on Aurora, provides a pattern for making these copies in a self-service manner so that database developers can more easily do exploratory testing.