DataOps: Self-Service Databases with Amazon Aurora and Stelligent’s database-artifact-factory
Development against Legacy Databases
A “legacy” relational production database can be a challenge to develop new software against, whether it be “reports” or a feature for a web application. The schema sets a minimum for what to expect, but the actual data content may be “dirty” or inconsistent. There can be any number of curiosities in a longer-lived database that the logic in new software must contend with.
Exploratory Testing of Production Databases
When a developer doesn’t understand all the possible vagaries of a legacy database, exploratory testing against a recent copy of that database is a reasonable approach to take.
Given that software needs to be tested against a production database, there are a number of possible impediments to testing against such a database whether it be done against a “live” database or a copy of the database.
- The database may contain sensitive data or PHI/PII that a developer isn’t authorized to access.
- A “live” database may not support the extra load for certain types of testing (e.g. querying against unindexed columns).
- Destructive tests that mutate or remove data from the database are inappropriate for a live production database.
- Copying a massive TB/PB production database can take a long time and require hosting resources that are expensive or of limited availability.
- Providing secure access to a copy of the production database may require operational expertise that is also of limited availability.
Self-Service Production Database with Aurora
Given that there aren’t any issues with sensitive information (or other methods are in place to solve these issues), an ideal solution for exploratory testing would provide a self-service tool that with a single-click can quickly make a copy of a production database that is directly available to the developer. The developer can then perform the following actions:
- Execute destructive or slow tests against the copy that don’t affect the live database.
- Throw away the copy after testing to control hosting resource costs.
Amazon Aurora’s ability to rapidly “clone” databases in a matter of minutes provides a mechanism to implement such a self-service tool. The magic of how the clone works is beyond the scope of this article. For more information on the technical details of cloning Aurora databases, please see the Aurora User Guide.
Database Artifact Factory Pattern
Amazon Aurora provides the underlying technology to deliver a self-service tool, but the API to create the clone is not trivial to use. Stelligent has implemented proof-of-concept tooling based upon the Database Artifact Factory pattern to provide a more user-friendly self-service cloning experience.
A “Database Artifact” is a bundle of all the metadata and automation necessary to create a database from a source database. This metadata includes the following:
- Network configuration for where to create the database, e.g. VPC and subnet identifiers.
- Hardware settings for database servers, e.g. instance type for CPU and memory.
- Operational settings for the database like maintenance windows, backup windows, etc.
- Location information about the source database, i.e. the RDS cluster identifier, point-in-time, snapshot identifier, etc.
The “factory” in turn allows for building a Database Artifact and making it available to consumers for invocation. The pattern describes several elements:
- The Database Artifact.
- Builder software to create the artifact.
- A registry to store the artifact.
- A consumer to invoke the artifact to (re)create an actual database.
To make this more concrete:
- The DataOps Engineer collects all the information about the source database and what specification the target database will have.
- The configuration information is saved in a file, and builder software is invoked with that configuration.
- The builder software emits a Docker Image with all the CloudFormation and boto automation necessary to clone the database; the entry point of the image is set to software that will clone the database.
- The DataOps Engineer can optionally push the Docker image to a registry like DockerHub.
- A Database Developer that wishes to clone the database does a docker pull and runs the image, which takes a matter of minutes to run.
- The logic within the image clones the source database to a new target database and supplies the endpoint location to the Database Developer
- The Database Developer can make whatever changes to the new target database and it won’t affect the source database.
The proof-of-concept tooling that Stelligent has developed to realize the Database Artifact Factory pattern for Aurora Postgres is available at https://github.com/stelligent/database-artifact-factory. The following section walks through the necessary steps to build an artifact and invoke it to clone a database.
Before installing database-artifact-factory, a Docker engine and Python 3.7 or greater must be installed.
For more information on how to install Docker, please see: https://docs.docker.com/engine/install/.
For more information on installing Python, please see: https://www.python.org/about/gettingstarted/
Once the prerequisites are in place, the tool can be installed via pip:
Building Database Artifacts
Source and Target Database Configuration INI File
The first task in building an artifact is to collect all the necessary configuration about the source database and decide upon the settings for the target database. The builder requires an “INI” formatted configuration file to capture this information.
The INI file is broken down into a source_db and a target_db section for readability. To start, copy the following example to a local file named artifact_config.ini.
Record appropriate values for the following three fields in artifact-config.ini:
- This is the Aurora Postgres RDS cluster identifier for the cluster to clone. This can be located in the RDS console.
- These are the subnets to create the cloned cluster within. If the value here is source, it will use the subnets of the source database. Otherwise, a comma-separated list of two or more explicit subnet identifiers can be specified. Each subnet must be in a separate availability zone.
- This is the size of the instance to create within the cluster. This value has no relationship with the source cluster’s instance type; it can be legal AWS instance types.
Next, ensure that there are default AWS credentials in the current execution environment. For more information on configuring credentials, please see: cil-configure-envvars.
The building process inspects the source database cluster for configuration and does some queries to determine networking configuration. Therefore the ambient credentials will require authorization to invoke the API actions: DescribeSubnets, DescribeDbClusters and DescribeDbSubnetGroups.
Once the INI file is populated and AWS credentials are available within the environment, run the following command:
The builder will invoke docker and emit the progress of image creation to stdout. When the build completes, the following message will appear in stdout:
From here, db-artifact:latest can be published to a registry like DockerHub or Artifactory.
Invoking the Database Artifact
First, ensure that there are AWS credentials defined for a profile in the current execution environment. These credentials will require a broader set of authorizations as invoking the artifact will require permission to create RDS database clusters, instances and security groups, etc. A “not quite” least privilege policy is available in the db-artifact-factory repository here.
Given the db-artifact Docker image is pulled locally, and AWS credentials are defined in the default profile, the command to clone the database is:
Note: the username is determined by the source database. The password is reset here, but the username from the source database must be communicated to users of the artifact.
This command can take a while to run – usually about ten minutes depending on the DB instance class. As it runs, the progress can be seen in the CloudFormation console under the stack named dbartifactXXXX-prod-clone where XXXX is an epoch timestamp.
When execution is complete, the endpoint for the new cluster is emitted to stdout:
From here, there are two tests to perform to verify the clone worked as expected:
- Login to the cloned database to confirm the data is present.
- Make a change to the cloned database and login to the source database to confirm the change is not reflected there.
When the source database is cloned, the name of the CloudFormation stack that created the clone is emitted alongside the endpoint. When there is no further use for the cloned database, this stack can be deleted from the AWS CLI and all the resources will be released:
Exploratory testing against potentially massive production databases is historically a painful and expensive process. With the advent of Aurora’s cloning technology, making copies of such databases is almost trivialized. Stelligent’s database-artifact-factory tooling, which relies upon Aurora, provides a pattern for making these copies in a self-service manner such that database developers can more easily do exploratory testing.