Stelligent

Automating Amazon Polly for WordPress

In our previous post on automating AWS Budgets, I included an embedded audio file that read the blog post aloud. I did this using an AWS service called Amazon Polly, which turns text into lifelike speech using deep learning. I got the idea after noticing that Jeff Barr, Chief Evangelist at AWS, had started providing this capability on the AWS News blog. I really enjoy listening to books from my Audible collection, so this was particularly intriguing to me.
Featured image: photo by Greg Hill on Unsplash.
To create an audio file of my last post, I created an Amazon S3 bucket, copied the text from the blog post, and pasted it into the text field for Amazon Polly. Then, I configured Polly to create the audio file and clicked a button to synthesize it to S3. Once it was complete, I made the MP3 file publicly readable in S3 and copied its URL into the blog post in WordPress.
Since then, I learned that there’s a WordPress plugin for Amazon Polly. Since we host Stelligent’s website and blog on WordPress.com, this plugin would make it very convenient to automatically provide text-to-speech capabilities for all of our blog posts.
In this post, I provide automation examples for two scenarios: one that uses the Amazon Polly plugin for WordPress, and another for websites that don’t use WordPress but where you’d still like to provide text-to-speech capabilities.

Example 1: WordPress

There are 4 steps you’ll go through to provide recordings of your WordPress blog posts. They are:

  1. Create a new IAM Group using CloudFormation
  2. Update the CloudFormation Termination Protection on the Stack
  3. Create a new IAM User and Assign it to the Group you created in Step 1
  4. Install and Configure the Amazon Polly WordPress Plugin

These steps are described in greater detail below. AWS provides the fully manual steps for the WordPress plugin installation and configuration in this blog post: Give Your WordPress Blog a Voice With Our New Amazon Polly Plugin.

Create a New IAM Group using CloudFormation

Create a new IAM Group by launching the CloudFormation stack below. The WordPressPollyGroup Output provides the name of the provisioned IAM group.

Here’s an example of running the same from the command line:

aws cloudformation create-stack --stack-name YOURSTACKNAME --template-body file:///home/ec2-user/environment/cloudformation_templates/labs/polly/wordpress-polly.yml --capabilities CAPABILITY_NAMED_IAM --no-disable-rollback

Update the CloudFormation Termination Protection on the Stack

Since you’ll want to keep these resources running in your account, you’ll likely want to enable termination protection on your CloudFormation stack. The command below shows one way to do this.

aws cloudformation update-termination-protection --enable-termination-protection --stack-name YOURSTACKNAME

Create a new IAM User and Assign it to an IAM Group

Next, you’ll create a new IAM user and assign it to the IAM Group you created in the previous step. Here are the instructions:
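
One way to script this step is with the AWS CLI. The sketch below is only an illustration, not the official procedure: it assumes the stack from the first step exposes the group name in its WordPressPollyGroup output, and the function name create_polly_user and its two arguments are placeholders chosen for this example.

```shell
# Sketch only: create an IAM user for the Polly plugin and add it to the
# group provisioned by the CloudFormation stack. The function name and its
# arguments are illustrative placeholders.
create_polly_user() {
  local stack_name="$1" user_name="$2" group_name

  # Wait until the stack from the first step has finished creating.
  aws cloudformation wait stack-create-complete --stack-name "$stack_name" || return 1

  # Read the group name from the stack's WordPressPollyGroup output.
  group_name=$(aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query "Stacks[0].Outputs[?OutputKey=='WordPressPollyGroup'].OutputValue" \
    --output text) || return 1
  if [ -z "$group_name" ] || [ "$group_name" = "None" ]; then
    echo "WordPressPollyGroup output not found on stack $stack_name" >&2
    return 1
  fi

  # Create the user and attach it to the group.
  aws iam create-user --user-name "$user_name" >/dev/null || return 1
  aws iam add-user-to-group --user-name "$user_name" --group-name "$group_name" || return 1

  # Print the access key ID and secret key needed by the plugin settings page.
  aws iam create-access-key --user-name "$user_name" \
    --query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text
}

# Example: create_polly_user YOURSTACKNAME wordpress-polly-user
```

The last command prints the secret access key, so treat the output as sensitive and avoid running it where the terminal output is logged.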

Install and Configure the Amazon Polly WordPress Plugin

Go to your WordPress Admin page. For example, in WordPress.com, the admin page might be https://yourwebsite.com/wp-admin/.
Go to Plugins and search for Polly. You should see a listing that looks similar to Figure 1.

Figure 1 – Amazon Polly for WordPress Plugin

Click Install Now and then Activate. Then, click Settings | Polly from the Admin page. In the Polly plugin configuration, enter the AWS access key and AWS secret key for the IAM user you created in the previous step, and click Save Changes.

Figure 2 – Amazon Polly WordPress Settings

Then, you can make edits to the many configuration fields available in the plugin, including email address, the pollycast name you want displayed, and categories. You can also make bulk changes to all of your blog posts. You do this by clicking Bulk Changes, which enables text-to-speech for all of your blog posts.

Figure 3 – Apply Polly Text-to-Speech for all blog posts

Example 2: Other Websites

Here’s an example of a process that enables Amazon Polly’s text-to-speech capabilities for your non-WordPress websites. It should also give you a better understanding of the underlying services that enable these features.
There are 5 steps you’ll go through to publish a recording to your blog post or website. They are:

  1. Copy display text from website
  2. Commit text file to GitHub
  3. Launch CloudFormation stack
  4. Get URL from deployment pipeline
  5. Manually update HTML with audio file

Step 1 – Copy display text from website

Go to the webpage you’d like to convert to speech and copy the visible text and paste it into a text file. Save the file. You can fork the https://github.com/stelligent/cloudformation_templates repository to see this in action using the https://github.com/stelligent/cloudformation_templates/tree/master/labs/polly example.

Figure 4 – Example of copying text from web page

Another option is to use a command-line utility to get only the display text from the RSS feed. I wasted a lot of time with this option and decided to simply copy the text.
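
If you do want to experiment with the command-line route, here is a rough sketch. It assumes the markup is simple enough that stripping tags with sed is acceptable; it does not handle scripts, styles, or tags that span multiple lines. strip_html is a hypothetical helper name, and the feed URL in the usage comment is a placeholder.

```shell
# Sketch only: reduce simple HTML or RSS markup on stdin to its display text.
# Assumes every tag opens and closes on the same line.
strip_html() {
  # 1. Remove tags. 2. Decode a few common entities (&amp; last, so that
  # "&amp;lt;" becomes the literal text "&lt;"). 3. Trim surrounding
  # whitespace. 4. Drop lines that end up empty.
  sed -e 's/<[^>]*>//g' \
      -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' -e "s/&#39;/'/g" \
      -e 's/&amp;/\&/g' \
      -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -v '^$'
}

# Example (placeholder URL):
#   curl -s https://example.com/feed/ | strip_html > blog.txt
```

Expect to hand-edit the result; navigation text, captions, and code samples usually survive this kind of filter and read poorly when synthesized.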

Step 2 – Commit Text File to GitHub

Commit the text file to your GitHub repository – as shown in the example below.

git add blog.txt
git commit -am "update blog text" && git push

Step 3 – Launch CloudFormation Stack

In this section, you’ll launch the CloudFormation stack that provisions all the necessary AWS resources for this solution, including an S3 bucket for storage, IAM roles, CodeBuild, and CodePipeline.

Architecture and Implementation

Architecture Diagram

Figure 5 – CloudFormation Architecture Diagram for Amazon Polly Solution

CloudFormation Templates resources

Here’s a list of the AWS resources used in this solution:

CodeBuild

Below, you see an example of the CloudFormation resource that uses CodeBuild to run commands that use Polly to create an MP3 file and upload it to Amazon S3. ./labs/polly/ refers to a path available through the CodePipeline Input Artifact located in a secured bucket in S3.

  RunPollyCommands:
    Type: AWS::CodeBuild::Project
    DependsOn: CodeBuildRole
    Properties:
      Name: !Sub ${AWS::StackName}-PollyCommands
      Description: Deploy site to S3
      ServiceRole: !GetAtt CodeBuildRole.Arn
      Artifacts:
        Type: CODEPIPELINE
      Environment:
        Type: !Ref BuildType
        ComputeType: !Ref BuildComputeType
        Image: !Sub ${BuildImage}
      Source:
        Type: CODEPIPELINE
        BuildSpec: !Sub |
          version: 0.2
          phases:
            post_build:
              commands:
                - aws --version
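                # Read the blog text from the CodePipeline input artifact.
                - testvar=$(cat ./labs/polly/blog.txt)
                # Ask Polly to synthesize the text and write the MP3 to the recordings bucket.
                - aws polly start-speech-synthesis-task --output-format mp3 --output-s3-bucket-name ${PollyRecordingsBucket} --text "$testvar" --voice-id Joanna
                # Get a task ID from the task list (assumes the task just started is returned first).
                - pollyObjectId=$(aws polly list-speech-synthesis-tasks --max-results 1 --query 'SynthesisTasks[].TaskId' --output text)
                # Build the S3 object key from the task ID.
                - pollyObjectTaskId=$(echo $pollyObjectId.mp3)
                # Give the asynchronous task time to finish before changing the ACL.
                - sleep 45
                # Make the MP3 publicly readable so it can be embedded in a web page.
                - aws s3api put-object-acl --bucket ${PollyRecordingsBucket} --key "$pollyObjectTaskId" --acl public-read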
          artifacts:
            files:
            - '**/*'
      TimeoutInMinutes: 10

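I put a sleep 45 in the commands run via CodeBuild because the synthesis task runs asynchronously: the pause gives Polly time to finish writing the MP3 file to S3 before the ACL is changed.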
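
A fixed sleep can be too short for a long post and needlessly long for a short one. As an alternative, the sketch below polls the task status with aws polly get-speech-synthesis-task until the task completes or fails. It is not part of the stack above; the function name wait_for_polly_task and its defaults (30 attempts, 5 seconds apart) are assumptions made for this example.

```shell
# Sketch only: poll a Polly synthesis task instead of sleeping a fixed time.
# Usage: wait_for_polly_task TASK_ID [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 when the task completes, 1 if it fails, 2 if we give up waiting.
wait_for_polly_task() {
  local task_id="$1" max_attempts="${2:-30}" delay="${3:-5}" attempt=1 status

  while [ "$attempt" -le "$max_attempts" ]; do
    # TaskStatus is one of: scheduled, inProgress, completed, failed.
    status=$(aws polly get-speech-synthesis-task --task-id "$task_id" \
      --query 'SynthesisTask.TaskStatus' --output text) || return 1

    case "$status" in
      completed) return 0 ;;
      failed)
        echo "Polly task $task_id failed" >&2
        return 1 ;;
    esac

    sleep "$delay"
    attempt=$((attempt + 1))
  done

  echo "Timed out waiting for Polly task $task_id" >&2
  return 2
}

# Example: wait_for_polly_task "$pollyObjectId" && echo "MP3 is ready"
```

With the defaults, the function waits for up to about two and a half minutes, which stays well inside the 10-minute build timeout used above.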

Deployment Steps

There are three steps to deploying the solution: preparing an AWS account, launching the stack, and testing the deployment. Each is described below.

Prepare an AWS Account

Create an AWS account, or log in to an existing one, at http://aws.amazon.com by following the instructions on the site.

Launch the Stack

Click on the Launch Stack button below to launch the CloudFormation Stack to set up your Amazon Polly solution.
Stack Assumptions: The pipeline stack assumes the stack is launched in the US East (N. Virginia) Region (us-east-1) and may not function properly if this region is not used.

You can launch the same stack using the AWS CLI. Here’s an example:

aws cloudformation create-stack --stack-name YOURSTACKNAME --template-body file:///home/ec2-user/environment/cloudformation_templates/labs/polly/pipeline.yml --parameters ParameterKey=GitHubToken,ParameterValue=GITHUBTOKEN --capabilities CAPABILITY_NAMED_IAM --no-disable-rollback

Parameters

Parameter         Description
GitHubUser        Your unique GitHub user ID. Default is stelligent
GitHubRepo        GitHub repo to pull from. Only the name, not the URL. Default is cloudformation_templates
GitHubBranch      GitHub branch. Default is master
GitHubToken       Available at https://github.com/settings/tokens. Should have access to the following
BuildType         The build container type to use for building the app. Default is LINUX_CONTAINER
BuildComputeType  The build compute type to use for building the app. Default is BUILD_GENERAL1_SMALL
BuildImage        The build image to use for building the app. Default is aws/codebuild/ubuntu-base:14.04

Test the Deployment

Go to the Synthesis Tasks page in the Amazon Polly console and verify that the audio recordings have been generated. There’s also an S3 URL column that you can use to download the file.
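
The same check can be made from the command line. This is a minimal sketch that wraps aws polly list-speech-synthesis-tasks; the function name list_polly_recordings is a placeholder of mine, and it assumes your CLI is configured for the region where the stack was launched.

```shell
# Sketch only: print the task ID and S3 URL of each completed synthesis task,
# one task per line, as tab-separated text.
list_polly_recordings() {
  aws polly list-speech-synthesis-tasks \
    --status completed \
    --query 'SynthesisTasks[].[TaskId,OutputUri]' \
    --output text
}

# Example: list_polly_recordings
```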

Step 4 – Deployment Pipeline

The deployment pipeline that’s automatically provisioned in Step 3 of this section will look similar to Figure 6.

Figure 6 – Deployment Pipeline for Polly Solution in AWS CodePipeline

There are two stages: Source and Build. The Source stage polls a GitHub repository for changes. If it discovers changes (for example, to a text file of the content that you want to convert to speech), CodePipeline downloads the files to S3 and then runs a build using AWS CodeBuild in the next stage. In the Build stage, CodeBuild calls Polly from the command line to convert the text to speech and upload it to an S3 bucket. CodeBuild displays the S3 URL that you can copy to your website in Step 5, or you can get the URL from Polly or S3. As a final step in this stage, CodeBuild runs a command to make the MP3 file publicly readable (the public-read ACL) so that it’s available from your website.

Step 5 – Manually Update the HTML

Once you get the S3 URL from CodeBuild, S3, or Polly, you can then write simple HTML to display an audio plugin so that users can click a button to have it play the MP3 audio file. This is usually a one-time operation unless there are updates to your blog post. See an example of the HTML in the snippet below.

<audio controls="controls">
  <source src="https://s3.amazonaws.com/pmd-polly-1334-polly-files/blog.01c3a373-70e3-446f-8bf3-2da95d52c3e5.mp3" type="audio/mpeg" />
  Your browser does not support the audio tag.</audio>
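
If you generate pages with a script, a small helper can emit the same snippet for any recording. This is only a sketch: audio_tag is a hypothetical function name, and it assumes the URL contains no double quotes or other characters that would need HTML escaping.

```shell
# Sketch only: print an HTML audio player for the given MP3 URL.
# The URL is inserted as-is, so it must not need HTML escaping.
audio_tag() {
  local url="$1"
  cat <<EOF
<audio controls="controls">
  <source src="$url" type="audio/mpeg" />
  Your browser does not support the audio tag.
</audio>
EOF
}

# Example: audio_tag "https://s3.amazonaws.com/YOURBUCKET/YOURTASKID.mp3"
```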

Costs

This section outlines cost considerations for provisioning an Amazon Polly text-to-speech solution.

Additional Resources

Summary

You learned how to create an audio recording of a blog post on both WordPress and non-WordPress websites and embed a player so readers can listen to an audio rendition of your post. You might also notice that we’ve provided a “Stelligent Amazon Pollycast” for all 233 (and counting) of our blog posts!

Stelligent Amazon Pollycast