Give Your Products the Power of Speech Using Amazon Polly

Would you like to get your products more deeply integrated into your customers’ daily lives?

Of course you would. What business wouldn’t?

An important first step towards making that happen is to give your products the ability to interact with your customers on their own terms. And the easiest way to do that is through natural speech.

The Power of Spoken Language

Human beings have been speaking to each other since the dawn of time. Speech is our most natural form of communication — and one of the reasons why we’ve been so successful as a species.

So let’s dive into what it takes to give your applications and devices the ability to speak in a manner that’s natural and comfortable to your customers.

Recent advancements in artificial intelligence have made this super easy, so it’ll be quick.

Got 5 Minutes?

This short guide will walk you through converting written text into a spoken audio file using the Amazon Polly text-to-speech service.

Note: Amazon Polly only provides a one-way speech capability: converting written text into spoken audio (text-to-speech). If you also want your product to understand spoken audio, you’ll need a speech recognition service as well, such as Amazon Lex for conversational interfaces or Amazon Transcribe for plain speech-to-text.

An Easy On-Ramp for A.I.

This is a how-to guide intended for developers or tech-savvy business leaders looking for a proven entry point into A.I.-powered business systems.

The scripts we’ll be using are simple and easy to read — Amazon’s SDK has already done most of the heavy lifting for you.

So let’s get right to it…

What You’ll Need

Right off the bat, let’s get the initial requirements knocked out.

Download the source repository.

To start, let’s pull down the source files. (You’ll need a git client installed on your computer for this step.)

And for a change of pace, we’re going to use PHP for these scripts. You’ll need the PHP command-line interpreter and Composer installed.

Note: If you prefer a different programming language, AWS provides SDKs for nearly every major language — and the scripts are very easy to port over.

Move to the directory you want to use for this demo and run the following commands in a terminal…

# Download source repository & install dependencies
git clone https://github.com/10xNation/amazon-polly-demo-php.git
cd amazon-polly-demo-php
composer install

Feel free to leave the terminal window open — you’ll need it soon.

Create an AWS account.

If you don’t already have an AWS account, go ahead and set one up.

Verify user permissions.

And if you aren’t using an administrator-level user account for AWS, you’ll need to make sure your account has full control over the Polly service.

Enter your credentials.

You’ll need to enter your API credentials into the script files. Open speak_text.php and speak_ssml.php and edit the following section in both files:

  'credentials' => [ // Change these to your respective AWS credentials
    'key' => 'XXXXXXXXXXXXXXXXXXXX',
    'secret' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  ]

Replace XXXXXXXXXXXXXXXXXXXX with your user account’s “Access key ID” and XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX with the matching “Secret access key.”
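If you’d rather not paste keys into the script files, here’s a minimal sketch of an alternative: read them from environment variables and build the same settings array. The function name pollyClientConfig and the default region are assumptions for illustration; they aren’t part of the demo repository.

```php
<?php
// Sketch only: pollyClientConfig() is a hypothetical helper, not part of the demo repo.
// It builds the client settings array from environment variables so that no
// secrets have to live in the source files.

function pollyClientConfig(string $region = 'us-east-1'): array
{
    $key = getenv('AWS_ACCESS_KEY_ID');
    $secret = getenv('AWS_SECRET_ACCESS_KEY');

    // getenv() returns false when a variable is not set.
    if ($key === false || $key === '' || $secret === false || $secret === '') {
        throw new RuntimeException('Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first.');
    }

    return [
        'version' => 'latest',
        'region' => $region, // assumed default; use the region you work in
        'credentials' => [
            'key' => $key,
            'secret' => $secret,
        ],
    ];
}

// With the SDK installed via Composer, the array would be handed to the client:
// $client = new Aws\Polly\PollyClient(pollyClientConfig());
```

Export the two variables in your terminal before running the scripts, and the keys never need to be saved in a file you might accidentally commit.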

Now that the formalities are out of the way, let’s get to the good stuff…

Plain Text to Speech

Go to your Polly Dashboard.

Once you’re signed in, select a language and voice then hit “Listen to speech” to test them out — there are close to 50 different voices.

Interacting with the API.

Open up the speak_text.php file and edit the text you want to speak:

// Change this to whatever text you want to convert to audio
'Text' => 'Hi! My name is Emma. Welcome to the Amazon Polly demo.',

Simply replace “Hi! My name is Emma. Welcome to the Amazon Polly demo.” with the text you want spoken.

You can also change the “VoiceId” if you want to use a different voice:

'VoiceId' => 'Emma'
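Not sure which voice IDs are available for your language? Here’s a rough sketch that asks the API through its DescribeVoices operation. The helper name voiceIdsForLanguage is hypothetical, and the sketch ignores pagination, so treat it as a starting point rather than a finished tool.

```php
<?php
// Sketch only: voiceIdsForLanguage() is a hypothetical helper.
// $client is expected to be an Aws\Polly\PollyClient, but any object that
// offers a describeVoices() method will do.

function voiceIdsForLanguage(object $client, string $languageCode): array
{
    // DescribeVoices returns a 'Voices' list; each entry carries an 'Id'.
    $result = $client->describeVoices(['LanguageCode' => $languageCode]);

    $ids = [];
    foreach ($result['Voices'] as $voice) {
        $ids[] = $voice['Id'];
    }

    sort($ids); // alphabetical order, purely for readability
    return $ids;
}

// Possible usage with a configured client:
// print_r(voiceIdsForLanguage($client, 'en-GB'));
```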

Then, to send your request to the API, simply run the following command (in the terminal you set up in What You’ll Need):

php speak_text.php

And that should deposit an audio file called text.mp3 in the same directory — play it.
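Curious what a script like this has to do under the hood? Here’s a minimal sketch of the flow: build the request, call Polly’s SynthesizeSpeech operation, and write the returned audio stream to disk. It’s an illustration under assumptions, not the repository’s actual code; buildSpeechRequest and saveSpeech are hypothetical names.

```php
<?php
// Sketch only: both function names are hypothetical.

// Build the parameter array for the SynthesizeSpeech operation.
function buildSpeechRequest(string $text, string $voiceId = 'Emma'): array
{
    return [
        'OutputFormat' => 'mp3',
        'Text' => $text,
        'VoiceId' => $voiceId,
    ];
}

// Send the request and save the audio; returns the number of bytes written.
// $client is expected to be an Aws\Polly\PollyClient, but any object with a
// synthesizeSpeech() method works, which keeps the function easy to test.
function saveSpeech(object $client, array $request, string $path): int
{
    $result = $client->synthesizeSpeech($request);

    // The SDK returns the audio as a stream object under 'AudioStream';
    // casting it to a string reads the whole stream.
    $bytes = file_put_contents($path, (string) $result['AudioStream']);
    if ($bytes === false) {
        throw new RuntimeException("Could not write audio to {$path}");
    }

    return $bytes;
}

// Possible usage with a configured client:
// saveSpeech($client, buildSpeechRequest('Hi! My name is Emma.'), 'text.mp3');
```

Keeping the request builder separate from the API call means you can change the text or voice in one place and reuse the same saving logic for every clip.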

Easy enough. Let’s try it using a little Speech Synthesis Markup Language (SSML)…

SSML to Speech

Go back to your Polly Dashboard.

Still signed in? Select a language and voice then hit “Listen to speech” to test them out in SSML mode.

Interacting with the API.

Open up the speak_ssml.php file and edit the text you want to speak:

  // Change this to whatever SSML you want to convert to audio
  'Text' => '
    <speak>
    Hi! My name is Emma.
    Welcome to the Amazon Polly demo.
    Today is <say-as interpret-as="date">????0406</say-as>
    </speak>',

Simply change Hi! My name is Emma. Welcome to the Amazon Polly demo. Today is <say-as interpret-as="date">????0406</say-as> to your desired output.

If you’d like to dive deeper into the markup syntax, which I highly recommend, here is an SSML reference. Compared to plain text, SSML gives you more granular control over pronunciation, volume, and speech rate.

And as above, you can change the “VoiceId” if you want to use a different voice:

'VoiceId' => 'Emma'

And again, to send your request to the API, simply run the following command (in the terminal you set up in What You’ll Need):

php speak_ssml.php

That should deposit an audio file called ssml.mp3 in the same directory — play it.
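If you plan to generate SSML from your own data instead of typing it by hand, here’s a small sketch of how that might look. It escapes the free text so a stray & or < can’t break the markup, fills in the month and day using the same ????MMDD date form as the demo, and sets TextType to ssml, which the API needs in order to treat the input as markup. The function names are hypothetical.

```php
<?php
// Sketch only: buildSsml() and buildSsmlRequest() are hypothetical helpers.

// Wrap a greeting and a date in <speak> markup.
function buildSsml(string $greeting, DateTimeInterface $date): string
{
    // Escape &, <, > and quotes so arbitrary text cannot break the XML.
    $safe = htmlspecialchars($greeting, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // ????MMDD mirrors the demo script: month and day only, no year.
    $monthDay = $date->format('md');

    return '<speak>' . $safe
        . ' Today is <say-as interpret-as="date">????' . $monthDay . '</say-as>'
        . '</speak>';
}

// Build the request parameters for SSML input.
function buildSsmlRequest(string $ssml, string $voiceId = 'Emma'): array
{
    return [
        'OutputFormat' => 'mp3',
        'Text' => $ssml,
        'TextType' => 'ssml', // without this, the input is treated as plain text
        'VoiceId' => $voiceId,
    ];
}

// Possible usage:
// $ssml = buildSsml('Hi! My name is Emma.', new DateTimeImmutable());
// $request = buildSsmlRequest($ssml);
```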

Lexicons

Custom pronunciation lexicons give you the ability to control how the system pronounces words in your text and SSML.

Once again, go back to your Polly Dashboard.

Assuming you’re still signed in, click on the “Lexicons” link.

Upload the metals.pls file from the source code you downloaded in What You’ll Need. Here’s what it looks like…

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa"
      xml:lang="en-US">
  <lexeme><grapheme>Au</grapheme><alias>Gold</alias></lexeme>
  <lexeme><grapheme>Ag</grapheme><alias>Silver</alias></lexeme>
  <lexeme><grapheme>Fe</grapheme><alias>Iron</alias></lexeme>
</lexicon>

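This lexicon tells the system to pronounce ‘Au,’ ‘Ag,’ and ‘Fe’ using their common names, assuming you activate the lexicon when making the API call. Make sure your lexicon and speech languages match as well: this lexicon is marked en-US, so it only takes effect with a US English voice (Joanna, for example), not with Emma, which is a British English voice.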

You can test it by activating the lexicon on the Polly Dashboard…

Click on “Customize pronunciation,” select your lexicon from the drop-down menu, enter “Au, Ag, and Fe are metals.” in the text field, and then hit “Listen to speech.”

Note: Currently, you can apply up to five lexicons to any given chunk of text.
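You can do the same thing from code. Here’s a hedged sketch: one helper uploads a .pls file through the PutLexicon operation, and another attaches lexicon names to a speech request while enforcing the five-lexicon limit mentioned above. The helper names and the lexicon name metals are assumptions for illustration.

```php
<?php
// Sketch only: uploadLexicon() and withLexicons() are hypothetical helpers.

// Upload a pronunciation lexicon file under the given name.
// $client is expected to be an Aws\Polly\PollyClient (or a test double).
function uploadLexicon(object $client, string $name, string $plsPath): void
{
    $content = file_get_contents($plsPath);
    if ($content === false) {
        throw new RuntimeException("Could not read lexicon file {$plsPath}");
    }

    $client->putLexicon(['Name' => $name, 'Content' => $content]);
}

// Return a copy of the request with the lexicons switched on.
function withLexicons(array $request, array $lexiconNames): array
{
    if (count($lexiconNames) > 5) {
        throw new InvalidArgumentException('At most five lexicons can be applied per request.');
    }

    $request['LexiconNames'] = array_values($lexiconNames);
    return $request;
}

// Possible usage, with a US English voice to match the en-US lexicon:
// uploadLexicon($client, 'metals', 'metals.pls');
// $request = withLexicons(
//     ['OutputFormat' => 'mp3', 'Text' => 'Au, Ag, and Fe are metals.', 'VoiceId' => 'Joanna'],
//     ['metals']
// );
```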

Congratulations!

You’ve built a text-to-speech engine that you can use for nearly anything — a mobile app, an IoT device, a chatbot — anything with access to a speaker.

And just a quick reminder: if you also integrate a speech recognition service, like Amazon Lex, alongside Polly, you can give your products and apps the full two-way power of conversation.

You can dig deeper into Amazon’s Polly API — including additional tutorials — in the developer documentation.

Enjoy!