In this article we are going to use Amazon Polly from Golang to generate text-to-speech audio files.

What is Amazon Polly?

  • Amazon Polly is a Text-To-Speech cloud-based service.
  • It uses advanced machine learning technologies to synthesize natural sounding human speech.
  • You can build speech-enabled applications in multiple languages.
  • Use Cases:
    • USA TODAY NETWORK produces audio content with Polly.
    • Mapbox uses Polly for voice guidance as part of its navigation solution.
    • Volley is a top developer of voice-controlled games that also uses Polly.

Create IAM Policy to use Polly

Let’s create a new policy in the AWS Console so that the user who owns the credentials in our local environment is allowed to use the Amazon Polly API through the AWS SDK for Golang.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PollyDevelopersPolicy1",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
        }
    ]
}

Then attach this policy to the IAM user group named “rest-api-developers”, which we created in the previous article: Golang / Go Crash Course 09 | Connecting our REST API with Amazon (AWS) DynamoDB

Consume the Amazon Polly API

Now that all the permissions are in place on AWS, we can access the Amazon Polly API through the AWS SDK for Golang. Let’s start a new project and initialize a Go module:

$ mkdir golang-aws-polly
$ cd golang-aws-polly
$ go mod init github.com/favtuts/golang-amazon-polly

Next, we are going to install the AWS SDK for Golang:

$ go get -u github.com/aws/aws-sdk-go
go: added github.com/aws/aws-sdk-go v1.44.235
go: added github.com/jmespath/go-jmespath v0.4.0

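Now we are going to create a PollyService in a package named service, which main.go will later import as github.com/favtuts/golang-amazon-polly/service: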

package service

import (
	"io"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/polly"
)

type PollyService interface {
	Synthesize(text string, fileName string) error
}

type pollyConfig struct {
	voice string
}

func NewKimberlyPollyService() PollyService {
	return &pollyConfig{
		voice: KIMBERLY_VOICE,
	}
}

func NewJoeyPollyService() PollyService {
	return &pollyConfig{
		voice: JOEY_VOICE,
	}
}

const (
	AUDIO_FORMAT   = "mp3"
	KIMBERLY_VOICE = "Kimberly"
	JOEY_VOICE     = "Joey"
)

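func createPollyClient() *polly.Polly {
	// Enable the shared config file (~/.aws/config) so the region and
	// profile are picked up along with the local credentials.
	sess := session.Must(session.NewSessionWithOptions(session.Options{
		SharedConfigState: session.SharedConfigEnable,
	}))

	return polly.New(sess)
}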

func (config *pollyConfig) Synthesize(text string, fileName string) error {
	pollyClient := createPollyClient()

	input := &polly.SynthesizeSpeechInput{
		OutputFormat: aws.String(AUDIO_FORMAT),
		Text:         aws.String(text),
		VoiceId:      aws.String(config.voice),
	}

	output, err := pollyClient.SynthesizeSpeech(input)
	if err != nil {
		return err
	}
	// Release the response body once the audio has been copied.
	defer output.AudioStream.Close()

	outFile, err := os.Create(fileName)
	if err != nil {
		return err
	}

	defer outFile.Close()

	_, err = io.Copy(outFile, output.AudioStream)
	if err != nil {
		return err
	}

	return nil
}

Create Polly application

Let’s create main.go with a couple of examples that use PollyService:

package main

import "github.com/favtuts/golang-amazon-polly/service"

var (
	kimberly service.PollyService = service.NewKimberlyPollyService()
	joey     service.PollyService = service.NewJoeyPollyService()
)

func main() {
	err := kimberly.Synthesize("Hi, I am Kimberly, how are you?", "kimberly.mp3")
	if err != nil {
		panic(err)
	}

	err = joey.Synthesize("Hi, I am Joey. Nice to meet you.", "joey.mp3")
	if err != nil {
		panic(err)
	}
}

Run the Polly application

Let’s run the application to see what happens:

$ go run *.go

Now we have two mp3 files, so let’s play them (here with the play command that ships with SoX):

$ play kimberly.mp3
$ play joey.mp3

Download Source Code

$ git clone https://github.com/favtuts/golang-amazon-polly.git
$ cd golang-amazon-polly
$ go build

$ go run *.go
