Simple AWS Rekognition and Polly Example

By | February 28, 2018

During my studies for the AWS Solutions Architect exam I came across a couple of Amazon services that look very interesting: Rekognition, Amazon's deep learning-based image and video analysis service, and Polly, which turns text into lifelike speech using deep learning.

Using these services you could make a Ring-type doorbell/security monitoring device that could help those with a visual impairment, or an elderly person who lives alone. Throw in Alexa or SES and you could have a nifty wee app whereby, when someone comes to the door, Rekognition could analyze the face, see if it's a known person or not, and then Polly could speak or message the audio results.

In this simple demo I'm using the AWS CLI with the Python SDK, boto3. Here are the requirements to get started. I'm using a Windows 10 computer, but with a simple modification it will work on a Mac or Linux.

Make sure that you have the following requirements installed. I always use a virtual environment for my projects.

  • python
    • I’m using Python 3.6.4.
  • awscli
    • pip install awscli
  • Boto3
    • pip install boto3
  • playsound
    • pip install playsound

On the AWS side of things, make sure you have an IAM account with developer access (access key and secret) and configure this with the aws configure command. Create an S3 bucket to store your pics. Take note of the bucket name and the region where you created it.
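If you'd rather create the bucket from Python than from the console, a minimal sketch with boto3 might look like the following. The bucket name is a placeholder, and the client is passed in as an argument so the call can be tried out without touching AWS.

```python
def create_pics_bucket(s3, name, region):
    """Create an S3 bucket in the given region to hold the pictures."""
    # outside us-east-1, S3 requires the region as a LocationConstraint
    return s3.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={'LocationConstraint': region})
```

With a real client this would be called as `create_pics_bucket(boto3.client('s3', region_name='eu-west-1'), 'my-pics-bucket', 'eu-west-1')`.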

(aws) PS C:\Users\mcnei\projects\aws_boto> aws configure
AWS Access Key ID [****************EKNA]:
AWS Secret Access Key [****************+r/1]:
Default region name [eu-west-1]:
Default output format [json]:
(aws) PS C:\Users\mcnei\projects\aws_boto>


import boto3

from contextlib import closing
from playsound import playsound


def checkPicture(fileName, bucket, region):
    rekognition = boto3.client('rekognition', region_name=region)

    # detect_faces returns metadata for every face found in the picture
    response = rekognition.detect_faces(
        Image={'S3Object': {'Bucket': bucket, 'Name': fileName}},
        Attributes=['ALL'])

    for faceDetail in response['FaceDetails']:
        Text = 'The detected face is between {0} and {1} years old. '.format(
            str(faceDetail['AgeRange']['Low']), str(faceDetail['AgeRange']['High']))
        Text = Text + 'The person appears to be ' + str(faceDetail['Emotions'][0]['Type'])
        return Text


if __name__ == "__main__":

    region = 'eu-west-1'
    # replace the bucket and file names with your own
    Text = checkPicture('mypic.jpg', 'my-pics-bucket', region)

    polly = boto3.client('polly', region_name=region)
    response = polly.synthesize_speech(
        Text=Text,
        OutputFormat='mp3',
        VoiceId='Amy')

    if 'AudioStream' in response:
        with closing(response['AudioStream']) as stream:
            data = stream.read()
            with open('c:\\temp\\pollytest.mp3', 'wb') as fo:
                fo.write(data)

    playsound('c:\\temp\\pollytest.mp3')


The above code is a VERY simple example: I have a picture of my daughter in an S3 bucket, and an API call is made to Rekognition using detect_faces. This returns a whole load of metadata about the faces in the picture. The script then takes the age range and emotion returned by Rekognition and builds a text string, which Polly converts to speech and reads back.
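The string-building step can be tried on its own without AWS. Here's a minimal sketch with a hand-written response in the same shape Rekognition returns (only the two fields the script uses are filled in); this version also picks the highest-confidence emotion rather than simply the first one in the list.

```python
def describe_face(response):
    """Build a spoken description from a detect_faces-style response."""
    for face in response.get('FaceDetails', []):
        age = face['AgeRange']
        # emotions come back with confidence scores; take the most likely one
        emotion = max(face['Emotions'], key=lambda e: e['Confidence'])
        return ('The detected face is between {0} and {1} years old. '
                'The person appears to be {2}.'.format(
                    age['Low'], age['High'], emotion['Type'].lower()))
    return 'No face was detected.'


# hand-written sample in the shape Rekognition returns
sample = {
    'FaceDetails': [{
        'AgeRange': {'Low': 4, 'High': 9},
        'Emotions': [
            {'Type': 'HAPPY', 'Confidence': 97.2},
            {'Type': 'CALM', 'Confidence': 2.1},
        ],
    }]
}

print(describe_face(sample))
# The detected face is between 4 and 9 years old. The person appears to be happy.
```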

The next step would be to index known faces so that, the next time they come to the door for example, they could be greeted by name and allowed access, etc.
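A hedged sketch of that next step, using Rekognition's face collections: faces are first added to a collection with index_faces, and later pictures are matched against it with search_faces_by_image. The collection, bucket, and file names below are placeholders, and the client is passed in as an argument so the greeting logic can be exercised without AWS.

```python
def greet_visitor(rekognition, bucket, fileName, collection='front-door-faces'):
    """Return a greeting based on whether the face matches the collection."""
    # search_faces_by_image compares the largest face in the picture
    # against faces previously added to the collection with index_faces
    response = rekognition.search_faces_by_image(
        CollectionId=collection,
        Image={'S3Object': {'Bucket': bucket, 'Name': fileName}},
        FaceMatchThreshold=90)

    matches = response.get('FaceMatches', [])
    if matches:
        # ExternalImageId is whatever label was supplied when the face
        # was indexed, e.g. the person's name
        name = matches[0]['Face']['ExternalImageId']
        return 'Hello {0}, come on in.'.format(name)
    return 'I do not recognise this person.'
```

The resulting string could be handed straight to the Polly step above instead of the age/emotion sentence.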

Again, this is just a simple example of a bare-bones Python script that allowed me to scan a pic and speak the results without having to know how to code super complex AI algorithms.

It's a great starting point for a wee code club project with a Pi Zero 😉

I’m going to build on this as I play about….. watch this space 🙂


Category: AWS
