During my studies for the AWS Solutions Architect exam I've come across a couple of Amazon services that look very interesting. Two of these are Rekognition, Amazon's deep learning-based image and video analysis service, and Polly, which turns text into lifelike speech using deep learning.
Using these services could make for a Ring-type doorbell/security monitoring device that could be a help to those with a visual impairment, or an elderly person who lives alone. Throw in Alexa or SES and you could have a nifty wee app whereby, when someone comes to the door, Rekognition could analyze the face, see if it's a known person or not, and then Polly could speak or message the audio results.
In this simple demo I'm using the AWS CLI with the Python SDK, Boto3. Here are the requirements to get started. I'm using a Windows 10 computer, but with a simple modification it will work on macOS/Linux.
Make sure that you have the following requirements installed. I always use a virtual environment for my projects.
- python
  - I'm using Python 3.6.4.
- awscli
  - pip install awscli
- Boto3
  - pip install boto3
- playsound
  - pip install playsound
On the AWS side of things, make sure you have an IAM account with developer access (access key and secret) and configure this with the aws configure command. Create an S3 bucket to store your pics. Take note of the bucket name and the region where you created it.
```
(aws) PS C:\Users\mcnei\projects\aws_boto> aws configure
AWS Access Key ID [****************EKNA]:
AWS Secret Access Key [****************+r/1]:
Default region name [eu-west-1]:
Default output format [json]:
(aws) PS C:\Users\mcnei\projects\aws_boto>
```
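Once that's done, a quick way to sanity-check the setup from Python is something along these lines. This wee check isn't part of the demo proper, just my own assumption about how you might verify things; swap in your own bucket name (I'm using the same s3-jm-photos bucket that appears in the script later on):

```python
import boto3

# Confirm the credentials configured via `aws configure` are being picked up
sts = boto3.client('sts')
print(sts.get_caller_identity()['Arn'])

# Confirm the bucket exists and is reachable; this raises a ClientError if not
s3 = boto3.client('s3', 'eu-west-1')
s3.head_bucket(Bucket='s3-jm-photos')
print('Bucket is reachable')
```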
```python
import boto3
from playsound import playsound
from contextlib import closing

if __name__ == "__main__":

    def checkPicture(service, fileName, bucket, region):
        # Ask Rekognition for everything it can tell us about faces in the picture
        rekognition = boto3.client(service, region)
        response = rekognition.detect_faces(
            Image={
                'S3Object': {
                    'Bucket': bucket,
                    'Name': fileName
                }
            },
            Attributes=['ALL'])

        # Build a sentence from the age range and the top-ranked emotion
        text = ''
        for faceDetail in response['FaceDetails']:
            text = 'The detected face is between {0} and {1} years old. '.format(
                str(faceDetail['AgeRange']['Low']),
                str(faceDetail['AgeRange']['High']))
            text = text + 'The person appears to be ' + str(faceDetail['Emotions'][0]['Type'])
        return text

    # Hand the sentence to Polly and get an mp3 audio stream back
    polly = boto3.client("polly", "eu-west-1")
    response = polly.synthesize_speech(
        Text=checkPicture("rekognition", "20180215_194003.jpg", "s3-jm-photos", "eu-west-1"),
        OutputFormat="mp3",
        VoiceId="Emma")

    # Save the audio stream to disk, then play it
    if "AudioStream" in response:
        with closing(response["AudioStream"]) as stream:
            with open("c:\\temp\\pollytest.mp3", "wb") as fo:
                fo.write(stream.read())

    playsound('c:\\temp\\pollytest.mp3')
```
The above code is a VERY simple example. I have a picture of my daughter in an S3 bucket, and an API call is made to Rekognition using detect_faces, which returns a whole load of metadata about the faces in the picture. The script then takes the age range and emotion returned by Rekognition and builds a text string, which Polly converts to speech and reads back.
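If you're curious about just how much metadata comes back, you can poke at the rest of the response with something like this (same bucket and picture as above). Alongside AgeRange and Emotions, detect_faces also returns attributes like gender, smile and glasses, each with a confidence score:

```python
import boto3
import json

rekognition = boto3.client("rekognition", "eu-west-1")

response = rekognition.detect_faces(
    Image={'S3Object': {'Bucket': 's3-jm-photos', 'Name': '20180215_194003.jpg'}},
    Attributes=['ALL'])

for faceDetail in response['FaceDetails']:
    # A few of the many attributes returned alongside AgeRange and Emotions
    print('Gender:', faceDetail['Gender']['Value'])
    print('Smiling:', faceDetail['Smile']['Value'])
    print('Wearing glasses:', faceDetail['Eyeglasses']['Value'])
    # Dump the full blob to see everything Rekognition gives you
    print(json.dumps(faceDetail, indent=2))
```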
The next step would be to index known faces so that, the next time they come to the door for example, they could be greeted by name and allowed access.
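I haven't built that bit yet, but a rough sketch using Rekognition's collection APIs (create_collection, index_faces and search_faces_by_image) might look something like the below. The collection name and the doorbell snapshot filename are just placeholders:

```python
import boto3

rekognition = boto3.client("rekognition", "eu-west-1")

# One-off: create a collection to hold the faces of known visitors
rekognition.create_collection(CollectionId='known-visitors')

# Enrol a known face, tagging it with a name we can greet them by
rekognition.index_faces(
    CollectionId='known-visitors',
    Image={'S3Object': {'Bucket': 's3-jm-photos', 'Name': '20180215_194003.jpg'}},
    ExternalImageId='daughter')

# Later, when someone comes to the door, search the collection for a match
matches = rekognition.search_faces_by_image(
    CollectionId='known-visitors',
    Image={'S3Object': {'Bucket': 's3-jm-photos', 'Name': 'doorbell-snap.jpg'}},
    FaceMatchThreshold=90,
    MaxFaces=1)

if matches['FaceMatches']:
    name = matches['FaceMatches'][0]['Face']['ExternalImageId']
    print('Welcome back, {0}!'.format(name))  # this is what you'd hand to Polly
else:
    print('Unknown visitor')
```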
Again, this is just a simple example of a bare-bones Python script that let me scan a pic and speak the results without having to know how to code super complex AI algorithms.
It's a great starting point for a wee code club project with a Pi Zero 😉
I'm going to build on this as I play about… watch this space 🙂