Wednesday, 30 August 2017

How Amazon Polly Breathed Life into Dan Brown’s Digital Assistant


Benchmarking Training Time for CNN-based Detectors with Apache MXNet


This is a guest post by Cambron Carter, Director of Engineering, and Iris Fu, Computer Vision Scientist at GumGum. In their own words, “GumGum is an artificial intelligence company with deep expertise in computer vision, which helps their customers unlock the value of images and videos produced daily across the web, social media, and broadcast television.” 

The state-of-the-art in object detection 

Detection is one of many classic computer vision problems that have improved significantly with the adoption of convolutional neural networks (CNNs). When CNNs rose to popularity for image classification, many relied on crude and expensive preprocessing routines for generating region proposals. Algorithms like Selective Search were used to generate candidate regions based on their “objectness” (how likely they are to contain an object), and those regions were subsequently fed to a CNN trained for classification. While this approach produces accurate results, it has a significant runtime cost. CNN architectures like Faster R-CNN, You Only Look Once (YOLO), and Single Shot MultiBox Detector (SSD) address this tradeoff by embedding the localization task into the network itself.

In addition to predicting class and confidence, these CNNs attempt to predict the extrema of regions containing certain objects.  In the case of this post, these extrema are simply the four corner points of a rectangle, often referred to as a bounding box.  The previously mentioned detection architectures require training data which has been annotated with bounding boxes, i.e. this image contains a person and that person is within this rectangular region.

[Figure: two example images, both captioned “Extraordinarily handsome and capable engineer.” Left: classification training data. Right: detection training data.]

We set out to compare the experience of training SSD using Apache MXNet and Caffe. The obvious motivation is to train these new architectures in a distributed fashion without suffering a reduction in accuracy. For more information on the architecture, see “SSD: Single Shot MultiBox Detector.”

Tools for training 

For this set of experiments, we tried several NVIDIA GPUs: Titan X, 1080, K80, and K520. We host a gang of Titan Xs and 1080s in house, but also use AWS GPU-based EC2 instances. This post is restricted to the g2.2xlarge and p2.8xlarge instance types. Luckily for us, there were already some available implementations of SSD using MXNet, one of which we used for the experiments in this discussion. It is worth noting that for the experiment to be exhaustive, you should include benchmarking data for other popular frameworks, such as TensorFlow.

Speed: The effect of adjusting batch size and GPU count with MXNet

First and foremost, let’s have a look at the performance impact of multi-GPU training sessions using MXNet. These first experiments are focused on training SSD with MXNet on EC2 instances. We use the PASCAL VOC 2012 dataset. The purpose of this first exercise is to understand the effects that GPU count and batch size have on speed for a particular GPU. We used several different GPUs to illustrate their absolute performance differences. If you want to explore further, there’s a lot of information about the price and performance of different GPUs. Let’s start with K520s, which ship with g2.2xlarge instances:

Instance Type | GPU Card | GPUs | Batch Size | Speed (samples/sec) | Total Time for 240 Epochs (Hrs) | Epochs/Hr
g2.2xlarge | K520 | 1 | 16 | 10.5 | 118 | ~2
g2.2xlarge | K520 | 2 | 16 | 9.83 | 112 | ~3
g2.2xlarge | K520 | 5 | 16 | 9-10 | 38 | ~6

Note that, given a constant batch size, adding extra g2.2xlarge instances (each with one K520) results in a roughly constant speed per machine. As such, the number of epochs per hour increases almost linearly. Adding machines should continue to decrease our runtime, but at some point there could be a cost in accuracy. We don’t explore this here, but it is worth a mention.

Next, we wanted to check if this trend was observed on our local 1080s:

Instance Type | GPU Card | GPUs | Batch Size | Speed (samples/sec) | Total Time for 240 Epochs (Hrs) | Epochs/Hr
In-house | GTX 1080 | 1 | 16 | 48.8 | 24 | ~10
In-house | GTX 1080 | 2 | 16 | 8.5 | ~80 | ~3

As expected, one 1080 outperforms five K520s with ease. Apart from that observation, this experiment raised some eyebrows. We found a tremendous reduction in per-GPU speed when adding a second 1080 into the mix. Because interprocess communication in this experiment happens over Ethernet, we first thought that we were being throttled by our office network. According to iPerf, this hypothesis doesn’t hold:

iPerf between | Interval | Transfer | Bandwidth
Local GTX 1080 and local GTX 1080 | 0.0-10s | 1012 MB | 848 Mb/sec
g2.2xlarge K520 and g2.2xlarge K520 | 0.0-10s | 867 MB | 727 Mb/sec

Our tests show that both the bandwidth and transfer are higher in our office network than between two EC2 instances.  Our next question: What if batch size is the cause of the resulting inefficiency?

GPU Card | GPUs | Batch Size | Speed (samples/sec) | Total Time for 240 Epochs (Hrs) | Epochs/Hr
GTX 1080 | 2 | 16 | 8.5 | ~80 | ~3
GTX 1080 | 2 | 32 | 15.4 | 42 | ~5

Ah-ha! Although still far slower than with one 1080, an increase in batch size along with an additional 1080 (on a separate machine) results in a speed increase. Let’s check the impact of training on a single machine with multiple, interconnected GPUs:

GPU Card | GPUs | Batch Size | Speed (samples/sec) | Total Time for 240 Epochs (Hrs) | Epochs/Hr
Titan X | 1 | 32 | 40.2 | ~34 | ~7
Titan X | 2 | 32 (technically 16 on each GPU) | 70.8 | ~20 | ~12
Titan X | 4 | 32 (technically 8 on each GPU) | 110 | ~12 | ~20

We used our in-house NVIDIA DevBox, which houses four Titan Xs. Holding batch size constant but increasing the number of GPUs (technically reducing the individual batch size for each GPU) increases speed roughly 2.7x with four GPUs (110 versus 40.2 samples/sec)! This raises the obvious question: What if we increase the batch size per GPU?

GPU Card | GPUs | Batch Size | Speed (samples/sec)
Titan X | 1 | 16 | 35.57
Titan X | 2 | 32 (technically 16 on each GPU) | 70.8

We observe that when we keep the per-GPU batch size constant (at 16) and increase the number of GPUs on the same machine, the resulting speed also increases roughly 2x. We wanted to explore this a bit further, but the buzzing of our DevBox fans kept our batch sizes at bay. The curious case of our 1080 experiment remains open, immortalized as an open issue. A potential solution on AWS might be to use placement groups, which we don’t explore in this post.
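To make the scaling numbers easier to compare, here is a minimal sketch that turns the Titan X throughput figures from the fixed-batch-size table above into speedup and per-GPU efficiency. The function name `scaling_report` is our own and not part of MXNet.

```python
# Throughput in samples/sec, copied from the Titan X table above
# (total batch size 32 on 1, 2, and 4 GPUs).
titan_x_speed = {1: 40.2, 2: 70.8, 4: 110.0}


def scaling_report(speed_by_gpu_count):
    """Return {gpu_count: (speedup, efficiency)} relative to one GPU."""
    baseline = speed_by_gpu_count[1]
    report = {}
    for gpus, speed in sorted(speed_by_gpu_count.items()):
        speedup = speed / baseline
        # An efficiency of 1.0 would mean perfectly linear scaling.
        report[gpus] = (speedup, speedup / gpus)
    return report


for gpus, (speedup, efficiency) in scaling_report(titan_x_speed).items():
    print("%d GPU(s): %.2fx speedup, %.0f%% efficiency"
          % (gpus, speedup, efficiency * 100))
```

With these figures, four GPUs deliver about 2.7x the single-GPU throughput, which is roughly 68% of perfectly linear scaling.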

Accuracy: The effect of adjusting batch size with MXNet

Because we are tinkering with batch size so much, it is worth exploring how tinkering affects accuracy. The goal is simple: We want to reduce training time as much as possible while maintaining accuracy. We conducted this set of experiments on an in-house dataset that we use for logo detection. We carried them out on a p2.8xlarge instance with MXNet. We consider a detection a true positive if the intersection-over-union between the detected region and ground truth is at or above 20%. First, we tried a batch size of 8:

The precision and recall curve at three different stages of training.  SSD uses input dimensions of 300×300 pixels, and our batch size is 8.
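For readers unfamiliar with the true-positive criterion used in these curves, the following is a minimal sketch of intersection-over-union for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box layout and both function names are assumptions for illustration; our evaluation code is not shown in this post, and a full evaluation would also check that the predicted class matches.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / float(union) if union > 0 else 0.0


def is_true_positive(detected, ground_truth, threshold=0.20):
    """Apply the 20% IoU threshold described in the text."""
    return intersection_over_union(detected, ground_truth) >= threshold
```

For example, a detection covering exactly the top half of a ground-truth box has an IoU of 0.5 and counts as a true positive under this threshold.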

If we weight precision and recall equally, this leaves us at an operating point of roughly 65% for each.  Let’s see what happens when we adjust the input dimensions in SSD using MXNet:

SSD using input dimensions of 512×512 pixels, with all else held constant from the previous experiment.

Here we actually see an improvement in performance, with an operating point of approximately 70% for both precision and recall. Sticking with input dimensions of 512×512 pixels, let’s investigate what adjusting batch size does to accuracy. Again, the goal is to maintain accuracy while squeezing out as much runtime as possible.

SSD using input dimensions of 512×512 pixels and a batch size of 64.  Accuracy is comparable to previous experiments.

Yet again!  Accuracy remains consistent, and we can reap the benefits of reduced training time with our larger batch size.  In the spirit of science, let’s push it even further…

SSD using input dimensions of 512×512 pixels and a batch size of 192.  Accuracy is comparable to previous experiments.

Very much the same, although our operating accuracies have fallen a bit when compared to the previous curve. Nonetheless, we have managed to generally retain accuracy while increasing batch size by 24x. To reiterate, each of these experiments was performed on a p2.8xlarge instance with SSD implemented on MXNet. Here’s a summary of batch size experiments, which shows comparable accuracies across experiments:

MXNet’s SSD is comparably accurate to an SSD trained with Caffe. 

The takeaway is that accuracy is comparable.  Getting back to the point, let’s investigate training times:

Summary of training times for Caffe and MXNet using multiple batch sizes.

The expected reduction in training time when batch size increases is obvious. However, why is there a drastic increase in training time when we increase batch size in Caffe?!  Let’s have a look at nvidia-smi for both of the Caffe experiments:

Output of nvidia-smi when training with Caffe, using a batch size of 8.  Note the fluctuating GPU usage. 

The GPU handles the heavy computational cost of backpropagation, which is reflected as a spike in usage.  Why does usage fluctuate?  It could be that the fluctuation illustrates the effect of shipping batches of training data from the CPU to the GPU.  In that case, usage drops to 0% while the GPU is waiting for the next batch to be loaded.  Let’s inspect the usage after doubling the batch size to 16:

 

Output of nvidia-smi while training with Caffe, using a batch size of 16.  

The lulls in usage are far more exaggerated and reveal an obvious inefficiency. This explains our observed increase in training time after increasing the batch size.  This is nothing new or unexpected.  In fact, we have also encountered this issue while training Siamese-VGG networks with Keras (and a TensorFlow backend).  Discussions on this topic generally gravitate toward “the more complex your model is, the less you’ll feel the CPU-to-GPU bottleneck.” Although this is true, it isn’t very helpful.  Yes, if you give the GPU more work by way of more gradients to compute, you’ll certainly see average GPU utilization increase.  We don’t want to increase the complexity of our model, and, even if we did, that’s not going to help us achieve our overall goal.  Our concern here is absolute runtime, not relativity.
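One way to put a number on those lulls is to sample GPU utilization repeatedly and measure how often the GPU sits idle. The sketch below assumes `nvidia-smi` is on the path and uses its CSV query mode; the helper names and the 5% idle cutoff are our own choices for illustration.

```python
import subprocess


def parse_utilization(text):
    """Parse one utilization percentage per line into a list of ints."""
    return [int(line) for line in text.splitlines() if line.strip()]


def sample_gpu_utilization():
    """Return the current utilization (percent) of each visible GPU."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"])
    return parse_utilization(output.decode("ascii"))


def idle_fraction(samples, idle_below=5):
    """Fraction of samples in which the GPU was (nearly) idle."""
    if not samples:
        return 0.0
    idle = sum(1 for value in samples if value < idle_below)
    return idle / float(len(samples))
```

Calling `sample_gpu_utilization()` once per second during training and passing the collected values for one GPU to `idle_fraction` gives a rough measure of how much time that GPU spends waiting for data.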

To summarize our experiments, training time for SSD with Caffe is far higher than with the MXNet implementation.  With MXNet, we observe a steady decrease in training time until we reach critical mass with a batch size of 192.  We went from 21.5 hours of training time to 4.6 simply by adopting MXNet. We observed no degradation in accuracy while doing so.  This is not at all a knock on Caffe—a framework that we hold near and dear—but rather a high five for MXNet.  We could attack the data loading issue in a number of ways. Perhaps Caffe2 has even addressed this.  Point being, we didn’t need to, and if there’s anything that will warm the heart of a machine learning developer, it’s writing the least amount of code possible.  Although we still have a few unanswered questions, that’s natural when adopting a new tool. We are more than happy being guinea pigs, and are excited about the future of MXNet.

 


 

About the Authors

 

Iris Fu has been a Computer Vision Scientist at GumGum since 2016. She previously worked in the field of Computational Chemistry at UC Irvine. Her current projects revolve around developing custom Deep Learning and Computer Vision solutions at scale.

 

 

Cambron Carter is the Director of Computer Vision Technology at GumGum, where he is responsible for designing Computer Vision and Machine Learning solutions. He holds degrees in Physics and Electrical Engineering and in his spare time he creates music: Cambron – Pretty Nifty.

 

 

Powered by WPeMatico

The post Benchmarking Training Time for CNN-based Detectors with Apache MXNet appeared first on Artificial Intelligence Solutions.


Monday, 21 August 2017

AWS CloudTrail Integration is Now Available in Amazon Lex


Amazon Lex is now integrated with AWS CloudTrail, a service that enables you to log, continuously monitor, and retain events related to API calls across your AWS infrastructure, to provide a history of API calls for your account. Amazon Lex API calls are captured from the Amazon Lex console or from your API operations using the SDKs directly. Your Amazon Lex API calls are delivered to an Amazon S3 bucket with your other AWS service records. Using the information collected by AWS CloudTrail, you can track requests made to Amazon Lex including the origination of the request, such as source IP address, the date and time the request was made, and the parameters requested.
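As a rough illustration of tracking those requests, the sketch below queries the CloudTrail event history with the `lookup_events` operation and extracts the fields mentioned above. It assumes that Amazon Lex events carry the event source `lex.amazonaws.com`; the function name and the shape of the returned dictionaries are our own.

```python
import json


def recent_lex_calls(cloudtrail_client, max_results=20):
    """Summarize recent Amazon Lex API calls recorded by CloudTrail."""
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[{
            "AttributeKey": "EventSource",
            # Assumed event source name for Amazon Lex.
            "AttributeValue": "lex.amazonaws.com",
        }],
        MaxResults=max_results)

    calls = []
    for event in response.get("Events", []):
        # The full record is delivered as a JSON string.
        record = json.loads(event["CloudTrailEvent"])
        calls.append({
            "name": event.get("EventName"),
            "time": event.get("EventTime"),
            "source_ip": record.get("sourceIPAddress"),
            "parameters": record.get("requestParameters"),
        })
    return calls

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   for call in recent_lex_calls(boto3.client("cloudtrail")):
#       print(call)
```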

Visit the documentation to learn more about AWS CloudTrail integration with Amazon Lex.


The post AWS CloudTrail Integration is Now Available in Amazon Lex appeared first on Artificial Intelligence Solutions.


A/B Testing at Scale – Amazon Machine Learning Research


This week, Amazon presented an academic paper at KDD 2017, the prestigious machine learning and big data conference. The paper shows Amazon’s research into tools that help us measure customers’ satisfaction and better learn how we can implement ideas that delight them. Specifically, we show an efficient bandit algorithm for multivariate testing, where one seeks to find an optimal series of actions with as little experimental effort as possible. One application of this research, for example, is optimizing the layout of a web page.
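To give a feel for what a bandit approach to layout testing looks like, here is a deliberately simplified sketch. It is not the algorithm from the paper: it is plain Beta-Bernoulli Thompson sampling that treats each complete page layout as one independent arm, and the conversion rates are made-up values for the simulation.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Thompson sampling; return how often each layout was shown."""
    rng = random.Random(seed)
    arms = len(true_rates)
    successes = [0] * arms
    failures = [0] * arms
    pulls = [0] * arms

    for _ in range(rounds):
        # Draw a plausible conversion rate for each layout from its
        # Beta posterior and show the layout with the highest draw.
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(arms)]
        arm = draws.index(max(draws))
        pulls[arm] += 1

        # Simulated customer response (hypothetical rates).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


print(thompson_sampling([0.1, 0.3, 0.6], rounds=3000))
```

The appeal of this family of methods is that traffic shifts toward the better layouts while the experiment is still running, so less experimental effort is spent on poor options.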

Please check out this fun, three-minute video that explains the paper and how the ideas are applied within Amazon. Also, it won the KDD 2017 Audience Appreciation Award!

Download the paper from KDD.org: An efficient bandit algorithm for realtime multivariate optimization.


The post A/B Testing at Scale – Amazon Machine Learning Research appeared first on Artificial Intelligence Solutions.


Apache MXNet Release Candidate Introduces Support for Apple’s Core ML and Keras v1.2


Apache MXNet is an effort undergoing incubation at the Apache Software Foundation (ASF). Last week, the MXNet community introduced a release candidate for MXNet v0.11.0, its first as an incubating project, and the community is now voting on whether to accept this candidate as a release. It includes the following major feature enhancements:

  • A Core ML model converter that allows you to train deep learning models with MXNet and then deploy them easily to Apple devices
  • Support for Keras v1.2 that enables you to use the Keras interface with MXNet as the runtime backend when building deep learning models

The v0.11.0 release candidate also includes additional feature updates, performance enhancements, and fixes as outlined in the release notes.

Run MXNet models on Apple devices using Core ML (developer preview)

This release includes a tool that you can use to convert MXNet deep learning models to Apple’s Core ML format. Core ML is a framework that application developers can use for deploying machine learning models onto Apple devices with minimal memory footprint and power consumption. It uses the Swift programming language and is available in the Xcode integrated development environment (IDE). It allows developers to interact with machine learning models like any other Swift object class.

With this conversion tool, you now have a fast pipeline for your deep learning enabled applications. Move from scalable and efficient distributed model training in the cloud using MXNet to fast runtime inference on Apple devices. This developer preview of the Core ML model converter includes support for computer vision models. For more details about the converter, see the incubator-mxnet GitHub repo.

Multi-GPU performance for Keras v1.2

This release also adds support for v1.2 of Keras, a popular high-level Python library for developing deep learning models. Keras provides an easy-to-use interface that has high-level building blocks for modeling neural networks. Developers have the option to configure Keras to use other frameworks like TensorFlow, Theano, and now MXNet as the runtime backend for performing the underlying complex computations and model training.

With MXNet as a backend for Keras, developers can achieve high performance scaling across multiple GPUs. Previously with Keras, it was inefficient to train models at scale with more than one GPU. Keras users can now get near linear scaling when training across multiple GPUs. The following code snippet shows how you can set the number of GPUs in Keras when using MXNet as the backend:

# Prepare the list of GPUs to be used in training
NUM_GPU = 16  # or the number of GPUs available on your machine
gpu_list = ['gpu(%d)' % i for i in range(NUM_GPU)]

# Compile your model by setting the context to the list of GPUs to be used in training.
# `model` and `opt` are the Keras model and optimizer you defined earlier.
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'],
              context=gpu_list)

Now, it is possible to take advantage of the Keras interface and also achieve high performance across multiple GPUs. NVIDIA has conducted extensive research on the performance benchmarks of Keras with MXNet as the backend. You can also learn more about using MXNet as the backend for Keras by visiting this GitHub repo.

Access to Release Candidate

You can get access to the release candidate by building MXNet from source or by performing a pip install with the following command:

pip install mxnet==0.11.0.rc1

If you have questions or suggestions, please comment below. Apache MXNet is an effort undergoing incubation at the Apache Software Foundation (ASF). For more information, visit

http://ift.tt/2jZs9a9

 


The post Apache MXNet Release Candidate Introduces Support for Apple’s Core ML and Keras v1.2 appeared first on Artificial Intelligence Solutions.


Build Your Own Face Recognition Service Using Amazon Rekognition


Amazon Rekognition is a service that makes it easy to add image analysis to your applications. It’s based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images daily for Amazon Prime Photos. Facial recognition enables you to find similar faces in a large collection of images.

In this post, I’ll show you how to build your own face recognition service by combining the capabilities of Amazon Rekognition and other AWS services, like Amazon DynamoDB and AWS Lambda. This enables you to build a solution to create, maintain, and query your own collections of faces, be it for the automated detection of people within an image library, building access control, or any other use case you can think of.

To get started quickly, launch this AWS CloudFormation template. For the manual walkthrough, please ensure that you replace resource names with your own values.

How it works

The following figure shows the application workflow. It’s separated into two main parts:

  • Indexing (blue flow) is the process of importing images of faces into the collection for later analysis.
  • Analysis (black flow) is the process of querying the collection of faces for matches within the index.

Implementation

Before we can start to index the faces of our existing images, we need to prepare a couple of resources.

We start by creating a collection within Amazon Rekognition. A collection is a container for persisting faces detected by the IndexFaces API. You might choose to create one container to store all faces or create multiple containers to store faces in groups.

Your use case will determine the indexing strategy for your collection, as follows: 

  • Face match. You might want to find a match for a face within a collection of faces (as in our current example). Face match can support a variety of use cases. For example, whitelisting a group of people for a VIP experience, blacklisting to identify bad actors, or supporting logging scenarios. In those cases, you would create a single collection that contains a large number of faces or, in the case of the logging scenario, one collection for a certain time period, such as a day. 
  • Face verification. In cases where a person claims to be of a certain identity, and you are using face recognition to verify the identity (for example, for access control or authentication), you would actually create one collection per person. You would store a variety of face samples per person to improve the match rate. This also enables you to extend the recognition model with samples of different appearances, for example, where a person has grown a beard. 
  • Social tagging. In cases where you might like to automatically tag friends within a social network, you would employ one collection per application user. 

You can find more information about use cases and indexing strategies in the Amazon Rekognition Developer Guide.

Amazon Rekognition doesn’t store copies of the analyzed images. Instead, it stores face feature vectors as the mathematical representation of a face within the collection. This is often referred to as a thumbprint or faceprint.

You can manage collection containers through the API. Or if you installed and configured the AWS CLI, you can use the following command.

aws rekognition create-collection --collection-id family_collection --region eu-west-1 
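If you prefer the API route from Python, the same step can be sketched with boto3. The helper name `ensure_collection` is our own; it wraps the Rekognition `list_collections` and `create_collection` operations so that running it twice doesn’t fail on an existing collection.

```python
def ensure_collection(rekognition_client, collection_id):
    """Create the collection unless it already exists.

    Returns True if the collection was created, False if it was found.
    """
    kwargs = {}
    while True:
        page = rekognition_client.list_collections(**kwargs)
        if collection_id in page.get("CollectionIds", []):
            return False
        if not page.get("NextToken"):
            break
        # Continue with the next page of collection IDs.
        kwargs = {"NextToken": page["NextToken"]}

    rekognition_client.create_collection(CollectionId=collection_id)
    return True

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("rekognition", region_name="eu-west-1")
#   ensure_collection(client, "family_collection")
```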

The user or role that executes the commands must have permissions in AWS Identity and Access Management (IAM) to perform those actions. AWS provides a set of managed policies that help you get started quickly. For our example, you need to apply the following minimum managed policies to your user or role:

  • AmazonRekognitionFullAccess
  • AmazonDynamoDBFullAccess
  • AmazonS3FullAccess
  • IAMFullAccess

Be aware that we recommend you follow AWS IAM best practices for production implementations, which are out of scope for this blog post.

Next, we create an Amazon DynamoDB table. DynamoDB is a fully managed cloud database that supports both document and key-value store models. In our example, we’ll create a DynamoDB table and use it as a simple key-value store to maintain a reference of the FaceId returned from Amazon Rekognition and the full name of the person.

You can use either the AWS Management Console, the API, or the AWS CLI to create the table. For the AWS CLI, use the following command, which is documented in the CLI Command Reference.

aws dynamodb create-table --table-name family_collection \
--attribute-definitions AttributeName=RekognitionId,AttributeType=S \
--key-schema AttributeName=RekognitionId,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1 \
--region eu-west-1
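For the API option, the following boto3 sketch mirrors the CLI command above parameter for parameter. The wrapper name `create_face_table` is our own.

```python
def create_face_table(dynamodb_client, table_name="family_collection"):
    """Create the key-value table that maps a FaceId to a full name."""
    return dynamodb_client.create_table(
        TableName=table_name,
        # RekognitionId is a string attribute and the only key.
        AttributeDefinitions=[
            {"AttributeName": "RekognitionId", "AttributeType": "S"}],
        KeySchema=[
            {"AttributeName": "RekognitionId", "KeyType": "HASH"}],
        ProvisionedThroughput={
            "ReadCapacityUnits": 1, "WriteCapacityUnits": 1})

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   create_face_table(boto3.client("dynamodb", region_name="eu-west-1"))
```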

For the IndexFaces operation, you can provide the images as bytes or make them available to Amazon Rekognition inside an Amazon S3 bucket. In our example, we upload the images to an Amazon S3 bucket.

Again, you can create a bucket either from the AWS Management Console or from the AWS CLI. Use the following command, which is documented in the CLI Command Reference.

aws s3 mb s3://bucket-name --region eu-west-1

As shown earlier in the architecture diagram, we have separated the processes of image upload and face detection into two parts. Although all the preparation steps were performed from the AWS CLI, we use an AWS Lambda function to process the images that we uploaded to Amazon S3.

For this, we need to create an IAM role that grants our function the rights to access the objects from Amazon S3, initiate the IndexFaces function of Amazon Rekognition, and create multiple entries within our Amazon DynamoDB key-value store for a mapping between the FaceId and the person’s full name.

By now, you will have noticed that I do favor the AWS CLI over the use of the console. You can find detailed instructions for creating service roles using the AWS CLI in the documentation. To create the service role for Lambda, we need two JSON files that describe the trust and access policies: trust-policy.json and access-policy.json.

trust-policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

access-policy.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:aws-region:account-id:table/family_collection"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "rekognition:IndexFaces"
            ],
            "Resource": "*"
        }
    ]
}

For the access policy, ensure you replace aws-region, account-id, and the actual name of the resources (e.g., bucket-name and family_collection) with the name of the resources in your environment.
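If you would rather not edit the file by hand, a small helper can perform the substitution. This is a minimal sketch: the function name and the example values in the usage comment are hypothetical, and it simply replaces the placeholder strings used in access-policy.json above.

```python
import json


def render_policy(template_text, region, account_id, bucket, table):
    """Substitute the placeholder names used in access-policy.json."""
    rendered = (template_text
                .replace("aws-region", region)
                .replace("account-id", account_id)
                .replace("bucket-name", bucket)
                .replace("family_collection", table))
    # Parse the result so that a malformed policy fails here, not in IAM.
    json.loads(rendered)
    return rendered

# Usage with hypothetical values:
#   with open("access-policy.json") as f:
#       policy = render_policy(f.read(), "eu-west-1", "123456789012",
#                              "my-face-images", "family_collection")
```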

Now we can use the AWS CLI again to create the service role that our indexing Lambda function will use to retrieve temporary credentials for authentication.

As described in the documentation, you first need to create the role that includes the trust policy.

aws iam create-role --role-name LambdaRekognitionRole --assume-role-policy-document file://trust-policy.json

This is followed by attaching the actual access policy to the role.

aws iam put-role-policy --role-name LambdaRekognitionRole --policy-name LambdaPermissions --policy-document file://access-policy.json

As a last step, we need to create the Lambda function that is triggered every time a new picture is uploaded to Amazon S3. To create the function using the Author from scratch option, follow the instructions in the AWS Lambda documentation.

On the configure triggers page, select S3, and the name of your bucket as the trigger. Then configure the Event type and Prefix as shown in the following example. This ensures that your Lambda function is triggered only when new objects that start with a key matching the index/ pattern are created within the bucket.
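For reference, the trigger configured in the console corresponds to an S3 bucket notification configuration. The sketch below builds one for the index/ prefix; the helper name is ours, and the `s3:ObjectCreated:*` event type is an assumption, so adjust it if you selected a narrower event type.

```python
def index_trigger_configuration(lambda_arn, prefix="index/"):
    """Build an S3 notification configuration for the indexing function."""
    return {
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": lambda_arn,
            # Assumed event type: fire on any newly created object.
            "Events": ["s3:ObjectCreated:*"],
            # Only keys that start with the prefix trigger the function.
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": prefix}]}},
        }]
    }

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="bucket-name",
#       NotificationConfiguration=index_trigger_configuration(function_arn))
```

When you set the trigger up outside the console, S3 also needs permission to invoke the function, which the console normally grants for you.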

On the next page, you give your Lambda function a name and description, choose the Python 2.7 runtime, and paste the Python code below into the editor.

In a nutshell, the script performs two main activities:

  1. It uses the Amazon Rekognition IndexFaces API to detect the face in the input image and adds it to the specified collection.
  2. If successful, it retrieves the full name of the person from the metadata of the object in Amazon S3. Then it stores this as a key-value tuple with the FaceId in the DynamoDB table for later reference.

from __future__ import print_function

import boto3
from decimal import Decimal
import json
import urllib

print('Loading function')

dynamodb = boto3.client('dynamodb')
s3 = boto3.client('s3')
rekognition = boto3.client('rekognition')


# --------------- Helper Functions ------------------

def index_faces(bucket, key):
    # Detect the face in the S3 object and add it to the collection
    response = rekognition.index_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        CollectionId="family_collection")
    return response


def update_index(tableName, faceId, fullName):
    # Store the mapping between the FaceId and the person's full name
    dynamodb.put_item(
        TableName=tableName,
        Item={
            'RekognitionId': {'S': faceId},
            'FullName': {'S': fullName}
        })
    
# --------------- Main handler ------------------

def lambda_handler(event, context):

    # Get the object from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(
        event['Records'][0]['s3']['object']['key'].encode('utf8'))

    try:

        # Calls Amazon Rekognition IndexFaces API to detect faces in S3 object 
        # to index faces into specified collection
        
        response = index_faces(bucket, key)
        
        # Commit faceId and full name object metadata to DynamoDB
        
        if response['ResponseMetadata']['HTTPStatusCode'] == 200:
            faceId = response['FaceRecords'][0]['Face']['FaceId']

            ret = s3.head_object(Bucket=bucket,Key=key)
            personFullName = ret['Metadata']['fullname']

            update_index('family_collection',faceId,personFullName)

        # Print response to console
        print(response)

        return response
    except Exception as e:
        print(e)
        print("Error processing object {} from bucket {}. ".format(key, bucket))
        raise e
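The handler above targets the Python 2.7 runtime. If you run it on a Python 3 runtime instead, note that `unquote_plus` lives in `urllib.parse` there, and the `.encode('utf8')` call is no longer needed. A minimal sketch of the key extraction under that assumption (the helper name `object_from_event` is ours):

```python
from urllib.parse import unquote_plus


def object_from_event(event):
    """Return (bucket, key) for the first record of an S3 event."""
    record = event["Records"][0]["s3"]
    # Keys arrive URL-encoded, with spaces sent as '+'.
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])
```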

Before you click Next, find the Lambda function handler and role section at the end of the page. In the Role field, select Choose an existing role, and then select the name of the Role we created earlier.

Now you can click Next, and then choose Create function in the summary page.

We can now upload our images to Amazon S3 to seed the face collection. For this example, we again use a small piece of Python code that iterates through a list of items that contain the file location and the name of the person within the image.

Look closely at the Metadata argument of the put call in the following code example. Here we add additional metadata to the objects in Amazon S3. The Lambda function uses this metadata to extract the full name of the person within the image.

Depending on your needs, you can alter this process to include additional metadata or to load the metadata definition from another source, like a database or a metadata file.

import boto3

s3 = boto3.resource('s3')

# Get list of objects for indexing
images=[('image01.jpeg','Albert Einstein'),
      ('image02.jpeg','Albert Einstein'),
      ('image03.jpeg','Albert Einstein'),
      ('image04.jpeg','Niels Bohr'),
      ('image05.jpeg','Niels Bohr'),
      ('image06.jpeg','Niels Bohr')
      ]

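# Iterate through list to upload objects to S3
for filename, full_name in images:
    # Open each image in binary mode and close it after the upload
    with open(filename, 'rb') as image_file:
        s3_object = s3.Object('rekognition-pictures', 'index/' + filename)
        # The person's name travels with the object as user metadata
        ret = s3_object.put(Body=image_file,
                            Metadata={'FullName': full_name})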

I add multiple reference images per person to the collection because doing so greatly enhances the potential match rate for that person. It also enables additional matching logic to further enhance the results.

Analysis

Once the collection is populated, we can query it by passing in other images that contain faces. Using the SearchFacesByImage API, you need to provide at least two parameters: the name of the collection to query, and the reference to the image to analyze. You can provide a reference to the Amazon S3 bucket name and object key of the image, or provide the image itself as a bytestream.
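For the Amazon S3 variant, a call can be sketched as follows. The helper name and the default values for the threshold and the maximum number of faces are our own choices; `FaceMatchThreshold` and `MaxFaces` are optional parameters of the SearchFacesByImage operation.

```python
def search_faces_in_s3_image(rekognition_client, bucket, key,
                             collection_id="family_collection",
                             threshold=80.0, max_faces=5):
    """Search the collection using an image stored in Amazon S3.

    Returns a list of (FaceId, similarity) tuples.
    """
    response = rekognition_client.search_faces_by_image(
        CollectionId=collection_id,
        # Reference the object instead of sending the image bytes.
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        FaceMatchThreshold=threshold,
        MaxFaces=max_faces)
    return [(match["Face"]["FaceId"], match["Similarity"])
            for match in response["FaceMatches"]]

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("rekognition", region_name="eu-west-1")
#   print(search_faces_in_s3_image(client, "bucket-name", "group1.jpeg"))
```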

In the following example, I show how to submit the image as a bytestream. In response, Amazon Rekognition returns a JSON object containing the FaceIds of the matches. This object includes a confidence score and the coordinates of the face within the image, among other metadata as documented.

 

import boto3
import io
from PIL import Image

rekognition = boto3.client('rekognition', region_name='eu-west-1')
dynamodb = boto3.client('dynamodb', region_name='eu-west-1')
    
image = Image.open("group1.jpeg")
stream = io.BytesIO()
image.save(stream,format="JPEG")
image_binary = stream.getvalue()


response = rekognition.search_faces_by_image(
        CollectionId='family_collection',
        Image={'Bytes':image_binary}                                       
        )
    
for match in response['FaceMatches']:
    print (match['Face']['FaceId'],match['Face']['Confidence'])
        
    face = dynamodb.get_item(
        TableName='family_collection',  
        Key={'RekognitionId': {'S': match['Face']['FaceId']}}
        )
    
    if 'Item' in face:
        print (face['Item']['FullName']['S'])
    else:
        print ('no match found in person lookup')

We could extend this further by providing a secondary match logic.

For example, if we have received three matches for Person A and a confidence score of more than 90 percent each, and one match for Person B with a confidence of 85 percent or below, we can reasonably assume that the person was indeed Person A. Based on your business requirements, you could also return “fuzzy” matches like this to a human process step for validation.
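The secondary match logic described above can be sketched as a simple vote. This is one possible interpretation, not a fixed recipe: the function name pick_person, the 90 percent threshold, and the minimum of two votes are assumptions to adapt to your own requirements. The input is a list of (name, confidence) pairs, for example built from the DynamoDB lookups in the previous code.

```python
from collections import defaultdict


def pick_person(matches, min_confidence=90.0, min_votes=2):
    """Return the name backed by the most high-confidence matches, or None.

    `matches` is a list of (full_name, confidence) tuples. The thresholds
    are illustrative assumptions.
    """
    votes = defaultdict(int)
    for name, confidence in matches:
        if confidence >= min_confidence:
            votes[name] += 1
    if not votes:
        return None
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    best_name, best_votes = ranked[0]
    # A tie for first place is ambiguous and is better left to a human
    if len(ranked) > 1 and ranked[1][1] == best_votes:
        return None
    return best_name if best_votes >= min_votes else None


matches = [('Person A', 97.2), ('Person A', 94.0),
           ('Person A', 91.5), ('Person B', 85.0)]
print(pick_person(matches))
```

A return value of None marks a "fuzzy" result that could be routed to a human validation step.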

Enhancements

For the analysis part of the process, you need to understand that Amazon Rekognition tries to find a match for only the most prominent face within an image. If your image contains multiple people, you first need to use the DetectFaces API to retrieve the individual bounding boxes of the faces within the image. You can then use the retrieved x,y coordinates to cut out the faces from the image and submit them individually to the SearchFacesByImage API.

In the following code example, notice that I’m extending the boundaries of the face boxes by moving them 10 percent in either direction. I do this to simplify the definition of the box to crop. Ideally, I would have to adjust the location and orientation of the box to reflect any tilting of the head. Instead, I simply extend the size of the crop-out. This works because Amazon Rekognition tries to detect a person for only the largest face within an image. Slightly extending the size of the crop-out by up to 50 percent won’t unveil another face that is larger than the original detected face.

from __future__ import print_function

import boto3
import io
from PIL import Image
from pprint import pprint

rekognition = boto3.client('rekognition', region_name='eu-west-1')
dynamodb = boto3.client('dynamodb', region_name='eu-west-1')
    
image = Image.open("group1.jpeg")
stream = io.BytesIO()
image.save(stream,format="JPEG")
image_binary = stream.getvalue()

response = rekognition.detect_faces(
    Image={'Bytes':image_binary}                                        
        )
    
all_faces=response['FaceDetails']
    
# Initialize list object
boxes = []

# Get image dimensions
image_width = image.size[0]
image_height = image.size[1]
   
# Crop face from image
for face in all_faces:
    box=face['BoundingBox']
    x1 = int(box['Left'] * image_width) * 0.9
    y1 = int(box['Top'] * image_height) * 0.9
    x2 = int(box['Left'] * image_width + box['Width'] * image_width) * 1.10
    y2 = int(box['Top'] * image_height + box['Height']  * image_height) * 1.10
    image_crop = image.crop((x1,y1,x2,y2))
    
    stream = io.BytesIO()
    image_crop.save(stream,format="JPEG")
    image_crop_binary = stream.getvalue()

    # Submit individually cropped image to Amazon Rekognition
    response = rekognition.search_faces_by_image(
            CollectionId='family_collection',
            Image={'Bytes':image_crop_binary}                                       
            )
    
    if len(response['FaceMatches']) > 0:
        # Return results
        print ('Coordinates ', box)
        for match in response['FaceMatches']:
                
            face = dynamodb.get_item(
                TableName='family_collection',               
                Key={'RekognitionId': {'S': match['Face']['FaceId']}}
                )
    
            if 'Item' in face:
                person = face['Item']['FullName']['S']
            else:
                person = 'no match found'
            
            print (match['Face']['FaceId'],match['Face']['Confidence'],person)
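The code above scales the absolute corner coordinates by 0.9 and 1.10, so the amount of padding depends on where the face sits in the image. The following is a minimal sketch of an alternative, not the approach used above: it pads relative to the size of the face and clamps the box to the image. The helper name padded_crop_box and the 10 percent default are assumptions.

```python
def padded_crop_box(bounding_box, image_width, image_height, padding=0.10):
    """Convert a relative Rekognition BoundingBox into a padded pixel box.

    The box is grown by `padding` times the face width and height on each
    side and clamped to the image, so the crop never leaves the picture.
    """
    left = bounding_box['Left'] * image_width
    top = bounding_box['Top'] * image_height
    width = bounding_box['Width'] * image_width
    height = bounding_box['Height'] * image_height

    x1 = max(0, int(left - padding * width))
    y1 = max(0, int(top - padding * height))
    x2 = min(image_width, int(left + width + padding * width))
    y2 = min(image_height, int(top + height + padding * height))
    return (x1, y1, x2, y2)


box = {'Left': 0.5, 'Top': 0.25, 'Width': 0.25, 'Height': 0.5}
print(padded_crop_box(box, 1000, 800))
```

The returned tuple can be passed directly to image.crop in place of (x1,y1,x2,y2).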

In addition to the manual approach I described above, you can also create a Lambda function that contains the face detection code. Instead of merely displaying the detected faces on the console, you would write the names of the persons that are detected to a database, like DynamoDB. This allows you to detect faces within a large collection of images at scale.

Conclusion

In this blog post, I provided you with an example you can use to design and build your own face recognition service. I have given you pointers that enable you to decide on your strategy for using collections, depending on your use case. I also provided guidance on how to integrate Amazon Rekognition with other AWS services such as AWS Lambda, Amazon S3, Amazon DynamoDB, or IAM. With this in hand, you can build your own solution that detects, indexes, and recognizes faces, whether that’s from a collection of family photos, a large image archive, or a simple access control use case.


Additional Reading

Extend your knowledge even further. Learn how to find distinct people in a video with Amazon Rekognition.


About the Author

Christian Petters is a Solutions Architect for Amazon Web Services in Germany. He has a background in the design, implementation, and operation of large-scale web and groupware applications. At AWS, he helps customers assemble the right building blocks to address their business challenges.

 

 

 

The post Build Your Own Face Recognition Service Using Amazon Rekognition appeared first on Artificial Intelligence Solutions.

Estimating the Location of Images Using MXNet and Multimedia Commons Dataset on AWS EC2

This is a guest post by Jaeyoung Choi of the International Computer Science Institute and Kevin Li of the University of California, Berkeley. This project demonstrates how academic researchers can leverage our AWS Cloud Credits for Research Program to support their scientific breakthroughs.

Modern mobile devices can automatically assign geo-coordinates to images when you take pictures with them. However, most images on the web still lack this location metadata. Image geo-location is the process of estimating the location of an image and applying a location label. Depending on the size of your dataset and how you pose the problem, the assigned location label can range from the name of a building or landmark to an actual geo-coordinate (latitude, longitude).

In this post, we show how to use a pre-trained model created with Apache MXNet to geographically categorize images. We use images from a dataset that contains millions of Flickr images taken around the world. We also show how to map the result to visualize it.

Our approach

The approaches to image geo-location can be divided into two categories: image-retrieval-based search approaches and classification-based approaches. (This blog post compares two state-of-the-art approaches in each category.)

Recent work by Weyand et al. posed image geo-location as a classification problem. In this approach, the authors subdivided the surface of the earth into thousands of geographic cells and trained a deep neural network with geo-tagged images. For a less technical description of their experiment, see this article.

Because the authors did not release their training data or their trained model, PlaNet, to the public, we decided to train our own image geo-locator. Our setup for training the model is inspired by the approach described in Weyand et al., but we changed several settings.

We trained our model, LocationNet, using MXNet on a single p2.16xlarge instance with geo-tagged images from the AWS Multimedia Commons dataset.

We split training, validation, and test images so that images uploaded by the same person do not appear in multiple sets. We used Google’s S2 Geometry Library to create classes with the training data. The model converged after 12 epochs, which took about 9 days with the p2.16xlarge instance. A full tutorial with a Jupyter notebook is available on GitHub.
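One common way to keep all images from one uploader in the same set is to derive the split from a hash of the uploader ID. The following is a minimal sketch of that idea; it is not necessarily how the LocationNet splits were produced, and the function name assign_split, the split fractions, and the sample IDs are assumptions.

```python
import hashlib


def assign_split(uploader_id, val_fraction=0.05, test_fraction=0.05):
    """Map an uploader ID to 'train', 'val', or 'test' deterministically.

    Hashing the uploader (not the image) keeps all of a person's images in
    one set. The fractions are illustrative, not the LocationNet values.
    """
    digest = hashlib.sha256(uploader_id.encode('utf-8')).hexdigest()
    bucket = (int(digest, 16) % 10000) / 10000.0  # pseudo-uniform value in [0, 1)
    if bucket < test_fraction:
        return 'test'
    if bucket < test_fraction + val_fraction:
        return 'val'
    return 'train'


# Hypothetical (image_id, uploader_id) pairs
images = [('img1', 'user_a'), ('img2', 'user_a'), ('img3', 'user_b')]
splits = {image_id: assign_split(uploader) for image_id, uploader in images}
print(splits)
```

Because the assignment depends only on the uploader ID, it stays stable when new images are added later.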

The following table compares the setups used to train and test LocationNet and PlaNet.

Setting | LocationNet | PlaNet
Dataset source | Multimedia Commons | Images crawled from the web
Training set | 33.9 million | 91 million
Validation | 1.8 million | 34 million
S2 Cell Partitioning | t1=5000, t2=500 → 15,527 cells | t1=10,000, t2=50 → 26,263 cells
Model | ResNet-101 | GoogleNet
Optimization | SGD with Momentum and LR Schedule | Adagrad
Training time | 9 days on 16 NVIDIA K80 GPUs (p2.16xlarge EC2 instance), 12 epochs | 2.5 months on 200 CPU cores
Framework | MXNet | DistBelief
Test set | Placing Task 2016 Test Set (1.5 million Flickr images) | 2.3 M geo-tagged Flickr images

At inference time, LocationNet outputs a probability distribution over the geographic cells. The center-of-mass geo-coordinate of the images in the cell with the highest likelihood is assigned as the geo-coordinate of the query image.
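The inference rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names are hypothetical, the toy data is made up, and the center of mass is taken as a plain arithmetic mean of latitude and longitude, which is only reasonable for small cells away from the poles and the antimeridian.

```python
def cell_centroids(train_points, num_cells):
    """Average the (lat, lng) of the training images in each cell.

    `train_points` is a list of (cell_index, lat, lng) tuples.
    """
    sums = [[0.0, 0.0, 0] for _ in range(num_cells)]
    for cell, lat, lng in train_points:
        sums[cell][0] += lat
        sums[cell][1] += lng
        sums[cell][2] += 1
    return [(s[0] / s[2], s[1] / s[2]) if s[2] else None for s in sums]


def locate(probabilities, centroids):
    """Return the centroid of the cell with the highest probability."""
    best_cell = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return centroids[best_cell]


# Toy data: two cells, three training images
train_points = [(0, 35.0, 139.0), (0, 36.0, 140.0), (1, 48.0, 2.0)]
centroids = cell_centroids(train_points, num_cells=2)
print(locate([0.8, 0.2], centroids))
```

In the pretrained model, these per-cell coordinates are already precomputed and shipped in the grids.txt file that is loaded later in this post.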

LocationNet is shared publicly in the MXNet Model Zoo.

Downloading LocationNet

Now download LocationNet, the pretrained model. LocationNet has been trained on the subset of geo-tagged images in the AWS Multimedia Commons dataset. The Multimedia Commons dataset contains more than 39 million images and 15 thousand geographic cells (classes).

LocationNet has two parts, a JSON file containing the model definition and a binary file containing the parameters. We load necessary packages and download the files from S3.

import os
import urllib
import mxnet as mx
import logging
import numpy as np
from skimage import io, transform
from collections import namedtuple
from math import radians, sin, cos, sqrt, asin

path = 'http://ift.tt/2usC0h7'
model_path = 'models/'
if not os.path.exists(model_path):
    os.mkdir(model_path)
urllib.urlretrieve(path+'RN101-5k500-symbol.json', model_path+'RN101-5k500-symbol.json')
urllib.urlretrieve(path+'RN101-5k500-0012.params', model_path+'RN101-5k500-0012.params')

Then, load the downloaded model. If you don’t have a GPU available, replace mx.gpu() with mx.cpu():

# Load the pre-trained model
prefix = "models/RN101-5k500"
load_epoch = 12
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, load_epoch)
mod = mx.mod.Module(symbol=sym, context=mx.gpu())
mod.bind([('data', (1,3,224,224))], for_training=False)
mod.set_params(arg_params, aux_params, allow_missing=True)

The grids.txt file contains the geographic cells used for training the model.

The i-th line is the i-th class, and the columns are: S2 Cell Token, Latitude, and Longitude. We load the labels to a list named grids.

# Download and load grids file 
urllib.urlretrieve('http://ift.tt/2uJy7Qi','grids.txt')

# Load labels.
grids = []
with open('grids.txt', 'r') as f:
    for line in f:
        line = line.strip().split('\t')
        lat = float(line[1])
        lng = float(line[2])
        grids.append((lat, lng))


The model uses the haversine formula to measure the great-circle distance between points p1 and p2 in kilometers:

def distance(p1, p2):
    R = 6371 # Earth radius in km
    lat1, lng1, lat2, lng2 = map(radians, (p1[0], p1[1], p2[0], p2[1]))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat * 0.5) ** 2 + cos(lat1) * cos(lat2) * (sin(dlng * 0.5) ** 2)
    return 2 * R * asin(sqrt(a))

Before feeding the image to the deep learning network, the model preprocesses the image by cropping it and subtracting the mean:

# mean image for preprocessing
mean_rgb = np.array([123.68, 116.779, 103.939])
mean_rgb = mean_rgb.reshape((3, 1, 1))

def PreprocessImage(path, show_img=False):
    # load image.
    img = io.imread(path)
    # We crop image from center to get size 224x224.
    short_side = min(img.shape[:2])
    yy = int((img.shape[0] - short_side) / 2)
    xx = int((img.shape[1] - short_side) / 2)
    crop_img = img[yy : yy + short_side, xx : xx + short_side]
    resized_img = transform.resize(crop_img, (224,224))
    if show_img:
        io.imshow(resized_img)
    # convert to numpy.ndarray
    sample = np.asarray(resized_img) * 256
    # swap axes to make image from (224, 224, 3) to (3, 224, 224)
    sample = np.swapaxes(sample, 0, 2)
    sample = np.swapaxes(sample, 1, 2)
    # sub mean 
    normed_img = sample - mean_rgb
    normed_img = normed_img.reshape((1, 3, 224, 224))
    return [mx.nd.array(normed_img)]

Evaluating and comparing models

For evaluation, we use two datasets: the IM2GPS dataset and a test dataset of Flickr images that is used in MediaEval Placing 2016 Benchmark.

Results for the IM2GPS test set

The following values indicate the percentage of images in the IM2GPS test set that were correctly located within each distance from the actual location.

Method 1km 25km 200km 750km 2500km
PlaNet 8.4% 24.5% 37.6% 53.6% 71.3%
LocationNet 16.8% 39.2% 48.9% 67.9% 82.2%
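Percentages like the ones in this table can be computed from per-image distance errors. The following is a minimal sketch, assuming lists of predicted and actual (latitude, longitude) pairs. The function names are hypothetical, the haversine helper repeats the formula used earlier so that the block runs on its own, and the sample coordinates are illustrative and not taken from either test set.

```python
from math import radians, sin, cos, sqrt, asin


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (p1[0], p1[1], p2[0], p2[1]))
    a = (sin((lat2 - lat1) * 0.5) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) * 0.5) ** 2)
    return 2 * 6371 * asin(sqrt(a))


def accuracy_at(predicted, actual, thresholds_km=(1, 25, 200, 750, 2500)):
    """Percentage of predictions that fall within each distance threshold."""
    errors = [haversine_km(p, a) for p, a in zip(predicted, actual)]
    return {t: 100.0 * sum(e <= t for e in errors) / len(errors)
            for t in thresholds_km}


# Illustrative example: one prediction is about 1.5 km off, the other is on
# the wrong continent
predicted = [(35.6599, 139.7289), (48.8584, 2.2945)]
actual = [(35.6586, 139.7454), (40.7484, -73.9857)]
print(accuracy_at(predicted, actual))
```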

Results for Flickr images

These results are not directly comparable because the test set images used in PlaNet have not been publicly released. The values indicate the percentage of images in the test set that were correctly located within each distance from the actual location.

Method 1km 25km 200km 750km 2500km
PlaNet 3.6% 10.1% 16.0% 28.4% 48.0%
LocationNet 6.2% 13.5% 20.8% 35.6% 55.2%

By visually inspecting the geo-located images, we can see that the model does well with landmark locations, but it is also capable of correctly geo-locating non-landmark scenes.

Estimating the geo-location of an image using a URL

Now let’s try to geo-locate an image on the web using a URL.

Batch = namedtuple('Batch', ['data'])
def predict(imgurl, prefix='images/'):
    download_url(imgurl, prefix)
    imgname = imgurl.split('/')[-1]
    batch = PreprocessImage(prefix + imgname, True)
    #predict and show top 5 results
    mod.forward(Batch(batch), is_train=False)
    prob = mod.get_outputs()[0].asnumpy()[0]
    pred = np.argsort(prob)[::-1]
    result = list()
    for i in range(5):
        pred_loc = grids[int(pred[i])]
        res = (i+1, prob[pred[i]], pred_loc)
        print('rank=%d, prob=%f, lat=%s, lng=%s' 
              % (i+1, prob[pred[i]], pred_loc[0], pred_loc[1]))
        result.append(res[2])
    return result    

def download_url(imgurl, img_directory):
    if not os.path.exists(img_directory):
        os.mkdir(img_directory)
    imgname = imgurl.split('/')[-1]
    filepath = os.path.join(img_directory, imgname)
    if not os.path.exists(filepath):
        filepath, _ = urllib.urlretrieve(imgurl, filepath)
        statinfo = os.stat(filepath)
        print('Successfully downloaded', imgname, statinfo.st_size, 'bytes.')
    return filepath

Let’s see how our model does with an image of Tokyo Tower. The following code downloads the image from the URL and outputs the model’s location prediction.

#download and predict geo-location of an image of Tokyo Tower
url = 'http://ift.tt/2uJyZo0'
result = predict(url)

The output lists the top five results, each with its confidence score (prob) and geo-coordinate:

rank=1, prob=0.139923, lat=35.6599344486, lng=139.728919109
rank=2, prob=0.095210, lat=35.6546613641, lng=139.745685815
rank=3, prob=0.042224, lat=35.7098435803, lng=139.810458528
rank=4, prob=0.032602, lat=35.6641725688, lng=139.746648114
rank=5, prob=0.023119, lat=35.6901996892, lng=139.692857396

It is hard to tell the quality of the geo-location output with just the raw latitude and longitude values. Let’s map the output to visualize the results.

Visualizing results using Google Maps on the Jupyter notebook

To visualize the results of the prediction, we use Google Maps in the Jupyter notebook. This allows you to see if the prediction makes sense. We use a plugin called gmaps, which allows the use of Google Maps in the Jupyter Notebook. To install gmaps, follow the installation instructions on the gmaps GitHub page.

Visualizing the result with gmaps takes only a few lines of code. In your notebook, type the following:

import gmaps

gmaps.configure(api_key="") # Fill in with your API key 

fig = gmaps.figure()

for i in range(len(result)):
    marker = gmaps.marker_layer([result[i]], label=str(i+1))
    fig.add_layer(marker)
fig

The top-1 geo-location estimation result is, indeed, right on the spot where Tokyo Tower is.

Now, try to geo-locate images of your choice!

Acknowledgements

Training LocationNet on AWS has been graciously supported by AWS Programs for Research and Education. We also thank the AWS Public Dataset program for hosting the Multimedia Commons dataset for public use. Our work is also partially supported by a collaborative LDRD led by Lawrence Livermore National Laboratory (U.S. Dept. of Energy contract DE-AC52-07NA27344).


Additional Reading

Learn more about AWS Cloud Credits for Research! Read about Ottertune and how to tune your DBMS automatically with Machine Learning.

 

The post Estimating the Location of Images Using MXNet and Multimedia Commons Dataset on AWS EC2 appeared first on Artificial Intelligence Solutions.

Analyze Emotion in Video Frame Samples Using Amazon Rekognition on AWS

This guest post is by AWS Community Hero Cyrus Wong. Cyrus is a Data Scientist at the Hong Kong Vocational Education (Lee Wai Lee) Cloud Innovation Centre. He has achieved all 7 AWS Certifications and enjoys sharing his AWS knowledge with others through open-source projects, blog posts, and events.

HowWhoFeelInVideo is an application that analyzes faces detected in sampled video clips to interpret the emotion or mood of the subjects. It identifies faces, analyzes the emotions displayed on those faces, generates corresponding Emoji overlays on the video, and logs emotion data. The application accomplishes all of this within a serverless architecture using Amazon Rekognition, AWS Lambda, AWS Step Functions, and other AWS services.

HowWhoFeelInVideo was developed as part of a research project at the Hong Kong Vocational Education (Lee Wai Lee) Cloud Innovation Centre.  The project is focused on childcare, elder care, and community services. However, emotion analysis can be used in many areas, including rehabilitative care, nursing care, and applied psychology. My initial focus has been on applying this technology to the classroom.

In this post, I explain how HowWhoFeelInVideo works and how to deploy and use it.

How it works

Teachers, such as myself, can use HowWhoFeelInVideo to get an overall measure of a student’s mood (for example, happy, calm, or confused) while taking attendance. The instructor can use this data to adjust his or her focus and approach to enhance the teaching experience. This research project is just beginning. I will update this post after I receive additional results.

To use HowWhoFeelInVideo, a teacher sets up a basic classroom camera to take each student’s attendance using face identification. The camera also captures how students feel during class. Teachers can also use HowWhoFeelInVideo to prevent students from falsely reporting attendance.

Architecture and design

HowWhoFeelInVideo is a serverless application built using AWS Lambda functions. Five of the Lambda functions are included in the HowWhoFeelInVideo state machine. AWS Step Functions streamlines coordinating the components of distributed applications and microservices using visual workflows. This simplifies building and running multi-step applications.

The HowWhoFeelInVideo state machine starts with the startFaceDetectionWorkFlowLambda function, which is triggered by an Amazon S3 PUT object event. startFaceDetectionWorkFlowLambda passes the following information into the execution:

{
    "bucket": "howwhofeelinvideo",
    "key": "Test2.mp4"
}

With Step Functions, you can use meaningful names, making it easy to understand workflows. They also let you monitor the processing pipeline from the AWS console.

The HowWhoFeelInVideo state machine is available in the us-east-1 AWS Region. The video processing tasks are implemented using FFmpeg.

Behind the scenes

Before you start using HowWhoFeelInVideo, you need to understand how it works and a few general principles.

When you need to make use of other pre-built programs such as FFmpeg, you can run another program or start a new process in Lambda:

  1. Copy the program to the /tmp directory.
  2. Call the shell and use chmod to give the program execute permission.
  3. Call the shell to run the program.

Lambda saves files for data processing in the /tmp directory, which is limited to 512 MB. To improve performance, Lambda reuses containers. This means that files left in the /tmp directory by a previous invocation might persist and take up space during the next Lambda call. Therefore, you should always remove old files from /tmp, either at the beginning or the end of each step.
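The three steps and the cleanup advice above can be sketched as follows. This is an illustration in Python under stated assumptions, not the project's actual code: the function names are hypothetical, the small shell script stands in for a bundled binary such as FFmpeg, and the demonstration runs against temporary directories instead of the real /tmp.

```python
import os
import shutil
import stat
import subprocess
import tempfile


def stage_program(source_path, work_dir='/tmp'):
    """Steps 1 and 2: copy the program into work_dir and mark it executable."""
    target = os.path.join(work_dir, os.path.basename(source_path))
    shutil.copyfile(source_path, target)
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)  # same as chmod u+x
    return target


def run_program(target, args):
    """Step 3: run the staged program and return its standard output."""
    return subprocess.check_output([target] + list(args))


def clean_dir(work_dir='/tmp'):
    """Delete everything inside work_dir so a reused container starts empty."""
    for name in os.listdir(work_dir):
        path = os.path.join(work_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)


# Dry run against throwaway directories instead of the real /tmp
bundle_dir = tempfile.mkdtemp()
scratch_dir = tempfile.mkdtemp()
script = os.path.join(bundle_dir, 'tool.sh')  # hypothetical stand-in for FFmpeg
with open(script, 'w') as handle:
    handle.write('#!/bin/sh\necho ok\n')

staged = stage_program(script, scratch_dir)
is_executable = bool(os.stat(staged).st_mode & stat.S_IXUSR)
clean_dir(scratch_dir)
leftovers = os.listdir(scratch_dir)
print(is_executable, leftovers)
```

In a Lambda handler, calling clean_dir at the start of each step is the safer choice, because a previous invocation might have failed before it could clean up.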

Face analysis is performed by the ProcessImage Lambda function, which is written in Scala. The ProcessImage function processes only one image at a time. It performs the following tasks:

  1. Downloads an image from an S3 bucket
  2. Calls Amazon Rekognition to detect faces and emotion (with the detectFaces operation)
  3. Crops the face from the image using the bounding box provided by the detectFaces operation
  4. Attempts to identify the owner by searching each face (using the searchFacesByImage operation) in the specified face collection
  5. Joins the results of emotion detection and face identification
  6. Creates an Emoji face overlay image and emotion report records
  7. Uploads the Emoji face overlay image and emotion report records to an S3 bucket

Because AWS Lambda charges for memory usage per 100 ms and Amazon Rekognition charges by the number of requests, the system is designed to run at maximum concurrency. I pay the same price whether I process all screen capture images at once or one by one!

The Cascades Face Detection step asynchronously invokes the ProcessImage Lambda function for each screen capture image nearly in parallel. Each ProcessImage function calls Amazon Rekognition for each face detected.

The following is a parallel map function which invokes the ProcessImage function for each image frame.

// Invoke the ProcessImage Lambda function for a single image key
let invokeLambda = (key) => new Promise((resolve, reject) => {
    let data = JSON.stringify({bucket: bucket, key: prefix + "/" + key});
    let params = {
        FunctionName: process.env['ProcessImage'], /* required */
        Payload: data /* required */
    };
    lambda.invoke(params, (err, data) => {
        if (err) reject(err);   // an error occurred
        else     resolve(data); // successful response
    });
});

// Fan out one invocation per image and wait for all of them
let invokeLambdaPromises = keys.map(invokeLambda);
Promise.all(invokeLambdaPromises).then(() => {
    let pngKey = keys.map(key => key.split(".")[0] + ".png");
    let data = {bucket: bucket, prefix: prefix, keys: pngKey};
    console.log("invokeLambdaPromises complete!");
    callback(null, data);
}).catch(err => {
    console.log("invokeLambdaPromises failed!");
    callback(err);
});
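For readers who work in Python, the same fan-out pattern can be sketched with a thread pool. This is an illustrative equivalent, not part of HowWhoFeelInVideo: the function name process_all_frames, the worker count, and the sample prefix and keys are assumptions, and the invoke argument stands in for a real call such as a boto3 Lambda client's invoke method.

```python
from concurrent.futures import ThreadPoolExecutor
import json


def process_all_frames(invoke, bucket, prefix, keys, max_workers=16):
    """Call `invoke(payload)` once per frame in parallel and wait for all."""
    payloads = [json.dumps({'bucket': bucket, 'key': prefix + '/' + key})
                for key in keys]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for completion and re-raises the first failure, if any
        list(pool.map(invoke, payloads))
    png_keys = [key.split('.')[0] + '.png' for key in keys]
    return {'bucket': bucket, 'prefix': prefix, 'keys': png_keys}


# Stand-in for a real invocation such as
#   lambda_client.invoke(FunctionName=..., Payload=payload)
seen = []
result = process_all_frames(seen.append, 'howwhofeelinvideo', 'Test2',
                            ['0001.jpg', '0002.jpg'])
print(result)
```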

The following is a parallel map function that gets the face ID in Scala:

//Parallel the search request.
val faceMatchAndBoundBoxAndEmotion = faceImagesAndBoundBoxAndEmotion.par.map(f => {
  searchFacesByImage(f._1) match {
    case Some(face) => {
      val id = face.getFaceMatches.asScala.headOption match {
        case Some(a) => a.getFace.getExternalImageId
        case None => "?????"
      }
      (id, f._2, f._3)
    }
    case None => ("????", f._2, f._3)
  }
})
faceMatchAndBoundBoxAndEmotion.seq

The following service map shows the dependency trees with trace data that I can use to drill into specific services or issues. This provides a view of connections between services in your application and aggregated data for each service, including average latency and failure rates.

The following is a latency distribution histogram for an Amazon Rekognition API call:

Latency is the amount of time between the start of a request and when it completes. A histogram shows a distribution of latencies. This latency distribution histogram shows duration on the x-axis, and the percentage of requests that match each duration on the y-axis.

I set the maximum execution of the ProcessImage function to 1.5 minutes and added a 10-second wait step in the state machine to ensure that all images and emotion records are ready in Amazon S3.

The following Lambda cascading timeline shows how processing operates in a highly parallel manner:

The result

The result includes a single output image:

An output record from a single image:

[{"seq":"test5/0036","id":"????","happy":11.956384658813477,"sad":0.0,"angry":0.0,"confused":26.754457473754883,"disgusted":0.0,"surprised":16.45158576965332,"calm":0.0,"unknown":0.0},{"seq":"test5/0036","id":"2astudent21","happy":40.610809326171875,"sad":3.8441836833953857,"angry":0.0,"confused":11.73412799835205,"disgusted":0.0,"surprised":0.0,"calm":0.0,"unknown":0.0},{"seq":"test5/0036","id":"????","happy":97.30420684814453,"sad":19.768024444580078,"angry":0.0,"confused":0.0,"disgusted":0.0,"surprised":0.0,"calm":0.7546186447143555,"unknown":0.0}]

An output report for all images in .csv format:

Note:

I don’t index images of my students’ faces within the video. Instead, they are each allocated an unknown faceId.

With this report, you can easily aggregate data on overall student satisfaction and determine how each individual feels throughout the course or the event. For health research, we plan to objectively record emotional feedback during class for Special Education Needs (SEN) students when we use a different teaching method. For a class of non-SEN students, we import the CSV report into a database and aggregate it with the following simple SQL statement:

SELECT Report.id AS Student, Count(Report.seq) AS Attended, Sum(Report.happy) AS SumOfhappy, Sum(Report.sad) AS SumOfsad, Sum(Report.angry) AS SumOfangry, Sum(Report.confused) AS SumOfconfused, Sum(Report.disgusted) AS SumOfdisgusted, Sum(Report.surprised) AS SumOfsurprised, Sum(Report.calm) AS SumOfcalm, Sum(Report.unknown) AS SumOfunknown
FROM Report GROUP BY Report.id;
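To try the aggregation without setting up a database server, the query can be run against an in-memory SQLite table. The following is a minimal sketch, assuming a table named Report as in the statement above; it loads the three sample records shown earlier, trimmed to two emotion columns for brevity, and the choice of SQLite is an assumption rather than the database used in the project.

```python
import json
import sqlite3

# Three records copied from the sample output above, trimmed to a few columns
records = json.loads('''[
  {"seq": "test5/0036", "id": "????", "happy": 11.956384658813477, "confused": 26.754457473754883},
  {"seq": "test5/0036", "id": "2astudent21", "happy": 40.610809326171875, "confused": 11.73412799835205},
  {"seq": "test5/0036", "id": "????", "happy": 97.30420684814453, "confused": 0.0}
]''')

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Report (seq TEXT, id TEXT, happy REAL, confused REAL)')
conn.executemany(
    'INSERT INTO Report VALUES (:seq, :id, :happy, :confused)', records)

rows = conn.execute(
    'SELECT id AS Student, COUNT(seq) AS Attended, '
    'SUM(happy) AS SumOfhappy, SUM(confused) AS SumOfconfused '
    'FROM Report GROUP BY id ORDER BY id').fetchall()
for row in rows:
    print(row)
```

Note that unidentified faces share the same placeholder id, so their rows are summed together in the result.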

Demo Video Output

The following video is my TV interview with four of my students about Hong Kong Open Data and it has been processed using HowWhoFeelInVideo.

The step that extracts the video to images must complete within the maximum Lambda execution time of 5 minutes, so you cannot directly process long-running video. However, it is easy to create fragmented MP4 files with Amazon Elastic Transcoder and process the analysis over the MP4 fragments.

Overall AWS X-Ray Service Map

Source code for HowWhoFeelInVideo is available in GitHub.

Deploying HowWhoFeelInVideo

Deployment is very simple. I created an AWS CloudFormation template with AWS Serverless Application Model (AWS SAM). AWS SAM is a specification for describing Lambda-based applications. It offers a syntax designed specifically for expressing serverless resources. To deploy the application, perform the following steps:

  1. In Amazon Rekognition, create a face collection named student.
  2. Use the AWS CLI to store faces.
  3. Create an S3 source bucket in the us-east-1 AWS Region.
  4. Download the three files in the Deployment folder on GitHub.
  5. Upload the two source packages that you get from the Deployment folder, FaceAnalysis-assembly-1.0.jar and ProcessVideoLambda_latest.zip, into the S3 source bucket.
  6. In the AWS CloudFormation console, choose Create Stack.
  7. Select Upload a template to Amazon S3, choose HowWhoFeelInVideo.yaml, then choose Next.
  8. Specify the following parameters, and choose Next.
    1. Stack name: howwhofeelinvideo (A unique name for the stack in your AWS Region.)
    2. CollectionId: The name of the indexed face collection that you created in Step 1: student
    3. FaceMatchThreshold: Type 70. The face match threshold ranges from 0 to 100. It specifies the minimum confidence in the face match required to consider it a match.
    4. PackageBucket: The name of the S3 source bucket that you created in Step 3.
    5. VideoBucketName: The name of the bucket that you want to create. This bucket starts the workflow for .mp4 and .mov files. The bucket name must be unique. You cannot use the bucket name used in the following screenshot. When you delete the AWS CloudFormation stack, this bucket remains.
  9. On the Options page, choose Next.
  10. Select all acknowledgment boxes, and choose Create Change Set.
  11. When the change set has been created, choose Execute.
  12. Wait while the AWS CloudFormation stack is created.

Try your deployment

  1. Sign in to your AWS account and open the Amazon S3 console.
  2. In the S3 console, upload a short video into the video bucket. If you are not familiar with the Amazon S3 console, follow this tutorial: How Do I Upload Files and Folders to an S3 Bucket?
  3. In a new browser window, open the Step Functions console.
  4. When you see the task running, choose State Machines. You might need to refresh the browser.
  5. Select the execution instance of the state machine that is running. You will see an animation of the process.
  6. When the process has completed, refresh the Amazon S3 console. A new folder appears.
  7. Choose the new folder.
  8. In the Search box, type video, then open or download the video file.
  9. To get the report, in the Search box, type result, and open or download the report file.

Conclusion

HowWhoFeelInVideo can help us understand the emotions of all of the people captured in a video. It has a variety of applications, including education, training, rehabilitative care, and customer interactions. Deployment is simple with an AWS CloudFormation template. Just take a video with your smartphone and upload it to an S3 bucket. In a few minutes, you’ll get the emotion analysis report!

This project has been developed in collaboration with four of my students from the IT114115 Higher Diploma in Cloud and Data Centre Administration: Ng Ka Yin, Lai Kam To, Karlos Lam, and Pang Chin Wing. Also, thanks to the AWS Academy curriculum, which helps my students learn how to use AWS services!

 


Additional Reading

Learn how to create a serverless solution for video frame analysis and alerting with Amazon Rekognition.

 

 

 

 

The post Analyze Emotion in Video Frame Samples Using Amazon Rekognition on AWS appeared first on Artificial Intelligence Solutions.

Exploiting the Unique Features of the Apache MXNet Deep Learning Framework with a Cheat Sheet

Apache MXNet is a full-featured, highly scalable deep learning framework that supports creating and training state-of-the-art deep learning models. With it, you can create convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and others. It supports a variety of languages, including, but not limited to, Python, Scala, R, and Julia.

In this post, we showcase some unique features that make MXNet a developer friendly framework in the AWS Cloud. For developers who prefer symbolic expression, we also provide a cheat sheet for coding neural networks with MXNet in Python. The cheat sheet simplifies onboarding to MXNet. It’s also a handy reference for developers who already use the framework.

Multi-GPU support in a single line of code

The ability to run on multiple GPUs is a core part of the MXNet architecture. All you need to do is pass a list of devices that you want to train the model on. By default, MXNet uses data parallelism to partition the workload over multiple GPUs. For example, if you have 3 GPUs, each one receives a copy of the complete model and trains it on one-third of each training data batch.

import mxnet as mx

# Single GPU
module = mx.module.Module(context=mx.gpu(0))

# Train on multiple GPUs (N is the number of GPUs to use)
module = mx.module.Module(context=[mx.gpu(i) for i in range(N)], ...)
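
To make the data-parallel split described above concrete, here is a minimal plain-Python sketch of how one batch could be divided across devices. It is an illustration only, not MXNet's internal implementation; the helper name split_batch is hypothetical, and MXNet performs the equivalent partitioning for you when you pass a list of contexts.

```python
def split_batch(batch, num_devices):
    """Split one batch into near-equal contiguous slices, one per device."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    base, extra = divmod(len(batch), num_devices)
    slices, start = [], 0
    for device in range(num_devices):
        # The first `extra` devices each take one additional example.
        size = base + (1 if device < extra else 0)
        slices.append(batch[start:start + size])
        start += size
    return slices


batch = list(range(12))          # a batch of 12 training examples
per_gpu = split_batch(batch, 3)  # 3 devices, as in the example above
for gpu_id, part in enumerate(per_gpu):
    print(f"gpu({gpu_id}) gets {len(part)} examples: {part}")
```

With 3 devices and a batch of 12, each device works on 4 examples, which matches the one-third split described above. Each device then computes gradients on its own slice using its own copy of the model.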

Training on multiple computers

MXNet is a distributed deep learning framework designed to simplify training on multiple GPUs on a single server or across servers. To train across servers, you need to install MXNet on all computers, ensure that they can communicate with each other over SSH, and then create a file that contains the server IPs.

$ cat hosts
192.30.0.172
192.30.0.171
$ python ../../tools/launch.py -n 2 --launcher ssh -H hosts python train_mnist.py --network lenet --kv-store dist_sync

MXNet uses a key-value store to synchronize gradients and parameters between machines, which is what makes distributed training possible. To use it, make sure that MXNet is compiled with USE_DIST_KVSTORE=1.
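
The idea behind synchronous gradient exchange through a key-value store can be sketched in a few lines of plain Python. This is a simplified toy under stated assumptions, not MXNet's KVStore: the class ToyKVStore, its methods, and the plain SGD update are all hypothetical stand-ins chosen for illustration, and everything runs in a single process.

```python
class ToyKVStore:
    """Illustrative stand-in for a synchronous key-value store (not MXNet's)."""

    def __init__(self, num_workers, learning_rate):
        self.num_workers = num_workers
        self.learning_rate = learning_rate
        self.params = {}    # key -> current parameter values
        self.pending = {}   # key -> gradients pushed so far in this step

    def init(self, key, values):
        self.params[key] = list(values)
        self.pending[key] = []

    def push(self, key, gradient):
        """A worker sends its gradient; update once every worker has pushed."""
        self.pending[key].append(list(gradient))
        if len(self.pending[key]) == self.num_workers:
            # Sum element-wise across workers, then take one SGD step.
            summed = [sum(column) for column in zip(*self.pending[key])]
            self.params[key] = [
                p - self.learning_rate * g
                for p, g in zip(self.params[key], summed)
            ]
            self.pending[key] = []

    def pull(self, key):
        """A worker fetches the latest parameter values."""
        return list(self.params[key])


store = ToyKVStore(num_workers=2, learning_rate=0.5)
store.init("weight", [1.0, 2.0])
store.push("weight", [0.5, 1.0])   # gradient from worker 0
store.push("weight", [1.5, 1.0])   # gradient from worker 1
print(store.pull("weight"))
```

The parameters change only after both workers have pushed, which is the "sync" part of a synchronous mode: every worker pulls the same updated values before starting the next batch.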

Custom data iterators and iterating over data stored in Amazon S3

In MXNet, data iterators are similar to Python iterator objects, except that they return a batch of data as a DataBatch object that contains “n” training examples along with corresponding labels. MXNet has prebuilt, efficient data iterators for common data types like NDArray and CSV. It also has a binary format for efficient I/O on distributed file systems, like HDFS. You can create custom data iterators by extending the mx.io.DataIter class. For information on how to implement this feature, see this tutorial.

Amazon Simple Storage Service (Amazon S3) is a popular choice for customers who need to store large amounts of data at very low cost. In MXNet, you can create iterators that reference the data stored in Amazon S3 in RecordIO, ImageRecordIO, CSV, or NDArray formats without needing to explicitly download the data to disk.

data_iter = mx.io.ImageRecordIter(     
     path_imgrec="s3://bucket-name/training-data/caltech_train.rec",
     data_shape=(3, 227, 227),
     batch_size=4,
     resize=256)


Visualizing neural nets

To enable you to visualize neural network architectures, MXNet is integrated with Graphviz. To generate a network visualization, you use the symbol that references the last layer of a defined network, along with the shape of the network as defined by its node_attrs attribute. The following example shows how to visualize the LeNet canonical CNN:

mx.viz.plot_network(symbol=lenet, shape=shape)

For detailed code and instructions on implementation, see this tutorial.

Profiler support

MXNet has a built-in profiler, which you enable by building MXNet with the USE_PROFILER=1 flag. This can help you profile execution times, layer by layer, in the network (at the symbol level). This feature complements general profiling tools, like nvprof and gprof, by summarizing at the operator level, instead of at the function, kernel, or instruction level. You can enable it for the entire Python program using an environment variable. Or, you can enable it for a subset of the program by integrating it into the code, as follows:

mx.profiler.profiler_set_config(mode='all', filename='output.json')     
mx.profiler.profiler_set_state('run')      
# Code to be profiled goes here...      
mx.profiler.profiler_set_state('stop')

You can load the profiler output into a browser, like Chrome, and view the profile by navigating to the browser’s tracing page (chrome://tracing in Chrome).

This screenshot shows the profile for training the MNIST dataset with the original LeNet architecture implemented in MXNet with profiler instrumentation.
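
If you prefer a quick text summary to the tracing view, a small script can total the time spent per operator. The sketch below assumes the profiler file is a JSON object in Chrome's Trace Event format, with a "traceEvents" list of begin ("B") and end ("E") events. Check your own output.json before relying on that assumption. The function name and the sample events are made up for illustration.

```python
import json
from collections import defaultdict


def total_time_per_operator(trace):
    """Sum durations per event name, in the trace's own time unit."""
    open_events = defaultdict(list)   # name -> stack of begin timestamps
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        name, phase = event.get("name"), event.get("ph")
        if phase == "B":
            open_events[name].append(event["ts"])
        elif phase == "E" and open_events[name]:
            # Pair this end event with the most recent unmatched begin.
            totals[name] += event["ts"] - open_events[name].pop()
        elif phase == "X":
            # Complete events carry their own duration.
            totals[name] += event.get("dur", 0)
    return dict(totals)


# Made-up sample events; a real file comes from the profiler.
sample = json.loads("""
{"traceEvents": [
  {"name": "Convolution", "ph": "B", "ts": 100},
  {"name": "Convolution", "ph": "E", "ts": 350},
  {"name": "Pooling", "ph": "B", "ts": 360},
  {"name": "Pooling", "ph": "E", "ts": 400},
  {"name": "Convolution", "ph": "B", "ts": 410},
  {"name": "Convolution", "ph": "E", "ts": 600}
]}
""")
for name, total in sorted(total_time_per_operator(sample).items()):
    print(name, total)
```

To apply it to a real run, load the file with json.load(open("output.json")) and pass the result to the function. The sketch pairs events by name only and ignores process and thread IDs, so treat the totals as a rough guide when operators run concurrently.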

Cheat Sheet

Now that you know about some of the unique features of MXNet, you probably can’t wait to get hands-on. This cheat sheet helps you get started building neural networks. It includes some common architectures for CNN, RNN/LSTM, linear regression, and logistic regression. Use it to learn how to create data iterators and Amazon S3 iterators, implement checkpointing, and save model files. It even includes examples of how to build a complete model architecture, and how to fine-tune a pretrained model.

Apache MXNet Cheat Sheet


To get started on deep learning with MXNet, see our tutorials.

The MXNet community is working on a dynamic, elegant, easy-to-use imperative interface called Gluon. Gluon will introduce new ways to build neural networks in MXNet. Stay tuned!


Additional Reading

Learn how to build a real-time object classification system with Apache MXNet on Raspberry Pi.


About the Author

Sunil Mallya is a Senior Solutions Architect in the AWS Deep Learning team. He helps our customers build machine learning and deep learning solutions to advance their businesses. In his spare time, he enjoys cooking, sailing, and building self-driving RC cars.

 

 

 


The post Exploiting the Unique Features of the Apache MXNet Deep Learning Framework with a Cheat Sheet appeared first on Artificial Intelligence Solutions.

