Discover Your Customers’ Deepest Feelings Using Microsoft Facial Recognition

Body language can speak volumes — and it’s often said that body language doesn’t lie.

So imagine what your customers’ facial expressions could tell you about how they feel about your products, services and brand experience.

And as luck would have it, artificial intelligence is stepping up to put that insight at your fingertips.

Rounding out a Clear View into Your Customer’s Mind

In my last post, we used Google Cloud to perform sentiment analysis on text — and in this post we’ll extract similar sentiment information from visual content.

More specifically, we’ll be using Microsoft’s Emotion API to classify facial expressions in photos and video based on a given set of emotions:

  • Happiness
  • Neutral
  • Sadness
  • Surprise
  • Disgust
  • Anger
  • Fear
  • Contempt

That gives you the power to further humanize your products and marketing with emotion recognition.

Reading Emotions Through Facial Recognition

You can use this tool in a wide variety of ways. For example, you can create systems that marketers and product developers can use to measure people’s reactions to a store display, a movie, a meal, you name it.

Or you can use it to add some emotional intelligence to an application — offering up different options based on emotions identified in a photo or video.

Ready to go?

Before we jump into it, let’s take a quick look at pricing:

Plan      Limits                          Price
Free      Images: 30,000 calls/mo
          Video (uploads): 300/mo
          Video (streaming): 3,000/mo
Basic     Images: 10 calls/sec            $0.10 per 1,000 calls
Standard  Images: 10 calls/sec            $0.25 per 1,000 calls
          Video (uploads): 3,000/mo       (free for video)
          Video (streaming): 30,000/mo
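
To get a feel for what the paid tiers might cost, here is a rough back-of-the-envelope sketch in Python 3. It only multiplies the per-1,000-call image rates listed above by a call count. The function name and the rates dictionary are my own, and the sketch ignores the free allowance and anything Microsoft may have changed since, so treat the result as an estimate rather than a bill.

```python
from decimal import Decimal

# Per-1,000-call image rates copied from the pricing list above (may be outdated).
RATES_PER_1000 = {
    "Basic": Decimal("0.10"),
    "Standard": Decimal("0.25"),
}


def estimated_image_cost(plan, image_calls):
    """Estimate the monthly image cost in dollars for a paid plan."""
    return RATES_PER_1000[plan] * Decimal(image_calls) / Decimal(1000)


# 250,000 image calls on the Basic plan works out to 25 dollars.
print(estimated_image_cost("Basic", 250000))
```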

Microsoft gives you a healthy dose of free usage, so why not jump right into testing this API?

So without further ado…

What You’ll Need

Right off the bat, let’s get the initial requirements knocked out.

Download the source repository.

To start, let’s pull down the source files. (You’ll need a git client installed on your computer for this step.)

Move to the directory you want to use for this guide and run the following commands in a terminal…

# Download source repository
git clone
cd microsoft-emotion-recognition

The repository includes a few simple Python scripts (compatible with Python 2.7) to demo the Emotion API.

Create an Azure account.

Go to the Azure home page (Azure is Microsoft’s cloud services platform).

If you don’t already have an Azure account, go ahead and create one by clicking on the “Free Account” button and completing the registration process.

Next, we’ll spin up the service…

Step 1: Create the Emotion API Instance

Go to the Azure Dashboard and sign in with your Azure account.

Click on the “+ New” button.

Then select the “Intelligence + analytics” and “Cognitive Services APIs” options.

Note: This will enable all of the Cognitive Services APIs — Text Analytics, Computer Vision, etc. But for the purposes of this guide, we’re going to stick with the Emotion API.

On the Cognitive Services API Create page, enter an “Account name,” select a “Subscription,” “Location,” and “Pricing tier,” then create or select a “Resource group.” Enable “Account creation” if required. And of course, be sure to select “Emotion API” for “API type.”

Once everything is filled out, hit “Create.”

And after a few minutes, the new subscription will show up on your dashboard. Go ahead and click it.

Get the subscription key.

That should take you to the Overview tab for your new service.

Click on the “Keys” tab and copy the first key — we’ll need it in the next step.

Your Emotion API instance is now ready to go, so let’s move on…

Analyze an Image

We’ll start by submitting a series of images to the API, testing its capabilities from different angles.

Note: provides free images you can test with.

To submit an image, simply run the following command (in the terminal you set up in What You’ll Need) for each image:

# Submit an image to the API

Change “IMAGE_URL” to the (publicly accessible) URL of the image you want to analyze (repeat this for each image), and “API_KEY” to the key you copied when creating the API instance.
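
If you are curious what a call like this looks like under the hood, here is a minimal Python 3 sketch using only the standard library. It is not the script from the repository. The endpoint below is the West US URL as I understand it from the Emotion API v1.0 documentation, so treat it (and the function names, which are mine) as assumptions and adjust the host for your own region.

```python
import json
import urllib.request

# Assumed endpoint (West US region); your instance's region may differ.
RECOGNIZE_URL = "https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognize"


def build_recognize_request(image_url, api_key):
    """Build (but do not send) a POST request for one publicly accessible image."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": api_key,  # the key copied from the "Keys" tab
    }
    return urllib.request.Request(RECOGNIZE_URL, data=body, headers=headers)


def recognize(image_url, api_key):
    """Send the request and return the decoded JSON response."""
    request = build_recognize_request(image_url, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the request building from the sending keeps the first half easy to inspect without touching the network.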

So let’s get to it…

Contextual expressions.

Here’s an interesting test using a relatively complex facial expression: a sour face.

The system interpreted this expression as anger with a 42.1% confidence and happiness at 30.5%. Below is the API response:

"scores": {
  "anger": 0.421138048,
  "contempt": 0.00308842212,
  "disgust": 0.165829882,
  "fear": 0.0116129108,
  "happiness": 0.3052208,
  "neutral": 0.0149854887,
  "sadness": 0.074684456,
  "surprise": 0.00343998941
}

As these systems get smarter and smarter, it isn’t much of a stretch to see them not only understanding what ‘sour face’ looks like…but also recognizing and taking a cue from the lemon to better understand context.
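
The percentages quoted for each test are just the highest scores turned into percentages. Here is a small Python 3 sketch of that step, using the sour face scores from the response above. The helper name is my own and is not part of the API.

```python
def top_emotions(scores, count=2):
    """Return the `count` highest-scoring emotions as (name, percent) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(value * 100, 1)) for name, value in ranked[:count]]


# Scores copied from the sour face response.
sour_face = {
    "anger": 0.421138048,
    "contempt": 0.00308842212,
    "disgust": 0.165829882,
    "fear": 0.0116129108,
    "happiness": 0.3052208,
    "neutral": 0.0149854887,
    "sadness": 0.074684456,
    "surprise": 0.00343998941,
}

print(top_emotions(sour_face))  # [('anger', 42.1), ('happiness', 30.5)]
```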

Wearing glasses?

Here’s a test to see if the API can see through eye glasses…

The system interpreted this expression as happiness at 76.2% and neutral at 22.8%. Below is the API response:

"scores": {
  "anger": 0.000019574929,
  "contempt": 0.008707933,
  "disgust": 0.00002546404,
  "fear": 0.00000554377539,
  "happiness": 0.76220423,
  "neutral": 0.22810261,
  "sadness": 0.0001318246,
  "surprise": 0.000802798255
}

It didn’t appear to have any issues with the glasses, so let’s test a hat…

Wearing headgear?

Here’s a test to see if the shape and shadows of a hat will trip up the system…

And the system came back with happiness at 99.9% confidence. Below is the API response:

"scores": {
  "anger": 1.50342938e-8,
  "contempt": 0.00000364264451,
  "disgust": 1.9485384e-8,
  "fear": 5.003781e-10,
  "happiness": 0.99922,
  "neutral": 0.000774948334,
  "sadness": 0.00000126923146,
  "surprise": 6.72026843e-8
}

So it doesn’t seem to have any trouble with headgear either.

Let’s test angles…

Tilted expressions.

Here’s a test to see if head position affects the results…

The system came back with neutral at 91.6% confidence. Below is the API response:

"scores": {
  "anger": 0.0001890776,
  "contempt": 0.0129517857,
  "disgust": 0.000122931233,
  "fear": 0.0000413908238,
  "happiness": 0.000543844129,
  "neutral": 0.916418254,
  "sadness": 0.06961859,
  "surprise": 0.000114119932
}

And I’d call that another successful test. So let’s try multiple faces in the same image…

Multiple people.

Here’s a test to see if the system can deal with multiple people…

And the system described the girl on the left with happiness at 99.9%, and the girl on the right as neutral at 99.9%. Below is the API response:

"scores": {
  "anger": 9.171868e-10,
  "contempt": 6.706198e-7,
  "disgust": 2.71957e-9,
  "fear": 3.94775e-11,
  "happiness": 0.999625862,
  "neutral": 0.000373453251,
  "sadness": 2.47323229e-9,
  "surprise": 3.79722955e-8
}
"scores": {
  "anger": 8.55076053e-7,
  "contempt": 0.000147526938,
  "disgust": 6.414273e-7,
  "fear": 6.04651e-8,
  "happiness": 0.0000471687235,
  "neutral": 0.999756932,
  "sadness": 0.0000290200387,
  "surprise": 0.0000178117425
}

Personally, I’d say the girl on the right has a hint of happiness on her face — but that’s certainly debatable. Otherwise the API nailed it.
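
When an image contains several people, you need a way to tell which scores belong to whom. As far as I know, each entry in the full response also carries a "faceRectangle" with the face position, although it is not shown in the excerpt above. The Python 3 sketch below assumes that shape and orders faces from left to right. The rectangle numbers are made up for illustration, and the scores are trimmed to the two largest values from the response.

```python
def dominant_emotion(scores):
    """Return the name of the highest-scoring emotion."""
    return max(scores, key=scores.get)


def faces_left_to_right(response):
    """Return the dominant emotion of each face, ordered by horizontal position."""
    ordered = sorted(response, key=lambda face: face["faceRectangle"]["left"])
    return [dominant_emotion(face["scores"]) for face in ordered]


# Hypothetical rectangles; scores abbreviated from the response above.
sample_response = [
    {
        "faceRectangle": {"left": 620, "top": 110, "width": 180, "height": 180},
        "scores": {"happiness": 0.0000471687235, "neutral": 0.999756932},
    },
    {
        "faceRectangle": {"left": 140, "top": 120, "width": 175, "height": 175},
        "scores": {"happiness": 0.999625862, "neutral": 0.000373453251},
    },
]

print(faces_left_to_right(sample_response))  # ['happiness', 'neutral']
```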


Here’s a test to see how far we can push the limits…

And the system failed to identify a face in this image. I would guess that’s because her head was tilted just a bit too far. Microsoft recommends less than a 45° angle.
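
In your own code it is worth handling this case explicitly. The Python 3 sketch below assumes that a failed detection comes back as an empty list rather than an error, which matches what I saw but is still an assumption. The function name and message are mine.

```python
def describe(response):
    """Summarize a response, or explain that no face was found."""
    if not response:  # assumption: an empty list means no face was detected
        return "no face detected; try a more frontal, less tilted photo"
    return ", ".join(
        max(face["scores"], key=face["scores"].get) for face in response
    )


print(describe([]))
```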

Hair in the way?

Here’s a test to see if hair can trip things up…

The system came back with happiness at 99.9% confidence. Below is the API response:

"scores": {
  "anger": 8.054016e-9,
  "contempt": 5.76588235e-8,
  "disgust": 1.88049029e-7,
  "fear": 1.96213636e-11,
  "happiness": 0.999986231,
  "neutral": 0.00001347793,
  "sadness": 2.01189856e-8,
  "surprise": 2.07863149e-8
}

And I’d call that a successful test. So let’s push the envelope with hair…

Another limitation.

Here’s a little deeper test to see if even more hair can confuse the system…

And sure enough, the system was only able to identify the young girl — and came back with happiness at 99.9% confidence. Below is the API response:

"scores": {
  "anger": 1.23730332e-10,
  "contempt": 1.611374e-9,
  "disgust": 1.61184052e-10,
  "fear": 3.99333261e-10,
  "happiness": 0.9999991,
  "neutral": 2.53400572e-7,
  "sadness": 5.869352e-7,
  "surprise": 5.085489e-8
}

The API wasn’t able to pick out facial features of the woman through the glasses and hair, but it didn’t have any problem with the young girl.

Next, let’s test out images with a busy background…

Noisy images.

Here’s a test to see if a washed out background color and noise will affect the results…

And the system interpreted this expression as neutral at 76.5% and happiness at 19.6%. Below is the API response:

"scores": {
  "anger": 0.0005269143,
  "contempt": 0.00700004352,
  "disgust": 0.0005294192,
  "fear": 0.000477180554,
  "happiness": 0.196570709,
  "neutral": 0.7652676,
  "sadness": 0.00578457443,
  "surprise": 0.02384355
}

No problem there. So let’s move on and test a video…

Analyze a Video

Processing a video is actually a two-step process. First, you submit the video. Then, after some processing time, you request the result.

Note: also provides free videos you can test with.

To submit a video, simply run the following command in the terminal:

# Submit a video to the API

Change “VIDEO_URL” to the (publicly accessible) URL of the video you want to analyze, and “API_KEY” to the key you copied when creating the API instance.
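
For reference, here is roughly what a video submission looks like as a Python 3 sketch with the standard library. It is not the repository script. The endpoint is the West US "recognizeinvideo" URL as I understand it from the Emotion API v1.0 documentation, and the function names are my own, so treat both as assumptions.

```python
import json
import urllib.request

# Assumed endpoint (West US region); your instance's region may differ.
RECOGNIZE_IN_VIDEO_URL = (
    "https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognizeinvideo"
)


def build_video_request(video_url, api_key):
    """Build (but do not send) a POST request for one publicly accessible video."""
    body = json.dumps({"url": video_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    return urllib.request.Request(RECOGNIZE_IN_VIDEO_URL, data=body, headers=headers)


def submit_video(video_url, api_key):
    """Submit the video and return the "Operation-Location" header value."""
    request = build_video_request(video_url, api_key)
    with urllib.request.urlopen(request) as response:
        return response.headers["Operation-Location"]
```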

And be sure to copy the OID that the API responds with, because you’ll need it to retrieve the results. Here’s an example “Operation-Location” URL; you want the OID (highlighted portion):
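
If you would rather not copy the OID by hand, a few lines of Python 3 can pull it out. This sketch assumes the OID is the last path segment of the “Operation-Location” URL. The URL in the example is a made-up placeholder, not a real operation.

```python
from urllib.parse import urlparse


def extract_oid(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    path = urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# Placeholder URL for illustration only.
example = "https://westus.api.cognitive.microsoft.com/emotion/v1.0/operations/EXAMPLE-OID"
print(extract_oid(example))  # EXAMPLE-OID
```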

To request the analysis result, run the following command:

# Get video processing result

Change “VIDEO_OID” to the OID you copied when submitting the video, and “API_KEY” to the key you copied when creating the API instance.
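
Because processing takes a while, you will usually end up asking for the result more than once. Here is a hedged Python 3 sketch of a simple polling loop. It assumes the operation endpoint returns JSON with a "status" field ("Running", "Succeeded", "Failed") and, on success, a "processingResult" field holding the analysis as a JSON string. That is my recollection of the API, so verify it against the documentation. The endpoint URL and function names are assumptions as well.

```python
import json
import time
import urllib.request

# Assumed endpoint (West US region); your instance's region may differ.
OPERATIONS_URL = "https://westus.api.cognitive.microsoft.com/emotion/v1.0/operations/"


def fetch_operation(video_oid, api_key):
    """Fetch the current state of a video operation as a dict."""
    request = urllib.request.Request(
        OPERATIONS_URL + video_oid,
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_result(video_oid, api_key, fetch=fetch_operation, delay=30, attempts=20):
    """Poll until the operation finishes, then return the parsed analysis."""
    for _ in range(attempts):
        operation = fetch(video_oid, api_key)
        status = operation.get("status")
        if status == "Succeeded":
            # Assumption: the analysis is a JSON document stored as a string.
            return json.loads(operation["processingResult"])
        if status == "Failed":
            raise RuntimeError(operation.get("message", "video processing failed"))
        time.sleep(delay)
    raise TimeoutError("video still processing after %d attempts" % attempts)
```

The `fetch` parameter is only there so the loop can be tried out with a stand-in function instead of a live call.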

The system returns a (lengthy) response for the entire video that lists faces and emotions for specific timestamps (it can also do this in near real-time). Below is an abbreviated API response:

{
  "version": 1,
  "timescale": 24000,
  "offset": 0,
  "framerate": 23.976,
  "width": 1280,
  "height": 720,
  "fragments": [
    {
      "start": 0,
      "duration": 48048,
      "interval": 12012,
      "events": [
        [
          {
            "windowFaceDistribution": {
              "neutral": 0,
              "happiness": 1,
              "surprise": 0,
              "sadness": 0,
              "anger": 0,
              "disgust": 0,
              "fear": 0,
              "contempt": 0
            },
            "windowMeanScores": {
              "neutral": 5.4903e-8,
              "happiness": 0.999995,
              "surprise": 2.99175e-8,
              "sadness": 3.09663e-7,
              "anger": 4.44625e-7,
              "disgust": 0.00000404055,
              "fear": 2.69972e-10,
              "contempt": 9.6594e-9
            }
          }
        ],
        [
          {
            "windowFaceDistribution": {
              "neutral": 0,
              "happiness": 1,
              "surprise": 0,
              "sadness": 0,
              "anger": 0,
              "disgust": 0,
              "fear": 0,
              "contempt": 0
            },
            "windowMeanScores": {
              "neutral": 2.45505e-8,
              "happiness": 0.999998,
              "surprise": 5.09985e-8,
              "sadness": 1.34317e-7,
              "anger": 2.34993e-7,
              "disgust": 0.00000210687,
              "fear": 2.49678e-9,
              "contempt": 8.54064e-9
            }
          }
        ]
      ]
    }
  ]
}

And that’s it for video. Not too much more complicated than submitting static images.
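
The one thing that takes a moment to decode is the timing. As I read the response, "start", "duration" and "interval" are counted in ticks, and dividing by "timescale" converts them to seconds, so an interval of 12012 at a timescale of 24000 is about half a second. The Python 3 sketch below makes that conversion and picks the dominant emotion per window. It assumes each entry in "events" is a list of faces for one interval, and the sample data is trimmed from the response above.

```python
def summarize_fragment(fragment, timescale):
    """Return (seconds, dominant emotion) for every face in every window."""
    rows = []
    for index, faces in enumerate(fragment["events"]):
        # Convert the window's starting tick into seconds.
        ticks = fragment["start"] + index * fragment["interval"]
        seconds = ticks / float(timescale)
        for face in faces:
            means = face["windowMeanScores"]
            rows.append((seconds, max(means, key=means.get)))
    return rows


# Trimmed copy of the first fragment (only two of the eight scores kept).
fragment = {
    "start": 0,
    "duration": 48048,
    "interval": 12012,
    "events": [
        [{"windowMeanScores": {"neutral": 5.4903e-8, "happiness": 0.999995}}],
        [{"windowMeanScores": {"neutral": 2.45505e-8, "happiness": 0.999998}}],
    ],
}

for seconds, emotion in summarize_fragment(fragment, timescale=24000):
    print(seconds, emotion)
```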


If you aren’t getting the expected results, check the quality of your images. Ideally, you’ll want to use unobstructed, full frontal views of faces.

Expect accuracy to drop with partial faces and when faces are rotated more than 45°.

Here are some of the technical details:

  • Supported file formats: JPEG, PNG, GIF (first frame only), BMP
  • Maximum file size: 4MB
  • Detectable face size range (within the image): 36×36 to 4096×4096 pixels
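
You can catch the first two limits before you ever call the API. Here is a minimal Python 3 sketch that checks the file size and sniffs the format from the first few bytes. It reads “4MB” as 4 × 1024 × 1024 bytes, which is my assumption, and it does not check face size, since that would require decoding the image. The names are my own.

```python
MAX_BYTES = 4 * 1024 * 1024  # assumption: "4MB" interpreted as 4 MiB

# Leading bytes ("magic numbers") of the supported formats.
SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF
    b"GIF89a",             # GIF
    b"BM",                 # BMP
)


def image_problems(data):
    """Return a list of reasons the raw image bytes might be rejected."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 4MB")
    if not any(data.startswith(signature) for signature in SIGNATURES):
        problems.append("not a JPEG, PNG, GIF, or BMP file")
    return problems


print(image_problems(b"plain text, not an image"))
```

An empty list means the file passed both checks.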

Outside of these few guidelines, the API should be pretty straightforward to use.

Next Steps

But this is just a start. What will you do with this new tool?

The real power comes when you plug this type of analysis into all of your customer experience and marketing tools that capture images and video.

You can dig deeper into the Emotion API in the developer documentation.