Computers Are Opening Their Eyes — and They’re Already Better at Seeing Than We Are

For the past several decades we’ve been teaching computers to understand the visual world. And like everything in artificial intelligence these days, computer vision is making rapid strides. So much so that it’s starting to beat us at ‘name that object.’

Every year the ImageNet project runs a competition testing the current capability of computers to identify objects in photographs. And in 2015, they hit a milestone…

Microsoft reported a 4.94% error rate for their vision system, compared to a 5.1% error rate for humans.

While that doesn’t quite give computers the ability to do everything that human vision can (yet), it does mean that computer vision is ready for prime time. In fact, computer vision is very good — and lightning fast — at narrow tasks. Tasks like:

  • Social listening: Track buzz about your brand and products in images posted to social media
  • Visual auditing: Remotely monitor for damage, defects or regulatory compliance in a fleet of trucks, planes, or windmills
  • Insurance: Quickly process claims by instantly classifying new submissions into different categories
  • Manufacturing: Ensure components are being positioned correctly on an assembly line
  • Social commerce: Use an image of a food dish to find out which restaurant serves it, or use a travel photo to find vacation suggestions based on similar experiences, or find similar homes for sale
  • Retail: Find stores with similar clothes in stock or on sale, or use a travel image to find retail suggestions in that area

This is a game-changer for business. An A.I.-powered tool that can digitize the visual world can add value to a wide range of business processes — from marketing to security to fleet management.

Unlocking Data in Visual Content

So here’s a step-by-step guide for building a powerful image recognition service — powered by IBM Watson — and capable of facial recognition, age estimation, object identification, etc.

The application wrapped around this service (originally developed by IBM’s Watson Developer Cloud) is preconfigured to identify objects, faces, text, scenes and other contexts in images.

A quick example.

And by the way…Here’s what Watson found in our featured image above:

Classes (Score)
  • bass (musical instrument)
  • musical instrument
  • orange color

Type Hierarchy
  • /device/bass (musical instrument)

Faces (Score)
  • age 18 – 24

Not too shabby. Watson correctly identified the image as a person with a guitar. It also found the face, which is pretty impressive, but was unsure about the guitarist’s age and gender.

Personally, I would guess our guitarist is a woman based on the longer hair and fingernails. And no doubt Watson will be able to pick up those subtle clues as well in the near future.

Note: The “Score” is a numerical representation (0-1) of how confident the system is in a particular classification. The higher the number, the higher the confidence.
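As a rough illustration of how the score can be put to work, the sketch below keeps only the classifications above a chosen confidence threshold. The `class` and `score` field names, the sample values and the 0.5 cutoff are all assumptions made for this example, not output copied from Watson.

```javascript
// Hypothetical classification results. The field names and scores are
// assumed for illustration and are not real Watson output.
const sampleClasses = [
  { class: 'bicycle', score: 0.9 },
  { class: 'red color', score: 0.4 },
];

// Keep classifications whose score (0-1) is at least minScore,
// sorted from most to least confident.
function confidentClasses(classes, minScore) {
  return classes
    .filter((item) => typeof item.score === 'number' && item.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

console.log(confidentClasses(sampleClasses, 0.5).map((item) => item.class));
```

A higher cutoff gives fewer but more trustworthy labels, so the right value depends on how costly a wrong label is for your use case.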

A.I.-Powered Vision

Using an artificial intelligence platform to instantly translate the things we see into common written language is like having an army of experts continuously reviewing and describing your images.

It allows you to quickly and accurately organize visual information, turning piles of images or video frames into useful data for your business. Data that can then be acted upon, shared or stored.

What will you learn from your visual data?

Let’s find out…

If you’d like to preview the source code, here’s our fork of the application on GitHub.

The End Result

The steps in this guide will create an application similar to the following…

You can also preview a live version of this application. The major features are:

  • Object determination — Classifies things in the image
  • Text extraction — Extracts text displayed in the image
  • Face detection — Detects human faces, including an estimation of age & gender
  • Celebrity identifier — Names the person if your image includes a public figure (when a face is found)

And this is just the beginning. This application can be extended in many different ways; it’s only limited by your imagination.

How it works.

Here’s a quick diagram of the major components…

The application uses just one cloud-based service from IBM Watson: Visual Recognition.

Note: Most of the following steps can be accomplished through command line or point-and-click. To keep it as visual as possible, this guide focuses on point-and-click. But the source code also includes command line scripts if that’s your preference.

What You’ll Need

Before we create the service instance and application container, let’s get the system requirements knocked out.

Download the source repository.

To start, go ahead and download the source files.

Note: You’ll need a git client installed on your computer for this step.

Simply move to the directory you want to use for this demo and run the following commands in a terminal…

# Download source repository
git clone
cd ibm-watson-visual-recognition

At this point, you can keep the terminal window open and set it aside for now…we’ll need it in a later step.

Name the application.

Right away, let’s nail down a name for your new image recognition app.

...
  # Application name
- name: xxxxxxxxxxxxxxx
...

Replace xxxxxxxxxxxxxxx in the manifest.yml file with a globally unique name for your instance of the application.

The name you choose will be used to create the application’s URL — eg.
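Because the name ends up in the application’s URL, it helps to pick one that is also a valid hostname label. The check below is a minimal sketch of that idea. The function name `isUrlSafeAppName` is made up for this example, and Bluemix may apply extra rules of its own (the global uniqueness check, for one, can only happen on the platform).

```javascript
// A DNS hostname label is 1-63 characters long, uses only letters,
// digits and hyphens, and may not start or end with a hyphen.
// isUrlSafeAppName is a hypothetical helper, not part of the application.
function isUrlSafeAppName(name) {
  return (
    typeof name === 'string' &&
    /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i.test(name)
  );
}

console.log(isUrlSafeAppName('my-vision-demo'));  // true
console.log(isUrlSafeAppName('My Vision Demo!')); // false
```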

Create a Bluemix account.

Go to the Bluemix Dashboard page (Bluemix is IBM’s cloud platform).

If you don’t already have one, create a Bluemix account by clicking on the “Sign Up” button and completing the registration process.

Install Cloud Foundry.

A few of the steps in this guide require a command line session, so you’ll need to install the Cloud Foundry CLI tool. This toolkit allows you to more easily interact with Bluemix.

Open a terminal session with Bluemix.

Once the Cloud Foundry CLI tool is installed, you’ll be able to log into Bluemix through the terminal.

# Log into Bluemix
cf api
cf login -u YOUR_BLUEMIX_ID -p YOUR_BLUEMIX_PASSWORD

Replace YOUR_BLUEMIX_ID and YOUR_BLUEMIX_PASSWORD with the respective username and password you created above.

Step 1: Create the Application Container

Go to the Bluemix Dashboard page.

Then on the next page, click on the “Create App” button to add a new application.

In this demo, we’ll be using a Node application, so click on “SDK for Node.js.”

Then fill out the information required, using the application name you chose in What You’ll Need — and hit “Create.”

Set the application memory.

Before we move on, let’s give the application a little more memory to work with.

Click on your application.

Then click on the “plus” sign for “MB MEMORY PER INSTANCE” — set it to 512 MB — and hit “Save.”

Step 2: Create the Visual Recognition Instance

To set up your Visual Recognition service, jump back to the Bluemix Dashboard page.

Click on your application again.

And that should take you to the Overview tab for your application. And since this is a brand new application, you should see a “Create new” button in the Connections widget — click that button.

You should now see a long list of services. Click “Watson” in the Categories filter and then click on “Visual Recognition” to create an instance of that service.

Go ahead and choose a Service Name that makes sense for you, e.g. Visual Recognition-Demo. For this demo, the “Free” Pricing Plan will do just fine. And by default, you should see your application’s name listed in the “Connected to” field.

Click the “Create” button when ready. And enter the Name and Pricing Plan you chose into the manifest.yml file…

...
  # Visual Recognition
  Visual Recognition-Demo:
    label: watson_vision_combined
    plan: free
...
- services:
  - Visual Recognition-Demo
...

If needed, replace both instances of Visual Recognition-Demo with your Service Name and free with your chosen Pricing Plan.
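Since the Service Name appears in two places in manifest.yml, a mismatch between them is an easy mistake to make. The sketch below compares the two lists once you have them as plain arrays. It leaves YAML parsing aside, and `undeclaredServices` is a hypothetical helper name rather than anything shipped with the application.

```javascript
// Return the service names an application lists that were never declared.
// Both arguments are plain arrays of names taken from manifest.yml by hand.
function undeclaredServices(declaredNames, appServiceNames) {
  const declared = new Set(declaredNames);
  return appServiceNames.filter((name) => !declared.has(name));
}

// Matching names: nothing is reported.
console.log(undeclaredServices(['Visual Recognition-Demo'], ['Visual Recognition-Demo']));

// A stray hyphen in the second spot is reported as undeclared.
console.log(undeclaredServices(['Visual Recognition-Demo'], ['Visual-Recognition-Demo']));
```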

Feel free to “Restage” your application when prompted.

Enter service credentials.

After your Visual Recognition instance is created, click on the respective “View credentials” button.

And that will pop up a modal with your details.

Copy/paste your API key into the respective portion of your .env file.

# Environment variables VISUAL_RECOGNITION_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Replace xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx with the key listed for api_key.
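A forgotten placeholder is a common reason for a failed first launch, so here is one way the contents of a .env file could be sanity-checked before deploying. The parser is a simplified stand-in written for this guide (the application itself may load the file differently), and `parseEnv` and `hasRealApiKey` are hypothetical helper names.

```javascript
// Parse the text of a .env file into an object.
// Simplified: blank lines and lines starting with "#" are skipped,
// and quoting or escaping rules are not handled.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // not a KEY=value line
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

// True when the key is present and is not the all-"x" placeholder.
function hasRealApiKey(vars) {
  const key = vars.VISUAL_RECOGNITION_API_KEY;
  return typeof key === 'string' && key.length > 0 && !/^x+$/i.test(key);
}

const exampleEnv = '# Environment variables\nVISUAL_RECOGNITION_API_KEY=xxxxxxxx\n';
console.log(hasRealApiKey(parseEnv(exampleEnv))); // false: placeholder not replaced yet
```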

Your Visual Recognition service is now ready. So let’s fire this thing up!

Step 3: Launch It

To bring the application to life, simply run the following command — making sure the terminal is still in the repository directory and logged into Bluemix…

cf push

This command will upload all the needed files, configure the settings — and start the application.

Note: You can use the same cf push command to update the same application after it’s originally published.

Take a look.

After the application has started, you’ll be able to open it in your browser at the respective URL.

The page should look something like this…

Play around with it and get a feel for the functionality.

Custom classifier.

The application also supports a custom classifier, which allows you to customize the type of objects the system can identify within your images.

To check it out, click on the “Train” button.

The “Free” pricing plan only supports one custom classifier. So if you want to test multiple versions, you’ll need to delete the previous one. And you can do that by deleting and recreating the Visual Recognition service — step #2 above. Or you can modify the existing service using the following command…

Note: You’ll need the curl command installed for this.

# Get classifier ID
curl -X GET ""

# Remove existing custom classifier
curl -X DELETE ""

Replace 2017-05-06 with the date you created the classifier, xxxxxxxxxxxx with the classifier_id returned from the first command, and both instances of xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx with the Visual Recognition service API key you retrieved in step #2.
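If you save the output of the first command, the classifier_id can be pulled out with a few lines of code instead of by eye. This is only a sketch: the response shape used here (a `classifiers` array whose entries carry `classifier_id` and `name`) is an assumption, and the sample ID is invented, so compare it against what your own curl call returns.

```javascript
// Hypothetical response body from the "Get classifier ID" request.
// The shape and the values are assumed for this sketch.
const sampleResponse = JSON.stringify({
  classifiers: [{ classifier_id: 'example_123', name: 'example' }],
});

// Collect every classifier_id found in the response text.
function classifierIds(responseText) {
  const body = JSON.parse(responseText);
  const list = body && Array.isArray(body.classifiers) ? body.classifiers : [];
  return list
    .map((classifier) => classifier.classifier_id)
    .filter((id) => typeof id === 'string');
}

console.log(classifierIds(sampleResponse)); // [ 'example_123' ]
```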


If you’re having any problems with the application, be sure to check out the logs…

Just click on the “Logs” tab within your application page.

And that’s pretty much the end of the road. You’re now a computer vision pro!

Take it to the Next Level

Feel like you’re ready to give your applications and devices the power of sight? The sky’s the limit for how and where you apply this technology.

And under the current pricing, you can classify 250 images/day for free. So there’s no reason not to jump right in.

You can dig deeper into the Visual Recognition service at the Watson Developer documentation.