Integrating Google Cloud Vision with Kapow

What if your robots were able to understand the content of images? Google Cloud Vision is a powerful API that integrates well with Kapow and extends its capabilities with computer vision. First, let’s have a look at what Google can do for you.

This photo of the Las Vegas Strip is categorised into several classes, which Google calls labels, and they clearly apply here. But wait, there’s more to come.

Here we have web entities – references to the image provided. Still, there’s more.

Yep, that’s right – Google Cloud Vision can provide you with full-page optical character recognition (OCR) results including spatial information – i.e. pages, blocks, paragraphs, words, and their coordinates on the page.
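To make that hierarchy a little more tangible, here is a minimal Python sketch that walks pages, blocks, paragraphs and words and collects each word with its bounding box. The sample data is hand-written for illustration and only mimics the shape of the fullTextAnnotation part of an OCR response; the coordinates and the helper name iter_words are my own assumptions, and real responses carry many more fields.

```python
# Hand-written sample shaped like the "fullTextAnnotation" part of an OCR
# response. The coordinates are made up for illustration.
def make_word(text, x, y, width, height):
    """Build one word entry with per-character symbols and a bounding box."""
    return {
        "boundingBox": {"vertices": [
            {"x": x, "y": y},
            {"x": x + width, "y": y},
            {"x": x + width, "y": y + height},
            {"x": x, "y": y + height},
        ]},
        "symbols": [{"text": char} for char in text],
    }


sample_annotation = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    make_word("Las", 10, 10, 50, 20),
                    make_word("Vegas", 70, 10, 80, 20),
                ],
            }],
        }],
    }],
}


def iter_words(annotation):
    """Yield (text, vertices) for every word, in document order."""
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word's text is the concatenation of its symbols.
                    text = "".join(
                        symbol.get("text", "")
                        for symbol in word.get("symbols", [])
                    )
                    vertices = [
                        (vertex.get("x", 0), vertex.get("y", 0))
                        for vertex in word.get("boundingBox", {}).get("vertices", [])
                    ]
                    yield text, vertices


words = list(iter_words(sample_annotation))
for text, vertices in words:
    print(text, vertices)
```

The .get() calls with defaults are deliberate: fields can be absent from a response, so the sketch treats every level as optional.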

Let’s build a Robot

So, we’d like to create a robot that can do the following:

Retrieve an image from a URI,
Call the Google API,
Parse the response.

Integrating Google Cloud Vision is an easy task, as Google’s REST API provides us with a web service that can be consumed with a – you guessed it – Web Service action in Kapow. We need two JSON objects: one for the request, which tells Google which image to process and which service to use (for example OCR), and another that will contain Google’s response. Let’s have a look at the request first:


{
  "requests" : [
                 {
                   "image" : {
                               "content" : "/9j/4AAQSkZJ...1X/Dn//Z"
                             },
                   "features" : [
                                  {
                                    "type" : "WEB_DETECTION",
                                    "maxResults" : 50
                                  }
                                ]
                 }
               ]
}

This tells us that we need the following variables:

A short text for storing the image URI,
an image variable for storing the binary image,
a short text variable for storing the base64 encoded image,
a short text storing the type (i.e. the service to use),
a variable storing the following properties (we’ll use a type for that):
- a short text for the base URI (https://vision.googleapis.com/v1/images:annotate),
- another short text for your API key,
- the request JSON, and
- the response JSON.

So, here we are:

You will see that the request variable has been set already, and in fact I decided not to build the JSON from scratch but rather provide a skeleton object with the following content:


{
  "requests" : [
                 {
                   "image" : {},
                   "features" : [
                                  {
                                    "type" : "",
                                    "maxResults" : 50
                                  }
                                ]
                 }
               ]
}

Encoding the Image

The skeleton allows us to add the base64-encoded image later, along with the type. First things first, though: we need to retrieve, open and store an image from a URI, and then encode it in base64.

At the end of these three steps you should have the encoded image stored in the variable called imageBase64.
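If you would like to check the result outside of Kapow, the same idea can be sketched in a few lines of Python. This is not what the robot does internally, just an equivalent under my own assumptions: the function names fetch_image and encode_image are made up, and the four sample bytes are simply the start of a typical JPEG file rather than a real image.

```python
import base64
from urllib.request import urlopen


def fetch_image(uri: str) -> bytes:
    """Download the raw image bytes from a URI (needs network access)."""
    with urlopen(uri) as response:
        return response.read()


def encode_image(image_bytes: bytes) -> str:
    """Return the base64 text form that goes into the request's "content"."""
    return base64.b64encode(image_bytes).decode("ascii")


# First four bytes of a typical JPEG file, used here instead of a download
# so the sketch runs offline.
sample_bytes = b"\xff\xd8\xff\xe0"
image_base64 = encode_image(sample_bytes)
print(image_base64)
```

Note how the output starts with /9j/, just like the content value in the request sample: that prefix is what the leading bytes of a JPEG look like in base64.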

Preparing the Request

The next step is to prepare the request JSON – so we insert both image and type:

At the end of these steps, the JSON should look like the first sample provided in this post.
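For comparison, here is roughly what those steps amount to when written as Python. It is a sketch only: the name prepare_request is hypothetical, and the short base64 string stands in for a real encoded image.

```python
import copy
import json

# The same skeleton object the robot starts from.
SKELETON = {
    "requests": [
        {
            "image": {},
            "features": [
                {"type": "", "maxResults": 50},
            ],
        }
    ]
}


def prepare_request(image_base64: str, feature_type: str) -> dict:
    """Fill a copy of the skeleton with the encoded image and the type."""
    request = copy.deepcopy(SKELETON)  # leave the skeleton itself untouched
    entry = request["requests"][0]
    entry["image"]["content"] = image_base64
    entry["features"][0]["type"] = feature_type
    return request


# "/9j/4A==" is a placeholder, not a complete image.
request_body = prepare_request("/9j/4A==", "WEB_DETECTION")
request_json = json.dumps(request_body, indent=2)
print(request_json)
```

Working on a deep copy keeps the skeleton reusable, so the same robot run could prepare several requests with different types.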

Issuing the Request and Parsing the Response

The last step is calling Google’s API via REST and parsing the response. Note that you will need to generate and set an API key first!
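Outside of Kapow, the call and the parsing could look something like the following Python sketch. Treat it as an illustration under assumptions: the function names are mine, YOUR_API_KEY is a placeholder, and the sample response is hand-written with invented scores to mimic the shape of a WEB_DETECTION result. Only build_http_request and web_entities are exercised here; call_vision would need network access and a valid key.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URI = "https://vision.googleapis.com/v1/images:annotate"


def build_http_request(request_body: dict, api_key: str) -> Request:
    """Build a POST request with the API key as a query parameter."""
    url = f"{BASE_URI}?{urlencode({'key': api_key})}"
    data = json.dumps(request_body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=data, headers=headers, method="POST")


def call_vision(request_body: dict, api_key: str) -> dict:
    """Send the request and decode the JSON response (needs network + key)."""
    with urlopen(build_http_request(request_body, api_key)) as response:
        return json.loads(response.read().decode("utf-8"))


def web_entities(response_body: dict) -> list:
    """Collect (description, score) pairs from a WEB_DETECTION response."""
    results = []
    for response in response_body.get("responses", []):
        detection = response.get("webDetection", {})
        for entity in detection.get("webEntities", []):
            results.append((entity.get("description", ""), entity.get("score", 0.0)))
    return results


# Hand-written sample; descriptions and scores are invented for illustration.
sample_response = {
    "responses": [
        {
            "webDetection": {
                "webEntities": [
                    {"description": "Las Vegas Strip", "score": 0.9},
                    {"description": "Hotel", "score": 0.5},
                ]
            }
        }
    ]
}

http_request = build_http_request({"requests": []}, "YOUR_API_KEY")
print(http_request.get_method(), http_request.full_url)
print(web_entities(sample_response))
```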

Voilà – you just built a Robot that can interpret images!

Results and Conclusion

In three blocks, we showed how to encode an image, prepare the request and issue a web service call. What amazed me most was how many things Kapow already has built in: encoding a string in base64, for one; setting request and response headers for a web service call, for another. This allows for building prototypes in a matter of minutes.

From here on, it’s up to you to find possible use cases. Here are some examples:

A robot that recognises faces in an image and compares them against images stored in a database in order to recognise people,
A robot that finds text in an image using Google’s OCR capabilities and regular expression actions in Kapow,
A robot that finds similar images to the one provided to check whether someone uses your photos without your permission.
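To give the OCR idea some shape, here is a small Python sketch of the post-processing step: take the recognised text and pull out amounts with a regular expression. In a robot you would use Kapow’s regular expression actions instead. The sample response is hand-written, the invoice text and the amount pattern are invented for illustration, and the helper name ocr_text is my own.

```python
import re

# Hand-written sample shaped like an OCR response; the text is invented.
sample_response = {
    "responses": [
        {
            "fullTextAnnotation": {
                "text": "Invoice 2018-0042\nTotal: 129.90 EUR\n",
            }
        }
    ]
}


def ocr_text(response_body: dict) -> str:
    """Join the recognised full text of all responses."""
    return "\n".join(
        response.get("fullTextAnnotation", {}).get("text", "")
        for response in response_body.get("responses", [])
    )


# Illustrative pattern: a decimal amount followed by a currency code.
AMOUNT = re.compile(r"(\d+\.\d{2})\s*(EUR|USD)")

amounts = AMOUNT.findall(ocr_text(sample_response))
print(amounts)
```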

Quipu Blog

Kofax Subject Matter Experts