Image Recognition with Computer Vision and Bot Framework

Tags: bot, chat, cognitive services, computer vision, microsoft
September 5, 2017

Since we’re now able to request user input using Prompt Dialogs in Bot Framework, we can take it a step further. So far, LUIS is the only form of Artificial Intelligence we’ve added to the bot, but we’ll tap into more power by adding another one of the Cognitive Services. Just like with the mobile app, we’ll add Image Recognition to the bot using Computer Vision.

The goal

The goal of this demo is to let the user order food by simply sending the bot pictures of the food they would love to have, instead of ordering it with words. Computer Vision recognizes what is in these pictures and the bot places the order. In this demo we’ll be using a picture of a hotdog and a picture of a pizza.

Requirements

In order to get everything up and running, we’re going to need the following:

Once again, I’ll be using Visual Studio as my IDE.
The Bot Framework project that we’ll be running in Azure. I’ll be using the one from my previous article.
A Microsoft Azure account.
We’ll also need the Vision API Client Library added to the project to call the Microsoft Cognitive Service. Its namespace still contains Project Oxford, the name Cognitive Services had before it was rebranded.

Now that we have everything in place, we can continue with the next steps.

API Key & Root

We’ll need to create the Computer Vision API service and get its API key in order to use it. Head over to the Azure Portal and create a new service: search the Marketplace for Computer Vision API, fill in all the required fields and create the service.

Now that we have our service, we’ll need to look up a couple of values that we’ll use inside our app. Simply open the service and grab the following values:

Find your API Key under Show access keys. Copy it and store it for later.
The Root can be found under Endpoint and is essentially the Azure Location. Also copy this value and store it.

The `ComputerVisionService`

I added a class to the project called ComputerVisionService that wraps around the functionality from the VisionServiceClient from Cognitive Services and only returns what we currently need.


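using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;

public static class ComputerVisionService
{
    // Placeholders: fill in the API Key and Azure Location of your own service.
    private const string COMPUTER_VISION_KEY
        = "<COMPUTER_VISION_KEY>";
    private const string COMPUTER_VISION_ROOT
        = "https://<AZURE_LOCATION>.api.cognitive.microsoft.com/vision/v1.0";

    // A single client instance is shared by all calls.
    private static readonly VisionServiceClient _client
        = new VisionServiceClient(COMPUTER_VISION_KEY, COMPUTER_VISION_ROOT);

    // Describes the image at the given (publicly reachable) URL and returns
    // the first caption. Throws if the service returns no captions at all.
    public static async Task<Caption> DescribeAsync(string url)
    {
        var analysisResult = await _client.DescribeAsync(url);
        return analysisResult.Description.Captions[0];
    }
}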

Take note of the <COMPUTER_VISION_KEY> and <AZURE_LOCATION> placeholders and replace them with the API Key and Root we got in the previous step.
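If you’d rather not keep the key in source code, the two values can be supplied through configuration instead. The following is only a minimal sketch of that idea: the class name ComputerVisionSettings and the environment variable name COMPUTER_VISION_KEY are my own assumptions, and the URL pattern is simply the one used in the class above.

```csharp
using System;

// Sketch only: ComputerVisionSettings and the environment variable name
// are hypothetical, they are not part of Bot Framework or the Vision library.
public static class ComputerVisionSettings
{
    // Builds the Root from the Azure Location shown under Endpoint,
    // following the URL pattern used in ComputerVisionService.
    public static string BuildRoot(string azureLocation)
    {
        if (string.IsNullOrWhiteSpace(azureLocation))
            throw new ArgumentException("Azure Location is required.", nameof(azureLocation));

        return $"https://{azureLocation.Trim()}.api.cognitive.microsoft.com/vision/v1.0";
    }

    // Reads the API Key from an environment variable and fails fast when it is missing.
    public static string ReadKey(string variableName = "COMPUTER_VISION_KEY")
    {
        var key = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(key))
            throw new InvalidOperationException($"Environment variable '{variableName}' is not set.");

        return key;
    }
}
```

The results of ReadKey() and BuildRoot("<AZURE_LOCATION>") could then be passed to the VisionServiceClient constructor in place of the hard-coded strings.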

Calling the service

Now that we’ve created the service, we’ll call it from our bot code. We’ll ask the user to send a picture using a Prompt Dialog and let Computer Vision work its magic.


[LuisIntent("OrderFood")]
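public Task OrderFood(IDialogContext context, LuisResult result)
{
    // Ask the user for one or more pictures; the callback runs once they arrive.
    PromptDialog.Attachment(context, ResumeAfterAttachmentClarification, "What should your order look like?");
    return Task.CompletedTask;
}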

private async Task ResumeAfterAttachmentClarification(IDialogContext context, IAwaitable<IEnumerable<Attachment>> result)
{
    var descriptions = new List<string>();

    var orders = await result;
    foreach(var order in orders)
    {
        var caption = await ComputerVisionService.DescribeAsync(order.ContentUrl);
        descriptions.Add(caption.Text);
    }
    
    await context.PostAsync($"I think your order should have _{string.Join(", ", descriptions)}_, I'll see what I can do!");
}

As you can see in this piece of code, we pass the ContentUrl property of each attachment to the Cognitive Service as the image URL. Because this needs to be a publicly available URL, this doesn’t work when running locally. I published my bot to Azure and used the test panel on Bot Framework to check if everything was working as expected.
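To fail with a clear message instead of a service error, the bot could check the URL before calling Computer Vision. The sketch below is a rough heuristic and not part of the original bot: the class and method names are made up for illustration, and it only rules out non-web schemes and loopback addresses such as localhost, not private network addresses.

```csharp
using System;

// Sketch only: AttachmentUrlCheck is a hypothetical helper.
public static class AttachmentUrlCheck
{
    // Returns true when the URL is an absolute http(s) URL that does not
    // point at the local machine. This is a heuristic, not a guarantee
    // that Computer Vision can actually download the image.
    public static bool LooksPubliclyReachable(string contentUrl)
    {
        if (!Uri.TryCreate(contentUrl, UriKind.Absolute, out var uri))
            return false;

        var isWeb = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
        return isWeb && !uri.IsLoopback;
    }
}
```

Inside the foreach loop, the bot could call AttachmentUrlCheck.LooksPubliclyReachable(order.ContentUrl) and, when it returns false, tell the user that the picture can’t be analysed in this environment instead of calling DescribeAsync.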

As you can see, the bot was able to detect what kind of food it was and even recognized the topping on the pizza! That’s pretty neat and really shows the power of Bot Framework in combination with AI.
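The demo only echoes the caption back to the user. To actually place an order, the caption text would still have to be mapped to something on the menu. Here is a minimal sketch of one way to do that with simple keyword matching; the OrderMatcher class and its two-item menu are assumptions for illustration, and the captions used below are made-up inputs, not real output from the service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: OrderMatcher and its menu are hypothetical.
public static class OrderMatcher
{
    // Hypothetical menu; a real bot would load this from its own catalog.
    private static readonly string[] Menu = { "pizza", "hotdog" };

    // Returns the menu items mentioned in a caption, ignoring case and
    // treating "hot dog" and "hotdog" as the same item.
    public static IReadOnlyList<string> Match(string caption)
    {
        if (string.IsNullOrWhiteSpace(caption))
            return Array.Empty<string>();

        var normalized = caption.ToLowerInvariant().Replace("hot dog", "hotdog");
        return Menu.Where(item => normalized.Contains(item)).ToList();
    }
}
```

For example, OrderMatcher.Match("a pizza on a plate") yields a list containing only "pizza", while a caption that mentions nothing on the menu yields an empty list, which the bot could use as a cue to ask for another picture.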

Conclusion

Although this demo is incredibly simple, the potential is great. Think about using Custom Vision to recognize domain-specific objects instead of the generic ones from Computer Vision. The combination of Cognitive Services and Bot Framework is incredibly powerful, yet really simple to set up. You can use this feature to make the bot feel more human and to guide the user through the conversation more easily. We’ll expand this in the future with the use of more Cognitive Services. Let me know what you think in the comments or on Twitter.

Want to learn more about this subject?
Join my “Weaving Cognitive and Azure Services” presentation at TechDaysNL 2017!

3 comments On Image Recognition with Computer Vision and Bot Framework

bingxue
December 19, 2017 at 10:19 am

hello, I read your article and it’s very useful. I have another question and i wonder if you have met. Now i am gonna work on a project with bot framework which will contains a bar code recognition within the bot. Do you know how to solve this with Microsoft cognitive services? Looking forward to your reply and thanks in advance.
Jon
April 19, 2018 at 2:11 pm

what would be the version to running locally then? Would the Content URL change?
- Marco
  April 24, 2018 at 6:37 am
  
  Good question! When running locally, the images from the Bot have a URL starting with localhost://, which makes it impossible for Computer Vision to read them, so you can’t use the Content URL. You should send the raw image binary to the Computer Vision API as an application/octet-stream to make sure it works both externally and locally. Good luck!

Marcofolio.net