
Image Recognition with Computer Vision and Bot Framework

Now that we can request user input with Prompt Dialogs in Bot Framework, we can take things a step further. So far, LUIS is the only form of Artificial Intelligence we've added to the bot, but we can tap into more power by adding another one of the Cognitive Services. Just like with the mobile app, we'll add Image Recognition to the bot using Computer Vision.

The goal

The goal for this demo is to let the user order food by simply sending the bot pictures of the food they would like, instead of describing the order in words. Computer Vision recognizes what is in the pictures and the bot places the order. In this demo we'll be using a picture of a hotdog and a picture of a pizza.

Requirements

In order to get everything up and running, we’re going to need the following:

Now that we have everything in place, we can continue with the next steps.

API Key & Root

We'll need to create the Computer Vision API service and get its API key in order to use it. Head over to the Azure Portal and create a new service. Search the Marketplace for Computer Vision API, fill in all the required fields, and create the service.

Now that we have our service, we'll need to look up a couple of values to use inside our app. Simply open the service and grab the following values:

  • Find your API Key under Show access keys. Copy it and store it for later.
  • The Root can be found under Endpoint and is essentially the Azure Location. Also copy this value and store it.

The ComputerVisionService

I added a class to the project called ComputerVisionService that wraps around the functionality from the VisionServiceClient from Cognitive Services and only returns what we currently need.


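using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;

public static class ComputerVisionService
{
    // Replace both placeholders with the API Key and Root from the Azure Portal.
    private const string COMPUTER_VISION_KEY
        = "<COMPUTER_VISION_KEY>";
    private const string COMPUTER_VISION_ROOT
        = "https://<AZURE_LOCATION>.api.cognitive.microsoft.com/vision/v1.0";

    // A single shared client is reused for every request.
    private static readonly VisionServiceClient _client
        = new VisionServiceClient(COMPUTER_VISION_KEY, COMPUTER_VISION_ROOT);

    // Describes the image at a publicly reachable URL and returns the first caption.
    // Note: Captions[0] throws if the service returns no captions.
    public static async Task<Caption> DescribeAsync(string url)
    {
        var analysisResult = await _client.DescribeAsync(url);
        return analysisResult.Description.Captions[0];
    }
}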

Take note of the <COMPUTER_VISION_KEY> and <AZURE_LOCATION> placeholders and replace them with the API Key and Root we got in the previous step.
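If you'd rather not hard-code the key in source, one option is to read both values from the environment. The following is only a minimal sketch: the class name ComputerVisionSettings and the environment variable name used in the usage note are my own assumptions, not part of the original project, and the Root format is simply the one shown in the class above.

```csharp
using System;

// Hypothetical helper (not part of the original project) for loading settings.
public static class ComputerVisionSettings
{
    // Builds the Root URL from an Azure location such as "westeurope",
    // following the format used in ComputerVisionService.
    public static string BuildRoot(string azureLocation)
    {
        if (string.IsNullOrWhiteSpace(azureLocation))
            throw new ArgumentException("Azure location is required.", nameof(azureLocation));

        return $"https://{azureLocation}.api.cognitive.microsoft.com/vision/v1.0";
    }

    // Reads a required environment variable and fails fast when it is missing.
    public static string GetRequired(string variableName)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Environment variable '{variableName}' is not set.");

        return value;
    }
}
```

With something like this in place, the key could be loaded with ComputerVisionSettings.GetRequired("COMPUTER_VISION_KEY") (an assumed variable name) instead of a string literal.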

Calling the service

Now that we've created the service, we'll call it from our bot code. We'll ask the user to send a picture using a Prompt Dialog and let Computer Vision work its magic.


[LuisIntent("OrderFood")]
public async Task OrderFood(IDialogContext context, LuisResult result)
{
    // Ask the user for one or more pictures; the resume handler runs once they arrive.
    PromptDialog.Attachment(context, ResumeAfterAttachmentClarification, "What should your order look like?");
}

private async Task ResumeAfterAttachmentClarification(IDialogContext context, IAwaitable<IEnumerable<Attachment>> result)
{
    var descriptions = new List<string>();

    var orders = await result;
    foreach (var order in orders)
    {
        // ContentUrl must be publicly reachable for Computer Vision to fetch the image.
        var caption = await ComputerVisionService.DescribeAsync(order.ContentUrl);
        descriptions.Add(caption.Text);
    }

    // Reply with all captions joined into one sentence.
    await context.PostAsync($"I think your order should have _{string.Join(", ", descriptions)}_, I'll see what I can do!");
}

As you can see in this piece of code, we pass the attachment's ContentUrl property to the Cognitive Service as the image URL. Because this needs to be a publicly reachable URL, it doesn't work when the bot runs locally. I published my bot to Azure and used the test panel in the Bot Framework portal to check that everything worked as expected.
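One way around the public-URL limitation is to download the attachment inside the bot and send the raw bytes to Computer Vision as application/octet-stream. Below is a minimal sketch that talks to the REST describe endpoint directly with HttpClient. The class and method names are my own, it returns the raw JSON response rather than a parsed Caption, and it assumes the attachment can be downloaded without extra authentication, which is not true for every channel.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original project).
public static class RawImageDescriber
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds a POST to {root}/describe carrying the image bytes in the body.
    public static HttpRequestMessage BuildDescribeRequest(string root, string key, byte[] imageBytes)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post, $"{root.TrimEnd('/')}/describe?maxCandidates=1");

        // The subscription key travels in this header.
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);

        request.Content = new ByteArrayContent(imageBytes);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return request;
    }

    // Downloads the attachment, uploads its bytes, and returns the JSON response body.
    public static async Task<string> DescribeRawAsync(string root, string key, string contentUrl)
    {
        byte[] imageBytes = await Http.GetByteArrayAsync(contentUrl);

        using (var request = BuildDescribeRequest(root, key, imageBytes))
        using (var response = await Http.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The returned JSON contains a description object with a captions array, so the caption text still has to be read from it with the JSON library of your choice.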

As you can see, the bot was able to detect what kind of food it was and even recognized the topping on the pizza! That's pretty neat and really shows the power of Bot Framework in combination with AI.
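Recognition won't always be this clean, so it may be worth checking how confident the service is before placing an order. The sketch below picks the most confident caption above a threshold and returns null when nothing qualifies. CaptionCandidate is a hypothetical stand-in for the SDK's Caption type (text plus confidence), and the 0.5 threshold is an arbitrary assumption rather than a recommended value.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the SDK's Caption contract.
public class CaptionCandidate
{
    public string Text { get; set; }
    public double Confidence { get; set; }
}

public static class CaptionPicker
{
    // Returns the text of the most confident caption at or above minConfidence,
    // or null when there are no captions or none is confident enough.
    public static string PickText(IEnumerable<CaptionCandidate> captions, double minConfidence = 0.5)
    {
        if (captions == null)
            return null;

        var best = captions
            .Where(c => c != null && c.Confidence >= minConfidence)
            .OrderByDescending(c => c.Confidence)
            .FirstOrDefault();

        return best?.Text;
    }
}
```

When PickText returns null, the bot could ask the user for another picture instead of guessing.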

Conclusion

Although this demo is incredibly simple, the potential is great. Think about using Custom Vision to recognize domain-specific objects instead of the generic ones from Computer Vision. The combination of Cognitive Services and Bot Framework is incredibly powerful, yet really simple to set up. You can use this feature to make the bot feel more human and to guide the user through the conversation more easily. We'll expand on this in the future with more Cognitive Services. Let me know what you think in the comments or on Twitter.

Want to learn more about this subject?
Join my “Weaving Cognitive and Azure Services” presentation at TechDaysNL 2017!

3 comments On Image Recognition with Computer Vision and Bot Framework

  • Hello, I read your article and it's very useful. I have another question and wonder if you have run into it. I am about to work on a project with Bot Framework that will include bar code recognition within the bot. Do you know how to solve this with Microsoft Cognitive Services? Looking forward to your reply, and thanks in advance.

  • How would this work when running locally, then? Would the Content URL change?

    • Good question! When running locally, the images from the bot have a URL that points to localhost, which Computer Vision cannot reach, so you can't use the Content URL. Instead, send the raw image binary to the Computer Vision API as an application/octet-stream body; that works both externally and locally. Good luck!
