"We can use the Computer Vision API to prove to our clients the reliability of the data, so they can be confident making important business decisions based on that information."
- Leendert de Voogd, CEO of Vigiglobe
In the previous chapter, you were briefly introduced to Microsoft Cognitive Services. Throughout this chapter, we will dive into the image-based APIs of the Vision category. We will learn how to perform image analysis. Moving on, we will dive deeper into the Face API, which we looked at briefly in the previous chapter, and learn how to identify people. Next, we will learn how to use the Face API to recognize emotions in faces. Finally, we will learn about the different ways to moderate content.
In this chapter, we will cover the following topics:

- Performing image analysis with the Computer Vision API
- Identifying people using the Face API
- Recognizing emotions in faces
- Moderating content
The Computer Vision API allows us to process an image and retrieve information about it. It relies on advanced algorithms to analyze the content of the image in different ways, based on our needs.
Throughout this section, we will learn how to take advantage of this API. We will look at the different ways to analyze an image through standalone examples. Some of the features we will cover will also be incorporated into our end-to-end application in a later chapter.
Calling any of the APIs will return one of the following response codes:
| Code | Description |
| --- | --- |
| 200 | Information about the extracted features, in JSON format. |
| 400 | Typically, this means bad request. It may be an invalid image URL, an image that is too small or too large, an invalid image format, or any other error to do with the request body. |
| 415 | Unsupported media type. |
| 500 | Possible errors include a failure to process the image, image processing timing out, or an internal server error. |
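With the client library we will install shortly, these error responses typically surface as exceptions rather than raw status codes. As a hedged sketch (the exact exception type and its properties may vary between client versions), handling could look like this:

```csharp
try
{
    AnalysisResult result = await _visionClient.AnalyzeImageAsync(imageStream, features);
}
catch (ClientException ex)
{
    // The service's error code and message are carried on the exception.
    Debug.WriteLine("Vision API error: {0} - {1}", ex.Error.Code, ex.Error.Message);
}
```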
Before we go into the specifics of the API, we need to create an example project for this chapter. This project will contain all of the examples, which will not be put into the end-to-end application at this stage:

1. If you have not already done so, sign up for an API key for Computer Vision by visiting https://portal.azure.com.
2. Find the Microsoft.ProjectOxford.Vision package and install it into the project, as shown in the following screenshot.
3. Create the following UserControls files and add them to the View folder:
    - CelebrityView.xaml
    - DescriptionView.xaml
    - ImageAnalysisView.xaml
    - OcrView.xaml
    - ThumbnailView.xaml
4. Add the ViewModel instances from the following list into the ViewModel folder:
    - CelebrityViewModel.cs
    - DescriptionViewModel.cs
    - ImageAnalysisViewModel.cs
    - OcrViewModel.cs
    - ThumbnailViewModel.cs
5. Go through the newly created ViewModel instances and make sure that all classes are public.
We will switch between the different views using a TabControl tag. Open the MainView.xaml file and add the following in the precreated Grid tag:
<TabControl x:Name="tabControl" HorizontalAlignment="Left" VerticalAlignment="Top" Width="810" Height="520">
    <TabItem Header="Analysis" Width="100">
        <controls:ImageAnalysisView />
    </TabItem>
    <TabItem Header="Description" Width="100">
        <controls:DescriptionView />
    </TabItem>
    <TabItem Header="Celebs" Width="100">
        <controls:CelebrityView />
    </TabItem>
    <TabItem Header="OCR" Width="100">
        <controls:OcrView />
    </TabItem>
    <TabItem Header="Thumbnail" Width="100">
        <controls:ThumbnailView />
    </TabItem>
</TabControl>
This will add a tab bar at the top of the application that will allow you to navigate between the different views.
Next, we will add the properties and members required in our MainViewModel.cs file.
The following is the variable used to access the Computer Vision API:
private IVisionServiceClient _visionClient;
The following code declares a private variable holding the CelebrityViewModel object. It also declares the public property that we use to access the ViewModel in our View:
private CelebrityViewModel _celebrityVm;
public CelebrityViewModel CelebrityVm
{
    get { return _celebrityVm; }
    set
    {
        _celebrityVm = value;
        RaisePropertyChangedEvent("CelebrityVm");
    }
}
Following the same pattern, add properties for the rest of the created ViewModel instances.
With all the properties in place, create the ViewModel instances in our constructor using the following code:
public MainViewModel()
{
    _visionClient = new VisionServiceClient("VISION_API_KEY_HERE", "ROOT_URI");

    CelebrityVm = new CelebrityViewModel(_visionClient);
    DescriptionVm = new DescriptionViewModel(_visionClient);
    ImageAnalysisVm = new ImageAnalysisViewModel(_visionClient);
    OcrVm = new OcrViewModel(_visionClient);
    ThumbnailVm = new ThumbnailViewModel(_visionClient);
}
Note how we first create the VisionServiceClient object with the API key that we signed up for earlier and the root URI, as described in Chapter 1, Getting Started with Microsoft Cognitive Services. This is then injected into all the ViewModel instances to be used there.
This should now compile and present you with the application shown in the following screenshot:
We start enabling generic image analysis by adding a UI to the ImageAnalysisView.xaml file. All the Computer Vision example UIs will be built in the same manner.
The UI should have two columns, as shown in the following code:
<Grid.ColumnDefinitions>
    <ColumnDefinition Width="*" />
    <ColumnDefinition Width="*" />
</Grid.ColumnDefinitions>
The first one will contain the image selection, while the second one will display our results.
In the left-hand column, we create a vertically oriented StackPanel element. To this, we add a label and a ListBox element. The list box will display the list of visual features that we can add to our analysis query. Note how we have a SelectionChanged event hooked up in the ListBox element in the following code. This will be added in the code-behind, and will be covered shortly:
<StackPanel Orientation="Vertical" Grid.Column="0">
    <TextBlock Text="Visual Features:" FontWeight="Bold" FontSize="15" Margin="5, 5" Height="20" />
    <ListBox x:Name="VisualFeatures" ItemsSource="{Binding ImageAnalysisVm.Features}" SelectionMode="Multiple" Height="150" Margin="5, 0, 5, 0" SelectionChanged="VisualFeatures_SelectionChanged" />
The list box allows multiple items to be selected, and the selected items will be gathered in the ViewModel.
In the same stack panel, we also add a button element and an image element. These will allow us to browse for an image, show it, and analyze it. Both the Button command and the image source are bound to the corresponding properties in the ViewModel, as shown in the following code:
    <Button Content="Browse and analyze" Command="{Binding ImageAnalysisVm.BrowseAndAnalyzeImageCommand}" Margin="5, 10, 5, 10" Height="20" Width="120" HorizontalAlignment="Right" />
    <Image Stretch="Uniform" Source="{Binding ImageAnalysisVm.ImageSource}" Height="280" Width="395" />
</StackPanel>
We also add another vertically oriented stack panel. This will be placed in the right-hand column. It contains a title label, as well as a textbox, bound to the analysis result in our ViewModel, as shown in the following code:
<StackPanel Orientation="Vertical" Grid.Column="1">
    <TextBlock Text="Analysis Results:" FontWeight="Bold" FontSize="15" Margin="5, 5" Height="20" />
    <TextBox Text="{Binding ImageAnalysisVm.AnalysisResult}" Margin="5, 0, 5, 5" Height="485" />
</StackPanel>
Next, we want to add our SelectionChanged event handler to our code-behind. Open the ImageAnalysisView.xaml.cs file and add the following:
private void VisualFeatures_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    var vm = (MainViewModel)DataContext;
    vm.ImageAnalysisVm.SelectedFeatures.Clear();
The first line of the function gives us the current DataContext, which is the MainViewModel class. We access the ImageAnalysisVm property, which is our ViewModel, and clear the selected visual features list.
From there, we loop through the selected items from our list box. All items will be added to the SelectedFeatures list in our ViewModel:
    foreach (VisualFeature feature in VisualFeatures.SelectedItems)
    {
        vm.ImageAnalysisVm.SelectedFeatures.Add(feature);
    }
}
Open the ImageAnalysisViewModel.cs file. Make sure that the class inherits from the ObservableObject class.
Declare a private variable, as follows:
private IVisionServiceClient _visionClient;
This will be used to access the Computer Vision API, and it is initialized through the constructor.
Next, we declare a private variable and the corresponding property for our list of visual features, as follows:
private List<VisualFeature> _features = new List<VisualFeature>();
public List<VisualFeature> Features
{
    get { return _features; }
    set
    {
        _features = value;
        RaisePropertyChangedEvent("Features");
    }
}
In a similar manner, create a BitmapImage variable and property called ImageSource. Create a list of VisualFeature types called SelectedFeatures and a string called AnalysisResult.
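These members all follow the same RaisePropertyChangedEvent pattern shown for Features. As a sketch, the ImageSource property might look like this:

```csharp
private BitmapImage _imageSource;
public BitmapImage ImageSource
{
    get { return _imageSource; }
    set
    {
        _imageSource = value;
        RaisePropertyChangedEvent("ImageSource");
    }
}
```

SelectedFeatures and AnalysisResult are declared the same way, with List<VisualFeature> and string as their respective types.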
We also need to declare the property for our button, as follows:
public ICommand BrowseAndAnalyzeImageCommand { get; private set; }
With that in place, we create our constructor, as follows:
public ImageAnalysisViewModel(IVisionServiceClient visionClient)
{
    _visionClient = visionClient;
    Initialize();
}
The constructor takes one parameter, the IVisionServiceClient object, which we created in our MainViewModel file. It assigns that parameter to the variable that we created earlier. Then we call an Initialize function, as follows:
private void Initialize()
{
    Features = Enum.GetValues(typeof(VisualFeature))
        .Cast<VisualFeature>().ToList();

    BrowseAndAnalyzeImageCommand = new DelegateCommand(BrowseAndAnalyze);
}
In the Initialize function, we fetch all the values of the VisualFeature enum. These values are added to the features list, which is displayed in the UI. We also create the command for our button, so now we need to create the corresponding action, as follows:
private async void BrowseAndAnalyze(object obj)
{
    var openDialog = new Microsoft.Win32.OpenFileDialog();

    openDialog.Filter = "JPEG Image(*.jpg)|*.jpg";
    bool? result = openDialog.ShowDialog();

    if (result != true) return;

    string filePath = openDialog.FileName;
    Uri fileUri = new Uri(filePath);

    BitmapImage image = new BitmapImage(fileUri);
    image.CacheOption = BitmapCacheOption.None;
    image.UriSource = fileUri;

    ImageSource = image;
The first lines of the preceding code are similar to what we did in Chapter 1, Getting Started with Microsoft Cognitive Services. We open a file browser and get the selected image.
With an image selected, we run an analysis on it, as follows:
try
{
    using (Stream fileStream = File.OpenRead(filePath))
    {
        AnalysisResult analysisResult = await _visionClient.AnalyzeImageAsync(fileStream, SelectedFeatures);
We call the AnalyzeImageAsync function of our _visionClient. This function has four overloads, all of which are quite similar. In our case, we pass in the image as a Stream type and the SelectedFeatures list, containing the VisualFeature values to analyze.
The request parameters are as follows:
| Parameter | Description |
| --- | --- |
| Image (required) | The image to analyze, passed either as a raw image binary or as a URL. The supported formats are JPEG, PNG, GIF, and BMP; the file size must be less than 4 MB and the dimensions at least 50 x 50 pixels. |
| Visual features (optional) | A list indicating the visual feature types to return. It can include categories, tags, descriptions, faces, image types, color, and whether or not it is adult content. |
| Details (optional) | A list indicating which domain-specific details to return. |
The response to this request is an AnalysisResult object.
We then check whether the result is null. If it is not, we call a function to parse it and assign the result to our AnalysisResult string, as follows:
        if (analysisResult != null)
            AnalysisResult = PrintAnalysisResult(analysisResult);
Remember to close the try clause and finish the method with the corresponding catch clause.
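A minimal way to finish the method could be the following sketch; what you do with the exception is up to you:

```csharp
    }   // closes the using statement
}
catch (Exception ex)
{
    // Surface the error in the UI instead of crashing.
    AnalysisResult = ex.Message;
}
```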
The AnalysisResult object contains data according to the visual features requested in the API call. The data in the AnalysisResult object is described in the following table:
| Visual feature | Description |
| --- | --- |
| Categories | Images are categorized according to a defined taxonomy. This includes everything from animals, buildings, and outdoors, to people. |
| Tags | Images are tagged with a list of words related to the content. |
| Description | This contains a full sentence describing the image. |
| Faces | This detects faces in images and contains face coordinates, gender, and age. |
| ImageType | This detects whether an image is clipart or a line drawing. |
| Color | This contains information about dominant colors, accent colors, and whether or not the image is in black and white. |
| Adult | This detects whether an image is pornographic in nature and whether or not it is racy. |
To retrieve data, for example the image description, you can use the following:
if (analysisResult.Description != null)
{
    result.AppendFormat("Description: {0}\n", analysisResult.Description.Captions[0].Text);
    result.AppendFormat("Probability: {0}\n", analysisResult.Description.Captions[0].Confidence);
}
A successful call would present us with the following result:
Sometimes, you may only be interested in the image description. In such cases, it is wasteful to ask for the kind of full analysis that we have just done. By calling the following function, you will get an array of descriptions:
AnalysisResult descriptionResult = await _visionClient.DescribeAsync(ImageUrl, NumberOfDescriptions);
In this call, we have specified a URL for the image and the number of descriptions to return. The first parameter must always be included, but it may be an image upload instead of a URL. The second parameter is optional, and in cases where it is not provided, it defaults to one.
A successful query results in an AnalysisResult object, the same type as described in the preceding code. In this case, it will only contain the request ID, image metadata, and an array of captions. Each caption contains an image description and the confidence of that description being correct.
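To illustrate, reading those captions out of the result might look like the following sketch (DescriptionResult is an assumed string property for the output):

```csharp
if (descriptionResult?.Description?.Captions != null)
{
    StringBuilder sb = new StringBuilder();

    foreach (Caption caption in descriptionResult.Description.Captions)
    {
        // Each caption pairs a description with its confidence.
        sb.AppendFormat("{0} (confidence: {1})\n", caption.Text, caption.Confidence);
    }

    DescriptionResult = sb.ToString();
}
```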
We will add this form of image analysis to our smart-house application in a later chapter.
One of the features of the Computer Vision API is the ability to recognize domain-specific content. At the time of writing, the API only supports celebrity recognition, where it is able to recognize around 200,000 celebrities.
For this example, we choose to use an image from the internet. The UI will then need a textbox to input the URL. It will need a button to load the image and perform the domain analysis. There should be an image element to see the image and a textbox to output the result.
The corresponding ViewModel should have two string properties for the URL and the analysis result. It should have a BitmapImage property for the image and an ICommand property for our button.
Add a private variable of the IVisionServiceClient type at the start of the ViewModel, as follows:
private IVisionServiceClient _visionClient;
This should be assigned in the constructor, which will take a parameter of the IVisionServiceClient type.
As we need a URL to fetch an image from the internet, we need to initialize the ICommand property with both an action and a predicate. The latter checks whether or not the URL property is set, as shown in the following code:
public CelebrityViewModel(IVisionServiceClient visionClient)
{
    _visionClient = visionClient;
    LoadAndFindCelebrityCommand = new DelegateCommand(LoadAndFindCelebrity, CanFindCelebrity);
}
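The CanFindCelebrity predicate is not shown above; since it only needs to check whether the URL property is set, a minimal sketch could be:

```csharp
private bool CanFindCelebrity(object obj)
{
    // Only allow the command to execute once a URL has been entered.
    return !string.IsNullOrEmpty(ImageUrl);
}
```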
The LoadAndFindCelebrity action creates a Uri from the given URL. Using this, it creates a BitmapImage and assigns it to ImageSource, the BitmapImage property, as shown in the following code. The image should be visible in the UI:
private async void LoadAndFindCelebrity(object obj)
{
    Uri fileUri = new Uri(ImageUrl);
    BitmapImage image = new BitmapImage(fileUri);

    image.CacheOption = BitmapCacheOption.None;
    image.UriSource = fileUri;

    ImageSource = image;
We call the AnalyzeImageInDomainAsync method with the given URL, as shown in the following code. The first parameter we pass in is the image URL. Alternatively, this could have been an image opened as a Stream type:
    try
    {
        AnalysisInDomainResult celebrityResult = await _visionClient.AnalyzeImageInDomainAsync(ImageUrl, "celebrities");

        if (celebrityResult != null)
            Celebrity = celebrityResult.Result.ToString();
    }
The second parameter is the domain model name, in string format. As an alternative, we could have used a specific Model object, which can be retrieved by calling the following:
await _visionClient.ListModelsAsync();
This would return an array of Models, which we could display and select from. As there is only one model available at this time, there is no point in doing so.
The result from AnalyzeImageInDomainAsync is an object of the AnalysisInDomainResult type. This object contains the request ID, metadata of the image, and the result, containing an array of celebrities. In our case, we simply output the entire result array. Each item in this array contains the name of the celebrity, the confidence of the match, and the face rectangle in the image. Do try it in the example code provided.
For some tasks, optical character recognition (OCR) can be very useful. Say that you took a photo of a receipt. Using OCR, you can read the amount from the photo itself and have it automatically added to accounting.
OCR will detect text in images and extract machine-readable characters. It will automatically detect language. Optionally, the API will detect image orientation and correct it before reading the text.
To specify a language, you need to use the BCP-47 language code. At the time of writing, the following languages are supported: simplified Chinese, traditional Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Arabic, Romanian, Cyrillic Serbian, Latin Serbian, and Slovak.
In the code example, the UI will have an image element. It will also have a button to load the image and detect text. The result will be printed to a textbox element.
The ViewModel will need a string property for the result, a BitmapImage property for the image, and an ICommand property for the button.
Add a private variable to the ViewModel for the Computer Vision API, as follows:
private IVisionServiceClient _visionClient;
The constructor should have one parameter of the IVisionServiceClient type, which should be assigned to the preceding variable.
Create a function to act as a command for our button. Call it BrowseAndAnalyze and have it accept object as the parameter. Then, open a file browser and find an image to analyze. With the image selected, we run the OCR analysis, as follows:
using (Stream fileStream = File.OpenRead(filePath))
{
    OcrResults analysisResult = await _visionClient.RecognizeTextAsync(fileStream);

    if (analysisResult != null)
        OcrResult = PrintOcrResult(analysisResult);
}
With the image opened as a Stream type, we call the RecognizeTextAsync method. In this case, we pass in the image as a Stream type, but we could just as easily have passed in a URL to an image.
Two more parameters may be specified in this call. First, you can specify the language of the text. The default is unknown, which means that the API will try to detect the language automatically. Second, you can specify whether or not the API should detect the orientation of the image. The default is set to false.
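For instance, to force English and ask the API to correct the orientation before reading, the call might look like the following sketch (the parameter names are assumptions based on the SDK at the time of writing):

```csharp
OcrResults analysisResult = await _visionClient.RecognizeTextAsync(fileStream, languageCode: "en", detectOrientation: true);
```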
If the call succeeds, it returns data in the form of an OcrResults object. We send this result to a function, PrintOcrResult, where we parse it and print the text, as follows:
private string PrintOcrResult(OcrResults ocrResult)
{
    StringBuilder result = new StringBuilder();

    result.AppendFormat("Language is {0}\n", ocrResult.Language);
    result.Append("The words are:\n");
First, we create a StringBuilder object, which will hold all the text. The first content we add to it is the language of the text in the image, as follows:
    foreach (var region in ocrResult.Regions)
    {
        foreach (var line in region.Lines)
        {
            foreach (var text in line.Words)
            {
                result.AppendFormat("{0} ", text.Text);
            }
            result.Append("\n");
        }
        result.Append("\n");
    }
The result contains an array of regions, exposed through the Regions property. Each region represents an area of recognized text and contains an array of lines, through the Lines property. Each line, in turn, contains an array of words, through the Words property, where each item represents a single recognized word.
With all the words appended to the StringBuilder object, we return it as a string. This will then be printed in the UI, as shown in the following screenshot:
The result also contains the orientation and angle of the text. Combining this with the bounding box, also included, you can mark each word in the original image.
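Each region, line, and word also carries its own bounding box. A hedged sketch of listing word positions (assuming the Rectangle member exposes Left, Top, Width, and Height) might be:

```csharp
foreach (var region in ocrResult.Regions)
{
    foreach (var line in region.Lines)
    {
        foreach (var word in line.Words)
        {
            // The rectangle is given in pixel coordinates of the original image.
            var box = word.Rectangle;
            Debug.WriteLine("'{0}' at ({1}, {2}), size {3} x {4}", word.Text, box.Left, box.Top, box.Width, box.Height);
        }
    }
}
```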
In today's world, we, as developers, have to consider different screen sizes when displaying images. The Computer Vision API offers some help with this by providing the ability to generate thumbnails.
Thumbnail generation, in itself, is not that big a deal. What makes the API clever is that it analyzes the image and determines the region of interest.
It will also generate smart cropping coordinates. This means that if the specified aspect ratio differs from the original, it will crop the image, with a focus on the interesting regions.
In the example code, the UI consists of two image elements and one button. The first image is the image in its original size. The second is for the generated thumbnail, which we specify to be 250 x 250 pixels in size.
The ViewModel will need the corresponding properties: two BitmapImage properties to act as image sources, and one ICommand property for our button command.
Define a private variable in the ViewModel, as follows:
private IVisionServiceClient _visionClient;
This will be our API access point. The constructor should accept an IVisionServiceClient object, which should be assigned to the preceding variable.
For the ICommand property, we create a function, BrowseAndAnalyze, accepting an object parameter. We do not need to check whether we can execute the command; we will browse for an image each time.
In the BrowseAndAnalyze function, we open a file dialog and select an image. When we have the image file path, we can generate our thumbnail, as follows:
using (Stream fileStream = File.OpenRead(filePath))
{
    byte[] thumbnailResult = await _visionClient.GetThumbnailAsync(fileStream, 250, 250);

    if (thumbnailResult != null && thumbnailResult.Length != 0)
        CreateThumbnail(thumbnailResult);
}
We open the image file so that we have a Stream type. This stream is the first parameter in our call to the GetThumbnailAsync method. The next two parameters indicate the width and height that we want for our thumbnail.
By default, the API call will use smart cropping, so we do not have to specify it. If we have a case where we do not want smart cropping, we could add a bool variable as the fourth parameter.
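Such a call, with smart cropping disabled, might look like the following sketch (the parameter name is an assumption based on the SDK at the time of writing):

```csharp
byte[] thumbnailResult = await _visionClient.GetThumbnailAsync(fileStream, 250, 250, smartCropping: false);
```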
If the call succeeds, we get a byte array back. This is the image data. If it contains data, we pass it on to a new function, CreateThumbnail, to create a BitmapImage object from it, as follows:
private void CreateThumbnail(byte[] thumbnailResult)
{
    try
    {
        MemoryStream ms = new MemoryStream(thumbnailResult);
        ms.Seek(0, SeekOrigin.Begin);
To create an image from a byte array, we create a MemoryStream object from it. We make sure that we start at the beginning of the array.
Next, we create a BitmapImage object and begin to initialize it. We specify the CacheOption and set the StreamSource to the MemoryStream we created earlier. Finally, we end the BitmapImage initialization and assign the image to our Thumbnail property, as shown in the following code:
        BitmapImage image = new BitmapImage();
        image.BeginInit();
        image.CacheOption = BitmapCacheOption.None;
        image.StreamSource = ms;
        image.EndInit();

        Thumbnail = image;
Close up the try clause and add the corresponding catch clause. You should now be able to generate thumbnails.