C H A P T E R  3

Depth Image Processing

The production of three-dimensional data is the primary function of Kinect. It is up to you to create exciting experiences with the data. A precondition to building a Kinect application is an understanding of the output of the hardware: not just the raw meaning of the 1s and 0s, but what that data actually represents. Image-processing techniques exist today that detect the shapes and contours of objects within an image. The Kinect SDK uses image processing to track user movements in the skeleton tracking engine. Depth image processing can also detect non-human objects such as a chair or coffee cup. There are numerous commercial labs and universities actively studying techniques to perform this level of object detection from depth images. There are so many different uses and fields of study around depth input that it would be impossible to cover them all, or to cover any one topic with considerable depth, in this book, much less a single chapter. The goal of this chapter is to detail the depth data down to the meaning of each bit, and to introduce you to the possible impact that adding just one additional dimension can have on an application. In this chapter, we discuss some basic concepts of depth image processing, and simple techniques for using this data in your applications.

Seeing Through the Eyes of the Kinect

Kinect is different from all other input devices, because it provides a third dimension. It does this using an infrared emitter and camera. Unlike other Kinect SDKs such as OpenNI, or libfreenect, the Microsoft SDK does not provide raw access to the IR stream. Instead, the Kinect SDK processes the IR data returned by the infrared camera to produce a depth image. Depth image data comes from a DepthImageFrame, which is produced by the DepthImageStream.

Working with the DepthImageStream is similar to working with the ColorImageStream; both share the same parent class, ImageStream. We create images from a frame of depth data just as we did with the color stream data. To begin seeing depth stream images, follow these steps, which by now should look familiar; they are the same ones used in the previous chapter to work with the color stream.

  1. Create a new WPF Application project.
  2. Add a reference to Microsoft.Kinect.dll.
  3. Add an Image element to MainWindow.xaml and name it “DepthImage”.
  4. Add the necessary code to detect and initialize a KinectSensor object. Refer to Chapter 2 as needed.
  5. Update the code that initializes the KinectSensor object so that it matches Listing 3-1.

    Listing 3-1. Initializing the DepthStream

    this._KinectDevice.DepthStream.Enable();
    this._KinectDevice.DepthFrameReady += KinectDevice_DepthFrameReady;
  6. Add the DepthFrameReady event handler code, as shown in Listing 3-2. To keep the code listing brief, we are not using a WriteableBitmap to create the depth images. We leave this as a refactoring exercise for you to undertake. Refer to Listing 2-5 of Chapter 2 as needed.

    Listing 3-2. DepthFrameReady Event Handler

    using(DepthImageFrame frame = e.OpenDepthImageFrame())
    {
        if(frame != null)
        {
            short[] pixelData = new short[frame.PixelDataLength];
            frame.CopyPixelDataTo(pixelData);
            int stride        = frame.Width * frame.BytesPerPixel;
            DepthImage.Source = BitmapSource.Create(frame.Width, frame.Height, 96, 96,
                                                    PixelFormats.Gray16, null,
                                                  pixelData, stride);
        }
    }
  7. Run the application!

When Kinect has a new depth image frame available for processing, the KinectSensor fires the DepthFrameReady event. Our event handler simply takes the image data and creates a bitmap, which is then displayed in the UI window. The screenshot in Figure 3-1 is an example of the depth stream image. Objects near Kinect are a dark shade of gray or black. The farther an object is from Kinect, the lighter the gray.


Figure 3-1. Raw depth image frame

Measuring Depth

The IR or depth camera has a field of view just like any other camera. The field of view of Kinect is limited, as illustrated in Figure 3-2. The original purpose of Kinect was to play video games within the confines of a game room or living room space. Kinect's normal depth vision ranges from around two and a half feet (800mm) to just over 13 feet (4000mm). However, a recommended usage range is 3 feet to 12 feet, as the reliability of the depth values degrades at the edges of the field of view.


Figure 3-2. Kinect field of view

Like any camera, the field of view of the depth camera is pyramid shaped. Objects farther away from the camera have a greater lateral range than objects nearer to Kinect. This means that height and width pixel dimensions, such as 640×480, do not correspond with a physical location in the camera's field of view. The depth value of each pixel, however, does map to a physical distance in the field of view. Each pixel represented in a depth frame is 16 bits, making the BytesPerPixel property of each frame a value of two. The depth value of each pixel is only 13 of the 16 bits, as shown in Figure 3-3.


Figure 3-3. Layout of the depth bits

Getting the distance of each pixel is easy, but not obvious. It requires some bit manipulation, which sadly we as developers do not get to do much of these days. It is quite possible that some developers have never used or even heard of bitwise operators. This is unfortunate, because bit manipulation is fun and can be an art. At this point, you are likely thinking something to the effect of, "this guy's a nerd," and you'd be right. The appendix includes further instruction and examples of bit manipulation and math.

As Figure 3-3 shows, the depth value is stored in bits 3 to 15. To get the depth value into a workable form, we have to shift the bits to the right to remove the player index bits. We discuss the significance of the player index bits later. Listing 3-3 shows sample code to get the depth value of a pixel. Refer to Appendix A for a thorough explanation of the bitwise operations used in Listing 3-3. In the listing, the pixelData variable is assumed to be an array of short values originating from the depth frame. The pixelIndex variable is calculated based on the position of the desired pixel. The Kinect for Windows SDK defines a constant on the DepthImageFrame class named PlayerIndexBitmaskWidth, which specifies the number of bits to shift right to get the depth value. Applications should use this constant instead of hard-coded literals, as the number of bits reserved for players may increase in future releases of the Kinect hardware and the SDK.

Listing 3-3. Bit Manipulation to Get Depth

int pixelIndex = pixelX + (pixelY * frame.Width);
int depth = pixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;

An easy way to see the depth data is to display the actual numbers. Let us update our code to output the depth value of a pixel at a particular location. This demonstration uses the position of the mouse pointer when the mouse is clicked on the depth image. The first step is to create a place to display the depth value. Update MainWindow.xaml to look like Listing 3-4.

Listing 3-4. New TextBlock to Display Depth Values

<Window x:Class=" BeginningKinect.Chapter3.DepthImage.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="600" Width="800">
             
    <Grid>
        <StackPanel>
            <TextBlock x:Name="PixelDepth" FontSize="48" HorizontalAlignment="Left"/>
            <Image x:Name="DepthImage" Width="640" Height="480"/>
        </StackPanel>
    </Grid>
</Window>

Listing 3-5 shows the code for the mouse-up event handler. Before adding this code, there are a couple of changes to note. The code in Listing 3-5 assumes the project has been refactored to use a WriteableBitmap. The code changes specific to this demonstration start by creating a private member variable named _LastDepthFrame. In the KinectDevice_DepthFrameReady event handler, set the value of the _LastDepthFrame member variable to the current frame each time the DepthFrameReady event fires. Because we need to keep a reference to the last depth frame, the event handler code does not immediately dispose of the frame object. Next, subscribe to the MouseLeftButtonUp event on the DepthImage element. When the user clicks the depth image, the DepthImage_MouseLeftButtonUp event handler executes, which locates the correct pixel by the mouse coordinates. The last step is to display the value in the TextBlock named PixelDepth created in Listing 3-4.

Listing 3-5. Response to a Mouse Click

private void KinectDevice_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{            
    if(this._LastDepthFrame != null)
    {
        this._LastDepthFrame.Dispose();
        this._LastDepthFrame = null;
    }

    this._LastDepthFrame = e.OpenDepthImageFrame();
             
    if(this._LastDepthFrame != null)
    {
        this._LastDepthFrame.CopyPixelDataTo(this._DepthImagePixelData);
        this._RawDepthImage.WritePixels(this._RawDepthImageRect, this._DepthImagePixelData,
                                        this._RawDepthImageStride, 0);
    }
}


private void DepthImage_MouseLeftButtonUp(object sender, MouseButtonEventArgs e)
{
    Point p = e.GetPosition(DepthImage);
             
    if(this._DepthImagePixelData != null && this._DepthImagePixelData.Length > 0)
    {
        int pixelIndex  = (int) (p.X + ((int) p.Y * this._LastDepthFrame.Width));
        int depth       = this._DepthImagePixelData[pixelIndex] >>
                          DepthImageFrame.PlayerIndexBitmaskWidth;
        int depthInches = (int) (depth * 0.0393700787);
        int depthFt     = depthInches / 12;
        depthInches     = depthInches % 12;

        PixelDepth.Text = string.Format("{0}mm ~ {1}'{2}\"", depth, depthFt, depthInches);
    }
}

It is important to point out a few particulars with this code. Notice that the Width and Height properties of the Image element are hard-coded (Listing 3-4). If these values are not hard-coded, the Image element naturally scales with the size of its parent container, which, in this case, is the application window. If the Image element's dimensions were sized differently from the depth frame dimensions, this code would return incorrect data or, more likely, throw an exception when the image is clicked, because the pixel array in the frame is a fixed size based on the DepthImageFormat value given to the Enable method of the DepthImageStream. If you let the image scale automatically, you then have to perform extra calculations to translate the mouse position to the depth frame dimensions. This type of scaling exercise is actually quite common, as we will see later in this chapter and the chapters that follow, but here we keep it simple and hard-code the output image size.

We calculate the pixel location within the byte array using the position of the mouse within the image and the size of the image. With the pixel's starting byte located, convert the depth value using the logic from Listing 3-3. For completeness, we display the depth in feet and inches in addition to millimeters. All of the local variables only exist to make the code more readable on these pages and do not materially affect the execution of the code.

Figure 3-4 shows the output produced by the code. The depth frame image displays on the screen and provides a point of reference for the user to target. In this screenshot, the mouse is positioned over the palm of the user's hand. On mouse click, the position of the mouse cursor is used to find the depth value of the pixel at that position. With the pixel located, it is easy to extract the depth value.


Figure 3-4. Displaying the depth value for a pixel

Note: A depth value of zero means that the Kinect was unable to determine the depth of the pixel. When processing depth data, treat zero depth values as a special case; in most instances, you will disregard them. Expect a zero depth value for any pixel where there is an object too close to the Kinect.
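
In loop form, the special case amounts to a guard like the following sketch, where pixelData is the short array copied from a depth frame as in the earlier listings:

//Sketch: skip pixels whose depth could not be determined before any further processing.
for(int i = 0; i < pixelData.Length; i++)
{
    int depth = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;

    if(depth == 0)
    {
        continue;   //Unknown depth; do not treat 0 as "extremely close"
    }

    //...process the usable depth value here...
}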

Enhanced Depth Images

Before going any further, we need to address the look of the depth image. It is naturally difficult to see. The shades of gray fall on the darker end of the spectrum. In fact, the images in Figures 3-1 and 3-4 had to be altered with an image-editing tool to be printable in the book! In the next set of exercises, we manipulate the image bits just as we did in the previous chapter. However, there will be a few differences, because as we know, the data for each pixel is different. Following that, we examine how we can colorize the depth images to provide even greater depth resolution.

Better Shades of Gray

The easiest way to improve the appearance of the depth image is to invert the bits. The color of each pixel is based on the depth value, which starts from zero. In the digital color spectrum, black is 0 and white is 65535 (in 16-bit grayscale). This means that most depths fall into the darker end of the spectrum. Additionally, do not forget that all undeterminable depths are set to zero. Inverting or complementing the bits shifts the bias towards the lighter end of the spectrum. A depth of zero is now white.

We keep the original depth image in the UI for comparison with the enhanced depth image. Update MainWindow.xaml to include a new StackPanel and Image element, as shown in Listing 3-6. Notice the adjustment to the window's size to ensure that both images are visible without having to resize the window.

Listing 3-6. Updated UI for New Depth Image

<Window x:Class=" BeginningKinect.Chapter3.DepthImage.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" "Height="600" Width="1280">
    <Grid>
        <StackPanel>
            <TextBlock x:Name="PixelDepth" FontSize="48" HorizontalAlignment="Left"/>
             
            <StackPanel Orientation="Horizontal">
                <Image x:Name="DepthImage" Width="640" Height="480"/>
                <Image x:Name="EnhancedDepthImage" Width="640" Height="480"/>
            </StackPanel>
        </StackPanel>
    </Grid>
</Window>

Listing 3-7 shows the code to flip the depth bits to create a better depth image. Add this method to your project code, and call it from the KinectDevice_DepthFrameReady event handler. The function of this code is simple: create a new pixel array and take the bitwise complement of each depth pixel. Also, notice that this method filters out some pixels by distance. Because we know depth data becomes inaccurate at the edges of the depth range, we exclude pixels outside of our threshold range from the inversion. In this example, any pixel with a depth greater than 10 feet (3048mm) or less than 4 feet (1220mm) is set to a fixed value of 0xFF, which renders as nearly black in the 16-bit grayscale.

Listing 3-7. A Light Shade of Gray Depth Image

private void CreateLighterShadesOfGray(DepthImageFrame depthFrame, short[] pixelData)
{
    int depth;
    int loThreshold      = 1220;
    int hiThreshold      = 3048;
    short[] enhPixelData = new short[depthFrame.Width * depthFrame.Height];

    for(int i = 0; i < pixelData.Length; i++)
    {
        depth = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;

        if(depth < loThreshold || depth > hiThreshold)
        {
            enhPixelData[i] = 0xFF;
        }
        else
        {
            enhPixelData[i] = (short) ~pixelData[i];
        }
    }

    EnhancedDepthImage.Source = BitmapSource.Create(depthFrame.Width, depthFrame.Height,
                                                    96, 96, PixelFormats.Gray16, null,
                                                    enhPixelData,
                                                    depthFrame.Width *
                                                    depthFrame.BytesPerPixel);
}

Note that a separate method is doing the image manipulation, whereas up to now all frame processing has been performed in the event handlers. Event handlers should contain as little code as possible and should delegate the work to other methods. There may be instances, mostly driven by performance considerations, where the processing work will have to be done in a separate thread. Having the code broken out into methods like this makes these types of changes easy and painless.
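
For reference, a minimal sketch of such a delegating handler (reusing the handler shape from Listing 3-2) simply copies the frame's pixel data and hands it off to the method from Listing 3-7:

private void KinectDevice_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using(DepthImageFrame frame = e.OpenDepthImageFrame())
    {
        if(frame != null)
        {
            //Copy the pixel data once, then delegate all image work to the processing method.
            short[] pixelData = new short[frame.PixelDataLength];
            frame.CopyPixelDataTo(pixelData);
            CreateLighterShadesOfGray(frame, pixelData);
        }
    }
}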

Figure 3-5 shows the application output. The two depth images are shown side by side for contrast. The image on the left is the natural depth image output, while the image on the right is produced by the code in Listing 3-7. Notice the distinct inversion of grays.


Figure 3-5. Lighter shades of gray

While this image is better, the range of grays is limited. We created a lighter shade of gray, not a better shade of gray. To create a richer set of grays, we expand the image from 16-bit grayscale to 32-bit color. The color gray occurs when the red, blue, and green components all have the same value, which gives us a range from 0 to 255. Zero is black, 255 is white, and everything else in between is a shade of gray. To make it easier to switch between the two processed depth images, we create a new version of the method, as shown in Listing 3-8.

Listing 3-8. The Depth Image In a Better Shade of Gray

private void CreateBetterShadesOfGray(DepthImageFrame depthFrame, short[] pixelData)
{
    int depth;
    int gray;
    int loThreshold     = 1220;
    int hiThreshold     = 3048;
    int bytesPerPixel   = 4;
    byte[] enhPixelData = new byte[depthFrame.Width * depthFrame.Height * bytesPerPixel];

    for(int i = 0, j = 0; i < pixelData.Length; i++, j += bytesPerPixel)
    {
        depth = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;

        if(depth < loThreshold || depth > hiThreshold)
        {
            gray = 0xFF;
        }
        else
        {
            gray = (255 * depth / 0xFFF);
        }

        enhPixelData[j]     = (byte) gray;
        enhPixelData[j + 1] = (byte) gray;
        enhPixelData[j + 2] = (byte) gray;
    }

    EnhancedDepthImage.Source = BitmapSource.Create(depthFrame.Width, depthFrame.Height,
                                                    96, 96, PixelFormats.Bgr32, null,
                                                    enhPixelData,
                                                    depthFrame.Width * bytesPerPixel);
}

The code in bold in Listing 3-8 represents the differences between this processing of the depth image and the previous attempt (Listing 3-7). The color image format changes to Bgr32, which means there are a total of 32 bits (4 bytes) per pixel. Each color gets 8 bits, and there are 8 unused bits. This limits the number of possible grays to 256. Any value outside of the threshold range is set to white. All other depths are represented in shades of gray, where the intensity of the gray is the result of dividing the depth by 4095 (0xFFF), the largest possible depth value, and then multiplying by 255. Figure 3-6 shows the three different depth images demonstrated so far in the chapter.


Figure 3-6. Different visualizations of the depth image—from left to right: raw depth image, depth image from Listing 3-7, and depth image from Listing 3-8

Color Depth

The enhanced depth image produces a shade of gray for each depth value. The range of grays is only 0 to 255, which is much less than our range of depth values. Using colors to represent each depth value gives more depth to the depth image. While there are certainly more advanced techniques for doing this, a simple method is to convert the depth values into hue and saturation values. Listing 3-9 shows an example of one way to colorize a depth image.

Listing 3-9. Coloring the Depth Image

private void CreateColorDepthImage(DepthImageFrame depthFrame, short[] pixelData)
{
    int depth;
    double hue;   
    int loThreshold       = 1220;
    int hiThreshold       = 3048;
    int bytesPerPixel     = 4;
    byte[] rgb            = new byte[3];
    byte[] enhPixelData   = new byte[depthFrame.Width * depthFrame.Height * bytesPerPixel];
                         

    for(int i = 0, j = 0; i < pixelData.Length; i++, j += bytesPerPixel)
    {
        depth = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;
             
        if(depth < loThreshold || depth > hiThreshold)
        {
            enhPixelData[j]       = 0x00;
            enhPixelData[j + 1]   = 0x00;          
            enhPixelData[j + 2]   = 0x00;
        }
        else
        {  
            hue = ((360 * depth / 0xFFF) + loThreshold);
            ConvertHslToRgb(hue, 100, 100, rgb);

            enhPixelData[j]     = rgb[2]; //Blue
            enhPixelData[j + 1] = rgb[1]; //Green
            enhPixelData[j + 2] = rgb[0]; //Red
        }
    }

    EnhancedDepthImage.Source = BitmapSource.Create(depthFrame.Width, depthFrame.Height,
                                                    96, 96, PixelFormats.Bgr32, null,
                                                    enhPixelData,
                                                    depthFrame.Width * bytesPerPixel);
}

Hue values are measured in degrees of a circle and range from 0 to 360. The hue value is proportional to the depth offset integer and the depth threshold. The ConvertHslToRgb method uses a common algorithm to convert the HSL values to RGB values, and is included in the downloadable code for this book. This example sets the saturation and lightness values to 100%.

The running application generates a depth image like the last image in Figure 3-7. The first image in the figure is the raw depth image, and the middle image is generated from Listing 3-8. Depths closer to the camera are shades of blue. The shades of blue transition to purple, and then to red the farther from Kinect the object is. The values continue along this scale.


Figure 3-7. Color depth image compared to grayscale

You will notice that the performance of the application is suddenly markedly sluggish. It takes a copious amount of work to convert each pixel (640×480 = 307,200 pixels!) into a color value using this method. We do not recommend you do this work on the UI thread as we have in this example. A better approach, sketched below, is to do the work on a background thread. Each time the KinectSensor fires the frame-ready event, your code stores the frame data in a queue. A background thread continuously converts the next frame in the queue to a color image. After the conversion, the background thread uses WPF's Dispatcher to update the Image source on the UI thread. This type of application architecture is very common in Kinect-based applications, because the work necessary to process the depth data is performance intensive. It is bad application design to do this type of work on the UI thread, as it will lower the frame rate and ultimately create a bad user experience.
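
The following is a minimal sketch of that architecture, not code from this project; the _IsRunning flag, the ProcessDepthFrame method, and the image member variables are hypothetical placeholders for whichever conversion routine and WriteableBitmap you are using. It assumes the System.Collections.Concurrent and System.Threading namespaces are imported.

private readonly ConcurrentQueue<short[]> _FrameQueue = new ConcurrentQueue<short[]>();
private volatile bool _IsRunning = true;    //Set to false when shutting down

private void KinectDevice_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using(DepthImageFrame frame = e.OpenDepthImageFrame())
    {
        if(frame != null)
        {
            short[] pixelData = new short[frame.PixelDataLength];
            frame.CopyPixelDataTo(pixelData);
            this._FrameQueue.Enqueue(pixelData);    //Hand the frame data to the worker thread
        }
    }
}

//Runs on a background Thread started during initialization, for example:
//new Thread(ProcessFrames) { IsBackground = true }.Start();
private void ProcessFrames()
{
    short[] pixelData;

    while(this._IsRunning)
    {
        if(this._FrameQueue.TryDequeue(out pixelData))
        {
            //ProcessDepthFrame is a placeholder for the expensive per-pixel conversion.
            byte[] colorPixels = ProcessDepthFrame(pixelData);

            //Marshal the bitmap update back onto the UI thread.
            this.Dispatcher.BeginInvoke((Action) (() =>
            {
                this._EnhancedDepthImage.WritePixels(this._EnhancedDepthImageRect, colorPixels,
                                                     this._EnhancedDepthImageStride, 0);
            }));
        }
        else
        {
            Thread.Sleep(1);    //Nothing queued; avoid spinning the CPU
        }
    }
}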

Simple Depth Image Processing

To this point, we have extracted the depth value of each pixel and created images from the data. In previous examples, we filtered out pixels that were beyond certain threshold values. This is a form of image processing, not surprisingly called thresholding. Our use of thresholding, while crude, suits our needs. More advanced processes use machine learning to calculate threshold values for each frame.

Note: Kinect returns 4096 (0 to 4095) possible depth values. Since a zero value always means the depth is undeterminable, it can always be filtered out. Microsoft recommends using only depths from 4 to 12.5 feet. Before doing any other depth processing, you can build thresholds into your application and only process depths ranging from 1220mm (4') to 3810mm (12.5').
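
Applying that recommendation up front amounts to a simple range check, as in the following sketch (the constant names are illustrative only):

const int MinReliableDepth = 1220;      //~4 feet
const int MaxReliableDepth = 3810;      //~12.5 feet

int depth     = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;
bool isUsable = (depth >= MinReliableDepth && depth <= MaxReliableDepth);   //Zero is excluded too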

Using statistics is common when processing depth image data. Thresholds can be calculated based on the mean or median of depth values. Probabilities help determine if a pixel is noise, a shadow, or something of greater meaning, such as being part of a user's hand. If you allow your mind to forget the visual meaning of a pixel, it transitions into raw data at which point data mining techniques become applicable. The motivation behind processing depth pixels is to perform shape and object recognition. With this information, applications can determine where a user is in relation to Kinect, where that user's hand is, and if that hand is in the act of waving.
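
As a small illustration of the statistical approach (a sketch only, using the same pixelData array as the earlier listings), a frame's mean depth can serve as a crude threshold for separating foreground from background:

//Compute the mean of the valid (non-zero) depths in the frame.
long depthSum   = 0;
int  depthCount = 0;

for(int i = 0; i < pixelData.Length; i++)
{
    int depth = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;

    if(depth != 0)
    {
        depthSum += depth;
        depthCount++;
    }
}

int meanDepth = (depthCount > 0) ? (int) (depthSum / depthCount) : 0;

//A pixel closer than the mean is more likely to be foreground (a user) than background (a wall).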

Histograms

The histogram is a tool for determining statistical distributions of data. Our concern is the distribution of depth data. Histograms visually tell the story of how recurrent certain data values are for a given data set. From a histogram we discern how frequently and how tightly grouped depth values are. With this information, it is possible to make decisions that determine thresholds and other filtering techniques, which ultimately reveal the contents of the depth image. To demonstrate this, we next build and display a histogram from a depth frame, and then use simple techniques to filter unwanted pixels.

Let's start fresh and create a new project. Perform the standard steps of discovering and initializing a KinectSensor object for depth-only processing, including subscribing to the DepthFrameReady event. Before adding the code to build the depth histogram, update the MainWindow.xaml with the code shown in Listing 3-10.

Listing 3-10. Depth Histogram UI

<Window x:Class=" BeginningKinect.Chapter3.DepthHistograms.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="800" Width="1200">
    <Grid>
        <StackPanel>
            <StackPanel Orientation="Horizontal">
                <Image x:Name="DepthImage" Width="640" Height="480"/>
                <Image x:Name="FilteredDepthImage" Width="640" Height="480"/>
            </StackPanel>

            <ScrollViewer Margin="0,15" HorizontalScrollBarVisibility="Auto"
                                        VerticalScrollBarVisibility="Auto">
                <StackPanel x:Name="DepthHistogram" Orientation="Horizontal" Height="300"/>
            </ScrollViewer>
        </StackPanel>
    </Grid>
</Window>

Our approach to creating the histogram is simple. We create a series of Rectangle elements and add them to the DepthHistogram (a StackPanel element). While the graph will not have high fidelity, it serves this demonstration well. Most applications calculate histogram data and use it for internal processing only. However, if our intent were to include the histogram data within the UI, we would certainly put more effort into the look and feel of the graph. The code to build and display the histogram is shown in Listing 3-11.

Listing 3-11. Building a Depth Histogram

private void KinectDevice_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using(DepthImageFrame frame = e.OpenDepthImageFrame())
    {
        if(frame != null)
        {
            frame.CopyPixelDataTo(this._DepthPixelData);
            CreateBetterShadesOfGray(frame, this._DepthPixelData); //See Listing 3-8
            CreateDepthHistogram(frame, this._DepthPixelData);
        }
    }
}

private void CreateDepthHistogram(DepthImageFrame depthFrame, short[] pixelData)
{
   int depth;
   int[] depths            = new int[4096];
   int maxValue            = 0;
   double chartBarWidth    = DepthHistogram.ActualWidth / depths.Length;
             
   DepthHistogram.Children.Clear();
             
   //First pass - Count the depths.
   for(int i = 0; i < pixelData.Length; i++)
   {
       depth = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;
             
       if(depth != 0)
       {
           depths[depth]++;
       }
  }
             
  //Second pass - Find the max depth count to scale the histogram to the space available.
  //              This is only to make the UI look nice.
  for(int i = 0; i < depths.Length; i++)
  {
      maxValue = Math.Max(maxValue, depths[i]);
  }
             
  //Third pass - Build the histogram.
  for(int i = 0; i < depths.Length; i++)
  {
      if(depths[i] > 0)
      {
          Rectangle r         = new Rectangle();
          r.Fill              = Brushes.Black;
          r.Width             = chartBarWidth;
          r.Height            = DepthHistogram.ActualHeight *
                                (depths[i] / (double) maxValue);
          r.Margin            = new Thickness(1,0,1,0);
          r.VerticalAlignment = System.Windows.VerticalAlignment.Bottom;
          DepthHistogram.Children.Add(r);
      }
   }
}

Building the histogram starts by creating an array to hold a count for each depth. The array size is 4096, which is the number of possible depth values. The first step is to iterate through the depth image pixels, extract the depth value of each, and increment the count in the depths array for that value. Depth values of zero are ignored, because they represent out-of-range depths. Figure 3-8 shows a depth image with a histogram of the depth values. The depth values are along the X-axis. The Y-axis represents the frequency of each depth value in the image.


Figure 3-8. Depth image with histogram

As you interact with the application, it is interesting (and cool) to see how the graph flows and changes as you move closer to and farther away from Kinect. Grab a friend and see the results when multiple users are in view. Another test is to add different objects, the larger the better, and place them in the view area to see how this affects the histogram. Take notice of the two spikes at the end of the graph in Figure 3-8. These spikes represent the wall in this picture. The wall is about seven feet from Kinect, whereas the user is roughly five feet away. This is an example of when to employ thresholding. In this instance, it is undesirable to include the wall. The images shown in Figure 3-9 are the result of hard-coding a threshold range of 3 to 6.5 feet. Notice how the distribution of depth changes.


Figure 3-9. Depth images and histogram with the wall filtered out of the image; the second image (right) shows the user holding a newspaper approximately two feet in front of the user

While watching the undulations in the graph change in real time is interesting, you quickly begin wondering what the next steps are. What else can we do with this data, and how can it be useful in an application? Analysis of the histogram can reveal peaks and valleys in the data. By applying image-processing techniques, such as thresholding to filter out data, the histogram can reveal more about the image. Further application of other data-processing techniques can reduce noise or normalize the data, lessening the differences between the peaks and valleys. As a result of the processing, it then becomes possible to detect the edges of shapes and blobs of pixels. The blobs begin to take on recognizable shapes, such as people, chairs, or walls.

Further Reading

A study of image-processing techniques falls far beyond the scope of this chapter and book. The purpose here is to show that raw depth data is available to you, and to help you understand possible uses of the data. More than likely, your Kinect application will not need to process depth data extensively. For applications that require depth data processing, it quickly becomes necessary to use tools like the OpenCV library. Depth image processing is often resource intensive and needs to be executed at a lower level than is achievable with a language like C#.

Note: The OpenCV (Open Source Computer Vision – opencv.willowgarage.com) library is a collection of commonly used algorithms for processing and manipulating images. This group is also involved in the Point Cloud Library (PCL) and Robot Operating System (ROS), both of which involve intensive processing of depth data. Anyone looking beyond beginner's material should research OpenCV.

The more common reason an application would process raw depth data is to determine the positions of users in Kinect's view area. While the Microsoft Kinect SDK actually does much of this work for you through skeleton tracking, your application needs may go beyond what the SDK provides. In the next section, we walk through the process of easily detecting the pixels that belong to users. Before moving on, you are encouraged to research and study image-processing techniques. Below are several topics to help further your research:

  • Image Processing (general)
    • Thresholding
    • Segmentation
  • Edge/Contour Detection
    • Gaussian filters
    • Sobel, Prewitt, and Kirsch
    • Canny edge detector
    • Roberts' Cross operator
    • Hough transforms
  • Blob Detection
  • Laplacian of the Gaussian
  • Hessian operator
  • k-means clustering

Depth and Player Indexing

The SDK has a feature that analyzes depth image data and detects human or player shapes. It recognizes as many as six players at a time. The SDK assigns a number to each tracked player. The number, or player index, is stored in the first three bits of the depth pixel data (Figure 3-10). As discussed in an earlier section of this chapter, each pixel is 16 bits. Bits 0 to 2 hold the player index value, and bits 3 to 15 hold the depth value. A bit mask of 7 (0000 0111) extracts the player index from the depth value. For a detailed explanation of bit masks, refer to Appendix A. Fortunately, the Kinect SDK defines a pair of constants focused on the player index bits: DepthImageFrame.PlayerIndexBitmaskWidth and DepthImageFrame.PlayerIndexBitmask. The value of the former is 3 and the latter is 7. Your application should use these constants instead of the literal values, as the values may change in future versions of the SDK.


Figure 3-10. Depth and player index bits

A pixel with a player index value of zero means no player is at that pixel; otherwise, players are numbered 1 to 6. However, enabling only the depth stream does not activate player tracking. Player tracking requires skeleton tracking. When initializing the KinectSensor object and the DepthImageStream, you must also enable the SkeletonStream. Only with the SkeletonStream enabled will player index values appear in the depth pixel bits. Your application does not need to subscribe to the SkeletonFrameReady event to get player index values.
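
A minimal sketch of the two requirements described above follows; the sensor field name matches the earlier listings, while pixelData and pixelIndex are assumed to come from a depth frame as before:

//Player indexing only works when the skeleton stream is enabled alongside the depth stream.
this._KinectDevice.DepthStream.Enable();
this._KinectDevice.SkeletonStream.Enable();     //Without this, the player index bits are always 0
this._KinectDevice.DepthFrameReady += KinectDevice_DepthFrameReady;
this._KinectDevice.Start();

//For a raw 16-bit pixel from the depth frame:
int playerIndex = pixelData[pixelIndex] & DepthImageFrame.PlayerIndexBitmask;        //bits 0-2
int depth       = pixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;  //bits 3-15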

Let's explore the player index bits. Create a new project that discovers and initializes a KinectSensor object. Enable both DepthImageStream and SkeletonStream, and subscribe to the DepthFrameReady event on the KinectSensor object. In the MainWindow.xaml add two Image elements named RawDepthImage and EnhDepthImage. Add the member variables and code to support creating images using the WriteableBitmap. Finally, add the code in Listing 3-12. This example changes the value of all pixels associated with a player to black and all other pixels to white. Figure 3-11 shows the output of this code. For contrast, the figure shows the raw depth image on the left.

Listing 3-12. Displaying Users in Black and White

private void KinectDevice_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using(DepthImageFrame frame = e.OpenDepthImageFrame())
    {
        if(frame != null)
        {
            frame.CopyPixelDataTo(this._RawDepthPixelData);
            this._RawDepthImage.WritePixels(this._RawDepthImageRect, this._RawDepthPixelData,
                                            this._RawDepthImageStride, 0);
           CreatePlayerDepthImage(frame, this._RawDepthPixelData);
       }
   }
}


private void CreatePlayerDepthImage(DepthImageFrame depthFrame, short[] pixelData)
{
    int playerIndex;   
    int depthBytePerPixel = 4;
    byte[] enhPixelData  = new byte[depthFrame.Height * this._EnhDepthImageStride];
             
    for(int i = 0, j = 0; i < pixelData.Length; i++, j += depthBytePerPixel)
    {         
        playerIndex = pixelData[i] & DepthImageFrame.PlayerIndexBitmask;
                  
        if(playerIndex == 0)
        {
            enhPixelData[j]     = 0xFF;
            enhPixelData[j + 1] = 0xFF;
            enhPixelData[j + 2] = 0xFF;
        }
        else
        {
            enhPixelData[j]     = 0x00;    
            enhPixelData[j + 1] = 0x00;
            enhPixelData[j + 2] = 0x00;
        }
    }
   
    this._EnhDepthImage.WritePixels(this._EnhDepthImageRect, enhPixelData,
                                    this._EnhDepthImageStride, 0);
}

Figure 3-11. Raw depth image (left) and processed depth image with player indexing (right)

There are several possibilities for enhancing this code with code we wrote earlier in this chapter. For example, you can apply a grayscale to the player pixels based on depth and black out all other pixels. In such a project, you could build a histogram of the player's depth values and then determine the grayscale value of each depth in relation to the histogram. Another common exercise is to apply a solid color to each player, as sketched below, where Player 1's pixels are red, Player 2's blue, Player 3's green, and so on. The KinectExplorer sample application that comes with the SDK does this. You could, of course, also vary the color intensity of each pixel based on its depth value. Since the depth data is the differentiating element of Kinect, you should use the data wherever and as much as possible.
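
The per-player coloring could look something like the following sketch, which drops into the pixel loop of Listing 3-12; the color table and its ordering are arbitrary choices, not part of the SDK or the KinectExplorer sample.

//Arbitrary BGR colors for players 1 through 6 (sketch only).
private static readonly byte[][] PlayerColors = new byte[][]
{
    new byte[] { 0x00, 0x00, 0xFF },    //Player 1 - red
    new byte[] { 0xFF, 0x00, 0x00 },    //Player 2 - blue
    new byte[] { 0x00, 0xFF, 0x00 },    //Player 3 - green
    new byte[] { 0x00, 0xFF, 0xFF },    //Player 4 - yellow
    new byte[] { 0xFF, 0x00, 0xFF },    //Player 5 - magenta
    new byte[] { 0xFF, 0xFF, 0x00 }     //Player 6 - cyan
};

//Inside the pixel loop of Listing 3-12, replacing the black/white assignment:
if(playerIndex > 0)
{
    byte[] color        = PlayerColors[(playerIndex - 1) % PlayerColors.Length];
    enhPixelData[j]     = color[0];     //Blue
    enhPixelData[j + 1] = color[1];     //Green
    enhPixelData[j + 2] = color[2];     //Red
}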

As a word of caution, do not code against specific player indexes, as they are volatile. The actual player index number is not always consistent and does not correspond to the actual number of visible users. For example, a single user might be in view of Kinect, but Kinect might return a player index of three for that user's pixels. To demonstrate this, update the code to display a list of player indexes for all visible users. You will notice that when there is only a single user, Kinect does not always identify that user as player 1. To test this out, walk out of view, wait for about 5 seconds, and walk back in. Kinect will identify you as a new player. Grab several friends to further test this by keeping one person in view at all times and having the others walk in and out of view. Kinect continually tracks users, but once a user has left the view area, it forgets about them. This is just something to keep in mind as you develop your Kinect applications.

Taking Measure

An interesting exercise is to measure the pixels of the user. As discussed in the Measuring Depth section of this chapter, the X and Y positions of the pixels do not correspond to actual width or height measurements; however, it is possible to calculate them. Every camera has a field of view. The focal length and the size of the camera's sensor determine the angles of the field. Microsoft's Kinect SDK Programming Guide tells us that the view angles are 57 degrees horizontal and 43 degrees vertical. Since we know the depth values, we can determine the width and height of a player using trigonometry, as illustrated in Figure 3-12, where we calculate a player's width.


Figure 3-12. Finding the player's real-world width

The process described below is not perfect and in certain circumstances produces inaccurate or distorted values; then again, so does the data returned by Kinect. The inaccuracy is due to the simplicity of the calculations, which do not take into account other physical attributes of the player and the space. Despite this, the values are accurate enough for most uses. The motivation here is to provide an introductory example of how Kinect data maps to the real world. You are encouraged to research the physics behind camera optics and field of view so that you can update this code to make the output more accurate.

Let us walk through the math before diving into the code. As Figure 3-12 shows, the angle of view of the camera is an isosceles triangle with the player's depth position forming the base. The actual depth value is the height of the triangle. We can evenly split the triangle in half to create two right triangles, which allows us to calculate the width of the base. Once we know the width of the base, we translate pixel widths into real-world widths. For example, if we calculate the base of the triangle to have a width of 1500mm (59in), the player's pixel width to be 100, and the pixel width of the image to be 320, then the result is a player width of 468.75mm (18.45in). For us to perform the calculation, we need to know the player's depth and the number of pixels wide the player spans. We take an average of depths for each of the player's pixels. This normalizes the depth because in reality no person is completely flat. If that were true, it certainly would make our calculations much easier. The calculation is the same for the player's height, but with a different angle and image dimension.
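
To make the arithmetic concrete, here is a small sketch using the example numbers from the paragraph above; the variable names are illustrative only:

//Width of the camera's view at the player's depth, derived from half the 57° horizontal angle:
//fieldOfViewBase = 2 * depth * Math.Tan(28.5 * Math.PI / 180);
double fieldOfViewBase  = 1500;     //Example physical width (mm) of the view at the player's depth
double playerPixelWidth = 100;      //Pixels the player spans horizontally
double framePixelWidth  = 320;      //Width of the depth frame in pixels

double playerRealWidth  = playerPixelWidth * fieldOfViewBase / framePixelWidth;   //468.75mm (~18.45in)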

Now that we know the logic we need to perform, let us walk through the code. Create a new project that discovers and initializes a KinectSensor object. Enable both DepthStream and SkeletonStream, and subscribe to the DepthFrameReady event on the KinectSensor object. Code the MainWindow.xaml to match Listing 3-13.

Listing 3-13. The UI for Measuring Players

<Window x:Class="BeginningKinect.Chapter3.TakingMeasure.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="800" Width="1200">
                         
    <Grid>
        <StackPanel Orientation="Horizontal">
            <Image x:Name="DepthImage"/>
              
            <ItemsControl x:Name="PlayerDepthData" Width="300" TextElement.FontSize="20">
                <ItemsControl.ItemTemplate>
                    <DataTemplate>
                        <StackPanel Margin="0,15">
                            <StackPanel Orientation="Horizontal">
                                <TextBlock Text="PlayerId:"/>
                                <TextBlock Text="{Binding Path=PlayerId}"/>
                            </StackPanel>
                            <StackPanel Orientation="Horizontal">
                                <TextBlock Text="Width:"/>
                                <TextBlock Text="{Binding Path=RealWidth}"/>
                            </StackPanel>
                            <StackPanel Orientation="Horizontal">
                                <TextBlock Text="Height:"/>
                                <TextBlock Text="{Binding Path=RealHeight}"/>
                            </StackPanel>
                        </StackPanel>
                    </DataTemplate>
                </ItemsControl.ItemTemplate>
            </ItemsControl>
        </StackPanel>
    </Grid>
</Window>

The purpose of the ItemsControl is to display player measurements. Our approach is to create an object that collects player depth data and performs the calculations to determine the real width and height values of the user. The application maintains an array of these objects, and the array becomes the ItemsSource for the ItemsControl. The UI defines a template to display relevant data for each player depth object, which we will call PlayerDepthData. Before creating this class, let's review the code that interfaces with it to see how it is used. Listing 3-14 shows a method named CalculatePlayerSize, which is called from the DepthFrameReady event handler.

Listing 3-14. Calculating Player Sizes

private void KinectDevice_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using(DepthImageFrame frame = e.OpenDepthImageFrame())
    {
        if(frame != null)
        {
            frame.CopyPixelDataTo(this._DepthPixelData);
            CreateBetterShadesOfGray(frame, this._DepthPixelData);
            CalculatePlayerSize(frame, this._DepthPixelData);
        }
    }
}


private void CalculatePlayerSize(DepthImageFrame depthFrame, short[] pixelData)
{
    int depth;
    int playerIndex;
    int pixelIndex;      
    int bytesPerPixel = depthFrame.BytesPerPixel;           
    PlayerDepthData[] players = new PlayerDepthData[6];


    //First pass - Calculate stats from the pixel data
    for(int row = 0; row < depthFrame.Height; row++)
    {
        for(int col = 0; col < depthFrame.Width; col++)
        {
            pixelIndex = col + (row * depthFrame.Width);
            depth = pixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;
               
            if(depth != 0)
            {
                playerIndex = (pixelData[pixelIndex] & DepthImageFrame.PlayerIndexBitmask);
                playerIndex -= 1;
        
                if(playerIndex > -1)
                {
                    if(players[playerIndex] == null)
                    {
                        players[playerIndex] = new PlayerDepthData(playerIndex + 1,
                                                depthFrame.Width,depthFrame.Height);
                    }

                    players[playerIndex].UpdateData(col, row, depth);
                }
            }
        }
    }
      
      
    PlayerDepthData.ItemsSource = players;
}

The bold lines of code in Listing 3-14 reference uses of the PlayerDepthData object in some way. The logic of the CalculatePlayerSize method goes pixel by pixel through the depth image and extracts the depth and player index values. The algorithm ignores any pixel that has a depth value of zero or is not associated with a player. For any pixel belonging to a player, the code calls the UpdateData method on that player's PlayerDepthData object. After processing all pixels, the code sets the players array as the source for the ItemsControl named PlayerDepthData. The real work of calculating each player's size is encapsulated within the PlayerDepthData object, which we'll turn our attention to now.

Create a new class named PlayerDepthData. The code is shown in Listing 3-15. This object is the workhorse of the project. It holds and maintains player depth data, and calculates the real-world width and height accordingly.

Listing 3-15. Object to Hold and Maintain Player Depth Data

public class PlayerDepthData
{
    #region Member Variables
    private const double MillimetersPerInch       = 0.0393700787;
    private static readonly double HorizontalTanA = Math.Tan(28.5 * Math.PI / 180);
    private static readonly double VerticalTanA   = Math.Abs(Math.Tan(21.5 * Math.PI / 180));
             
    private int _DepthSum;
    private int _DepthCount;
    private int _LoWidth;
    private int _HiWidth;
    private int _LoHeight;
    private int _HiHeight;
    #endregion Member Variables


    #region Constructor
    public PlayerDepthData(int playerId, double frameWidth, double frameHeight)
    {
        this.PlayerId     = playerId;
        this.FrameWidth   = frameWidth;
        this.FrameHeight  = frameHeight;
        this._LoWidth     = int.MaxValue;
        this._HiWidth     = int.MinValue;
        this._LoHeight    = int.MaxValue;
        this._HiHeight    = int.MinValue;
    }
    #endregion Constructor


    #region Methods
    public void UpdateData(int x, int y, int depth)
    {
        this._DepthCount++;
        this._DepthSum  += depth;
        this._LoWidth    = Math.Min(this._LoWidth, x);
        this._HiWidth    = Math.Max(this._HiWidth, x);
        this._LoHeight   = Math.Min(this._LoHeight, y);
        this._HiHeight   = Math.Max(this._HiHeight, y);
    }
    #endregion Methods


    #region Properties
    public int PlayerId { get; private set; }
    public double FrameWidth { get; private set; }
    public double FrameHeight { get; private set; }


    public double Depth
    {
        get { return this._DepthSum / (double) this._DepthCount; }
    }


    public int PixelWidth
    {
        get { return this._HiWidth - this._LoWidth; }
    }


    public int PixelHeight
    {
        get { return this._HiHeight - this._LoHeight; }
    }


    public double RealWidth
    {
        get
        {
            double opposite = this.Depth * HorizontalTanA;
            return this.PixelWidth * 2 * opposite / this.FrameWidth * MillimetersPerInch;
        }
    }

    public double RealHeight
    {
        get
        {
            double opposite = this.Depth * VerticalTanA;
            return this.PixelHeight * 2 * opposite / this.FrameHeight * MillimetersPerInch;
        }
    }

    #endregion Properties
}

The primary reason the PlayerDepthData class exists is to encapsulate the measurement calculations and make the process easier to understand. The class accomplishes this by having two input points and two outputs. The constructor and the UpdateData method are the two forms of input and the RealWidth and RealHeight properties are the output. The code behind each of the output properties calculates the result based on the formulas detailed in Figure 3-12. Each formula relies on a normalized depth value, measurement of the frame (width or height), and the total pixels consumed by the player. The normalized depth and total pixel measure derive from data passed to the UpdateData method. The real width and height values are only as good as the data supplied to the UpdateData method.
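
As a quick usage sketch of the class (the pixel coordinates and depth values here are made up purely for illustration):

//Feed a few of a player's pixels into the object, then read out the measurements.
PlayerDepthData player = new PlayerDepthData(1, 320, 240);      //playerId, frameWidth, frameHeight

player.UpdateData(100, 80, 1830);       //x, y, depth in millimeters
player.UpdateData(150, 200, 1845);
player.UpdateData(125, 140, 1838);

//RealWidth and RealHeight are reported in inches because of the MillimetersPerInch factor.
Console.WriteLine("Width: {0:F1}in  Height: {1:F1}in", player.RealWidth, player.RealHeight);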

Figure 3-13 shows the results of this project. Each frame shows the user in a different pose. The images use a UI different from the one in our project in order to better illustrate the player measurement calculations. The width and height calculations adjust for each altered posture. Note that the width and height values are only for the visible area. Take the first frame of Figure 3-13. The user's height is not actually 42 inches; rather, the height of the user as seen by Kinect is 42 inches. The user's real height is 74 inches, which means that only just over half of the user is visible. The width value has a similar caveat.


Figure 3-13. Player measurements in different poses

Aligning Depth and Video Images

In our previous examples, we altered the pixels of the depth image to better indicate which pixels belong to users. We colored the player pixels and altered the color of the non-player pixels. However, there are instances where you want to alter the pixels in the video image based on the player pixels. There is an effect used by moviemakers called green screening or, more technically, chroma keying. This is where an actor stands in front of a green backdrop and acts out a scene. Later, the backdrop is edited out of the scene and replaced with some other type of background. This is common in sci-fi movies where it is impossible to send actors to Mars, for example, to perform a scene. We can create this same type of effect with Kinect, and the Microsoft SDK makes this easy. The code to write this type of application is not much different from what we have already coded in this chapter.

Note: This type of application is a basic example of an augmented reality experience. Augmented reality applications are extremely fun and captivatingly immersive experiences. Many artists are using Kinect to create interactive augmented reality exhibits. Additionally, these types of experiences are used as tools for advertising and marketing.

We know how to get Kinect to tell us which pixels belong to users, but only for the depth image. Unfortunately, the pixels of the depth image do not translate one-to-one with those created by the color stream, even if you set the resolutions of each stream to the same value. The pixels of the two cameras are not aligned because they are positioned on Kinect just like the eyes on your face. Your eyes see in stereo in what is called stereovision. Close your left eye and notice how your view of the world is different. Now close your right eye and open your left. What you see is different from what you saw when only your right eye was open. When both of your eyes are open, your brain does the work to merge the images you see from each eye into one.

The calculations required to translate pixels from one camera to the other are not trivial. Fortunately, the SDK provides methods that do the work for us. The methods are located on the KinectSensor class and are named MapDepthToColorImagePoint, MapDepthToSkeletonPoint, MapSkeletonPointToColor, and MapSkeletonPointToDepth. The DepthImageFrame object has methods with slightly different names (MapFromSkeletonPoint, MapToColorImagePoint, and MapToSkeletonPoint), but they function the same. For this project, we use the MapDepthToColorImagePoint method to translate a depth image pixel position into a pixel position on a color image. In case you are wondering, there is not a method to get the depth pixel based on the coordinates of a color pixel.

Create a new project and add two Image elements to the MainWindow.xaml layout. The first image is the background and can be hard-coded to whatever image you want. The second image is the foreground and is the image we will create. Listing 3-16 shows the XAML for this project.

Listing 3-16. Green Screen App UI

<Window x:Class="Apress.BeginningKinect.Chapter3.GreenScreen.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow">
    <Grid>
        <Image Source="/WineCountry.JPG" />
        <Image x:Name="GreenScreenImage"/>
    </Grid>
</Window>

In this project, we employ polling to ensure that the color and depth frames are as closely aligned as possible. The cutout is more accurate the closer the frames are in timestamp, and every millisecond counts. While it is possible to use the AllFramesReady event on the KinectSensor object, this does not guarantee that the frames given by the event arguments are close in time with one another. The frames will never be in complete synchronization, but the polling model gets them as close as possible. Listing 3-17 shows the infrastructure code to discover a device, enable the streams, and poll for frames.

Listing 3-17. Polling Infrastructure

#region Member Variables
private KinectSensor _KinectDevice;
private WriteableBitmap _GreenScreenImage;
private Int32Rect _GreenScreenImageRect;
private int _GreenScreenImageStride;
private short[] _DepthPixelData;
private byte[] _ColorPixelData;
#endregion Member Variables


private void CompositionTarget_Rendering(object sender, EventArgs e)
{
    DiscoverKinect();  


    if(this.KinectDevice != null)
    {
        try
        {
            ColorImageStream colorStream = this.KinectDevice.ColorStream;
            DepthImageStream depthStream = this.KinectDevice.DepthStream;

            using(ColorImageFrame colorFrame = colorStream.OpenNextFrame(100))
            {
                using(DepthImageFrame depthFrame = depthStream.OpenNextFrame(100))
                {
                   RenderGreenScreen(this.KinectDevice, colorFrame, depthFrame);
                }
            }
        }
        catch(Exception)
        {
            //Handle exception as needed
        }
    }
}


private void DiscoverKinect()
{
    if(this._KinectDevice != null && this._KinectDevice.Status != KinectStatus.Connected)
    {
        this._KinectDevice.ColorStream.Disable();
        this._KinectDevice.DepthStream.Disable();
        this._KinectDevice.SkeletonStream.Disable();
        this._KinectDevice.Stop();
        this._KinectDevice = null;
    }


    if(this._KinectDevice == null)
    {
        this._KinectDevice = KinectSensor.KinectSensors.FirstOrDefault(x => x.Status ==
                                                                      KinectStatus.Connected);


        if(this._KinectDevice != null)
        {
           this._KinectDevice.SkeletonStream.Enable();
           this._KinectDevice.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
           this._KinectDevice.ColorStream.Enable(ColorImageFormat.RgbResolution1280x960Fps12);


           DepthImageStream depthStream = this._KinectDevice.DepthStream;
           this._GreenScreenImage       = new WriteableBitmap(depthStream.FrameWidth,
                                                              depthStream.FrameHeight, 96, 96,
                                                              PixelFormats.Bgra32, null);
           this._GreenScreenImageRect   = new Int32Rect(0, 0, depthStream.FrameWidth,
                                                        depthStream.FrameHeight);
           this._GreenScreenImageStride = depthStream.FrameWidth * 4;              
           this.GreenScreenImage.Source = this._GreenScreenImage;

           this._DepthPixelData = new short[depthStream.FramePixelDataLength];
           this._ColorPixelData =
                      new byte[this._KinectDevice.ColorStream.FramePixelDataLength];

           this._KinectDevice.Start();
        }

    }
}

The basic implementation of the polling model in Listing 3-17 should be familiar by now. A few lines deserve special attention. The first is the call to RenderGreenScreen; comment out this line for now, as we implement it next. The others are the lines that enable the color and depth streams, because the chosen resolutions factor into the quality of our background subtraction process. When mapping between the color and depth images, it is best for the color image resolution to be twice that of the depth stream, to ensure the best possible pixel translation.
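
Listing 3-17 does not show where the CompositionTarget_Rendering handler gets attached. One reasonable place, sketched below under the assumption that the rest of the setup lives in MainWindow.xaml.cs, is the window's constructor. CompositionTarget.Rendering fires on every pass of WPF's rendering loop, which keeps the color and depth frame requests close together in time.

public MainWindow()
{
    InitializeComponent();

    // Poll for new frames on each pass of the WPF rendering loop.
    CompositionTarget.Rendering += CompositionTarget_Rendering;
}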

The RenderGreenScreen method does the actual work of this project. It creates a new color image by removing the non-player pixels from the color image. The algorithm iterates over each pixel of the depth image and determines whether the pixel has a valid player index value. For any pixel belonging to a player, it gets the corresponding color pixel and adds it to a new byte array of pixel data; all other pixels are discarded. The code for this method is shown in Listing 3-18.

Listing 3-18. Performing Background Subtraction

private void RenderGreenScreen(KinectSensor kinectDevice, ColorImageFrame colorFrame,
                               DepthImageFrame depthFrame)
{
    if(kinectDevice != null && depthFrame != null && colorFrame != null)
    {
        int depthPixelIndex;
        int playerIndex;
        int colorPixelIndex;
        ColorImagePoint colorPoint;
        int colorStride         = colorFrame.BytesPerPixel * colorFrame.Width;
        int bytesPerPixel       = 4;
        byte[] playerImage      = new byte[depthFrame.Height * this._GreenScreenImageStride];
        int playerImageIndex    = 0;

        depthFrame.CopyPixelDataTo(this._DepthPixelData);
        colorFrame.CopyPixelDataTo(this._ColorPixelData);


        for(int depthY = 0; depthY < depthFrame.Height; depthY++)
        {
            for(int depthX = 0; depthX < depthFrame.Width; depthX++,
                                                           playerImageIndex += bytesPerPixel)
            {
                depthPixelIndex = depthX + (depthY * depthFrame.Width);
                playerIndex     = this._DepthPixelData[depthPixelIndex] &
                                  DepthImageFrame.PlayerIndexBitmask;

                if(playerIndex != 0)
                {
                    colorPoint = kinectDevice.MapDepthToColorImagePoint(depthX, depthY,
                                              this._DepthPixelData[depthPixelIndex],
                                              colorFrame.Format, depthFrame.Format);
                    colorPixelIndex = (colorPoint.X * colorFrame.BytesPerPixel) +
                                      (colorPoint.Y * colorStride);
                    playerImage[playerImageIndex] =
                                          this._ColorPixelData[colorPixelIndex];  //Blue
                    playerImage[playerImageIndex + 1] =
                                          this._ColorPixelData[colorPixelIndex + 1];  //Green
                    playerImage[playerImageIndex + 2] =
                                          this._ColorPixelData[colorPixelIndex + 2];  //Red
                    playerImage[playerImageIndex + 3] = 0xFF;  //Alpha
                }
            }
        }

        this._GreenScreenImage.WritePixels(this._GreenScreenImageRect, playerImage,
                                           this._GreenScreenImageStride, 0);
    }
}

The byte array playerImage holds the color pixels belonging to players. Since the depth image is the source of our player data input, it becomes the lowest common denominator. The image created from these pixels is the same size as the depth image. Unlike the depth image, which uses two bytes per pixel, the player image uses four bytes per pixel: blue, green, red, and alpha. The alpha bits are important to this project as they determine the transparency of each pixel. The player pixels get set to 255 (0xFF), meaning they are fully opaque, whereas the non-player pixels get a value of zero and are transparent.

The MapDepthToColorImagePoint method takes the depth pixel coordinates and the depth value, and returns the corresponding color image coordinates. The format of the depth value deserves mention: the mapping method requires the raw depth value, player index bits included; otherwise the returned result is incorrect.
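
To make the distinction concrete, the following sketch unpacks the two pieces of information stored in a single depth pixel; the variable names are ours and do not appear in Listing 3-18.

short rawDepthValue = this._DepthPixelData[depthPixelIndex];

// The low-order bits hold the player index; shifting them away leaves the
// distance in millimeters.
int playerIndex = rawDepthValue & DepthImageFrame.PlayerIndexBitmask;
int depthInMm   = rawDepthValue >> DepthImageFrame.PlayerIndexBitmaskWidth;

// MapDepthToColorImagePoint expects rawDepthValue, not depthInMm. Passing the
// shifted distance produces an incorrect color coordinate.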

The remaining code of Listing 3-18 extracts the color pixel values and stores them in the playerImage array. After processing all depth pixels, the code updates the pixels of the player bitmap. Run this program, and it quickly becomes apparent that the effect is not perfect. It works well when the user stands still. However, if the user moves quickly, the process breaks down, because the depth and color frames cannot stay aligned. Notice in Figure 3-14 that the pixels along the user's left side are not crisp and show noise. It is possible to fix this, but the process is non-trivial. It requires smoothing the pixels around the player and, for the best results, merging several frames of images into one. We pick this project back up in Chapter 8 to demonstrate how tools like OpenCV can do this work for us.


Figure 3-14. A visit to wine country

Depth Near Mode

The original purpose of Kinect was to serve as a game controller for the Xbox. The Xbox is primarily played in a living room, where the user stands a few feet away from the TV screen and Kinect. After the initial release, developers all over the world began building applications using Kinect on PCs. Several of these PC-based applications require Kinect to see or focus at a much closer range than the original hardware allows. The developer community called on Microsoft to update Kinect so that it could return depth data for distances nearer than 800mm (31.5 inches).

Microsoft answered by releasing new hardware specially configured for use on PCs. The new hardware goes by the name Kinect for Windows, and the original hardware by the name Kinect for Xbox. The Kinect for Windows SDK has a number of API elements specific to the new hardware. The Range property of the DepthImageStream sets the view range of the Kinect sensor. It is of type DepthRange, an enumeration with two options, as shown in Table 3-1. All depth ranges are inclusive.


The Range property can be changed dynamically while the DepthImageStream is enabled and producing frames. This allows for quick, dynamic changes in focus as needed, without having to restart the KinectSensor or the DepthImageStream. However, the Range property is sensitive to the type of Kinect hardware being used. Any attempt to set the Range property to DepthRange.Near when using Kinect for Xbox hardware results in an InvalidOperationException with the message, “The feature is not supported by this version of the hardware.” Near mode viewing is only supported by Kinect for Windows hardware.
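
A simple way to guard against this is to attempt the switch and fall back when the hardware refuses it. The helper method below is a minimal sketch; the method name is ours.

private void EnableNearModeIfAvailable(KinectSensor sensor)
{
    try
    {
        // Near mode takes effect on the live stream; no restart is required.
        sensor.DepthStream.Range = DepthRange.Near;
    }
    catch(InvalidOperationException)
    {
        // Kinect for Xbox hardware throws here; stay with the default range.
        sensor.DepthStream.Range = DepthRange.Default;
    }
}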

Two additional properties accompany the near depth range feature. They are MinDepth and MaxDepth. These properties describe the boundaries of Kinect's depth range. Both values update on any change to the Range property value.

One final feature of note with the depth stream is the special treatment of depth values that exceed the boundaries of the depth range. The DepthImageStream defines two properties named TooFarDepth and TooNearDepth, which give the application more information about out-of-range depth values. There are also instances when a depth is completely indeterminate; such pixels are given a value equal to the UnknownDepth property on the DepthImageStream.
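
Putting these properties together, an application can classify any depth value it reads from the stream. The helper below is a sketch of ours, assuming the depth value has already been shifted right by DepthImageFrame.PlayerIndexBitmaskWidth to strip the player index bits.

private string DescribeDepth(DepthImageStream depthStream, int depth)
{
    if(depth == depthStream.UnknownDepth)  { return "unknown"; }
    if(depth == depthStream.TooNearDepth)  { return "too near"; }
    if(depth == depthStream.TooFarDepth)   { return "too far"; }

    return (depth >= depthStream.MinDepth && depth <= depthStream.MaxDepth)
               ? "in range" : "out of range";
}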

Summary

Depth is fundamental to Kinect. Depth is what differentiates it from all other input devices. Understanding how to work with Kinect's depth data is equally fundamental to developing Kinect experiences. Any Kinect application that does not incorporate depth is underutilizing the hardware, and ultimately keeping the user experience from reaching its fullest potential. While not every application needs to access and process the raw depth data, as a developer or application architect you need to know this data is available and how to exploit it to the benefit of the user experience. Further, while your application may not process the data directly, it will receive a derivative of the data. Kinect processes the original depth data to determine which pixels belong to each user. The skeleton tracking engine component of the SDK performs more extensive processing of depth data to produce user skeleton information.

It is less common for a real-world Kinect experience to use the raw depth data directly. It is more common to use third-party tools such as OpenCV to process this data, as we will show in Chapter 9. Processing raw depth data is not always trivial, and it can place extreme demands on performance. This alone means that a managed language like C# is not always the best tool for the job. That is not to say it is impossible, but such work often requires lower-level processing than C# comfortably provides. If the kind of depth image processing you want to do is unachievable with an existing third-party library, create your own C/C++ library to do the processing. Your WPF application can then use it.

Depth data comes in two forms. The SDK does some processing to determine which pixels belong to a player. This is powerful information to have, and it provides a basis for at least rudimentary image processing to build interactive experiences. By creating simple statistics around a player's depth values, we can tell when a player is in the view area of Kinect or, more specifically, where they are in relation to the entire viewing area. Using this data, your application could perform some action, like playing a sound clip of applause when Kinect detects a new user, or a series of “boo” sounds when a user leaves the view area. However, before you run off and start writing code that does this, wait until the next chapter, when we introduce skeleton tracking. The Kinect for Windows SDK's skeleton tracking engine makes this an easier task. The point is that the data is available for you to use if your application needs it.
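
As a rough illustration of the kind of statistic described above, the sketch below counts the depth pixels that carry a non-zero player index and treats the view as occupied once the count crosses a threshold. The method name and threshold parameter are ours; remember that the player index bits are only populated when the skeleton stream is enabled.

private bool IsPlayerInView(short[] depthPixelData, int minPlayerPixels)
{
    int playerPixelCount = 0;

    for(int i = 0; i < depthPixelData.Length; i++)
    {
        // A non-zero player index means the SDK attributed this pixel to a user.
        if((depthPixelData[i] & DepthImageFrame.PlayerIndexBitmask) != 0)
        {
            playerPixelCount++;
        }
    }

    return playerPixelCount >= minPlayerPixels;
}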

Calculating the dimensions of objects in real-world space is one reason to process depth data. In order to do this you must understand the physics behind camera optics, and be proficient in trigonometry. The view angles of Kinect create triangles. Since we know the angles of these triangles and the depth, we can measure anything in the view field. As an aside, imagine how proud your high school trig teacher would be to know you are using the skills she taught you.
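
As a sketch of the idea, the method below estimates the real-world width of an object that spans a given number of pixels in the depth image. It assumes the commonly cited 57-degree horizontal field of view of the depth camera and ignores lens distortion, so treat the result as an approximation rather than a calibrated measurement; the method name is ours.

private static double EstimateObjectWidthInMm(int pixelWidth, int frameWidth,
                                              int depthInMm)
{
    const double HorizontalFieldOfViewDegrees = 57.0;
    double fovInRadians = HorizontalFieldOfViewDegrees * Math.PI / 180.0;

    // Width of the entire view field at this depth, from the right triangle
    // formed by the optical axis and half of the field of view.
    double viewWidthInMm = 2.0 * depthInMm * Math.Tan(fovInRadians / 2.0);

    // The object occupies a proportional slice of the frame.
    return viewWidthInMm * pixelWidth / frameWidth;
}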

The last project of this chapter converted depth pixel coordinates into color stream coordinates to perform background subtraction on an image. The example code demonstrated a very simple and practical use case. In gaming, it is more common to use an avatar to represent the user. However, many other Kinect experiences incorporate the video camera and depth data, with augmented reality concepts being the most common.

Finally, this chapter covered the near depth mode available on Kinect for Windows hardware. Using a simple set of properties, an application can dynamically change the depth range viewable by Kinect. This concludes coverage of the depth stream. Now let's move on to review the skeleton stream.
