C H A P T E R  8

Beyond the Basics

In the preface to his book The Order of Things, the philosopher Michel Foucault credits Jorge Luis Borges for inspiring his research with a passage about “a certain encyclopaedia” in which it is written that “animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.” This supposed Chinese Encyclopedia cited by Borges was called the Celestial Emporium of Benevolent Knowledge.

After doing our best to break down the various aspects of the Kinect SDK into reasonably classified chunks of benevolent knowledge in the previous seven chapters, the authors of the present volume have finally reached the et cetera chapter where we try to cover a hodgepodge of things remaining about the Kinect SDK that have not yet been addressed thematically. In a different sort of book this chapter might have been entitled Sauces and Pickles. Were we more honest, we would simply call it et cetera (or possibly even things that from a long way off look like flies). Following the established tradition of technical books, however, we have chosen to call it Beyond the Basics.

The reader will have noticed that after learning to use the video stream, depth camera, skeleton tracking, microphone array, and speech recognition in the prior chapters, she is still a distance away from being able to produce the sorts of Kinect experiences seen on YouTube. The Kinect SDK provides just about everything the other available Kinect libraries offer and in certain cases much more. In order to take Kinect for PC programming to the next level, however, it is necessary to apply complex mathematics as well as combine Kinect with additional libraries not directly related to Kinect programming. The true potential of Kinect is actualized only when it is combined with other technologies into a sort of mashup.

In this chapter, you explore some additional software libraries available to help you work with and manipulate the data provided by the Kinect sensor. A bit like duct taping pipes together, you create mashups of different technologies to see what you can really do with Kinect. On the other hand, when you leave the safety of an SDK designed for one purpose, code can start to get messy. The purpose of this chapter is not to provide you with ready-made code for your projects but rather simply to provide a taste of what might be possible and offer some guidance on how to achieve it. The greatest difficulty with programming for Kinect is generally not any sort of technical limitation, but rather a lack of knowledge about what is available to be worked with. Once you start to understand what is available, the possible applications of Kinect technology may even seem overwhelming.

In an effort to provide benevolent knowledge about these various additional libraries, I run the risk of covering certain libraries that will not last and of failing to duly cover other libraries that will turn out to be much more important for Kinect hackers than they currently are. The world of Kinect development is progressing so rapidly that this danger can hardly be avoided. However, by discussing helper libraries, image processing libraries, et cetera, I hope at least to indicate what sorts of software are valuable and interesting to the Kinect developer. Over the next year, should alternative libraries turn out to be better than the ones I write about here, it is my hope that the current discussion, while not attending to them directly, will at least point the way to those better third-party libraries.

In this chapter I will discuss several tools you might find helpful including the Coding4Fun Kinect Toolkit, Emgu (the C# wrapper for a computer vision library called OpenCV), and Blender. I will very briefly touch on a 3D gaming framework called Unity, the gesture middleware FAAST, and the Microsoft Robotics Developer Studio. Each of these rich tools deserves more than a mere mention but each, unfortunately, is outside the scope of this chapter.

The structure of Beyond the Basics is pragmatic. Together we will walk through how to build helper libraries and proximity and motion detectors. Then we'll move into face detection applications. Finally we'll build some simulated holograms. Along the way, you will pick up skills and knowledge about tools for image manipulation that will serve as the building blocks for building even more sophisticated applications on your own.

Image Manipulation Helper Methods

There are many different kinds of images and many libraries available to work with them. In the .NET Framework alone, there is both a System.Windows.Media.Drawing abstract class, which belongs to PresentationCore.dll, and a System.Drawing namespace, which belongs to System.Drawing.dll. To complicate things a little more, both the System.Windows and System.Drawing namespaces contain classes related to shapes and colors that are independent of one another. Sometimes methods in one library allow for image manipulations not available in the other. To take advantage of them, it may be necessary to convert images of one type to images of another and then back again.

When we throw Kinect into the mix, things get exponentially more complex. Kinect has its own image types like the ImageFrame. In order to make types like ImageFrame work with WPF, the ImageFrame must be converted into an ImageSource type, which is part of the System.Windows.Media.Imaging namespace. Third-party image manipulation libraries like Emgu do not know anything about the System.Windows.Media namespace, but do have knowledge of the System.Drawing namespace. In order to work with Kinect and Emgu, then, it is necessary to convert Microsoft.Kinect types to System.Drawing types, convert System.Drawing types to Emgu types, convert Emgu types back to System.Drawing types after some manipulations, and then finally convert these back to System.Windows.Media types so WPF can consume them.
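To make the round trip concrete, the fragment below sketches the kind of conversion chain this chapter works toward. None of these helpers exist yet; ToBitmap, ToOpenCVImage, and ToBitmapSource are the extension methods built in the listings that follow, and colorFrame stands in for a frame pulled from the Kinect color stream.

// a sketch of the full round trip, using the extension methods developed later in this chapter
var source = colorFrame               // Microsoft.Kinect.ColorImageFrame
    .ToBitmap()                       // System.Drawing.Bitmap
    .ToOpenCVImage<Bgr, byte>()       // Emgu.CV.Image<Bgr, byte>
    .ToBitmapSource();                // System.Windows.Media.Imaging.BitmapSource, ready for WPF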

The Coding4Fun Kinect Toolkit

Clint Rutkas, Dan Fernandez, and Brian Peek have put together a library called the Coding4Fun Kinect Toolkit that provides some of the conversions necessary for translating class types from one library to another. The toolkit can be downloaded at http://c4fkinect.codeplex.com. It contains three separate dlls. The Coding4Fun.Kinect.Wpf library, among other things, provides a set of extension methods for working between Microsoft.Kinect types and System.Windows.Media types. The Coding4Fun.Kinect.WinForm library provides extension methods for transforming Microsoft.Kinect types into System.Drawing types. System.Drawing is the underlying .NET graphics library primarily for WinForms development just as System.Windows.Media contains types for WPF.

The unfortunate thing about the Coding4Fun Kinect Toolkit is that it does not provide ways to convert types between those in the System.Drawing namespace and those in the System.Windows.Media namespace. This is because the goal of the Toolkit in the first iteration appears to be to provide ways to simplify writing Kinect demo code for distribution rather than to provide a general-purpose library for working with image types. Consequently, some methods one might need for WPF programming are contained in the WinForm dll. Moreover, useful methods for working with the very complex depth image data from the depth stream are locked away inside a method that simply transforms a Kinect image type to a WPF ImageSource object.

There are two great things about the Coding4Fun Kinect Toolkit that make negligible any quibbling criticisms I might have concerning it. First, the source code is browsable, allowing us to study the techniques used by the Coding4Fun team to work with the byte arrays that underlie the image manipulations they perform. While you have seen a lot of similar code in the previous chapters, it is extremely helpful to see the small tweaks used to compose these techniques into simple one-call methods. Second, the Coding4Fun team brilliantly decided to structure these methods as extension methods.

In case you are unfamiliar with extension methods, they are simply syntactic sugar that allows a stand-alone method to look like it has been attached to a preexisting class type. For instance, you might have a C# method called AddOne that adds one to any integer. This method can be turned into an extension method hanging off the Integer type simply by making it a static method, placing it in a top-level static class, and adding the keyword this to the first parameter of the method, as shown in Listing 8-1. Once this is done, instead of calling AddOne(3) to get the value four, we can instead call 3.AddOne().

Listing 8-1. Turning Normal Methods Into Extension Methods

public int AddOne(int i)
{
    return i + 1;
}


// becomes the extension method

public static class MyExtensions
{
    public static int AddOne(this int i)
    {
        return i + 1;
    }
}

To use an extension method library, all you have to do is add a using directive for the namespace associated with the methods to your own code file. The name of the static class that contains the extensions (MyExtensions in the case above) is actually ignored. When extension methods are used to transform image types from one library to image types from another, they simplify work with images by letting us perform operations like:

var bitmapSource = imageFrame.ToBitmapSource();
image1.Source = bitmapSource;

Table 8-1 outlines the extension methods provided by version 1.0 of the Coding4Fun Kinect Toolkit. You should use them as a starting point for developing applications with the Kinect SDK. As you build up experience, however, you should consider building your own library of helper methods. In part, this will aid you as you discover that you need helpers the Coding4Fun libraries do not provide. More important, because the Coding4Fun methods hide some of the complexity involved in working with depth image data, you may find that they do not always do what you expect them to do. While hiding complexity is admittedly one of the main purposes of helper methods, you will likely feel confused to find that, when working with the depth stream and the Coding4Fun Toolkit, e.ImageFrame.ToBitmapSource() returns something substantially different from e.ImageFrame.Image.Bits.ToBitmapSource(e.ImageFrame.Image.Width, e.ImageFrame.Image.Height). Building your own extension methods for working with images will help simplify developing with Kinect while also allowing you to remain aware of what you are actually doing with the data streams coming from the Kinect sensor.

Table 8-1. Extension Methods Provided by Version 1.0 of the Coding4Fun Kinect Toolkit

Your Own Extension Methods

We can build our own extension methods. In this chapter I walk you through the process of building a set of extension methods that will be used for the image manipulation projects. The chief purpose of these methods is to allow us to convert images freely between types from the System.Drawing namespace, which are more commonly used, and types in the System.Windows.Media namespace, which tend to be specific to WPF programming. This in turn provides a bridge between third-party libraries (and even found code on the Internet) and the WPF platform. These implementations are simply standard implementations for working with Bitmap and BitmapSource objects. Some of them are also found in the Coding4Fun Kinect Toolkit. If you do not feel inclined to walk through this code, you can simply copy the implementation from the sample code associated with this chapter and skip ahead.

Instead of creating a separate library for our extension methods, we will simply create a class that can be copied from project to project. The advantage of this is that all the methods are well exposed and can be inspected if code you expect to work one way ends up working in an entirely different way (a common occurrence with image processing).

Create a WPF Project

Now we are ready to create a new sample WPF project in which we can construct and test the extension methods class. We will build a MainWindow.xaml page similar to the one in Listing 8-2 with two images, one called rgbImage and one called depthImage.

Listing 8-2. Extension Methods Sample xaml Page

<Window x:Class="ImageLibrarySamples.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Image Library Samples" >
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition/>
            <ColumnDefinition/>
        </Grid.ColumnDefinitions>
        <Image Name="rgbImage" Stretch="Uniform" Grid.Column="0"/>
        <Image Name="depthImage" Stretch="Uniform" Grid.Column="1"/>
    </Grid>
</Window>

This process should feel second nature to you by now. For the code-behind, add a reference to Microsoft.Kinect.dll. Declare a Microsoft.Kinect.KinectSensor member and instantiate it in the MainWindow constructor, as shown in Listing 8-3 (and as you have already done a dozen times if you have been working through the projects in this book). Initialize the KinectSensor object, handle the ColorFrameReady and DepthFrameReady events, and then enable the color and depth streams, the latter without player data.

Listing 8-3. Extension Methods Sample MainWindow Code-Behind

Microsoft.Kinect.KinectSensor _kinectSensor;

public MainWindow()
{
    InitializeComponent();

    this.Unloaded += delegate
    {
        _kinectSensor.ColorStream.Disable();
        _kinectSensor.DepthStream.Disable();
    };

    this.Loaded += delegate
        {
            _kinectSensor = KinectSensor.KinectSensors[0];
            _kinectSensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
            _kinectSensor.DepthStream.Enable(DepthImageFormat.Resolution320x240Fps30);
            _kinectSensor.ColorFrameReady += ColorFrameReady;
            _kinectSensor.DepthFrameReady += DepthFrameReady;

            _kinectSensor.Start();
        };
}



void DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
}

void ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
}

Create a Class and Some Extension Methods

Add a new class to the project called ImageExtensions.cs to contain the extension methods. Remember that while the actual name of the class is unimportant, the namespace does get used. In Listing 8-4, I use the namespace ImageManipulationExtensionMethods. Also, you will need to add a reference to System.Drawing.dll. As mentioned previously, both the System.Drawing namespace and the System.Windows.Media namespace share similarly named objects. In order to prevent namespace collisions, for instance with the PixelFormat classes, we must select one of them to be primary in our namespace declarations. In the code below, I use System.Drawing as the default namespace and create an alias for the System.Windows.Media namespace abbreviated to Media. Finally, create extension methods for the two most important image transformations: for turning a byte array into a Bitmap object and for turning a byte array into a BitmapSource object. These two extensions will be used on the bytes of a color image. Create two more extension methods for transforming depth images by replacing the byte arrays in these method signatures with short arrays since depth images come across as arrays of the short type rather than bytes.

Listing 8-4. Image Manipulation Extension Methods

using System;
using System.Drawing;
using Microsoft.Kinect;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Windows;
using System.IO;
using Media = System.Windows.Media;

namespace ImageManipulationExtensionMethods
{
    public static class ImageExtensions
    {
        public static Bitmap ToBitmap(this byte[] data, int width, int height
            , PixelFormat format)
        {
            var bitmap = new Bitmap(width, height, format);

            var bitmapData = bitmap.LockBits(
                new System.Drawing.Rectangle(0, 0, bitmap.Width, bitmap.Height),
                ImageLockMode.WriteOnly,
                bitmap.PixelFormat);
            Marshal.Copy(data, 0, bitmapData.Scan0, data.Length);
            bitmap.UnlockBits(bitmapData);
            return bitmap;
        }

        public static Bitmap ToBitmap(this short[] data, int width, int height
            , PixelFormat format)
        {
            var bitmap = new Bitmap(width, height, format);

            var bitmapData = bitmap.LockBits(
                new System.Drawing.Rectangle(0, 0, bitmap.Width, bitmap.Height),
                ImageLockMode.WriteOnly,
                bitmap.PixelFormat);
            Marshal.Copy(data, 0, bitmapData.Scan0, data.Length);
            bitmap.UnlockBits(bitmapData);
            return bitmap;
        }

        public static Media.Imaging.BitmapSource ToBitmapSource(this byte[] data
            , Media.PixelFormat format, int width, int height)
        {
            return Media.Imaging.BitmapSource.Create(width, height, 96, 96
                , format, null, data, width * format.BitsPerPixel / 8);
        }

        public static Media.Imaging.BitmapSource ToBitmapSource(this short[] data
        , Media.PixelFormat format, int width, int height)
        {
            return Media.Imaging.BitmapSource.Create(width, height, 96, 96
                , format, null, data, width * format.BitsPerPixel / 8);
        }
    }
}

The implementations above are somewhat arcane and not necessarily worth going into here. What is important is that, based on these methods, we can get creative and write additional helper extension methods that decrease the number of parameters that need to be passed.

Create Additional Extension Methods

Since the byte arrays for both the color and depth image streams are accessible from the ColorImageFrame and DepthImageFrame types, we can also create additional extension methods (as shown in Listing 8-5), which hang off of these types rather than off of byte arrays.

In taking raw array data and transforming it into either a Bitmap or a BitmapSource type, the most important factor to take into consideration is the pixel format. The video stream returns a series of 32-bit RGB images. The depth stream returns a series of 16-bit images. In the code below, I use 32-bit images without transparencies as the default. In other words, video stream images can always simply call ToBitmap or ToBitmapSource. Other formats are provided for by having method names that hint at the pixel format being used.

Listing 8-5. Additional Image Manipulation Helper Methods

// bitmap methods

public static Bitmap ToBitmap(this ColorImageFrame image, PixelFormat format)
{
    if (image == null || image.PixelDataLength == 0)
        return null;
    var data = new byte[image.PixelDataLength];
    image.CopyPixelDataTo(data);
    return data.ToBitmap(image.Width, image.Height
        , format);
}

public static Bitmap ToBitmap(this DepthImageFrame image, PixelFormat format)
{
    if (image == null || image.PixelDataLength == 0)
        return null;
    var data = new short[image.PixelDataLength];
    image.CopyPixelDataTo(data);
    return data.ToBitmap(image.Width, image.Height
        , format);
}

public static Bitmap ToBitmap(this ColorImageFrame image)
{
    return image.ToBitmap(PixelFormat.Format32bppRgb);
}

public static Bitmap ToBitmap(this DepthImageFrame image)
{
    return image.ToBitmap(PixelFormat.Format16bppRgb565);
}

// bitmapsource methods

public static Media.Imaging.BitmapSource ToBitmapSource(this ColorImageFrame image)
{
    if (image == null || image.PixelDataLength == 0)
        return null;
    var data = new byte[image.PixelDataLength];
    image.CopyPixelDataTo(data);
    return data.ToBitmapSource(Media.PixelFormats.Bgr32, image.Width, image.Height);
}

public static Media.Imaging.BitmapSource ToBitmapSource(this DepthImageFrame image)
{
    if (image == null || image.PixelDataLength == 0)
        return null;
    var data = new short[image.PixelDataLength];
    image.CopyPixelDataTo(data);
    return data.ToBitmapSource(Media.PixelFormats.Bgr555, image.Width, image.Height);
}

public static Media.Imaging.BitmapSource ToTransparentBitmapSource(this byte[] data
    , int width, int height)
{
    return data.ToBitmapSource(Media.PixelFormats.Bgra32, width, height);
}

You will notice that three different pixel formats show up in the Listing 8-5 extension methods. To complicate things just a little, two different enumeration types from two different libraries are used to specify the pixel format, though this is fairly easy to figure out. The Bgr32 format is simply a 32-bit color image with three color channels. Bgra32 is also 32-bit, but uses a fourth channel, called the alpha channel, for transparencies. Finally, Bgr555 is a format for 16-bit images. Recall from the previous chapters on depth processing that each pixel in the depth image is represented by two bytes. The digits 555 indicate that the blue, green, and red channels use up five bits each. For depth processing, you could equally well use the Bgr565 pixel format, which uses six bits for the green channel.

If you like, you can add additional extension methods. For instance, I have chosen to have ToTransparentBitmapSource hang off of a byte array only and not off of a ColorImageFrame. Perhaps an overload for the ColorImageFrame would be useful, though. You might also decide in your own implementations that using 32-bit images as an implicit default is simply confusing and that every conversion helper should specify the format being converted. The point of programming conventions, after all, is that they should make sense to you and to those with whom you are sharing your code.
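As a sketch of the kind of addition just suggested, a ToTransparentBitmapSource overload hanging off of ColorImageFrame might look like the following. It is offered as a possibility rather than as part of the chapter's listings; note that the Kinect color stream typically delivers its alpha bytes as zero, so this overload only produces a visible image once you have written your own alpha values into the pixel data.

public static Media.Imaging.BitmapSource ToTransparentBitmapSource(this ColorImageFrame image)
{
    if (image == null || image.PixelDataLength == 0)
        return null;
    var data = new byte[image.PixelDataLength];
    image.CopyPixelDataTo(data);
    // reuse the byte array helper from Listing 8-5; Bgra32 interprets the fourth byte as alpha
    return data.ToTransparentBitmapSource(image.Width, image.Height);
}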

Invoke the Extension Methods

In order to use these extension methods in the MainWindow code-behind, all you are required to do is add a using directive for the ImageManipulationExtensionMethods namespace. You now have all the code necessary to concisely transform the video and depth streams into types that can be attached to the image objects in the MainWindow.xaml UI, as demonstrated in Listing 8-6.

Listing 8-6. Using Image Manipulation Extension Methods

void DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    this.depthImage.Source = e.OpenDepthImageFrame().ToBitmap().ToBitmapSource();
}

void ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    this.rgbImage.Source = e.OpenColorImageFrame().ToBitmapSource();
}

Write Conversion Methods

There is a final set of conversions that I said we would eventually want. It is useful to be able to convert System.Windows.Media.Imaging.BitmapSource objects into System.Drawing.Bitmap objects and vice versa. Listing 8-7 illustrates how to write these conversion extension methods. Once these methods are added to your arsenal of useful helpers, you can test them out by, for instance, setting depthImage.Source to e.OpenDepthImageFrame().ToBitmapSource().ToBitmap().ToBitmapSource(). Surprisingly, this code works.

Listing 8-7. Converting Between BitmapSource and Bitmap Types

[DllImport("gdi32")]
private static extern int DeleteObject(IntPtr o);

public static Media.Imaging.BitmapSource ToBitmapSource(this Bitmap bitmap)
{
    if (bitmap == null) return null;
    IntPtr ptr = bitmap.GetHbitmap();
    var source = System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap(
        ptr,
        IntPtr.Zero,
        Int32Rect.Empty,
        Media.Imaging.BitmapSizeOptions.FromEmptyOptions());
    DeleteObject(ptr);
    return source;
}

public static Bitmap ToBitmap(this Media.Imaging.BitmapSource source)
{
    Bitmap bitmap;
    using (MemoryStream outStream = new MemoryStream())
    {
        var enc = new Media.Imaging.PngBitmapEncoder();
        enc.Frames.Add(Media.Imaging.BitmapFrame.Create(source));
        enc.Save(outStream);
        // GDI+ expects the stream to remain open for the lifetime of a Bitmap created
        // from it, so copy the decoded image before the MemoryStream is disposed
        using (var streamBitmap = new Bitmap(outStream))
        {
            bitmap = new Bitmap(streamBitmap);
        }
    }
    return bitmap;
}

The DeleteObject method in Listing 8-7 is invoked through something called a PInvoke call, which allows us to use a method built into the operating system. We use it in the ToBitmapSource method to release the GDI bitmap handle created by GetHbitmap and so avoid an unfortunate memory leak.

Proximity Detection

Thanks to the success of Kinect on the Xbox, it is tempting to think of Kinect applications as complete experiences. Kinect can also be used, however, to simply augment standard applications that use the mouse, keyboard, or touch as primary input modes. For instance, one could use the Kinect microphone array without any of its visual capabilities as an alternative speech input device for productivity or communication applications where Kinect is only one of several options for receiving microphone input. Alternatively, one could use Kinect's visual analysis merely to recognize that something happened visually rather than try to do anything with the visual, depth, or skeleton data.

In this section, we will explore using the Kinect device as a proximity sensor. For this purpose, all that we are looking for is whether something has occurred or not. Is a person standing in front of Kinect? Is something that is not a person moving in front of Kinect? When the trigger we specify reaches a certain threshold, we then start another process. A trigger like this could be used to turn on the lights in a room when someone walks into it. For commercial advertising applications, a kiosk can go into an attract mode when no one is in range, but then begin more sophisticated interaction when a person comes close. Instead of merely writing interactive applications, it is possible to write applications that are aware of their surroundings.

Kinect can even be turned into a security camera that saves resources by recording video only when something significant happens in front of it. At night I leave food out for our outdoor cat that lives on the back porch. Recently I have begun to suspect that other critters are stealing my cat's food. By using Kinect as a combination motion detector and video camera that I leave out overnight, I can find out what is really happening. If you enjoy nature shows, you know a similar setup could realistically be used over a longer period of time to capture the appearance of rare animals. Through conserving hard drive space by turning the video camera on only when animals are near, the setup can be left out for weeks at a time provided there is a way to power it. If, like me, you sometimes prefer more fanciful entertainment than what is provided on nature shows, you could even scare yourself by setting up Kinect to record video and sound in a haunted house whenever the wind blows a curtain aside. When we think of Kinect as an augmentation to, rather than as the main input for, an application, many new possibilities for using Kinect open up.

Simple Proximity Detection

As a proof of concept, we will build a proximity detector that turns the video feed on and off depending on whether someone is standing in front of Kinect. Naturally, this could be converted to perform a variety of other tasks when someone is in Kinect's visual range. The easiest way to build a proximity detector is to use the skeleton detection built into the Kinect SDK.

Begin by creating a new WPF project called ProximityDetector. Add a reference to Microsoft.Kinect.dll as well as a reference to System.Drawing. Copy the ImageExtensions.cs class file we created in the previous section into this project and add the ImageManipulationExtensionMethods namespace declaration to the top of the MainWindow.cs code-behind. As shown in Listing 8-8, the XAML for this application is very simple. We just need an image called rgbImage that we can populate with data from the Kinect video stream.

Listing 8-8. Proximity Detector UI

<Grid >
    <Image Name="rgbImage" Stretch="Fill"/>
</Grid>

Listing 8-9 shows some of the initialization code. For the most part, this is standard code for feeding the color stream to the image control. In the MainWindow constructor we initialize the KinectSensor object, attach a handler to the ColorFrameReady event, and enable the color stream (the skeleton stream is enabled in Listing 8-10). You have seen similar code many times before. What you may not have seen before, however, is the inclusion of a Boolean flag called _isTracking that is used to indicate whether our proximity detection algorithm has discovered anyone in the vicinity. If it has, the video image is updated from the color stream. If not, we bypass the color stream and assign null to the Source property of our image control.

Listing 8-9. Baseline Proximity Detection Code

Microsoft.Kinect.KinectSensor _kinectSensor;
bool _isTracking = false;

// . . .

public MainWindow()
{
    InitializeComponent();

    this.Unloaded += delegate{
        _kinectSensor.ColorStream.Disable();
        _kinectSensor.SkeletonStream.Disable();
    };

    this.Loaded += delegate
    {
        _kinectSensor = Microsoft.Kinect.KinectSensor.KinectSensors[0];
        _kinectSensor.ColorFrameReady += ColorFrameReady;
        _kinectSensor.ColorStream.Enable();
    // . . .

        _kinectSensor.Start();
    };

    // . . .
}

void ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    if (_isTracking)
    {
        using (var frame = e.OpenColorImageFrame())
        {
            if (frame != null)
                rgbImage.Source = frame.ToBitmapSource();
        }
    }
    else
        rgbImage.Source = null;
}

private void OnDetection()
{
    if (!_isTracking)
        _isTracking = true;
}

private void OnDetectionStopped()
{
    _isTracking = false;
}

In order to toggle the _isTracking flag on, we will handle the KinectSensor.SkeletonFrameReady event. The SkeletonFrameReady event is basically something like a heartbeat. As long as there are objects in front of the camera, the SkeletonFrameReady event will keep getting invoked. In our own code, all we need to do to take advantage of this heartbeat effect is to check the skeleton data array passed to the SkeletonFrameReady event handler and verify that at least one of the items in the array is recognized and being tracked as a real person. The code for this is shown in Listing 8-10.

The tricky part of this heartbeat metaphor is that, like a heartbeat, sometimes the event does not get thrown. Consequently, while we always have a built-in mechanism to notify us when a body has been detected in front of the camera, we do not have one to tell us when it is no longer detected. In order to work around this, we start a timer whenever a person has been detected. All the timer does is check to see how long it has been since the last heartbeat was fired. If the time gap is greater than a certain threshold, we know that there has not been a heartbeat for a while and that we should end the current proximity session since, figuratively speaking, Elvis has left the building.

Listing 8-10. Completed Proximity Detection Code

// . . .

        int _threshold = 100;
        DateTime _lastSkeletonTrackTime;
        DispatcherTimer _timer = new DispatcherTimer();



        public MainWindow()
        {
            InitializeComponent();

            // . . .

            this.Loaded += delegate
            {
                _kinectSensor = Microsoft.Kinect.KinectSensor.KinectSensors[0];

            // . . .


                _kinectSensor.SkeletonFrameReady += Pulse;
                _kinectSensor.SkeletonStream.Enable();
                _timer.Interval = new TimeSpan(0, 0, 1);
                _timer.Tick += new EventHandler(_timer_Tick);

                _kinectSensor.Start();
            };
        }

        void _timer_Tick(object sender, EventArgs e)
        {

            if (DateTime.Now.Subtract(_lastSkeletonTrackTime).TotalMilliseconds > _threshold)
            {
                _timer.Stop();
                OnDetectionStopped();
            }
        }

        private void Pulse(object sender, SkeletonFrameReadyEventArgs e)
        {
            using (var skeletonFrame = e.OpenSkeletonFrame())
            {
                if (skeletonFrame == null || skeletonFrame.SkeletonArrayLength == 0)
                    return;

                Skeleton[] skeletons = new Skeleton[skeletonFrame.SkeletonArrayLength];
                skeletonFrame.CopySkeletonDataTo(skeletons);

                for (int s = 0; s < skeletons.Length; s++)
                {
                    if (skeletons[s].TrackingState == SkeletonTrackingState.Tracked)
                    {
                        OnDetection();

                        _lastSkeletonTrackTime = DateTime.Now;

                        if (!_timer.IsEnabled)
                        {
                            _timer.Start();
                        }
                        break;
                    }
                }
            }
        }

Proximity Detection with Depth Data

This code is just the thing for the type of kiosk application we discussed above. Using skeleton tracking as the basis for proximity detection, a kiosk will go into standby mode when there is no one to interact with and simply play some sort of video instead. Unfortunately, skeleton tracking will not work so well for catching food-stealing raccoons on my back porch or for capturing images of Sasquatch in the wilderness. This is because the skeleton tracking algorithms are keyed for humans and a certain set of body types. Outside of this range of human body types, objects in front of the camera will either not be tracked or, worse, tracked inconsistently.

To get around this, we can use the Kinect depth data, rather than skeleton tracking, as the basis for proximity detection. As shown in Listing 8-11, the runtime must first be configured to capture the color and depth streams rather than color and skeletal tracking.

Listing 8-11. Proximity Detection Configuration Using the Depth Stream

_kinectSensor.ColorFrameReady += ColorFrameReady;
_kinectSensor.DepthFrameReady += DepthFrameReady;
_kinectSensor.ColorStream.Enable();
_kinectSensor.DepthStream.Enable();

There are several advantages to using depth data rather than skeleton tracking as the basis of a proximity detection algorithm. First, the heartbeat provided by the depth stream is continuous as long as the Kinect sensor is running. This obviates the necessity of setting up a separate timer to monitor whether something has stopped being detected. Second, we can set up a minimum and maximum threshold within which we are looking for objects. If an object is closer to the depth camera than a minimum threshold or farther away from the camera than a maximum threshold, we toggle the _isTracking flag off. The proximity detection code in Listing 8-12 detects any object between 1000 and 1200 millimeters from the depth camera. It does this by analyzing each pixel of the depth stream image and determining if any pixel falls within the detection range. If it finds a pixel that falls within this range, it stops analyzing the image and sets _isTracking to true. The separate code for handling the ColorFrameReady event picks up on the fact that something has been detected and begins updating the image control with color stream data.

Listing 8-12. Proximity Detection Algorithm Using the Depth Stream

void DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    bool isInRange = false;
    using (var imageData = e.OpenDepthImageFrame())
    {
        if (imageData == null || imageData.PixelDataLength == 0)
            return;
        short[] bits = new short[imageData.PixelDataLength];
        imageData.CopyPixelDataTo(bits);
        int minThreshold = 1000;
        int maxThreshold = 1200;

        for (int i = 0; i < bits.Length; i++)  // each short in the array is one depth pixel
        {
            var depth = bits[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;

            if (depth > minThreshold && depth < maxThreshold)
            {
                isInRange = true;
                OnDetection();
                break;
            }
        }
    }

    if(!isInRange)
        OnDetectionStopped();

}

A final advantage of using depth data rather than skeletal tracking data for proximity detection is that it is much faster. Even though skeletal tracking occurs at a much lower level than our analysis of the depth stream data, it requires that a full human body be in the camera's field of vision. Additional time is required to analyze the entire human body image with the decision trees built into the Kinect SDK and verify that it falls within certain parameters set up for skeletal recognition. With this depth image algorithm, we are simply looking for one pixel within a given range rather than identifying the entire human outline. Unlike the skeletal tracking algorithm we used previously, the depth algorithm in Listing 8-12 will trigger the OnDetection method as soon as something is within range, even at the very edge of the depth camera's field of vision.

Refining Proximity Detection

There are also shortcomings to using the depth data, of course. The area between the minimum and maximum depth range must be kept clear in order to avoid having _isTracking always set to true. While depth tracking allows us to relax the conditions that set off the proximity detection beyond human beings, it may relax it a bit too much since now even inanimate objects can trigger the proximity detector. Before moving on to implementing a motion detector to solve this problem of having a proximity detector that is either too strict or too loose, I want to introduce a third possibility for the sake of completeness.

Listing 8-13 demonstrates how to implement a proximity detector that combines both player data and depth data. This is a good choice if the skeleton tracking algorithm fits your needs but you would like to constrain it further by only detecting human shapes between a minimum and a maximum distance from the depth camera. This could be useful, again, for a kiosk type application set up in an open area. One set of interactions can be triggered when a person enters the viewable area in front of Kinect. Another set of interactions can be triggered when a person is within a meter and a half of Kinect, and then a third set of interactions can occur when the person is close enough to touch the kiosk itself. To set up this sort of proximity detection, you will want to reconfigure the KinectSensor in the MainWindow constructor by enabling the skeleton stream alongside the depth stream so that the depth data includes player index information, as sketched below. Once this is done, the event handler for the DepthFrameReady event can be rewritten to check for depth thresholds as well as the presence of a human shape. All the remaining code can stay the same.
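A minimal sketch of that reconfiguration, using the same member names as the earlier listings, might look like this; only the stream setup inside the Loaded delegate is shown.

_kinectSensor = KinectSensor.KinectSensors[0];
_kinectSensor.ColorFrameReady += ColorFrameReady;
_kinectSensor.DepthFrameReady += DepthFrameReady;
_kinectSensor.ColorStream.Enable();
_kinectSensor.DepthStream.Enable();
// enabling the skeleton stream is what populates the player index bits in the depth data
_kinectSensor.SkeletonStream.Enable();
_kinectSensor.Start();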

Listing 8-13. Proximity Detection Algorithm Using the Depth Stream and Player Index

void DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    bool isInRange = false;
    using (var imageData = e.OpenDepthImageFrame())
    {
        if (imageData == null || imageData.PixelDataLength == 0)
            return;
        short[] bits = new short[imageData.PixelDataLength];
        imageData.CopyPixelDataTo(bits);
        int minThreshold = 1700;
        int maxThreshold = 2000;


        for (int i = 0; i < bits.Length; i++)  // each short in the array is one depth pixel
        {
            var depth = bits[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;
            var player = bits[i] & DepthImageFrame.PlayerIndexBitmask;

            if (player > 0 && depth > minThreshold && depth < maxThreshold)
            {
                isInRange = true;
                OnDetection();
                break;
            }
        }
    }

    if(!isInRange)
    OnDetectionStopped();
}

Detecting Motion

Motion detection is by far the most interesting way to implement proximity detection. The basic strategy for implementing motion detection is to start with an initial baseline RGB image. As each image is received from the video stream, it can be compared against the baseline image. If differences are detected, we can assume that something has moved in the field of view of the RGB camera.
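A naive version of this strategy, before any of the refinements discussed next, might look something like the following sketch. The names (_baseline, IsMotion) and the threshold values are illustrative assumptions rather than part of the chapter's projects.

byte[] _baseline;

bool IsMotion(byte[] currentPixels, int perChannelTolerance = 30, int changedPixelThreshold = 1000)
{
    if (_baseline == null)
    {
        // the first frame we see becomes the baseline image
        _baseline = (byte[])currentPixels.Clone();
        return false;
    }

    // count how many bytes differ noticeably from the baseline
    int changed = 0;
    for (int i = 0; i < currentPixels.Length; i++)
    {
        if (Math.Abs(currentPixels[i] - _baseline[i]) > perChannelTolerance)
            changed++;
    }
    return changed > changedPixelThreshold;
}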

You have no doubt already found the central flaw in this strategy. In the real world, objects get moved. In a room, someone might move the furniture around slightly. Outdoors, a car might be moved or the wind might shift the angle of a small tree. In each of these cases, since there has been a change even though there is no continuous motion, the system will detect a false positive and will indicate motion where there is none. In these cases, what we would like to be able to do is to change the baseline image intermittently.

Accomplishing something like this requires more advanced image analysis and processing than we have encountered so far. Fortunately, an open source project known as OpenCV (Open Computer Vision) provides a library for performing these sorts of complex real-time image processing operations. Intel Research initiated OpenCV in 1999 to provide the results of advanced vision research to the world. In 2008, support for the project was taken over by Willow Garage, a technology incubation company, which continues to maintain it. Around the same time, a project called Emgu CV was started, which provides a .NET wrapper for OpenCV. We will be using Emgu CV to implement motion detection and also for several subsequent sample projects.

The official Emgu CV site is at www.emgu.com. The actual code and installation packages are hosted on SourceForge at http://sourceforge.net/projects/emgucv/files/. In the Kinect SDK projects discussed in this book we use the 2.3.0 version of Emgu CV. Actual installation is fairly straightforward. Simply find the executable suitable for your Windows operating system and run it. There is one caveat, however. Emgu CV seems to run best using the x86 architecture. If you are developing on a 64-bit machine, you are best off explicitly setting your platform target for projects using the Emgu library to x86, as illustrated in Figure 8-1. (You can also pull down the Emgu source code and compile it yourself for x64, if you wish.) To get to the Platform Target setting, select the properties for your project either by right-clicking on your project in the Visual Studio Solutions pane or by selecting Project | Properties on the menu bar at the top of the Visual Studio IDE. Then select the Build tab, which should be the second tab available.


Figure 8-1. Setting the platform target

In order to work with the Emgu library, you will generally need to add references to three dlls: Emgu.CV, Emgu.CV.UI, and Emgu.Util. These will typically be found in the Emgu install folder. On my computer, they are found at C:\Emgu\emgucv-windows-x86 2.3.0.1416\bin.

There is an additional, rather confusing and admittedly rather messy, step. Because Emgu is a wrapper around C++ libraries, you will also need to place several additional unmanaged dlls in a location where the Emgu wrapper expects to find them. Emgu looks for these files in the executable directory. If you are compiling a debug project, this would be the bin/Debug folder. For release compilation, this would be the bin/Release subdirectory of your project. Eleven files need to be copied into your executable directory: opencv_calib3d231.dll, opencv_contrib231.dll, opencv_core231.dll, opencv_features2d231.dll, opencv_ffmpeg.dll, opencv_highgui231.dll, opencv_imgproc231.dll, opencv_legacy231.dll, opencv_ml231.dll, opencv_objectdetect231.dll, and opencv_video231.dll. These can be found in the bin subdirectory of the Emgu installation. For convenience, you can also simply copy over any dll in that folder that begins with “opencv_”.

As mentioned earlier, unlocking the full potential of the Kinect SDK by combining it with additional tools can sometimes get messy. By adding the image processing capabilities of OpenCV and Emgu, however, we begin to have some very powerful toys to play with. For instance, we can begin implementing a true motion tracking solution.

We need to add a few more helper extension methods to our toolbox first, though. As mentioned earlier, each library has its own core image type that it understands. In the case of Emgu, this type is the generic Image<TColor, TDepth> type, which implements the Emgu.CV.IImage interface. Listing 8-14 shows some extension methods for converting between the image types we are already familiar with and the Emgu-specific image type. Create a new static class for your project in a file called EmguImageExtensions.cs. Give it a namespace of ImageManipulationExtensionMethods. By using the same namespace as our earlier ImageExtensions class, we can make all of the extension methods we have written available to a file with a single using directive. This class will have three conversions: from Microsoft.Kinect.ColorImageFrame to Emgu.CV.Image<TColor, TDepth>, from System.Drawing.Bitmap to Emgu.CV.Image<TColor, TDepth>, and finally from Emgu.CV.Image<TColor, TDepth> to System.Windows.Media.Imaging.BitmapSource.

Listing 8-14. Emgu Extension Methods

using System.Drawing;
using Microsoft.Kinect;
using Emgu.CV;

namespace ImageManipulationExtensionMethods
{
    public static class EmguImageExtensions
    {
        public static Image<TColor, TDepth> ToOpenCVImage<TColor, TDepth>(
            this ColorImageFrame image)
            where TColor : struct, IColor
            where TDepth : new()
        {
            var bitmap = image.ToBitmap();
            return new Image<TColor, TDepth>(bitmap);
        }

        public static Image<TColor, TDepth> ToOpenCVImage<TColor, TDepth>(
            this Bitmap bitmap)
            where TColor : struct, IColor
            where TDepth : new()
        {
            return new Image<TColor, TDepth>(bitmap);
        }

        public static System.Windows.Media.Imaging.BitmapSource ToBitmapSource(
            this IImage image)
        {
            var source = image.Bitmap.ToBitmapSource();
            return source;
        }
    }
}

In implementing motion detection with the Emgu library, we will use the polling technique introduced in earlier chapters rather than eventing. Because image processing can be resource intensive, we want to throttle how often we perform it, which is really only possible by using polling. It should be pointed out that this is only a proof of concept application. This code has been written chiefly with a goal of readability—in particular printed readability—rather than performance.

We will use the color stream both to perform the motion analysis and to update the image control. As discussed in earlier chapters, the CompositionTarget.Rendering event is generally used to perform polling. Here, however, we will create a BackgroundWorker object to poll the color stream. As shown in Listing 8-15, the background worker calls a method called Pulse to grab a frame from the color stream and perform some resource-intensive processing on it. When the threaded background worker completes an iteration, it polls for another frame and performs another processing operation. Two Emgu objects are declared as members: a MotionHistory object and an IBGFGDetector object. These two objects will be used together to create the constantly updating baseline image we will compare against to detect motion.

Listing 8-15. Motion Detection Configuration

        KinectSensor _kinectSensor;
        private MotionHistory _motionHistory;
        private IBGFGDetector<Bgr> _forgroundDetector;
        bool _isTracking = false;

        public MainWindow()
        {
            InitializeComponent();
            this.Unloaded += delegate
            {
                _kinectSensor.ColorStream.Disable();
            };

            this.Loaded += delegate
            {
                _motionHistory = new MotionHistory(
                    1.0, //in seconds, the duration of motion history you wants to keep
                    0.05, //in seconds, parameter for cvCalcMotionGradient
                    0.5); //in seconds, parameter for cvCalcMotionGradient

                _kinectSensor = KinectSensor.KinectSensors[0];
                _kinectSensor.ColorStream.Enable();
                _kinectSensor.Start();

                BackgroundWorker bw = new BackgroundWorker();
                bw.DoWork += (a, b) => Pulse();
                bw.RunWorkerCompleted += (c, d) => { bw.RunWorkerAsync(); };
                bw.RunWorkerAsync();
            };
        }

Listing 8-16 shows the actual code used to perform image processing in order to detect motion. The code is a modified version of sample code provided with the Emgu install. The first task in the Pulse method is to convert the ColorImageFrame provided by the color stream into an Emgu image type. The _forgroundDetector is then used both to update the _motionHistory object, which is the container for the constantly revised baseline image, as well as to compare against the baseline image to see if any changes have occurred. An image is created to capture any discrepancies between the baseline image and the current image from the color stream. This image is then transformed into a sequence of smaller images that break down any motion detected. We then loop through this sequence of movement images to see if they have surpassed a certain threshold of movement we have established. If the movement is substantial, we finally show the video image. If none of the movements are substantial or if none are captured, we hide the video image.

Listing 8-16. Motion Detection Algorithm

private void Pulse()
{
    using (ColorImageFrame imageFrame = _kinectSensor.ColorStream.OpenNextFrame(200))
    {
        if (imageFrame == null)
            return;

        using (Image<Bgr, byte> image = imageFrame.ToOpenCVImage<Bgr, byte>())
        using (MemStorage storage = new MemStorage()) //create storage for motion components
        {
            if (_forgroundDetector == null)
            {
                _forgroundDetector = new BGStatModel<Bgr>(image
                    , Emgu.CV.CvEnum.BG_STAT_TYPE.GAUSSIAN_BG_MODEL);
            }

            _forgroundDetector.Update(image);

            //update the motion history
            _motionHistory.Update(_forgroundDetector.ForgroundMask);

            //get a copy of the motion mask and enhance its color
            double[] minValues, maxValues;
            System.Drawing.Point[] minLoc, maxLoc;
            _motionHistory.Mask.MinMax(out minValues, out maxValues
                , out minLoc, out maxLoc);
            Image<Gray, Byte> motionMask = _motionHistory.Mask
                .Mul(255.0 / maxValues[0]);

            //create the motion image
            Image<Bgr, Byte> motionImage = new Image<Bgr, byte>(motionMask.Size);
            motionImage[0] = motionMask;

            //Threshold to define a motion area
            //reduce the value to detect smaller motion
            double minArea = 100;

            storage.Clear(); //clear the storage
            Seq<MCvConnectedComp> motionComponents =
                _motionHistory.GetMotionComponents(storage);
            bool isMotionDetected = false;
            //iterate through each of the motion component
            for (int c = 0; c < motionComponents.Count(); c++)
            {
                MCvConnectedComp comp = motionComponents[c];
                //reject the components that have small area;
                if (comp.area < minArea) continue;

                OnDetection();
                isMotionDetected = true;
                break;
            }
            if (isMotionDetected == false)
            {
                OnDetectionStopped();
                this.Dispatcher.Invoke(new Action(() => rgbImage.Source = null));
                return;
            }

            this.Dispatcher.Invoke(
                new Action(() => rgbImage.Source = imageFrame.ToBitmapSource())
                );
        }
    }
}

Saving the Video

It would be nice to be able to complete this project by actually recording a video to the hard drive instead of simply displaying the video feed. Video recording, however, is notoriously tricky and, while you will find many Kinect samples on the Internet showing you how to save a still image to disk, very few demonstrate how to save a complete video to disk. Fortunately, Emgu provides a VideoWriter type that allows us to do just that.

Listing 8-17 illustrates how to implement a Record and a StopRecording method in order to write images streamed from the Kinect RGB camera to an AVI file. For this code I have created a folder called vids on my D drive; this directory must exist before anything can be written to it. When recording starts, we create a file name based on the time at which the recording begins. We also begin aggregating the images from the video stream into a generic list of images. When StopRecording is called, this list of Emgu images is passed to the VideoWriter object in order to write the video to disk. This particular code does not use an encoder and consequently creates very large AVI files. You can opt to encode the AVI file to compress the video written to disk, though the tradeoff is that encoding is much more processor intensive.

Listing 8-17. Recording Video

        bool _isRecording = false;
        string _baseDirectory = @"d:\vids\";
        string _fileName;
        List<Image<Rgb,Byte>> _videoArray = new List<Image<Rgb,Byte>>();

        void Record(ColorImageFrame image)
        {
            if (!_isRecording)
            {
                _fileName = string.Format("{0}{1}{2}", _baseDirectory
                    , DateTime.Now.ToString("MMddyyyyHmmss"), ".avi");
                _isRecording = true;
            }
            _videoArray.Add(image.ToOpenCVImage<Rgb,Byte>());
        }

        void StopRecording()
        {
            if (!_isRecording)
                return;

            using (VideoWriter vw = new VideoWriter(_fileName, 0, 30, 640, 480, true))
            {
                for (int i = 0; i < _videoArray.Count(); i++)
                    vw.WriteFrame<Rgb, Byte>(_videoArray[i]);
            }
            _fileName = string.Empty;
            _videoArray.Clear();
            _isRecording = false;

        }

The final piece of this motion detection video camera is simply to modify the RGB polling code to not only stream images to the image control in our UI but also to call the Record method when motion is detected and the StopRecording method when no motion is detected, as shown in Listing 8-18. This provides you with a fully working prototype that analyzes raw stream data to detect any changes in the viewable area in front of Kinect and also does something useful with that information.

Listing 8-18. Calling the Record and StopRecording Methods

if (isMotionDetected == false)
{
    OnDetectionStopped();
    this.Dispatcher.Invoke(new Action(() => rgbImage.Source = null));
    StopRecording();
    return;
}

this.Dispatcher.Invoke(
    new Action(() => rgbImage.Source = imageFrame.ToBitmapSource())
    );
Record(imageFrame);

Identifying Faces

The Emgu CV library can also be used to detect faces. While actual facial recognition—identifying a person based on an image of him—is too complex to be considered here, processing an image in order to find portions of it that contain faces is an integral first step in achieving full facial recognition capability.

Most facial detection software is built around something called Haar-like features, which is an application of Haar wavelets, a sequence of mathematically defined square shapes. Paul Viola and Michael Jones developed the Viola-Jones object detection framework in 2001 based on identifying Haar-like features, a less computationally expensive method than other available techniques for performing facial detection. Their work was incorporated into OpenCV.

Facial detection in OpenCV and Emgu CV is built around a set of rules enshrined in an XML file written by Rainer Lienhart. The file is called haarcascade_frontalface_default.xml and can be retrieved from the Emgu samples. It is also included in the sample code associated with this chapter and is covered under the OpenCV BSD license. There is also a set of rules available for eye recognition, which we will not use in the current project.

To construct a simple face detection program to use with the Kinect SDK, create a new WPF project called FaceFinder. Add references to the following dlls: Microsoft.Kinect, System.Drawing, Emgu.CV, Emgu.CV.UI, and Emgu.Util. Add the opencv_* dlls to your build folder. Finally, add the two extension library files we created earlier in this chapter to the project: ImageExtensions.cs and EmguImageExtensions.cs. The XAML for this project is as simple as in previous examples. Just add an image control to the root Grid in MainWindow and name it rgbImage.

Instantiate a KinectSensor object in the MainWindow constructor and configure it to use only the video stream. Since the Emgu CV library is intended for image processing, we typically use it with RGB images rather than depth images. Listing 8-19 shows what this setup code should look like. We will use a BackgroundWorker object to poll the video stream. Each time the background worker has completed an iteration, it will poll the video stream again.

Listing 8-19. Face Detection Setup

KinectSensor _kinectSensor;

public MainWindow()
{
    InitializeComponent();

    this.Unloaded += delegate
    {
        _kinectSensor.ColorStream.Disable();
    };

    this.Loaded += delegate
    {
        _kinectSensor = KinectSensor.KinectSensors[0];
        _kinectSensor.ColorStream.Enable();
        _kinectSensor.Start();

        BackgroundWorker bw = new BackgroundWorker();
        bw.RunWorkerCompleted += (a, b) => bw.RunWorkerAsync();
        bw.DoWork += delegate { Pulse(); };
        bw.RunWorkerAsync();
    };
}

The Pulse method, which handles the background worker’s DoWork event, is the main workhorse here. The code shown in Listing 8-20 is adapted from samples provided with the Emgu install. We instantiate a new HaarCascade instance based on the provided face detection rules file. Next, we retrieve an image from the video stream and convert it into an Emgu image type. This image is converted to grayscale and its contrast is increased to make facial detection easier. The Haar detection rules are applied to the grayscale image to generate a series of structures indicating where in the image faces were found. A blue rectangle is drawn around any detected faces. The composite image is then converted into a BitmapSource type and passed to the image control. Because of the way WPF threading works, we must use the Dispatcher object to perform the assignment on the correct thread.

Listing 8-20. Face Detection Algorithm

string faceFileName = "haarcascade_frontalface_default.xml";

public void Pulse()
{
    using (HaarCascade face = new HaarCascade(faceFileName))
    {
        var frame = _kinectSensor.ColorStream.OpenNextFrame(100);
        var image = frame.ToOpenCVImage<Rgb, Byte>();

        // Convert it to grayscale
        using (Image<Gray, Byte> gray = image.Convert<Gray, Byte>())
        {
            // Normalize brightness and increase the contrast of the image
            gray._EqualizeHist();

            MCvAvgComp[] facesDetected = face.Detect(
                gray,
                1.1,
                10,
                Emgu.CV.CvEnum.HAAR_DETECTION_TYPE.DO_CANNY_PRUNING,
                new System.Drawing.Size(20, 20));

            // Draw a blue rectangle around each detected face
            foreach (MCvAvgComp f in facesDetected)
            {
                image.Draw(f.rect, new Rgb(System.Drawing.Color.Blue), 2);
            }

            Dispatcher.BeginInvoke(new Action(() =>
            {
                rgbImage.Source = image.ToBitmapSource();
            }));
        }
    }
}

Figure 8-2 shows the results of applying this code. The accuracy of the blue frame around detected faces is much better than what we might get by trying to perform similar logic using skeletal tracking.


Figure 8-2. Finding faces
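
As an aside, the eye cascade mentioned earlier can be applied in exactly the same way. The hedged sketch below would be dropped into Listing 8-20 after the face detection loop, reusing its gray and image variables; it is not part of the FaceFinder project.

// Optional: detect eyes with OpenCV's haarcascade_eye.xml, using the same pattern
// as the face cascade in Listing 8-20.
using (HaarCascade eyes = new HaarCascade("haarcascade_eye.xml"))
{
    MCvAvgComp[] eyesDetected = eyes.Detect(
        gray,
        1.1,
        10,
        Emgu.CV.CvEnum.HAAR_DETECTION_TYPE.DO_CANNY_PRUNING,
        new System.Drawing.Size(10, 10));

    // Draw a green rectangle around each detected eye
    foreach (MCvAvgComp e in eyesDetected)
        image.Draw(e.rect, new Rgb(System.Drawing.Color.Green), 2);
}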

Since the structures contained in the facesDetected array provide location information, we can also use the face detection algorithm to build an augmented reality application. The trick is to keep a standby image on hand and, instead of drawing a blue rectangle onto the video stream image, draw the standby image over each detected face. Listing 8-21 shows the code that replaces the blue rectangle code.

Listing 8-21. Augmented Reality Implementation

Image<Rgb, Byte> happyMan = new Image<Rgb, byte>("happy_man.jpg");
foreach (MCvAvgComp f in facesDetected)
{
    //image.Draw(f.rect, new Rgb(System.Drawing.Color.Blue), 2);

    // Double the detected rectangle so the overlay covers the whole head.
    var rect = new System.Drawing.Rectangle(f.rect.X - f.rect.Width / 2
        , f.rect.Y - f.rect.Height / 2
        , f.rect.Width * 2
        , f.rect.Height * 2);

    var newImage = happyMan.Resize(rect.Width, rect.Height
        , Emgu.CV.CvEnum.INTER.CV_INTER_LINEAR);

    // Copy only overlay pixels whose channels are all nonzero (the black
    // background acts as transparent), skipping anything outside the frame.
    for (int i = 0; i < rect.Height; i++)
    {
        for (int j = 0; j < rect.Width; j++)
        {
            int y = i + rect.Y;
            int x = j + rect.X;
            if (y < 0 || y >= image.Height || x < 0 || x >= image.Width)
                continue;

            if (newImage[i, j].Blue != 0 && newImage[i, j].Red != 0
                && newImage[i, j].Green != 0)
                image[y, x] = newImage[i, j];
        }
    }
}

The resulting effect, shown in Figure 8-3, demonstrates the idea of hiding one’s identity on video much as a person hides his identity on the Internet with an alias. Because the detection rectangle tracks the size of each face, the overlay image scales as faces approach or move away from the camera.


Figure 8-3. Anonymous Augmented Reality Effect

With some additional work, this basic code can be adapted to take one person’s face and superimpose it on someone else’s head. You could even pull data from multiple Kinects and merge faces and objects together. All this requires is an appropriate Haar cascade for each object you want to detect and replace; a rough sketch of the face-swapping idea follows.
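
The sketch below is my own illustration, not from the chapter. It uses the same Emgu indexer approach as Listing 8-21 and assumes the facesDetected array from Listing 8-20 contains at least two faces; it resizes the pixels under the first detected rectangle and pastes them over the second.

// A hedged sketch: copy the first detected face onto the second.
if (facesDetected.Length >= 2)
{
    var source = facesDetected[0].rect;
    var target = facesDetected[1].rect;

    // Pull the pixels of the first detected face into their own image.
    Image<Rgb, Byte> firstFace = new Image<Rgb, Byte>(source.Width, source.Height);
    for (int i = 0; i < source.Height; i++)
        for (int j = 0; j < source.Width; j++)
            firstFace[i, j] = image[i + source.Y, j + source.X];

    // Resize it to fit the second face and paste it over that region.
    var resized = firstFace.Resize(target.Width, target.Height,
        Emgu.CV.CvEnum.INTER.CV_INTER_LINEAR);
    for (int i = 0; i < target.Height; i++)
        for (int j = 0; j < target.Width; j++)
            image[i + target.Y, j + target.X] = resized[i, j];
}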

Holograms

Another interesting effect associated with Kinect is the pseudo-hologram. A 3D image can be made to tilt and shift based on the position of a person standing in front of Kinect. When done right, the effect creates the illusion that the 3D image exists in a 3D space extending into the display monitor. Because WPF has built-in support for 3D vector graphics, this effect is surprisingly easy to implement with Kinect. Figure 8-4 shows a simple 3D cube that can be made to rotate and scale depending on an observer’s position. The illusion works only for a single observer, however.


Figure 8-4. 3D cube

This effect goes back to a Wii Remote hack that Johnny Chung Lee demonstrated at his 2008 TED talk. This is the same Johnny Lee who worked on the Kinect team for a while and also inspired the Adafruit contest to hack together a community driver for the Kinect sensor. In Lee’s implementation, infrared LEDs from the Wii sensor bar were attached to a pair of glasses, and the Wii Remote’s infrared camera tracked the wearer as he moved around the room. The display would then rotate a complex 3D image based on the movements of the glasses to create the hologram effect.

The Kinect SDK implementation for this is relatively simple. Kinect already provides X, Y, and Z coordinates for a player skeleton, expressed in meters. The difficult part is creating an interesting 3D vector image in XAML. For this project I use a tool called Blender, an open source 3D model creation suite available at www.blender.org. To export 3D meshes as XAML, however, you need to find a Blender add-on that supports it. The version of Blender I use is 2.6, and while there is an exporter available for it, it is somewhat limited. Dan Lehenbauer also has a XAML exporter for Blender available on CodePlex, but it only works with older versions of Blender. As with most efforts to create interesting mashups with the Kinect SDK, this is once again an instance in which some elbow grease and a lot of patience are required.

The central concept of 3D vector graphics in WPF is the Viewport3D object. The Viewport3D can be thought of as a 3D space into which we can deposit objects, light sources, and a camera. To build the 3D effect, create a new WPF project in Visual Studio called Hologram and add a reference to the Microsoft.Kinect dll. In the MainWindow UI, create a new Viewport3D element nested in the root Grid. Listing 8-22 shows what the markup for the fully drawn cube looks like. The markup is also available in the sample projects associated with this chapter. In this project, the only part of this code that interacts with Kinect is the Viewport3D camera. Consequently, it is very important to name the camera.

The camera in Listing 8-22 has a position expressed in X, Y, Z coordinate space. X increases in value from left to right. Y increases from the bottom moving up. Z increases as it leaves the plane of the screen and approaches the observer. The look direction, in this case, simply inverts the position. This tells the camera to look directly back to the 0,0,0 coordinate. UpDirection, finally, indicates the orientation of the camera—in this case, up is the positive Y direction.

Listing 8-22. The Cube

    <Viewport3D>
        <Viewport3D.Camera>
            <PerspectiveCamera x:Name="camera" Position="-40,160,100"
                 LookDirection="40,-160,-100"
                         UpDirection="0,1,0"  />
        </Viewport3D.Camera>
        <ModelVisual3D >
            <ModelVisual3D.Content>
                <Model3DGroup>
                    <DirectionalLight Color="White" Direction="-1,-1,-3" />
                    <GeometryModel3D >
                        <GeometryModel3D.Geometry>
                            <MeshGeometry3D
Positions="1.000000,1.000000,-1.000000 1.000000,-1.000000,-1.000000 -1.000000,-1.000000,
-1.000000 -1.000000,1.000000,-1.000000 1.000000,0.999999,1.000000 -1.000000,1.000000,
1.000000 -1.000000,-1.000000,1.000000 0.999999,-1.000001,1.000000 1.000000,
1.000000,-1.000000 1.000000,0.999999,1.000000 0.999999,-1.000001,1.000000 1.000000,-1.000000,
-1.000000 1.000000,-1.000000,-1.000000 0.999999,-1.000001,1.000000 -1.000000,
-1.000000,1.000000 -1.000000,-1.000000,-1.000000 -1.000000,-1.000000,-1.000000 -1.000000,
-1.000000,1.000000 -1.000000,1.000000,1.000000 -1.000000,1.000000,
-1.000000 1.000000,0.999999,1.000000 1.000000,1.000000,-1.000000 -1.000000,
1.000000,-1.000000 -1.000000,1.000000,1.000000"
TriangleIndices="0,1,3 1,2,3 4,5,7 5,6,7 8,9,11 9,10,11 12,13,15 13,14,15 16,17,
19 17,18,19 20,21,23 21,22,23"

Normals="0.000000,0.000000,-1.000000 0.000000,0.000000,-1.000000 0.000000,0.000000,
-1.000000 0.000000,0.000000,-1.000000 0.000000,-0.000000,1.000000 0.000000,-0.000000,
1.000000 0.000000,-0.000000,1.000000 0.000000,-0.000000,1.000000 1.000000,-0.000000,
0.000000 1.000000,-0.000000,0.000000 1.000000,-0.000000,0.000000 1.000000,-0.000000,
0.000000 -0.000000,-1.000000,-0.000000 -0.000000,-1.000000,-0.000000 -0.000000,
-1.000000,-0.000000 -0.000000,-1.000000,-0.000000 -1.000000,0.000000,-0.000000
-1.000000,0.000000,-0.000000 -1.000000,0.000000,-0.000000 -1.000000,0.000000,
-0.000000 0.000000,1.000000,0.000000 0.000000,1.000000,0.000000 0.000000,1.000000,
0.000000 0.000000,1.000000,0.000000"/>
                        </GeometryModel3D.Geometry>
                        <GeometryModel3D.Material>
                            <DiffuseMaterial Brush="blue"/>
                        </GeometryModel3D.Material>
                    </GeometryModel3D>
                    <Model3DGroup.Transform>
                        <Transform3DGroup>
                            <Transform3DGroup.Children>
                                <TranslateTransform3D OffsetX="0" OffsetY="0"
OffsetZ="0.0935395359992981"/>
                                <ScaleTransform3D ScaleX="12.5608325004577637"
ScaleY="12.5608322620391846" ScaleZ="12.5608325004577637"/>
                            </Transform3DGroup.Children>
                        </Transform3DGroup>
                    </Model3DGroup.Transform>
                </Model3DGroup>
            </ModelVisual3D.Content>
        </ModelVisual3D>
    </Viewport3D>

The cube itself is defined by its eight corner positions, each represented by three coordinates; the Positions list contains 24 entries because each face lists its four corners separately, which lets every face carry its own normal. Triangle indices are then drawn over these points to give the cube a surface. To this we add a Material object and paint it blue. We also add a scale transform to the cube to make it bigger. Finally, we add a directional light to improve the 3D effect we are trying to create.
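
If you prefer to build the mesh in code-behind rather than XAML, a minimal sketch of the same cube might look like the following. This is my own illustration, not taken from the chapter; because the eight corners are shared between faces here, WPF averages the vertex normals and the shading comes out softer than in the XAML version.

// Build a unit cube as a MeshGeometry3D in code. The returned model would be added
// to a Model3DGroup (or wrapped in a ModelVisual3D) inside the Viewport3D.
// Requires the System.Windows.Media and System.Windows.Media.Media3D namespaces.
GeometryModel3D BuildCube()
{
    var mesh = new MeshGeometry3D();

    // The eight corners of a unit cube centered on the origin.
    var corners = new[]
    {
        new Point3D(-1, -1, -1), new Point3D( 1, -1, -1),
        new Point3D( 1,  1, -1), new Point3D(-1,  1, -1),
        new Point3D(-1, -1,  1), new Point3D( 1, -1,  1),
        new Point3D( 1,  1,  1), new Point3D(-1,  1,  1)
    };
    foreach (var corner in corners)
        mesh.Positions.Add(corner);

    // Two counterclockwise-wound triangles per face, indexed into the corners above.
    int[] indices =
    {
        4,5,6, 4,6,7,   // front  (z = +1)
        0,2,1, 0,3,2,   // back   (z = -1)
        0,7,3, 0,4,7,   // left   (x = -1)
        1,2,6, 1,6,5,   // right  (x = +1)
        3,7,6, 3,6,2,   // top    (y = +1)
        0,1,5, 0,5,4    // bottom (y = -1)
    };
    foreach (int index in indices)
        mesh.TriangleIndices.Add(index);

    var model = new GeometryModel3D(mesh, new DiffuseMaterial(Brushes.Blue));
    model.Transform = new ScaleTransform3D(12.56, 12.56, 12.56);
    return model;
}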

In the code-behind for MainWindow, we only need to configure the KinectSensor to support skeletal tracking, as shown in Listing 8-23. Video and depth data are uninteresting to us for this project.

Listing 8-23. Hologram Configuration

KinectSensor _kinectSensor;

public MainWindow()
{
    InitializeComponent();

    this.Unloaded += delegate
    {
        _kinectSensor.SkeletonStream.Disable();
    };

    this.Loaded += delegate
    {
        _kinectSensor = KinectSensor.KinectSensors[0];
        _kinectSensor.SkeletonFrameReady += SkeletonFrameReady;
        _kinectSensor.SkeletonStream.Enable();
        _kinectSensor.Start();
    };
}

To create the holographic effect, we will be moving the camera around our cube rather than attempting to rotate the cube itself. We must first determine if a person is actually being tracked by Kinect. If someone is, we simply ignore any additional players Kinect may have picked up. We select the skeleton we have found and extract its X, Y, and Z coordinates. Even though the Kinect position data is based on meters, our 3D cube is not, so it is necessary to massage these positions in order to maintain the 3D illusion. Based on these tweaked position coordinates, we move the camera around to roughly be in the same spatial location as the player Kinect is tracking, as shown in Listing 8-24. We also take these coordinates and invert them so the camera continues to point toward the 0,0,0 origin position.

Listing 8-24. Moving the Camera Based On User Position

void SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    float x=0, y=0, z = 0;
    //get angle of skeleton
    using (var frame = e.OpenSkeletonFrame())
    {
        if (frame == null || frame.SkeletonArrayLength == 0)
            return;

        var skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);
        for (int s = 0; s < skeletons.Length; s++)
        {
            if (skeletons[s].TrackingState == SkeletonTrackingState.Tracked)
            {
                // border is assumed to be a Border element declared in the window's
                // XAML (not shown) that gives visual feedback when a skeleton is tracked
                border.BorderBrush = new SolidColorBrush(Colors.Red);
                var skeleton = skeletons[s];
                x = skeleton.Position.X * 60;
                z = skeleton.Position.Z * 120;
                y = skeleton.Position.Y;
                break;
            }
            else
            {
                border.BorderBrush = new SolidColorBrush(Colors.Black);
            }

        }
    }
    if (Math.Abs(x) > 0)
    {
        camera.Position = new System.Windows.Media.Media3D.Point3D(x, y , z);
        camera.LookDirection = new System.Windows.Media.Media3D.Vector3D(-x, -y , -z);
    }
}

As interesting as this effect already is, the hologram illusion is even better with more complex 3D objects. A 3D cube can easily be turned into an oblong, as illustrated in Figure 8-5, simply by increasing the scale of the cube in the Z direction so that it sticks out toward the player. We can also create multiple oblongs by copying the oblong’s ModelVisual3D element into the Viewport3D several times, using translate transforms to place the copies at different locations on the X and Y axes and giving each a different color. Since the camera is the only object the code-behind is aware of, transforming and adding new 3D objects to the 3D viewport does not affect the way the Hologram project works at all; a code-behind sketch of adding such oblongs follows Figure 8-5.


Figure 8-5. 3D oblongs
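
The same multiplication of oblongs could be done from code-behind instead of XAML. The sketch below is my own illustration, not from the chapter: it reuses the BuildCube helper sketched earlier and assumes modelGroup is the x:Name given to the Model3DGroup in the markup.

// Add four colored oblongs, spread along the X axis and stretched along Z.
var colors = new[] { Brushes.Blue, Brushes.Red, Brushes.Green, Brushes.Orange };
for (int i = 0; i < colors.Length; i++)
{
    GeometryModel3D oblong = BuildCube();
    oblong.Material = new DiffuseMaterial(colors[i]);

    var transform = new Transform3DGroup();
    // Stretch the cube along Z so it points out toward the viewer...
    transform.Children.Add(new ScaleTransform3D(8, 8, 30));
    // ...and spread the copies across the X axis.
    transform.Children.Add(new TranslateTransform3D(-45 + i * 30, 0, 0));
    oblong.Transform = transform;

    modelGroup.Children.Add(oblong);
}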

Libraries to Keep an Eye On

Several libraries and tools relevant to Kinect are expected to be expanded to work with the Kinect SDK over the next year. Of these, the most intriguing are FAAST, Unity3D, and Microsoft Robotics Developer Studio.

The Flexible Action and Articulated Skeleton Toolkit (FAAST) can best be thought of as middleware that bridges the gap between Kinect’s gestural interface and traditional interfaces. Written and maintained by the Institute for Creative Technologies at the University of Southern California, FAAST is a gesture library initially written on top of OpenNI for use with Kinect. What makes the toolkit brilliant is that it lets you map its built-in gestures to almost any API and even to keyboard keystrokes. This has allowed hackers to use the toolkit to play a variety of video games with the Kinect sensor, including first-person shooters like Call of Duty and online games like Second Life and World of Warcraft. At last report, a version of FAAST is being developed to work with the Kinect SDK rather than OpenNI. You can read more about FAAST at http://projects.ict.usc.edu/mxr/faast.

Unity3D is a tool, available in both free and professional versions, that makes the traditionally difficult task of developing 3D games relatively easy. Games written in Unity3D can be exported to multiple platforms, including the web, Windows, iOS (iPhone and iPad), Android, Xbox, PlayStation, and Wii. It also supports third-party add-ins, including several created for Kinect, allowing developers to build Windows games that use the Kinect sensor for input. Find out more about Unity3D at http://unity3d.com.

Microsoft Robotics Developer Studio is Microsoft’s platform for building software for robots. Integration with Kinect has been built into recent betas of the product. Besides access to Kinect services, Kinect support also includes specifications for a reference platform for Kinect-enabled robots (that may eventually be transformed into a kit that can be purchased) as well as an obstacle avoidance sample using the Kinect sensor. You can learn more about Microsoft Robotics Developer Studio at http://www.microsoft.com/robotics.

Summary

In this chapter you have learned that the Kinect SDK can be used with a range of libraries and tools to create fascinating and textured mashups. You were introduced to the OpenCV wrapper Emgu CV, which provides access to sophisticated image processing algorithms for analyzing and modifying image data. You also began building your own library of helper extension methods to simplify the task of making multiple image manipulation libraries work together effectively. Finally, you built several applications demonstrating face detection, augmented reality, and 3D illusions, showing how straightforward it is to create the rich Kinect experiences you may have seen on the Internet once you know which tools are available and how to use them.
