C H A P T E R  5

Advanced Skeleton Tracking

This chapter marks the beginning of the second half of the book. The first set of chapters focused on the fundamental, camera-centric features of the Kinect SDK. We explored and experimented with every method and property of every object related to these features. These are the nuts and bolts of Kinect development. You now have the technical knowledge necessary to write applications using Kinect and the SDK. However, knowing the SDK and understanding how to use it as a tool to build great applications and experiences are substantially different matters. The remaining chapters of the book change tone and course in their coverage of the SDK. Moving forward, we discuss how to use the SDK in conjunction with WPF and other third-party tools and libraries to build Kinect-driven experiences. We will use all the information you learned in the previous chapters to progress to more advanced and complex topics.

At its core, Kinect only emits and detects the reflection of infrared light. From the reflection of the light, it calculates depth values for each pixel in its view area. The first derivative of the depth data is the ability to detect blobs and shapes; the player index bits attached to each depth pixel are a form of this first derivative. The second derivative determines which of these shapes matches the human form, and then calculates the location of each significant axis point on the human body. This is skeleton tracking, which we covered in the previous chapter.

While the infrared image and the depth data are critical and core to Kinect, they are less prominent than skeleton tracking. In fact, they are a means to an end. As the Kinect and other depth cameras become more prevalent in everyday computer use, the raw depth data will receive less direct attention from developers, and become merely trivia or part of passing conversation. We are almost there now. The Microsoft Kinect SDK does not give the developer access to the infrared image stream of Kinect, but other Kinect SDKs make it available. It is likely that most developers will never use the raw depth data, but will only ever work with the skeleton data. However, once pose and gesture recognition become standardized and integrated into the Kinect SDK, developers likely will not even access the skeleton data.

We hope to advance this movement, because it signifies the maturation of Kinect as a technology. This chapter keeps the focus on skeleton tracking, but the approach to the skeleton data is different. We focus on Kinect as an input device with the same classification as a mouse, stylus, or touch, but uniquely different because of its ability to see depth. Microsoft pitched Kinect for Xbox with, “You are the controller,” or more technically, you are the input device. With skeleton data, applications can do the same things a mouse or touch device can. The difference is the depth component allows the user and the application to interact as never before. Let us explore the mechanics through which the Kinect can control and interact with user interfaces.

User Interaction

Computers and the applications that run on them require input. Traditionally, user input comes from a keyboard and mouse. The user interacts directly with these hardware devices, which in turn, transmit data to the computer. The computer takes the data from the input device and creates some type of visual effect. It is common knowledge that every computer with a graphical user interface has a cursor, which is often referred to as the mouse cursor, because the mouse was the original vehicle for the cursor. However, calling it a mouse cursor is no longer as accurate as it once was. Touch or stylus devices also control the same cursor as the mouse. When a user moves the mouse or drags his or her finger across a touch screen, the cursor reacts to these movements. If a user moves the cursor over a button, more often than not the button changes visually to indicate that the cursor is hovering over the button. The button gives another type of visual indicator when the user presses the mouse button while hovering over a button. Still another visual indicator emerges when the user releases the mouse button while remaining over a button. This process may seem trivial to think through step by step, but how much of this process do you really understand? If you had to, could you write the code necessary to track changes in the mouse's position, hover states, and button clicks?

These are user interface interactions developers often take for granted, because user interface platforms like WPF make interacting with input devices extremely easy. When developing web pages, the browser handles user interactions and the developer simply defines the visual treatments, like mouse hover states, using style sheets. However, Kinect is different. It is an input device that is not integrated into WPF. Therefore you, as the developer, are responsible for all of the work that the OS and WPF would otherwise do for you.

At a low level, a mouse, stylus, or touch device essentially produces X and Y coordinates, which the OS translates into the coordinate space of the computer screen. This process is similar to the space transformations discussed in the previous chapter. It is the operating system's responsibility to extract data from the input device and make it available to the graphical user interface and to applications. The graphical user interface of the OS displays a mouse cursor and moves the cursor around the screen in reaction to user input. In some instances, this work is not trivial and requires a thorough understanding of the GUI platform, which in our instance is WPF. WPF does not provide native support for the Kinect as it does for the mouse and other input devices. The burden falls on the developer to pull the data from the Kinect SDK and do the work necessary to interact with Buttons, ListBoxes, and other interface controls. Depending on the complexity of your application or user interface, this can be a sizable task, and potentially one that requires intimate knowledge of WPF.

A Brief Understanding of the WPF Input System

When building an application in WPF, developers do not have to concern themselves with the mechanics of user input. It is handled for us allowing us to focus more on reacting to user input. After all, as developers, we are more concerned with doing things with the user's input rather than reinventing the wheel each time just to collect user input. If an application needs a button, the developer adds a Button control to the screen, wires an event handler to the control's Click event and is done. In most circumstances, the developer will style the button to have a unique look and feel and to react visually to different mouse interactions such as hover and mouse down. WPF handles all of the low-level work to determine when the mouse is hovering over the button, or when the button is clicked.

WPF has a robust input system that constantly gathers input from attached devices and distributes that input to the affected controls. This system starts with the API defined in the System.Windows.Input namespace (PresentationCore.dll). The entities defined within work directly with the operating system to get data from the input devices. For example, there are classes named Keyboard, Mouse, Stylus, Touch, and Cursor. The one class that is responsible for managing the input from the different input devices and marshalling that input to the rest of the presentation framework is the InputManager.

The other component to the WPF input system is a set of four classes in the System.Windows namespace (PresentationCore.dll). These classes are UIElement, ContentElement, FrameworkElement, and FrameworkContentElement. FrameworkElement inherits from UIElement and FrameworkContentElement inherits from ContentElement. These classes are the base classes for all visual elements in WPF such as Button, TextBlock, and ListBox.

Note For more detailed information about WPF's input system, refer to the MSDN documentation at http://msdn.microsoft.com/en-us/library/ms754010.aspx.

The InputManager tracks all device input and uses a set of methods and events to notify UIElement and ContentElement objects that the input device is performing some action related to the visual element. For example, WPF raises the MouseEnter event when the mouse cursor enters the visual space of a visual element. There is also a virtual OnMouseEnter method in the UIElement and ContentElement classes, which WPF also calls when the mouse enters the visual space of the object. This allows other objects, which inherit from the UIElement or ContentElement classes, to directly receive data from input devices. WPF calls these methods on the visual elements before it raises any input events. There are several other similar types of events and methods on the UIElement and ContentElement classes to handle the various types of interactions including MouseEnter, MouseLeave, MouseLeftButtonDown, MouseLeftButtonUp, TouchEnter, TouchLeave, TouchUp, and TouchDown, to name a few.

Developers have direct access to the mouse and other input devices needed. The InputManager object has a property named PrimaryMouseDevice, which returns a MouseDevice object. Using the MouseDevice object, you can get the position of the mouse at any time through a method named GetScreenPosition. Additionally, the MouseDevice has a method named GetPosition, which takes in a user interface element and returns the mouse position within the coordinate space of that element. This information is crucial when determining mouse interactions such as the mouse hover event. With each new SkeletonFrame generated by the Kinect SDK, we are given the position of each skeleton joint in relation to skeleton space; we then have to perform coordinate space transforms to translate the joint positions to be usable with visual elements. The GetScreenPosition and GetPosition methods on the MouseDevice object do this work for the developer for mouse input.
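For contrast, consider how little work the mouse asks of us. The handler below is purely illustrative (the element and handler names are ours, not from this project); WPF delivers the position already translated into the element's coordinate space, which is exactly the work we must do ourselves for skeleton joints.

//Illustrative only. Requires the System.Windows and System.Windows.Input namespaces.
private void GameButton_MouseMove(object sender, MouseEventArgs e)
{
    FrameworkElement element = (FrameworkElement) sender;

    //WPF has already translated the device coordinates into the element's space.
    Point position = e.GetPosition(element);

    //React to the position however the application needs to.
    System.Diagnostics.Debug.WriteLine("Mouse at {0}, {1}", position.X, position.Y);
}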

In some ways, Kinect is comparable with the mouse, but the comparisons abruptly break down. Skeleton joints enter and leave visual elements similar to a mouse. In other words, joints hover like a mouse cursor. However, the click and mouse button up and down interactions do not exist. As we will see in the next chapter, there are gestures that simulate a click through a push gesture. The button push metaphor is weak when applied to Kinect and so the comparison with the mouse ends with the hover.

Kinect does not have much in common with touch input either. Touch input is available from the Touch and TouchDevice classes. Single touch input is similar to mouse input, whereas multiple touch input is akin to Kinect. The mouse has only a single interaction point (the point of the mouse cursor), but touch input can have multiple input points, just as Kinect can have multiple skeletons, and each skeleton has twenty input points. Kinect is more informative, because we know which input points belong to which user. With touch input, the application has no way of knowing how many users are actually touching the screen. If the application receives ten touch inputs, is it one person pressing all ten fingers, or is it ten people pressing one finger each? While touch input has multiple input points, it is still a two-dimensional input like the mouse or stylus. To be fair, touch input does have breadth, meaning it includes a location (X, Y) of the point and the bounding area of the contact point. After all, a user pressing a finger on a touch screen is never as precise as a mouse pointer or stylus; it always covers more than one pixel.
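To illustrate the point about breadth, the sketch below (handler and element names are ours, purely for illustration) reads both the position and the contact bounds of a touch point, something neither the mouse nor a Kinect joint provides.

//Illustrative only. Requires the System.Windows and System.Windows.Input namespaces.
private void Element_TouchMove(object sender, TouchEventArgs e)
{
    TouchPoint touchPoint = e.GetTouchPoint((IInputElement) sender);

    Point position = touchPoint.Position;    //X, Y within the element
    Rect contact   = touchPoint.Bounds;      //the breadth of the finger's contact area

    System.Diagnostics.Debug.WriteLine("Touch at {0} covering {1}", position, contact);
}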

While there are similarities, Kinect input clearly does not conform neatly to the form of any input device supported by WPF. It has a unique set of interactions and user interface metaphors. It has yet to be determined if Kinect should function in the same way as other input devices. At the core, the mouse, touch, or stylus report a single pixel point location. The input system then determines the location of the pixel point in the context of a visual element, and that visual element reacts accordingly. Current Kinect user interfaces attempt to use the hand joints as alternatives to mouse or touch input, but it is not clear yet if this is how Kinect should be used or if the designer and developer community is simply trying to make Kinect conform to known forms of user input.

The expectation is that at some point Kinect will be fully integrated into WPF. Until WPF 4.0, touch input was a separate component. Touch was first introduced with Microsoft's Surface. The Surface SDK included a special set of WPF controls like SurfaceButton, SurfaceCheckBox and SurfaceListBox. If you wanted a button that responded to touch events, you had to use the SurfaceButton control.

One can speculate that if Kinect input were to be assimilated into WPF, there might be a class named SkeletonDevice, which would look similar to the SkeletonFrame object of the Kinect SDK. Each Skeleton object would have a method named GetJointPoint, which would function like the GetPosition method on MouseDevice or the GetTouchPoint on TouchDevice. Additionally, the core visual elements (UIElement, ContentElement, FrameworkElement, and FrameworkContentElement) would have events and methods to notify and handle skeleton joint interactions. For example, there might be JointEnter, JointLeave, and JointHover events. Further, just as touch input has the ManipulationStarted and ManipulationEnded events, there might be GestureStarted and GestureEnded events associated with Kinect input.
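To make the speculation concrete, the sketch below shows roughly what such an integration could look like. None of these types, events, or members exist in WPF or the Kinect SDK today; the names simply mirror the hypothetical SkeletonDevice, Joint*, and Gesture* members described above.

//Purely hypothetical. Nothing here exists in WPF or the Kinect SDK.
//Requires the System, System.Windows, and Microsoft.Kinect namespaces.
public class JointInputEventArgs : EventArgs
{
    public int SkeletonTrackingId { get; set; }    //which tracked skeleton
    public JointType JointType { get; set; }       //for example, JointType.HandRight
    public Point Position { get; set; }            //already in the element's coordinate space
}

public interface ISkeletonInputElement
{
    event EventHandler<JointInputEventArgs> JointEnter;
    event EventHandler<JointInputEventArgs> JointLeave;
    event EventHandler<JointInputEventArgs> JointHover;

    event EventHandler GestureStarted;
    event EventHandler GestureEnded;
}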

For now, the Kinect SDK is a separate entity from WPF, and as such, it does not natively integrate with the input system. It is the responsibility of the developer to track skeleton joint positions and determine when joint positions intersect with user interface elements. When a skeleton joint is within the coordinate space of a visual element, we must then manually alter the appearance of the element to react to the interaction. Woe is the life of a developer when working with a new technology.

Detecting User Interaction

Before we can determine if a user has interacted with visual elements on the screen, we must define what it means for the user to interact with a visual element. Looking at a mouse- or cursor-driven application, there are two well-known interactions. A mouse hovers over a visual element and clicks. These interactions break down even further into other more granular interactions. For a cursor to hover, it must enter the coordinate space of the visual element. The hover interaction ends when the cursor leaves the coordinate space of the visual element. In WPF, the MouseEnter and MouseLeave events fire when the user performs these interactions. A click is the act of the mouse button being pressed down (MouseDown) and released (MouseUp).

There is another common mouse interaction beyond a click and hover. If a user hovers over a visual element, presses down the left mouse button, and then moves the cursor around the screen, we call this a drag. The drop interaction happens when the user releases the mouse button. Drag and drop is a complex interaction, much like a gesture.

For the purpose of this chapter, we focus on the first set of simple interactions where the cursor hovers, enters, and leaves the space of the visual element. In the Kinect the Dots project from the previous chapter, we had to determine when the user's hand was in the vicinity of a dot before drawing a connecting line. In that project, the application did not interact with the user interface as much as the user interface reacted to the user. This distinction is important. The application generated the locations of the dots within a coordinate space that was the same as the screen size, but these points were not derived from the screen space. They were just data stored in variables. We fixed the screen size to make it easy. Upon receipt of each new skeleton frame, the position of the skeleton hand was translated into the coordinate space of the dots, after which we determined if the position of the hand was the same as the current dot in the sequence. Technically, this application could function without a user interface. The user interface was created dynamically from data. In that application, the user is interacting with the data and not the user interface.

Hit Testing

Determining when a user's hand is near a dot is not as simple as checking if the coordinates of the hand exactly match the position of the dot. Each dot is just a single pixel, and it would be impossible for a user to place their hand easily and routinely in the same pixel position. To make the application usable, we do not require the position of the hand to be the same as the dot, but rather within a certain range. We created a circle with a set radius around the dot, with the dot being the center of the circle. The user just has to break the plane of the proximity circle for the hand to be considered hovering over the dot. Figure 5-1 illustrates this. The white dot within the visual element circle is the actual dot point and the dotted circle is the proximity circle. The hand image is centered on the hand point (white dot within the hand icon). It is therefore possible for the hand image to cross the proximity circle while the hand point remains outside it. The process of checking whether the hand point breaks the plane of the proximity circle is called hit testing.


Figure 5-1. Dot proximity testing
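The proximity test itself reduces to a simple distance check. The method below is a minimal sketch (the name and parameters are ours, not the exact code from the Kinect the Dots project): the hand is considered to be over the dot when its distance from the dot's center is no greater than the circle's radius.

//Returns true when the hand point breaks the plane of the proximity circle.
private static bool IsHandNearDot(Point handPoint, Point dotCenter, double proximityRadius)
{
    double deltaX = handPoint.X - dotCenter.X;
    double deltaY = handPoint.Y - dotCenter.Y;

    //Compare squared distances to avoid an unnecessary square root.
    return ((deltaX * deltaX) + (deltaY * deltaY)) <= (proximityRadius * proximityRadius);
}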

Again, in the Kinect the Dots project, the user interface reacts to the data. The dots are drawn on the screen according to the generated coordinates. The application performs hit testing using the dot data and not the size and layout of the visual element. Most applications and games do not function this way. The user interfaces are more complex and often dynamic. Take, for example, the ShapeGame application (Figure 5-2) that comes with the Kinect for Windows SDK. It generates shapes that drop from the sky. The shapes pop and disappear when the user “touches” them.


Figure 5-2. Microsoft SDK sample ShapeGame

An application like ShapeGame requires a more complex hit testing algorithm than that of Kinect the Dots. WPF provides some tools to help hit test visual objects. The VisualTreeHelper class (System.Windows.Media namespace) has a method named HitTest. There are multiple overloads for this method, but the primary method signature takes in a Visual object and a point. It returns the top-most visual object within the specified visual object's visual tree at that point. If that seems complicated and it is not inherently obvious what this means, do not worry. A simple explanation is that WPF has a layered visual output. More than one visual element can occupy the same relative space. If more than one visual element is at the specified point, the HitTest method returns the element at the top layer. Due to WPF's styling and templating system, which allows controls to be composites of one or more visual elements and other controls, more often than not there are multiple visual elements at any given coordinate point.
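As a quick sketch of the API (the helper name is ours), the method below wraps the simplest HitTest overload. The point is assumed to already be in the reference visual's coordinate space.

//Returns the top-most visual at the given point, or null if nothing occupies it.
//Requires the System.Windows and System.Windows.Media namespaces.
private static DependencyObject GetTopMostVisual(Visual reference, Point point)
{
    HitTestResult result = VisualTreeHelper.HitTest(reference, point);
    return (result != null) ? result.VisualHit : null;
}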

Figure 5-3 helps to illustrate the layering of visual elements. There are three elements: a Rectangle, a Button, and an Ellipse. All three are in a Canvas panel. The ellipse and the button sit on top of the rectangle. In the first frame, the mouse is over the ellipse and a hit test at this point returns the ellipse. A hit test in the second frame returns the rectangle even though it is the bottom layer. While the rectangle is at the bottom, it is the only visual element occupying the pixel at the mouse's cursor position. In the third frame, the cursor is over the button. Hit testing at this point returns a TextBlock element. If the cursor were not on the text in the button, a hit test would return a ButtonChrome element. The button's visual representation is composed of one or more visual elements and is fully customizable; a Button has no inherent visual representation of its own. The button shown in Figure 5-3 uses the default style, which is in part made up of a TextBlock and a ButtonChrome. It is important to understand that hit testing on a control does not necessarily return the desired or expected visual element or control, as is the case with the Button. In this example, we always get one of the elements that compose the button visual, but never the actual Button control.


Figure 5-3. Layered UI elements

WPF provides other methods to make hit testing more convenient. The UIElement class defines an InputHitTest method, which takes in a Point and returns the IInputElement at the specified point. The UIElement and ContentElement classes both implement the IInputElement interface, which means that virtually all user interface elements within WPF are covered. The VisualTreeHelper class also has a set of HitTest methods, which can be used more generically.

Note The MSDN documentation for the UIElement.InputHitTest method states, “This method typically is not called from your application code. Calling this method is only appropriate if you intend to re-implement a substantial amount of the low level input features that are already present, such as recreating mouse device logic.” Kinect is not integrated into WPF's “low-level input features”; therefore, it is necessary to recreate mouse device logic.

In WPF, hit testing depends on two variables, a visual element and a point. The test determines if the specified point lies within the coordinate space of the visual element. Let's use Figure 5-4 to better understand the coordinate spaces of visual elements. Each visual element in WPF, regardless of shape and size, has what is called a bounding box: a rectangular shape around the visual element that defines the width and height of the visual element. This bounding box is used by the layout system to determine the overall dimensions of the visual element and how to arrange it on the screen. While the Canvas arranges its children based on values specified by the developer, an element's bounding box is fundamental to the layout algorithm of other panels such as the Grid and StackPanel. The bounding box is not visually shown to the user, but is represented in Figure 5-4 by the dotted box surrounding each visual element. Additionally, each element has an X and Y position that defines the element's location within its parent container. To obtain the bounding box and position of an element, call the static GetLayoutSlot method of the LayoutInformation class (System.Windows.Controls.Primitives namespace).
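As a small illustration (the helper name is ours, not part of WPF), the bounding box supports a quick rectangular containment check. Keep in mind that the layout slot returned by GetLayoutSlot also includes any margin, and, as the triangle example that follows shows, a point inside the bounding box is not necessarily a successful hit test.

//point is assumed to be in the coordinate space of the element's parent panel.
//Requires the System.Windows and System.Windows.Controls.Primitives namespaces.
private static bool IsPointInBoundingBox(FrameworkElement element, Point point)
{
    Rect boundingBox = LayoutInformation.GetLayoutSlot(element);
    return boundingBox.Contains(point);
}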

Take, for example, the triangle. The top-left corner of the bounding box is point (0, 0) of the visual element. The width and height of the triangle are each 200 pixels. The three points of the triangle within the bounding box are at (100, 0), (200, 200), (0, 200). A hit test is only successful for points within the triangle and not for all points within the bounding box. A hit test for point (0, 0) is unsuccessful, whereas a test at the center of the triangle, point (100, 100), is successful.


Figure 5-4. Layout space and bounding boxes

Hit testing results depend on the layout of the visual elements. In all of our projects, we used the Canvas panel to hold our visual elements. The Canvas panel is the one visual element container that gives the developer complete control over the placement of the visual elements, which can be especially useful when working with Kinect. Basic functions like hand tracking are possible with other WPF panels, but require more work and do not perform as well as the Canvas panel. With the Canvas panel, the developer explicitly sets the X and Y position (Canvas.Left and Canvas.Top, respectively) of the child visual element. Coordinate space translation, as we have seen, is straightforward with the Canvas panel, which means less code to write and better performance because there is less processing needed.
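For example, positioning a hand cursor inside a Canvas takes only two attached-property calls. The helper below is a sketch; it assumes the joint position has already been translated into the Canvas's coordinate space, for instance by the GetJointPoint method from the previous chapter.

//Centers the cursor graphic on the translated joint position.
private static void PositionCursor(FrameworkElement cursorElement, Point jointPoint)
{
    Canvas.SetLeft(cursorElement, jointPoint.X - (cursorElement.ActualWidth / 2));
    Canvas.SetTop(cursorElement, jointPoint.Y - (cursorElement.ActualHeight / 2));
}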

The disadvantage of using a Canvas stems from the same quality that makes it attractive: the developer has complete control over the placement of visual elements and therefore is also responsible for things like updating element positions when the window resizes or arranging complex layouts. Panels such as the Grid and StackPanel make UI layout updates and resizing painless for the developer. However, these panels increase the complexity of hit testing by increasing the size of the visual tree and by adding additional coordinate spaces. The more coordinate spaces, the more point translations needed. These panels also honor the alignment (horizontal and vertical) and margin properties of the FrameworkElement, which further complicates the calculations necessary for hit testing. If there is any possibility that a visual element will have RenderTransforms, you would be smart to use WPF's hit testing and not attempt to do this testing yourself.

A hybrid approach is to place visual elements that change frequently based on skeleton joint positions, such as hand cursors, in a Canvas, and to place all other UI elements in other panels. Such a layout scheme requires more coordinate space transforms, which can affect performance and possibly introduce bugs related to improper transform calculations. The hybrid method is at times the more appropriate choice because it takes full advantage of the WPF layout system. Refer to the MSDN documentation on WPF's layout system, panels, and hit testing for a thorough understanding of these concepts.

Responding to Input

Hit testing only tells us that the user input point is within the coordinate space of a visual element. One of the important functions of a user interface is to give users feedback on their actions. When we move our mouse over a button, we expect the button to change visually in some way (change color, grow in size, animate, reveal a background glow), telling the user the button is clickable. Without this feedback, the user experience is not only flat and uninteresting, but also possibly confusing and frustrating. A failed application experience means the application as a whole is a failure, even if it technically functions flawlessly.

WPF has a fantastic system for notifying and responding to user input. The styling and template system makes developing user interfaces that properly respond to user input easy to build and highly customizable, but only if your user input comes from a mouse, stylus, or a touch device. Kinect developers have two options: do not use WPF's system and do everything manually, or create special controls that respond to Kinect input. The latter, while not overly difficult, is not a beginner's task.

With this in mind, we move to the next section where we build a game that applies hit testing and manually responds to user input. Before moving on, consider a question that we have purposefully not addressed until now. What does it mean for a Kinect skeleton to interact with the user interface? The core mouse interactions are enter, leave, and click. Touch input has enter, leave, down, and up interactions. A mouse has a single position point. Touch can have multiple position points, but there is always a primary point. A Kinect skeleton has twenty possible position points. Which of these is the primary point? Should there be a primary point? Should a visual element, such as a button, react when any one skeleton point enters the element's coordinate space, or should it react to only certain joint points, for instance the hands?

There is no one answer to all of these questions. It largely depends on the function and design of your user interface. These types of questions are part of a broader subject called Natural User Interface design, which is a significant topic in the next chapter. For most Kinect applications, including the projects in this chapter, the only joints that interact with the user interface are the hands. The starting interactions are enter and leave. Interactions beyond these become complicated quickly. We cover more complicated interactions later in the chapter and all of the next chapter, but now the focus is on the basics.

Simon Says

To demonstrate working with Kinect as an input device, we start our next project, which uses the hand joints as if they were a cross between a mouse and touch input. The project's goal is to give a practical, but introductory example of how to perform hit testing and create user interactions with WPF visual elements. The project is a game named Simon Says.

Growing up during my early grade school years, we played a game named Simon Says. In this game, one person plays the role of Simon and gives instructions to the other players. A typical instruction is, “Put your left hand on top of your head.” Players perform the instruction only if it is preceded by the words, “Simon Says.” For example, “Simon says, ‘stomp your feet’” in contrast to “stomp your feet.” Any player caught following an instruction not preceded by “Simon says” is out of the game. These are the game's rules. Did you play Simon Says as a child? Do kids still play this game? Look it up if you do not know the game.

Tip The traditional version of Simon Says makes a fun drinking game—but only if you are old enough to drink. Please drink responsibly.

In the late '70s and early '80s, the game company Milton Bradley created a hand-held electronic version of Simon Says named Simon. This game consisted of four colored (red, blue, green, and yellow) buttons. In the electronic version of the game, the computer gives the player a sequence of buttons to press. When giving the instructions, the computer lights each button in the correct sequence. The player must then repeat the button sequence. After the player successfully repeats the button sequence, the computer presents another. The sequences become progressively more challenging. The game ends when the player cannot repeat the sequence.

We attempt to recreate the electronic version of Simon Says using Kinect. It is a perfect introductory example of using skeleton tracking to interact with user interface elements. The game also has a simple set of game rules, which we can quickly implement. Figure 5-5 illustrates our desired user interface. It consists of four rectangles, which serve as game buttons or targets. We have a game title at the top of the screen, and an area in the middle of the screen for game instructions.


Figure 5-5. Simon Says user interface

Our version of Simon Says works by tracking the player's hands; when a hand makes contact with one of the colored squares, we consider this a button press. It is common in Kinect applications to use hover or press gestures to interact with buttons. For now, our approach to player interactions remains simple. The game starts with the player placing her hands over the hand markers in the red boxes. Immediately after both hands are on the markers, the game begins issuing instructions. The game is over and returns to this state when the player fails to repeat the sequence. At this point, we have a basic understanding of the game's concept, rules, and look. Now we write code.

Simon Says, “Design a User Interface”

Start by building the user interface. Listing 5-1 shows the XAML for the MainWindow. As with our previous examples, we wrap our main UI elements in a Viewbox control to handle scaling to different monitor resolutions. Our UI dimensions are set to 1920x1080. There are four sections of our UI: title and instructions, game interface, game start interface, and cursors for hand tracking. The first TextBlock holds the title, and the instruction UI elements are in the StackPanel that follows. These UI components serve only to help the player know the current state of the game. They have no other function and are not related to Kinect or skeleton tracking. However, the other UI elements are.

The GameCanvas, ControlCanvas, and HandCanvas all hold UI elements that the application interacts with based on the position of the player's hands. The hand positions obviously come from skeleton tracking. Taking these items in reverse order, the HandCanvas should be familiar. The application has two cursors that follow the movements of the player's hands, as we saw in the projects from the previous chapter. The ControlCanvas holds the UI elements that trigger the start of the game, and the GameCanvas holds the blocks, which the player presses during the game. The different interactive components are broken into multiple containers, making the user interface easier to manipulate in code. For example, when the user starts the game, we want to hide the ControlCanvas. It is much easier to hide one container than to write code to show and hide all of the children individually.

After updating the MainWindow.xaml file with the code in Listing 5-1, run the application. The screen should look like Figure 5-5.

Listing 5-1. Simon Says User Interface

<Window x:Class="SimonSays.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:c="clr-namespace:SimonSays"
        Title="Simon Says" WindowState="Maximized">

    <Viewbox>
        <Grid x:Name="LayoutRoot" Height="1080" Width="1920" Background="White"
                TextElement.Foreground="Black">
            <TextBlock Text="Simon Says" FontSize="72" Margin="0,25,0,0"
                       HorizontalAlignment="Center" VerticalAlignment="Top"/>

            <StackPanel HorizontalAlignment="Center" VerticalAlignment="Center" Width="600">
                <TextBlock x:Name="GameStateElement" FontSize="55" Text="GAME OVER!"
                             HorizontalAlignment="Center"/>
                <TextBlock x:Name="GameInstructionsElement"
                             Text="Place hands over the targets to start a new game."
                             FontSize="45" HorizontalAlignment="Center"
                             TextAlignment="Center" TextWrapping="Wrap" Margin="0,20,0,0"/>
            </StackPanel>
            <Canvas x:Name="GameCanvas">
                <Rectangle x:Name="RedBlock" Height="400" Width="400" Fill="Red"
                             Canvas.Left="170" Canvas.Top="90" Opacity="0.2"/>
                <Rectangle x:Name="BlueBlock" Height="400" Width="400" Fill="Blue"
                             Canvas.Left="170" Canvas.Top="550" Opacity="0.2"/>
                <Rectangle x:Name="GreenBlock" Height="400" Width="400" Fill="Green"
                             Canvas.Left="1350" Canvas.Top="550" Opacity="0.2"/>
                <Rectangle x:Name="YellowBlock" Height="400" Width="400" Fill="Yellow"
                             Canvas.Left="1350" Canvas.Top="90" Opacity="0.2"/>
            </Canvas>

            <Canvas x:Name="ControlCanvas">
                <Border x:Name="RightHandStartElement" Background="Red" Height="200"
                          Padding="20" Canvas.Left="1420" Canvas.Top="440">
                    <Image Source="Images/hand.png"/>
                </Border>

                <Border x:Name="LeftHandStartElement" Background="Red" Height="200"
                          Padding="20" Canvas.Left="300" Canvas.Top="440">
                    <Image Source="Images/hand.png">
                        <Image.RenderTransform>
                            <TransformGroup>
                                <TranslateTransform X="-130"/>
                                <ScaleTransform ScaleX="-1"/>
                            </TransformGroup>
                        </Image.RenderTransform>
                    </Image>
                </Border>
            </Canvas>

            <Canvas x:Name="HandCanvas">
                <Image x:Name="RightHandElement" Source="Images/hand.png"
                         Visibility="Collapsed" Height="100" Width="100"/>

                <Image x:Name="LeftHandElement" Source="Images/hand.png"
                         Visibility="Collapsed" Height="100" Width="100">
                    <Image.RenderTransform>
                        <TransformGroup>
                            <ScaleTransform ScaleX="-1"/>
                            <TranslateTransform X="90"/>
                        </TransformGroup>
                    </Image.RenderTransform>
                </Image>
            </Canvas>
        </Grid>
    </Viewbox>
</Window>

Simon Says, “Build the Infrastructure”

With the UI in place, we turn our focus on the game's infrastructure. Update the MainWindow.xaml.cs file to include the necessary code to receive SkeletonFrameReady events. In the SkeletonFrameReady event handler, add the code to track player hand movements. The base of this code is in Listing 5-2. TrackHand is a refactored version of Listing 4-7, where the method takes in the UI element for the cursor and the parent element that defines the layout space.

Listing 5-2. Initial SkeletonFrameReady Event Handler

private void KinectDevice_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using(SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if(frame != null)
        {
            frame.CopySkeletonDataTo(this._FrameSkeletons);
            Skeleton skeleton = GetPrimarySkeleton(this._FrameSkeletons);

            if(skeleton == null)
            {
                LeftHandElement.Visibility  = Visibility.Collapsed;
                RightHandElement.Visibility = Visibility.Collapsed;
            }
            else
            {
                TrackHand(skeleton.Joints[JointType.HandLeft], LeftHandElement, LayoutRoot);
                TrackHand(skeleton.Joints[JointType.HandRight], RightHandElement, LayoutRoot);
            }
        }
    }
}


private static Skeleton GetPrimarySkeleton(Skeleton[] skeletons)
{
    Skeleton skeleton = null;

    if(skeletons != null)
    {
        //Find the closest skeleton       
        for(int i = 0; i < skeletons.Length; i++)
        {
            if(skeletons[i].TrackingState == SkeletonTrackingState.Tracked)
            {
                if(skeleton == null)
                {
                    skeleton = skeletons[i];
                }
                else
                {
                    if(skeleton.Position.Z > skeletons[i].Position.Z)
                    {
                        skeleton = skeletons[i];
                    }
               }
            }
        }
    }

    return skeleton;
}

For most games, using a polling architecture is the better and more common approach. Normally, a game has what is called a gaming loop, which would manually get the next skeleton frame from the skeleton stream. However, this project uses the event model to reduce the code base and complexity. For the purposes of this book, less code means that it is easier to present to you, the reader, and easier to understand without getting bogged down in the complexities of gaming loops and possibly threading. The event system also provides us a cheap gaming loop, which, again, means we have to write less code. However, be careful when using the event system in place of a true gaming loop. Besides performance concerns, events are often not reliable enough to operate as a true gaming loop, which may result in your application being buggy or not performing as expected.
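For reference, a polling version of this project might look roughly like the sketch below. It assumes the loop runs on its own thread, that _KinectDevice and _FrameSkeletons are initialized just as in the event-driven code, and that _IsGameRunning is a hypothetical flag cleared when the window closes.

//A rough sketch of a polling-based gaming loop; this project does not use it.
private void GameLoop()
{
    while(this._IsGameRunning)
    {
        //Wait up to 100 ms for the next frame; OpenNextFrame returns null on timeout.
        using(SkeletonFrame frame = this._KinectDevice.SkeletonStream.OpenNextFrame(100))
        {
            if(frame != null)
            {
                frame.CopySkeletonDataTo(this._FrameSkeletons);
                //Update the game state and redraw here.
            }
        }
    }
}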

Simon Says, “Add Game Play Infrastructure”

The game Simon Says breaks down into three phases. The initial phase, which we will call GameOver, means no game is actively being played. This is the default state of the game. It is also the state to which the game reverts when Kinect stops detecting players. The game loops from Simon giving instructions to the player repeating or performing the instructions. This continues until the player cannot correctly perform the instructions. The application defines an enumeration to describe the game phases and a member variable to track the game state. Additionally, we need a member variable to track the current round or level of the game. The value of the level tracking variable increments each time the player successfully repeats Simon's instructions. Listing 5-3 details the game phase enumeration and member variables. The member variables are initialized in the class constructor.

Listing 5-3. Game Play Infrastructure

public enum GamePhase
{
    GameOver            = 0,
    SimonInstructing    = 1,
    PlayerPerforming    = 2
}


public partial class MainWindow : Window
{
    #region Member Variables
    private KinectSensor _KinectDevice;
    private Skeleton[] _FrameSkeletons;
    private GamePhase _CurrentPhase;
    private int _CurrentLevel;
    #endregion Member Variables


    #region Constructor
    public MainWindow()
    {
        InitializeComponent();

        //Any other constructor code such as sensor initialization goes here.

        this._CurrentPhase = GamePhase.GameOver;
        this._CurrentLevel = 0;
    }
    #endregion Constructor

    #region Methods
    //Code from Listing 5-2 and any additional supporting methods
    #endregion Methods
}

We now revisit the SkeletonFrameReady event handler, which needs to determine what action to take based on the state of the application. The code in Listing 5-4 details the code changes. Update the SkeletonFrameReady event handler with this code and stub out the ChangePhase, ProcessGameOver, and ProcessPlayerPerforming methods. We cover the functional code of these methods later. The first method takes only a GamePhase enumeration value, while the latter two have a single parameter of Skeleton type.

When the application cannot find a primary skeleton, the game ends and enters the game over phase. This happens when the user leaves the view area of Kinect. When Simon is giving instructions to the user, the game hides hand cursors; otherwise, it updates the position of the hand cursors. When the game is in either of the other two phases, then the game calls special processing methods based on the particular game phase.

Listing 5-4. SkeletonFrameReady Event Handler

private void KinectDevice_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using(SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if(frame != null)
        {
            frame.CopySkeletonDataTo(this._FrameSkeletons);
            Skeleton skeleton = GetPrimarySkeleton(this._FrameSkeletons);

            if(skeleton == null)
            {
                ChangePhase(GamePhase.GameOver);
            }
            else
            {
                if(this._CurrentPhase == GamePhase.SimonInstructing)
                {
                    LeftHandElement.Visibility  = Visibility.Collapsed;
                    RightHandElement.Visibility = Visibility.Collapsed;
                }
                else
                {
                    TrackHand(skeleton.Joints[JointType.HandLeft],
                              LeftHandElement, LayoutRoot);
                    TrackHand(skeleton.Joints[JointType.HandRight],
                              RightHandElement, LayoutRoot);

                    switch(this._CurrentPhase)
                    {
                        case GamePhase.GameOver:
                            ProcessGameOver(skeleton);
                            break;

                        case GamePhase.PlayerPerforming:
                            ProcessPlayerPerforming(skeleton);
                            break;
                    }
                }
            }
        }
    }
}

Starting a New Game

The application has a single function when in the GameOver phase: detect when the user wants to play the game. The game starts when the player places her hands in the respective hand markers. The left hand needs to be within the space of the LeftHandStartElement and the right hand needs to be within the space of the RightHandStartElement. For this project, we use WPF's built-in hit testing functionality. Our UI is small and simple. The number of UI elements available for processing in an InputHitTest method call is extremely limited; therefore, there are no performance concerns. Listing 5-5 contains the code for the ProcessGameOver method and the GetHitTarget helper method. The GetHitTarget method is used in other places in the application.

Listing 5-5. Detecting When the User Is Ready to Start the Game

private void ProcessGameOver(Skeleton skeleton)
{
    //Determine if the user triggers the start of a new game
    if(GetHitTarget(skeleton.Joints[JointType.HandLeft], LeftHandStartElement) != null &&
       GetHitTarget(skeleton.Joints[JointType.HandRight], RightHandStartElement) != null)
    {
        ChangePhase(GamePhase.SimonInstructing);
    }
}


private IInputElement GetHitTarget(Joint joint, UIElement target)
{
    Point targetPoint = GetJointPoint(this.KinectDevice, joint,
                                     LayoutRoot.RenderSize, new Point());
    targetPoint = LayoutRoot.TranslatePoint(targetPoint, target);

    return target.InputHitTest(targetPoint);
}

The logic of the ProcessGameOver method is simple and straightforward: if each of the player's hands is in the space of their respective targets, change the state of the game. The GetHitTarget method is responsible for testing if the joint is in the target space. It takes in the source joint and the desired target, and returns the specific IInputElement occupying the coordinate point of the joint. While the method only has three lines of code, it is important to understand the logic behind the code.

Our hit testing algorithm consists of three basic steps. The first step gets the coordinates of the joint within the coordinate space of the LayoutRoot. The GetJointPoint method does this for us. This is the same method from the previous chapter. Copy the code from Listing 4-3 and paste it into this project.

Next, the joint point in the LayoutRoot coordinate space is translated to the coordinate space of the target using the TranslatePoint method. This method is defined in the UIElement class, of which Grid (LayoutRoot) is a descendant. Finally, with the point translated into the coordinate space of the target, we call the InputHitTest method, also defined in the UIElement class. If the point is within the coordinate space of the target, the InputHitTest method returns the exact UI element in the target's visual tree. Any non-null value means the hit test was successful.

It is important to note that the simplicity of this logic only works due to the simplicity of our UI layout. Our application consumes the entire screen and is not meant to be resizable. Having a static and fixed UI size dramatically reduces the number of calculations required. Additionally, by using Canvas elements to contain all interactive UI elements, we effectively have a single coordinate space. By using other panel types to contain the interactive UI elements, or by using automated layout features such as the HorizontalAlignment, VerticalAlignment, or Margin properties, you increase the complexity of the hit testing logic. In short, the more complicated the UI, the more complicated the hit testing logic, which also adds more performance concerns.

Changing Game State

Compile and run the application. If all goes well, your application should look like Figure 5-6. The application should track the player's hand movements and change the game phase from GameOver to SimonInstructing when the player moves his hands into the start position. The next task is to implement the ChangePhase method, as shown in Listing 5-6. This code is not related to Kinect. In fact, we could just as easily have implemented this same game using touch or mouse input, and this code would still be required.


Figure 5-6. Starting a new game of Simon Says

The function of ChangePhase is to manipulate the UI to denote a change in the game's state and maintain any data necessary to track the progress of the game. Specifically, the GameOver phase fades out the blocks, changes the game instructions, and presents the buttons to start a new game. The code for the SimonInstructing phase goes beyond updating the UI. It calls two methods, one to generate the instruction sequence (GenerateInstructions) and one to display these instructions to the player (DisplayInstructions). Following Listing 5-6 is the source code and further explanation for these methods, as well as the definition of the _InstructionPosition member variable.

Listing 5-6. Controlling the Game State

private void ChangePhase(GamePhase newPhase)
{
    if(newPhase != this._CurrentPhase)
    {
        this._CurrentPhase = newPhase;

        switch(this._CurrentPhase)
        {
            case GamePhase.GameOver:
                this._CurrentLevel          = 0;
                RedBlock.Opacity            = 0.2;
                BlueBlock.Opacity           = 0.2;
                GreenBlock.Opacity          = 0.2;
                YellowBlock.Opacity         = 0.2;

                GameStateElement.Text           = "GAME OVER!";
                ControlCanvas.Visibility        = System.Windows.Visibility.Visible;
                GameInstructionsElement.Text    = "Place hands over the targets to start a
new game.";
                break;

            case GamePhase.SimonInstructing:
                this._CurrentLevel++;
                GameStateElement.Text = string.Format("Level {0}", this._CurrentLevel);
                ControlCanvas.Visibility        = System.Windows.Visibility.Collapsed;
                GameInstructionsElement.Text    = "Watch for Simon's instructions";
                GenerateInstructions();
                DisplayInstructions();
                break;

            case GamePhase.PlayerPerforming:
                this._InstructionPosition       = 0;
                GameInstructionsElement.Text    = "Repeat Simon's instructions";
                break;
        }
    }
}

Presenting Simon's Commands

Listing 5-7 details a new set of member variables and the GenerateInstructions method. The member variable _InstructionSequence holds a set of UIElements, which comprise Simon's instructions. The player must move his hand over each UIElement in the sequence order defined by the array positions. The instruction set is randomly generated, with the number of instructions equal to the current level or round. For example, round five has five instructions. Also included in this code listing is the DisplayInstructions method, which creates and then begins a storyboard animation to change the opacity of each block in the correct sequence.

Listing 5-7. Generating and Displaying Instructions

private int _InstructionPosition;
private UIElement[] _InstructionSequence;
private Random rnd = new Random();

private void GenerateInstructions()
{
    this._InstructionSequence = new UIElement[this._CurrentLevel];

    for(int i = 0; i < this._CurrentLevel; i++)
    {
        switch(rnd.Next(1, 5))    //Upper bound is exclusive; this yields values 1 through 4
        {
            case 1:
                this._InstructionSequence[i] = RedBlock;
                break;

            case 2:
                this._InstructionSequence[i] = BlueBlock;
                break;

            case 3:
                this._InstructionSequence[i] = GreenBlock;
                break;

            case 4:
                this._InstructionSequence[i] = YellowBlock;
                break;
        }
    }
}


private void DisplayInstructions()
{
    Storyboard instructionsSequence = new Storyboard();
    DoubleAnimationUsingKeyFrames animation;

    for(int i = 0; i < this._InstructionSequence.Length; i++)
    {
        animation = new DoubleAnimationUsingKeyFrames();
        animation.FillBehavior = FillBehavior.Stop;
        animation.BeginTime = TimeSpan.FromMilliseconds(i * 1500);
        Storyboard.SetTarget(animation, this._InstructionSequence[i]);
        Storyboard.SetTargetProperty(animation, new PropertyPath("Opacity"));
        instructionsSequence.Children.Add(animation);

        animation.KeyFrames.Add(new EasingDoubleKeyFrame(0.3,
                                    KeyTime.FromTimeSpan(TimeSpan.Zero)));
        animation.KeyFrames.Add(new EasingDoubleKeyFrame(1,
                                    KeyTime.FromTimeSpan(TimeSpan.FromMilliseconds(500))));
        animation.KeyFrames.Add(new EasingDoubleKeyFrame(1,
                                    KeyTime.FromTimeSpan(TimeSpan.FromMilliseconds(1000))));
        animation.KeyFrames.Add(new EasingDoubleKeyFrame(0.3,
                                    KeyTime.FromTimeSpan(TimeSpan.FromMilliseconds(1300))));
    }


    instructionsSequence.Completed += (s, e) =>
    {
        ChangePhase(GamePhase.PlayerPerforming);
    };
    instructionsSequence.Begin(LayoutRoot);
}

Running the application now, we can see the application starting to come together. The player can start the game, which then causes Simon to begin issuing instructions.

Doing as Simon Says

The final aspect of the game is to implement the functionality to capture the player acting out the instructions. Notice that when the storyboard completes animating Simon's instructions, the application calls the ChangePhase method to transition the application into the PlayerPerforming phase. Refer back to Listing 5-4, which has the code for the SkeletonFrameReady event handler. When in the PlayerPerforming phase, the application executes the ProcessPlayerPerforming method. On the surface, implementing this method should be easy. The logic is such that a player successfully repeats an instruction when one of his hands enters the space of the target user interface element. Essentially, this is the same hit testing logic we already implemented to trigger the start of the game (Listing 5-5). However, instead of testing against two static UI elements, we test for the next UI element in the instruction array. Add the code in Listing 5-8 to the application. Compile and run it. You will quickly notice that the application works, but is very unfriendly to the user. In fact, the game is unplayable. Our user interface is broken.

Listing 5-8. Processing Player Movements When Repeating Instructions

private void ProcessPlayerPerforming(Skeleton skeleton)
{
    IInputElement leftTarget;
    IInputElement rightTarget;
    UIElement correctTarget;

    correctTarget = this._InstructionSequence[this._InstructionPosition];
    leftTarget    = GetHitTarget(skeleton.Joints[JointType.HandLeft], GameCanvas);
    rightTarget   = GetHitTarget(skeleton.Joints[JointType.HandRight], GameCanvas);


    if(leftTarget != null && rightTarget != null)
    {
        ChangePhase(GamePhase.GameOver);
    }
    else if(leftTarget == null && rightTarget == null)
    {
        //Do nothing - neither hand is over a target
    }
    else if((leftTarget == correctTarget && rightTarget == null) ||
            (rightTarget == correctTarget && leftTarget == null))
    {
        this._InstructionPosition++;

        if(this._InstructionPosition >= this._InstructionSequence.Length)
        {
            ChangePhase(GamePhase.SimonInstructing);
        }
    }
    else
    {
        ChangePhase(GamePhase.GameOver);
    }
}

Before breaking down the flaws in the logic, let's understand what this code essentially attempts to accomplish. The first line of code gets the target element, which is the current instruction in the sequence. Then, through hit testing, it gets the UI elements at the points of the left and right hands. The rest of the code evaluates these three variables. If both hands are over UI elements, then the game is over; our game is simple and only allows the player to touch a single block at a time. When neither hand is over a UI element, there is nothing for us to do. If one of the hands is over the expected target, we increment our instruction position in the sequence. The process continues until the player reaches the end of the sequence. When this happens, the game phase changes back to SimonInstructing, and the player moves to the next round. For any other condition, the application transitions to the GameOver phase.

This works fine, as long as the user is heroically fast, because the instruction position increments as soon as the user enters the UI element. The user is given no time to clear their hand from the UI element's space before their hand's position is evaluated against the next instruction in the sequence. It is impossible for any player to get past level two. As soon as the player successfully repeats the first instruction of round two, the game abruptly ends. This obviously ruins the fun and challenge of the game.

We solve this problem by waiting to advance to the next instruction in the sequence until after the user's hand has cleared the UI element. This gives the user an opportunity to get her hands into a neutral position before the application begins evaluating the next instruction. We need to track when the user's hand enters and leaves a UI element.

In WPF, each UIElement object has events that fire when a mouse enters and leaves the space of a UI element, MouseEnter and MouseLeave, respectively. Unfortunately, as noted, WPF does not natively support UI interactions with skeleton joints produced by Kinect. This project would be a whole lot easier if each UIElement had events named JointEnter and JointLeave that fire each time a skeleton joint interacts with a UIElement. Since we are not afforded this luxury, we have to write the code ourselves. Implementing the same reusable, elegant, low-level tracking of joint movements that exists for the mouse is non-trivial, and the limited accessibility of certain WPF class members makes it impossible to match exactly. That type of development is also well beyond the scope of this book. Instead, we code specifically for our problem.

The fix for the game play problem is easy to make. We add a couple of new member variables to track the UI element over which each of the player's hands last hovered. When the player's hand enters the space of a UI element, we update the tracking variable. With each new skeleton frame, we check the position of the player's hand; if it has left the space of the UI element, we process that UI element. Listing 5-9 shows the updated code for the ProcessPlayerPerforming method. The key changes to the method are in bold.

Listing 5-9. Detecting Users' Movements During Game Play

private FrameworkElement _LeftHandTarget;
private FrameworkElement _RightHandTarget;


private void ProcessPlayerPerforming(Skeleton skeleton)
{
    UIElement correctTarget   = this._InstructionSequence[this._InstructionPosition];
    IInputElement leftTarget  = GetHitTarget(skeleton.Joints[JointType.HandLeft], GameCanvas);
    IInputElement rightTarget = GetHitTarget(skeleton.Joints[JointType.HandRight], GameCanvas);

    if((leftTarget != this._LeftHandTarget) || (rightTarget != this._RightHandTarget))
    {
        if(leftTarget != null && rightTarget != null)
        {
            ChangePhase(GamePhase.GameOver);
        }
        else if((_LeftHandTarget == correctTarget && _RightHandTarget == null) ||
                (_RightHandTarget == correctTarget && _LeftHandTarget == null))
        {
            this._InstructionPosition++;

            if(this._InstructionPosition >= this._InstructionSequence.Length)
            {
                ChangePhase(GamePhase.SimonInstructing);
            }
        }
        else if(leftTarget != null || rightTarget != null)
        {
            //Do nothing - a hand is over a target, but has not yet left it
        }
        else
        {
            ChangePhase(GamePhase.GameOver);
        }

        if(leftTarget != this._LeftHandTarget)
        {
            AnimateHandLeave(this._LeftHandTarget);
            AnimateHandEnter(leftTarget as FrameworkElement);
            this._LeftHandTarget = leftTarget as FrameworkElement;
        }

        if(rightTarget != this._RightHandTarget)
        {
            AnimateHandLeave(this._RightHandTarget);
            AnimateHandEnter(rightTarget as FrameworkElement);
            this._RightHandTarget = rightTarget as FrameworkElement;
        }
    }

}

With these code changes in place, the application is fully functional. There are two new method calls, which execute when updating the tracking variables: AnimateHandLeave and AnimateHandEnter. These methods exist only to initiate some visual effect signaling to the user that she has entered or left a user interface element. These types of visual cues are important to a successful user experience in your application, and they are yours to implement. Use your creativity to construct any animation you want. For example, you could mimic the hover behavior of a standard WPF button, or change the size or opacity of the rectangle.
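
If you want a starting point, here is a minimal sketch of what these two methods might look like, assuming you simply fade a target's opacity with a DoubleAnimation (from System.Windows.Media.Animation). The duration and opacity values are arbitrary choices, not code from the book's project.

private void AnimateHandEnter(FrameworkElement element)
{
    if(element != null)
    {
        //Dim the target slightly to signal that a hand cursor is over it.
        DoubleAnimation animation = new DoubleAnimation(0.6, TimeSpan.FromMilliseconds(200));
        element.BeginAnimation(UIElement.OpacityProperty, animation);
    }
}


private void AnimateHandLeave(FrameworkElement element)
{
    if(element != null)
    {
        //Restore full opacity when the hand cursor clears the target.
        DoubleAnimation animation = new DoubleAnimation(1.0, TimeSpan.FromMilliseconds(200));
        element.BeginAnimation(UIElement.OpacityProperty, animation);
    }
}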

Enhancing Simon Says

This project is a good first step in building interactive Kinect experiences, but it could use some improvements. There are three areas for improvement: the user experience, the game play, and the presentation. We discuss possible enhancements, but the development is up to you. Grab friends and family and have them play the game. Notice how users move their arms and reach for the game squares. Come up with your own enhancements based on these observations, and make sure to ask your players questions, because this feedback is always beneficial to building a better experience.

User Experience

Kinect-based applications and games are extremely new, and until they mature, building good user experiences will consist of many trials and an extreme number of errors. The user interface in this project has much room for improvement. Simon Says users can accidentally interact with a game square, and this is most obvious at the start of the game when the user extends his hands to the game start targets. Once both hands are within the targets, the game begins issuing instructions. If the user does not quickly drop his hands, he can accidentally hit one of the game targets. One change is to give the user time to return his hands to his sides before issuing instructions. Because people naturally drop their hands to their sides, an easy change is simply to delay instruction presentation by a few seconds. The same delay is necessary between rounds: a new round of instructions begins immediately after the previous instruction set is completed, and the user should be given time to clear his hands from the game targets.

Game Play

The logic to generate the instruction sequence is simple: the round number determines the number of instructions, and the targets are chosen at random. In the original game, each new round added a new instruction to the instruction set of the previous round. For example, round one might be red; round two would be red then blue; round three would add green, so the instruction set would be red, blue, and green. Another change could be to increase the number of instructions by more than one each round. Rounds one through three could have instruction counts equal to the round number, but after that, the instruction count could be twice the round number. A fun aspect of software development is that the application code can be refactored to support multiple algorithms for generating instruction sequences, and the game could allow the user to pick one. For simplicity, the algorithms could be named easy, medium, and hard. While the instruction sequence gets longer with each round, the instructions display at a constant rate; to increase the difficulty of the game even more, decrease the amount of time each instruction is visible when presenting the instruction set.
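
As a sketch of the pick-an-algorithm idea, the following method generates a sequence for a given round and difficulty. The GameDifficulty enum and the method signature are assumptions for illustration, not part of the chapter's project code; the medium counts follow the round-number-then-double scheme described above.

public enum GameDifficulty { Easy, Medium, Hard }


private UIElement[] GenerateInstructions(int round, GameDifficulty difficulty, UIElement[] targets)
{
    Random random = new Random();
    int count;

    switch(difficulty)
    {
        case GameDifficulty.Hard:
            //Twice as many instructions as the round number.
            count = round * 2;
            break;
        case GameDifficulty.Medium:
            //Matches the round number through round three, then doubles.
            count = (round <= 3) ? round : round * 2;
            break;
        default:
            count = round;
            break;
    }

    UIElement[] sequence = new UIElement[count];

    for(int i = 0; i < count; i++)
    {
        //Pick one of the game squares at random.
        sequence[i] = targets[random.Next(targets.Length)];
    }

    return sequence;
}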

Presentation

The presentation of each project in this book is straightforward and easy. Creating visually attractive and amazing applications requires more attention to the presentation than is afforded in these pages. We want to focus more on the mechanics of Kinect development and less on application aesthetics. It is your duty to make gorgeous applications. With a little effort, you can polish the UI of these projects to make them dazzle and engage users. For instance, create nice animation transitions when delivering instructions, and when a user enters and leaves a target area. When users get instruction sets correct, display an animation to reward them. Likewise, have an animation when the game is over. At the very least, create more attractive game targets. Even the simplest games and applications can be engaging to users. An application's allure and charisma come from its presentation and not from the game play.

Reflecting on Simon Says

This project illustrates the basics of user interaction. It tracks the movements of the user's hands on the screen with two cursors, and performs hit tests with each skeleton frame to determine if a hand has entered or left a user interface element. Hit testing is critical to user interaction regardless of the input device. Since Kinect is not integrated into WPF the way the mouse, stylus, or touch are, Kinect developers have to do more work to fully implement user interaction in their applications. The Simon Says project serves as an example, demonstrating the concepts necessary to build more robust user interfaces. The demonstration is admittedly shallow, and more work is needed to create reusable components.

Depth-Based User Interaction

Our projects working with skeleton data so far (Chapters 4 and 5) utilize only the X and Y values of each skeleton joint. They do not use the aspect of Kinect that differentiates it from all other input devices. Each joint comes with a depth value, and every Kinect application should make use of the depth data. Do not forget the Z. The next project explores further uses for skeleton data and examines a basic approach to integrating depth data into a Kinect application.

Without using the 3D capabilities of WPF, there are a few ways to layer visual elements and give them depth. The layout system ultimately determines the layering order of the visual elements. Using elements of different sizes along with layout system layering gives the illusion of depth. Our new project uses a Canvas and the Canvas.ZIndex property to set the layering of visual elements. It also uses both manual sizing and a ScaleTransform to control dynamic scaling for changes in depth. The user interface of this project consists of a number of circles, each representing a certain depth. The application tracks the user's hands with cursors (hand images), which change in scale depending on the depth of the user's hands: the closer the hand is to Kinect, the larger the cursor, and the farther away, the smaller the scale.

In Visual Studio, create a new project and add the necessary Kinect code to handle skeleton tracking. Update the XAML in MainWindow.xaml to match that shown in Listing 5-10. Much of the XAML is common to our previous projects, or is an obvious addition based on the project requirements just described. The main layout panel is the Canvas element. It contains five Ellipses, each with an accompanying TextBlock that serves as a label for the circle. Each circle is randomly placed around the screen, but given a specific Canvas.ZIndex value; a detailed explanation of these values comes later. The Canvas also contains two images that represent the hand cursors, each of which defines a ScaleTransform. The image used for the screenshots is that of a right hand. The -1 ScaleX value flips the image to make it look like a left hand.

Listing 5-10. Deep UI Targets XAML

<Window x:Class="DeepUITargets.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:c="clr-namespace:DeepUITargets"
        Title="Deep UI Targets"
        Height="1080" Width="1920" WindowState="Maximized" Background="White">

    <Window.Resources>
        <Style x:Key="TargetLabel" TargetType="TextBlock">
            <Setter Property="FontSize" Value="40"/>
            <Setter Property="Foreground" Value="White"/>
            <Setter Property="FontWeight" Value="Bold"/>
            <Setter Property="IsHitTestVisible" Value="False"/>
        </Style>
    </Window.Resources>


    <Viewbox>
        <Grid x:Name="LayoutRoot" Width="1920" Height="1280">
            <StackPanel HorizontalAlignment="Left" VerticalAlignment="Top">
                <TextBlock x:Name="DebugLeftHand" Style="{StaticResource TargetLabel}"
                                                  Foreground="Black"/>
                <TextBlock x:Name="DebugRightHand" Style="{StaticResource TargetLabel}"
                                                   Foreground="Black"/>
            </StackPanel>

            <Canvas>
                <Ellipse x:Name="Target3" Fill="Orange" Height="200" Width="200"
                           Canvas.Left="776" Canvas.Top="162" Canvas.ZIndex="1040"/>
                <TextBlock Text="3" Canvas.Left="860" Canvas.Top="206"
                           Panel.ZIndex="1040" Style="{StaticResource TargetLabel}"/>

                <Ellipse x:Name="Target4" Fill="Purple" Height="150" Width="150"
                           Canvas.Left="732" Canvas.Top="320" Canvas.ZIndex="940"/>
                <TextBlock Text="4" Canvas.Left="840" Canvas.Top="372" Panel.ZIndex="940"
                           Style="{StaticResource TargetLabel}"/>

                <Ellipse x:Name="Target5" Fill="Green" Height="120" Width="120"
                           Canvas.Left="880" Canvas.Top="592" Canvas.ZIndex="840"/>
                <TextBlock Text="5" Canvas.Left="908" Canvas.Top="590" Panel.ZIndex="840"
                                    Style="{StaticResource TargetLabel}"/>

                <Ellipse x:Name="Target6" Fill="Blue" Height="100" Width="100"
                           Canvas.Left="352" Canvas.Top="544" Canvas.ZIndex="740"/>
                <TextBlock Text="6" Canvas.Left="368" Canvas.Top="582" Panel.ZIndex="740"
                           Style="{StaticResource TargetLabel}"/>

                <Ellipse x:Name="Target7" Fill="Red" Height="85" Width="85" Canvas.Left="378"
                           Canvas.Top="192" Canvas.ZIndex="640"/>
                <TextBlock Text="7" Canvas.Left="422" Canvas.Top="226" Panel.ZIndex="640"
                           Style="{StaticResource TargetLabel}"/>

                <Image x:Name="LeftHandElement" Source="Images/hand.png" Width="80"
                                                Height="80" RenderTransformOrigin="0.5,0.5">
                    <Image.RenderTransform>
                        <ScaleTransform x:Name="LeftHandScaleTransform" ScaleY="1"
                                                                        ScaleX="-1"/>
                    </Image.RenderTransform>
                </Image>

                <Image x:Name="RightHandElement" Source="Images/hand.png" Width="80"
                                                 Height="80" RenderTransformOrigin="0.5,0.5">
                    <Image.RenderTransform>
                        <ScaleTransform x:Name="RightHandScaleTransform" ScaleY="1"
                                                                         ScaleX="1"/>
                    </Image.RenderTransform>
                </Image>
            </Canvas>
        </Grid>
    </Viewbox>
</Window>

Each circle represents a depth. The element named Target3, for example, corresponds to a depth of three feet. The width and height of Target3 are greater than those of Target7, loosely giving a sense of scale. For our demonstration, hard-coding these values suffices, but a real-world application would scale its elements dynamically based on the specific application conditions. The circles are given unique colors to help further distinguish one from another.

The Canvas element layers visual elements based on their Canvas.ZIndex values: the element with the largest Canvas.ZIndex value is layered on top, and elements with smaller values are layered farther back. If two visual elements have the same Canvas.ZIndex value, the order of definition within the XAML dictates the layering. Because a nearer element needs a larger ZIndex value while a nearer element has a smaller depth value, we cannot assign ZIndex values based directly on the distance of the visual element; instead, inverting the depth values gives the desired effect. The maximum depth value is 13.4 feet, so our Canvas.ZIndex values range from 0 to 1340, where the depth value is multiplied by 100 for better precision. Therefore, the Canvas.ZIndex value for Target5, at a depth of five feet, is 840: (13.4 − 5) × 100 = 840.
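
Expressed as a small helper (hypothetical; the project inlines the same arithmetic in Listing 5-11), the conversion looks like this:

private const double MaximumDepthInFeet = 13.4;


//Nearer objects get larger ZIndex values, so invert the depth and scale by 100.
private int DepthToZIndex(double depthInFeet)
{
    return (int) ((MaximumDepthInFeet - depthInFeet) * 100);
}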

The final note on the XAML pertains to the two TextBlocks named DebugLeftHand and DebugRightHand. These visual elements display skeleton data, specifically the depth values of the hands. It is quite difficult to debug Kinect applications, especially when you are both the developer and the test user. Temporarily adding elements such as these to an application helps debug the code when traditional debugging techniques fail. Additionally, this information helps to better illustrate the purpose of this project.

The code in Listing 5-11 handles the processing of the skeleton data. The SkeletonFrameReady event handler is no different from previous examples, except for the calls to the TrackHand method. This method, used in previous projects, is modified here to handle the scaling of the cursors. The method converts the X and Y positions from skeleton space to the coordinate space of the container and sets them using the Canvas.SetLeft and Canvas.SetTop methods, respectively. The Canvas.ZIndex is calculated as previously described.

Setting the Canvas.ZIndex is enough to properly layer the visual elements, but it fails to project the sense of perspective needed to produce the illusion of depth. For that we scale the hand cursors; without this scaling, the application fails to satisfy the user. It fails as a Kinect application, because it does not deliver an experience the user cannot get from any other input device. The scaling calculation used here is moderately arbitrary. It is simple enough for this project to demonstrate changes in depth using scale; for other applications, this approach may be too simple.

For the best user experience, the hand cursors should scale to match the relative size of the user's hands. This produces the illusion of the cursor being a glove on the user's hand. It creates a subtle bond between the application and the user, one that the user will not necessarily be cognizant of, but one that will cause the user to interact more naturally with the application.

Listing 5-11. Hand Tracking With Depth

private void Runtime_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using(SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if(skeletonFrame != null)
        {
            skeletonFrame.CopySkeletonDataTo(this._FrameSkeletons);
            Skeleton skeleton = GetPrimarySkeleton(this._FrameSkeletons);

            if(skeleton != null)
            {
                TrackHand(skeleton.Joints[JointType.HandLeft], LeftHandElement,
                          LeftHandScaleTransform, LayoutRoot, true);
                TrackHand(skeleton.Joints[JointType.HandRight], RightHandElement,
                          RightHandScaleTransform, LayoutRoot, false);
            }
         }
     }
}


private void TrackHand(Joint hand, FrameworkElement cursorElement,
                       ScaleTransform cursorScale, FrameworkElement container, bool isLeft)
{
    if(hand.TrackingState != JointTrackingState.NotTracked)
    {
        double z = hand.Position.Z * FeetPerMeters;
        cursorElement.Visibility = System.Windows.Visibility.Visible;
        Point cursorCenter = new Point(cursorElement.ActualWidth / 2.0,
                                       cursorElement.ActualHeight / 2.0);
        Point jointPoint = GetJointPoint(this.KinectDevice, hand,
                                         container.RenderSize, cursorCenter);
        Canvas.SetLeft(cursorElement, jointPoint.X);
        Canvas.SetTop(cursorElement, jointPoint.Y);
        Canvas.SetZIndex(cursorElement, (int) (1340 - (z * 100)));

        cursorScale.ScaleX = 1340 / z * ((isLeft) ? -1 : 1);
        cursorScale.ScaleY = 1340 / z;

        if(hand.JointType == JointType.HandLeft)
        {
            DebugLeftHand.Text = string.Format("Left Hand: {0:0.00} feet", z);
        }
        else
        {
            DebugRightHand.Text = string.Format("Right Hand: {0:0.00} feet", z);
        }
    }
    else
    {
        DebugLeftHand.Text  = string.Empty;
        DebugRightHand.Text = string.Empty;
    }
}

Make sure to include the GetJointPoint code from previous projects. With that code added, compile and run the project. Move your hands around to multiple depths. The first effect is immediately obvious: the hand cursors scale according to the depth of the user's hand. The second effect, the layering of the visual objects, is easy to see when making broad, dramatic movements back and forth. Watch the hand position values in the debug fields change, and use this information to position your hand either in front of or behind a depth marker. Take the image in Figure 5-7, for example. One hand is just in front of the four-foot mark, so its cursor is layered between Target3 and Target4, while the other hand is beyond six feet. Figure 5-8 shows the result of both hands at roughly the same depth, between five and six feet, and the cursors display accordingly.

While crude in presentation, this example shows the effects possible when using depth data. When building Kinect applications, developers must think beyond the X and Y planes. Virtually all Kinect applications can incorporate depth using these techniques. All augmented reality applications should employ depth in the experience, otherwise Kinect is underutilized and the full potential of the experience goes unfulfilled. Don't forget the Z!

Figure 5-7. Hands at different depths

Figure 5-8. Hands at nearly the same depth

Poses

A pose is a distinct form of physical or body communication. In everyday life, people pose as an expression of feelings. It is a temporary pause or suspension of animation, where one's posture conveys a message. Commonly in sports, referees or umpires use poses to signal a foul or outcome of an event. In football, referees signal touchdowns or field goals by raising their arms above their heads. In basketball, referees use the same pose to signify a three-point basket. Watch a baseball game and pay attention to the third base coach or the catcher. Both use a series of poses to relay a message to the batter and pitcher, respectively. Poses in baseball, where signal stealing is common, get complex. If a coach touches the bill of his hat and then the buckle of his belt, he means for the base runner to steal a base. However, it might be a decoy message when the coach touches the bill of his hat and then the tip of his nose.

Poses can be confused with gestures, but they are in fact two different things. As stated, when a person poses, she holds a specific body position or posture. The implication is that a person remains still when posing. A gesture involves action, while a pose is inert. In baseball, the umpire gestures to signal a strikeout. A wave is another example of a gesture. On touch screens, users employ the pinch gesture to zoom in. Still another form of gesture is when a person swipes by flicking a finger across a touch screen. These are gestures, because the person is performing an action. Shaking a fist in anger is a gesture; holding up a middle finger to another person is a pose.

In the early life of Kinect development, more attention and development effort has been directed toward gesture recognition than pose recognition. This is unfortunate, but understandable. The marketing messages used to sell Kinect focus on movement. The Kinect name itself derives from the word kinetic, which relates to motion. Kinect is sold as a tool for playing games where your actions—your gestures—control the game. Gestures create challenges for developers and user experience designers. As we examine in greater detail in the next chapter, gestures are not always easy for users to execute and can be extremely difficult for an application to detect. Poses, by contrast, are deliberate acts of the user, and they are more consistent in form and execution.

While poses have received little attention, they have the potential for more extensive use in all applications, even games, than they see at present. Generally, poses are easier for users to perform, and it is much easier to write algorithms to detect them. The technical solution to determine whether a person is signaling a touchdown by raising both arms above the head is easier to implement than detecting a person running in place or jumping.

Imagine creating a game where the user is flying through the air. One way of controlling the experience is to have the user flap his arms like a bird: the more the user flaps, the faster he flies. That would be a gesture. Another option is to have the user extend his arms away from his body: the more extended the arms, the faster the user flies, and the closer the arms are to the body, the slower he flies. In Simon Says, the user must extend his arms outward to touch both hand targets in order to start the game. An alternative, using a pose, is to detect when the user has both arms extended. The question, then, is how to detect poses.

Pose Detection

The posture and position of a user's body joints define a pose; more specifically, a pose is defined by the relationship of each joint to the others. The type and complexity of the pose determines the complexity of the detection algorithm. A pose is detectable either by the intersection or relative position of joints, or by the angles between joints. Detecting a pose through intersection is less involved than working with angles, and therefore provides a good introduction to pose detection.

Pose detection through intersection is hit testing for joints. Earlier in the chapter, we detected when a joint position was within the coordinate space of a visual element. We do the same type of test for joints, except it requires less work, because the joints are in the same coordinate space and the calculations are easier. For example, take the hands-on-hip pose. Skeleton tracking tells us the positions of the left and right hip joints as well as the left and right hand joints. Using vector math, calculate the distance between the left hand and the left hip. If the distance between the two points is less than some threshold, the hand and hip are considered to be intersecting. The threshold distance should be small. Testing for an exact intersection of points, while technically possible, creates a poor user interface, just as we discovered with visual element hit testing. The skeleton data coming back from Kinect jitters even with smoothing parameters applied, so much so that exact joint matches are virtually impossible. Additionally, it is unreasonable to expect a user to make smooth and consistent movements, or even hold a joint position for an extended period of time. In short, the precision of the user's movements and the accuracy of the data preclude such a simple calculation. Calculating the distance between the two positions and testing whether it falls within a threshold is the only viable approach.
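
Here is a sketch of that distance test. The helper names are hypothetical; the joint positions are raw SkeletonPoint values in meters, so a threshold of roughly 0.1 meters gives a forgiving, but still deliberate, hit area.

//Returns true when two joints are within the given distance of each other.
private bool JointsIntersect(Joint jointA, Joint jointB, double threshold)
{
    double dx = jointA.Position.X - jointB.Position.X;
    double dy = jointA.Position.Y - jointB.Position.Y;
    double dz = jointA.Position.Z - jointB.Position.Z;

    double distance = Math.Sqrt(dx * dx + dy * dy + dz * dz);

    return (distance <= threshold);
}


//Hands-on-hip test for the left side of the body.
private bool IsLeftHandOnHip(Skeleton skeleton)
{
    return JointsIntersect(skeleton.Joints[JointType.HandLeft],
                           skeleton.Joints[JointType.HipLeft], 0.1);
}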

The accuracy of the joint position degrades further when two joints are in tight proximity. It becomes difficult for the skeleton engine to determine where one joint begins and another ends. Test this by having a user place her hand over her face. The head position is roughly the position of one's nose, so the joint positions of the hand and the head will never exactly match. This makes certain poses indistinguishable from others. For example, it is impossible to detect the difference between a hand over the face, a hand on top of the head, and a hand covering an ear. This should not completely discourage application designers and developers from using these poses. While it is not possible to definitively determine the exact pose, if the user is given proper visual instructions by the application, she will perform the desired pose.

Joint intersection does not require using both the X and Y positions. Certain poses are detectable using only one axis. For example, take a standing plank pose, where the user stands erect with his arms flat by his sides. In this pose, the user's hands are relatively close to the same vertical plane as his shoulders, regardless of the user's size and shape. For this pose, the logic is to test the difference of the X coordinates of the shoulder and hand joints. If the absolute difference is within a small threshold, the joints are considered to be within the same plane. However, this does not guarantee the user is in the standing plank pose. The application must also determine if the hands are below the shoulders on the Y-axis. This type of logic produces a high degree of accuracy, but it is still not perfect. There is no simple way to determine if the user is actually standing; the user could be on his knees or have his knees slightly bent, making pose detection an inexact science.
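
A sketch of this test follows; IsStandingPlank is a hypothetical helper, and the 0.1-meter threshold is an arbitrary choice. In skeleton space, Y increases upward, so a hand below the shoulder means a smaller Y value.

private bool IsStandingPlank(Skeleton skeleton)
{
    Joint leftHand      = skeleton.Joints[JointType.HandLeft];
    Joint leftShoulder  = skeleton.Joints[JointType.ShoulderLeft];
    Joint rightHand     = skeleton.Joints[JointType.HandRight];
    Joint rightShoulder = skeleton.Joints[JointType.ShoulderRight];

    //Each hand must be in roughly the same vertical plane as its shoulder and below it.
    bool leftAligned  = Math.Abs(leftHand.Position.X - leftShoulder.Position.X) < 0.1 &&
                        leftHand.Position.Y < leftShoulder.Position.Y;
    bool rightAligned = Math.Abs(rightHand.Position.X - rightShoulder.Position.X) < 0.1 &&
                        rightHand.Position.Y < rightShoulder.Position.Y;

    return leftAligned && rightAligned;
}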

Not all poses are detectable using joint intersection techniques, and even those that are can often be detected more accurately using another technique. Take, for example, a pose where the user extends her arms outward, away from the body but level with the shoulders. This is called the T pose. Using joint intersection, an application can detect whether the hand, elbow, and shoulder are in the same relative Y plane. Another approach is to calculate the angles between different joints of the body. The Kinect SDK's skeleton engine detects up to twenty skeleton points, any two of which can be used to form a triangle. The angles of these triangles are calculated using trigonometry.

From the skeleton tracking data, we can draw a triangle using any two joint points. The third point of the triangle is derived from the other two points. Knowing the coordinates of each point in the triangle means that we know the length of each side, but not the angle values. Applying the Law of Cosines gives us the value of any desired angle. The Law of Cosines states that c² = a² + b² − 2ab·cos C, where C is the angle opposite side c. This formula is a generalization of the familiar Pythagorean theorem, c² = a² + b². Calculations on the joint points give the values for a, b, and c; the unknown is angle C. Rearranging the formula to solve for the unknown angle yields C = cos⁻¹((a² + b² − c²) / 2ab). Arccosine (cos⁻¹) is the inverse of the cosine function, and returns the angle whose cosine is a given value.
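
As a quick sanity check of the rearranged formula, plug in the sides of a hypothetical 3-4-5 right triangle; the angle opposite the longest side comes out to 90 degrees, as expected.

double a = 3.0, b = 4.0, c = 5.0;

//C = arccos((a² + b² - c²) / 2ab) = arccos(0 / 24) = 90 degrees
double angleDeg = Math.Acos((a * a + b * b - c * c) / (2 * a * b)) * 180 / Math.PI;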

Figure 5-9. Law of Cosines

To demonstrate pose detection using joint triangulation, consider the pose where the user flexes his bicep. In this pose, the upper arm from the shoulder to the elbow is roughly parallel to the floor, with the forearm (elbow to wrist) drawn up toward the shoulder. It is easy to see the form of a right or acute triangle in this pose. For right and acute triangles we could use basic trigonometry, but not for obtuse triangles; therefore, we use the Law of Cosines, as it works for all triangles. Using it exclusively keeps the code clean and simple. Figure 5-10 shows a skeleton in the bicep flex pose with a triangle overlaid to illustrate the math.

Figure 5-10. Calculating the angle between two joints

The figure shows the position of three joints: wrist, elbow, and shoulder. The lengths of the three sides (a, b, and c) are calculated from the three joint positions. Plugging in the side lengths to the transformed Law of Cosines equation, we solve for angle C. In this example, the value is 93.875 degrees.

There are two methods of joint triangulation. The most obvious approach is to use three joints to form the three points of the triangle, as shown in the bicep flex example. The other uses two joints, with the third triangle point derived in part arbitrarily. The approach to use depends on the complexity and restrictions of the pose. In this example, we use the three-joint method, because the desired angle is the one created from wrist to elbow to shoulder. That angle should always be the same regardless of the angle between the arm and torso (armpit) or the angle between the torso and the hips. To understand this, stand straight and flex your bicep. Without moving your arm or forearm, bend at the hip to touch the side of your knee with your other (non-flexed) hand. The angle between the wrist, elbow, and shoulder joints is the same, but the overall body pose is different, because the angle between the torso and hips has changed. If the bicep flex pose were strictly defined as the user standing straight with the bicep flexed, the three-joint approach in our example could not enforce that definition, because it only sees the relationship among the three arm joints.
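
The game code later in the chapter (Listing 5-15) implements only the two-joint method, so here is a sketch of the three-joint variant. The helper names are hypothetical; it uses the X and Y values of the raw joint positions and, by the Law of Cosines, always returns an angle between 0 and 180 degrees at the center joint.

private double GetThreeJointAngle(Joint outerA, Joint centerJoint, Joint outerB)
{
    //Side lengths of the triangle formed by the three joints.
    double a = JointDistance(centerJoint, outerA);
    double b = JointDistance(centerJoint, outerB);
    double c = JointDistance(outerA, outerB);      //side opposite the desired angle

    //C = arccos((a² + b² - c²) / 2ab)
    double angleRad = Math.Acos((a * a + b * b - c * c) / (2 * a * b));

    return angleRad * 180 / Math.PI;
}


private double JointDistance(Joint jointA, Joint jointB)
{
    double dx = jointA.Position.X - jointB.Position.X;
    double dy = jointA.Position.Y - jointB.Position.Y;

    return Math.Sqrt(dx * dx + dy * dy);
}

For the bicep flex pose, passing the wrist, elbow, and shoulder joints returns the elbow angle regardless of how the rest of the body is positioned.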

To apply the two-joint method to the bicep flex pose, use only the elbow and the wrist joints. The elbow becomes the center, or zero point, of the coordinate system, and the wrist position establishes the defining point of the angle. The third point of the triangle is an arbitrary point along the X-axis of the elbow point; its Y value is always the same as that of the zero point, which in this case is the elbow. With the two-joint method, the calculated angle is different when the user is standing straight than when leaning.

Reacting to Poses

Understanding how to detect poses only satisfies the technical side of Kinect application development. What the application does with this information and how it communicates with the user is equally critical to the application functioning well. The purpose of detecting poses is to initiate some action from the application. The simplest approach for any application is to trigger an action immediately upon detecting the pose, similar to a mouse click.

What makes Kinect fascinating and cool is that the user is the input device, but this also introduces new problems. The most challenging problem for developers and designers is exactly that: because the user is the input device, users do not always act as desired or expected. For decades, developers and designers have worked to make keyboard- and mouse-driven applications robust enough to handle anything users throw at them. Most of the techniques learned for keyboard and mouse input do not apply to Kinect. When using a mouse, the user must deliberately click the mouse button to execute an action—well, most of the time the mouse click is deliberate. When mouse clicks are accidental, there is no way for an application to know, but because the user is required to push a button, accidents happen less often. With pose detection, this is not always the case, because users are constantly posing.

Applications using pose detection must know when to ignore and when to react to poses. As stated, the easiest approach is for the application to react immediately to the pose. If this is the desired behavior, choose distinct poses that a user is unlikely to fall into naturally when resting or relaxing. Choose poses that are easy to perform, but are not natural or common in general human movement. This makes the pose a more deliberate action, much like a mouse click. Instead of reacting immediately to the pose, an alternative is to start a timer and react only if the user holds the pose for a specific duration. This is arguably a gesture; we will defer diving deeper into that argument until the next chapter.

Another approach to responding to user poses is to use a sequence of poses to trigger an action. This requires the user to perform a number of poses in a specific order before the application executes an action. Think back to the baseball example, where the coach gives a set of signals to the players. There is always one pose that indicates that the pose that follows is a command. If the coach touches his nose and then his belt buckle, the runner on first should steal second base. However, if the coach touches his ear and then the belt buckle, it means nothing. Touching the nose is the indicator that the next pose is a command to follow. Using a sequence of poses, along with uncommon posturing, clearly indicates that the user purposely wants the application to execute a specific action. In other words, the user is less likely to accidentally trigger an undesired action.
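
A sketch of sequence-based triggering might look like the following, assuming a Pose type and an IsPose helper like the ones developed for the revised Simon Says later in this chapter. The field and method names are hypothetical, and a real implementation would also reset the armed state after a timeout.

private bool _CommandArmed;


private void ProcessPoseSequence(Skeleton skeleton, Pose indicatorPose, Pose commandPose)
{
    if(IsPose(skeleton, indicatorPose))
    {
        //The indicator pose arms the command that follows.
        _CommandArmed = true;
    }
    else if(_CommandArmed && IsPose(skeleton, commandPose))
    {
        _CommandArmed = false;
        ExecuteCommand();   //Hypothetical application-specific action
    }
}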

Simon Says Revisited

Looking back on the Simon Says project, let's redo it, but instead of using visual element hit testing, we will use poses. In our second version, Simon instructs the player to pose in a specific sequence instead of touching targets. Detecting poses using joint angles gives the application the greatest range of poses. The more poses available and the crazier the pose, the more fun the player has. If your application experience is fun, then it is a success.

Tip: This version of Simon Says makes a fun drinking game, but only if you are old enough to drink! Please drink responsibly.

Using poses in place of visual targets requires changing a large portion of the application, but not in a bad way. The code necessary to detect poses is less than that needed to perform hit testing and to determine when a hand has entered or left a visual element's space. The pose detection code focuses on math, specifically trigonometry. Besides the changes to the code, there are changes to the user experience and game play. All of the bland boxes go away; the only visual elements left are the TextBlocks and the hand cursors. We need some way of telling the user which pose to perform. The best approach is to create graphics or images showing the exact shape of the pose. Understandably, not everyone is a graphic designer or has access to one. A quick and dirty alternative is to display the name of the pose in the instructions TextBlock, which is the approach we take. This works for debugging and testing, and buys you enough time to make friends with a graphic designer.

The game play changes, too. Removing the visual element hit testing means we have to create a completely new approach to starting the game. This is easy. We make the user pose! The start pose for the new Simon Says will be the same as before. In the first version, the user extended her arms to hit the two targets. This is a T pose, because the player's body resembles the letter T. The new version of Simon Says starts a new game when it detects the user in a T pose.

In the previous Simon Says, the instruction sequence pointer advanced when the user successfully hit the target, or the game ended if the player hit another target. In this version, the player has a limited time to reproduce the pose. If the user fails to pose correctly in the allotted time, the game is over. If the pose is detected, the game moves to the next instruction and the timer restarts.

Before writing any game code, we must build some infrastructure. For the game to be fun, it needs to be capable of detecting any number of poses, and it must be easy to add new poses to the game. To facilitate creating a pose library, create a new class named PoseAngle and a structure named Pose. The code is shown in Listing 5-12. The Pose structure simply holds a name and an array of PoseAngle objects; the decision to use a structure instead of a class is for simplicity only. The PoseAngle class holds the two JointTypes needed to calculate the angle, the required angle between the joints, and a threshold value. Just as with visual element hit testing, we never require the user to match the angle exactly, as this is impossible. We only require the user to be within a range: plus or minus the threshold from the required angle.

Listing 5-12. Classes to Store Pose Information

public class PoseAngle
{
    public PoseAngle(JointType centerJoint, JointType angleJoint,
                     double angle, double threshold)
    {
        CenterJoint = centerJoint;
        AngleJoint  = angleJoint;
        Angle       = angle;
        Threshold   = threshold;
    }


    public JointType CenterJoint { get; private set;}
    public JointType AngleJoint { get; private set;}
    public double Angle { get; private set;}
    public double Threshold { get; private set;}
}


public struct Pose
{
    public string Title;
    public PoseAngle[] Angles;
}

With the necessary code in place to store pose configuration, we write the code to create the game poses. In MainWindow.xaml.cs, create new member variables _PoseLibrary and _StartPose, and a method named PopulatePoseLibrary. This code is shown in Listing 5-13. The PopulatePoseLibrary method creates the definition of the start pose (T pose) and two poses to be used during game play. The first game pose titled “Touch Down” resembles a football referee signaling a touchdown. The other game pose titled “Scarecrow” is the inverse of the first.

Listing 5-13. Creating a Library of Poses

private Pose[] _PoseLibrary;
private Pose _StartPose;


private void PopulatePoseLibrary()
{
    this._PoseLibrary = new Pose[2];
    PoseAngle[] angles;


    //Start Pose - Arms Extended (T Pose)
    this._StartPose             = new Pose();
    this._StartPose.Title       = "Start Pose";
    angles    = new PoseAngle[4];
    angles[0] = new PoseAngle(JointType.ShoulderLeft, JointType.ElbowLeft, 180, 20);
    angles[1] = new PoseAngle(JointType.ElbowLeft, JointType.WristLeft, 180, 20);
    angles[2] = new PoseAngle(JointType.ShoulderRight, JointType.ElbowRight, 0, 20);
    angles[3] = new PoseAngle(JointType.ElbowRight, JointType.WristRight, 0, 20);
    this._StartPose.Angles = angles;


    //Pose 1 - Both Hands Up (Touch Down)
    this._PoseLibrary[0]       = new Pose();
    this._PoseLibrary[0].Title = "Touch Down!";
    angles     = new PoseAngle[4];
    angles[0]  = new PoseAngle(JointType.ShoulderLeft, JointType.ElbowLeft, 180, 20);
    angles[1]  = new PoseAngle(JointType.ElbowLeft, JointType.WristLeft, 90, 20);
    angles[2]  = new PoseAngle(JointType.ShoulderRight, JointType.ElbowRight, 0, 20);
    angles[3]  = new PoseAngle(JointType.ElbowRight, JointType.WristRight, 90, 20);
    this._PoseLibrary[0].Angles = angles;


    //Pose 2 - Both Hands Down (Scarecrow)
    this._PoseLibrary[1]       = new Pose();
    this._PoseLibrary[1].Title = "Scarecrow";
    angles     = new PoseAngle[4];
    angles[0]  = new PoseAngle(JointType.ShoulderLeft, JointType.ElbowLeft, 180, 20);
    angles[1]  = new PoseAngle(JointType.ElbowLeft, JointType.WristLeft, 270, 20);
    angles[2]  = new PoseAngle(JointType.ShoulderRight, JointType.ElbowRight, 0, 20);
    angles[3]  = new PoseAngle(JointType.ElbowRight, JointType.WristRight, 270, 20);
    this._PoseLibrary[1].Angles = angles;
}

With the necessary infrastructure in place, we implement the changes to the game code, starting with detecting the start of the game. When the game is in the GameOver phase, the ProcessGameOver method is continually called. The purpose of this method was originally to detect when the player's hands were over the start targets. This code is replaced with code that detects whether the user is in a specific pose. Listing 5-14 details the code to start game play and to detect a pose. It is necessary to have a single method that detects a pose match, because we use it in multiple places in this application. Also, note how dramatically less code the ProcessGameOver method now contains.

The code to implement the IsPose method is straightforward until the last few lines. The code loops through the PoseAngles defined in the pose parameter, calculating each joint angle and validating it against the angle defined by the PoseAngle. If any PoseAngle fails to validate, IsPose returns false. The if statement tests whether the angle range defined by the loAngle and hiAngle values falls outside the degree range of a circle; if it does, the code adjusts the values before validating.

Listing 5-14. Updated ProcessGameOver

private void ProcessGameOver(Skeleton skeleton)
{
    if(IsPose(skeleton, this._StartPose))
    {
        ChangePhase(GamePhase.SimonInstructing);
    }
}


private bool IsPose(Skeleton skeleton, Pose pose)
{
    bool isPose = true;
    double angle;
    double poseAngle;
    double poseThreshold;
    double loAngle;
    double hiAngle;


    for(int i = 0; i < pose.Angles.Length && isPose; i++)
    {
        poseAngle       = pose.Angles[i].Angle;
        poseThreshold   = pose.Angles[i].Threshold;
        angle           = GetJointAngle(skeleton.Joints[pose.Angles[i].CenterJoint],
                                         skeleton.Joints[pose.Angles[i].AngleJoint]);

        hiAngle = poseAngle + poseThreshold;
        loAngle = poseAngle - poseThreshold;

        if(hiAngle >= 360 || loAngle < 0)
        {
            loAngle = (loAngle < 0) ? 360 + loAngle : loAngle;
            hiAngle = hiAngle % 360;

            isPose = !(loAngle > angle && angle > hiAngle);
        }
        else
        {
            isPose = (loAngle <= angle && hiAngle >= angle);
        }
     }

     return isPose;
}

The IsPose method calls the GetJointAngle method, shown in Listing 5-15, to calculate the angle between the two joints. It calls the GetJointPoint method to get the point of each joint in the main layout space. This step is technically unnecessary; the raw position values of the joints are all that is needed to calculate the joint angles. However, converting the values to the main layout coordinate system helps with debugging. With the joint positions, regardless of the coordinate space, the method then applies the Law of Cosines formula to calculate the angle between the joints. The arccosine method (Math.Acos) returns values in radians, making it necessary to convert the angle value to degrees. The final if statement handles angles between 180 and 360 degrees. The Law of Cosines formula only works for angles between 0 and 180 degrees, so the if block adjusts values for angles falling into the third and fourth quadrants.

Listing 5-15. Calculating the Angle Between Two Joints

private double GetJointAngle(Joint zeroJoint, Joint angleJoint)
{
    Point zeroPoint     = GetJointPoint(zeroJoint);
    Point anglePoint    = GetJointPoint(angleJoint);
    Point x             = new Point(zeroPoint.X + anglePoint.X, zeroPoint.Y);

    double a;
    double b;
    double c;

    a = Math.Sqrt(Math.Pow(zeroPoint.X - anglePoint.X, 2) +
                   Math.Pow(zeroPoint.Y - anglePoint.Y, 2));
    b = anglePoint.X;
    c = Math.Sqrt(Math.Pow(anglePoint.X - x.X, 2) + Math.Pow(anglePoint.Y - x.Y, 2));

    double angleRad = Math.Acos((a * a + b * b - c * c) / (2 * a * b));
    double angleDeg = angleRad * 180 / Math.PI;

    if(zeroPoint.Y < anglePoint.Y)
    {
        angleDeg = 360 - angleDeg;
    }

    return angleDeg;
}

The code needed to detect poses and start the game is now in place. When the game detects the start pose, it transitions into the SimonInstructing phase. The code changes for this phase are isolated to the GenerateInstructions and DisplayInstructions methods. The update to GenerateInstructions is straightforward: populate the instruction array with randomly selected poses from the pose library. The DisplayInstructions method is an opportunity to get creative in the way you present the sequence of instructions to the player. We leave these updates to you.

Once the game completes the presentation of instructions, it transitions to the PlayerPerforming phase. The updated game rules give the user a limited time to perform the instructed pose. When the application detects the user in the required pose, it advances to the next pose and restarts the timer. If the timer fires before the player reproduces the pose, the game ends. WPF's DispatcherTimer, found in the System.Windows.Threading namespace, makes it easy to implement the timer feature. The code to initialize the timer and handle its expiration is shown in Listing 5-16. Create a new member variable named _PoseTimer, and add the code in the listing to the MainWindow constructor.

Listing 5-16. Timer Initialization

this._PoseTimer             = new DispatcherTimer();
this._PoseTimer.Interval    = TimeSpan.FromSeconds(10);
this._PoseTimer.Tick       += (s, e) => { ChangePhase(GamePhase.GameOver); };
this._PoseTimer.Stop();

The final code update necessary to use poses in Simon Says is shown in Listing 5-17, which details the changes to the ProcessPlayerPerforming method. On each call, it validates the current pose in the sequence against the player's skeleton posture. If the correct pose is detected, it stops the timer and moves to the next pose instruction in the sequence. The game changes to the instructing phase when the player reaches the end of the sequence; otherwise, the timer is restarted for the next pose.

Listing 5-17. Updated ProcessPlayerPerforming Method

private void ProcessPlayerPerforming(Skeleton skeleton)
{
    int instructionSeq = this._InstructionSequence[this._InstructionPosition];

    if(IsPose(skeleton, this._PoseLibrary[instructionSeq]))
    {
        this._PoseTimer.Stop();
        this._InstructionPosition++;

        if(this._InstructionPosition >= this._InstructionSequence.Length)
        {
           ChangePhase(GamePhase.SimonInstructing);
        }
        else
        {
            this._PoseTimer.Start();
        }
    }
}

With this code added to the project, Simon Says detects poses in place of visual element hit testing. This project is a practical example of pose detection and how to implement it in an application experience. With the infrastructure code in place, create new poses and add them to the game. Make sure to experiment with different types of poses. You will discover that not all poses are easily detectable, and some do not work well in Kinect experiences.

As with any application, but uniquely so for a Kinect-driven application, the user experience is critical to success. After the first run of the new Simon Says, it is markedly obvious that much is missing from the game. The user interface lacks many of the elements necessary to make it effective, or even a fun game, and having a fun experience is the point, after all. The game lacks any user feedback, which is paramount to a successful user experience. For Simon Says to become a true Kinect-driven experience, it must provide the user visual cues when the game starts and ends. The application should reward players with a visual effect when they successfully perform poses. The type of feedback and how it looks is for you to decide. Be creative! Make the game entertaining to play and visually striking. Here are a few other ideas for enhancements:

  • Create more poses. Adding new poses is easy to do using the Pose class. The infrastructure is in place. All you need to do is determine the angles of the joints and build the Pose objects.
  • Adjust the game play by speeding up the pose timer each round. This makes the user more active and engaged in the game.
  • Apply more pressure by displaying the timer in the user interface. Showing the timer on the screen applies stress to the user, but in a playful manner. Adding visual effects to the screen or the timer as it closely approaches zero adds further pressure.
  • Take a snapshot! Add the code from Chapter 2 to take snapshots of users while they are in poses. At the end of the game, display a slideshow of the snapshots. This creates a truly memorable gaming experience.

Reflect and Refactor

Looking back on this chapter, the most reusable code is the pose detection code from the revised Simon Says project. In that project, we wrote enough code to start a pose detection engine. It is not ridiculous to speculate that a future version of the Microsoft Kinect SDK will include a pose detection engine, but one is absent from the current version. Given that Microsoft has not provided any indication of the future features of the Kinect SDK, it is worthwhile to create such a tool. There have been some attempts by the online Kinect developer community to create similar tools, but so far, none has emerged as the standard.

For those who are industrious and willing to build their own pose engine, imagine a class named PoseEngine with a single event named PoseDetected. This event fires when the engine detects that a skeleton has performed a pose. By default, the PoseEngine listens for SkeletonFrameReady events, but it would also have a means to manually test for poses on a frame-by-frame basis, making it serviceable under a polling architecture. The class would hold a collection of Pose objects, which define the detectable poses. Using Add and Remove methods, similar to a .NET List, a developer defines the pose library for the application.
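
A bare-bones sketch of such a class might look like the following. Nothing here is part of the SDK; the type and member names are speculative, and the IsPose routine would be the same angle-based test from Listing 5-14.

public class PoseDetectedEventArgs : EventArgs
{
    public PoseDetectedEventArgs(Pose pose, Skeleton skeleton)
    {
        Pose     = pose;
        Skeleton = skeleton;
    }


    public Pose Pose { get; private set; }
    public Skeleton Skeleton { get; private set; }
}


public class PoseEngine
{
    private readonly List<Pose> _Poses = new List<Pose>();


    public event EventHandler<PoseDetectedEventArgs> PoseDetected;


    public void Add(Pose pose)    { this._Poses.Add(pose); }
    public void Remove(Pose pose) { this._Poses.Remove(pose); }


    //Call this from a SkeletonFrameReady handler, or manually when polling.
    public void ProcessSkeleton(Skeleton skeleton)
    {
        foreach(Pose pose in this._Poses)
        {
            if(IsPose(skeleton, pose) && this.PoseDetected != null)
            {
                this.PoseDetected(this, new PoseDetectedEventArgs(pose, skeleton));
            }
        }
    }


    private bool IsPose(Skeleton skeleton, Pose pose)
    {
        //Same angle-based matching as Listing 5-14; omitted here for brevity.
        return false;
    }
}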

To facilitate adding and removing poses at runtime, the pose definitions cannot be hard-coded as they are in the Simon Says project. The simplicity of these objects means serialization is straightforward. Serializing the pose data provides two advantages. The first is that poses are more easily added to and removed from an application: applications can read poses from configuration when they load, or dynamically add new poses at run time. Second, the ability to persist pose configuration means we can build tools that create pose definitions by capturing or recording poses.
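
As a sketch of what persistence could look like, the following uses XmlSerializer (from System.Xml.Serialization). It assumes PoseAngle is given a parameterless constructor and public setters, or is mirrored into a simple data-transfer type, since XmlSerializer cannot write the read-only properties from Listing 5-12 as written.

public static class PoseFile
{
    public static void Save(string path, Pose[] poses)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Pose[]));

        using(FileStream stream = File.Create(path))
        {
            serializer.Serialize(stream, poses);
        }
    }


    public static Pose[] Load(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Pose[]));

        using(FileStream stream = File.OpenRead(path))
        {
            return (Pose[]) serializer.Deserialize(stream);
        }
    }
}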

It is easy to envision a tool to capture and serialize poses for application use. This tool is a Kinect application that uses all of the techniques and knowledge presented thus far. Taking the SkeletonViewer control created in the previous chapter, add the joint angle calculation logic from Simon Says, and update the output of the SkeletonViewer to display the angle values and draw arcs to clearly illustrate the joint angles. The pose capture tool would then have a function to take a snapshot of the user's pose, a snapshot being nothing more than a recording of the various joint angles. Each snapshot is serialized, making it easy to add to any application.

A much quicker solution is to update the SkeletonViewer control to display the joint angles. Figure 5-11 shows what the output might look like. This allows you to quickly see the angles of the joints, and pose configurations can be created manually this way. Even with a pose detection engine and a pose builder tool, updating the SkeletonViewer to include joint angles makes for a valuable debugging tool.

Figure 5-11. Illustrating the angle change between the right elbow and right wrist

Summary

The Kinect presents a new and exciting challenge to developers. It is a new form of user input for our applications. Each new input device has commonalities with other devices, but also has its own unique features. In this chapter, we introduced WPF's input system and showed how Kinect input is similar to that of the mouse and touch devices. The conversation covered the primitives of user interface interaction, specifically hit testing, and we demonstrated these principles with the Simon Says game. From there we expanded the discussion to illustrate how the unique feature of Kinect (Z data) can be used in an application.

We concluded the chapter by introducing the concept of a pose and how to detect poses. This included updating the Simon Says game to use poses to drive game play. Poses are a unique way for a user to communicate an action or series of actions to the application. Understanding how to detect poses is just the beginning. The next phase is defining what a pose is, followed by agreeing on common poses and standardizing pose names. The more fundamentally important part of the pose challenge is determining how to react once the pose is detected. This has technical as well as design implications. The technical considerations are more easily accomplished, requiring tools for processing skeleton data to recognize poses and to notify the user interface. Ideally, this type of behavior would be integrated into WPF or at the very least included in the Kinect for Windows SDK.

One of the strengths of WPF that distinguishes it above all other application platforms is the integration of its input devices with its control, style, and template engine. Processing Kinect skeleton data as an input device is not a native function of WPF. This falls to the developer, who has to rewrite much of the low-level code WPF already provides for other input devices. The hope is that someday Microsoft will integrate Kinect into WPF as a native input device, freeing developers from the burden of manually reproducing this effort so they can focus on building exciting, fun, and engaging Kinect experiences.
