Introduction to Computer Vision on Mobile Devices

Learn how computer vision algorithms are developed on mobile platforms with Face Detection and Recognition on Mobile Devices. This hands-on guide enables engineers to understand the implications and challenges behind every design choice. The author gives an overview of the field of computer vision and provides motivation for developing computer vision applications on mobile platforms. Using face-related algorithms as examples, the author surveys and illustrates how design choices and algorithms can be geared toward developing power-saving and efficient applications on resource-constrained mobile platforms.

Keywords

face recognition; computer vision; mobile hardware; power consumption; networking bandwidth

Computer vision, the field of how to enable computers to see the world, has been studied in the research community for a long time, and various technologies have been developed and are mature enough to be deployed in our daily lives. These technologies and applications include but are not limited to facial recognition for personal password login, moving object detection and tracking for video surveillance, and human activity recognition for entertainment purposes. These applications are typically made possible through one or more cameras mounted on stationary platforms. The backend software system—usually running on powerful machines such as personal computers or high-end servers—captures and analyzes the live video data and responds accordingly, depending on the target application.

In recent years, mobile computing devices such as tablets or smartphones have become increasingly popular. Although these devices feature a low-power design and have less computational capability than personal computers or mainframes, they still support most lightweight applications. Also, in addition to vision cameras, these computing devices usually come with sensors such as gyroscopes, accelerometers, pedometers, or GPS receivers, just to name a few. These sensors are typically not available on personal computers, and they open a new world on mobile devices for a wide variety of applications that are not seen on conventional platforms.

Researchers in academia and industry have started to look at developing computer vision algorithms on these devices to support and enable novel applications. On one hand, the applications could leverage the capabilities of additional sensors to assist the design of computer vision algorithms. On the other hand, the limited computing power and the interactive nature of mobile applications make it difficult to develop applications that are useful and natural. Targeting general engineers, these chapters will take a closer look at computer vision problems and algorithms and the possibilities of migrating them to mobile devices. It will cover the potential applications on these devices and the challenges encountered. It will also introduce currently available platforms, software development kits, and hardware support. Specifically, the goals of this book include:

• To explain what computer vision is and why people would like to develop computer vision applications on mobile devices.

• To introduce commonly used state-of-the-art algorithms for different computer vision problems.

• To illustrate applications that might be made possible on mobile devices, and associated potential challenges.

• To describe available hardware, software, and platform support for developing mobile computer vision applications.

What Is Computer Vision?

Introduction to the Field of Computer Vision

Originating from the large field of artificial intelligence, computer vision is concerned with the study of enabling machines to perform visual tasks. The inputs to machines are images or video sequences, captured through a live camera feed. These tasks range from low-level de-noising and filtering, to mid-level region growing or image/video segmentation (i.e., the operation of grouping image pixels into semantically uniform entities), and eventually to high-level detection or recognition (e.g., recognizing objects or faces in images). The old Chinese proverb, “A picture is worth a thousand words,” best summarizes the challenges and opportunities of this field. While it is beneficial to have a vast amount of information readily available in such a small image, it is not an easy task to analyze the information, impose a structure on it, and reason about it in an efficient manner, although these tasks might be simple and intuitive for human beings.
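To make the mid-level grouping step concrete, the sketch below is a minimal, illustrative region-growing implementation (not from the text): starting at a seed pixel, it absorbs 4-connected neighbors whose intensity lies within a threshold of the seed's intensity. The image, seed, and threshold are made-up toy values.

```python
from collections import deque

def region_grow(image, seed, threshold):
    """Group pixels into one region: start at `seed` and absorb
    4-connected neighbors whose intensity is within `threshold`
    of the seed pixel's intensity (a toy illustration)."""
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_val) <= threshold):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region

# A tiny 4x4 "image": a bright patch on a dark background.
img = [
    [10, 10, 200, 205],
    [10, 12, 210, 200],
    [11, 10, 10, 10],
    [10, 10, 10, 10],
]
bright = region_grow(img, (0, 2), threshold=20)
print(sorted(bright))  # the four bright pixels
```

Real segmentation algorithms are far more sophisticated, but the same core idea of grouping pixels into semantically uniform entities underlies them.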

The field of computer vision is closely related to many sibling fields such as signal processing, machine learning, or even cognitive science in psychology. For example, it leverages techniques in signal processing to remove the noise in the image data. It also uses machine-learning methodologies so that machines can “learn” to recognize objects, such as fruits. Understanding how human beings perceive the world can inspire new representations of image data and alternative solutions to some of the computer vision problems.

What Can Computer Vision Do for Us?

After several decades of research and study, commercial products built on computer vision technology are already available in the market. For example, the built-in face detection chip inside digital cameras allows a camera to automatically adjust its focus and exposure for the best quality. By detecting the face regions and using the landmark information in the face (e.g., eye locations), a camera is also able to detect blinking eyes, red-eye effects, or smiling and signal the users accordingly. Some applications go one step further by recognizing the faces. The recognized faces are used as substitutes for usernames and passwords so the users can log onto their PCs without going through the irritating process of typing a lengthy password. Figure 1.1 shows example snapshots of these two scenarios.

image
Figure 1.1 Example computer vision application—face detection/recognition. The left image shows an example where the camera automatically adjusts the focus based on detected faces. The right image shows an example of auto login with recognized faces.

Another area that heavily relies on computer vision technologies is the security or video surveillance domain. For instance, by mounting one or more cameras overseeing a parking lot or the entrance area to a building, the backend computer running computer vision algorithms could analyze the video, detecting and tracking any moving objects. The technology could provide functionalities such as zone intrusion detection or trip wires to the users (e.g., allowing security personnel to define a zone for moving object detection). The resulting trajectories, along with object meta-information—such as moving speed, object classes (e.g., person or car), or color—can be stored in a database for future retrieval, which is useful to support queries such as, “Find me all the red cars that traveled faster than 30 mph yesterday morning” (Figure 1.2).

image
Figure 1.2 Example computer vision application—video surveillance. The left shows an example where the computer vision system tracks any moving objects and classifies them into different categories, such as cars or human beings. The computer vision system can also be used for indoor surveillance, e.g., detecting suspicious activities such as an unattended bag in train stations or office areas. More details could be found at the PETS project website: http://www.anc.ed.ac.uk/demos/tracker/pets2001.html.

Aside from the aforementioned applications, retail stores also adopt computer vision systems to prevent self-checkout loss or sweethearting (i.e., cashiers giving away merchandise to customers). By analyzing hand movements and combining object recognition results with the barcode information, these systems are able to detect false scans and report suspicious actions and items to the store owners. In the entertainment domain, the Samsung Smart TV recognizes hand gestures through a built-in camera, enabling users to control the TV with gestures. Yet another example is the Kinect for the Xbox 360, where, by analyzing depth images, the software can keep track of different parts of the human body and recognize actions, so that players can interact with the machine and enjoy the game in an immersive way. The availability of the Kinect sensor has also triggered massive interest in creating systems that provide gesture-control features, realizing the vision depicted in the movie Minority Report, where Chief John Anderton (played by Tom Cruise) interacts with a mainframe computer through hand gestures.

Why Mobile Platform?

What Do We Mean by “Mobile”?

With the emergence of more advanced semiconductor technologies and the wider deployment of internetworking devices, our daily lives are full of everyday objects and devices capable of ubiquitous or pervasive computing, a term describing the integration of information processing and computing capabilities into the ordinary objects surrounding human beings. Examples include Internet-enabled refrigerators, which make it possible for users to query and store recipes, or living room lamps that can be controlled remotely. These intelligent gadgets are made possible through embedded chips with moderate computing power placed inside larger appliances.

In these chapters, we focus on platforms that are mobile—i.e., devices that are carried around and can be used anywhere and anytime by the users. Specifically, we focus on smartphones and tablets, which provide a rich set of interaction features and a certain degree of computing power. Although laptops can be carried around as well, we exclude them from the discussion due to their similarity with desktops in terms of features provided. In the remainder of this section, we will explain why this book focuses on emerging mobile platforms. In particular, we will show the sales numbers and market share growth to justify our motivations.

Mobile Devices Markets

Since the first smartphone went on the market, sales of smartphones have been rapidly increasing. Figure 1.3 shows the historical sales numbers of PCs, smartphones, and tablets.1 We can see that smartphone sales surpassed PC sales beginning in 2011 and have maintained steady growth. In addition, with tablet computers gaining popularity, tablet sales are forecast to far exceed those of PCs, selling over 300 million units by 2016.

image
Figure 1.3 Yearly worldwide sales numbers for PCs, smartphones, and tablets (predicted numbers for tablets). Y-axis represents million units and X-axis represents years. The sales number of smartphones not only surpassed that of PCs but also quadrupled it in the past few years. The sales of tablets are also ramping up and are predicted to surpass those of PCs by 2016.

With these numbers, every major player in the technology sector would likely want to get into the highly lucrative and rapidly growing mobile markets. Currently, in terms of operating systems or mobile platforms, Google’s Android and iOS from Apple take a majority share of the mobile market (Figure 1.4). Samsung and Nokia take the lead in terms of mobile devices shipped worldwide.

image
Figure 1.4 Global mobile market shares as of Q3 2012. The left figure shows the mobile market shares for each major player in terms of operating system. Google’s Android takes about 70% of the market share, while iOS from Apple takes around 13%. The right figure shows the mobile market shares in terms of mobile devices shipped to the end users. Samsung and Nokia take about 40% of the market share. Gartner, www.gartner.com.

Applications or Apps on Mobile Devices

With the abundance of these nonconventional mobile devices, software engineers and application developers have seized the opportunity to develop lightweight applications, or Apps, for these platforms. Taking advantage of additional sensors (e.g., gyroscope, pedometer, accelerometer, GPS receiver, or touch sensor), these mobile applications provide device users with new experiences and functionality not available from software on conventional platforms. Examples include mobile games that let users play with touch gestures, such as Angry Birds or Fruit Ninja, personal health assistant applications that monitor the activity level of the device owner through a pedometer, and applications that plan trips with a GPS receiver.

Mobile applications are typically sold through an app store, a platform set up by operating system makers such as Google or Apple. Customers can download and use these applications by paying a small amount (<$10) through the platform, and they can also rate the applications. A major difference between these mobile applications and conventional PC applications is that most of them are written by individual users or nonprofessional engineers. This makes the software development cycle very short, while the platform's rating system makes it easy for application writers to receive feedback. The idea of writing software that many people can use, together with the low cost of mobile applications, drives the market growth and brings many more developers into the mobile world.

The popularity of mobile applications brings large revenues to the operating system makers as well, especially Apple. As of late 2012, both Apple and Google had reached 25 billion downloads through the App Store and Google Play, with roughly 700,000 applications available on each.2 However, the average purchase value on the App Store is much higher and brings Apple four times the revenue that Google makes from its online store.3

Combining Computer Vision with Mobile Computing

How does computer vision relate to these mobile devices, and what computer vision applications can be developed? What resources and challenges could the application developers leverage or encounter? In the next few sections, we will broadly touch on these issues.

Difference with Conventional Computer Vision Applications

While almost all mobile devices come with vision cameras, developing computer vision applications on these devices differs from developing them on PCs in the following aspects:

1. The application perspective is different. On PCs, computer vision software takes a “third person” point of view. When analyzing the video feed, it first identifies the major subjects, who are either the objects of interest or the ones that will interact with the application. For instance, in video surveillance applications, the computer vision algorithms might identify moving targets (e.g., cars or people), while in gesture recognition applications, the algorithms primarily locate the users. On the other hand, most mobile applications take a “first person” point of view. Through the camera, the application sees what the users are seeing, and hence the focus of these applications shifts from the users to the scene or objects surrounding them.

2. The cameras are mounted in different ways. For mobile devices, the cameras usually move with the devices, which creates different motion artifacts for the applications. For example, hand shaking can blur the captured images, and camera motion makes video stabilization or frame correspondence necessary for high-level applications.

3. The availability of computing resources is different. Mobile devices usually have inferior computing power and less memory than conventional platforms, which makes efficiency an important design consideration for application developers.

Challenges and Opportunities with “Going Mobile”

Given the aforementioned properties of these mobile devices, the challenges in developing computer vision applications on these platforms are different. For one, because the devices are typically handheld, it is crucial for applications to reduce the artifacts caused by hand shaking during the imaging process. When analyzing videos, stabilizing the video frames or estimating the motion of the device to establish interframe object correspondences is also key to building robust mobile computer vision applications. These problems are rare in conventional settings where cameras are mounted on stationary platforms.
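As a concrete illustration of the interframe correspondence problem, the toy Python sketch below estimates a global translation between two frames by brute-force block matching with a sum-of-absolute-differences (SAD) score. Production stabilizers use feature tracking or optical flow instead; the frames and search range here are invented for illustration.

```python
def estimate_shift(prev, curr, max_shift=2):
    """Estimate the global (dy, dx) translation from `prev` to `curr`
    by exhaustive search over small shifts, scoring each candidate
    with the mean absolute difference over the overlapping region."""
    rows, cols = len(prev), len(prev[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            sad, count = 0, 0
            for r in range(rows):
                for c in range(cols):
                    r2, c2 = r + dy, c + dx
                    if 0 <= r2 < rows and 0 <= c2 < cols:
                        sad += abs(prev[r][c] - curr[r2][c2])
                        count += 1
            sad /= count  # normalize so small overlaps don't win unfairly
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

# Synthetic frames: the second is the first shifted down 1 and right 1,
# mimicking a small hand-shake motion between consecutive frames.
frame1 = [[(r * 7 + c * 13) % 50 for c in range(6)] for r in range(6)]
frame2 = [[0] * 6 for _ in range(6)]
for r in range(5):
    for c in range(5):
        frame2[r + 1][c + 1] = frame1[r][c]
print(estimate_shift(frame1, frame2))  # (1, 1)
```

Once the shift is known, a stabilizer can warp the current frame by the inverse translation so that objects stay aligned across frames.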

Another critical issue is computing resources. Mobile devices are designed for power efficiency and therefore may not have as many computational resources as PCs. Hence, it is desirable to design mobile vision applications efficiently so that they do not drain the battery. However, in computer vision it is not always straightforward to create computationally efficient applications: some, such as gesture recognition, usually incur a larger computational burden in order to achieve reasonable robustness. One solution is to make a proper trade-off between robustness and power efficiency. Another possible solution is to off-load the computation to a remote server so that the mobile device is only responsible for transmitting the data and interfacing with the users.
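The off-loading trade-off mentioned above can be reasoned about with a back-of-the-envelope energy model, sketched below. The per-operation and per-byte energy constants are invented placeholders, not measurements; a real policy would also weigh latency, battery level, and link quality.

```python
def should_offload(data_bytes, local_ops,
                   energy_per_op_nj=1.0,       # assumed local compute cost (nJ/op)
                   energy_per_byte_nj=200.0):  # assumed radio transmit cost (nJ/byte)
    """Off-load when transmitting the input data costs less energy
    than computing the result locally (a deliberately simple model)."""
    local_energy = local_ops * energy_per_op_nj
    transmit_energy = data_bytes * energy_per_byte_nj
    return transmit_energy < local_energy

# A 100 KB compressed frame vs. a heavy recognition pipeline (~10^9 ops):
print(should_offload(data_bytes=100_000, local_ops=1_000_000_000))  # True
# The same frame with a lightweight filter (~10^6 ops) favors local compute:
print(should_offload(data_bytes=100_000, local_ops=1_000_000))      # False
```

The sketch captures the key intuition: off-loading pays off only when the computation saved outweighs the radio energy spent shipping the data, which is why heavy recognition tasks are better off-loading candidates than cheap filters.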

Potential Impacts of Mobile for Computer Vision

The emergence of mobile devices has made them an important part of daily life. People use them to take pictures or videos of what they see, and then share and communicate with friends. Due to the change of perspective, mobile applications are more customized to their users. Hence, these devices impact computer vision applications by shifting the focus to be more human-centric. For instance, QR code reader applications recognize codes in captured images, allowing users to scan a code on any object they see and view the associated web page or other relevant information. Augmented-reality applications fuse the users' surroundings into a virtual world in a seamless way so they can interact with friends in an unprecedented manner. Gesture-control applications give users a new way of interacting with their devices, e.g., flipping a page with a hand gesture when your hands are dirty.

Summary

In this chapter, we started by explaining what computer vision is, followed by example applications that computer vision might enable. We then surveyed the current mobile market and provided numbers demonstrating its rapid growth, along with information regarding currently available mobile platforms. Finally, we provided motivations as to why developers would like to write computer vision software on mobile devices, and we discussed possible challenges and potential impacts. In the next chapter, we will touch on the technical details of developing mobile computer vision applications.
