Chapter 8
Closing Notes

Antonio M. López

ADAS Group, Computer Vision Center (CVC) and Computer Science Department, Universitat Autònoma de Barcelona (UAB), Barcelona, Spain

After more than 20 years of experience in computer vision, and after going through the chapters of this book, one cannot help but feel astonished at the progress made in this field. The contribution of computer vision to vehicular technologies is beyond doubt, on land, at sea, and in the air. The current level of maturity has been possible thanks to advances along different axes, namely, the continuous improvement of cameras (cost, resolution, frame rate, size, weight), more powerful processing units ready for on-board embedding (CPUs, GPUs, FPGAs), more publicly available data sets and evaluation protocols (e.g., the KITTI Vision Benchmark Suite), and, of course, increasingly consolidated computer vision and machine learning algorithms for discriminative feature extraction, matching, stereo and optical flow computation, object detection and tracking, localization and mapping, spatiotemporal reasoning, semantic segmentation, and so on.

Overall, we can find different vision-based commercial solutions for real-life problems related to vehicles (especially for driver assistance). However, despite this great progress, if we compare the state of the art of computer vision for vehicles with the visual capabilities of humans, one can safely say that there is still large room for improvement. Taking the driving task as an example, high-level reasoning must be much improved (i.e., the AI for real-time risk analysis and decision making), which may even involve ethical and legislative considerations (e.g., when a crash involving third parties may be unavoidable). At the other extreme, that is, low-level vision, even feature extraction must be improved to increase robustness under adverse conditions and to cope with the fact that the world is a changing entity (e.g., if we navigate by relying on maps and vision-based localization, we must be robust even to severe seasonal changes in the world's visual appearance).

It is worth mentioning the recent astonishing performance of deep convolutional neural networks (DCNNs) in difficult visual tasks such as image classification (Krizhevsky et al. 2012), object recognition/localization/detection (Girshick et al. 2016; Sermanet et al. 2014), and semantic segmentation (Long et al. 2015; Noh et al. 2015). Note that the layers of a given DCNN architecture can even be reused for tasks different from those assumed during the original end-to-end training of the DCNN (Hariharan et al. 2015). In fact, different DCNN architectures are already being explored for low-level tasks such as optical flow and disparity computation (Dosovitskiy et al. 2015; Mayer et al. 2015), as well as higher level ones such as place recognition (Chen et al. 2015; Sünderhauf et al. 2015). This opens a new avenue of methods based on deep learning that will boost computer vision performance in vehicle technology (e.g., see Huval et al. (2015) in the context of autonomous driving). In fact, major hardware-oriented companies are betting on the use of computer vision and deep learning to develop vehicle perception; an example is the embedded GPU-based supercomputer known as NVIDIA Drive PX 2 (NVIDIA at CES 2016). Moreover, we have to add the recent announcement at GTC 2016 of a new supercomputer especially designed to train DCNNs, the so-called NVIDIA DGX-1. Deep models trained on the DGX-1 will be directly transferable to the Drive PX 2.
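To make the idea of layer reuse concrete, the following minimal sketch freezes the convolutional layers of a network pretrained for image classification and trains only a new head for a different task. The framework (PyTorch/torchvision) and the 10-class driving-scene task are illustrative assumptions only; the works cited above predate these tools and were built with other toolkits.

```python
# Minimal sketch: reusing the layers of a DCNN pretrained for image
# classification as a generic feature extractor for a new task.
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet and drop its classification head.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])

# Freeze the pretrained layers: only the new head will be trained.
for p in feature_extractor.parameters():
    p.requires_grad = False

# New task-specific head, e.g., a hypothetical 10-class driving-scene task.
head = nn.Linear(512, 10)

x = torch.randn(4, 3, 224, 224)          # a dummy batch of images
feats = feature_extractor(x).flatten(1)  # 512-d features per image
logits = head(feats)                     # predictions for the new task
print(logits.shape)                      # torch.Size([4, 10])
```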

Despite the advances that we can foresee thanks to all these new scientific and technological tools, there are still different challenges to address. For instance, computer vision methods tend to fail in adverse weather conditions and under poor illumination. Thus, improvements in cameras operating in either the visible spectrum or the far infrared are necessary. Otherwise, complementing vision with other sensors will be mandatory not only in the short and midterm but also in the long term. In fact, whenever system cost allows, complementing vision with other types of sensors (light detection and ranging (lidar), radar, etc.) is good practice to ensure redundancy and, thus, overall system reliability. Accordingly, multimodal perception is, and will remain, a highly relevant topic. An interesting type of sensor, announced at the end of 2016, is the solid-state lidar, which promises a low-cost array of depth information that can complement visible-spectrum cameras.
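As a concrete illustration of the geometric side of camera-lidar fusion, the sketch below projects 3-D lidar points into a camera image so that depth can be attached to pixels. The intrinsic and extrinsic calibration matrices are hypothetical placeholders; in practice they come from a calibration procedure (the conventions here loosely follow KITTI-style setups).

```python
# Minimal sketch of camera-lidar fusion: project 3-D lidar points into the
# image plane to associate depth with pixels. Calibration values are
# placeholders chosen only for illustration.
import numpy as np

K = np.array([[721.5,   0.0, 609.6],   # camera intrinsics (placeholder)
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
T_cam_lidar = np.eye(4)                 # lidar-to-camera extrinsics (placeholder)
T_cam_lidar[:3, :3] = [[0, -1,  0],    # lidar x-forward/y-left/z-up
                       [0,  0, -1],    # to camera x-right/y-down/z-forward
                       [1,  0,  0]]
T_cam_lidar[:3, 3] = [0.0, -0.08, -0.27]

def project_lidar_to_image(points_lidar):
    """points_lidar: (N, 3) array of x, y, z in the lidar frame."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T)[:3]   # transform into the camera frame
    in_front = pts_cam[2] > 0.1             # keep points ahead of the camera
    pts_cam = pts_cam[:, in_front]
    uv = K @ pts_cam
    uv = uv[:2] / uv[2]                     # perspective division
    return uv.T, pts_cam[2]                 # pixel coordinates and depths

points = np.random.uniform([0, -10, -2], [50, 10, 2], size=(1000, 3))
pixels, depths = project_lidar_to_image(points)
print(pixels.shape, depths.min(), depths.max())
```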

Another important consideration for the training and testing of algorithms under development, especially relevant when using deep learning techniques, is the need to drastically increase the size and annotation quality of publicly available data sets. A recent and notable example is Cityscapes (Cordts et al. 2015), a large data set with manually collected, pixel-level annotations of driving scenarios. A different approach consists of using realistic computer graphics to generate training images with automatically generated ground truth. This is the case of SYNTHIA (Ros et al. 2016), for instance, which contains pixel-level RGB, depth, and semantic class information. In fact, object detectors trained on virtual worlds and domain-adapted to real scenarios have already demonstrated their usefulness in the context of driving assistance (Marín et al. 2010; Vazquez et al. 2014; Xu et al. 2014a,b, 2016). In addition, incremental self-learning methods able to exploit not only annotated but also nonannotated data could be key to improving the robustness of vision-based perception, as illustrated by the sketch below.
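The following minimal sketch illustrates the generic self-learning (pseudo-labeling) idea: a model trained on a small annotated pool repeatedly labels its most confident unannotated samples and retrains on the enlarged set. The scikit-learn classifier, synthetic data, and confidence threshold are assumptions chosen purely for illustration; the works cited above use task-specific detectors.

```python
# Minimal sketch of incremental self-training with pseudo-labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_lab, y_lab = X[:200], y[:200]          # small annotated pool
X_unlab = X[200:]                        # large unannotated pool

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
for _ in range(5):                       # a few self-training rounds
    proba = clf.predict_proba(X_unlab)
    conf = proba.max(axis=1)
    keep = conf > 0.95                   # accept only confident pseudo-labels
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[keep]])
    y_lab = np.concatenate([y_lab, proba.argmax(axis=1)[keep]])
    X_unlab = X_unlab[~keep]             # remove newly labeled samples
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

print(len(X_lab), "training samples after self-training")
```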

Finally, a core problem to address is overall vehicle validation. If we take an autonomous vehicle as an example, we can easily imagine the many situations it must be able to handle. How to measure the reliability of such a system for mass production is an open question. In fact, the use of realistic virtual environments that allow use-case-driven testing may be a line worth pursuing to drastically reduce validation cost. An example is Virtual KITTI (Gaidon et al. 2016), designed for testing vision-based vehicle detection algorithms under adverse weather conditions. Of course, once these systems are comprehensively evaluated in such simulated environments, it will still be necessary to validate them in real-world scenarios, both in dedicated infrastructures and under regular traffic conditions.

In summary, the academic and industrial communities working on computer vision for vehicles are facing a truly exciting revolution which, undoubtedly, will bring enormous social benefits and remarkable technological advances. For instance, autonomous vehicles and UAVs are among the ten technologies that could change our lives according to the European Parliament (Van Woensel et al. 2015).
