Chapter 5
Vision-Based Advanced Driver Assistance Systems

David Gerónimo1, David Vázquez1 and Arturo de la Escalera2

1ADAS Group, Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain

2Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid, Madrid, Spain

5.1 Introduction

Since the early days of the automotive industry, motor companies have continuously pursued new technologies to improve passengers' safety. First, relatively simple mechanical artifacts such as turn signals or seat belts led to more complex ones such as airbags or pop-up hoods. Then, with the development of electronics, new technologies such as electronic stability control (ESC) or advanced brake warning provided more protection using signal processing. In the last decade, the development of computation has led to a new kind of protection system: advanced driver assistance systems (ADAS). ADAS are intelligent systems that help the driver in the driving task by providing warnings, assisting in decision making, and taking automatic actions to protect the vehicle passengers and other road users. The main difference between these systems and the former technologies is that while seat belts, airbags, or ESC deploy their functionality after the accident has happened or, in the best cases, while the dangerous situation is taking place, ADAS are aimed at predicting and avoiding the accident itself.

Such a new technology requires new sensors, different from traditional mechanical devices such as wheel speed sensors or accelerometers. These new sensors are cameras, radar, lidar (light detection and ranging), and so on. In this chapter, we focus on ADAS that make use of cameras (Figure 5.1), that is, that exploit computer vision techniques.

We divide the different ADAS into three types, depending on the direction of the camera: forward assistance, lateral assistance, and inside assistance.


Figure 5.1 Typical coverage of cameras. For the sake of clarity of the illustrations, the actual cone-shaped volumes that the sensors see are shown as triangles

5.2 Forward Assistance

5.2.1 Adaptive Cruise Control (ACC) and Forward Collision Avoidance (FCA)

Rear-end crashes are one of the most frequent types of traffic accident. In the United States, during 2002, light vehicles (passenger cars, vans, minivans, sport utility vehicles, and light trucks) were involved in 1.8 million rear-end crashes. This represents 29% of all light-vehicle crashes and resulted in 850,000 injured people (see Najm et al. (2006)). In addition to its frequency, another characteristic of this type of accident is the absence of driver reaction before the impact. A report from 1999 showed that in more than 68% of rear-end collisions, the driver made no avoidance maneuver (braking or steering).

Two systems are related to this type of accident: adaptive cruise control (ACC) and forward collision avoidance (FCA). Figure 5.2 shows the forward-facing sensors employed in these systems. ACC keeps the vehicle at a fixed speed until there is a slower vehicle in front of it. At that moment, it keeps a safe distance between the driver's car and the vehicle ahead. The driver can adjust the distance, and the system makes sure it is maintained using throttle and brake control. When the slower vehicle disappears, the system resumes the fixed speed. ACC is designed for highways. FCA provides warning of an impending accident, mainly at low speeds, alerting the driver; if the driver does not react, some systems can even brake the vehicle to a full stop.


Figure 5.2 Forward assistance

Nearly all commercial ACC or FCA systems are based on radar or laser sensors. Nevertheless, as lane departure warning (LDW) sensors are increasingly appearing on vehicles, it is convenient and cheaper to use the same sensor for both tasks. Lexus was a pioneer manufacturer, adding a stereo camera to its Advanced Pre-Collision System in 2006. In 2007, Volvo's collision warning with auto brake (CWAB) was developed in cooperation with Mobileye, which has developed an FCA system based on computer vision (see Raphael et al. (2011)). Subaru developed a stereo system, called EyeSight, for the Legacy in 2008. Recent systems perform sensor fusion between a video and a radar sensor, as in the Audi A8 in 2010. In 2012, General Motors provided a system where the same camera warns the driver when there is a vehicle ahead or a lane departure. Using four different exposure settings, satisfactory images are obtained across a wide range of lighting and weather conditions. The first step in the image analysis algorithm is obtaining regions of interest (ROI) where possible vehicles may be located. ROI are found where rectangles with vehicle-like characteristics are present in the image. During the night, pairs of light sources are looked for and classified as the tail lights of a lead vehicle. The ROI are then classified as vehicles according to their appearance. Later on, they are tracked over time along with metrics (classification scores, consistency across time, etc.) that measure the likelihood that the tracked candidate is indeed a real vehicle. Knowing also the vehicle status (speed, pitch changes), a hidden Markov model filter generates and updates the estimated real-world position and dynamics of each target vehicle and determines whether it is stationary, moving, or oncoming. The last step is to determine which vehicles are within the trajectory of the ego-vehicle. The ego-vehicle path is calculated using steering angle, yaw, and speed, and the vision sensor is also used to detect the road lane markings.
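
To make the last step concrete, the following minimal sketch shows how an ego-vehicle path can be predicted from speed and yaw rate under a constant yaw-rate (circular-arc) motion model, and how a tracked target can be tested against that path. It is illustrative only: the function names, corridor width, and horizon are assumptions, not the implementation of any commercial system.

```python
import numpy as np

def predict_ego_path(speed_mps, yaw_rate_rps, horizon_s=3.0, dt=0.1):
    """Predict the ego-vehicle path as a circular arc under a constant
    speed / constant yaw-rate assumption.

    Returns an (N, 2) array of (x, y) positions in vehicle coordinates
    (x forward, y to the left)."""
    t = np.arange(0.0, horizon_s + dt, dt)
    if abs(yaw_rate_rps) < 1e-4:              # essentially straight ahead
        x = speed_mps * t
        y = np.zeros_like(t)
    else:
        r = speed_mps / yaw_rate_rps          # turning radius
        heading = yaw_rate_rps * t
        x = r * np.sin(heading)
        y = r * (1.0 - np.cos(heading))
    return np.stack([x, y], axis=1)

def in_ego_corridor(target_xy, path, half_width=1.5):
    """Flag whether a tracked target lies within a corridor of
    +/- half_width metres around the predicted path (approximated by the
    distance to the nearest path point)."""
    dists = np.linalg.norm(path - np.asarray(target_xy), axis=1)
    return dists.min() < half_width

# Example: 20 m/s, gentle left turn, target 30 m ahead and 1 m to the left
path = predict_ego_path(20.0, 0.05)
print(in_ego_corridor((30.0, 1.0), path))
```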

5.2.2 Traffic Sign Recognition (TSR)

The speed of the vehicle plays a crucial role in many traffic accidents, and the severity of injuries is proportional to the vehicle speed. Therefore, several measures have been taken in order to impose a low speed when necessary, such as roundabouts and speed humps. Apart from that, on-board sensors can obtain the vehicle speed and check it against GPS information or traffic signs detected using computer vision techniques. Although GPS navigators include speed limit information, sometimes it is not updated, there are highways where the maximum speed depends on the time of day, and road works may change the speed limit. Therefore, detecting traffic signs is always useful for ADAS. Figure 5.3 shows a sketch of a typical scenario where TSR is useful.

Besides being part of an ADAS, traffic sign recognition is useful for road maintenance and inventory (see Pelaez et al. (2012)). Nowadays, a human operator has to watch a recorded video sequence and check, frame by frame, the presence of traffic signs. An automatic system has the advantage of releasing humans from this tedious task and has no real-time constraint, but it has to deal with the whole set of traffic signs and sometimes with harder environments than the ADAS counterpart.


Figure 5.3 Traffic sign recognition

As this ADAS has to deal with object recognition in outdoor environments, there are many difficulties involved in the process due to changes in lighting conditions, shadows, and the presence of other objects that can cause partial occlusions. Particular problems associated with traffic sign recognition are the color fading of the signs depending on their age, perspective distortions, the different pictograms among countries, and the huge number of different sign pictograms. Many proposed algorithms, and all the commercial systems, recognize only a limited number of sign types.

Usually, traffic sign recognition algorithms have two steps: detection and classification. The detection step can help the classification step by limiting the number of candidate classes, thanks to the shape and/or color information used during detection.

For the detection, two approaches are possible, depending on whether color information is used or not. Traffic sign colors and shapes have been chosen to be easily distinguished from their background by drivers and can, therefore, be segmented in the images by their color and detected by their shape. In practice, however, their colors depend a lot on the lighting conditions and the age and state of the signs, and, primarily in urban environments, there are many objects with similar colors. For these reasons, some authors prefer to analyze black-and-white images and use the shape or the appearance of the signs. Several color spaces have been used: RGB in Timofte et al. (2009), HSI in Liu et al. (2002), HSV in Ruta et al. (2010), and LCH in Gao et al. (2006). If no color information is used, the shape can be detected using the Hough transform as in Garcia-Garrido et al. (2011) or radial symmetry as in Barnes et al. (2008). The appearance can be described using Haar features (see Moutarde et al. (2007)) or HOG (see Xie et al. (2009)).
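
As a rough illustration of the color-plus-shape detection stage, the following OpenCV sketch segments reddish regions in HSV and keeps roughly circular blobs as sign candidates. The HSV thresholds, minimum area, and circularity test are placeholder values that would need tuning for real imagery; it is a sketch of the general idea, not a reproduction of any cited method.

```python
import cv2
import numpy as np

def detect_red_circular_candidates(bgr, min_area=200):
    """Segment red regions in HSV and keep roughly circular blobs,
    a rough stand-in for the colour+shape detection stage."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two hue ranges are combined
    mask = cv2.inRange(hsv, (0, 80, 60), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 80, 60), (180, 255, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    rois = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < min_area:
            continue
        perimeter = cv2.arcLength(c, True)
        circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-6)
        if circularity > 0.6:                 # close enough to a circle
            rois.append(cv2.boundingRect(c))  # (x, y, w, h)
    return rois
```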

For the final step, the recognition of the detected signs, most of the methods are based on template matching using cross-correlation (see Piccioli et al. (1996)) or on neural networks. Although there are many different approaches using neural networks, the most used are radial basis functions (see Lim et al. (2009)) and the multilayer perceptron (see Broggi et al. (2007)), where a set of artificial samples is constructed to train the network.
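
A minimal sketch of the cross-correlation-based recognition step could look as follows, matching a detected ROI against a small bank of grayscale templates with OpenCV's normalized cross-correlation. The template dictionary, window size, and acceptance threshold are assumptions for illustration.

```python
import cv2

def classify_sign(roi_gray, templates, size=(48, 48), min_score=0.6):
    """Classify a detected sign ROI by normalized cross-correlation against
    a dictionary {label: grayscale template image}."""
    roi = cv2.resize(roi_gray, size)
    best_label, best_score = None, -1.0
    for label, tmpl in templates.items():
        tmpl = cv2.resize(tmpl, size)
        # Equal-sized image and template yield a single correlation value
        score = cv2.matchTemplate(roi, tmpl, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return None, best_score   # reject weak matches
    return best_label, best_score
```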

One of the first image data sets was the German Traffic Sign Recognition Benchmark (see Stallkamp et al. (2011)), created for a competition held at the International Joint Conference on Neural Networks 2011. A detailed state-of-the-art review with many references can be found in Mogelmose et al. (2012).

In 2008, through a cooperation between Mobileye and Continental AG, a commercial system was created, appearing on the BMW 7-series and in the Mercedes-Benz S-Class (2009), although only a restricted class of traffic signs was detected: round speed limit signs. Other systems increased the number of traffic signs detected, for example, overtaking restrictions in Opel and Saab vehicles in 2009. Other automakers have recently started to offer this ADAS, such as Volkswagen in 2011 and Volvo in 2012.

5.2.3 Traffic Jam Assist (TJA)

Driving in traffic jams is boring and monotonous. In these situations, drivers suffer stress due to the constant acceleration and braking. Being stuck in a traffic jam produces frustration and distraction in drivers, which can cause fender-bender crashes. In addition, traffic jams represent a waste of time for drivers.

The traffic jam assistant is a system that controls the vehicle speed, steering, and the distance to the car ahead in heavy traffic at relatively low speeds. It takes over the vehicle control in these monotonous traffic situations, simply guiding the vehicle along with the other cars in dense traffic and making traffic jams less frustrating. However, some systems still require the driver to keep his/her hands on the wheel.

This technology combines the individual's own comfort with automated travel and helps one to drive more comfortably and safely in heavy-traffic situations. It could potentially cut down on accidents by reducing fender-bender-type crashes in heavy traffic. The TJA system also offers moderate energy savings over regular driving, and the assisted vehicles take up less space on the road than the same number of vehicles would occupy if each were driving independently, so it could reduce pollution and alleviate traffic jams. However, it is as much a luxury feature as it is a safety one: it frees the driver from controlling the vehicle in this situation and allows them to find some other way to make use of the time.

The traffic jam assistance system is a natural next step from adaptive cruise control technology. It is similar to cruise control, except that it is specifically designed to work in heavy traffic instead of on an open road. It is limited to use at slow speeds, although some systems allow operation at up to 60 km/h. It combines adaptive cruise control with lane assist and automatic braking technology to help the car glide smoothly along in the most annoying road conditions. The ACC system uses a combination of cameras and radar to maintain a safe, set distance behind the car in front, and the Lane Keeping Aid uses a network of cameras and sensors to keep the car centered within its lane. So, this automotive innovation makes use of the available vehicle cameras and sensors to help all traffic flow more smoothly.

The car monitors the vehicle in front, pacing it to automatically maintain a steady following distance, and it also steers to stay within the lane. If the car in front swerves to avoid an obstacle, the assisted car can mimic the same swerve path by following its tire tracks. Overall, the TJA system actuates on the engine, steering, and brakes. By activating the system, the driver entrusts the car to make the most important judgment calls of heavy-traffic driving: steering, deciding when to accelerate and decelerate, and determining how much of a distance cushion to maintain around other vehicles and obstacles.

It has been available in Volvo cars since 2014. Similar systems are in development by Audi, VW, Cadillac, Mercedes, and Ford. So far, it seems that most will operate in a comparable way, as they are also designed to allow completely hands-free operation in these low-speed scenarios. Owners of Mercedes vehicles equipped with TJA systems have fewer options to pass the time, though: these drivers have to maintain contact with the steering wheel for the system to work, and if a driver pulls his or her hands off the wheel, the system will not engage. The Ford version of the system, which is similar to Volvo's, has a different plan in place: it uses audio warnings to alert the driver to take back control if the car determines that there is too much nearby activity, such as frequent changes in adjacent lanes, lots of obstacles, or erratic and therefore unpredictable speeds of travel.

5.2.4 Vulnerable Road User Protection

During most of the history of the automobile, safety technologies developed by the motor companies have focused on its occupants, while other road users such as pedestrians or cyclists have not received the same attention. However, statistics show that the number of accidents involving these actors is not to be neglected. For example, 150,000 pedestrians are injured and 7,000 are killed every year in the European Union (see UN – ECE (2005)), and 70,000 are injured and 4,000 killed in the United States (see NHTSA (2007)). Even though these numbers are progressively decreasing over the years in developed regions, thanks to new safety measures and awareness, emerging countries such as China or India are likely to increase these numbers greatly as a result of their already high accidents-per-vehicle ratio and the increasing number of vehicles (see Gerónimo and López (2014)).

The first papers addressing pedestrian protection using computer vision were presented in the late 1990s by the Poggio, Gavrila, and Broggi research groups at MIT, the University of Amsterdam, and Parma University, respectively. The first approaches made use of the existing knowledge in object classification and worked in very constrained scenarios (e.g., fully visible, well-illuminated pedestrians on flat roads). Papageorgiou and Poggio (2000) introduced the use of Haar wavelets in pedestrian classification and the first pedestrian data set. Gavrila and colleagues introduced a template-based hierarchical classification algorithm known as the Chamfer System (see Gavrila (2001)). Bertozzi et al. (2003) and Broggi et al. (2000) proposed different approaches to specific problems of on-board pedestrian detection, for example, camera stabilization, symmetry-based models, and tracking.

The human image class is one of the most complex in computer vision: nonrigid, dynamic, and very heterogeneous in size and clothing, apart from the typical challenges that any forward-looking ADAS application must tackle, for example, illumination changes and moving targets. Since 2000, the number of papers addressing pedestrian detection has grown exponentially. The emergence of computer vision as a hot research field has helped to improve the systems' robustness and pursue new challenges, that is, going from holistic classification (detecting the silhouette as a whole) in simple scenarios to multipart classification in cluttered streets. The standard pedestrian detector can be divided into three steps (see Figure 5.4):

  • Candidate generation selects image windows likely to contain a pedestrian. The selection is made using both prior constraints such as the position of the camera and the expected size and aspect ratio of pedestrians; and online detection of free space, obstacles, and the road profile.
  • Pedestrian classification labels the selected windows as pedestrian or nonpedestrian (e.g., background, vehicles).
  • Pedestrian tracking adds temporal coherence to the frame-based pedestrian detections, linking each pedestrian window with a specific identity that has a correspondence among the video frames. The last trend is to predict the pedestrian behavior.

Figure 5.4 The main steps of pedestrian detection together with the main processes carried out in each module

One of the major milestones in candidate generation is the flat-world assumption (see Gavrila et al. (2004)), which takes advantage of the knowledge of the camera position with respect to the road in order to avoid scanning the whole image. As its name says, the assumption is that the road is flat, which only holds in certain streets and roads. The next step was to extend this to sloped roads (see Labayrade et al. (2002) and Sappa et al. (2008)) by making use of stereo cameras. One of the latest and most relevant innovations is the stixels model (see Badino et al. (2007) and Benenson et al. (2012)), presented by researchers at Daimler AG. It consists of a 3D occupancy grid that represents objects as vertical bars placed on the road whose pixels have a coherent disparity.
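
The following small sketch illustrates how the flat-world assumption constrains candidate generation: with a pinhole model and a known camera height, the image row and window size of a standing pedestrian are fully determined by the assumed distance, so only a narrow band of windows needs to be scanned per scale. All parameter values (focal length, camera height, principal row, pedestrian height, aspect ratio) are illustrative placeholders.

```python
import numpy as np

def pedestrian_roi_on_flat_road(distance_m, focal_px=1000.0,
                                cam_height_m=1.2, principal_row=480,
                                ped_height_m=1.7, aspect=0.41):
    """Under the flat-world assumption, a pedestrian standing at a given
    longitudinal distance projects to a window whose vertical position and
    size are fully determined by the camera geometry (optical axis assumed
    parallel to the road, image rows growing downwards)."""
    # Image row of the contact point between the feet and the road
    v_bottom = principal_row + focal_px * cam_height_m / distance_m
    h_px = focal_px * ped_height_m / distance_m    # window height in pixels
    w_px = aspect * h_px                           # typical pedestrian aspect ratio
    return v_bottom - h_px, v_bottom, w_px         # top row, bottom row, width

print(pedestrian_roi_on_flat_road(20.0))
```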

Candidate classification is the most researched component, given its connection with general visual object detection. The first approaches used a holistic model composed of local features such as Haar filters (see Papageorgiou and Poggio (2000) and Viola and Jones (2001a)) or HOG (see Dalal and Triggs (2005)), among many others such as edge orientation histograms, local binary patterns, and shapelets, together with a learning machine such as SVM, AdaBoost, or Random Forest, which labels each window selected in the candidate generation step. These classifiers then evolved into parts-based ones, in which individual parts are also part of the model, either in a fixed (see Mohan et al. (2001)) or flexible manner (see Felzenszwalb et al. (2008) and Marin et al. (2013)). It is worth highlighting other approaches such as the aforementioned Chamfer System (see Gavrila (2001)) or the Implicit Shape Model (ISM; see Leibe et al. (2008)), the latter omitting the window generation stage, as well as data-driven features that replace handcrafted ones (e.g., HOG, Haar) by the use of neural networks (see Enzweiler and Gavrila (2009)). The latest works aim at detecting occluded pedestrians (see Tang et al. (2009)), fusing and combining different features (see Dollár et al. (2009) and Rao et al. (2011)), using nonvisible spectrum cues (e.g., IR; see Krotosky and Trivedi (2007) and Socarras et al. (2013)), and using online adaptation to new scenarios (see Vazquez et al. (2014) and Xu et al. (2014a)), that is, improving the classifier as the vehicle captures new pedestrians.
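
The classic HOG-plus-linear-SVM combination of Dalal and Triggs (2005) is available off the shelf in OpenCV, which ships a detector trained on pedestrians. The following is a minimal usage sketch; the window stride, padding, and scale factor are illustrative, and this is not the classifier of any particular commercial system.

```python
import cv2

# OpenCV ships a linear SVM trained on HOG features of pedestrians
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(bgr):
    """Multi-scale sliding-window detection; returns boxes (x, y, w, h)
    and the corresponding SVM scores."""
    boxes, scores = hog.detectMultiScale(bgr, winStride=(8, 8),
                                         padding=(8, 8), scale=1.05)
    return boxes, scores

frame = cv2.imread("street.png")   # placeholder image path
if frame is not None:
    boxes, scores = detect_pedestrians(frame)
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```

In an on-board pipeline, the multi-scale scan would typically be restricted to the candidate windows produced by the previous step rather than applied to the whole image.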

Tracking approaches have traditionally used the well-known Kalman filtering (see Franke and Joos (2000) and Grubb et al. (2004)) or particle filtering (see Arndt et al. (2007)) techniques, although other approaches have also been proposed. Examples are the event cone by Leibe et al. (2007), tracking-by-detection by Andriluka et al. (2008), the contour-based tracking by Mitzel et al. (2010), and the long-term 3D-tracklets model in Wojek et al. (2014). As previously mentioned, state-of-the-art works try to extend the tracking information with higher-level information such as action classification or face orientation in order to predict the pedestrian's behavior (see Keller and Gavrila (2014)). This information is crucial in order to avoid false alarms; for example, even if the predicted pedestrian's path crosses the vehicle path, an alarm may not need to be triggered if his/her face points toward the vehicle, that is, the pedestrian is aware of the vehicle and is likely to stop in time.
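
As a sketch of the Kalman-filtering option, the following constant-velocity filter tracks the image position of a detected pedestrian between frames; the noise settings and initial covariance are illustrative assumptions.

```python
import numpy as np

class ConstantVelocityKalman:
    """Tracks the state [x, y, vx, vy] from noisy (x, y) detections."""

    def __init__(self, x0, y0, dt=1.0, q=1.0, r=10.0):
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 100.0
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q        # process noise
        self.R = np.eye(2) * r        # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, zx, zy):
        z = np.array([zx, zy], dtype=float)
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```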

Once a pedestrian or cyclist has been detected, this information is sent to a high-level component that triggers alerts or actions in the vehicle. There exist many example applications, especially in recent years, as vulnerable road user protection is being commercialized. One of the most illustrative systems is that presented in the SAVE-U European project (see Marchal et al. (2005)). It divides the application into three phases. The first one is called early detection, in which the pedestrians are just detected and tracked but no protection measure is activated. The second phase occurs when a pedestrian is estimated to enter the vehicle trajectory, but there is no predicted collision. In this case, an acoustic warning is triggered. Finally, the third phase is activated when a high risk of collision is identified. In this case, the brakes are automatically activated in order to avoid the collision. In recent years, interesting studies on evasive actions, that is, steering around the pedestrian when full braking is not enough to avoid the accident, have been carried out by Daimler AG (see Dang et al. (2004)).

One of the first vehicles to commercialize a pedestrian protection system was the Volvo S60, which used a radar and a visible-spectrum camera engineered by the ADAS company Mobileye. Since 2013, a Mercedes-Benz class has incorporated the result of the research developed at Daimler AG, able to detect obstacles up to 200 m away using a radar and pedestrians up to 35 m away using a stereo camera. It incorporates emergency brake assist (EBA), which stops the car at speeds up to 70 km/h, and emergency steer assist (ESA), which maneuvers the car around the obstacle if there is insufficient time to stop. Other motor companies such as BMW, Ford, and Toyota are planning to present similar systems in their top cars in the following years. The latest research directions are focused on intention estimation for pedestrians and cyclists.

The reader can refer to review papers such as those by Dollár et al. (2012), Enzweiler and Gavrila (2009), and Gerónimo et al. (2010), and to books such as Gerónimo and López (2014) for detailed information.

5.2.5 Intelligent Headlamp Control

Driving at nighttime has its specific difficulties when compared to daytime, obviously as a result of poor lighting. Headlamps were introduced with the invention of the car around 1890, and very soon incorporated low- and high-beam capability. The two, sometimes three, position headlamps allow the driver to illuminate long distances, taking advantage of the absence of approaching cars from the front, and to lower them when a car approaches and there is danger of dazzling other drivers. Often, drivers prefer to use low beams and only switch to high beams when it is absolutely needed, in order not to blind others. This behavior is reflected in statistics, which show that high beams are used in only a small fraction of the time in which they should be used (see Mefford et al. (2006)). However, incorrectly set high beams are a well-known source of accidents, and low beams are also a source of accidents due to the limited visibility and hence slow reaction time.


Figure 5.5 Different approaches in Intelligent Headlamp Control (Lopez et al. (2008a)). On the top, traditional low beams that reach low distances. In the middle, the beams are dynamically adjusted to avoid glaring the oncoming vehicle. On the bottom, the beams are optimized to maximize visibility while avoiding glaring by the use of LED arrays

Intelligent Headlamp Control provides an assisted control of the lights aimed at automatically optimizing their use without inconveniencing other drivers. The typical approach is to analyze the image searching for vehicles. Oncoming vehicles are distinguished by their white headlights, while preceding vehicles are distinguished by their red rear lights. Note that it is important not to dazzle not only oncoming vehicles but also preceding ones, as the latter could be dazzled via their rear-view mirror. In Schaudel and Falb (2007), both preceding and oncoming vehicles, as well as the presence of sufficient ambient illumination, are detected. If none of these conditions applies, the high beams are turned on.
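
A rough sketch of the white-headlight versus red-taillight discrimination described above might look as follows, thresholding bright whitish and reddish blobs in a night image. The thresholds and minimum blob area are placeholders, and a real system such as Schaudel and Falb (2007) would additionally reason about blob pairing, ambient illumination, and temporal consistency before switching the beams.

```python
import cv2
import numpy as np

def detect_vehicle_lights(bgr, bright_thresh=220, min_area=20):
    """Return two lists of bounding boxes: bright whitish blobs
    (candidate oncoming headlights) and reddish blobs (candidate
    preceding tail lights)."""
    b, g, r = cv2.split(bgr)
    white = (r > bright_thresh) & (g > bright_thresh) & (b > bright_thresh)
    red = (r > 150) & (r.astype(int) - g > 60) & (r.astype(int) - b > 60)
    headlights, taillights = [], []
    for mask, out in ((white, headlights), (red, taillights)):
        cnts, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in cnts:
            if cv2.contourArea(c) >= min_area:
                out.append(cv2.boundingRect(c))
    return headlights, taillights
```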

A more sophisticated approach was introduced by the team of A. López et al. in 2007 (Figure 5.5). Vehicles are detected using blob, gray-level, and red-channel statistics as features and AdaBoost as a classifier (see Lopez et al. (2008a, 2008b) and Rubio et al. (2012)). These systems typically use a continuous range of beam, instead of just switching between high and low. The light cone is adjusted dynamically as oncoming vehicles approach and extended as far as 300 m when no vehicles are detected. This is feasible thanks to the use of LED lights that are switched on or off according to what lies in the area they point at. State-of-the-art research is focused on selectively illuminating the whole road, shading only the spots where there is a vehicle.

Daimler AG and Volkswagen AG, among other companies, have incorporated a similar technology in their high-end vehicles since 2009.


Figure 5.6 Enhanced night vision. Thanks to infrared sensors the system is capable of distinguishing hot objects (e.g., car engines, pedestrians) from the cold road or surrounding natural environment

5.2.6 Enhanced Night Vision (Dynamic Light Spot)

Most of the applications presented for both ADAS and autonomous driving require specific solutions to work at nighttime (Figure 5.6). For example, while reflective objects such as traffic signs are specially engineered to be visible at nighttime, pedestrians or nonreflective obstacles are barely distinguishable with regular headlamps, and often they are too close to the vehicle to be avoided in case of danger.

Enhanced night vision systems take advantage of the fact that most of the objects of interest (i.e., pedestrians, cyclists, vehicles) have a temperature signature different from the background, which can be captured by analyzing the infrared spectrum of the scene. The infrared spectrum ranges from 0.8 to 1000 μm, divided into near infrared (NIR) from 0.8 to 2.5 μm, mid-infrared (MIR) from 2.5 to 25 μm, and far infrared (FIR) from 25 to 1000 μm. Regular visible-spectrum cameras often also capture NIR together with the visible range, which goes from 0.4 to 0.75 μm, making them a lower cost version of this technology. In fact, night vision technology can be divided into active and passive approaches, depending on the range that is captured. In active systems, the vehicle emits a pulse of NIR, which is captured by the sensor (see Andreone et al. (2005)). Passive systems capture FIR (also referred to as thermal infrared) directly emitted by hot objects (see Olmeda et al. (2011) and Socarras et al. (2013)). The latter cameras tend to be more expensive and larger, but capture longer distances and work well in cold weather.

In any case, nighttime visibility always depends on the available illumination on the road. For instance, an urban scene full of lampposts can make these systems irrelevant, while a poorly illuminated highway or a countryside road will show their whole potential. Using infrared can also be challenging because the scene can be full of hot objects (lamps that have been illuminated during the day, vehicles, etc.). Two interesting examples of the use of this technology are Ge et al. (2009) and Zhou et al. (2012). In the former, the authors use a two-stage pedestrian classifier based on Haar-like and HOG features extracted from infrared imagery. In the latter, the authors also extract HOG features from thermal imagery to detect deer in the context of traffic safety.

One of the first companies incorporating such technology was General Motors in 2000–2004, with a head-up display that showed the raw infrared image acquired by the camera. This is the simplest setup: the driver has to switch often between the windshield view and the head-up display in order to analyze the potential hazards. In 2002, Toyota developed a similar head-up display but based on an active IR system. In 2004, Honda performed pedestrian detection and highlighted (framed) the pedestrians in the head-up display. This system had an audio warning that was triggered when some danger to the pedestrian existed, relieving the driver from checking the display when there was no danger. Toyota has incorporated such a pedestrian detection system since 2008.

Other motor companies such as BMW, Mercedes-Benz, and Audi also featured these systems in their top models in the early years of development and added pedestrian detection several years later (around 2009). Currently, they are including animal detection and visual/audio alerts, exploiting either active or passive systems, depending on the company.

5.2.7 Intelligent Active Suspension

Suspension was incorporated into automobile design right after the invention of the car. It not only provides comfort to the passengers but also improves handling and braking safety, given that it holds the wheels on the road despite road irregularities or vehicle dynamics (e.g., braking or cornering). Suspension, as a system that absorbs bumps, has been used not only in vehicles of the modern industrialized era but also on old carts and carriages. These systems slowly evolved from leaf springs to shock absorbers, but it was not until the 1980s that active suspension was developed, thanks to the use of electronic control. Active suspension incorporates sensors, which analyze the road surface, and actuators, which control the vertical movement of each wheel, even raising and lowering the chassis, depending on the system. These systems provide a smoother driving experience than previous suspension systems that just absorbed the bumps mechanically.

The current developments in this field are pioneered by the use of vision to model the road surface in advance. Even though current active suspension systems incorporate different sensors that handle each vehicle movement (i.e., lateral movement, bumps, braking), vision has the capability of anticipating the required suspension actions. The idea is that instead of rapidly reacting to the current road profile, the system scans and models the road surface ahead by the use of a camera and raises or lowers each wheel accordingly. For instance, on a typical bumpy road, a regular active suspension system reacts instantaneously to the surface just below the wheels, whereas an intelligent active suspension system predicts the wheel and chassis movements on the road ahead, providing a much smoother ride. Singh and Agrawal (2011) made use of camera and lidar sensors in order to get a 3D contour of the road, providing cues to a servo drive managed by a neural network. Interestingly, the shock absorber oil is replaced by a magnetized fluid that allows the system to extend and retract the suspension cylinder.

Mercedes-Benz has recently presented a pioneering system, named Magic Body Control (Figure 5.7), which scans the road surface with a stereo camera mounted behind the windscreen. It scans the road up to 15 m ahead with a vertical precision of 3 mm.


Figure 5.7 Intelligent active suspension.

Image courtesy of Daimler AG


Figure 5.8 Lane Departure Warning (LDW) and Lane Keeping System (LKS)

5.3 Lateral Assistance

5.3.1 Lane Departure Warning (LDW) and Lane Keeping System (LKS)

Lane detection has been studied for around 30 years (see Dickmanns (2007), Crisman and Thorpe (1993), and Pomerleau (1995)). As a high percentage of traffic accidents are related to unintended lane departures, the first commercially available ADAS addressed this task. They were first implemented on trucks and, later on, on cars. Lane departures mostly happen on long trips on highways, where the monotony of driving produces lapses of attention or drowsiness, leading to lane departures and collisions with another vehicle or a stationary obstacle, or to rollover accidents. This is an advantage for perception systems, as the highway is a very structured environment. One example is shown in Figure 5.8, where a computer vision algorithm has detected both lane boundaries. From this information, as will be described later, a warning to the driver or an action on the steering wheel can be issued.

Most of the algorithms have the same steps in common: after some features are extracted from the images, they are fitted to a lane/road model, and time integration is used in order to reject errors and refine the results. As human drivers detect the road depending on its color or texture and the presence of road boundaries and lane markings, these are the features detected in the images. The lane boundaries are detected using gradient-based steerable filters (see McCall and Trivedi (2006)), the top-hat shape of the marks (see Wu et al. (2008)), and the Euclidean distance transform (see Danescu and Nedevschi (2009)). Appearance is used for road detection: color is used in Alvarez et al. (2007), Alon et al. (2006), and Nefian and Bradski (2006), and Gabor filters are used to describe the road texture. Different models have been proposed to describe the lanes or the road. Straight lines are the most common in highway scenarios or when only the part of the road close to the vehicle is needed (see Alon et al. (2006) and Xu and Shin (2013)). More complex shapes have been described using arcs (see Zhou et al. (2010), McCall and Trivedi (2006), Linarth and Angelopoulou (2011), and Meuter et al. (2009)), polylines (see Romdhane et al. (2011)), clothoids (see Dickmanns (2007) and Danescu and Nedevschi (2011)), and splines (see Wang et al. (2003)). Temporal integration is based on Kalman filtering (see McCall and Trivedi (2006) and Loose and Franke (2009)) or particle filtering (see Danescu and Nedevschi (2009) and Liu et al. (2011)).
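
As a minimal illustration of the feature-extraction and model-fitting steps, the sketch below extracts edges with a Canny detector inside a fixed trapezoidal region of interest and fits straight-line lane boundary candidates with the probabilistic Hough transform, as is common in simple highway scenarios. All thresholds and the region-of-interest shape are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_lane_lines(bgr):
    """Edge-based lane boundary candidates in the road area ahead,
    fitted as straight segments with the probabilistic Hough transform."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 80, 160)
    h, w = edges.shape
    mask = np.zeros_like(edges)
    roi = np.array([[(0, h), (w // 2 - 40, h // 2),
                     (w // 2 + 40, h // 2), (w, h)]], dtype=np.int32)
    cv2.fillPoly(mask, roi, 255)          # keep only the road area ahead
    edges = cv2.bitwise_and(edges, mask)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=40, maxLineGap=80)
    left, right = [], []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            slope = (y2 - y1) / (x2 - x1 + 1e-6)
            if slope < -0.3:              # left boundary candidates
                left.append((x1, y1, x2, y2))
            elif slope > 0.3:             # right boundary candidates
                right.append((x1, y1, x2, y2))
    return left, right
```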

Although cameras are the most frequent type of sensor used for road and lane detection, light detection and ranging (lidar) has also been used (Huang et al. 2009). Its advantages are as follows: it provides very accurate 3D information of the scene in front of the vehicle, so curbs can be detected, and some devices give information about the intensity reflected by the surfaces, so lane marks or different types of terrain can be recognized. Nevertheless, their price and size make them unsuitable for commercial application at the moment. There are several state-of-the-art reviews of road and lane detection, such as McCall and Trivedi (2006), Hillel et al. (2014), and Shin et al. (2014).

There are two main types of ADAS that perceive the lanes of highways: LDW and LKS. The main difference is the output action of the system: LDW only warns the driver when an unintended departure is going to happen, whereas LKS warns and helps the driver to keep the vehicle on track by controlling the steering wheel.

LDW can take into account several types of information: the lateral distance to the lane boundary, or distance to line crossing (DLC), where no information about the vehicle or the shape of the line is taken into account; or the actual time to the departure of the vehicle, or time to line crossing (TLC), where the vehicle's speed and orientation with respect to the lane have to be known. An additional source of information is the turn signal state, in order not to warn the driver when changing lanes intentionally.
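
Both measures can be computed directly from the detected lane geometry and the vehicle state. The following sketch uses a simplified lane/vehicle geometry and a small-angle approximation; the lane width, vehicle width, and example values are illustrative placeholders.

```python
def distance_to_line_crossing(lateral_offset_m, half_lane_width_m=1.75,
                              half_vehicle_width_m=0.9):
    """DLC: remaining lateral distance before the nearest wheel reaches the
    lane boundary, given the offset of the vehicle centre from the lane centre."""
    return half_lane_width_m - abs(lateral_offset_m) - half_vehicle_width_m

def time_to_line_crossing(dlc_m, speed_mps, heading_error_rad):
    """TLC: time until the boundary is reached, assuming the current speed
    and heading relative to the lane stay constant."""
    lateral_speed = speed_mps * abs(heading_error_rad)  # small-angle approx. of v*sin(psi)
    return float('inf') if lateral_speed < 1e-3 else dlc_m / lateral_speed

# Example: 0.4 m off-centre, 25 m/s, drifting about 1 degree towards the boundary
dlc = distance_to_line_crossing(0.4)
print(dlc, time_to_line_crossing(dlc, 25.0, 0.017))
```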

One of the first systems was developed by Iteris for Mercedes Actros trucks in 2000, and later on the Japanese manufacturers Nissan, in 2001, and Toyota, in 2002, used this technology in sedan vehicles. Although Citroën, in 2004, mounted several infrared sensors under the vehicle for lane detection, computer vision is the preferred technology for LDW and LKS. A camera mounted behind the windshield gives more information about the geometry of the road, and the driver can be warned with more anticipation. Nowadays, LDW is the most widely available ADAS.

LKS needs the information explained previously and, in addition, acts on the heading by producing an assisting torque on the steering wheel, although the driver remains in charge of the control of the vehicle. In some systems, an additional action is taken on the braking system of the vehicle. The geometry of the road, as a clothoid or polynomial approximation, has to be computed in order to obtain its curvature. Again, this ADAS was first introduced in Japan, by Honda in 2003 and Toyota in 2004, and later in Europe, by Volkswagen in 2008.

5.3.2 Lane Change Assistance (LCA)

LCA is a collection of technologies taking care of blind spots and rear-view problems. It uses sensors to detect objects and vehicles that usually cannot be seen by the driver because of an obstructed view. In addition, vehicles approaching from behind can be detected in time, and the driver can be informed of this.

Nearly all car manufacturers offer an ADAS that monitors the blind spots of the vehicle. Through a change in the color of an optical display, the system indicates to the driver whether an obstacle is present or not. The kind of accident this ADAS tries to avoid is a collision with another vehicle being driven in the blind spot when the driver changes lane on a highway, or a collision with another vehicle that is not in the blind spot but whose speed is faster than the driver estimates. Therefore, this ADAS supports drivers when changing lanes when either they have not used the exterior and rear-view mirrors properly or they have incorrectly estimated the speed of approaching overtaking vehicles.

Two kinds of sensors can be used: radar and vision based. As a radar sensor has no information about the location of the lane, depending on the curvature of the road it can give false warnings, misplacing a vehicle as being in an adjacent lane when it is in the same lane, or reporting a following vehicle that is two lanes away as being in the next lane. Due to the sensor placement, a mounted trailer might also interfere with the radar. Nevertheless, radar is more reliable in adverse weather conditions than vision, and most manufacturers use 24 GHz radars mounted behind the rear bumper and cameras integrated in the exterior side-view mirrors.

Some Peugeot and Citroën models have used the system developed by FICOSA (see Sartori et al. (2005)) since 2002. It analyzes the presence of objects in the blind spot, providing a qualitative idea of their speed and position relative to the car. The system detects vehicles and computes optical flow. The movement is obtained through a phase difference approach, and the vehicles are detected as rectangular structures in the edge image. Fusing both types of information, approaching and receding objects can be differentiated, and the distance and relative speed of the vehicles at different levels of risk can be determined. Volvo installed a system based on computer vision in 2005, but in 2007 started to offer a radar-based system.

5.3.3 Parking Assistance

The Parking Assistance system is similar to LCA, but it is meant for low speeds and short distances, for example, when parking a car (Figure 5.9). By using sensors, a car can measure the available space and show this information to the driver. Current systems have limited use because of the low range at which these sensors operate. Future developments will let the system take over control of the car during parking, thus letting the car park itself.


Figure 5.9 Parking Assistance. Sensors' coverages are shown as 2D shapes to improve visualization

All car manufacturers offer some kind of Parking Assist system, varying from offering only visual and audio information to the driver, to automatic detection of free parking places, to semiautomatic self-parking. The sensing is typically based on a ring of ultrasonic sensors around the vehicle. The simplest vision-based systems just show the image provided by a camera placed at the rear of the vehicle. The most complex ones perform a calibration of the intrinsic and extrinsic parameters of the camera, and therefore they can show images without geometric distortion and overlay the obstacles detected by the ultrasonic sensors on the image. Knowing the pose of the camera and the dynamic constraints of the vehicle, the system can also draw on the image the intended path of the vehicle in order to park, as well as the current one.
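
As an illustration of how the intended path overlay can be generated, the sketch below computes the ground-plane path of the rear axle for a fixed steering angle using a kinematic bicycle model; projecting these points into the rear camera image then requires the calibrated intrinsic and extrinsic parameters mentioned above. The wheelbase, arc length, and step are placeholder values.

```python
import numpy as np

def predicted_rear_path(steering_angle_rad, wheelbase_m=2.7,
                        arc_length_m=5.0, step_m=0.1):
    """Ground-plane path of the rear axle midpoint for a fixed steering
    angle (kinematic bicycle model). Returns (N, 2) points
    (x forward, y to the left)."""
    s = np.arange(0.0, arc_length_m + step_m, step_m)
    if abs(steering_angle_rad) < 1e-3:               # essentially straight
        return np.stack([s, np.zeros_like(s)], axis=1)
    r = wheelbase_m / np.tan(steering_angle_rad)     # turning radius
    theta = s / r                                    # heading along the arc
    return np.stack([r * np.sin(theta), r * (1 - np.cos(theta))], axis=1)
```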

5.4 Inside Assistance

5.4.1 Driver Monitoring and Drowsiness Detection

Most traffic accidents have a human cause. Some studies establish that more than 80% of road accidents are due to human error (see Treat et al. (1979)). Among several factors, inattention can be established as the most important one (see Beirness et al. (2002)). For example, the NHTSA estimates that approximately 25% of police-reported crashes involve some form of driving inattention, including fatigue and distraction (see Ranney et al. (2001)). In Europe, driver inattention is the cause of 34,500 deaths and 1.5 million injuries, with associated costs representing 2% of the EU GDP. Data gathered over several decades have shown that inattention, which includes drowsiness and distraction, is behind 80% of crashes (see Juliussen and Robinson (2010) and Mahieu (2009)).

Therefore, driver drowsiness and distraction have received a lot of attention from the scientific community in recent years, with the aim of alerting the driver before a dangerous situation happens. Most of the research can be classified into one of the following three groups. The first two groups pay attention to the driver and the last group to the vehicle itself:

  • Driver physiological information: one or several biological signals of the driver are collected such as electrocardiogram (ECG), electromyogram (EMG), electro-oculogram (EoG), and electroencephalogram (EEG) in order to detect driver drowsiness.
  • Driver appearance: The driver is monitored through one or several cameras, and drowsiness and distraction are detected from facial cues such as yawning, eye closure, eye blinking, head pose, and so on. Toyota developed its Driver Monitoring System in 2006: when a dangerous situation is detected, the system checks whether the driver is looking in front of the vehicle and warns him/her if not. Several suppliers, such as LumeWay, Seeing Machines, or Smart Eye, have developed systems that can be installed on board the vehicle and use computer vision to monitor drivers.
  • Vehicle information: The vehicle is monitored instead of the driver; deviations from the lateral lane position, time-to-line crossing, steering wheel movement, pressure on the accelerator pedal, and changes from normal behavior indicate driver drowsiness or distraction. This last approach has been used by Volvo for its Driver Alert Control since 2007 and by Daimler since 2009 for its Attention Assist.

All the approaches have advantages and shortcomings. On the one hand, biological signals are direct measurements of the driver state, but they are intrusive and impractical. On the other hand, driver observation is not intrusive, and drowsiness can also be detected through computer vision; however, illumination changes and the diversity in driver appearance are a challenge, making this approach not suitable for every time of the day. Vehicle information is easier to obtain, but its correlation with the state of the driver is complex, and in some important environments, such as driving within cities, the absence of lane markings prevents the calculation of two of the most important parameters: lateral lane position and time-to-line crossing.


Figure 5.10 Drowsiness detection based on PERCLOS and an NIR camera

EEG has been used for drowsiness detection for more than four decades (see Volow and Erwin (1973)). Although its results are considered valid (Lal and Craig (2002) detected fatigue with an error rate of approximately 10%, and Golz et al. (2010) obtained microsleep detection errors of 10%), its excessive intrusiveness, since the driver has to wear electrodes connected by wires to a computer, makes it unsuitable for real applications.

PERCLOS (PERcentage of eye CLOSure), the proportion of a time interval during which the eyes are 80–100% closed, was described by Skipper and Wierwille (1986) as an index of driver drowsiness and received a lot of promotion; for example, in 1998 it was supported by the Federal Highway Administration (see Dinges and Grace (1998)) as an accepted standard for alertness measures. Using NIR illumination (Figure 5.10) has the advantages of the bright-pupil effect, similar to the red-eye effect of flash cameras, and of obtaining an image that is largely independent of external illumination (see Ji and Yang (2002) and Bergasa et al. (2004)). However, if the driver wears sunglasses, PERCLOS cannot be computed, and illumination changes are a challenge for the robustness of the image analysis algorithms.
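
A minimal sketch of how PERCLOS can be computed over a sliding window from per-frame eye-closure estimates (e.g., produced by an NIR eye tracker) is shown below. The window length and the alarm threshold are illustrative assumptions; only the 80% closure criterion follows the definition given above.

```python
from collections import deque

class PerclosMonitor:
    """PERCLOS over a sliding window: fraction of frames in which the
    eyes are at least 80% closed."""

    def __init__(self, fps=30, window_s=60.0, closed_thresh=0.8):
        self.buffer = deque(maxlen=int(fps * window_s))
        self.closed_thresh = closed_thresh

    def add_frame(self, eye_closure):
        """eye_closure in [0, 1]: 0 = fully open, 1 = fully closed."""
        self.buffer.append(eye_closure >= self.closed_thresh)

    def perclos(self):
        return sum(self.buffer) / len(self.buffer) if self.buffer else 0.0

    def is_drowsy(self, alarm_level=0.15):
        # The alarm threshold is an illustrative placeholder
        return self.perclos() >= alarm_level
```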

Computer vision is also useful for driver monitoring, as the orientation of the head and the gaze can be obtained; in this way, distraction can also be detected. In Jiménez et al. (2009), a stereo camera system automatically builds a 3D rigid model of the face. At the beginning of a video sequence, salient features of the face are detected and used to build the model, which is subsequently tracked. It was extended by Jiménez et al. (2012a) to handle yaw rotations of up to 90° and low-light conditions. The driver head model is initially built from a set of 3D points derived from stereo images, and as new areas of the subject's face appear, the model is updated. It was validated on sequences recorded in a naturalistic truck simulator, on driving exercises designed by a team of psychologists (see Jiménez et al. (2012b)).

Several signals in the vehicle reflect the driver's intentions and actions, so the driver's behavior can be inferred by analyzing them, with the additional advantages of being a nonintrusive method and of their availability through the CAN (controller area network) bus. In this way, several parameters can be obtained, such as speed, steering angle, and the positions of the accelerator and brake pedals. If the vehicle is perceiving the lane, additional parameters are time-to-collision, time-to-line crossing, and lateral shift (see Tango et al. (2010)).

This has been the technological choice of the first automotive companies installing driver drowsiness or attention monitoring in their vehicles. The disadvantages are as follows: the systems require a training period for each driver, so they are not useful for occasional drivers, and they are not able to detect when the driver falls asleep for a few seconds on a straight road without changing the direction of the vehicle.

5.5 Conclusions and Future Challenges

Computer vision is a key component for understanding the scene around the car. It has the potential to provide higher level information than other techniques that use radar or lidar information. This high level of understanding is crucial when engineering systems such as pedestrian detection in complex environments, driver monitoring, or traffic sign recognition. As has been explained throughout this chapter, there are many commercial ADAS already on the market that make use of vision as a key component. However, each one is at its own stage of maturity and has its specific problems. For instance, while vision-based ACCs are being researched mainly in order to improve upon the already successful radar-based ones, pedestrian protection is just starting its commercialization and still has a long journey ahead, not only to increase its robustness but also to add extra functionalities (e.g., estimating pedestrian behavior to anticipate actions), as previously explained.

In this section, we overview the main challenges present in ADAS as a whole. We divide the challenges into two aspects: systems robustness and cost. These aspects correspond to the factors that must be addressed by researchers, industry, and governments in order to achieve the integration of ADAS with other vehicle safety measures such as airbags or seat belts.

5.5.1 Robustness

One of the ways to measure the robustness of any on-board safety system is through standard safety performance assessments such as the Euro NCAP (European New Car Assessment Program), the Japanese NASVA (National Agency for Automotive Safety and Victims' Aid), or the American NHTSA (National Highway Traffic Safety Administration).

Even though ADAS are still in the process of being included in these demanding assessments, very solid steps have already been made in this direction. As an example, the Euro NCAP has a rewards program for the best advanced safety systems on the market that complements the usual star-rating scheme (see EURO NCAP (2014)). This program includes blind spot, lane assist, speed alert, emergency braking, and pre-crash systems, among others. These systems cover most ADAS and are quite specific in the applications addressed. For example, the Autonomous Emergency Braking (AEB) program is again divided into city, interurban, and pedestrian systems. Different commercial vehicles from 2010 to 2013 have been rewarded. It is worth highlighting Ford's Driver Alert (an indirect way of driver monitoring through forward-looking cameras); Ford, Audi, and SEAT's lane assist systems; and the Volkswagen Group's FCA systems. From 2014 onwards, standard protocols to assess the safety rating of some of these systems have been in use, which gives an idea of the maturity of the field.

ADAS                                                      Stereo  Optical flow  Detection  Classification  Tracking  Lidar
Adaptive cruise control and forward collision avoidance   Useful  Low           High       High            High      Useful
Traffic sign recognition                                  Null    Null          High       High            Useful    Null
Traffic jam assist                                        Useful  Low           High       High            High      Useful
Vulnerable road user protection                           High    Useful        High       High            High      Null
Intelligent headlamp control                              Null    Useful        High       High            Useful    Null
Enhanced night vision                                     Null    Null          High       High            Low       Null
Intelligent active suspension                             High    Null          High       Useful          High      Useful
Lane departure warning and lane keeping system            Null    Null          High       High            Useful    Null
Lane change assistance                                    Useful  Null          High       High            Useful    Useful
Parking assistance                                        High    Null          Useful     Useful          Null      Useful
Driver monitoring and drowsiness detection                Useful  Null          High       High            High      Null

Figure 5.11 Summary of the relevance of several technologies in each ADAS: in increasing relevance as null, low, useful, and high

5.5.2 Cost

Perhaps the most difficult barrier nowadays to the global deployment of ADAS is their cost. Night vision systems were commercialized as early as the year 2000, and vulnerable road user protection is nowadays being integrated in the top-tier vehicles of different motor companies. However, the real challenge is to integrate different ADAS at lower cost in average-cost vehicles. The approach that must be taken here is to use the same sensors for different ADAS (e.g., using the same forward-looking camera for detecting vehicles at long distances, pedestrians at short distances, and lanes). This can be problematic nowadays, given that most classifiers, trackers, and so on require cameras with fields of view and resolutions in a given range. Furthermore, another important point to be taken into account is to provide easy and inexpensive maintenance of these systems. For example, the calibration of the vision system should have to be performed once a year, as for other parts of the vehicle, not once per month. In this direction, self-calibration is an interesting way to solve this potential problem (see Dang et al. (2009)). Finally, Figure 5.11 summarizes the relevance of several technologies in each ADAS, in increasing relevance as null, low, useful, and high.

Acknowledgments

This work is supported by the Spanish MICINN projects TRA2011-29454-C03-01 and TRA2014-57088-C2-1-R, by the Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya (2014-SGR-1506), and by DGT project SPIP2014-01352. Our research is also kindly supported by NVIDIA Corporation in the form of different GPU hardware.
