Chapter 7

Real-Time HDR Video Processing and Compression Using an FPGA

P. Zemčík; P. Musil; M. Musil    Brno University of Technology, Brno, Czech Republic

Abstract

Real-time video processing of high dynamic range (HDR) content is an important and demanding task. This chapter demonstrates how HDR video can be acquired in real time through multi-exposure capture using standard image sensors, and how the data can be fused, processed, and compressed in real time, all using field programmable gate arrays (FPGAs). The chapter describes a block diagram of the system, built from an off-the-shelf camera and custom electronics, shows how the processing and compression algorithms work and what their features are, and explains how they can be implemented using the FPGA, including basic information about the results of such an implementation.

Keywords

High dynamic range imaging; HDR; Image processing; Image compression; Multi-exposure image; Programmable hardware; Field programmable gate array (FPGA)

1 HDR Video Processing

High dynamic range (HDR) image and video processing is a trend that addresses the contemporary effort toward HDR image and video capture and reflects the need to acquire and provide highly informational image and video content to the user. As the widely available sensors only provide around 12 bits per pixel (12 f-stops) of dynamic range and therefore require multiple exposures to acquire truly HDR content, the processing involves acquisition of multiple images/frames with different exposures, merging of these frames into HDR frames, and storage or further processing of the acquired information. As the flow of data containing the frames of HDR video can be very high, compression of the data is often required and an HDR-specific video compression may have to be involved [1].

This chapter describes an HDR video acquisition and compression system, built mainly from programmable hardware, that captures multi-exposure HDR video, merges the standard-range frames into HDR frames, compresses the HDR video, and transmits the compressed video over a computer network. The chapter describes the HDR acquisition task in more detail, presents the overall structure of the hardware used for the task, outlines the implementation of the algorithms used for HDR video compression, and also outlines the interfaces and the overall structure of the software. Finally, an overview of the achieved features and the exploitation of resources is presented.

2 Description of the HDR Acquisition and Compression Task

The system described in this chapter is intended for devices, such as surveillance camera units, industrial quality inspection units, or other capture systems with a fixed camera and objects of interest under demanding lighting conditions. The anticipated technical characteristics of the system include:

 exploitation of an "off-the-shelf" camera; in our case, the IO Industries Flare™ 2KSDI camera has been selected,

 HDR acquisition of 30 fps HDR video with at least 15 bits per pixel (15 f-stops) of dynamic range and/or 20 fps HDR video with at least 18 bits per pixel (18 f-stops) of dynamic range (the system should be able to switch between these modes),

 real-time compression of the HDR video using the predefined compression technology developed by goHDR [2], and

 transmission of video over a suitable communication infrastructure, such as Ethernet.

The system as a whole is illustrated in the block diagram in Fig. 1, which also shows the main building blocks of the proposed hardware solution. The solution relies heavily on programmable hardware, namely FPGA (field programmable gate array) technology [3], specifically the Xilinx ZYNQ architecture, which combines the FPGA technology with a pair of ARM CPUs [4]. This system core is accompanied by H.264 video compression chips that accomplish the standard task of 8-bit video compression, which is a part of the proposed compression scheme [5].

Fig. 1 Proposed HDR acquisition and compression system block diagram.

The entire system is completed with several electrical interfaces, such as SDI, and since it can run from battery power, it is also accompanied by battery power supply and charging blocks. The details of the individual functional blocks are described below.

3 Acquisition of HDR Video

The HDR video acquisition process is based on the idea of multi-exposure frame capture [6]. In our case, the input video stream is formed from 60 fps Full HD video, and the desired output is either 30 fps Full HD video with each frame formed from 2 exposures, or 20 fps Full HD video with each frame formed from 3 exposures. By sharing exposures between consecutive image sets, the output can reach up to 60 fps. The video camera used allows for preprogramming so that it generates a continuous flow of video frames in which each series of (2 or 3) frames is exposed with predetermined exposure times. The image acquisition is typically done in a color format whose color components are simple to handle, so that only the intensity component is processed in the HDR processing chain while the color is preserved independently. In our case, the YCbCr model is used.

The acquisition process is simple from the functional point of view, while technologically it is relatively complex. The algorithm is based on selecting the most suitable pixel for the output frame from among the corresponding pixels of the acquired frames. This is quite a traditional approach [7] and, except for some algorithmic difficulty in estimating the saturation level of the individual pixel exposures (especially when the camera performs color space conversion, estimating the saturation level is not simple), it is straightforward (see Fig. 2). In our case, we chose to select only a single exposure, whereas some approaches perform a linear combination, typically of two exposures, to avoid transitional effects; such effects did not play an important role in our case.

Fig. 2 Selection of the proper exposure level of individual pixels.
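
In software, the selection step could be sketched as follows. This is a minimal illustration of the MUX logic only, assuming 8-bit luminance inputs and a hypothetical saturation threshold; the actual system evaluates saturation in hardware and works with the camera's real signal levels.

    /* Minimal sketch of the per-pixel exposure selection (MUX) step.
     * The threshold value and the frame ordering are illustrative
     * assumptions, not the exact parameters of the described system. */
    #include <stdint.h>

    #define SATURATION_THRESHOLD 240  /* hypothetical 8-bit saturation level */

    /* Select the longest exposure that is not saturated; fall back to the
     * shortest exposure when all of them are saturated. Exposures are
     * ordered from longest (index 0) to shortest (index n - 1). */
    static int select_exposure(const uint8_t *y, int n)
    {
        for (int i = 0; i < n; i++) {
            if (y[i] < SATURATION_THRESHOLD)
                return i;   /* first non-saturated (longest usable) exposure */
        }
        return n - 1;       /* all saturated: use the shortest exposure */
    }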

In our approach, the actual representation of pixels in the system is logarithmic, and so we used a LUT (lookup table) block built from an FPGA memory with static content, connected to the MUX block that encodes the pixels appropriately. The actual content of the LUT, intended for calculating the logarithm of the pixels, depends on the actual exposure times of the frames and reflects the relations between the individual exposures.
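
The way such a LUT can reflect the exposure times may be sketched as follows. This is a minimal sketch only; the fixed-point format (here a hypothetical 4096 counts per f-stop) and the normalization by exposure time are assumptions for illustration, not the exact table contents of the system.

    /* Minimal sketch of filling the static logarithm LUT for one exposure. */
    #include <math.h>
    #include <stdint.h>

    #define LUT_SIZE        256     /* 8-bit intensity input */
    #define FIXED_PER_STOP  4096.0  /* hypothetical fixed-point scale */

    /* Fill lut so that lut[v] ~ log2(v / exposure_time) in fixed point.
     * Dividing by the exposure time aligns the exposures on a common
     * radiometric scale; in log2 this is just a constant offset per LUT. */
    static void fill_log_lut(uint16_t *lut, double exposure_time)
    {
        for (int v = 0; v < LUT_SIZE; v++) {
            double log_val = log2(((double)v + 0.5) / exposure_time);
            double scaled  = log_val * FIXED_PER_STOP;
            if (scaled < 0.0) scaled = 0.0;           /* clamp to the */
            if (scaled > 65535.0) scaled = 65535.0;   /* 16-bit range */
            lut[v] = (uint16_t)scaled;
        }
    }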

Technologically, however, building the unit that combines the individual pixels is relatively complex, as it involves storage of the 2 or 3 individual frames as they are acquired and their simultaneous retrieval when the combination needs to be performed; note that the frames are captured sequentially (see Fig. 3).

Fig. 3 Actual image acquisition building block.

Moreover, the combining operation (and further processing) must be time-overlapped with the exposure of further frames of the video, as the flow of data from the camera is continuous, and so the combining (and further) operations are performed in parallel. Please note that the algorithm does not deal with ghosting effects [8]; deghosting remains a subject of future work.
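
The buffering and overlap can be illustrated with a simple frame ring buffer. This is a minimal sketch under assumed parameters; the buffer depth, the data types, and the way slots map to the actual frame memories are illustrative only.

    /* Minimal sketch of a frame ring buffer that lets the merge step read
     * the last N exposures while the camera keeps writing new ones. */
    #include <stdint.h>

    #define N_EXPOSURES   3                    /* 2 or 3 in the described system */
    #define BUFFER_DEPTH  (N_EXPOSURES + 1)    /* one extra slot for overlap */

    typedef struct {
        uint16_t *slots[BUFFER_DEPTH];  /* one Full HD frame per slot */
        int write_idx;                  /* slot being filled by the camera */
    } frame_ring;

    /* After a full exposure set has arrived, the merge step reads the N
     * most recently completed slots (oldest first) while the camera
     * writes into the next slot in parallel. */
    static void exposure_set(const frame_ring *r,
                             const uint16_t *set[N_EXPOSURES])
    {
        for (int i = 0; i < N_EXPOSURES; i++) {
            int idx = (r->write_idx + BUFFER_DEPTH - N_EXPOSURES + i)
                      % BUFFER_DEPTH;
            set[i] = r->slots[idx];
        }
    }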

4 HDR Video Compression Implementation

The actual compression algorithm is performed according to the compression scheme used in [2]. The main idea of the compression is to create a detail-less version of the image, compress its dynamic range, and encode it using a standard 8-bit compression (H.264 in our case) into one output stream. Then the differences between the detail-less version of the image and the actual input are encoded as the ratio between each actual pixel value and the corresponding pixel in the detail-less version. This ratio is encoded in another 8-bit stream along with the color information.
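
The split into a base layer and a ratio layer can be sketched as follows. This is a minimal illustration only; the particular 8-bit mappings (the placeholder tone curve and the assumed +/- 2 f-stop ratio range) are not the exact goHDR quantization.

    /* Minimal sketch of splitting an HDR frame into a base (detail-less)
     * layer and a ratio (detail) layer, following the scheme above. */
    #include <math.h>
    #include <stdint.h>

    /* base[] is the filtered, detail-less luminance; hdr[] is the input.
     * The ratio hdr/base is nearly 1 everywhere except around details,
     * so it compresses well in the second 8-bit H.264 stream. */
    static void split_layers(const float *hdr, const float *base, int n,
                             uint8_t *base8, uint8_t *ratio8)
    {
        for (int i = 0; i < n; i++) {
            /* tone-map the base layer into 8 bits (placeholder mapping) */
            float b = log2f(base[i] + 1.0f) * 16.0f;
            base8[i] = (uint8_t)(b < 0 ? 0 : (b > 255 ? 255 : b));

            /* encode the ratio around mid-gray, +/- 2 f-stops (assumed) */
            float r = log2f(hdr[i] / (base[i] + 1e-6f)) * 64.0f + 128.0f;
            ratio8[i] = (uint8_t)(r < 0 ? 0 : (r > 255 ? 255 : r));
        }
    }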

The whole compression process is illustrated in Fig. 4. It contains three blocks whose complexity is high and that are costly to implement: two H.264 encoders and a bilateral filter. The two H.264 encoder blocks are, in our approach, implemented in separate electronic components that integrate the whole functionality [5]. The third block, however, is not available in this form and had to be implemented in the FPGA.

Fig. 4 HDR image compression complete scheme.

The bilateral filter block diagram is shown in Fig. 5. The filter is built over image delay lines with a built-in register array, which provides access to an 11 × 11 pixel neighborhood at a time. With every pixel at the input, the neighborhood shifts by one position in the image. Above the registers, a pipelined computational tree is built that is capable of processing one position in every clock cycle. The computation is performed in the same manner as in the reference implementation of the bilateral filter: every pixel gets two weight coefficients, one based on its distance from the central pixel and the other based on the difference of its value from the central pixel's value. All pixel values are multiplied by their weights and accumulated. At the end, the resultant pixel value is computed by dividing this accumulated sum by the sum of the weight coefficients.

Fig. 5 Bilateral filter block diagram.
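
For reference, the per-pixel computation that the pipelined tree implements can be written as follows. This is a minimal software sketch of the classic bilateral filter; the Gaussian weighting and the sigma parameters are illustrative assumptions, as the text above only fixes the 11 × 11 neighborhood and the two-coefficient weighting.

    /* Minimal sketch of the reference bilateral filtering of one pixel
     * over its 11 x 11 neighborhood. */
    #include <math.h>

    #define RADIUS 5   /* 11 x 11 neighborhood */

    /* img is a luminance plane with row stride w; returns the filtered
     * value at (x, y), assumed at least RADIUS pixels from the border. */
    static float bilateral_pixel(const float *img, int w, int x, int y,
                                 float sigma_s, float sigma_r)
    {
        float center = img[y * w + x];
        float sum = 0.0f, weight_sum = 0.0f;

        for (int dy = -RADIUS; dy <= RADIUS; dy++) {
            for (int dx = -RADIUS; dx <= RADIUS; dx++) {
                float v = img[(y + dy) * w + (x + dx)];
                /* spatial weight: distance from the central pixel */
                float ws = expf(-(dx * dx + dy * dy)
                                / (2.0f * sigma_s * sigma_s));
                /* range weight: value difference against the central pixel */
                float d = v - center;
                float wr = expf(-(d * d) / (2.0f * sigma_r * sigma_r));
                sum += v * ws * wr;
                weight_sum += ws * wr;
            }
        }
        return sum / weight_sum;   /* divide by the sum of weights */
    }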

Overall, the design of the acquisition and compression system is done in the logarithmic domain. The only difference from the reference implementation is the use of weight coefficients restricted to powers of two, which, for values kept in the logarithmic domain, changes the operation of multiplication into simple addition and makes the design much easier. The conversion from the linear input is covered in the acquisition block (see the previous section), and the conversion back to the linear scale for the H.264 compression blocks is done using LUTs just before the tone-mapping function, whose output leads directly to the H.264 encoders.
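
The simplification can be illustrated on a single weighted pixel. In this minimal sketch, pixel values are assumed to be stored as fixed-point log2 values with a hypothetical scale; multiplying a linear value by a power-of-two weight then reduces to adding a constant.

    /* Why power-of-two weights help in the logarithmic domain:
     * multiplying a linear value by 2^k is adding k f-stops to its
     * log2 representation. The fixed-point scale is assumed. */
    #include <stdint.h>

    #define FIXED_PER_STOP 4096  /* fixed-point counts per f-stop (assumed) */

    /* log_val is log2(value) in fixed point; the weight is 2^k. The
     * product value * 2^k then has log representation log_val + k stops. */
    static int32_t weighted_log(int32_t log_val, int k)
    {
        return log_val + k * FIXED_PER_STOP;  /* addition replaces multiply */
    }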

5 HDR Acquisition and Compression System Interfaces

Interfacing of the system provides the necessary means of electronic interconnection with the camera on one side and the consumer of the video output on the other side. In our case, the camera input interface was SDI (3 Gb/s), while the output of the system was a 1-Gb Ethernet interface. As the H.264 encoders produce their output to a USB interface, this needs to be handled as well. Fortunately, the CPU part of the Xilinx ZYNQ chip is equipped with both Ethernet and USB interfaces, and so it was possible to take these two interfaces "as is." On the other hand, the SDI interface is not available to the CPU part and had to be built separately. In fact, an SDI-to-LVDS interface is available as a component, and the LVDS interface can be handled directly by the FPGA part of the Xilinx ZYNQ chip (see Fig. 6).

Fig. 6 Block diagram of the system interfaces.

The interfaces "visible" to the outside world are the SDI input interface and the 1-Gb Ethernet interface. In fact, the data flow over the 1-Gb Ethernet is relatively low, and so it can be converted to wireless transmission over WiFi, or possibly even over the LTE mobile network.

6 HDR Acquisition and Compression Software Structure Overview

The software involved in the design of the acquisition and compression system is merely control software, as no important algorithmic part of the acquisition and compression is performed in software, except for initialization and configuration. OpenSUSE Linux 13.1 is used as the operating system. More specifically, the software performs the following tasks:

 initialization of the whole system, including the camera, the H.264 encoders, and external peripheral devices,

 configuration of the whole system using the web interface, through a web server running in the CPU part,

 setting the camera parameters and handling continuous changes of exposure times,

 video data DMA transfers from SDI into system memory and from the system memory into the FPGA for processing,

 compressed video data transfers from the H.264 encoders into the system memory, and

 output of the compressed video streams over the Ethernet interface using the TCP/IP protocol (a minimal sketch of such a streaming loop follows this list).
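
The last task can be sketched with standard POSIX sockets. This is a hedged illustration only; the port number, the single-client setup, and the abbreviated error handling are assumptions, not the actual control software of the system.

    /* Minimal sketch of streaming a compressed buffer to a client
     * over TCP, as in the last task above. */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int open_stream_socket(uint16_t port)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        bind(srv, (struct sockaddr *)&addr, sizeof(addr));
        listen(srv, 1);
        return accept(srv, NULL, NULL);  /* wait for one video client */
    }

    /* send() may transmit fewer bytes than requested, so loop until
     * the whole compressed chunk has been written. */
    static void send_chunk(int fd, const unsigned char *buf, size_t len)
    {
        size_t off = 0;
        while (off < len) {
            ssize_t n = send(fd, buf + off, len - off, 0);
            if (n <= 0) break;   /* client disconnected or error */
            off += (size_t)n;
        }
    }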

7 HDR Acquisition and Compression Features and Results

Features of the assembled system, from the technical point of view, are:

 input of 60 fps Full HD video over the SDI interface in YCbCr format,

 output of 30 fps Full HD video assembled from 3 exposures (limited by the H.264 encoder),

 dimensions of 270 × 180 × 60 mm, and

 power consumption of 12 W (without the Flare™ 2KSDI camera).

In designs requiring additional processing of the HDR video, it is important to know what proportion of the FPGA resources was consumed by the acquisition and compression blocks, as well as what the power characteristics of the system are. The resource consumption is shown in Table 1.

Table 1

FPGA resource consumption

Resources for Zynq XC7Z045    Utilization    Available    Utilization (%)
LUT tables                    24,700         218,600      11
Registers                     30,405         437,200       7
BlockRAM (36 Kb)                  47             545       9

A photograph of the assembled system is shown in Fig. 7. Note the "base" FPGA board, the compression modules, and also the SDI interface.

Fig. 7 Photo of the assembled system.

The concept of HDR acquisition and processing will be developed further. Future work includes the creation of a compact design that would fit into and be integrated with a camera, possibly including the ability to deghost the video. Future work also involves further development of processing algorithms, such as object detection, filtering, and other useful application-specific functions.
