Similar to other video codecs, profiles specify the syntax (i.e., algorithms) and levels specify various parameters (resolution, frame rate, bit-rate, etc.). The various levels are described in Table 7.12.
| Level | Maximum MB per Second | Maximum Frame Size (MB) | Typical Frame Resolution | Typical Frames per Second | Maximum MVs per Two Consecutive MBs | Maximum Reference Frames | Maximum Bit-Rate |
|---|---|---|---|---|---|---|---|
| 1 | 1,485 | 99 | 176×144 | 15 | – | 4 | 64 kbps |
| 1.1 | 3,000 | 396 | 176×144 | 30 | – | 9 | 192 kbps |
| | | | 320×240 | 10 | | 3 | |
| | | | 352×288 | 7.5 | | 3 | |
| 1.2 | 6,000 | 396 | 352×288 | 15 | – | 6 | 384 kbps |
| 1.3 | 11,880 | 396 | 352×288 | 30 | – | 6 | 768 kbps |
| 2 | 11,880 | 396 | 352×288 | 30 | – | 6 | 2 Mbps |
| 2.1 | 19,800 | 792 | 352×480 | 30 | – | 6 | 4 Mbps |
| | | | 352×576 | 25 | | | |
| 2.2 | 20,250 | 1,620 | 720×480 | 15 | – | 5 | 4 Mbps |
| | | | 720×576 | 12.5 | | | |
| 3 | 40,500 | 1,620 | 720×480 | 30 | 32 | 5 | 10 Mbps |
| | | | 720×576 | 25 | | | |
| 3.1 | 108,000 | 3,600 | 1280×720 | 30 | 16 | 5 | 14 Mbps |
| 3.2 | 216,000 | 5,120 | 1280×720 | 60 | 16 | 4 | 20 Mbps |
| 4 | 245,760 | 8,192 | 1920×1080 | 30 | 16 | 4 | 20 Mbps |
| | | | 1280×720 | 60 | | | |
| 4.1 | 245,760 | 8,192 | 1920×1080 | 30 | 16 | 4 | 50 Mbps |
| | | | 1280×720 | 60 | | | |
| 4.2 | 491,520 | 8,192 | 1920×1080 | 60 | 16 | 4 | 50 Mbps |
| 5 | 589,824 | 22,080 | 2048×1024 | 72 | 16 | 5 | 135 Mbps |
| 5.1 | 983,040 | 36,864 | 2048×1024 | 120 | 16 | 5 | 240 Mbps |
| | | | 4096×2048 | 30 | | | |
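To make the level limits concrete, the following minimal sketch picks the lowest level whose macroblock limits accommodate a given resolution and frame rate. The limits are copied from the first three columns of the table above; the function names are illustrative only, and the check deliberately ignores bit rate, motion vector, and reference frame limits, which a real conformance check would also need.

```python
import math

# (level, max macroblocks per second, max frame size in macroblocks),
# taken from the level table above.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks needed to cover a frame."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def lowest_level(width, height, fps):
    """Return the first level whose macroblock limits fit, or None."""
    frame_mbs = macroblocks_per_frame(width, height)
    mbs_per_second = frame_mbs * fps
    for level, max_mbs_per_second, max_frame_mbs in LEVEL_LIMITS:
        if frame_mbs <= max_frame_mbs and mbs_per_second <= max_mbs_per_second:
            return level
    return None


if __name__ == "__main__":
    for w, h, fps in [(176, 144, 15), (1280, 720, 30), (1920, 1080, 30)]:
        print(f"{w}x{h} @ {fps} fps -> level {lowest_level(w, h, fps)}")
```

For example, 1280×720 needs 80 × 45 = 3,600 macroblocks per frame, or 108,000 per second at 30 frames per second, which is exactly the Level 3.1 limit.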
Baseline profile is designed for progressive video such as video conferencing, video-over-IP, and mobile applications. Tools used by Baseline profile include:
Insider Info
Note that Baseline profile is not a subset of Main profile. Many solutions implement a subset of Baseline profile, without Arbitrary Slice Order (ASO) or Flexible Macroblock Ordering (FMO); this is a subset of Main profile (and much easier to implement).
Extended profile is designed for mobile and Internet streaming applications. Additional tools over Baseline profile include:
Main profile is designed for a wide range of broadcast applications. Additional tools over Baseline profile include:
After the initial specification was completed, the Fidelity Range Extension (FRExt) amendment was added. This resulted in four additional profiles being added to the specification:
H.264 uses the YCbCr color space, supporting 4:2:0, 4:2:2, and 4:4:4 sampling. The 4:2:2 and 4:4:4 sampling options increase the chroma resolution over 4:2:0, resulting in better picture quality. In addition to 8-bit YCbCr data, H.264 supports 10- and 12-bit YCbCr data to further improve picture quality.
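As a rough illustration of what the sampling formats and bit depths cost in raw data, this sketch counts the uncompressed bits in one frame. The function and table names are illustrative, and it assumes frame dimensions that divide evenly by the chroma subsampling factors.

```python
# Chroma subsampling divisors (horizontal, vertical) for each format.
CHROMA_DIVISORS = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def frame_bits(width, height, sampling, bit_depth=8):
    """Uncompressed size of one YCbCr frame, in bits."""
    div_x, div_y = CHROMA_DIVISORS[sampling]
    luma_samples = width * height
    # Two chroma planes (Cb and Cr), each possibly subsampled.
    chroma_samples = 2 * (width // div_x) * (height // div_y)
    return (luma_samples + chroma_samples) * bit_depth


if __name__ == "__main__":
    for sampling in CHROMA_DIVISORS:
        megabits = frame_bits(1920, 1080, sampling) / 1e6
        print(f"1920x1080 {sampling} 8-bit: {megabits:.1f} Mbit per frame")
```

Relative to 4:2:0, a 4:2:2 frame carries one third more samples and a 4:4:4 frame carries twice as many, before any compression is applied.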
With H.264, the partitioning of the 16×16 macroblocks has been extended, with partitions as small as 4×4 available for motion compensation. Such fine granularity leads to a potentially large number of motion vectors per macroblock (up to 32) and number of blocks that must be interpolated (up to 96). To constrain encoder/decoder complexity, there are limits on the number of motion vectors used for two consecutive macroblocks.
Error concealment is improved with Flexible Macroblock Ordering (FMO), which assigns macroblocks to different slices so they are transmitted in a non-raster-scan sequence. This reduces the chance that an error will affect a large spatial region, and lets the decoder use correctly received neighboring macroblocks to predict a missing macroblock.
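The idea can be sketched with a simple checkerboard assignment of macroblocks to two groups. This is only one possible FMO-style mapping, chosen here for illustration, and the function names are hypothetical. If one group is lost, every missing macroblock still has intact neighbors in the other group.

```python
def checkerboard_slice_groups(mb_width, mb_height):
    """Assign each macroblock to group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_width)] for y in range(mb_height)]


def available_neighbors(group_map, x, y, lost_group):
    """Count the 4-connected neighbors of (x, y) that are not in the lost group."""
    height, width = len(group_map), len(group_map[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        inside = 0 <= nx < width and 0 <= ny < height
        if inside and group_map[ny][nx] != lost_group:
            count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    for row in groups:
        print(row)
    # Macroblock (1, 1) belongs to group 0; suppose group 0 is lost.
    print(available_neighbors(groups, 1, 1, lost_group=0))
```

With raster-order slices, by contrast, a lost slice removes a contiguous run of macroblocks, so interior macroblocks have no surviving horizontal neighbors to conceal from.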
Motion compensation accuracy is improved over the ½-pixel accuracy used by most earlier video codecs. H.264 supports the same ¼-pixel accuracy that is used by the latest MPEG-4 video codec.
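A one-dimensional sketch of sub-pixel interpolation follows. It assumes the 6-tap filter (1, −5, 20, 20, −5, 1) commonly described for H.264 luma half-sample positions, with quarter-sample values formed by averaging; the text above does not spell out the filter, so treat the details as an illustration rather than a complete implementation. Edge handling by sample repetition is a simplification.

```python
def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] using a 6-tap filter."""
    taps = (1, -5, 20, 20, -5, 1)
    last = len(row) - 1
    # Repeat edge samples when the filter window extends past the row.
    window = [row[max(0, min(last, i - 2 + k))] for k in range(6)]
    acc = sum(t * s for t, s in zip(taps, window))
    return clip8((acc + 16) >> 5)  # taps sum to 32, so divide by 32 with rounding


def quarter_sample(row, i):
    """Value a quarter of the way from row[i] toward row[i + 1]."""
    return (row[i] + half_sample(row, i) + 1) >> 1


if __name__ == "__main__":
    edge = [0, 0, 0, 0, 255, 255, 255, 255]
    print(half_sample(edge, 3), quarter_sample(edge, 3))
```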
H.264 adds support for multiple reference frames. This increases compression by improving the prediction process and increases error resilience by being able to use another reference frame in the event that one was lost.
A single macroblock can use up to 8 reference frames (up to 3 for HDTV), with a total limit of 16 reference frames used within a frame.
To compensate for the different temporal distances between current and reference frames, predicted blocks are averaged with configurable weighting parameters. These parameters can either be embedded within the bitstream or the decoder may implicitly derive them from temporal references.
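The weighted averaging described above can be sketched as follows. The fixed-point form (weights that sum to 64, a rounding term, and a right shift) mirrors how such weighting is usually implemented, but the function names and the simplified distance-based derivation of the implicit weights are assumptions made for illustration.

```python
def weighted_bipred(p0, p1, w0, w1, log_wd=5, offset=0):
    """Blend two predicted samples with integer weights.

    With log_wd=5 the weights are expected to sum to 64, so the
    result is (p0*w0 + p1*w1) / 64, rounded, plus an optional offset.
    """
    rounding = 1 << log_wd
    value = ((p0 * w0 + p1 * w1 + rounding) >> (log_wd + 1)) + offset
    return max(0, min(255, value))


def implicit_weights(dist0, dist1, total=64):
    """Hypothetical implicit weights: the nearer reference gets the larger weight.

    dist0 and dist1 are temporal distances from the current frame to
    reference 0 and reference 1.
    """
    w1 = (total * dist0) // (dist0 + dist1)
    return total - w1, w1


if __name__ == "__main__":
    w0, w1 = implicit_weights(1, 3)  # reference 0 is three times closer
    print(w0, w1, weighted_bipred(100, 200, w0, w1))
```

In the explicit case, `w0`, `w1`, and `offset` would come from the bitstream instead of being derived from temporal distances.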
Transform, Scaling, and Quantization
H.264 uses a simple 4×4 integer transform. In contrast, older video codecs use an 8×8 DCT that operates on floating-point coefficients. An additional 2×2 transform is applied to the four DC coefficients of each chroma component (Cb and Cr). Intra 16×16 macroblocks have an additional 4×4 transform performed for the sixteen Y DC coefficients.
Blocking and ringing artifacts are reduced as a result of the smaller block size used by H.264. The use of integer coefficients eliminates rounding errors that cause drifting artifacts common with DCT-based video codecs.
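The integer-only nature of the transform can be seen in a short sketch. It uses the core matrix commonly given for the H.264 4×4 forward transform and computes Y = C·X·Cᵀ with plain integer arithmetic; the scaling that would make this an orthonormal transform is assumed to be folded into quantization and is not shown. Helper names are illustrative.

```python
# Core matrix of the 4x4 forward integer transform (commonly cited form).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_transform_4x4(block):
    """Compute CF * block * CF^T using only integer operations."""
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    for row in forward_transform_4x4(flat):
        print(row)
```

A flat block produces a single nonzero DC coefficient, and because every step is an integer addition, subtraction, or doubling, encoder and decoder can reproduce each other's results exactly.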
For quantization, H.264 uses a set of 52 uniform scalar quantizers, with a step increment of about 12.5% between each.
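The relationship between the quantization parameter (QP, 0 to 51) and the quantizer step size can be sketched with the commonly cited rule that the step size doubles every six QP values. The six base step sizes below are the usual published values; treat the table and function name as an illustration rather than a normative definition.

```python
# Step sizes for QP 0..5; every further 6 QP values double the step size.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
    average_ratio = (qstep(51) / qstep(0)) ** (1 / 51)
    print(f"average increase per step: {(average_ratio - 1) * 100:.1f}%")
```

Doubling every six steps works out to an average increase of roughly 12% per step, in line with the figure quoted above.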
The quantized coefficients are then scanned, from low frequency to high frequency, using one of two scan orders.
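As an illustration of one of the two scan orders, the sketch below applies the familiar 4×4 zig-zag pattern, which visits coefficients from the top-left (low frequency) to the bottom-right (high frequency). The alternate scan order is not shown, and the names are illustrative.

```python
# Raster indices of a 4x4 block in zig-zag order.
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def zigzag_scan(block):
    """Flatten a 4x4 block of coefficients into zig-zag order."""
    flat = [value for row in block for value in row]
    return [flat[index] for index in ZIGZAG_4X4]


if __name__ == "__main__":
    quantized = [
        [7, 3, 0, 0],
        [2, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(zigzag_scan(quantized))
```

Because quantization tends to zero out the high-frequency coefficients, the scan groups the nonzero values at the start of the list and leaves a long run of trailing zeros for the entropy coder.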
After quantization and zig-zag scanning, H.264 uses two types of entropy encoding: variable-length coding (VLC) and Context Adaptive Binary Arithmetic Coding (CABAC).
For everything but the transform coefficients, H.264 uses a single Universal VLC (UVLC) table that uses an infinite-extent codeword set (Exponential Golomb). Instead of multiple VLC tables as used by other video codecs, only the mapping to the single UVLC table is customized according to statistics.
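The Exponential Golomb codeword set is simple enough to sketch in a few lines. The functions below encode and decode unsigned values as bit strings and show one common mapping of signed values onto the same codewords; the function names are illustrative, and bit strings are used instead of a real bit writer for readability.

```python
def ue_encode(value):
    """Unsigned Exp-Golomb: N leading zeros, then the (N+1)-bit value of value+1."""
    code_num = value + 1
    leading_zeros = code_num.bit_length() - 1
    return "0" * leading_zeros + format(code_num, "b")


def ue_decode(bits):
    """Decode one unsigned Exp-Golomb codeword from the start of a bit string."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[leading_zeros:2 * leading_zeros + 1], 2) - 1


def se_to_ue(value):
    """Map a signed value to an unsigned code number (0, 1, -1, 2, -2, ...)."""
    return 2 * value - 1 if value > 0 else -2 * value


if __name__ == "__main__":
    for n in range(6):
        print(n, ue_encode(n))
```

Small values get short codewords (0 is a single bit) and the set extends indefinitely, which is why only the mapping from syntax elements to code numbers needs to be tuned to the statistics.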
For transform coefficients, which consume most of the bandwidth, H.264 uses Context-Adaptive Variable Length Coding (CAVLC). Based upon previously processed data, the best VLC table is selected.
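One way this context adaptation is commonly described is table selection for the coefficient count: the number of nonzero coefficients in the already coded left and upper neighboring blocks predicts how many to expect in the current block, and that prediction selects the table. The thresholds below follow that common description, and the table labels and function names are placeholders for illustration.

```python
def predicted_nonzero_count(left_count, top_count):
    """Predict the nonzero coefficient count from neighboring blocks.

    Pass None for a neighbor that is unavailable.
    """
    if left_count is not None and top_count is not None:
        return (left_count + top_count + 1) >> 1  # rounded average
    if left_count is not None:
        return left_count
    if top_count is not None:
        return top_count
    return 0


def select_table(predicted_count):
    """Choose a coding table label based on the predicted count."""
    if predicted_count < 2:
        return "VLC0"  # tuned for blocks with few coefficients
    if predicted_count < 4:
        return "VLC1"
    if predicted_count < 8:
        return "VLC2"
    return "FLC"       # fixed-length code for busy blocks


if __name__ == "__main__":
    print(select_table(predicted_nonzero_count(1, 2)))
```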
The MPEG compression standards have contributed greatly to the proliferation of video in today's devices. This chapter covered the major features of the most common standards: