Digital video is a large subject that draws upon an equally large number of technologies. Fortunately, every process can be broken down into smaller steps, each of which is relatively easy to follow. The main difficulty in studying the subject is to appreciate where the small steps fit into the overall picture. The purpose of this chapter is to summarise the subject and to point out how those technologies are used, without attempting any explanations beyond the basics. More comprehensive explanations can be found in the later chapters that are identified here.

TELEVISION SYSTEMS

Ultimately, the purpose of television is to amuse human eyes and ears. Without an understanding of the human visual system (HVS) and the human auditory system (HAS) little progress is to be expected. This book will have something to say about those systems and their expectations in Chapters 2 and 7. Whatever the technology used, television systems all have the same basic requirements. As a minimum, equipment is needed to acquire moving images and associated sounds and to broadcast them. Whilst this would be sufficient for a live event, many programs are prepared in advance and this requires storage devices. Cameras may have a built-in storage device or send the acquired images to a separate device by cable or radio link. The acquired information may be transferred to an intermediate storage device for editing and other production steps in a process called ingestion, and the completed program may be transferred again to another storage device for playout to air.

Thus television systems need equipment that can acquire, store, edit, and manipulate video signals as well as being able to send them from one place to another. In addition to the traditional terrestrial and satellite broadcasting channels, images may also be distributed by networks, including the Internet, and by media such as DVDs (digital video disc or digital versatile disc).

The first methods used in television transmission and storage were understandably analog, and the signal formats essentially were determined by requirements of the cathode ray tube (CRT) as a display, so that the receiver might be as simple as possible and be constructed with a minimum number of vacuum tubes. Following the development of magnetic audio recording during World War II, a need for television recording was perceived. This was initially due to the various time zones across the United States. Without recording, popular television programs had to be performed live several times so they could be seen at the peak viewing time in each time zone. Ampex eventually succeeded in recording monochrome video in the 1950s, and the fundamentals of the Quadruplex machine were so soundly based that subsequent analog video recorders only refined the process.1 Digital technology took over not because these devices did not work, but because it required less maintenance and worked more quickly and at lower cost, whilst allowing new freedoms.

There are two powerful factors that influence television equipment and it is vital to understand them. The first is that there is a large installed base of receivers and it is unpopular and unwise to adopt new techniques that make them obsolete overnight. Thus television standards move rather slowly, and new ones tend to be compatible with previous ones. The introduction of colour television was done in such a way that existing sets would still display the monochrome part of the signal. Digital video signals tend to be compatible with existing analog signals so that, for example, a digital editing device can be used in an analog TV station, or a traditional TV can display signals from a digital set top box or integrated receiver-decoder (IRD).

The second factor is the steady progress of microelectronics and data storage, whereby the cost of performing a given process or of storing a given quantity of information continues to fall. Thus equipment becomes widely available when it is economical, and not necessarily when it is invented.

VIDEO SIGNALS

Video signals are electrical waveforms that allow moving pictures to be conveyed from one place to another. Observing the real world with the human eye results in a two-dimensional image on the retina. This image changes with time and so the basic information is three-dimensional. With two eyes a stereoscopic view can be obtained and stereoscopic television is possible with suitable equipment. The experimental demonstrations the author has seen were breathtaking, but use is restricted to specialist applications and has yet to be exploited in broadcasting.

An electrical waveform is two-dimensional in that it carries a voltage changing with respect to time. To convey three-dimensional picture information down a two-dimensional cable it is necessary to resort to scanning. Instead of attempting to convey the brightness of all parts of a picture at once, scanning conveys the brightness of a single point that moves with time, typically along a series of near-horizontal lines known as a raster. After the image is scanned once, the process repeats at the frame rate. If the frame rate is high enough, the HVS believes that it is most likely to be seeing moving objects instead of a rapid series of still images. The layman may go to the movies, but the pictures do not actually move.

One of the vital concepts to grasp is that digital video is simply an alternative means of carrying a video signal. Although there are a number of ways in which this can be done, there is one system, known as pulse code modulation (PCM), that is in virtually universal use.2 Figure 1.1 shows how PCM works. Instead of being continuous, the distance across the image is represented in a discrete, or stepwise, manner. The image is not carried by continuous representation, but by measurement at regular intervals. This process is called sampling and the frequency with which samples are taken is called the sampling rate or sampling frequency, Fs. The sampling frequency may be measured in samples per second or samples per millimetre. The former is a temporal sampling frequency, whereas the latter is a spatial sampling frequency. The temporal sampling frequency is obtained by multiplying the spatial sampling frequency by the scanning speed.

FIGURE 1.1

When a signal is carried in numerical form, either parallel or serial, the mechanisms of Figure 1.4 ensure that the only degradation is in the conversion process.

It should be stressed that sampling is an analog process. Each sample still varies infinitely as the original waveform did. To complete the conversion to PCM, the magnitude of each sample is then represented to finite accuracy by a discrete number in a process known as quantizing.
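Neither step requires exotic mathematics. The following Python sketch (the function name and the sine-wave test input are illustrative assumptions, not from any standard) shows both stages: sampling at a rate Fs, then quantizing each sample to an 8-bit whole number.

```python
import math

def pcm_encode(signal, fs, duration, levels=256):
    """Sample a continuous signal at rate fs, then quantize each
    sample to one of a fixed number of levels (here 8 bits, 256)."""
    samples = []
    for i in range(int(fs * duration)):
        t = i / fs                                 # sampling: discrete time axis
        v = signal(t)                              # the sample is still analog here
        q = round((v + 1.0) / 2.0 * (levels - 1))  # quantizing: map -1..+1 to 0..255
        samples.append(q)
    return samples

# A 50 Hz test wave sampled at 1 kHz keeps the numbers small.
codes = pcm_encode(lambda t: math.sin(2 * math.pi * 50 * t), fs=1000, duration=0.02)
print(codes[:10])
```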

DIGITAL VIDEO

In television systems the input image that falls on the camera sensor will be continuous in time and continuous in two spatial dimensions corresponding to the height and width of the sensor. In analog video systems, the time axis is sampled into frames, and the vertical axis is sampled into scanned lines. Digital video uses a third sampling process whereby a continuous line is replaced by a row of picture elements or pixels. Such a system was first proposed by Ayrton and Perry as early as 1880, a century before it could widely be adopted.

The number of pixels in an image is very large, so it is clearly impractical to send each one down its own wire. Instead, the pixels are sent sequentially. It is common for the pixel rate to be an integer multiple of the line scanning rate. Pixels then appear in the same place on every line, and a monochrome digital image is a rectangular array of pixels at which the brightness is stored as a number. As shown in Figure 1.2a, the array will generally be arranged with an even spacing between pixels, which are in rows and columns. By placing the pixels close together, it is hoped that the observer will perceive a continuous image.
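As a sketch of what this means in practice, a raster stored as a sequence of brightness values can be indexed by row and column; the raster size and helper function below are hypothetical:

```python
WIDTH, HEIGHT = 720, 576           # an assumed SD-like raster
frame = bytearray(WIDTH * HEIGHT)  # one brightness byte per pixel

def set_pixel(row, col, brightness):
    # Pixels are sent and stored sequentially, line by line,
    # so the address of (row, col) is row * WIDTH + col.
    frame[row * WIDTH + col] = brightness

set_pixel(0, 0, 255)     # top-left pixel at peak brightness
set_pixel(287, 359, 16)  # a pixel near the centre of the image
```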

In a completely digital system, the number of pixels and the aspect ratio of the image may be chosen with considerable freedom. However, where compatibility with an existing analog standard is needed, the digital version may be no more than the result of passing the analog video signal into an analog-to-digital convertor (ADC) having a suitable sampling rate.

Those who are not familiar with digital principles often worry that sampling takes away something from a signal because it does not take notice of what happened between the samples. This would be true in a system having infinite bandwidth, but no signal can have infinite bandwidth. All analog signal sources from cameras, VTRs, and so on have a resolution or frequency response limit, as indeed do devices such as CRTs and human vision. When a signal has finite bandwidth, the rate at which it can change is limited, and the way in which it changes becomes predictable. When a waveform can change between samples in only one way, it is then necessary to convey only the samples, and the original waveform can unambiguously be reconstructed from them. A more detailed treatment of the principle will be given in Chapter 4.

FIGURE 1.2

(a) A picture can be stored digitally by representing the brightness at each of the points shown by a binary number. For a colour picture each point becomes a vector and has to describe the brightness, hue, and saturation of that part of the picture. Samples are usually but not always formed into regular arrays of rows and columns, and it is most efficient if the horizontal spacing and vertical spacing are the same. (b) In the case of component video, each pixel site is described by three values and so the pixel becomes a vector quantity.

As stated, each sample is also discrete, or represented in a stepwise manner. The magnitude of the sample, which will be proportional to the voltage of the video signal, is represented by a whole number. This quantizing results in an approximation, but the size of the error can be controlled until it is negligible. If, for example, we were to measure the height of humans to the nearest metre, virtually all adults would register 2 metres high and obvious difficulties would result.

BINARY CODING

Humans prefer to use numbers expressed to the base of 10, having evolved with that number of digits. Other number bases exist; most people are familiar with the duodecimal system, which uses the dozen and the gross. The most minimal system is binary, which has only two digits, 0 and 1. Binary digits are universally contracted to bits. These are readily conveyed in switching circuits by an “on” state and an “off” state. With only two states, there is little chance of error.

In decimal systems, the digits in a number (counting from the right, or least significant, end) represent 1’s, 10’s, 100’s, 1000’s, etc. Figure 1.3 shows that in binary, the bits represent 1, 2, 4, 8, 16, etc. A multidigit binary number is commonly called a word, and the number of bits in the word is called the word length. The right-hand bit is called the least significant bit (LSB), whereas the bit on the left-hand end of the word is called the most significant bit (MSB). Clearly more digits are required in binary than in decimal, but they are more easily handled. A word of 8 bits is called a byte, which is a contraction of “by eight.” The capacity of memories and storage media is measured in bytes, but to avoid large numbers, kilobytes, megabytes, and gigabytes are often used. As memory addresses are themselves binary numbers, the word length limits the address range. The range is found by raising 2 to the power of the word length. Thus a 4-bit word has 16 combinations and could address a memory having 16 locations. A 10-bit word has 1024 combinations, which is close to 1000. In digital terminology, 1 K = 1024, so a kilobyte of memory contains 1024 bytes. A megabyte (1 MB) contains 1024 kilobytes and a gigabyte contains 1024 megabytes.
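These relationships are easily verified; the short sketch below (in Python, here and throughout this chapter's examples) prints the number of combinations for a few word lengths and the binary definitions of the kilobyte, megabyte, and gigabyte:

```python
# A word of n bits has 2**n combinations, which is also the number
# of memory locations an n-bit address can reach.
for bits in (4, 8, 10, 16):
    print(bits, "bits ->", 2 ** bits, "combinations")

# In digital terminology 1 K = 1024, so:
KB = 2 ** 10       # 1024 bytes
MB = 2 ** 10 * KB  # 1024 kilobytes
GB = 2 ** 10 * MB  # 1024 megabytes
print(KB, MB, GB)
```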

FIGURE 1.3

In a binary number, the digits represent increasing powers of 2 starting from the LSB. Also defined here are MSB and word length. When the word length is 8 bits, the word is a byte. Binary numbers are used as memory addresses, and the range is defined by the address word length. Some examples are shown here.

The difficulties of the height-measurement example are generally overcome by measuring to the nearest centimetre. Clearly there is no advantage in going further and expressing our height in a whole number of millimetres or even micrometres. The point is that an appropriate resolution can also be found for video signals, and a higher figure is not beneficial. The link between video quality and sample resolution is explored in Chapter 4. The advantage of using whole numbers is that they are not prone to drift. If a whole number can be carried from one place to another without numerical error, it has not changed at all. By describing video waveforms numerically, the original information has been expressed in a way that is better able to resist unwanted changes.

Essentially, digital video carries the original image numerically. The number of the pixel is an analog of its location on the screen, and the magnitude of the sample is (in the case of luminance) an analog of the brightness at the appropriate point in the image. In fact the series of pixels along a line in a digital system is only a sampled version of the waveform that an analog system would have produced.

As both axes of the digitally represented waveform are discrete, that waveform can accurately be restored from numbers as if it were being drawn on graph paper. If we require greater accuracy, we simply choose paper with smaller squares. Clearly more numbers are then required and each one could change over a larger range.

Digital systems almost universally use binary coding in which there are only two symbols, 0 and 1. This is because binary is easiest to represent by real phenomena such as electrical, magnetic, or optical signals.

In a digital video system, the whole number representing the value of the sample is expressed in binary. The signals sent have two states and change at predetermined times according to some stable clock. Figure 1.4 shows the consequences of this form of transmission. If the binary signal is degraded by noise, this will be rejected by the receiver, which judges the signal solely by whether it is above or below the halfway threshold, a process known as slicing. The signal will be carried in a channel with finite bandwidth, and this limits the slew rate of the signal; an ideally upright edge is made to slope. Noise added to a sloping signal can change the time at which the slicer judges that the level passed through the threshold. This effect is also eliminated when the output of the slicer is reclocked. However many stages the binary signal passes through, it still comes out the same, only later.
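A minimal sketch of slicing, assuming a binary signal normalised so that the threshold sits halfway at 0.5 and that the added noise never exceeds half the signal swing:

```python
import random

def slicer(received, threshold=0.5):
    """Judge each received level solely by whether it is above or
    below the threshold; amplitude noise is thereby rejected."""
    return [1 if level > threshold else 0 for level in received]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = [b + random.uniform(-0.3, 0.3) for b in bits]  # noise below half the swing
assert slicer(noisy) == bits  # the binary meaning is unchanged
```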

Video samples represented by whole numbers can reliably be carried from one place to another by such a scheme, and if the number is correctly received, there has been no loss of information en route.

FIGURE 1.4

(a) A binary signal is compared with a threshold and reclocked on receipt; thus the meaning will be unchanged. (b) Jitter on a signal can appear as noise with respect to fixed timing. (c) Noise on a signal can appear as jitter when compared with a fixed threshold.

There are two ways in which binary signals can be used to carry samples and these are also shown in Figure 1.1. When each digit of the binary number is carried on a separate wire this is called parallel transmission. The state of the wires changes at the sampling rate. Using multiple wires is cumbersome and it is preferable to use a single wire in which successive digits from each sample are sent serially. This is the definition of pulse code modulation. Clearly the bit clock frequency must now be higher than the sampling rate.

STANDARD AND HIGH-DEFINITION VIDEO

In Through the Looking-Glass, Humpty Dumpty said, “When I use a word, it means exactly what I want it to mean.” The same principle will be found in television. It should be appreciated that when the 405-line monochrome TV service was initiated in the UK, it was referred to as high definition. Today, the term “standard definition” (SD) refers to the TV scanning standards that were used in broadcasting over the last third of the 20th century. These have 525 or 625 lines per frame, but use interlaced scanning, which means that their effective definition on moving pictures is considerably less than the number of lines would suggest.

When the term “high definition” (HD) is encountered, one has to be aware that the writer may be using his own definition. It is this author's opinion that “high definition” is a poor term because it implies that the only attribute a moving picture can have is static definition and that all that is necessary is to increase it.

Nothing could be further from the truth. Instead the recent progress that has been made in understanding the HVS should be respected. This work suggests that there is a set of parameters that require attention to improve television picture quality and that static definition is only one of them. Clearly if the static definition alone is improved, the system may simply be better able to reveal other shortcomings to the viewer. The author has seen plenty of systems like that, which do not seem to be a real advance.

In general all one can assume from the term “high definition” is that the picture has more lines in it than the SD systems have. Comparisons of system performance based on the number of lines alone give misleading results. This is because some HD systems use interlaced scanning and some do not. The reader will find a comprehensive essay comparing interlaced with progressive scanning in Chapter 2, which will conclude that it is unsuitable for systems with a large number of lines, to the extent that “interlaced HD” is virtually an oxymoron.

The European Broadcasting Union (EBU) recommended in 2004 that European HD services should use progressive scan.

It should be noted that the way the line count is measured has changed. For example, in 625-line SDTV, there are 625 lines in the entire frame period, but some of these are lost to CRT retrace or flyback and do not appear on the screen. In modern systems only the number of lines actually visible is quoted. This makes sense because only visible lines need to be transmitted in digital systems and in many modern displays there is no flyback mechanism.

Obviously the finer the pixel spacing, the greater the resolution of the picture will be, but the amount of data needed to store one picture, and the cost, will increase as the square of the resolution. A further complication is that HDTV pictures have a wider aspect ratio, increasing the pixel count further.

Without the use of compression, high-quality SDTV requires around 200 million bits per second, whereas HDTV requires more like a gigabit per second. Clearly digital video production could become commonplace only when such data rates could be handled economically. Consumer applications and broadcasting could become possible only when compression technology became available to reduce the data rate. Chapter 6 deals with video compression.

COLOUR

Colorimetry will be treated in depth in Chapter 2 and only the basics will be introduced here. Colour is created in television by the additive mixing in the display of three primary colours: red, green, and blue (RGB). Effectively, the display needs to be supplied with three video signals, each representing a primary colour. Because practical colour cameras generally also have three separate sensors, one for each primary colour, a camera and a display can be directly connected. RGB consists of three parallel signals, each requiring the same bandwidth, and is used where the highest accuracy is needed. RGB is not used for broadcast applications because of the high cost.

If RGB is used in the digital domain, it will be seen from Figure 1.2b that each image consists of three superimposed layers of samples, one for each primary colour. The pixel is no longer a single number representing a scalar brightness value, but a vector that describes in some way the brightness, hue, and saturation of that point in the picture. In RGB, the pixels contain three unipolar numbers representing the proportion of each of the three primary colours at that point in the picture.

Some saving of bandwidth can be obtained by using colour difference working. The HVS relies on brightness to convey detail, and much less resolution is needed in the colour information. Accordingly R, G, and B are matrixed together to form a luma (and monochrome-compatible) signal Y, which alone needs full bandwidth. The eye is not equally sensitive to the three primary colours, as can be seen in Figure 1.5, and so the luma signal is a weighted sum.

The matrix also produces two colour difference signals, R–Y and B–Y. Colour difference signals do not need the same bandwidth as Y, because the eye's acuity is much reduced in colour vision. One-half or one-quarter of the bandwidth will do, depending on the application.

In the digital domain, each pixel again contains three numbers, but one of these is a unipolar number representing the luma and the other two are bipolar numbers representing the colour difference values. As the colour difference signals need less bandwidth, in the digital domain this translates to the use of a lower sampling rate, typically between one-half and one-sixteenth that of the luma.
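As an illustration, the luma weighting and the two colour difference signals can be computed per pixel. The weights shown are the Rec. 601 values commonly used for standard definition; R, G, and B are assumed to be in the range 0 to 1:

```python
def rgb_to_colour_difference(r, g, b):
    # Luma: a weighted sum reflecting the eye's unequal sensitivity
    # to the three primaries (Rec. 601 weights).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # The colour difference signals are bipolar: zero on grey.
    return y, r - y, b - y

y, r_y, b_y = rgb_to_colour_difference(1.0, 0.0, 0.0)  # pure red
print(y, r_y, b_y)  # 0.299, +0.701, -0.299
```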

FIGURE 1.5

The response of the human eye to colour is not uniform.

For monochrome-compatible analog colour television broadcasting, the NTSC, PAL, and SECAM systems interleave into the spectrum of a monochrome signal a subcarrier that carries two colour difference signals of restricted bandwidth. The subcarrier is intended to be invisible on the screen of a monochrome television set. A subcarrier-based colour system is generally referred to as composite video, and the modulated subcarrier is called chroma. Composite video is classified as a compression technique because it allows colour pictures in the same bandwidth as monochrome.

In the digital domain the use of composite video is not appropriate for broadcasting and the most common equivalent process is the use of an MPEG compression scheme. From a broadcasting standpoint MPEG is simply a more efficient digital replacement for composite video.

CONVERGENCE OF VIDEO AND INFORMATION TECHNOLOGY

When professional digital video equipment first became available in the 1980s, it was very expensive and was attractive only to those who needed the benefits that it offered. At that time compression was little used and most equipment operated with the full PCM bit rate, offering exceptional picture quality. The generation loss of analog videotape was eliminated, as PCM digital copies are indistinguishable from the original.

However, once video has been digitized, it differs from generic data only by having an implicit time base. Thus, in principle, computers, now known as information technology (IT), could handle video. The cost of computing devices fell and the processing power and storage capacity rose. Compression algorithms such as MPEG were developed. It became first possible, then easy, then trivial to use IT not just for video production but also for consumer equipment, at least in standard definition.

Whilst digital video allows extremely high picture quality, that potential was rarely exploited. Instead the freedom of information technology transformed the way in which video was produced, distributed, and viewed, and in most cases the picture quality was actually worse than that of an analog off-air picture, with the possible exception of well-made DVDs.

TWO'S COMPLEMENT

In the two's complement system, the upper half of the pure binary number range has been redefined to represent negative quantities. This allows digital codes to represent bipolar values found in audio and colour difference signals. If a pure binary counter is constantly incremented and allowed to overflow, it will produce all the numbers in the range permitted by the number of available bits, and these are shown for a 4-bit example drawn around the circle in Figure 1.6. As a circle has no real beginning, it is possible to consider it as starting wherever it is convenient. In two's complement, the quantizing range represented by the circle of numbers does not start at 0, but starts on the diametrically opposite side of the circle. Zero is midrange, and all numbers with the MSB set are considered negative. The MSB is thus the equivalent of a sign bit, where 1 = minus. Two's complement notation differs from pure binary in that the most significant bit is inverted to achieve the half-circle rotation.

Figure 1.7 shows how a real ADC is configured to produce two's complement output. At (a) an analog offset voltage equal to one-half the quantizing range is added to the bipolar analog signal to make it unipolar as at (b).

FIGURE 1.6

In this example of a 4-bit two's complement code, the number range is from −8 to +7. Note that the MSB determines polarity.

FIGURE 1.7

A two's complement ADC. In (a) an analog offset voltage equal to one-half the quantizing range is added to the bipolar analog signal to make it unipolar as in (b). The ADC produces positive-only numbers (c), but the MSB is then inverted (d) to give a two's complement output.

The ADC produces positive-only numbers at (c), which are proportional to the input voltage. The MSB is then inverted at (d) so that the all-0’s code moves to the centre of the quantizing range. The analog offset is often incorporated into the ADC, as is the MSB inversion. Some convertors are designed to be used in either pure binary or two's complement mode. In this case the designer must arrange the appropriate DC conditions at the input. The MSB inversion may be selectable by an external logic level. In the digital video interface standards the colour difference signals use offset binary because the codes of all 0’s and all 1’s are at the ends of the range and can be reserved for synchronising. A digital vision mixer simply inverts the MSB of each colour difference sample to convert it to two's complement.

The two's complement system allows two sample values to be added, or “mixed,” in video parlance, and the result will be referred to the system midrange; this is analogous to adding analog signals in an operational amplifier.

Figure 1.8 illustrates how adding two's complement samples simulates a bipolar mixing process. The waveform of input A is depicted by solid black samples and that of B by samples with a solid outline. The result of mixing is the linear sum of the two waveforms obtained by adding pairs of sample values. The dashed lines depict the output values. Beneath each set of samples is the calculation, which will be seen to give the correct result. Note that the calculations are pure binary. No special arithmetic is needed to handle two's complement numbers.

It is sometimes necessary to phase reverse or invert a digital signal. The process of inversion in two's complement is simple. All bits of the sample value are inverted to form the one's complement, and 1 is added. This can be checked by mentally inverting some of the values in Figure 1.6. The inversion is transparent, and performing a second inversion gives the original sample values.
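A sketch of these operations with the 4-bit words of Figure 1.6 (the helper names are hypothetical):

```python
BITS = 4
MASK = (1 << BITS) - 1  # 0b1111

def to_twos(value):
    """Encode a value in the range -8..+7 as a 4-bit word."""
    return value & MASK

def from_twos(word):
    """A word with its MSB set represents a negative quantity."""
    return word - (1 << BITS) if word & (1 << (BITS - 1)) else word

# Mixing is pure binary addition; any carry out of the word is discarded.
a, b = to_twos(-3), to_twos(5)
print(from_twos((a + b) & MASK))  # 2

# Inversion: complement all bits (one's complement), then add 1.
print(from_twos(((to_twos(4) ^ MASK) + 1) & MASK))  # -4
```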

FIGURE 1.8

Using two's complement arithmetic, single values from two waveforms are added together with respect to midrange to give a correct mixing function.

Using inversion, signal subtraction can be performed using only adding logic. The inverted input is added to perform a subtraction, just as in the analog domain. This permits a significant saving in hardware complexity, because only carry logic is necessary and no borrow mechanism need be supported.

In summary, two's complement notation is the most appropriate scheme for bipolar signals and allows simple mixing in conventional binary adders. It is in virtually universal use in digital video and audio processing.

Two's complement numbers can have a radix point and bits below it, just as pure binary numbers can. It should, however, be noted that in two's complement, if a radix point exists, numbers to the right of it are added. For example, 1100.1 is not −4.5; it is −4 + 0.5 = −3.5.

Returning to the convergence of video and IT, a mere comparison of digital and analog video misses the point. The most exciting aspects of digital video are the tremendous possibilities that are denied to analog technology. Networks, error correction, random access, compression, motion estimation, and interpolation are difficult or impossible in the analog domain, but are straightforward in the digital domain.

Systems and techniques developed in other industries for other purposes can be used to store, process, and transmit video. Computer equipment is available at low cost because the volume of production is far greater than that of professional video equipment. Disk drives and memories developed for computers can be put to use in video products.

As the power of processors increases, it becomes possible to perform under software control processes that previously required dedicated hardware. This causes a dramatic reduction in hardware cost. Inevitably the very nature of broadcast equipment and the ways in which it is used are changing along with the manufacturers who supply it. The computer industry is taking over from traditional broadcast manufacturers, because it has the economics of mass production on its side.

Whereas tape is a linear medium and it is necessary to wait for the tape to wind to a desired part of the recording, the head of a hard disk drive can access any stored data in milliseconds. This is known in computers as direct access and in broadcasting as nonlinear access. As a result the nonlinear editing workstation based on hard drives has eclipsed the use of videotape for editing.

Communications networks developed to handle data can happily carry digital video and accompanying audio over indefinite distances without quality loss. Techniques such as ADSL allow compressed digital video to travel over a conventional telephone line to the consumer.

Digital TV broadcasting uses coding techniques to eliminate the interference, fading, and multipath reception problems of analog broadcasting. At the same time, more efficient use is made of available bandwidth.

One of the fundamental requirements of computer communication is that it is bi-directional. When this technology becomes available to the consumer, services such as video-on-demand and interactive video become possible. Television programs may contain metadata that allows the viewer rapidly to access web sites relating to items mentioned in the program. When the TV set is a computer there is no difficulty in displaying both on the same screen.

Increasingly the viewer will be deciding what and when to watch instead of passively accepting the broadcaster's output. With a tape-based VCR, the consumer was limited to time-shifting broadcast programs that could not be viewed until recording was over. Now that the hard drive-based consumer VCR, or personal video recorder (PVR), is available, the consumer has more power. For example, he or she may never watch another TV commercial again. The consequences of this technology are far-reaching.

FIGURE 1.9

The TV set of the future may look something like this.

Figure 1.9 shows what the television set of the future may look like. MPEG compressed signals may arrive in real time by terrestrial or satellite broadcast, via a cable, or on media such as DVD. The TV set is simply a display, and the heart of the system is a hard drive-based server. This can be used to time-shift broadcast programs, to skip commercial breaks, or to assemble requested movies transmitted in non-real time. If equipped with a web browser, the server may explore the web looking for material of the same kind the viewer normally watches. As the cost of storage falls, the server may download this material speculatively.

Note that when the hard drive is used to time-shift or record, it simply stores the MPEG bitstream. On playback the bitstream is decoded and the picture quality will be as good as the original MPEG coder allowed. The generation loss due to using an analog VCR is eliminated.

The worlds of digital video, digital audio, film, communication, and computation are now closely related, and that is where the real potential lies. The time when television was a specialist subject that could evolve in isolation from other disciplines is long gone; digital technology has made sure of that. Video has now become a branch of IT. Importantly, the use of digital technology in filmmaking, if it can still be called that, is also widespread and it too is now a branch of IT, especially electronic cinema in which the “film” arrives as a data file over a secure network.

Ultimately digital technology will change the nature of television broadcasting out of recognition. Once the viewer has nonlinear storage technology and electronic program guides, the broadcaster's transmitted schedule is irrelevant. Increasingly viewers will be able to choose what is watched and when, rather than the broadcaster deciding for them. The broadcasting of conventional commercials will cease to be effective when viewers have the technology to skip them. Viewers can also download video over the Internet.

Anyone with a web site and a suitable file server can become a “broadcaster.” Given that the majority of TV sets and computers are and will remain powered by house current, and that an Internet or broadband connection will soon be as ubiquitous as a power socket, it seems difficult to justify using radio signals to communicate with fixed receiving devices. The development of digital video broadcasting to handheld devices (DVB-H) may be an indicator of what will happen.

BASICS: TRANSMISSION, STORAGE, AND COMPRESSION

These three technologies are strongly related. Transmission of digital video, or of any type of data, requires that the value of the bits results in discrete changes of state of some parameter in the transmission channel. This may be the voltage in a cable, the intensity of light in an optical fibre, or the amplitude or phase of a radio signal. The receiving device must be able to decode the original data bits from the signal that may have suffered various forms of degradation on the way. It is practically impossible to modulate a transmission directly with the state of the bits. A run of identical bits would cause a constant signal and the receiver would lose count. Instead a modulation scheme or channel code is used at the transmitter, with a matching decoding scheme at the receiver.
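Channel coding proper is deferred to Chapter 8, but one simple code illustrates the idea. The sketch below implements biphase mark coding, in which the level always changes at the start of a bit cell, so even a long run of identical bits can never produce a constant signal:

```python
def biphase_mark_encode(bits):
    """Each bit occupies two half-cells. The level always toggles at
    the start of a cell; a 1 toggles again mid-cell, a 0 does not."""
    level, out = 0, []
    for bit in bits:
        level ^= 1     # guaranteed transition: the receiver never loses count
        out.append(level)
        if bit:
            level ^= 1  # extra mid-cell transition encodes a 1
        out.append(level)
    return out

print(biphase_mark_encode([0, 0, 0, 1, 1]))  # no constant runs, even for 000
```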

If the output waveform of a channel coder is recorded, the result is a storage device. Channel codes can be optimized for the characteristics of magnetic or optical disks, magnetic tapes, and so on. Chapter 8 considers channel coding.

The rate of transmission is always limited by the available bandwidth. This limitation is the most serious in terrestrial broadcasting, in which the radio spectrum must be shared with other services. Storage devices are limited by their capacity. In both cases an apparent improvement can be had using compression. A compressor produces an impression of the image that uses fewer bits and thus allows an extension of playing time in storage devices or a reduction of bandwidth in transmission. Chapter 6 discusses video compression techniques.

As an alternative to compression, where a transmission bandwidth limit exists, data may be sent at a lower than normal bit rate, instead of in real time, and stored at the receiver. When the entire message has been received, the storage device can then replay in real time. Clearly the transmission does not need to be at a fixed bit rate, or even continuous, provided all of the data are received. Such a mechanism is ideally suited to networks, where a short interruption of delivery due to congestion does not affect a message that is only being stored.

There is no such thing as an ideal channel, either transmitted or recorded. Real channels cause some degree of timing error and some proportion of data bits may be incorrect. These deficiencies are addressed by time base correction and error correction. Time base correction requires temporary storage in memory that is then read with a stable clock. Error correction is achieved by adding check bits at the encoder and by comparing them with the data at the decoder. Paradoxically, compressed data are more sensitive to error and need more check bits.

TIME COMPRESSION AND PACKETISING

When real-time signals such as audio and video are converted, the ADC must run at a constant and correct clock rate and it outputs an unbroken stream of samples during the active line. Following a break during blanking, the sample stream resumes. Time compression allows the sample stream to be broken into blocks for convenient handling.

Figure 1.10 shows an ADC feeding a pair of RAMs (random access memories). When one is being written by the ADC, the other can be read, and vice versa. As soon as the first RAM is full, the ADC output switches to the input of the other RAM so that there is no loss of samples. The first RAM can then be read at a higher clock rate than the sampling rate. As a result the RAM is read in less time than it took to write it, and the output from the system then pauses until the second RAM is full. The samples are now time compressed. Instead of being an unbroken stream that is difficult to handle, the samples are now subdivided into smaller groups with convenient pauses in between them. In network transmission, these groups are referred to as packets, whereas in recording, they are commonly known as blocks. The pauses allow numerous processes to take place. A rotary head recorder might spread the data from a frame over several tape tracks; a hard disk might move its heads to another track. In all types of recording and transmission, the time compression of the samples allows time for synchronising patterns, subcode, and error-correction check words to be inserted.
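The effect, if not the two-RAM hardware, is easily sketched: an unbroken sample stream is gathered into blocks, and the gaps between blocks become available for other purposes (the block size here is arbitrary):

```python
def time_compress(samples, block_size):
    """Split a real-time sample stream into discrete blocks; the gaps
    between blocks are free for sync, subcode, and error-check words."""
    for i in range(0, len(samples), block_size):
        yield samples[i:i + block_size]

stream = list(range(12))
for block in time_compress(stream, 4):
    print("block:", block, "... gap available here ...")
```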

FIGURE 1.10

In time compression, the unbroken real-time stream of samples from an ADC is broken up into discrete blocks. This is accomplished by the configuration shown here. Samples are written into one RAM at the sampling rate by the write clock. When the first RAM is full, the switches change over, and writing continues into the second RAM whilst the first is read using a higher-frequency clock. The RAM is read faster than it was written and so all of the data will be output before the other RAM is full. This opens spaces in the data flow, which are used as described in the text.

FIGURE 1.11

Time compression is used to shorten the length of track needed by the video. Heavily time-compressed audio samples can then be recorded on the same track using common circuitry.

In digital VTRs, the video data are time compressed so that part of the track is left for audio data. Figure 1.11 shows that heavy time compression of the audio data raises the data rate up to that of the video data so that the same tracks, same heads, and much common circuitry can be used to record both.

Subsequently, any time compression can be reversed by time expansion. Samples are written into a RAM at the incoming clock rate, but read out at the standard sampling rate. Unless there is a design fault, time compression is totally undetectable. In a recorder, the time expansion stage can be combined with the time base correction stage so that speed variations in the medium can be eliminated at the same time. The use of time compression is universal in digital recording and widely used in transmission. In general the instantaneous data rate at the medium is not the same as the rate at the convertors, although clearly the average rate must be the same.

Another application of time compression is to allow several channels of information to be carried in a single physical transmission. This technique is called multiplexing.

MULTIPLEXING PRINCIPLES

Multiplexing is used where several signals are to be transmitted down the same channel. The channel bit rate must be the same as or greater than the sum of the source bit rates. Figure 1.12 shows that when multiplexing is used, the data from each source has to be time compressed. This is done by buffering source data in a memory at the multiplexer. It is written into the memory in real time as it arrives, but will be read from the memory with a clock that has a much higher rate. This means that the readout occurs in a shorter time span. If, for example, the clock frequency is raised by a factor of 10, the data for a given signal will be transmitted in a tenth of the normal time, leaving time in the multiplex for nine more such signals.

In the demultiplexer another buffer memory will be required. Only the data for the selected signal will be written into this memory at the bit rate of the multiplex. When the memory is read at the correct speed, the data will emerge with their original time base.

In practice it is essential to have mechanisms to identify the separate signals to prevent them being mixed up and to convey the original signal clock frequency to the demultiplexer. In time-division multiplexing the time base of the transmission is broken into equal slots, one for each signal. This makes it easy for the demultiplexer, but forces a rigid structure on all the signals such that they must all be locked to one another and have an unchanging bit rate. Packet multiplexing overcomes these limitations. The multiplexer must switch between different time-compressed signals to create the bitstream and this is much easier to organize if each signal is in the form of data packets of constant size. Figure 1.13 shows a packet multiplexing system.

Each packet consists of two components: the header, which identifies the packet, and the payload, which is the data to be transmitted. The header will contain at least an identification code (ID), which is unique for each signal in the multiplex. The demultiplexer checks the ID codes of all incoming packets and discards those that do not have the wanted ID.

In complex systems it is common to have a mechanism to check that packets are not lost or repeated. This is the purpose of the packet continuity count, which is carried in the header. For packets carrying the same ID, the count should increase by 1 from one packet to the next. Upon reaching the maximum binary value, the count overflows and recommences.
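The following sketch shows packet multiplexing with an ID and a 4-bit continuity count in each header. The field names are hypothetical, and real systems such as MPEG transport streams differ in detail:

```python
from itertools import zip_longest

def multiplex(sources):
    """Interleave fixed-size payloads from several sources, labelling
    each packet with the source ID and a continuity count."""
    counts = {sid: 0 for sid in sources}
    for group in zip_longest(*sources.values()):
        for sid, payload in zip(sources, group):
            if payload is not None:
                yield {"id": sid, "count": counts[sid] % 16, "payload": payload}
                counts[sid] += 1

def demultiplex(packets, wanted_id):
    """Keep only packets with the wanted ID, checking continuity."""
    expected = 0
    for p in packets:
        if p["id"] != wanted_id:
            continue  # discard packets that do not have the wanted ID
        assert p["count"] == expected, "packet lost or repeated"
        expected = (expected + 1) % 16
        yield p["payload"]

mux = list(multiplex({"video": ["V0", "V1", "V2"], "audio": ["A0", "A1"]}))
print(list(demultiplex(mux, "video")))  # ['V0', 'V1', 'V2']
```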

FIGURE 1.12

Multiplexing requires time compression on each input.

FIGURE 1.13

Packet multiplexing relies on headers to identify the packets.

SYNCHRONISATION AND TIME BASE CORRECTION

Figure 1.14a shows a minimal digital video system. This is no more than a point-to-point link that conveys analog video from one place to another. It consists of a pair of convertors and hardware to serialize and de-serialize the samples. There is a need for standardisation in serial transmission so that various devices can be connected together. These standards for digital interfaces are described in Chapter 10.

FIGURE 1.14

In (a) two convertors are joined by a serial link. Although simple, this system is deficient because it has no means to prevent noise on the clock lines causing jitter at the receiver. In (b) a phase-locked loop is incorporated, which filters jitter from the clock.

Analog video entering the system is converted in the ADC to samples that are expressed as binary numbers. A typical sample would have a word length of 8 bits. The sample is connected in parallel into an output register, which controls the cable drivers. The cable also carries the sampling rate clock. The data are sent to the other end of the line, where a slicer rejects noise picked up on each signal. Sliced data are then loaded into a receiving register by the clock and sent to the digital-to-analog convertor (DAC), which converts the sample back to an analog voltage.

Following a casual study one might conclude that if the convertors were of transparent quality, the system must be ideal. Unfortunately this is incorrect. As Figure 1.4 showed, noise can change the timing of a sliced signal. Whilst this system rejects noise that threatens to change the numerical value of the samples, it is powerless to prevent noise from causing jitter in the receipt of the word clock. Noise on the word clock means that samples are not converted with a regular time base, and the impairment caused can be noticeable. Stated another way, analog characteristics of the interconnect are not prevented from affecting the reproduced waveform and so the system is not truly digital.

FIGURE 1.15

In the frame store, the recording medium is a RAM. Recording time available is short compared with other media, but access to the recording is immediate and flexible, as it is controlled by addressing the RAM.

The jitter problem is overcome in Figure 1.14b by the inclusion of a phase-locked loop, which is an oscillator that synchronises itself to the average frequency of the clock but which filters out the instantaneous jitter. The operation of a phase-locked loop is analogous to the function of the flywheel on a piston engine. The samples are then fed to the convertor with a regular spacing and the impairment is no longer visible. Chapter 4 shows why the effect occurs and deduces the clock accuracy needed for accurate conversion.
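The flywheel idea can be sketched numerically. The crude first-order loop below (the gain constant is illustrative, not from any standard) follows the average period of a jittered clock while smoothing the instantaneous errors:

```python
import random

def phase_locked_loop(edge_times, gain=0.05):
    """A local oscillator tracks incoming clock edges; each edge nudges
    the period only slightly, so instantaneous jitter is filtered out."""
    period = edge_times[1] - edge_times[0]  # initial period estimate
    t = edge_times[0]
    recovered = [t]
    for edge in edge_times[1:]:
        t += period                 # free-running advance
        period += gain * (edge - t)  # small correction toward the average
        recovered.append(t)
    return recovered

ideal = [float(i) for i in range(20)]
jittered = [t + random.uniform(-0.1, 0.1) for t in ideal]
print(phase_locked_loop(jittered))  # far steadier than the jittered input
```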

The system of Figure 1.14 is extended in Figure 1.15 by the addition of some RAM. What the device does is determined by the way in which the RAM address is controlled. If the RAM address increases by one every time a sample from the ADC is stored in the RAM, a recording can be made for a short period until the RAM is full. The recording can be played back by repeating the address sequence at the same clock rate but reading the memory into the DAC. The result is generally called a frame store.3

If the memory capacity is increased, the device can be used for recording. At a rate of 200 million bits per second, each frame of SDTV needs a megabyte of memory (200 Mbits/s at 25 frames per second is 8 Mbits, or 1 megabyte, per frame), and so the RAM recorder will be restricted to a fairly short playing time.

FIGURE 1.16

If the memory address is arranged to come from a counter that overflows, the memory can be made to appear circular. The write address then rotates endlessly, overwriting previous data once per revolution. The read address can follow the write address by a variable distance (not exceeding one revolution) and so a variable delay takes place between reading and writing.

Using compression, the playing time of a RAM-based recorder can be extended. For some applications, a camcorder that stores images in a Flash memory card rather than on disk or tape has advantages. For predetermined images such as test patterns and station IDs, read-only memory (ROM) can be used instead.

If the RAM is used in a different way, it can be written and read at the same time. The device then becomes a synchroniser, which allows video interchange between two systems that are not genlocked. Controlling the relationship between the addresses makes the RAM a variable delay. The addresses are generated by counters that overflow to zero after they have reached a maximum count at the end of a frame. As a result the memory space appears to be circular as shown in Figure 1.16. The read and write addresses chase one another around the circle. If the read address follows close behind the write address, the delay is short. If it stays just ahead of the write address, the maximum delay is reached. If the input and output have identical frame rates, the address relationship will be constant, but if there is a drift, then the address relationship will change slowly. Eventually the addresses will coincide and then cross. Properly handled, this results in the omission or repetition of a frame.
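In IT terms the circular memory of Figure 1.16 is a ring buffer. A minimal sketch, with counters that overflow by taking the address modulo the memory size:

```python
class FrameSynchroniser:
    """RAM addressed by counters that overflow, so the memory space
    appears circular; the read address trails the write address."""
    def __init__(self, size):
        self.ram = [None] * size
        self.size = size
        self.write_addr = 0
        self.read_addr = 0

    def write(self, frame):
        self.ram[self.write_addr] = frame
        self.write_addr = (self.write_addr + 1) % self.size  # counter overflows

    def read(self):
        # Reading trails writing; in a real synchroniser a crossing of the
        # two addresses causes a frame to be repeated or omitted.
        frame = self.ram[self.read_addr]
        self.read_addr = (self.read_addr + 1) % self.size
        return frame

sync = FrameSynchroniser(4)
sync.write("frame 0"); sync.write("frame 1")
print(sync.read(), sync.read())
```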

The issue of signal timing has always been critical in analog video, but the adoption of digital routing relaxes the requirements considerably. Analog vision mixers need to be fed by equal-length cables from the router to prevent propagation delay variation. In the digital domain this is no longer an issue as delay is easily obtained and each input of a digital vision mixer can have its own local synchroniser. A synchroniser with less than a frame of RAM can be used to remove static timing errors due, for example, to propagation delays in large systems. The finite RAM capacity gives a finite range of timing error that can be accommodated. This is known as the window. Provided signals are received having timing within the window of the inputs, all inputs are retimed to the same phase within the mixer. Chapter 10 deals with synchronising large systems.

ERROR CORRECTION AND CONCEALMENT

All practical recording and transmission media are imperfect. Magnetic media, for example, suffer from noise and dropouts. In a digital recording of binary data, a bit is either correct or incorrect, with no intermediate stage. Small amounts of noise are rejected, but inevitably, infrequent noise impulses cause some individual bits to be in error. Dropouts cause a larger number of bits in one place to be in error. An error of this kind is called a burst error. Whatever the medium and whatever the nature of the mechanism responsible, data either are recovered correctly or suffer some combination of bit errors and burst errors. In optical disks, random errors can be caused by imperfections in the moulding process, whereas burst errors are due to contamination or scratching of the disk surface.

The visibility of a bit error depends upon which bit of the sample is involved. If the LSB of one sample were in error in a detailed, contrasty picture, the effect would be totally masked and no one could detect it. Conversely, if the MSB of one sample were in error in a flat field, no one could fail to notice the resulting spot. Clearly a means is needed to render errors from the medium invisible. This is the purpose of error correction. In compression systems, bit errors cause greater difficulty, as the result in a variable-length coding scheme may be loss of synchronisation and damage to a significant picture area.

In binary, a bit has only two states. If it is wrong, it is necessary only to reverse the state and it must be right. Thus the correction process is trivial and perfect. The main difficulty is in identifying the bits that are in error. This is done by coding the data; adding redundant bits. Adding redundancy is not confined to digital technology; airliners have several engines and cars have twin braking systems. Clearly the more failures that have to be handled, the more redundancy is needed. If a four-engine airliner is designed to fly normally with one engine failed, three of the engines have enough power to reach cruise speed, and the fourth one is redundant. The amount of redundancy is equal to the amount of failure that can be handled. In the case of the failure of two engines, the plane can still fly, but it must slow down; this is graceful degradation. Clearly the chances of a two-engine failure on the same flight are remote.

In digital recording, the amount of error that can be corrected is proportional to the amount of redundancy, and it will be shown in Chapter 8 that within this limit, the samples are returned to exactly their original value. Consequently corrected samples are undetectable. If the amount of error exceeds the amount of redundancy, correction is not possible, and, to allow graceful degradation, concealment will be used. Concealment is a process in which the value of a missing sample is estimated from those nearby. The estimated sample value is not necessarily exactly the same as the original, and so under some circumstances concealment can be visible, especially if it is frequent. However, in a well-designed system, concealments occur with negligible frequency unless there is an actual fault or problem.

Concealment is made possible by rearranging the sample sequence prior to recording. This is shown in Figure 1.17, where odd-numbered samples are separated from even-numbered samples prior to recording. The odd and even sets of samples may be recorded in different places on the medium, so that an uncorrectable burst error affects only one set. On replay, the samples are recombined into their natural sequence, and the error is now split up so that it results in every other sample being lost in a two-dimensional structure. The picture is now described half as often, but can still be reproduced with some loss of accuracy. This is better than not being reproduced at all, even if it is not perfect. Many digital video recorders use such an odd/even distribution for concealment. Clearly if any errors are fully correctable, the distribution is a waste of time; it is needed only if correction is not possible.
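A sketch of the replay side: one odd/even set has been lost to an uncorrectable burst, and each missing sample is estimated as the average of its correct neighbours. The estimates are close to, but not necessarily identical to, the originals:

```python
def conceal(samples, lost_even=True):
    """Simulate loss of one odd/even sample set and interpolate the gaps."""
    out = list(samples)
    start = 0 if lost_even else 1
    for i in range(start, len(out), 2):
        left = out[i - 1] if i > 0 else out[i + 1]
        right = out[i + 1] if i + 1 < len(out) else out[i - 1]
        out[i] = (left + right) / 2  # an estimate, not necessarily exact
    return out

original = [10, 12, 14, 18, 26, 30, 28, 24]
print(conceal(original))  # every other value replaced by an average
```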

FIGURE 1.17

In cases in which the error correction is inadequate, concealment can be used provided that the samples have been ordered appropriately during recording. Odd and even samples are recorded in different places as shown here. As a result an uncorrectable error causes incorrect samples to occur singly, between correct samples. In the example shown, sample 8 is incorrect, but samples 7 and 9 are unaffected and an approximation of the value of sample 8 can be had by taking the average value of the other two. This interpolated value is substituted for the incorrect value.

The presence of an error-correction system means that the video (and audio) quality is independent of the medium/head quality within limits. There is no point in trying to assess the health of a machine by watching a monitor or listening to the audio, as this will not reveal whether the error rate is normal or within a whisker of failure. The only useful procedure is to monitor the frequency with which errors are being corrected and to compare it with normal figures. Professional DVTRs have an error rate display for this purpose and in addition most allow the error-correction system to be disabled for testing.

TRANSMISSION

Transmission is only moving data from one place to another. It can be subdivided in different ways, according, for example, to the purpose to which it is put or the distance involved. In digital video production, transmission over short distances will use standardised interfaces that are unidirectional, real time, and uncompressed. The SDI standard (serial digital interface) and its HD equivalent will be covered in Chapter 10.

Networking is also a form of transmission. Networks can be private and local, or worldwide. In general networks will not work in real time and will use some form of compression, although there are exceptions when only short distances are involved.

Transmission also includes what was traditionally called broadcasting, by which signals are radiated from terrestrial or satellite-based transmitters over a wide area. In genuine broadcasting, everyone with a receiver may view the signal. In some cases payment is required and the transmission may be encrypted to prevent unauthorised viewing. In the case of networks, data are delivered only to specified addresses, and in the case of a service for which payment is required, data would be sent only to addresses known to have paid. This has a security advantage over encrypted services, because the encryption is often bypassed by those with the necessary skills.

PRODUCT CODES

Digital channels, whether broadcast transmissions, optical disks, or magnetic recorders, are prone to burst errors. Adding redundancy equal to the size of expected bursts to every code is inefficient. Figure 1.18a shows that the efficiency of the system can be raised using interleaving. Sequential samples from the ADC are assembled into codes, but these are not recorded/transmitted in their natural sequence. A number of sequential codes are assembled along rows in a memory. When the memory is full, it is copied to the medium by reading down columns. Subsequently, the samples need to be de-interleaved to return them to their natural sequence. This is done by writing samples from tape into a memory in columns, and when it is full, the memory is read in rows. Samples read from the memory are now in their original sequence so there is no effect on the information. However, if a burst error occurs, as is shown outlined on the diagram, it will damage sequential samples in a vertical direction in the de-interleave memory. When the memory is read, a single large error is broken down into a number of small errors whose size is exactly equal to the correcting power of the codes and the correction is performed with maximum efficiency.
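A sketch of the memory operations of Figure 1.18a, without the redundancy P: codes are written in rows, the medium is fed by columns, and a burst on the medium returns as single-symbol errors spread across several codes:

```python
def interleave(symbols, rows, cols):
    """Write row by row, read column by column."""
    grid = [symbols[r * cols:(r + 1) * cols] for r in range(rows)]
    return [grid[r][c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Write column by column, read row by row."""
    grid = [[None] * cols for _ in range(rows)]
    i = 0
    for c in range(cols):
        for r in range(rows):
            grid[r][c] = symbols[i]; i += 1
    return [grid[r][c] for r in range(rows) for c in range(cols)]

data = list(range(12))                # three codes of four symbols each
on_medium = interleave(data, rows=3, cols=4)
on_medium[0:3] = ["X", "X", "X"]      # a burst of three errors on the medium
print(deinterleave(on_medium, 3, 4))  # only one "X" per four-symbol code
```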

image

image

FIGURE 1.18

(a) Interleaving is essential to make error-correction schemes more efficient. Samples written sequentially in rows into a memory have redundancy P added to each row. The memory is then read in columns and the data are sent to the recording medium. On replay the nonsequential samples from the medium are de-interleaved to return them to their normal sequence. This breaks up the burst error (outlined) into one error symbol per row in the memory, which can be corrected by the redundancy P.

An extension of the process of interleaving is one in which the memory array has not only rows made into code words, but also columns made into code words by the addition of vertical redundancy. This is known as a product code. Figure 1.18b shows that in a product code the redundancy calculated first and checked last is called the outer code, and the redundancy calculated second and checked first is called the inner code. The inner code is formed along tracks on the medium. Random errors due to noise are corrected by the inner code and do not impair the burst-correcting power of the outer code. Burst errors are declared uncorrectable by the inner code, which flags the bad samples on the way into the de-interleave memory. The outer code reads the error flags to locate the erroneous data. As it does not have to compute the error locations, the outer code can correct more errors.
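
The flagging mechanism can be demonstrated with a toy product code built from single XOR parity symbols; real recorders use Reed–Solomon codes, but the division of labour between the inner and outer codes is the same:

```python
from functools import reduce
from operator import xor

def parity(symbols):
    return reduce(xor, symbols, 0)

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
inner = [parity(r) for r in rows]            # one parity per row (track)
outer = [parity(col) for col in zip(*rows)]  # vertical redundancy

# A burst wipes out row 1. The inner code cannot correct it, but it
# can flag the row as an erasure on the way into the memory.
rows[1] = [None, None, None]
flagged = 1

# Knowing the location from the flags, the outer code corrects each
# flagged symbol from the survivors and its column's vertical parity.
for c in range(3):
    survivors = [rows[r][c] for r in range(3) if r != flagged]
    rows[flagged][c] = parity(survivors) ^ outer[c]

print(rows[1])  # [4, 5, 6] recovered
```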

STORAGE

Given a competent error- and time base-correction system, it is impossible to tell what medium was used to store digital video, as in each case the reproduced data would be numerically and temporally identical. Thus the traditional analog approach of choosing the medium for its picture quality no longer applies, as the quality is determined elsewhere. Instead, storage media are selected using other attributes, including cost per bit and access time. Subsidiary attributes include ruggedness, behaviour during power loss, and whether the medium is interchangeable.

NOISE AND PROBABILITY

Probability is a useful concept when dealing with processes that are not completely predictable. Thermal noise in electronic components is random, and although under given conditions the noise power in a system may be constant, this value determines only the heat that would be developed in a resistive load. In digital systems, it is the instantaneous voltage of noise that is of interest, because it is a form of interference that could alter the state of a bit if it were large enough. Unfortunately the instantaneous voltage cannot be predicted; indeed, if it could, the interference could not be called noise. Noise can be quantified statistically only by measuring or predicting the likelihood of a given noise amplitude.

Figure 1.19 shows a graph relating the probability of occurrence to the amplitude of noise. The noise amplitude increases away from the origin along the horizontal axis, and for any amplitude of interest, the probability of that noise amplitude occurring can be read from the curve. The shape of the curve is known as a Gaussian distribution, which crops up whenever the overall effect of a large number of independent phenomena is considered. Thermal noise is due to the contributions from countless molecules in the component concerned. Magnetic recording depends on superimposing some average magnetism on vast numbers of magnetic particles.

If it were possible to isolate an individual noise-generating microcosm of a tape or a head on the molecular scale, the noise it could generate would have physical limits because of the finite energy present. The noise distribution might then be rectangular as shown in Figure 1.20a, where all amplitudes below the physical limit are equally likely. The output of a random number generator can have a uniform probability if each possible value occurs once per sequence. If the combined effect of two of these uniform probability processes is considered, clearly the maximum amplitude is now doubled, because the two effects can add; but provided the two effects are uncorrelated, they may also subtract, so the probability is no longer rectangular, but becomes triangular as in Figure 1.20b. The probability falls to 0 at peak amplitude because the chances of two independent mechanisms reaching their peak value with the same polarity at the same time are understandably small.

If the number of mechanisms summed together is now allowed to increase without limit, the result is the Gaussian curve shown in Figure 1.20c, in which it will be seen that the curve has no amplitude limit, because it is just possible that all mechanisms will simultaneously reach their peak value together, although the chances of this happening are incredibly remote. Thus the Gaussian curve is the overall probability of a large number of uncorrelated uniform processes.
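
This behaviour is easily demonstrated numerically. The sketch below, with an arbitrary trial count, sums increasing numbers of uniform processes and confirms that the variance of the sum grows as expected while the distribution approaches the Gaussian:

```python
import random

def summed_noise(n_processes, trials=100_000):
    # Each process contributes a uniform (rectangular) value in [-1, 1];
    # return the distribution of their n-fold sums.
    return [sum(random.uniform(-1, 1) for _ in range(n_processes))
            for _ in range(trials)]

for n in (1, 2, 12):
    samples = summed_noise(n)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"{n:2d} processes: variance ~{var:.3f} (theory {n / 3:.3f})")

# n = 1 is the rectangular case of Figure 1.20a, n = 2 the triangular
# case of 1.20b; by n = 12 a histogram of the samples is already very
# close to the Gaussian of Figure 1.20c.
```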

image

FIGURE 1.19

White noise in analog circuits generally has the Gaussian amplitude distribution shown.

image

FIGURE 1.20

(a) A rectangular probability function; all values are equally likely but fall between physical limits. (b) The sum of two rectangular probability functions, which is triangular. (c) The Gaussian curve, which is the sum of an infinite number of rectangular probability functions.

image

FIGURE 1.21

Different storage media have different combinations of attributes and no one technology is superior in all respects.

Figure 1.21 shows that cost per bit and access time are generally contradictory. The fastest devices are solid state, including RAM and Flash memory; next comes the hard disk, followed by tape, which is hampered by the need to wind to the correct spot. However, if the cost per bit is considered, the order is reversed. Tape is a very inexpensive and simple medium.

The fastest and largest capacity disks are magnetic and these generally are not interchangeable. However, optical disks generally do allow interchange. Some of these are intended for mass replication and cannot be recorded (ROM or read only memory disks), some can be recorded once only (R, or recordable), and some can be rerecorded indefinitely (RW, or read–write). These will be contrasted in Chapter 9.

image

FIGURE 1.22

In a hard disk recorder, a large-capacity memory is used as a buffer or time base corrector between the convertors and the disk. The memory allows the convertors to run constantly despite the interruptions in disk transfer caused by the head moving between tracks.

The magnetic disk drive was perfected by the computer industry to allow rapid random access to data, and so it makes an ideal medium for editing. As will be seen in Chapter 9, the heads do not touch the disk, but are supported on a thin air film, which gives them a long life. The rapid access of disks has made them extremely popular and accordingly a great deal of development has taken place resulting in storage capacities that stretch the imagination. Unfortunately the same research funds have not been available for tape, whose potential is also massive but underexplored.

The economics of computers cannot be ignored and instead of constructing large-capacity disks for special purposes, the economic solution is to use arrays of mass-produced disk drives in devices known as file servers.

The disk drive suffers from intermittent data transfer owing to the need to reposition the heads. Figure 1.22 shows that disk-based devices rely on a quantity of RAM acting as a buffer between the real-time video environment and the intermittent data environment.
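
A highly simplified model of such a buffer follows; the class name, capacity, and burst size are invented for illustration:

```python
from collections import deque

class TimeBaseBuffer:
    # A FIFO standing between a constant-rate convertor and a disk
    # that transfers data only in bursts, between head movements.
    def __init__(self, capacity):
        self.fifo = deque()
        self.capacity = capacity

    def convertor_in(self, sample):
        if len(self.fifo) >= self.capacity:
            raise OverflowError("disk too slow: buffer overflowed")
        self.fifo.append(sample)

    def disk_out(self, burst_size):
        # The disk empties the buffer in bursts whenever the heads
        # are over the right track.
        burst = []
        while self.fifo and len(burst) < burst_size:
            burst.append(self.fifo.popleft())
        return burst

buf = TimeBaseBuffer(capacity=1024)
for t in range(100):
    buf.convertor_in(t)        # samples arrive at a steady rate...
print(len(buf.disk_out(64)))   # ...and leave in a burst of 64
```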

Figure 1.23 shows the block diagram of a camcorder based on hard disks and compression. The recording time and picture quality may not compete with full-bandwidth tape-based devices, but following acquisition the disks can be used directly in an edit system, allowing a useful time saving in ENG (Electronic News Gathering) applications.

image

FIGURE 1.23

In a disk-based camcorder, the PCM data rate from the camera may be too high for direct recording on disk. Compression is used to cut the bit rate and extend playing time. If a standard file structure is used, disks may physically be transferred to an edit system after recording.

The rotary head recorder has the advantage that the spinning heads create a high head-to-tape speed, offering a high recording bit rate without a high linear tape speed. Whilst mechanically complex, the rotary head transport has been raised to a high degree of refinement and offers the lowest cost per bit of all digital recorders.4 Digital VTRs segment incoming fields into several tape tracks and invisibly reassemble them in memory on replay to keep the tracks reasonably short.

Figure 1.24 shows a representative block diagram of a DVTR. Following the convertors, a compression process may be found. In an uncompressed recorder, there will be a distribution of odd and even samples and a shuffle process for concealment purposes. An interleaved product code will be formed prior to the channel coding stage, which produces the recorded waveform. On replay the data separator decodes the channel code and the inner and outer codes perform correction. Following the deshuffle the data channels are recombined and any necessary concealment will take place. Any compression will be decoded prior to the output convertors. Chapter 9 considers rotary head recorders.

VIDEO COMPRESSION AND MPEG

In its native form, digital video suffers from an extremely high data rate, particularly in high definition. One approach to the problem is to use compression that reduces the bit rate significantly with a moderate loss of subjective quality of the picture. The human eye is not equally sensitive to all spatial frequencies, so some coding gain can be obtained by using fewer bits to describe the less visible frequencies. Video images typically contain a great deal of redundancy in that flat areas contain the same pixel value repeated many times. Furthermore, in many cases there is little difference between one picture and the next, and compression can be achieved by sending only the differences.
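
The difference technique can be shown in miniature. The sketch below uses a hypothetical one-dimensional "picture"; real coders such as MPEG add motion compensation and transform coding on top of this idea:

```python
# Two successive pictures that differ in only one pixel.
previous = [50, 50, 50, 52, 53, 90, 90, 50]
current  = [50, 50, 50, 52, 54, 90, 90, 50]

# The encoder sends only the residual, which is mostly zeros and
# therefore compresses well.
residual = [c - p for c, p in zip(current, previous)]
print(residual)  # [0, 0, 0, 0, 1, 0, 0, 0]

# The decoder adds the residual back to its copy of the previous picture.
reconstructed = [p + r for p, r in zip(previous, residual)]
assert reconstructed == current
```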

image

FIGURE 1.24

Block diagram of a digital VTR. Note optional compression unit, which may be used to allow a common transport to record a variety of formats.

Whilst these techniques may achieve considerable reduction in bit rate, it must be appreciated that compression systems reintroduce the generation loss of the analog domain to digital systems. As a result high compression factors are suitable only for final delivery of fully produced material to the viewer.

For editing purposes, compression must be restricted to exploiting the redundancy within each picture individually. If a mild compression factor is used, multiple generation work is possible without artifacts becoming visible. Where offline or remote editing is used (see Chapter 5), higher compression factors may be used as the impaired pictures are not seen by the viewer.

Clearly a consumer DVTR or PVR needs only single-generation operation and has simple editing requirements. A much greater degree of compression can then be used, which may take advantage of redundancy between fields. The same is true for broadcasting, in which bandwidth is at a premium. A similar approach may be used in disk-based camcorders that are intended for ENG purposes.

The future of television broadcasting (and of any high-definition television) lies completely in compression technology. Compression requires an encoder prior to the medium and a compatible decoder after it. Extensive consumer use of compression could not occur without suitable standards. The ISO-MPEG coding standards were specifically designed to allow wide interchange of compressed video data. Digital television broadcasting and the digital video disc both use MPEG standard bitstreams that are detailed in Chapter 6.

image

FIGURE 1.25

The bitstream types of MPEG-2. See text for details.

STATISTICAL MULTIPLEXING

Packet multiplexing has advantages over time-division multiplexing because it does not fix the bit rate of each signal. A demultiplexer simply checks packet IDs and selects all packets with the wanted code. It will do this however frequently such packets arrive. Consequently it is practicable to have variable bit rate signals in a packet multiplex. The multiplexer has to ensure that the total bit rate does not exceed the rate of the channel, but that rate can be allocated arbitrarily between the various signals.

As a practical matter it is usually necessary to keep the bit rate of the multiplex constant. With variable-rate inputs this is done by creating null packets, generally called stuffing or padding. The headers of these packets contain a unique ID that the demultiplexer does not recognise, and so these packets are discarded on arrival.
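
The packet selection and stuffing mechanisms can be modelled in a few lines. The slot count and program IDs below are invented, although MPEG-2 transport streams do reserve packet ID 0x1FFF for null packets:

```python
NULL_ID = 0x1FFF            # reserved for null packets in MPEG-2
SLOTS_PER_INTERVAL = 8      # illustrative channel capacity

def multiplex(queues):
    # Fill the available slots from the program queues, then pad the
    # remainder with stuffing so the multiplex rate stays constant.
    out = []
    for pid, queue in queues.items():
        while queue and len(out) < SLOTS_PER_INTERVAL:
            out.append((pid, queue.pop(0)))
    while len(out) < SLOTS_PER_INTERVAL:
        out.append((NULL_ID, b""))
    return out

def demultiplex(packets, wanted_pid):
    # Select packets by ID; unrecognised (null) packets are discarded.
    return [payload for pid, payload in packets if pid == wanted_pid]

queues = {0x100: [b"v1", b"v2", b"v3"], 0x101: [b"a1"]}
sent = multiplex(queues)
print(demultiplex(sent, 0x100))  # [b'v1', b'v2', b'v3']
```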

In an MPEG environment, statistical multiplexing can be extremely useful because it allows for the varying difficulty of real program material. In a multiplex of several television programs, it is unlikely that all the programs will encounter difficult material simultaneously. When one program encounters a detailed scene or frequent cuts that are hard to compress, more data rate can be allocated at the allowable expense of the remaining programs that are handling easy material.

Figure 1.25 shows that the output of a single compressor is called an elementary stream. In practice audio and video streams of this type can be combined using multiplexing. The program stream is optimised for recording and the multiplexing is based on blocks of arbitrary size. The transport stream is optimised for transmission and is based on blocks of constant size. In production equipment such as disk-based workstations and VTRs that are designed for editing, the MPEG standard is less useful and many successful products use non-MPEG compression.

Compression and the corresponding decoding are complex processes and take time, adding to existing delays in signal paths. Concealment of uncorrectable errors is also more difficult on compressed data.

REAL TIME?

Analog television causes such a small delay to the signals that pictures seen in the home from a live broadcast are substantially instantaneous. With the advent of digital technology innumerable sources of delay have crept in. Techniques such as multiplexing, error correction, and particularly compression all cause delay, as does the subsequent time base correction. Consequently with digital television there is no longer any real-time television in a strict interpretation. This can easily be verified by visiting a retailer demonstrating analog and digital televisions on the same broadcast channel, where the digital channel will be seen to be obviously behind the analog channel.

ASYNCHRONOUS, SYNCHRONOUS, AND ISOCHRONOUS SYSTEMS

In generic data transmission, the data do not have an implied time base and simply need to be received correctly. In this case it does not matter if the transmission is intermittent or has a variable bit rate. This type of transmission is asynchronous; the data rate has no fixed relation to any timing reference.

Digital interfaces used in TV production are intended to work in real time and thus use dedicated cables that run at a fixed bit rate that has some predetermined relationship with the picture scanning frequencies. This is a synchronous transmission.

Compression and networking, alone or together, are fundamentally incompatible with synchronous systems. Compressors produce a bit rate that varies with picture complexity and networks have to share a single resource between a multitude of unpredictable demands.

The solution is the isochronous system. The receiver has a system that accurately reconstructs the original time base of the signal from specially transmitted codes. The encoder and decoder both have a significant quantity of buffer memory so that the transmission between them can be intermittent, typically using packets. Provided enough packets are sent that the receiver buffer is never empty, a continuously decoded video signal can be output. It does not matter exactly when the packets are sent. In some networks it is possible to prioritise isochronous data so that, if a receiver buffer is in danger of becoming empty, transmission of packets to replenish the buffer takes priority over generic data. Digital television broadcasts are generally isochronous. Clearly the buffering memories add further delay to that already due to error correction and compression.
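
The buffering principle can be modelled simply; the arrival pattern and rates below are entirely hypothetical:

```python
from collections import deque

buffer = deque()
arrivals = {0: 4, 1: 0, 2: 3, 3: 0, 4: 5}    # packets arriving per tick

for tick in range(5):
    for _ in range(arrivals.get(tick, 0)):
        buffer.append(f"packet@{tick}")       # bursty, packetised input
    if buffer:
        decoded = buffer.popleft()            # steady one-per-tick output
        print(tick, decoded, "buffered:", len(buffer))
    else:
        # As long as the sender keeps the buffer from emptying, this
        # branch is never reached and decoding is continuous.
        print(tick, "UNDERFLOW - picture would freeze")
```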

A further issue is that the digital production process also causes significant delay, especially where special effects are introduced. The result all too often is that the timing of the sound with respect to the picture slips, leading to obvious loss of “lip-sync.” Loss of lip-sync is often blamed on digital transmission, but in fact MPEG transport streams provide extremely accurate time base reconstruction. Generally, loss of lip sync is seen at the television set because that is the way the broadcaster is presenting it.

DIGITAL AUDIO

Audio was traditionally the poor relation in television, with the poor technical quality of analog TV sound compounded by the miserable loudspeaker fitted in many TV sets. The introduction of the Compact Disc served to raise the consumer's expectations in audio and this was shortly followed in many countries by the introduction of the NICAM 728 system, which added stereo digital audio to analog broadcast TV.

With the advent of digital television broadcasting, the audio is naturally digital, but also compressed, with the choice of format divided between the ISO-MPEG audio coding standards and the Dolby AC-3 system. These are compared in Chapter 7. In the digital domain, where multiplexing is easy, any number of audio channels can be combined into a single bitstream. HDTV broadcasting also offers “surround sound.” The form of surround sound appears to have been taken directly from cinema practice and may not be optimal for the domestic environment or even give the best possible quality.

DIGITAL CINEMA

Digital cinema is a good example of how digital technology solves a number of problems at once. The traditional cinema relies on the physical distribution of release prints. These are heavy and expensive. The number of release prints is limited and so is the number of cinemas that can simultaneously show the same title. A further concern is that unauthorised copies of movies can be made onto videotape or recordable DVDs and these can then be duplicated and sold.

In digital cinema, there is no film. The projector is a data projector and requires a suitable data input. This will typically be provided from a file server based on hard disks, which in turn obtain the data over a network. If suitably robust encryption is used, the transmitted data are essentially meaningless to unauthorised people. Generally the data remain in the encrypted state on the file server and the decryption takes place in the projector itself so that the opportunities for unauthorised copying are severely limited.

The file server does not need to download in real time and there is no limit to the number of cinemas that can show the same title. Naturally, without film there is no possibility of film damage and the picture quality will not deteriorate over time.

References

1. Ginsburg, C.P. Comprehensive description of the Ampex video tape recorder. SMPTE J., 66, 177–182 (1957).

2. Devereux, V.G. Pulse code modulation of video signals: 8 bit coder and decoder. BBC Res. Dept. Rept., EL-42 No. 25 (1970).

3. Pursell, S., and Newby, H. Digital frame store for television video. SMPTE J., 82, 402–403 (1973).

4. Baldwin, J.L.E. Digital television recording history and background. SMPTE J., 95, 1206–1214 (1986).
