7
Image portrayal

7.1 Introduction

This complex subject is difficult to approach because of the extensive interrelations between different areas. Figure 7.1 illustrates the number of dimensions involved. An image is two-dimensional and so there are the horizontal and vertical picture axes to consider. In film, the picture is continuous and sampling occurs only in the time axis. In analog television, the horizontal axis of the picture is continuous but the vertical axis is sampled into lines. In digital imaging the picture will be sampled in both axes to produce an array of pixels having a certain resolution. This chapter includes the theory necessary to link pixel counts to resolution.

In a monochrome image, each pixel describes a single quantity, namely the brightness. The range of this quantity determines the maximum contrast. At the bottom of the range there will be a noise floor. The relationship between the numerical value and the brightness may be linear or non-linear. In a coloured image each pixel becomes a vector, or multi-dimensional quantity which describes the hue and saturation in some way in addition to the brightness.

image

Figure 7.1 The dimensions involved in moving-image portrayal.

If motion is to be portrayed, the picture must be updated at the picture rate. Of course, the term ‘moving pictures’ is a misnomer; the pictures don’t move at all, but instead attempt to create an illusion of movement in the mind of the viewer with varying degrees of success. A moving object simply adopts a different location in each picture. The object is not moving along the time axis, but along a fourth axis known as the optic flow axis. The optic flow axis is identified by motion-compensated standards convertors to eliminate judder and also by noise reducers and MPEG compressors because the greatest similarity from one picture to the next is along that axis. The success of these devices is testimony to the importance of the theory which will be considered in some depth here.

7.2 Film

Film is the oldest of the moving image-portrayal systems and standards were established well before any scientific understanding existed. Originally, 35 mm ‘silent’ film ran at 18 frames per second, but on the introduction of ‘talking pictures’ it was found that the bandwidth of the linear optically modulated soundtrack was inadequate and the frame rate was raised to 24 Hz. Thus no psycho-optic criteria were used in determining the frame rate of film.

It is often stated that the resolution available from film cannot be matched by electronic imaging techniques, but this is only true if static resolution is considered and even then digital imaging is an advancing art. The very fine grain structure of the latest cinema film stock gives extremely high static resolution, but unfortunately the viewer cannot enjoy it.

It is impossible to project film at 24 Hz because the extreme flicker is too distressing. The conventional solution is shown in Figure 7.2. The projector has a pulldown mechanism which replaces one frame with the next while a rotating shutter cuts off the light. This shutter is fitted with two blades instead of one so that the light is admitted twice per frame.

image

Figure 7.2 In conventional film projection, the shutter is opened twice for each film frame so that with a frame rate of 24 Hz, a flicker frequency of 48 Hz is seen.

image

Figure 7.3 (a) The optic flow axis of the original scene is distorted by the double projection of each frame. (b) A tracking eye sees a double image in the presence of motion.

This raises the flicker frequency to 48 Hz, which is desirable, but a number of undesirable artifacts are also introduced.

Figure 7.3(a) shows that when an object moves, the frame repeat mechanism does not portray the motion properly. Instead of an accurate portrayal of the position of the object at 48 Hz, the position in each frame is simply repeated. The optic flow axis is not smooth, but is distorted. When the eye attempts to track a moving object, Figure 7.3(b) shows that the result will be a double image. The effect depends on the speed of motion.

At low speeds, the eye sees two images superimposed with a slight shift. This has the effect of cancelling out high-frequency detail in the direction of motion so the resolution is reduced. At higher speeds, two distinct images are seen.

A further effect of frame repeat is that although flicker rate is 48 Hz, the rate at which a new background location is presented is only 24 Hz. The result is that background strobing (see section 6.11) is extremely obvious.

The already limited dynamic resolution of film is further reduced by the phenomena of weave and hop. Weave is where the film moves laterally in the camera or projector and hop is where the film is not pulled down by exactly the same distance every frame. Film frames can also bow during projection so that they are not flat. This has the effect of a slight variation in magnification.

The larger film formats, such as 70 mm, primarily look better because the projector doesn’t need such high magnification and the effect of weave and hop is reduced. It is also easier to get adequate light through the larger frames. Mike Todd, the developer of the Todd-AO 70 mm film standard, was aware of the dynamic problems of film and wanted to raise the frame rate to 30 Hz. To this day 70 mm projectors are still capable of running at 30 Hz, but Hollywood conservatism meant that the frame rate soon reverted to 24 Hz to make it easier to release films in both formats.

Although the reasons for these artifacts may not be understood, their existence certainly is. Cinematographers have to film in such a way that these artifacts are moderated. As film cannot portray rapid motion, cameras have to be mounted very solidly and they must use fluid-damped pan and tilt bearings. Zooms must be very slow. Tracking shots are used where the camera travels at the same speed as the dominant action. The camera will often be mounted on a wheeled dolly for this purpose, and outdoors temporary rails will be laid to allow smooth tracking.

The serious level of background strobing is addressed by using a very large lens aperture to give a shallow depth of field. This requires the continuous attention of the focus puller, but puts the background out of focus, masking the effect of the 24 Hz strobing. Unfortunately in bright light the use of a large aperture results in a short exposure for each frame. This has the effect of making the background strobing more obvious because the short exposure reduces the amount of smear. A better result may be obtained by using a neutral density filter so that the large aperture can be retained with a longer exposure. The techniques needed to overcome the basic limitations of frame repeat have led to the ‘film look’.

The gulf between static and dynamic resolution in film means that the actual resolution needed in electronic projection systems does not have to be very high to give a similar experience to the viewer.

The frames of early formats were not far from square in order to operate within the limited coverage of early lenses. To combat the competition from television, cinema adopted a wider screen format, but it went about it exactly the wrong way, using anamorphic lenses on camera and projector which squeezed the subject laterally onto the film and expanded it again during projection. Anamorphic optics place great demands on the film resolution in the horizontal axis, which is wasted in the vertical axis. The reason that the cinema industry got away with using anamorphic optics was partly because the dynamic resolution was so poor the loss was masked.

image

Figure 7.4 A film format suitable for convergent systems. The frame height is halved to create a wide frame without anamorphic optics and to allow a suitable frame rate without increasing the film speed.

Figure 7.4 shows a film format which is more appropriate to convergent systems. The conventional 35 mm four-perf. frame is effectively cut in half to produce a two-perf. format. This is an existing film format. Two-perf. has a number of advantages. It allows the frame rate to be doubled without increasing the film speed and it allows widescreen frames without the use of anamorphic optics. It might be thought that there would be a loss of static resolution due to the smaller frames, but this is not the case. To maintain resolution anamorphic film frames need to be bigger by the anamorphic ratio. For example, with 2:1 anamorphism, the film grain limits the horizontal resolution, whereas the vertical resolution is twice as good as it needs to be, meaning that half of the film area is wasted.

Two-perf. film running at 48, 50 or 60 Hz allows direct projection with a single blade shutter giving precise re-creation of motion and allowing the dynamic resolution to rise dramatically. The visibility of film grain is reduced because the grain in each frame is different and the higher frame rate allows the temporal filtering of the eye to operate. Such a format is also directly compatible with television standards.

Colour is obtained in film projection by subtractive filtering. White light from the projector’s lamp passes through three different layers in the film in turn. Each layer is designed to attenuate a certain portion of the spectrum according to the density of the layer. If all the layers have maximum density, no light can pass. Variations in the relative density of the three layers allow different colours to be shown.

Colour accuracy in film is rather poor, but film manufacturers strive to make their product consistent so that the colour rendering stays the same for the entire movie. When film is transferred to the video or data domains, it is almost certain that the colour will appear distorted in some way and a process known as colour correction will be needed to give acceptable results.

7.3 Spatial sampling

Spatial sampling is a two-dimensional version of sampling, but Shannon’s theory still applies. For the highest quality and realism, there must be no visible sampling artifacts. In legacy television systems, there has always been a tradeoff between resolution and the level of artifacts such as aliasing and visible line structure. This was acceptable at the time these systems were designed over fifty years ago, but today it does not have to be accepted.

Figure 7.5 shows that to eliminate sampling artifacts requires a formal approach to Shannon perfect reconstruction as was introduced in Chapter 4. There must be a spatial anti-aliasing filter mechanism associated with the image sensor and a spatial reconstruction filter mechanism associated with the display. These filters and the sampling rate used must be specified at the same time to give the required resolution.

image

Figure 7.5 Ideal reconstruction of sampled images requires the approach shown here. This is impossible for several reasons discussed in the text.

It is desirable to prevent spatial aliasing, since the result is visually irritating. CCD sensors can alias in both horizontal and vertical dimensions, and so an anti-aliasing optical filter may be fitted between the lens and the sensor of a conventional camera. This takes the form of a plate which diffuses the image formed by the lens. Such a device can never have a sharp cut-off, and so there are effectively two choices. If aliasing is permitted, the theoretical information rate of the system can be approached. If aliasing is prevented, the information conveyed is below system capacity.

These considerations also apply at the television display. The display ought to filter out spatial frequencies above one half the spatial sampling rate. In a conventional CRT this means that a vertical optical filter should be fitted in front of the screen to render the raster invisible. Again the poor slope of a simply realizable filter would attenuate too much of the wanted spectrum, and so the technique is not used.

As the eye is axisymmetric, the resolution in the horizontal and vertical axes is the same and this would suggest that the vertical and horizontal sample spacing should also be the same. This is the origin of the term ‘square pixels’ shown in Figure 7.6(a) in which samples are taken in rows and columns on a grid. Transform duality suggests that the resulting two-dimensional spectrum will also have a grid structure. The corresponding spectrum is shown in (b). The baseband spectrum is in the centre of the diagram, and the repeating sampling sideband spectrum extends vertically and horizontally. The star-shaped spectrum is purely symbolic and results from viewing an image of a man-made object such as a building containing primarily horizontal and vertical elements. A pastoral scene would result in a more circular or elliptical spectrum. In order to return to the baseband image, the sidebands must be filtered out with a two-dimensional spatial filter. The shape of the two-dimensional frequency response shown in Figure 7.6(c) is known as a Brillouin zone.

Figure 7.6(d) shows an alternative sampling site matrix known as quincunx sampling because of the similarity to the pattern of five dots on a die. The resultant spectrum has the same characteristic pattern as shown in (e). Quincunx sampling offers more possibilities for the shape of the Brillouin zones. Any shape which will tessellate can in principle be used. Figure 7.6(f) shows a diamond-shaped zone, whereas (g) shows cross-shaped zones. Clearly none of these two-dimensional frequency responses can be implemented in optical filters, but they can be implemented in the digital domain following optical sampling at higher rates.

Quincunx sampling attracted attention at one time because it has some interesting properties. Considering the horizontal axis, the highest horizontal frequency in the baseband is more than half of the horizontal sampling rate, as is the case for the vertical axis. We appear to be violating sampling theory, hence the term ‘sub-Nyquist sampling’ which will be found in connection with quincunx sampling. In fact there is no violation whatsoever, simply a compromise. The increase in horizontal and vertical resolution is achieved at the expense of reduced diagonal resolution.

Taking Nyquist spatial sampling one axis at a time doesn’t convey the whole truth. It may be appropriate to have an additional criterion for two dimensions as follows: ‘When sampling in two dimensions, the product of the vertical and horizontal sampling rates must be at least four times the product of the vertical and horizontal spatial bandwidths.’

Once this is appreciated, it becomes clear that sub-Nyquist sampling is an oxymoron. Quincunx sampling causes practical difficulties because it does not fit conveniently with data arrays in computing or with many types of sensor or display, although certain digital cameras for still images use quincunx sensors. With the development of compression techniques such as MPEG, the pressure to reduce sampling rates by arcane sampling methods disappeared.
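
The two-dimensional criterion above amounts to a trivial calculation. The short Python sketch below uses illustrative numbers rather than values from any real standard; it checks whether a pair of sampling rates can support a pair of spatial bandwidths and shows how bandwidth can be traded between the axes.

```python
def satisfies_2d_criterion(fs_h, fs_v, bw_h, bw_v):
    """Two-dimensional sampling criterion quoted above: the product of the
    sampling rates must be at least four times the product of the bandwidths."""
    return fs_h * fs_v >= 4.0 * bw_h * bw_v

# 'Square pixel' case: each axis just meets its own Nyquist limit.
print(satisfies_2d_criterion(100, 100, 50, 50))   # True

# Bandwidth can be traded between axes provided the product is respected;
# a horizontal bandwidth above fs_h/2 is then not a violation.
print(satisfies_2d_criterion(100, 100, 70, 35))   # True
print(satisfies_2d_criterion(100, 100, 70, 70))   # False
```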

image

Figure 7.6 Image sampling spectra. The rectangular array of (a) has a spectrum shown at (b) having a rectangular repeating structure. Filtering to return to the baseband requires a two-dimensional filter whose response lies within the Brillouin zone shown at (c).

image

Figure 7.6 (Continued) Quincunx sampling is shown at (d) to have a similar spectral structure (e). An appropriate Brillouin zone is required as at (f). (g) An alternative Brillouin zone for quincunx sampling.

7.4 Spatial aperture effect

Figure 7.7(a) shows a test image consisting of alternating black and white bars. Considered as a spatial waveform, the modulation is a square wave which contains an infinite series of odd harmonics in addition to the fundamental. This infinite bandwidth cannot be passed by any lens or sensor. Figure 7.7(b) shows that in practice the lens and sensor will both suffer from an aperture effect in addition to the effect of any deliberate anti-aliasing filter. These effects mean that conventional Nyquist rate sampling of images always gives relatively poor performance. The resolution achieved in practice is always significantly less than the pixel count would suggest.

image

Figure 7.7 Ideal test image at (a) consists of alternating black and white bars. After the aperture effects of the lens and the sensor have been considered, there will be considerable softening of the edges corresponding to a loss of high spatial frequencies.

image

Figure 7.8 (a) CRT spot has a Gaussian intensity distribution and a Gaussian spatial frequency response. (b) CCD pixel has a rectangular aperture and a sinx/x spatial frequency response.

As the MTF of an aperture effect is given by the Fourier transform of the aperture function, analysis in the frequency domain must give a consistent result. The spatial frequency spectrum of the output can be obtained by multiplying the input spectrum by the frequency response of the lens and sensor.

Figure 7.8 shows some examples. At (a) the Fourier transform of a Gaussian impulse is seen also to be Gaussian. A cathode ray tube with a spot having a Gaussian intensity distribution will also have a Gaussian spatial frequency response. At (b) a CCD camera has discrete square sensors and a rectangular aperture function. Its frequency response will be the Fourier transform of a rectangle, which is a sinx/x function.
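
The two cases of Figure 7.8 can be checked numerically. The sketch below assumes a pixel aperture of one sample pitch and a Gaussian spot with a standard deviation of half a pitch; these figures are illustrative only.

```python
import numpy as np

f = np.linspace(0.01, 2.0, 200)   # spatial frequency in cycles per sample pitch

# (b) Rectangular aperture one pitch wide: MTF is |sin(pi f)/(pi f)|.
mtf_rect = np.abs(np.sinc(f))     # numpy's sinc is sin(pi x)/(pi x)

# (a) Gaussian spot, sigma = 0.5 pitch: the transform is another Gaussian.
sigma = 0.5
mtf_gauss = np.exp(-2.0 * (np.pi * sigma * f) ** 2)

# At the sampling rate itself the rectangular aperture has a null, whereas
# the Gaussian rolls off smoothly and never quite reaches zero.
i = np.argmin(np.abs(f - 1.0))
print(mtf_rect[i], mtf_gauss[i])   # ~0.0 and ~0.007
```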

7.5 Spatial oversampling

Oversampling means using a sampling rate which is greater (generally substantially greater) than the Nyquist rate. As was seen in Chapter 4, sampling only works at the Nyquist rate if the samples are taken and reproduced as points with ideal filtering. In real imaging systems, samples are sensed and reproduced as finite areas and ideal optical filters are impossible. Oversampling allows these sensor issues substantially to be overcome. Figure 7.9 shows how spatial oversampling can be used to increase the resolution of an imaging system. Assuming a 720 × 400 pixel system, Figure 7.9(a) shows that the aperture effect would result in an early rolloff of the MTF. Instead a 1440 × 800 pixel sensor is used, having a response shown at (b). This outputs four times as much data, but if these data are passed into a two-dimensional low-pass filter which decimates by a factor of two in each axis, the original bit rate will be obtained once more. This will be a digital filter which can have arbitrarily accurate performance, including a flat passband and steep cut-off slope. The combination of the aperture effect of the 1440 × 800 pixel camera and the LPF gives a spatial frequency response which is shown in (c). This is better than could be achieved with a 720 × 400 camera. The improvement in subjective quality is quite noticeable in practice.

image

Figure 7.9 Spatial oversampling in a sensor. At (a) 720 × 400 pixel sensor and its spatial frequency response. (b) 1440 × 800 pixel sensor and response. (c) Output of sensor (b) after downsampling to 720 × 400 pixels.
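
A minimal sketch of the decimation step described above is given below. The short FIR filter used here is purely illustrative; a real downconverter would use a much longer, properly specified filter.

```python
import numpy as np

def decimate_2x(image):
    """Low-pass filter in both axes, then discard every other sample."""
    taps = np.array([-1, 0, 9, 16, 9, 0, -1], dtype=float)
    taps /= taps.sum()

    def filter_rows(x):
        return np.apply_along_axis(
            lambda row: np.convolve(row, taps, mode='same'), 1, x)

    filtered = filter_rows(filter_rows(image).T).T   # rows, then columns
    return filtered[::2, ::2]

sensor = np.random.rand(800, 1440)        # oversampled 1440 x 800 sensor
print(decimate_2x(sensor).shape)          # (400, 720): the 720 x 400 output
```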

image

Figure 7.10 Spatial oversampling in a display. At (a) with conventional CRT display the line structure is visible and the spatial frequency response is impaired. At (b) an interpolator doubles the number of lines. The overlap of the intensity functions renders the raster less visible and improves the spatial frequency response.

In the case of display technology, oversampling can also be used, this time to render the raster or pixels invisible and to improve the aperture of the display. Once more a filter is required, but this now, for example, doubles the number of input pixels in each axis using interpolation. Again the filter can have arbitrarily high accuracy. The aperture effect of the display does not affect the passband of the input signal because of the use of oversampling, but it can instead be used to reduce the visibility of the line structure. Figure 7.10 shows that if the number of lines is doubled in an interpolator, the intensity function of the CRT spot is not halved in diameter, but is reduced by a smaller amount. The partial overlapping of the intensity functions reduces the depth of modulation at the new artificially small line spacing.
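
A correspondingly simple sketch of the display-side interpolation is shown below; linear interpolation between lines stands in for the longer filter a real upconverter would use.

```python
import numpy as np

def double_lines(frame):
    """Insert an interpolated line between each pair of input lines so that
    an H-line picture is shown on 2H raster lines."""
    out = np.empty((2 * frame.shape[0], frame.shape[1]))
    out[0::2] = frame                                        # original lines
    out[1::2] = 0.5 * (frame + np.roll(frame, -1, axis=0))   # in-between lines
    out[-1] = frame[-1]                                      # repeat last line
    return out

picture = np.random.rand(400, 720)
print(double_lines(picture).shape)   # (800, 720)
```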

Figure 7.11 shows a system in which oversampling is used at both ends of the channel to obtain a higher resolution without increasing the channel data rate.

image

Figure 7.11 An oversampling television system in which both display and camera oversample so that an existing line standard can be used. Techniques such as this have effectively made high-definition television broadcasting unnecessary.

7.6 Temporal aperture effects

The conventional view of sampled moving images is that shown in Figure 7.12(a) in which there are three axes, vertical, horizontal and temporal. These orthogonal axes would suggest that what happens in, for example, the time axis does not affect the image plane. This is inadequate to explain the experience of the human viewer. It might be thought that the temporal sampling process could be returned to the continuous time domain by a temporal filter. In fact temporal filters destroy image quality in the presence of motion and should be avoided. The only acceptable temporal filter in a moving image-portrayal system is the persistence of vision of the eye. Although this is temporal with respect to the eye, the fact that the eye can track means that persistence of vision does not filter on the time axis of the display.

image

Figure 7.12 (a) The conventional view of image-portrayal systems is that everything can be explained in three axes, x, y and time. This is inadequate and (b) shows that the optic flow axis is necessary to explain how a tracking eye perceives motion portrayal.

Figure 7.12(b) shows that it is necessary to consider a fourth axis, namely the optic flow axis. The optic flow axis is not parallel to the time axis when there is motion. The HVS is watching along the optic flow axis and because it is not orthogonal to the image plane, it has a component in the image plane. Thus temporal filtering in the system does not have the same effect as persistence of vision.

The result is that events on the time axis can affect the image. Figure 7.13(a) shows that the ideal mechanism is for the image to be captured and displayed at a single vanishingly short point on the time axis, as the perfect reconstruction theory of sampling would suggest. In practice this is not possible, as finite light energy has to fall on all sensors and be created by all displays and this takes time. The result is a temporal aperture effect. Figure 7.13(b) shows that this reflects in the optic flow axis to cause smear in the image plane which reduces resolution in moving objects.

image

Figure 7.13 (a) Ideal sampling requires images to be sampled in an instant. This is impossible as all practical sensors require finite time to operate. This sampling time is a temporal aperture effect. As (b) shows, the temporal aperture reflects in the optic flow axis to cause image smear on the sensor.
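
The size of the smear is simply the image velocity multiplied by the capture (or display) aperture time. The figures in the sketch below are arbitrary examples, not values from the text.

```python
picture_widths_per_second = 0.5   # object velocity across the frame
exposure_s = 1 / 48               # temporal aperture of the sensor
horizontal_pixels = 720           # assumed frame width

# Displacement of the image during one exposure, i.e. the smear length.
smear = picture_widths_per_second * exposure_s * horizontal_pixels
print(smear)                      # 7.5 pixels lost to motion smear
```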

The eye can resolve detail in moving objects by tracking and there is no fundamental reason why this should not be possible in a well-engineered image-portrayal system. These are, however, extremely rare. In most systems the level of motion-induced artifacts is so high that it is often better deliberately to induce smear to disguise what is happening. This is so common that it has led to the misguided belief that there will always be motion blur.

7.7 Analog video

It is difficult to convey two-dimensional images from one place to another directly, whereas electrical and radio signals are easily carried. The problem is to convert a two-dimensional image into a single voltage changing with time. The solution is to use the principle of scanning shown in Figure 7.14(a). The camera produces a video signal whose voltage is a function of the image brightness at a single point on the sensor. This voltage is converted back to the brightness of the same point on the display. The points on the sensor and display must be scanned synchronously if the picture is to be re-created properly. If this is done rapidly enough it is largely invisible to the eye. Figure 7.14(b) shows that the scanning is controlled by a triangular or sawtooth waveform in each dimension which causes a constant speed forward scan followed by a rapid return or flyback. As the horizontal scan is much more rapid than the vertical scan the image is broken up into lines which are not quite horizontal.

In the example of Figure 7.14(b), the horizontal scanning frequency or line rate, Fh, is an integer multiple of the vertical scanning frequency or frame rate and a progressive scan system results in which every frame is identical. Figure 7.14(c) shows an interlaced scan system in which there is an integer number of lines in two vertical scans or fields. The first field begins with a full line and ends on a half line and the second field begins with a half line and ends with a full line. The lines from the two fields interlace or mesh on the screen. Terrestrial analog broadcast systems such as PAL and NTSC use interlace. The additional complication of interlace has both merits and drawbacks which will be discussed in section 7.10.

image

Figure 7.14 Scanning converts two-dimensional images into a signal which can be sent electrically. In (a) the scanning of camera and display must be identical. The scanning is controlled by horizontal and vertical sawtooth waveforms (b).

image

Figure 7.14 (Continued) Where two vertical scans are needed to complete a whole number of lines, the scan is interlaced as shown in (c). The frame is now split into two fields.

7.8 Synchronizing

It is vital that the horizontal and vertical scanning at the camera is simultaneously replicated at the display. This is the job of the synchronizing or sync system which must send timing information to the display alongside the video signal. In very early television equipment this was achieved using two quite separate or non-composite signals. Figure 7.15(a) shows one of the first (US) television signal standards in which the video waveform had an amplitude of 1 volt peak to peak and the sync signal had an amplitude of 4 volts peak to peak. In practice, it was more convenient to combine both into a single electrical waveform then called composite video which carries the synchronizing information as well as the scanned brightness signal. The single signal is effectively shared by using some of the flyback period for synchronizing. The 4 volt sync signal was attenuated by a factor of ten and added to the video to produce a 1.4 volt peak to peak signal. This was the origin of the 10:4 video:sync relationship of US television practice. Later the amplitude was reduced to 1 volt peak to peak so that the signal had the same range as the original non-composite video. The 10:4 ratio was retained. As Figure 7.15(b) shows, this ratio results in some rather odd voltages and to simplify matters, a new unit called the IRE unit (after the Institute of Radio Engineers) was devised. Originally this was defined as 1 per cent of the video voltage swing, independent of the actual amplitude in use, but it came in practice to mean 1 per cent of 0.714 volt. In European systems shown in Figure 7.15(c) the messy numbers were avoided by using a 7:3 ratio and the waveforms are always measured in millivolts. Whilst such a signal was originally called composite video, today it would be referred to as monochrome video or Ys, meaning luma carrying syncs, although in practice the s is often omitted.
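
The level arithmetic described above is easily verified. The short sketch below reproduces the 10:4 and 7:3 splits of a 1 volt peak-to-peak composite signal.

```python
total_v = 1.0                                   # composite signal, 1 V pk-pk

# US practice: 10:4 video-to-sync ratio.
video_us = total_v * 10 / 14
sync_us = total_v * 4 / 14
print(round(video_us, 3), round(sync_us, 3))    # 0.714 V video, 0.286 V sync
print(round(video_us / 100 * 1000, 2), 'mV')    # 1 IRE unit is about 7.14 mV

# European practice: 7:3 ratio gives round numbers in millivolts.
print(total_v * 0.7 * 1000, total_v * 0.3 * 1000)   # 700.0 mV and 300.0 mV
```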

Figure 7.15(d) shows how the two signals are separated. The voltage swing needed to go from black to peak white is less than the total swing available. In a standard analog video signal the maximum amplitude is 1 volt peak-to-peak. The upper part of the voltage range represents the variations in brightness of the image from black to white. Signals below that range are ‘blacker than black’ and cannot be seen on the display. These signals are used for synchronizing.

Figure 7.16(a) shows the line synchronizing system part-way through a field or frame. The part of the waveform which corresponds to the forward scan is called the active line and during the active line the voltage represents the brightness of the image. In between the active line periods are horizontal blanking intervals in which the signal voltage will be at or below black. Figure 7.16(b) shows that in some systems the active line voltage is superimposed on a pedestal or black level set-up voltage of 7.5 IRE. The purpose of this set-up is to ensure that the blanking interval signal is below black on simple displays so that it is guaranteed to be invisible on the screen. When set-up is used, black level and blanking level differ by the pedestal height. When set-up is not used, black level and blanking level are one and the same.

The blanking period immediately after the active line is known as the front porch, which is followed by the leading edge of sync. When the leading edge of sync passes through 50 per cent of its own amplitude, the horizontal retrace pulse is considered to have occurred. The flat part at the bottom of the horizontal sync pulse is known as sync tip and this is followed by the trailing edge of sync which returns the waveform to blanking level.

image

Figure 7.15 Early video used separate vision and sync signals shown in (a). The US one volt video waveform in (b) has 10:4 video/sync ratio. (c) European systems use 7:3 ratio to avoid odd voltages. (d) Sync separation relies on two voltage ranges in the signal.

image

Figure 7.16 (a) Part of a video waveform with important features named. (b) Use of pedestal or set-up.

The signal remains at blanking level during the back porch during which the display completes the horizontal flyback. The sync pulses have sloping edges because if they were square they would contain high frequencies which would go outside the allowable channel bandwidth on being broadcast.

The vertical synchronizing system is more complex because the vertical flyback period is much longer than the horizontal line period and horizontal synchronization must be maintained throughout it. The vertical synchronizing pulses are much longer than horizontal pulses so that they are readily distinguishable. Figure 7.17(a) shows a simple approach to vertical synchronizing. The signal remains predominantly at sync tip for several lines to indicate the vertical retrace, but returns to blanking level briefly immediately prior to the leading edges of the horizontal sync, which continues throughout. Figure 7.17(b) shows that the presence of interlace complicates matters, as in one vertical interval the vertical sync pulse coincides with a horizontal sync pulse whereas in the next the vertical sync pulse occurs half-way down a line. In practice the long vertical sync pulses were found to disturb the average signal voltage too much, and to reduce the effect extra equalizing pulses were put in, half-way between the horizontal sync pulses. The horizontal timebase system can ignore the equalizing pulses because it contains a flywheel circuit which only expects pulses roughly one line period apart. Figure 7.17(c) shows the final result of an interlaced system with equalizing pulses. The vertical blanking interval can be seen, with the vertical pulse itself towards the beginning.

image

Figure 7.17 (a) A simple vertical pulse is longer than a horizontal pulse. (b) In an interlaced system there are two relationships between H and V. (c) The use of equalizing pulses to balance the DC component of the signal.

Correct portrayal of a television image is only obtained when the synchronization system is working. Should the video signal be conveyed without the synchronizing information, a condition called sync loss occurs, causing the picture to break up.

7.9 Bandwidth and definition

As the conventional analog television picture is made up of lines, the line structure determines the definition or the fineness of detail which can be portrayed in the vertical axis. The limit is reached in theory when alternate lines show black and white. In a 625-line picture there are roughly 600 unblanked lines. If 300 of these are white and 300 are black then there will be 300 complete cycles of detail in one picture height. One unit of resolution, which is a unit of spatial frequency, is c/ph or cycles per picture height. In practical displays the contrast will have fallen to virtually nothing at this ideal limit and the resolution actually achieved is around 70 per cent of the ideal, or about 210 c/ph. The degree to which the ideal is met is known as the Kell factor of the display.

Definition in one axis is wasted unless it is matched in the other and so the horizontal axis should be able to offer the same performance. As the aspect ratio of conventional television is 4:3 then it should be possible to display 400 cycles in one picture width, reduced to about 300 cycles by the Kell factor. As part of the line period is lost due to flyback, 300 cycles per picture width becomes about 360 cycles per line period.

In 625-line television, the frame rate is 25 Hz and so the line rate Fh will be:

Fh = 625 × 25 = 15 625 Hz

If 360 cycles of video waveform must be carried in each line period, then the bandwidth required will be given by:

15 625 × 360 = 5.625 MHz

In the 525-line system, there are roughly 500 unblanked lines allowing 250 c/ph theoretical definition, or about 175 c/ph allowing for the Kell factor. Allowing for the aspect ratio, equal horizontal definition requires about 230 cycles per picture width. Allowing for horizontal blanking this requires about 280 cycles per line period.

In 525-line video:

Fh = 525 × 30 = 15 750 Hz

Thus the bandwidth required is:

15 750 × 280 = 4.4 MHz

If it is proposed to build a high-definition television system, one might start by doubling the number of lines and hence double the definition. Thus in a 1250-line format about 420 c/ph might be obtained. To achieve equal horizontal definition, bearing in mind the aspect ratio is now 16:9, then nearly 750 cycles per picture width will be needed. Allowing for horizontal blanking, then around 890 cycles per line period will be needed. The line frequency is now given by:

Fh = 1250 × 25 = 31 250 Hz

and the bandwidth required is given by:

31 250 × 890 = 28 MHz

Note the dramatic increase in bandwidth. In general the bandwidth rises as the square of the resolution because there are more lines and more cycles needed in each line. It should be clear that, except for research purposes, high-definition television will never be broadcast as a conventional analog signal because the bandwidth required is simply uneconomic. If and when high-definition broadcasting becomes common, it will be compelled to use digital compression techniques to make it economic.
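
The bandwidth arithmetic of this section can be gathered into one short sketch. The active line counts and cycles-per-line figures are the approximate values used above.

```python
def analog_bandwidth_hz(total_lines, frame_rate_hz, cycles_per_line_period):
    """Bandwidth = line rate x number of cycles carried in each line period."""
    return total_lines * frame_rate_hz * cycles_per_line_period

# 625/25: ~600 active lines give ~300 cycles per picture width after the
# Kell factor; allowing for blanking, ~360 cycles per full line period.
print(analog_bandwidth_hz(625, 25, 360) / 1e6)    # about 5.6 MHz
print(analog_bandwidth_hz(525, 30, 280) / 1e6)    # about 4.4 MHz
print(analog_bandwidth_hz(1250, 25, 890) / 1e6)   # about 28 MHz
```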

7.10 Interlace

Interlaced scanning is a crude compression technique which was developed empirically in the 1930s as a way of increasing the picture rate to reduce flicker without a matching increase in the video bandwidth. Instead of transmitting entire frames, the lines of the frame are sorted into odd lines and even lines. Odd lines are transmitted in one field, even lines in the next. A pair of fields is supposed to interlace to produce a frame, but it will be seen that this frequently does not happen. Figure 7.18(a) shows that the vertical/temporal arrangement of lines in an interlaced system forms a quincunx pattern. Not surprisingly the vertical/temporal spectrum of an interlaced signal shows the same pattern.

Study of the vertical temporal spectrum allows many of the characteristics of interlace to be deduced. Like quincunx spatial sampling, interlace theoretically has a triangular passband, as Figure 7.18(b) shows. The highest vertical resolution is obtained at the point shown, and this is only obtained with a temporal frequency of zero, i.e. when there is no motion. This suggests that interlaced systems have poor dynamic resolution.

Although the passband is triangular, a suitable reconstruction filter cannot be implemented in any known display. Figure 7.18(c) shows that in, for example, a CRT display, there is no temporal filter, only a vertical filter due to the aperture effect of the electron beam. As a result there are two problems. First, fine vertical detail will be displayed at the frame rate. The result is that although the field rate is above the CFF, a significant amount of frame rate energy is present to cause flicker. Second, in the presence of motion there will be vertical aliasing.

As was mentioned in Chapter 3, transform duality holds that any phenomenon can be described in both domains. Figure 7.18(d) shows that vertical detail such as an edge may only be present in one field of the pair and this results in frame rate flicker called ‘interlace twitter’.

Figure 7.19(a) shows a dynamic resolution analysis of interlaced scanning. When there is no motion, the optic flow axis and the time axis are parallel and the apparent vertical sampling rate is the number of lines in a frame.

image

Figure 7.18 (a) Interlaced systems shift the lines in between pictures. Two pictures, or fields, make a frame. (b) The vertical temporal spectrum of an interlaced system and its triangular passband, allowing motion or vertical resolution but not both. (c) With the spectrum of (b) on a real display, the triangular filter is absent, allowing energy at the frame rate to be visible as flicker. (d) The flicker originates on horizontal edges which only appear in one field.

image

Figure 7.19 When an interlaced picture is stationary, viewing takes place along the time axis as shown in (a). When a vertical component of motion exists, viewing takes place along the optic flow axis. (b) The vertical sampling rate falls to one half its stationary value.

However, when there is vertical motion, (b), the optic flow axis turns. In the case shown, the sampling structure due to interlace results in the vertical sampling rate falling to one half of its stationary value.

Consequently interlace does exactly what would be expected from a half-bandwidth filter. It halves the vertical resolution when any motion with a vertical component occurs. In a practical television system, there is no anti-aliasing filter in the vertical axis and so when the vertical sampling rate of an interlaced system is halved by motion, high spatial frequencies will alias or heterodyne causing annoying artifacts in the picture. This is easily demonstrated.

Figure 7.20(a) shows how a vertical spatial frequency well within the static resolution of the system aliases when motion occurs. In a progressive scan system this effect is absent and the dynamic resolution due to scanning can be the same as the static case.

Interlaced systems handle motion transverse to the scanning lines very poorly by aliasing, whereas motion parallel to the scanning lines results in a strange artifact. If the eye is tracking a horizontally moving object, the object itself will be portrayed quite well because the interlace mechanism will work. However, Figure 7.20(b) shows that the background strobing will appear feathered because only half of the lines are present in each version of the background. Vertical edges in the background appear as shown in the figure.

image

Figure 7.20 (a) The halving in sampling rate causes high spatial frequencies to alias. (b) To an eye following a horizontally moving object, vertical lines in the background will appear feathered because each field appears at a different place on the retina.

Feathering is less noticeable than vertical aliasing and for this reason interlaced television systems always have horizontal raster lines. In real life, horizontal motion is more common than vertical.

It is easy to calculate the vertical image motion velocity needed to obtain the half-bandwidth speed of interlace, because it amounts to one raster line per field. In 525/60 (NTSC) there are about 500 active lines, so motion as slow as one picture height in 8 seconds will halve the dynamic resolution. In 625/50 (PAL) there are about 600 lines, so the half-bandwidth speed falls to one picture height in 12 seconds. This is why NTSC, with fewer lines and lower bandwidth, doesn’t look as soft as it should compared to PAL, because it has better dynamic resolution.

Figure 7.21 shows that the situation deteriorates rapidly if an attempt is made to use interlaced scanning in systems with a lot of lines. In 1250/50, the resolution is halved at a vertical speed of just one picture height in 24 seconds. In other words on real moving video a 1250/50 interlaced system has the same dynamic resolution as a 625/50 progressive system. By the same argument a 1080 I system has the same performance as a 480 P system.
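
The half-bandwidth velocities quoted above follow from one raster line of movement per field; a short sketch:

```python
def half_bandwidth_time_s(active_lines, field_rate_hz):
    """Time to move one picture height at the vertical velocity (one raster
    line per field) which halves the resolution of an interlaced system."""
    return active_lines / field_rate_hz

print(half_bandwidth_time_s(500, 60))    # 525/60: about 8 s per picture height
print(half_bandwidth_time_s(600, 50))    # 625/50: 12 s per picture height
print(half_bandwidth_time_s(1200, 50))   # 1250/50: 24 s per picture height
```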

Now that techniques such as digital compression and spatial oversampling are available, the format used for display need not be the same as the transmission format. Thus it is difficult to justify the use of interlace in a transmission format. In fact interlace causes difficulties which are absent in progressive systems. Progressive systems are separable. Vertical filtering need not affect the time axis and vice versa. Interlaced systems are not separable, and two-dimensional filtering is mandatory. A vertical process requires motion compensation in an interlaced system whereas in a progressive system it does not.

image

Figure 7.21 Interlace works best in systems with few lines, e.g. NTSC. Increasing the number of lines reduces performance if the frame rate is not also raised. Here are shown the vertical velocities at which various interlaced standards fail.

Interlace, however, makes motion estimation more difficult. When compression is used, compression systems should not be cascaded. As digital compression techniques based on transforms are now available, it makes no sense to use an interlaced, i.e. compressed, video signal as an input. Better results will be obtained if a progressive scan signal is used.

Computer-generated images and film are not interlaced, but consist of discrete frames spaced on a time axis. As digital technology is bringing computers and television closer, the use of interlaced transmission is an embarrassing source of incompatibility. The future will bring image-delivery systems based on computer technology and oversampling cameras and displays which can operate at resolutions much closer to the theoretical limits. With the technology of the day, interlace had a purpose whereas it now impedes progress.

Interlace causes difficulty in any process which requires image manipulation. This includes DVEs, standards convertors and display convertors. All these devices give better results when working with progressively scanned data and if the source material is interlaced, a deinterlacing process will be necessary and will be considered in section 7.23.

7.11 Colour television

The precise approach to colour reproduction described in Chapter 6 is not adopted in colour television. Instead the approach is to find a set of primary CRT phosphors which give reasonable brightness and to engineer the rest of the system around them. Figure 7.22 shows the Rec. 709 primaries adopted in most TV systems along with the D65 white point. In order to determine the colour matching functions needed in the camera, the line from the white point through each primary is extended to the perimeter of the CIE diagram to find the centre wavelength of the filter at which its response will peak. Three filter responses are then specified. In practice the ideal responses are seldom provided. Figure 7.23(a) shows a more typical filter set from a colour TV camera.

The types of filter used have flat passbands, which are not optimal. As can be seen from Figure 7.23(b), flat passbands make the three outputs from the camera identical for a range of wavelengths, whereas they should be unique for each wavelength. This loses subtlety from the reproduced colour.

A monochrome camera produces a single luma signal Y or Ys whereas a colour camera produces three signals, or components, R, G and B which are essentially monochrome video signals representing an image after filtering in each primary colour. In some systems sync is present on a separate signal (RGBS). Rarely is it present on all three components, whereas most commonly it is only present on the green component leading to the term RGsB. The use of the green component for sync has led to suggestions that the components should be called GBR. Like luma, RGsB signals may use 0.7 or 0.714 volt signals, with or without set-up.

image

Figure 7.22 The primaries used in Rec. 709 television systems and the white point.

image

Figure 7.23 (a) Filter responses of a real camera are usually suboptimal. (b) Flat response curves result in the same combination of primaries for a range of colours.

RGB and Y signals are incompatible, yet when colour television was introduced it was a practical necessity that it should be possible to display colour signals on a monochrome display and vice versa.

Creating or transcoding a luma signal from R, Gs and B is relatively easy. Chapter 6 introduced the spectral response of the eye which has a peak in the green region. Green objects will produce a larger stimulus than red objects of the same brightness, with blue objects producing the least stimulus. A luma signal can be obtained by adding R, G and B together, not in equal amounts, but in a sum which is weighted by the relative response of the eye. Once the primaries of a television system have been defined, the weighting factors can be determined from the luminous efficiency curve of the HVS. For conventional standard-definition television the weighting factors, standardized in ITU-R Rec. 601, are:

Y = 0.299R + 0.587G + 0.114B

Syncs may be regenerated, but will be identical to those on the Gs input and when added to Y result in Ys as required.

If Ys is derived in this way, a monochrome display will show nearly the same result as if a monochrome camera had been used in the first place. The results are not identical because of the non-linearities introduced by gamma correction as will be seen in section 7.12.

As colour pictures require three signals, it should be possible to send Ys and two other signals which a colour display could arithmetically convert back to R, G and B. There are two important factors which restrict the form that the other two signals may take. One is to achieve reverse compatibility. The other is the requirement to conserve bandwidth for economic reasons.

If the source is a monochrome camera, it can only produce Ys and the other two signals will be completely absent. A colour display should be able to operate on the Ys signal only and show a monochrome picture.

The above requirements are met by sending two colour difference signals along with Ys. There are three possible colour difference signals, R–Y, B–Y and G–Y. As the green signal makes the greatest contribution to Y, then the amplitude of G–Y would be the smallest and would be most susceptible to noise. Thus R–Y and B–Y are used in practice as Figure 7.24 shows.

R and B are readily obtained by adding Y to the two colour difference signals. G is obtained by rearranging the expression for Y above such that:

G = (Y – 0.299R – 0.114B)/0.587

If a colour CRT is being driven, it is possible to apply inverted luma to the cathodes and the R–Y and B–Y signals directly to two of the grids so that the tube performs some of the matrixing. It is then only necessary to obtain G–Y for the third grid, using the expression:

G–Y = –0.51(R–Y) – 0.186(B–Y)

image

Figure 7.24 Colour components are converted to colour difference signals by the transcoding shown here.
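
The transcoding of Figure 7.24 and its inverse amount to a few lines of arithmetic. The sketch below uses the SDTV weighting factors quoted above and verifies that the round trip is exact.

```python
import numpy as np

def rgb_to_colour_difference(r, g, b):
    """Form luma and the two colour difference signals from R, G, B."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, r - y, b - y

def colour_difference_to_rgb(y, r_y, b_y):
    """R and B are recovered by adding Y; G by rearranging the luma equation
    (equivalently, from G-Y = -0.51(R-Y) - 0.186(B-Y))."""
    r = y + r_y
    b = y + b_y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

y, r_y, b_y = rgb_to_colour_difference(0.2, 0.6, 0.9)
print(np.allclose(colour_difference_to_rgb(y, r_y, b_y), (0.2, 0.6, 0.9)))   # True
```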

If a monochrome source having only a Ys output is supplied to a colour display, R–Y and B–Y will be zero. It is reasonably obvious that if there are no colour difference signals the colour signals cannot be different from one another and R = G = B. As a result the colour display can produce only a neutral picture.

The use of colour difference signals is essential for compatibility in both directions between colour and monochrome, but it has a further advantage which follows from the way in which the HVS works. In order to produce the highest resolution in the fovea, the eye will use signals from all types of cone, regardless of colour. In order to determine colour the stimuli from three cones must be compared. There is evidence that the nervous system uses some form of colour difference processing to make this possible. As a result the full acuity of the human eye is available only in monochrome. Differences in colour cannot be resolved so well. A further factor is that the lens in the human eye is not achromatic and this means that the ends of the spectrum are not well focused. This is particularly noticeable on blue.

If the eye cannot resolve colour very well there is no point in expending valuable bandwidth sending high-resolution colour signals. Colour difference working allows the luma to be sent separately at full bandwidth. This determines the subjective sharpness of the picture. The colour difference signals can be sent with considerably reduced bandwidth, as little as one quarter that of luma, and the human eye is unable to tell.

In practice analog component signals are never received perfectly, but suffer from slight differences in relative gain. In the case of RGB a gain error in one signal will cause a colour cast on the received picture. A gain error in Y causes no colour cast and gain errors in R–Y or B–Y cause much smaller perceived colour casts. Thus colour difference working is also more robust than RGB working.

The overwhelming advantages obtained by using colour difference signals mean that in broadcast and production facilities RGB is seldom used. The outputs from the RGB sensors in the camera are converted directly to Y, R–Y and B–Y in the camera control unit and output in that form. Standards exist for both analog and digital colour difference signals to ensure compatibility between equipment from various manufacturers.

Whilst signals such as Y, R, G and B are unipolar or positive only, it should be stressed that colour difference signals are bipolar and may meaningfully take on levels below zero volts.

The wide use of colour difference signals has led to the development of test signals and equipment to display them. The most important of the test signals are the ubiquitous colour bars. Colour bars are used to set the gains and timing of signal components and to check that matrix operations are performed using the correct weighting factors. The origin of the colour bar test signal is shown in Figure 7.25. In 100 per cent amplitude bars, peak amplitude binary RGB signals are produced, having one, two and four cycles per screen width.

image

Figure 7.25 Origin of colour difference signals representing colour bars. Adding R, G and B according to the weighting factors produces an irregular luminance staircase.

When these are added together in a weighted sum, an eight-level luma staircase results because of the unequal weighting. The matrix also produces two colour difference signals, R–Y and B–Y as shown. Sometimes 75 per cent amplitude bars are generated by suitably reducing the RGB signal amplitude. Note that in both cases the colours are fully saturated; it is only the brightness which is reduced to 75 per cent. Sometimes the white bar of a 75 per cent bar signal is raised to 100 per cent to make calibration easier. Such a signal is sometimes erroneously called a 100 per cent bar signal.
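
Applying the weighted sum to the eight bars makes the irregular staircase explicit; the values below follow directly from the weighting factors given earlier.

```python
# 100 per cent colour bars in descending order of luma (binary R, G, B).
bars = [('white',  1, 1, 1), ('yellow', 1, 1, 0), ('cyan', 0, 1, 1),
        ('green',  0, 1, 0), ('magenta', 1, 0, 1), ('red', 1, 0, 0),
        ('blue',   0, 0, 1), ('black',  0, 0, 0)]

for name, r, g, b in bars:
    y = 0.299 * r + 0.587 * g + 0.114 * b
    print(f'{name:8s} Y = {y:.3f}  R-Y = {r - y:+.3f}  B-Y = {b - y:+.3f}')
# Y steps: 1.000, 0.886, 0.701, 0.587, 0.413, 0.299, 0.114, 0.000
```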

image

Figure 7.26 Colour difference signals can be shown two-dimensionally on a vectorscope.

Figure 7.26 shows that both colour difference signals can be displayed at once on a component vectorscope. The screen of a component vectorscope represents a constant luminance chromaticity diagram with white in the centre and saturation increasing outwards with radius. The B–Y signal causes horizontal deflection, and R–Y causes vertical deflection. It will be seen that this results in a display having six peripheral dots and two central dots. The central dots result from the white and black bars which are not colours and in which the colour difference signals are both zero.

R–Y and B–Y have voltage swings which are inconvenient because they are somewhat different from the gamut of Y. Figure 7.27(a) shows an SMPTE/EBU standard colour difference signal set in which the signals are called Ys, Pb and Pr. 0.3 volt syncs are on luma only and all three video signals have a 0.7 volt peak to peak swing with 100 per cent bars. In order to obtain these voltage swings, the following gain corrections are made to the components:

Pr = 0.71327(R–Y) and Pb = 0.56433(B–Y)

Within waveform monitors, the colour difference signals may be offset by 350 mV as in Figure 7.27(b) to match the luma range for display purposes.
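
The two scale factors above are chosen so that the most saturated bars just fill the same 0.7 volt swing as luma, as the short check below confirms.

```python
def to_pr_pb(r_y, b_y):
    """Scale colour differences to the Pr/Pb ranges quoted above."""
    return 0.71327 * r_y, 0.56433 * b_y

# Largest colour difference excursions: R-Y on the red bar, B-Y on the blue
# bar (evaluated together here purely for convenience).
pr, pb = to_pr_pb(1 - 0.299, 1 - 0.114)
print(round(pr, 3), round(pb, 3))   # 0.5 and 0.5: +/-0.35 V about blanking
```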

image

Figure 7.27 (a) 100 per cent colour bars represented by SMPTE/EBU standard colour difference signals. (b) Level comparison is easier in waveform monitors if the B–Y and R–Y signals are offset upwards.

7.12 Constant luminance

The use of matrix arithmetic in colour difference video systems only works perfectly if the signals represent linear light. In the presence of gamma this is not the case. As was seen in section 6.9, all video signals are subject to a non-linear gamma precompensation which is a power function. In traditional colour television, the non-linearity of the CRT is used to counteract the gamma precompensation used at the camera. As can be seen, this means that the conversions between RGB and colour difference formats must be made in the non-linear domain.

Figure 7.28(a) shows a colour difference system using gamma in which there is no bandwidth reduction of the colour difference signals. The effect of gamma is that some luminance is present in the colour difference signals and vice versa. As the encoding and decoding matrices are complementary, their use is transparent. However, in practice the bandwidth of the colour difference signals is reduced, as shown in Figure 7.28(b), and this has the effect of removing that part of the luminance signal which was being carried in the colour difference signals.

image

Figure 7.28 (a) Colour difference system with no bandwidth reduction is transparent because the matrices are fully reversible. (b) If the bandwidth of the colour difference signals is reduced, the use of gamma causes failure of constant luminance. (c) Constant luminance can be achieved if the matrices work in the linear light domain.

The result is a phenomenon called failure of constant luminance. In the presence of large steps in the video signal at the boundaries of objects, the luminance level will be incorrect. If colour bars are observed after passing through such a system, the most obvious symptom is that at the green/magenta transition in the centre of the screen there will be a dark line caused by a drop in the luminance level.
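
The dark line is easy to reproduce numerically. The sketch below simulates one scan line crossing a green/magenta boundary; the gamma exponent, filter and line length are illustrative assumptions, not standard values.

```python
import numpy as np

GAMMA = 2.2
W = np.array([0.299, 0.587, 0.114])          # luma weighting factors

def lowpass(x, taps=31):
    k = np.hanning(taps)
    k /= k.sum()
    return np.convolve(x, k, mode='same')

# One scan line: left half green, right half magenta, in linear light.
n = 200
rgb = np.zeros((n, 3))
rgb[:n // 2] = (0, 1, 0)
rgb[n // 2:] = (1, 0, 1)

rgb_g = rgb ** (1 / GAMMA)                   # gamma pre-corrected R'G'B'
y = rgb_g @ W                                # luma, full bandwidth
r_y = lowpass(rgb_g[:, 0] - y)               # colour differences,
b_y = lowpass(rgb_g[:, 2] - y)               # bandwidth reduced

# Receiver matrix, clip, remove gamma, then compute displayed luminance.
r_out, b_out = y + r_y, y + b_y
g_out = (y - W[0] * r_out - W[2] * b_out) / W[1]
rgb_out = np.clip(np.stack([r_out, g_out, b_out], axis=1), 0, 1)
luminance = (rgb_out ** GAMMA) @ W

print(luminance[50], luminance[150])         # ~0.59 and ~0.41 away from the edge
print(luminance[80:120].min())               # a much lower value: the dark line
```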

In systems designed to use the non-linearity of the CRT, the failure of constant luminance is accepted. However, when an alternative display technology is available, colour differences can correctly be calculated using the constant luminance system of Figure 7.28(c). As the matrix processes take place in the linear light domain, the problem is avoided. Whilst signals from a constant luminance encoder are incompatible with traditional television sets, there is no reason why they should not be used with proprietary systems such as in electronic cinema.

7.13 Analog colour television signals

Although analog video is obsolescent, analog equipment will continue in use for some time to come. Much archive material resides on analog formats. Convergent digital systems will be faced with accepting input from, or delivering output to, legacy analog systems and for this reason it is important to understand how they work at least well enough to avoid obvious pitfalls. The number of different approaches to colour television is large and confusing. Figure 7.29 shows how these relate. Note that in addition to variations in the colour modulation scheme, there will be differences in the number of scanning lines and in the frame rates used.

The starting point in all cases is gamma preprocessed RGB, with Rec. 709 colorimetry. RGB can be digitized, but this is uncommon because of the high bit rate required. RGB can be matrixed to Y, R–Y and B–Y, but the colour difference signals will be scaled to produce Y, Pr, Pb. The bandwidth of the colour difference signals is halved. Section 7.14 will show how this signal format can be digitized according to ITU Rec. 601 to produce a format widely used in television production equipment.

For analog broadcasting, colour difference signals can be converted to composite video. Figure 7.30 shows how a particular colour can be reached on a vectorscope display. In component signals, the dot is reached by travelling a given distance horizontally, followed by a given distance vertically. This is the way a map reference works; mathematicians call the components Cartesian coordinates. It is just as easy to reach the same dot by travelling a suitable distance at the appropriate heading or angle. Mathematicians call this polar coordinates.
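
The equivalence of the two coordinate systems is easily demonstrated; the short sketch below, using arbitrary example values, converts a pair of colour difference components into a radius and angle and back again.

    import math

    # An arbitrary pair of colour difference values (Cartesian coordinates)
    b_y, r_y = 0.3, -0.2

    # The same point in polar form: radius (chroma amplitude) and angle (phase)
    amplitude = math.hypot(b_y, r_y)
    phase = math.atan2(r_y, b_y)

    # Converting back recovers the original components
    print(amplitude * math.cos(phase), amplitude * math.sin(phase))   # 0.3, -0.2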

image

Figure 7.29 Traditional analog colour television signals and how they relate.

image

Figure 7.30 The equivalence of polar (radius and angle) and XY coordinates in locating a specific colour.

Instead of two separate signals, we can convey distance and angle in the amplitude and phase of a single waveform. That is precisely how PAL and NTSC chroma work. The radius of the colour is given by the chroma amplitude which is proportional to the saturation, and the angle is the phase. The phase angle of the vector literally points to the appropriate hue in the chromaticity diagram.

Simultaneous modulation of amplitude and phase is performed by a quadrature modulator. Figure 7.31 shows how this works. A pair of amplitude modulators (analog multipliers) are supplied with the same carriers except that one has been phase shifted by 90°. The outputs of the two modulators are linearly added and the resultant signal will be found to be amplitude and phase modulated. The phase is a function of the relative proportions and polarities of the two inputs. The original subcarrier is suppressed in the output of the modulator. The picture frequencies in the baseband result in sidebands above and below the centre frequency after modulation.
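
A minimal numerical sketch of a quadrature modulator follows; the subcarrier frequency and component values are arbitrary and no band-limiting of the inputs is shown.

    import numpy as np

    fsc = 4.43e6                        # subcarrier frequency; the value is arbitrary here
    fs = 64 * fsc                       # simulation sampling rate
    t = np.arange(0, 2e-6, 1 / fs)      # a couple of microseconds of signal

    u, v = 0.3, -0.2                    # scaled colour difference values for a flat patch

    # Two analog multipliers fed with carriers 90 degrees apart; outputs are added.
    chroma = u * np.sin(2 * np.pi * fsc * t) + v * np.cos(2 * np.pi * fsc * t)

    # The single resulting waveform carries both inputs in its amplitude and phase.
    print(np.abs(chroma).max(), np.hypot(u, v))   # approximately equal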

image

Figure 7.31 Quadrature modulator allows two signals to modulate one carrier in amplitude and phase to produce chroma. This may be kept separate in the Y/C system, or added to the luminance in composite systems.

As a result it is incorrect to refer to the quadrature modulator output as subcarrier; the correct term is chroma. As the chroma signal carries the information from both colour difference signals, it is possible to carry a colour picture by sending two signals: luminance and chroma, abbreviated to Y/C. This is also shown in Figure 7.31. Y/C is used in S-VHS VCRs in which the chroma signal is kept separate from the luminance through the whole record and playback process in order to avoid cross effects. It is difficult to define a Y/C standard. As there are two signals involved, strictly speaking it is a component standard. A composite system linearly adds chroma to luminance for broadcasting. At the receiver the two signals must undergo Y/C separation before the chroma can be demodulated back to a pair of colour difference signals.

Demodulation is done using a pair of synchronous demodulators also driven in quadrature. These need reference carriers which are identical in phase to the original pair of carriers. As there is no subcarrier in the chroma signal it is necessary to send a reference subcarrier separately. This is the purpose of the burst which is sent during horizontal blanking. A heavily damped phase-locked loop synchronizes to the burst and continues to run for the rest of the line to provide a reference for the decoder.

One way of considering how quadrature modulation works is that when one of the carrier inputs reaches its peak, the other is passing through zero. At that time the signal voltage can only be a function of, say, the B–Y input. Ninety degrees later the relationships exchange and the signal voltage can then only be a function of the R–Y input. Demodulation is a question of sampling the signal every 90°. Odd samples reflect the state of one component; even samples reflect the state of the other. The demodulators have the effect of inverting alternate samples. A simple low-pass filter removes the harmonics of the subcarrier frequency to recreate the input waveform.
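
The sampling view of demodulation can be checked numerically, as in the sketch below; an ideal noise-free chroma signal and a perfectly locked reference are assumed.

    import numpy as np

    u, v = 0.3, -0.2                    # colour difference values being carried
    fsc = 4.43e6
    n = np.arange(16)
    t = n / (4 * fsc)                   # sample instants: four per subcarrier cycle

    chroma = u * np.sin(2 * np.pi * fsc * t) + v * np.cos(2 * np.pi * fsc * t)

    # The samples follow the repeating pattern v, u, -v, -u, ...
    # Taking alternate samples and inverting every other one recovers each component.
    sign = np.where(np.arange(8) % 2 == 0, 1.0, -1.0)
    print((chroma[0::2] * sign).round(3))   # eight values, all equal to v (-0.2)
    print((chroma[1::2] * sign).round(3))   # eight values, all equal to u (0.3)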

Composite video was originally designed as a monochrome-compatible system for broadcasting in which subcarrier-based colour-difference information was added to an existing line standard in such a way that existing sets could still display a monochrome picture. A further requirement was that the addition of colour should not increase the bandwidth of the TV channel. In that respect composite video has to be viewed as an early form of compression.

Whilst the details vary, all composite signals have in common the need to include a subcarrier-based chroma signal within the luminance band in such a way that it will be effectively invisible on an unmodified monochrome TV set. This is achieved in much the same way in all three systems. Figure 7.32 shows that if a chroma signal is linearly added to a luminance signal it has the effect of making it alternately too bright and too dark. If it is arranged that the chroma is inverted on the next picture line the effect is that areas which are too bright on one line are adjacent to areas which are too dark on the next. The eye will see the average brightness of the line pairs which is almost the original luminance. In the absence of gamma correction the cancellation would be perfect; in the presence of gamma it is imperfect but generally adequate.

image

Figure 7.32 In composite video the subcarrier frequency is arranged so that inversions occur between adjacent lines and pictures to help reduce the visibility of the chroma.

Efforts are also made to ensure that the phase of the chroma reverses from frame to frame so that the same point on the screen alternates in brightness on the time axis about the value determined by the luminance signal. Clearly the exact frequency of the subcarrier has to be carefully chosen with respect to line and frame rates. NTSC and PAL use quadrature modulation as shown above, so that two components can be sent simultaneously, whereas SECAM frequency modulates the subcarrier and sends the components on alternate lines. The effect of composite modulation is to produce an extremely complex signal spectrum, especially in PAL.

Analog composite video is today obsolescent because it is being replaced by digital transmissions employing compression schemes such as MPEG.

7.14 Digital colour signals

In principle any analog video signal can be digitized with a suitable sampling rate and wordlength. This is commonly done with colour difference signals. The luma signal, Y, retains the same name in the digital domain, whereas Pr is known as Cr and Pb is known as Cb.

Whilst signals such as Y, R, G and B are unipolar or positive only, colour difference signals are bipolar and may meaningfully take on negative values.

In colour difference working, the important requirement is for image manipulation in the digital domain. This is facilitated by a sampling rate which is a multiple of line rate because then there is a whole number of samples in a line and samples are always in the same position along the line and can form neat columns. A practical difficulty is that the line period of the 525 and 625 systems is slightly different. The problem was overcome by the use of a sampling clock which is an integer multiple of both line rates.

ITU-601 (formerly CCIR-601) recommends the use of certain sampling rates which are based on integer multiples of the carefully chosen fundamental frequency of 3.375 MHz. This frequency is normalized to 1 in the document.

In order to sample 625/50 luminance signals without quality loss, the lowest multiple possible is 4, which represents a sampling rate of 13.5 MHz. This frequency line-locks to give 858 samples per line period in 525/59.94 and 864 samples per line period in 625/50.

In the component analog domain, the colour difference signals used for production purposes typically have one half the bandwidth of the luminance signal. Thus a sampling rate multiple of 2 is used, resulting in 6.75 MHz. This sampling rate allows, respectively, 429 and 432 samples per line.
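
The common sampling rate can be checked with a little arithmetic; the nominal line frequencies of the two standards are used below.

    # Nominal line frequencies of the two standards
    fh_625 = 15625.0               # Hz, 625/50
    fh_525 = 4.5e6 / 286           # Hz, 525/59.94, approximately 15 734.27 Hz

    fs_luma = 13.5e6               # 4 x 3.375 MHz
    fs_diff = 6.75e6               # 2 x 3.375 MHz

    print(fs_luma / fh_625, fs_luma / fh_525)   # 864 and 858 samples per line
    print(fs_diff / fh_625, fs_diff / fh_525)   # 432 and 429 samples per line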

Component video sampled in this way has a 4:2:2 format. Whilst other combinations are possible, 4:2:2 is the format for which the majority of production equipment is constructed. Figure 7.33(a) shows the spatial arrangement given by 4:2:2 sampling. Luminance samples appear at half the spacing of colour difference samples, and every other luminance sample is co-sited with a pair of colour difference samples. Co-siting is important because it allows all attributes of one picture point to be conveyed with a three-sample vector quantity. Modification of the three samples allows such techniques as colour correction to be performed. This would be difficult without co-sited information. Co-siting is achieved by clocking the three ADCs simultaneously.

For lower bandwidths, particularly in prefiltering operations prior to compression, the sampling rate of the colour difference signal can be halved. 4:1:1 delivers colour bandwidth in excess of that required by analog composite video.

In 4:2:2 the colour difference signals are sampled horizontally at half the luminance sampling rate, yet the vertical colour difference sampling rates are the same as for luminance. Whilst this is not a problem in a production application, this disparity of sampling rates represents a data rate overhead which is undesirable in a compression environment. In this case it is possible to halve the vertical sampling rate of the colour difference signals as well. Figure 7.33(b) shows that in MPEG-2 4:2:0 sampling, the colour difference signals are downsampled so that the same vertical and horizontal resolution is obtained.

image

Figure 7.33 (a) In CCIR-601 sampling mode 4:2:2, the line synchronous sampling rate of 13.5 MHz results in samples having the same position in successive lines, so that vertical columns are generated. The sampling rates of the colour difference signals CR, CB are one-half of that of luminance, i.e. 6.75 MHz, so that there are alternate Y only samples and co-sited samples which describe Y, CR and CB. In a run of four samples, there will be four Y samples, two CR samples and two CB samples, hence 4:2:2.

image

Figure 7.33 (b) In 4:2:0 coding the colour difference pixels are downsampled vertically as well as horizontally. Note that the sample sites need to be vertically interpolated so that when two interlaced fields are combined the spacing is even.

The chroma samples in 4:2:0 are positioned half-way between luminance samples in the vertical axis so that they are evenly spaced when an interlaced source is used. To obtain a 4:2:2 output from 4:2:0 data a vertical interpolation process will be needed in addition to lowpass filtering.

The sampling rates of ITU-601 are based on commonality between 525- and 625-line systems, but the consequence is that the pixel spacing is different in the horizontal and vertical axes. This is incompatible with computer graphics, in which so-called ‘square’ pixels are used, meaning that the horizontal and vertical spacing is the same and the resolution is the same in both axes. High-definition TV formats, like computer graphics, universally use ‘square’ pixels. Converting between square and non-square pixel data will require a rate-conversion process as described in section 3.6.

It is not necessary to digitize analog video syncs in component systems, since the sampling rate is derived from sync. The only useful video data are those sampled during the active line. All other parts of the video waveform can be re-created at a later time. It is only necessary to standardize the size and position of a digital active line. The position is specified as a given number of sampling clock periods from the leading edge of sync, and the length is simply a standard number of samples. The component digital active line is 720 luminance samples long. This is slightly longer than the analog active line and allows for some drift in the analog input. Ideally the first and last samples of the digital active line should be at blanking level.

Figure 7.34 shows that in 625-line systems1 the control system waits for 132 sample periods before commencing sampling the line. Then 720 luminance samples and 360 of each type of colour difference sample are taken; 1440 samples in all. A further 12 sample periods will elapse before the next sync edge, making 132 + 720 + 12 = 864 sample periods. In 525-line systems2 the analog active line is in a slightly different place and so the controller waits 122 sample periods before taking the same digital active line samples as before. There will then be 16 sample periods before the next sync edge, making 122 + 720 + 16 = 858 sample periods.

Figure 7.35 shows the luminance signal sampled at 13.5 MHz and two colour difference signals sampled at 6.75 MHz. Three separate signals with different clock rates are inconvenient and so multiplexing can be used. If the colour difference signals are multiplexed into one channel, then two 13.5 MHz channels will be required. If these channels are multiplexed into one, a 27 MHz clock will be required. The word order will be:

Cb, Y, Cr, Y, etc.

In order unambiguously to deserialize the samples, the first sample in the line is always Cb.
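
A sketch of the 27 MHz multiplex is shown below; the short, hypothetical sample lists stand in for one line of real video.

    # Hypothetical sample values for a short run of co-sited pixels
    y  = [64, 66, 70, 75, 80, 82]      # luminance at 13.5 MHz
    cb = [120, 118, 115]               # colour difference at 6.75 MHz
    cr = [130, 133, 136]

    # Interleave into the 27 MHz word order Cb, Y, Cr, Y, Cb, Y, Cr, Y ...
    mux = []
    for i in range(len(cb)):
        mux += [cb[i], y[2 * i], cr[i], y[2 * i + 1]]
    print(mux)

    # Deserializing relies on the first word of the line always being Cb
    cb_out, y_even, cr_out, y_odd = mux[0::4], mux[1::4], mux[2::4], mux[3::4]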

image

Figure 7.34 (a) In 625-line systems to CCIR-601, with 4:2:2 sampling, the sampling rate is exactly 864 times line rate, but only the active line is sampled, 132 sample periods after sync. (b) In 525 line systems to CCIR-601, with 4:2:2 sampling, the sampling rate is exactly 858 times line rate, but only the active line is sampled, 122 sample periods after sync. Note active line contains exactly the same quantity of data as for 50 Hz systems.

In addition to specifying the location of the samples, it is also necessary to standardize the relationship between the absolute analog voltage of the waveform and the digital code value used to express it so that all machines will interpret the numerical data in the same way. These relationships are in the voltage domain and are independent of the line standard used.

Both eight- and ten-bit resolution are allowed by the interface standards. Figure 7.36 shows how the luminance signal fits into the quantizing range of an eight-bit system. Black is at a level of 16 and peak white is at 235 so that there is some tolerance of imperfect analog signals. The sync pulse will clearly go outside the quantizing range, but this is of no consequence as conventional syncs are not transmitted. The visible voltage range fills the quantizing range and this gives the best possible resolution.
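
As a rough sketch, mapping an analog luminance value between black (0) and peak white (1.0) into eight-bit codes with the end points described above might look like this.

    def luma_to_code(y):
        """Map luminance from 0.0 (black) to 1.0 (peak white) to eight-bit codes."""
        code = round(16 + 219 * y)         # black maps to 16, peak white to 235
        return min(max(code, 1), 254)      # all-zeros and all-ones are reserved

    print(luma_to_code(0.0), luma_to_code(0.5), luma_to_code(1.0))   # 16 126 235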

image

Figure 7.35 The colour difference sampling rate is one-half that of luminance, but there are two colour difference signals, Cr and Cb hence the colour difference data rate is equal to the luminance data rate, and a 27 MHz interleaved format is possible in a single channel.

image

Figure 7.36 The standard luminance signal fits into eight or ten-bit quantizing structures as shown here.

The colour difference signals use offset binary, where 128 is the equivalent of blanking voltage. The peak analog limits are reached at 16 and 240 respectively, allowing once more some latitude for maladjusted analog inputs.
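
A corresponding sketch for the colour difference signals, using offset binary with blanking at 128, is shown below.

    def diff_to_code(c):
        """Map a colour difference value from -0.5 to +0.5 to eight-bit offset binary."""
        code = round(128 + 224 * c)        # negative peak 16, blanking 128, positive peak 240
        return min(max(code, 1), 254)      # the reserved extreme codes are still avoided

    print(diff_to_code(-0.5), diff_to_code(0.0), diff_to_code(0.5))   # 16 128 240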

Note that the code values corresponding to all ones and all zeros, i.e. the two extreme ends of the quantizing range are not allowed to occur in the active line as they are reserved for synchronizing. Convertors must be followed by circuitry which catches these values and forces the LSB to a different value if out-of-range analog inputs are applied.

The peak-to-peak amplitude of Y is 220 quantizing intervals, whereas for the colour difference signals it is 225 intervals. There is thus a small gain difference between the signals. This will be cancelled out by the opposing gain difference at any future DAC, but must be borne in mind when digitally converting to other standards. Computer graphics standards often use the entire number scale with black at all zeros.

As conventional syncs are not sent, horizontal and vertical synchronizing is achieved by special bit patterns sent with each line. Immediately before the digital active line location is the SAV (start of active video) pattern, and immediately after is the EAV (end of active video) pattern. These unique patterns occur on every line and continue throughout the vertical interval.

Each sync pattern consists of four symbols. The first is all ones and the next two are all zeros. As these cannot occur in active video, their detection reliably indicates a sync pattern. The fourth symbol is a data byte which contains three data bits, H, F and V. These bits are protected by four redundancy bits which form a seven-bit Hamming codeword for the purpose of detecting and correcting errors. Figure 7.37 shows the structure of the sync pattern. The sync bits have the following meanings:

H is used to distinguish between SAV, where it is set to 0 and EAV where it is set to 1.

F defines the state of interlace and is 0 during the first field and 1 during the second field. F is only allowed to change at EAV. In interlaced systems, one field begins at the centre of a line, but there is no sync pattern at that location so the field bit changes at the end of the line in which the change took place.

V is 1 during vertical blanking and 0 during the active part of the field. It can only change at EAV.
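
A sketch of how the fourth symbol of the timing reference can be built is given below; the protection-bit equations follow the commonly published ITU-R BT.656 formulation and should be checked against the standard before being relied upon.

    def trs(f, v, h):
        """Return the four-symbol timing reference (SAV when h = 0, EAV when h = 1)."""
        p3, p2, p1, p0 = v ^ h, f ^ h, f ^ v, f ^ v ^ h      # protection bits
        xyz = (0x80 | (f << 6) | (v << 5) | (h << 4)
               | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0)
        return [0xFF, 0x00, 0x00, xyz]

    print([hex(w) for w in trs(f=0, v=0, h=1)])   # EAV during the active part of field 1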

image

Figure 7.37 (a) The 4-byte synchronizing pattern which precedes and follows every active line sample block has this structure.

image

Figure 7.37 (b) The relationships between analog video timing and the information in the digital timing reference signals for 625

7.15 Digital colour space

Figure 7.38 shows the colour space available in eight-bit RGB. In computers, eight-bit RGB is common and claims are often seen that 16 million different colours are possible. This is nonsense.

A colour is a given combination of hue and saturation and is independent of brightness. Consequently all sets of RGB values having the same ratios produce the same colour. For example, R = G = B always gives the same colour whatever the pixel value. Thus there are 256 brightnesses which have the same colour, allowing a more believable 65 000 different colours.
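
The arithmetic behind the more believable figure is simple, as the short calculation below shows.

    combinations = 256 ** 3        # every possible eight-bit R, G, B triple
    brightnesses = 256             # scalings of one ratio that give the same colour
    print(combinations, combinations // brightnesses)   # 16 777 216 and 65 536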

image

Figure 7.38 RGB space. If each component has eight-bit resolution, 16 million combinations are possible, but these are not all different colours as many will have the same hue and differ only in saturation.

Figure 7.39 shows the RGB cube mapped into eight-bit colour difference space so that it is no longer a cube. Now the grey axis goes straight up the middle because greys correspond to both Cr and Cb being zero. To visualize colour difference space, imagine looking down along the grey axis. This makes the black and white corners coincide in the centre. The remaining six corners of the legal colour difference space now correspond to the six boxes on a vectorscope. Although there are still 16 million combinations, many of these are now illegal. For example, as black or white are approached, the colour differences must fall to zero.

From an information theory standpoint, colour difference space is redundant. With some tedious geometry, it can be shown that less than a quarter of the codes are legal. The luminance resolution remains the same, but there is about half as much information in each colour axis. This is due to the colour difference signals being bipolar. If the signal resolution has to be maintained, eight-bit RGB should be transformed to a longer wordlength in the colour difference domain, nine bits being adequate. At this stage the colour difference transform doesn’t seem efficient because twenty-four-bit RGB converts to twenty-six-bit Y, Cr, Cb.

image

Figure 7.39 Colour difference space is entered from RGB via the matrix shown in (a). In colour difference space (b) the White–Black axis is vertical and colour space is an area orthogonal to that axis which moves up and down as brightness changes. Note that the conventional vectorscope display is the projection of the RGB cube onto the colour plane.

In most cases the loss of colour resolution is invisible to the eye, and eight-bit resolution is retained. The results of the transform computation must be digitally dithered to avoid posterizing.

As was seen in section 6.7, the acuity of human vision is axisymmetric, making the so-called ‘square pixel’ the most efficient arrangement for luminance and colour difference signals alike. Figure 7.40 shows the ideal. The colour sampling is co-sited with the luminance sampling but the colour sample spacing is twice that of luminance. The colour difference signals after matrixing from RGB have to be low-pass filtered in two dimensions prior to downsampling in order to prevent aliasing of HF detail. At the display, the downsampled colour data have to be interpolated in two dimensions to produce colour information in every pixel. In an oversampling display the colour interpolation can be combined with the display upsampling stage.

image

Figure 7.40 Ideal two-dimensionally downsampled colour-difference system. Colour resolution is half of luma resolution, but the eye cannot tell the difference.

Co-siting the colour and luminance pixels has the advantage that the transmitted colour values are displayed unchanged. Only the interpolated values need to be calculated. This minimizes generation loss in the filtering. Downsampling the colour by a factor of two in both axes means that the colour data are reduced to one quarter of the original amount. When viewed by the HVS this is essentially a lossless process.

7.16 Telecine

Telecine is a significant source of video signals as film continues to be an important medium. Film cameras and projectors both work by means of synchronized shutters and intermittently driven sprockets or claw mechanisms. When the shutter is closed, the film is mechanically advanced by one frame by driving the intermittent sprocket. When the film has come to rest at the next frame, the shutter opens again. This process is repeated at the frame rate of film, the most common of which is 24 Hz. In order to reduce flicker, the projector works in a slightly different way from the camera. For each film frame, the shutter opens two or three times, instead of once, multiplying the flicker frequency accordingly.

A conventional telecine machine outputs a standard broadcast video signal. Initially the output was analog, but later machines were developed which could output a standardized digital signal. The latest machines are essentially generic film scanners and output pixel data. These are known as datacines.

Traditionally, some liberties are taken in telecine because until recently there was no alternative. In 50 Hz telecine the film is driven at 25 fps, not 24, so that each frame results in two fields. In 60 Hz telecine the film runs at 24 fps, but odd frames result in two fields, even frames result in three fields; a process known as 3:2 pulldown. On average, there are two and a half fields per film frame giving a field rate of 60 Hz. The field repetition of telecine causes motion judder which is explained in section 7.21.
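
The field repetition of 60 Hz telecine is easily expressed; the sketch below generates the 2–3 cadence for a run of film frames. Which parity of frame receives the extra field is an arbitrary choice.

    def pulldown_32(num_frames):
        """Return (film_frame, field) pairs in 3:2 pulldown order."""
        fields = []
        for frame in range(num_frames):
            repeats = 3 if frame % 2 == 0 else 2     # which parity gets three fields is arbitrary
            fields += [(frame, n) for n in range(repeats)]
        return fields

    seq = pulldown_32(4)
    print(len(seq))      # ten fields from four film frames: 60 Hz from 24 fps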

Essentially a telecine machine is a form of film projector which outputs a video signal instead of an image on a screen. Early telecine machines were no more than conventional projectors which shone light into a modified TV camera, but these soon gave way to more sophisticated devices. As television pictures are scanned vertically down the screen, it is possible to scan film using the linear motion of the film itself. In these machines there is no intermittent motion and the film is driven by a friction roller. This causes less wear to the film, and if the sprocket holes are damaged, there is less vertical instability in the picture position.

The first constant speed telecines used the ‘flying spot’ scanning principle. A CRT produces a small spot of white light which is focused on the film. The spot is driven back and forth across the tube by a sawtooth waveform and consequently scans the film transversely. The film modulates the intensity of the light passing through and sensors on the other side of the film produce a video output signal. It is easy to see how a constantly moving film can produce a progressively scanned video signal, but less easy to see how interlace or 3:2 pulldown can be achieved. A further problem is that on the film the bottom of one frame is only separated from the top of the next frame by a thin black bar. Scanning of the next frame would begin as soon as that of the previous frame had ended, leaving no time for the vertical interval required by the television display for vertical retrace.

In flying spot telecines, these problems were overcome by deflecting the spot along the axis of film motion in addition to the transverse scan. By deflecting the spot steadily against film motion for one frame and then jumping back for the next, the film frame would be scanned in less than real time, creating a vertical interval in the video output. By jumping the spot in the same direction as the film travel by one frame and making a further scan, two interlaced fields could be produced from one film frame.

The two-dimensional motion of the flying spot caused it to produce what is known as a ‘patch’ on the CRT. If the film is stopped, the shape of the patch can be changed to be the same as the shape of the frame in order to display a still picture in the video output. More complex patch generation allows a picture to be obtained with the film running at any linear speed. The patch slides along the CRT to follow a given frame, and then omits or repeats frames in order to produce a standard field rate in the video whatever the film frame rate. By controlling both the width and height of the patch independently, films of any frame size can be handled, and anamorphic formats can be linearized. Smaller parts of a film frame can be made to fill the TV screen by shrinking the patch, and the patch can be rotated for effect or to compensate for a film camera which was not level.

In order to obtain a 3:2 pulldown interlaced output, there must be five different patch positions on the CRT where one frame is scanned twice but the next is followed for three scans. There is a ten-field sequence before the geometry repeats, requiring ten different patches to be generated. This caused great difficulty in early flying spot telecines because any physical error in the relative positioning of the patches would cause the image on the TV screen to bounce vertically. The solution was to use digital field stores between the scanner and the video output. Each frame of the film could then be scanned into two fields, and the 3:2 pulldown effect is obtained by outputting from one of the field stores twice. Figure 7.41 shows the procedure which produces a correctly interlaced 60 Hz output.

Progress in field store technology made it possible to build a telecine in which the vertical motion of the patch was replaced by a process of electronic timebase correction. The line scanning mechanism is obtained by projecting the steadily moving film onto a sensor consisting of a single line of CCD elements.

image

Figure 7.41 When a field store is available, 3:2 pulldown is obtained by repeating fields from the store, so the film scanning process is regular.

The entire film frame is scanned progressively in one frame period and this information is stored. By reading alternate lines from the store, an interlaced output is possible. 3:2 pulldown is obtained in the same way. By reading the store faster than it is written, the film frame area can be fitted into the active picture area, leaving time for blanking. The linear sensor cannot move, so a still frame cannot be obtained. However, output is possible over a range of speeds using the frame store as a buffer.

Practical telecine machines need some form of colour correction. The colorimetry of film is subject to considerable variation and the spectral response of each emulsion layer is unlikely to be matched by the filter responses of the telecine optics. A basic colour corrector will have facilities to apply compensating DC offsets to the signals from the film as well as controlling the gamma of each component. A matrix calculation can compensate to some extent for television primaries which do not match the primary filtering used when the film was shot.

7.17 Conversion between television and computer formats

Computer terminals have evolved quite rapidly from devices which could display only a few lines of text in monochrome into high-resolution colour graphics displays which outperform conventional television.

The domains of computer graphics and television have in common only that images are represented. The degree of incompatibility is such that one could be forgiven for thinking that it was the outcome of a perverse competition. Nevertheless with sufficient care good results can be obtained. Figure 7.42 shows that the number of issues involved is quite large. If only one of these is not correctly addressed, the results will be disappointing. The number of processes also suggests that each must be performed with adequate precision, otherwise there will be a tolerance build-up or generation loss problem.

image

Figure 7.42 The various issues involved in converting between broadcast video and computer graphics formats. The problem is non-trivial but failure to address any one of these aspects will result in impairment.

image

Figure 7.43 A typical computer graphics card. See text for details.

Figure 7.43 shows a typical graphics card. The pixel values to be displayed are written into a frame store by the CPU and the display mechanism reads the frame store line by line to produce a raster scanned image. Pixel array sizes are described as x × y pixels and these have been subject to much greater variation than has been the case in television. Figure 7.44 shows some of the array sizes supported in graphics devices. As computer screens tend to be used in brighter ambient light than television screens, the displays have to be brighter and this makes flicker more visible. This can be overcome by running at a frame rate of 75 Hz.

image

Figure 7.44 The array sizes which may be found in computer graphics.

image

Figure 7.45 The standard IBM graphics connector and its associated signals.

A typical graphics card outputs analog RGB which can drive a CRT display. The analog outputs are provided by eight-bit DACs. Figure 7.45 shows the standard IBM graphics connector. In order to avoid storing twenty-four bits per pixel, some systems restrict the number of different colours which can be displayed at once.

Between the frame store and the DACs is a device called a palette or colour look-up table (CLUT). This can be preloaded with a range of colours which are appropriate for the image to be displayed. Whilst this is adequate for general-purpose computing, it is unsuitable for quality image portrayal.

Computer graphics takes a somewhat different view of gamma than does television. This may be due to the fact that early computer displays had no grey scale and simply produced binary video (black or white) in which linearity has no meaning. As computer graphics became more sophisticated, each pixel became a binary number and a grey scale was possible. The gamma of the CRT display was simply compensated by an inverse gamma look-up table (LUT) prior to the video DAC as shown in Figure 7.46(a). This approach means that the pixel data within the computer are in the linear light domain. This in itself is not a problem, but when linear light is represented by only eight-bit pixels, then contouring in dark areas is inevitable. Linear light needs to be expressed by around fourteen bits for adequate resolution as was seen in Chapter 6.

In order to improve the situation, certain manufacturers moved away from the linear light domain, but without going as far as conventional television practice. The solution was that the internal data would be subject to a partial inverse gamma, as shown in Figure 7.46(b), followed by a further partial inverse gamma stage in the LUT of the graphics card. The combined effect of the two inverse gammas was correctly to oppose the CRT gamma.

image

Figure 7.46 Computers and gamma: a dog’s dinner. At (a) a simple system uses linear light-coding internals and an inverse gamma LUT prior to the CRT. With only eight-bit data this suffers excessive quantizing error. (b) Improved performance is obtained by having partial inverse gamma internal data in tandem with a further partial inverse gamma prior to the CRT. Unfortunately there are two conflicting incompatible standards.

Unfortunately Silicon Graphics and Macintosh came up with systems in which the two gamma stages were completely incompatible even though the overall result in both cases is correct. Data from one format cannot be displayed on the other format (or as video) without gamma conversion. In the absence of gamma conversion the grey scale will be non-linear, crushing either dark areas or light areas depending on the direction of data transfer. Gamma conversion is relatively straightforward as a simple look-up table can be created with eight-bit data.
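
A gamma conversion of this kind reduces to a 256-entry table; the sketch below assumes purely power-law transfer functions with hypothetical exponents, which is a simplification of what the real formats specify.

    # Build an eight-bit look-up table converting between two power-law gamma conventions.
    # The exponents below are hypothetical; the real formats define their own curves.
    src_exponent = 1.8       # overall display gamma assumed by the source data
    dst_exponent = 2.2       # overall display gamma assumed by the destination

    lut = []
    for code in range(256):
        linear = (code / 255.0) ** src_exponent                      # undo the source convention
        lut.append(round(255.0 * linear ** (1.0 / dst_exponent)))    # apply the destination one

    print(lut[0], lut[128], lut[255])    # end points map to themselves; mid-greys shift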

Whatever the direction of conversion, one of the formats involved is likely to be RGB. It is useful if this is made the internal format of the conversion. Figure 7.47 shows that if the input is colour difference based, conversion should be done early, whereas if the output is to be colour difference based, the conversion should be done late.

It is also worth considering the use of the linear light domain within the conversion process. This overcomes any quality loss due to failure of constant luminance and distortion due to interpolating gamma-based signals. Figure 7.48 shows the principle. The gamma of the input format is reversed at the input and the gamma of the output format is re-created after all other processing is complete. Gamma in television signals generally follows a single standard, whereas with a computer format it will be necessary to establish exactly what gamma was assumed.

image

Figure 7.47 Possible strategies for video/computer conversion. (a) video to graphics RGB. (b) Graphics RGB to video.

image

Figure 7.48 Gamma is a compression technique and for the finest results it should not be used in any image-manipulation process because the result will be distorted. Accurate work should be done in the linear light domain.

Computer formats tend to use the entire number scale from black to white, such that in eight-bit systems black is 00Hex and white is FFHex. However, television signals according to ITU 601 have some headroom above white and footroom below black. If gamma, headroom and footroom conversion is not properly performed, the result will be black crushing, white crushing, lack of contrast or a distorted grey scale.
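
Ignoring the separate gamma issue, the level conversion itself is a simple affine mapping; a sketch for the luminance signal follows.

    def full_to_studio(code):
        """Map full-range (0-255) luma to studio-range (16-235) as used by ITU 601."""
        return round(16 + (235 - 16) * code / 255)

    def studio_to_full(code):
        """Map studio-range luma back to full range, clipping head- and footroom."""
        value = (code - 16) * 255 / (235 - 16)
        return round(min(max(value, 0), 255))

    print(full_to_studio(0), full_to_studio(255))    # 16 235
    print(studio_to_full(16), studio_to_full(235))   # 0 255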

Colorimetry may be a problem in conversion. Television signals generally abide by ITU 709 colorimetry, whereas computer graphic files could use almost any set of primaries. As was seen in Chapter 6, it is not unusual for computer screens to run at relatively high colour temperatures to give brighter pictures. If the primaries are known, then it is possible to convert between colour spaces using matrix arithmetic. Figure 7.49 shows that if two triangles are created on the chromaticity diagram, one for each set of primaries, then wherever the triangles overlap, ideal conversion is possible. In the case of colours where there is no overlap the best that can be done is to produce the correct hue by calculating the correct vector from the white point, even if the saturation is incorrect.

image

Figure 7.49 Conversion between colour spaces only works where the areas enclosed by the primary triangles overlap (shaded). Outside these areas the best that can be done is to keep the hue correct by accepting a saturation error.

Where the colorimetry is not known, accurate conversion is impossible. However, in practice acceptable results can be obtained by adjusting the primary gains to achieve an acceptable colour balance on a recognizable part of the image such as a white area or a flesh tone.
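
Where both sets of primaries are known, the conversion of Figure 7.49 amounts to a 3 × 3 matrix applied to linear light values, followed by clipping of colours that fall outside the destination gamut. The matrix below is a made-up example; a real one would be derived from the chromaticity coordinates of the two primary sets and their white points.

    import numpy as np

    # Hypothetical 3 x 3 matrix relating linear-light RGB in one set of primaries to
    # another; a real matrix would be derived from the chromaticities and white points.
    m = np.array([[0.95, 0.04, 0.01],
                  [0.02, 0.97, 0.01],
                  [0.00, 0.03, 0.97]])

    def convert(rgb_linear):
        out = m @ np.asarray(rgb_linear, dtype=float)
        return np.clip(out, 0.0, 1.0)       # out-of-gamut colours are clipped

    print(convert([0.2, 0.5, 0.8]))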

The image size or pixel count will be different and, with the exception of recent formats, the television signal will be interlaced and will not necessarily use square pixels. Spatial interpolation will be needed to move between pixel array sizes and pixel aspect ratios. The frame rate may also be different. The best results will be obtained using motion compensation. If both formats are progressively scanned, resizing and rate conversion are separable, but if interlace is involved the problem is not separable and resizing and rate conversion should be done simultaneously in a three-dimensional filter.

7.18 The importance of motion compensation

Section 2.16 introduced the concept of eye tracking and the optic flow axis. The optic flow axis is the locus of some point on a moving object which will be in a different place in successive pictures. Any device which computes with respect to the optic flow axis is said to be motion compensated. Until recently the amount of computation required made motion compensation too expensive, but now that this is no longer the case the technology has become very important in moving image-portrayal systems.

Figure 7.50(a) shows an example of a moving object which is in a different place in each of three pictures. The optic flow axis is shown. The object is not moving with respect to the optic flow axis and if this axis can be found some very useful results are obtained. The process of finding the optic flow axis is called motion estimation. Motion estimation is literally a process which analyses successive pictures and determines how objects move from one to the next. It is an important enabling technology because of the way it parallels the action of the human eye.

Figure 7.50(b) shows that if the object does not change its appearance as it moves, it can be portrayed in two of the pictures by using data from one picture only, simply by shifting part of the picture to a new location. This can be done using vectors as shown. Instead of transmitting a lot of pixel data, a few vectors are sent instead. This is the basis of motion-compensated compression which is used extensively in MPEG as will be seen in Chapter 9.

Figure 7.50(c) shows that if a high-quality standards conversion is required between two different frame rates, the output frames can be synthesized by moving image data, not through time, but along the optic flow axis. This locates objects where they would have been if frames had been sensed at those times, and the result is a judder-free conversion. This process can be extended to drive image displays at a frame rate higher than the input rate so that flicker and background strobing are reduced. This technology is available in certain high-quality consumer television sets. This approach may also be used with 24 Hz film to eliminate judder in telecine machines.

Figure 7.50(d) shows that noise reduction relies on averaging two or more images so that the images add but the noise cancels. Conventional noise reducers fail in the presence of motion, but if the averaging process takes place along the optic flow axis, noise reduction can continue to operate.

The way in which eye tracking avoids aliasing is fundamental to the perceived quality of television pictures. Many processes need to manipulate moving images in the same way in order to avoid the obvious difficulty of processing with respect to a fixed frame of reference. Processes of this kind are referred to as motion compensated and rely on a quite separate process which has measured the motion.

image

Figure 7.50 Motion compensation is an important technology. (a) The optic flow axis is found for a moving object. (b) The object in picture (n + 1) and (n + 2) can be re-created by shifting the object of picture n using motion vectors. MPEG uses this process for compression. (c) A standards convertor creates a picture on a new timebase by shifting object data along the optic flow axis. (d) With motion compensation a moving object can still correlate from one picture to the next so that noise reduction is possible.

Motion compensation is also important where interlaced video needs to be processed as it allows the best possible de-interlacing performance.

7.19 Motion-estimation techniques

There are three main methods of motion estimation which are to be found in various applications: block matching, gradient matching and phase correlation. Each has its own characteristics which are quite different.

Block matching is the simplest technique to follow. In a given picture, a block of pixels is selected and stored as a reference. If the selected block is part of a moving object, a similar block of pixels will exist in the next picture, but not in the same place. As Figure 7.51 shows, block matching simply moves the reference block around over the second picture looking for matching pixel values. When a match is found, the displacement needed to obtain it is used as a basis for a motion vector.

Whilst simple in concept, block matching requires an enormous amount of computation because every possible motion must be tested over the assumed range. Thus if the object is assumed to have moved over a 16-pixel range, then it will be necessary to test sixteen different horizontal displacements in each of sixteen vertical positions, i.e. 256 positions. At each position every pixel in the block must be compared with the corresponding pixel in the second picture, so for a 16 × 16 block this amounts to in excess of 65 000 pixel comparisons. In typical video, displacements of twice the figure quoted here may be found, particularly in sporting events, and the computation then required becomes enormous. If the motion is required to subpixel accuracy, then before any matching can be attempted the picture will need to be interpolated.
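
A minimal exhaustive block-matching search is sketched below, using the sum of absolute differences as the match criterion; real implementations use hierarchical searches and other economies to limit the computation described above.

    import numpy as np

    def block_match(ref_block, next_picture, top, left, search_range):
        """Exhaustive search returning the (dy, dx) displacement with the lowest SAD."""
        h, w = ref_block.shape
        best_sad, best_vector = None, (0, 0)
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + h > next_picture.shape[0] or x + w > next_picture.shape[1]:
                    continue
                candidate = next_picture[y:y + h, x:x + w].astype(int)
                sad = np.abs(candidate - ref_block.astype(int)).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_vector = sad, (dy, dx)
        return best_vector

    # Tiny demonstration: a block shifted by (2, 3) pixels between pictures is found again.
    pic0 = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    pic1 = np.roll(np.roll(pic0, 2, axis=0), 3, axis=1)
    print(block_match(pic0[16:32, 16:32], pic1, 16, 16, 8))   # (2, 3)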

image

Figure 7.51 In block matching the search block has to be positioned at all possible relative motions within the search area and a correlation measured at each one.

7.20 Motion-compensated picture rate conversion

A conventional standards convertor is not transparent to motion portrayal, and the effect is judder and loss of resolution. Figure 7.52 shows what happens on the time axis in a conversion between 60 Hz and 50 Hz (in either direction). Fields in the two standards appear in different planes cutting through the spatio-temporal volume, and the job of the standards convertor is to interpolate along the time axis between input planes in one standard in order to estimate what an intermediate plane in the other standard would look like. With still images, this is easy, because planes can be slid up and down the time axis with no ill effect. If an object is moving, it will be in a different place in successive fields. Interpolating between several fields results in multiple images of the object. The position of the dominant image will not move smoothly, an effect which is perceived as judder. Motion compensation is designed to eliminate this undesirable judder.

image

Figure 7.52 The different temporal distribution of input and output fields in a 50/60Hz convertor.

A conventional standards convertor interpolates only along the time axis, whereas a motion-compensated standards convertor can swivel its interpolation axis off the time axis. Figure 7.53(a) shows the input fields in which three objects are moving in different ways. At (b) it will be seen that the interpolation axis is aligned with the optic flow axis of each moving object in turn.

Each object is no longer moving with respect to its own optic flow axis, and so on that axis it no longer generates temporal frequencies due to motion and temporal aliasing due to motion cannot occur.3 Interpolation along the optic flow axes will then result in a sequence of output fields in which motion is properly portrayed. The process requires a standards convertor which contains filters that are modified to allow the interpolation axis to move dynamically within each output field. The signals which move the interpolation axis are known as motion vectors. It is the job of the motion-estimation system to provide these motion vectors. The overall performance of the convertor is determined primarily by the accuracy of the motion vectors. An incorrect vector will result in unrelated pixels from several fields being superimposed and the result is unsatisfactory.

image

Figure 7.53 (a) Input fields with moving objects. (b) Moving the interpolation axes to make them parallel to the trajectory of each object.

Figure 7.54 shows the sequence of events in a motion-compensated standards convertor. The motion estimator measures movements between successive fields. These motions must then be attributed to objects by creating boundaries around sets of pixels having the same motion. The result of this process is a set of motion vectors, hence the term ‘vector assignation’. The motion vectors are then input to a modified four-field standards convertor in order to deflect the interfield interpolation axis.

The vectors from the motion estimator actually measure the distance moved by an object from one input field to another. What the standards convertor requires is the value of motion vectors at an output field.

image

Figure 7.54 The essential stages of a motion-compensated standards convertor.

image

Figure 7.55 The motion vectors on the input field structure must be interpolated onto the output field structure as in (a). The field to be interpolated is positioned temporally between source fields and the motion vector between them is apportioned according to the location. Motion vectors are two-dimensional, and can be transmitted as vertical and horizontal components shown at (b) which control the spatial shifting of input fields.

A vector interpolation stage is needed which computes where between the input fields A and B the current output field lies, and uses this to proportion the motion vector into two parts. Figure 7.55(a) shows that the first part is the motion between field A and the output field; the second is the motion between field B and the output field. Clearly the difference between these two vectors is the motion between input fields. These processed vectors are used to displace parts of the input fields so that the axis of interpolation lies along the optic flow axis. The moving object is stationary with respect to this axis so interpolation between fields along it will not result in any judder.
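
The apportioning of a measured vector onto an output field can be expressed very simply; in the sketch below, alpha is the temporal position of the output field between the two input fields.

    def apportion(vector, alpha):
        """Split an interfield motion vector for an output field at temporal position alpha.

        alpha is the fraction of the way from input field A (0.0) to input field B (1.0).
        Returns the shifts to apply to fields A and B; their difference is the full vector."""
        vy, vx = vector
        shift_a = (alpha * vy, alpha * vx)               # field A to output field
        shift_b = ((alpha - 1) * vy, (alpha - 1) * vx)   # field B to output field
        return shift_a, shift_b

    # An object moving (0, 10) pixels between fields, output field 40% of the way along:
    print(apportion((0, 10), 0.4))    # ((0.0, 4.0), (-0.0, -6.0))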

Whilst a conventional convertor only needs to interpolate vertically and temporally, a motion-compensated convertor also needs to interpolate horizontally to account for lateral movement in images. Figure 7.55(b) shows that the motion vector from the motion estimator is resolved into two components, vertical and horizontal. The spatial impulse response of the interpolator is shifted in two dimensions by these components. This shift may be different in each of the fields which contribute to the output field.

When an object in the picture moves, it will obscure its background. The vector interpolator in the standards convertor handles this automatically provided the motion estimation has produced correct vectors. Figure 7.56 shows an example of background handling. The moving object produces a finite vector associated with each pixel, whereas the stationary background produces zero vectors except in the area O – X where the background is being obscured.

image

Figure 7.56 Background handling. When a vector for an output pixel near a moving object is not known, the vectors from adjacent background areas are assumed. Converging vectors imply obscuring is taking place which requires that interpolation can only use previous field data. Diverging vectors imply that the background is being revealed and interpolation can only use data from later fields.

Vectors converge in the area where the background is being obscured, and diverge where it is being revealed. Image correlation is poor in these areas so no valid vector is assigned.

An output field is located between input fields, and vectors are projected through it to locate the intermediate position of moving objects. These are interpolated along an axis which is parallel to the optic flow axis. This results in address mapping which locates the moving object in the input field RAMs. However, the background is not moving and so the optic flow axis is parallel to the time axis. The pixel immediately below the leading edge of the moving object does not have a valid vector because it is in the area O – X where forward image correlation failed.

The solution is for that pixel to assume the motion vector of the background below point X, but only to interpolate in a backwards direction, taking pixel data from previous fields. In a similar way, the pixel immediately behind the trailing edge takes the motion vector for the background above point Y and interpolates only in a forward direction, taking pixel data from future fields. The result is that the moving object is portrayed in the correct place on its trajectory, and the background around it is filled in only from fields which contain useful data.

The technology of the motion-compensated standards convertor can be used in other applications. When video recordings are played back in slow motion, the result is that the same picture is displayed several times, followed by a jump to the next picture. Figure 7.57 shows that a moving object would remain in the same place on the screen during picture repeats, but jump to a new position as a new picture was played. The eye attempts to track the moving object, but, as Figure 7.57 also shows, the location of the moving object wanders with respect to the trajectory of the eye, and this is visible as judder.

Motion-compensated slow-motion systems are capable of synthesizing new images which lie between the original images from a slow-motion source. Figure 7.58 shows that two successive images in the original recording (using DVE terminology, these are source fields) are fed into the unit, which then measures the distance travelled by all moving objects between those images. Using interpolation, intermediate fields (target fields) are computed in which moving objects are positioned so that they lie on the eye trajectory. Using the principles described above, background information is removed as moving objects conceal it, and replaced as the rear of an object reveals it. Judder is thus removed and motion with a fluid quality is obtained.

image

Figure 7.57 Conventional slow motion using field repeating with stationary eye shown at (a). With tracking eye at (b) the source of judder is seen.

image

Figure 7.58 In motion-compensated slow motion, output fields are interpolated with moving objects displaying judder-free linear motion between input fields.

7.21 Motion-compensated telecine system

Figure 7.59(a) shows the time axis of film, where entire frames are simultaneously exposed, or sampled, at typically 24 Hz. The result is that the image is effectively at right angles to the time axis. During filming, some of the frame period is required to transport the film, and the shutter is closed whilst this takes place. The temporal aperture or exposure is thus somewhat shorter than the frame period.

When displayed in the cinema, each frame of a film is generally projected twice to produce a flicker frequency of 48 Hz. The result with a moving object is that the motion is not properly portrayed and there is judder. Figure 7.59(b) shows the origin of the judder.

image

Figure 7.59 (a) The spatio-temporal characteristic of film. Note that each frame is repeated twice on projection. (b) The frame repeating results in motion judder as shown here.

image

Figure 7.59 (c) Telecine machines must use 3:2 pulldown to produce 60 Hz field rate video.

The same effect is evident if film is displayed on a CRT via a conventional telecine machine. In telecine the film is transported at 25 fps and each frame results in two fields in 50 Hz standards and this will result in judder as well. In 60 Hz telecine the film travels at 24 fps, but odd frames result in three fields, even frames result in two fields; the well-known 3:2 pulldown. Motion portrayal (or lack of it) in this case is shown in Figure 7.59(c).

In fact the telecine machine is a perfect application for motion compensation. As Figure 7.60 shows, each film frame is converted to a progressive scan image in a telecine machine, and then a motion-compensated standards conversion process is used to output whatever frame rate is required without judder, leading to much-improved subjective quality.

image

Figure 7.60 A film with a frame rate of 24 Hz cannot be displayed directly because of flicker. Using a motion-compensated standards conversion process extra frames can be synthesized in which moving objects are correctly positioned. Any television picture rate can then be obtained from film.

If the original film is not available, 50 and 60 Hz video recording can be used. In the case of 50 Hz, pairs of fields are combined to produce progressively scanned frames. In the case of 60 Hz, the third field in the 3:2 sequence is identified and discarded prior to de-interlacing. Motion-compensated rate conversion then proceeds as before.

7.22 Camera shake compensation

As video cameras become smaller and lighter, it becomes increasingly difficult to move them smoothly and the result is camera shake. This is irritating to watch, as well as requiring a higher bit rate in compression systems. There are two solutions to the problem, one which is contained within the camera, and one which can be used at some later time on the video data.

Figure 7.61 shows that image-stabilizing cameras contain miniature gyroscopes which produce an electrical output proportional to their rate of turn about a specified axis.

image

Figure 7.61 Image-stabilizing cameras sense shake using a pair of orthogonal gyros which sense movement of the optical axis.

A pair of these, mounted orthogonally, can produce vectors describing the camera shake. This can be used to oppose the shake by shifting the image. In one approach, the shifting is done optically. Figure 7.62 shows a pair of glass plates with the intervening space filled with transparent liquid. By tilting the plates a variable angle prism can be obtained and this is fitted in the optical system before the sensor.

image

Figure 7.62 Image-stabilizing cameras. (a) The image is stabilized optically prior to the CCD sensors. (b) The CCD output contains image shake, but this is opposed by the action of a DVE configured to shift the image under control of the gyro inputs.

If the prism plates are suitably driven by servos from the gyroscopic sensors, the optical axis along which the camera is looking can remain constant despite shake. Shift is also possible by displacing some of the lens elements.

Alternatively, the camera can contain a DVE where the vectors from the gyroscopes cause the CCD camera output to be shifted horizontally or vertically so that the image remains stable. This approach is commonly used in consumer camcorders.

A great number of video recordings and films already exist in which there is camera shake. Film also suffers from weave in the telecine machine. In this case the above solutions are inappropriate and a suitable signal processor is required. Figure 7.63 shows that motion compensation can be used. If a motion estimator is arranged to find the motion between a series of pictures, camera shake will add a fixed component in each picture to the genuine object motions. This can be used to compute the optic flow axis of the camera, independently of the objects portrayed.

Operating over several pictures, the trend in camera movement can be separated from the shake by filtering, to produce a position error for each picture. Each picture is then shifted in a DVE in order to cancel the position error. The result is that the camera shake is gone and the camera movements appear smooth. In order to prevent the edges of the frame moving visibly, the DVE also performs a slight magnification so that the edge motion is always outside the output frame.
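
A sketch of the separation of shake from intended camera movement is given below; a simple moving-average filter stands in for whatever smoothing the real processor uses, and the one-dimensional positions are hypothetical.

    def stabilizing_shifts(positions, window=5):
        """Given the measured camera position for each picture, return the shift that
        cancels the shake while leaving the underlying camera movement intact."""
        shifts = []
        for i, p in enumerate(positions):
            lo = max(0, i - window // 2)
            hi = min(len(positions), i + window // 2 + 1)
            trend = sum(positions[lo:hi]) / (hi - lo)   # smoothed (intended) position
            shifts.append(trend - p)                    # the residual is the shake
        return shifts

    # A steady pan of roughly one pixel per picture with a little shake added:
    measured = [0.0, 1.3, 1.8, 3.2, 3.9, 5.1, 6.0, 6.8]
    print([round(s, 2) for s in stabilizing_shifts(measured)])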

image

Figure 7.63 In digital image stabilizing the optic flow axis of objects in the input video is measured as in (a). This motion is smoothed to obtain a close approximation to the original motion (b). If this is subtracted from (a) the result is the camera shake motion (c) which is used to drive the image stabilizer.

7.23 Motion-compensated de-interlacing

The most efficient way of de-interlacing is to use motion compensation. Figure 7.64 shows that when an object moves in an interlaced system, the interlace breaks down with respect to the optic flow axis as was seen in section 2.17. If the motion is known, two or more fields can be shifted so that a moving object is in the same place in both. Pixels from both fields can then be used to describe the object with better resolution than would be possible from one field alone. It will be seen from Figure 7.65 that the combination of two fields in this way will result in pixels having a highly irregular spacing and a special type of filter is needed to convert this back to a progressive frame with regular pixel spacing.

image

Figure 7.64 In the presence of vertical motion or motion having a vertical component, interlace breaks down and the pixel spacing with respect to the tracking eye becomes irregular.

image

Figure 7.65 A de-interlacer needs an interpolator which can operate with input samples which are positioned arbitrarily rather than regularly.

At some critical vertical speeds there will be alignment between pixels in adjacent fields and no improvement is possible, but at other speeds the process will always give better results.
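The principle can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single global motion vector between the two fields, and nearest-line placement of the shifted samples instead of the irregular-sample interpolator of Figure 7.65 that a real de-interlacer requires.

import numpy as np

def mc_deinterlace(field_even, field_odd, dy=0.0, dx=0.0):
    # field_even holds frame lines 0, 2, 4, ...; field_odd holds frame
    # lines 1, 3, 5, ... captured one field period later.  dy and dx are
    # the assumed motion between the fields, in frame lines and pixels.
    h, w = field_even.shape
    frame = np.empty((2 * h, w), dtype=float)

    # Lines from the first field go straight into the frame.
    frame[0::2, :] = field_even

    # Shift the second field back along the motion vector so that the
    # moving object lands where it was in the first field, then round
    # to the nearest line (a real design would interpolate properly).
    shifted = np.roll(field_odd,
                      (int(round(-dy / 2)), int(round(-dx))), axis=(0, 1))
    frame[1::2, :] = shifted
    return frame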

7.24 Aspect ratios

When television was in its infancy the production of cathode ray tubes (CRTs) was quite difficult because of the huge stress set up by the internal vacuum. Early tubes were circular as this shape resists pressure well. Early lenses had restricted coverage, and as the difficulty of obtaining coverage rises with distance from the optical axis, lens design is also eased by a circular picture. A square picture gives the largest area within a circle. Whatever the designers of television systems might have wanted for a picture aspect ratio, they had to compromise by choosing a rectangular shape which was close to the ideal square and this led to the 4:3 aspect ratio. Early film formats were also close to square for coverage reasons.

Now that lens and CRT design has advanced, these restrictions are no longer so severe. Newer types of display do not suffer the mechanical constraints of CRTs. As a result it is possible to have a wider picture without serious loss of quality and the aspect ratio of television will change to 16:9.

Figure 7.66 shows how a given picture is mapped onto an analog TV waveform. Neglecting the blanking interval which is needed for tube flyback, the distance across the picture is proportional to the time elapsed along the active line.

image

Figure 7.66 In all video standards, the source picture is mapped into a fixed percentage of the total line duration so that distance across the screen has a fixed proportional relationship to time along the line.

The camera will break the picture into a standard number of lines, and again neglecting the vertical blanking interval, the distance down the picture will be proportional to the time through the frame in a non-interlaced system. If the format has a fixed number of lines per frame, the aspect ratio of the video format is reflected in the ratio of the horizontal and vertical scan speeds. Neglecting blanking, the ratio of horizontal to vertical scan speed in an ideal 625-line system having a square picture would be 625 to 1. In a 4:3 system it would be more like 830 to 1. In a 16:9 system it would be about 1100 to 1. A viewpoint of this kind is useful because it is size independent: the picture can be any size because both axes then scale by the same amount.
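The figures quoted follow directly from the line count and the aspect ratio, as the short calculation below shows (neglecting blanking throughout, as in the text).

def scan_speed_ratio(lines_per_frame, aspect_ratio):
    # The horizontal scan crosses the picture width once per line; the
    # vertical scan crosses the picture height once per frame, i.e. once
    # every lines_per_frame line periods.
    return lines_per_frame * aspect_ratio

print(scan_speed_ratio(625, 1.0))      # square picture: 625
print(scan_speed_ratio(625, 4 / 3))    # roughly 833
print(scan_speed_ratio(625, 16 / 9))   # roughly 1111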

Clearly if the display is to be compatible with the resultant video format, it must have the same aspect ratio so that the vertical and horizontal mapping retains the correct relationship. If this is done, objects portrayed on the display have the same shape as they had in the original picture. If it is not done correctly there will be distortion. Most test cards contain a circular component to test for this distortion as non-circularity is easy to see. If a circular object in front of a camera appears circular on the display, their scanning is compatible because both have the same aspect ratio. This test, however, does NOT mean that both camera and display are meeting any standard. For example, both camera and display could be maladjusted to underscan by 10 per cent horizontally, yet the circularity test would still succeed. Thus the aspect ratio compatibility test should be made by checking the display with an electronically generated precision circle prior to assessing the camera output. Any discrepancy would then be removed by adjusting the camera.

image

Figure 7.67 Using a higher scan speed, a longer source line can be mapped onto a standard active line. Note that for a given resolution, more bandwidth will be required because more cycles of detail will be scanned in a given time.

Figure 7.67 shows how a 16:9 picture is mapped onto a video signal. If the frame rate and the number of lines in the frame is kept the same, the wider picture is obtained by simply increasing the horizontal scan speed at the camera. This allows the longer line to be scanned within the existing active line period. A 16:9 CRT will display the resulting signal with correct circularity.

Any television camera can be adapted to work in this way by fitting an anamorphic lens with a ratio of 1.333 …:1 which maps a 16:9 picture onto a 4:3 sensor. Clearly the viewfinder will need modification to reduce its vertical scan to 0.75 of its former deflection.

Redefinition of the scanning speed ratio at the camera has produced a different video standard which is now incompatible with 4:3 displays even though a waveform monitor confirms that it meets all the timing specifications. By stretching the horizontal scan the video has been rendered anamorphic. Figure 7.68(a) shows the result of displaying 16:9 video on a 4:3 monitor. The incompatible mapping causes circularity failure. Circular objects appear as vertical ellipses. Figure 7.68(b) shows the result of displaying 4:3 video on a 16:9 monitor. Circular objects appear as horizontal ellipses.

image

Figure 7.68 Displaying 16:9 video on a 4:3 monitor (a) results in a horizontal compression. The reverse case shown in (b) causes a horizontal stretch.
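The degree of circularity failure in both cases is simply the ratio of the two aspect ratios, as the following check shows.

# Circularity error when anamorphic 16:9 video meets a 4:3 display, or
# when 4:3 video meets a 16:9 display: the horizontal mapping is wrong
# by the ratio of the two aspect ratios in both directions.
error = (16 / 9) / (4 / 3)
print(error)   # 1.333... : vertical ellipses on the 4:3 display,
               # horizontal ellipses on the 16:9 display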

A form of standards convertor is needed which will allow interchange between the two formats. There are two basic applications of such convertors as can be seen in Figure 7.69. If 16:9 cameras and production equipment are used, an aspect ratio convertor is needed to view material on 4:3 monitors and to obtain a traditional broadcast output. Alternatively, conventional cameras can be used with a large safe area at top and bottom of the picture. 4:3 equipment is used for production and the aspect ratio convertor is then used to obtain a 16:9 picture output.

The criterion for conversion must be that circularity has to be maintained otherwise the pictures will appear distorted. Thus an aspect ratio convertor must change the aspect ratio of the picture frame, without changing the aspect ratio of portrayed objects.

If circularity is maintained, something else has to go. Figure 7.70(a) shows the result of passing 16:9 into a convertor for 4:3 display. If the screen must be filled, the convertor must perform a horizontal transform of 1.333 …:1 to maintain circularity. The result of doing this alone is that the edges of the input picture are lost as they will be pushed outside the active line length. This may be acceptable if a pan/scan control is available. Alternatively, if no part of the image can be lost, the convertor must perform a vertical transform of 0.75:1. This will result in the vertical blanking area of the 16:9 input entering the 4:3 screen area and the result will be black bars above and below the picture.

image

Figure 7.69 Two different approaches to dual-aspect ratio production which will be necessary during the change-over period.

Figure 7.70(b) shows the reverse conversion process where 4:3 is being converted to 16:9. Again if ‘full screen’ mode is required, the convertor must perform a vertical transform of 1.333 …:1 to maintain circularity. This pushes the top and bottom of the input picture into 16:9 blanking. If the 4:3 material was shot with 16:9 safe areas this is no problem.

image

Figure 7.70 The extreme cases of 16:9 and 4:3 interchange are at the ‘full image’ and ‘full screen’ points.

However, if the input was intended for 4:3 it may contain wanted detail near the top or bottom of the picture and a tilt (vertical pan) control may be provided to select the area which appears in the output. If no part of the image can be lost, i.e. ‘full image’ mode is required, a horizontal transform of 0.75:1 is needed, and this must result in the horizontally blanked areas of the 4:3 input entering the 16:9 screen area.

The above steps represent the two extremes of full screen or no image loss. In practice there is a scale between those extremes in which the black bars can be made smaller in one axis at the expense of some image loss in the other axis. An aspect ratio convertor therefore needs to perform vertical and horizontal transforms which may be magnification, translation or both. In order to maintain circularity, the ratio between the horizontal and vertical magnifications can only have three values: 1.333 …:1 for 16:9 to 4:3 conversion, 1:1 for bypass and 0.75:1 for 4:3 to 16:9 conversion. Thus having selected the mode, a single magnification control can vary the conversion between ‘full screen’ and ‘full image’ modes. When not in full image mode, pan and tilt controls allow the user to select the part of the input image which appears in the output.
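The behaviour of such a convertor can be summarized in a small calculation. The sketch below is purely illustrative and the mode labels are invented for the example: given the conversion direction and a ‘fullness’ control running from 0 (full image) to 1 (full screen), it returns horizontal and vertical magnifications whose ratio always takes one of the three permitted values, so circularity is preserved at every setting.

def convertor_magnifications(mode, fullness):
    # mode: '16:9->4:3', '4:3->16:9' or 'bypass' (labels assumed here);
    # fullness: 0.0 = full image (bars, nothing lost) .. 1.0 = full screen.
    if mode == 'bypass':
        return 1.0, 1.0

    if mode == '16:9->4:3':
        # Circularity requires H/V = 1.333...:1 throughout this mode.
        v = 0.75 + 0.25 * fullness        # 0.75 (letterbox) .. 1.0
        return (4 / 3) * v, v

    if mode == '4:3->16:9':
        # Circularity requires H/V = 0.75:1 throughout this mode.
        h = 0.75 + 0.25 * fullness        # 0.75 (pillarbox) .. 1.0
        return h, (4 / 3) * h

    raise ValueError(mode)

# The two extremes of 16:9 to 4:3 conversion described above:
print(convertor_magnifications('16:9->4:3', 0.0))  # (1.0, 0.75)   full image
print(convertor_magnifications('16:9->4:3', 1.0))  # (1.33..., 1.0) full screen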

