image

INTRODUCTION

Light is a form of radiation and physical laws have been constructed to explain its behaviour. The general science of radiation is called radiometry. However, physical laws cannot explain the sense we call vision or the impression of colour. For applications of imaging technology such as television and cinema, light is what can be seen by a human being and this is the subject of photometry. In that context, any discussion must include the characteristics of the eye in all the relevant domains. Once the operation of the human visual system (HVS) is understood, it will be clear that, to obtain realism, imaging quality has to meet adequate criteria in a number of domains. These include at least contrast, noise level, colour accuracy, static and dynamic resolution, flicker, and motion portrayal. Once these topics are appreciated, it then becomes possible to analyse today's popular imaging technologies to see why they all look different and to suggest a way forward to a new level of realism that will be expected in applications such as simulators and electronic cinema. Figure 2.1 shows some of the interactions between domains that complicate matters. Technically, contrast exists only in the brightness domain and is independent of resolution, which exists in the image plane. In the HVS, the subjective parameter of sharpness is affected by both and so these cannot be treated separately. Sharpness is also affected by the accuracy of motion portrayal. It would appear that colour vision evolved later as an enhancement to monochrome vision. The resolution of the eye to colour changes is very poor.

image

FIGURE 2.1

The various domains in which images can be analysed are not independent in the human visual system. Some of the interactions are shown here.

image

FIGURE 2.2

The luminous efficiency function shows the response of the HVS to light of different wavelengths.

WHAT IS LIGHT?

Electromagnetic radiation exists over a fantastic range of frequencies, f, and corresponding wavelengths, λ, connected to the speed of light, c, by the equation

c = f × λ.

The HVS has evolved to be sensitive to a certain range of frequencies, which we call light. The frequencies are extremely high and it is the convention in optics to describe the wavelength instead.

Figure 2.2 shows that the HVS responds to radiation in the range of 400 to 700 nanometres (1 nm = 10−9 m) according to a curve known as the luminous efficiency function, which is defined to have a value of unity at its peak, occurring at a wavelength of 555 nm under bright light conditions. Within that range different distributions of intensity with respect to wavelength exist, which are called spectral power distributions or SPDs. The variations in SPD give rise to the sensation that we call colour. A narrowband light source with a wavelength of 400 nm appears violet, and shorter wavelengths are called ultraviolet. Similarly light with a wavelength of 700 nm appears red and longer wavelengths are called infrared. Although we cannot see infrared radiation, we can feel it as the sensation of heat.

image

FIGURE 2.3

The radiated spectrum of a black body changes with temperature.

SOURCES OF LIGHT

Light sources include a wide variety of heated bodies, from the glowing particles of carbon in candle flames to the sun. Radiation from a heated body covers a wide range of wavelengths. In physics, light and radiant heat are the same thing, differing only in wavelength, and it is vital to an understanding of colour to see how they relate. This was first explained by Max Planck, who proposed the concept of a black body. Because a black body is perfectly nonreflective, the only radiation that can come from it is due to its temperature. Figure 2.3 shows that the intensity and spectrum of radiation from a body are a function of the temperature. The peak of the distribution at each temperature is found on a straight line according to Wien's law.
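Planck's and Wien's relationships can be sketched numerically as follows; the constants are standard physics, and the temperatures chosen are illustrative rather than taken from Figure 2.3.

# Sketch: black-body spectral radiance (Planck's law) and the Wien peak.
# Constants are standard physics; the temperatures are illustrative only.
import math

H = 6.6262e-34      # Planck's constant, J/Hz (J.s)
C = 2.998e8         # speed of light, m/s
K_B = 1.381e-23     # Boltzmann's constant, J/K
WIEN_B = 2.898e-3   # Wien's displacement constant, m.K

def spectral_radiance(wavelength_m, temp_k):
    """Planck's law: power radiated per unit area, solid angle and wavelength."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = math.exp(H * C / (wavelength_m * K_B * temp_k)) - 1.0
    return a / b

for temp in (3000, 5000, 9000):
    peak_nm = WIEN_B / temp * 1e9           # Wien's law: peak wavelength
    print(f"{temp} K: peak at {peak_nm:.0f} nm, "
          f"radiance at 555 nm = {spectral_radiance(555e-9, temp):.3e}")

The peaks fall at roughly 970, 580, and 320 nm, which is consistent with the red-hot, white-hot, and blue descriptions given below.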

Radiation from the sun contains ultraviolet radiation, but this is (or was) strongly scattered by the earth's atmosphere and is accordingly weak. Incidentally this scattering of short wavelengths is why the sky appears blue. As temperature falls, the intensity of the radiation becomes too low to be useful. The wavelength range of human vision evolved to sense a reasonable dynamic range of black-body radiation between practical limits.

image

FIGURE 2.4

The spectra of Figure 2.3 normalized to the same intensity at midscale to show the red distribution at low temperatures changing to blue at very high temperatures.

The concept of colour temperature follows from Planck's work. Figure 2.4 shows a different version of Figure 2.3 in which the SPDs have been scaled so they all have the same level at one wavelength near the centre of the range of the HVS. A body at a temperature of around 3000° Kelvin (K) radiates an SPD centred in the infrared, and the HVS perceives only the lefthand end of the distribution as the colour red, hence the term “red hot.” As the temperature increases, at about 5000°K the peak of the SPD aligns with the peak of the sensitivity of the HVS and we see white, hence the term “white hot.” Red hot and white hot are layman's colour temperature terms. A temperature of 9000°K takes the peak of the SPD into the ultraviolet and we see the righthand end of the distribution as blue. The term “blue hot” is not found because such a temperature is not commonly reached on Earth.

It is possible to characterize a thermal illuminant or source of light simply by specifying the temperature in degrees K of a black body that appears to be the same colour to a human observer. Non-thermal illuminants such as discharge lamps may be given an equivalent colour temperature, but their SPD may be quite different from that of a heated body. Although the radiation leaving the sun is relatively constant, the radiation arriving on earth varies throughout the day. Figure 2.5a shows that at midday, the sun is high and the path through the atmosphere is short. The amount of scattering at the blue end of the spectrum is minimal and the light has a bluish quality. However, at the end of the day, the sun is low and the path through the atmosphere is much longer, as Figure 2.5b shows. The extent of blue scattering is much greater and the remaining radiation reaching the observer becomes first orange as the sun gets low and finally red as it sets. Thus the colour temperature of sunlight is not constant. In addition to the factors mentioned, clouds will also change the colour temperature. Light can also be emitted by atoms in which electrons have been raised from their normal, stable orbit to one of higher energy by some form of external stimulus other than heat, such as ultraviolet light or an electrical discharge.

image

FIGURE 2.5

(a) At midday the path of sunlight through the atmosphere is short. (b) When the sun is low the path through the atmosphere is longer, making the effect of blue scattering more obvious.

Electrons that fall back to the valence band emit a quantum of energy as a photon whose frequency is proportional to the energy difference between the bands. The process is described by Planck's law,

energy difference E = H × f,

where H is Planck's constant, 6.6262 × 10−34 Joules/Hertz.
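Combining E = H × f with c = f × λ gives the emission wavelength directly. A minimal sketch follows; the 2.1 eV energy difference is purely illustrative, not a figure from the text.

# Sketch: wavelength of the photon emitted for a given energy difference,
# combining E = H x f with c = f x lambda. The 2.1 eV gap is illustrative only.
H = 6.6262e-34          # Planck's constant, J/Hz
C = 2.998e8             # speed of light, m/s
EV = 1.602e-19          # one electron-volt in Joules

def emission_wavelength_nm(energy_difference_j):
    frequency = energy_difference_j / H        # E = H x f
    return C / frequency * 1e9                 # c = f x lambda, in nm

print(emission_wavelength_nm(2.1 * EV))        # ~590 nm, an orange line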

The wavelength of the light emitted is a function of the characteristics of a particular atom, and a great variety exists. The SPD of light sources of this kind is very narrow, appearing as a line in the spectrum. Some materials are monochromatic, whereas some have two or more lines. Useful and efficient illuminants can be made using mixtures of materials to increase the number of lines, although the spectrum may be far from white in some cases. Such illuminants can be given an effective colour temperature, even though there is nothing in the light source at that temperature. The colour temperature is that at which a black body and the illuminant concerned give the same perceived result to the HVS. This type of light generation is the basis of mercury and sodium lights, fluorescent lights, Day-Glo paint, the aurora borealis, whiteners in washing powder, phosphors in CRT and plasma displays, lasers, and LEDs. It should be noted that although these devices have colour temperatures as far as the HVS is concerned, their line spectrum structure may cause them to have unnatural effects on other colour-sensitive devices such as film and TV cameras.

OPTICAL PRINCIPLES

The wave theory of light suggests that a wavefront advances because an infinite number of point sources can be considered to emit spherical waves, which will add only when they are all in the same phase. This can occur only in the plane of the wavefront. Figure 2.6 shows that at all other angles, interference between spherical waves is destructive. Note the similarity with sound propagation described in Chapter 5. When such a wavefront arrives at an interface with a denser medium, such as the surface of a lens, the velocity of propagation is reduced; therefore the wavelength in the medium becomes shorter, causing the wavefront to leave the interface at a different angle (Figure 2.7). This is known as refraction. The ratio of velocity in vacuo to velocity in the medium is known as the refractive index of that medium; it determines the relationship between the angles of the incident and the refracted wavefronts. Reflected light, however, leaves at the same angle to the normal as the incident light. If the speed of light in the medium varies with wavelength, dispersion takes place, in which incident white light will be split into a rainbow-like spectrum, leaving the interface at different angles. Glass used for chandeliers and cut glass is chosen to be highly dispersive, whereas glass for lenses in cameras and projectors will be chosen to have a refractive index that is as constant as possible with changing wavelength. The use of monochromatic light allows low-cost optics to be used as they need to be corrected for only a single wavelength. This is done in optical disk pickups and in colour projectors that use one optical system for each colour.
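The relationship between the incident and refracted angles is Snell's law, n1 sin θ1 = n2 sin θ2. A minimal sketch follows; the index values are illustrative, roughly those of air and crown glass.

# Sketch: refraction angle from Snell's law, n1.sin(theta1) = n2.sin(theta2).
# Index values are illustrative (air ~1.0, crown glass ~1.5).
import math

def refraction_angle_deg(incident_deg, n1=1.0, n2=1.5):
    s = n1 * math.sin(math.radians(incident_deg)) / n2
    if abs(s) > 1.0:
        return None          # total internal reflection (only when n1 > n2)
    return math.degrees(math.asin(s))

print(refraction_angle_deg(30.0))   # ~19.5 degrees inside the denser medium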

image

FIGURE 2.6

Plane-wave propagation considered as infinite numbers of spherical waves.

In natural light, the electric-field component will be in many planes. Light is said to be polarized when the electric field direction is constrained. The wave can be considered as made up of two orthogonal components. When these are in phase, the polarization is said to be linear. When there is a phase shift between the components, the polarization is said to be elliptical, with a special case at 90° called circular polarization. These types of polarization are contrasted in Figure 2.8. To create polarized light, anisotropic materials are necessary. Polaroid material, invented by Edwin Land, is vinyl that is made anisotropic by stretching it whilst hot. This causes the long polymer molecules to line up along the axis of stretching. If the material is soaked in iodine, the molecules are rendered conductive and short out any electric-field component along themselves. Electric fields at right angles are unaffected; thus the transmission plane is at right angles to the stretching axis.

image

FIGURE 2.7

Reflection and refraction, showing the effect of the velocity of light in a medium.

image

FIGURE 2.8

(a) Linear polarization: orthogonal components are in phase. (b) Circular polarization: orthogonal components are in phase quadrature.

Stretching plastics can also result in anisotropy of the refractive index; this effect is known as birefringence. If a linearly polarized wavefront enters such a medium, the two orthogonal components propagate at different velocities, causing a relative phase difference proportional to the distance travelled. The plane of polarization of the light is rotated. Where the thickness of the material is such that a 90° phase change is caused, the device is known as a quarter-wave plate. The action of such a device is shown in Figure 2.9. If the plane of polarization of the incident light is at 45° to the planes of greatest and least refractive index, the two orthogonal components of the light will be of equal magnitude, and this results in circular polarization. Similarly, circular-polarized light can be returned to the linear-polarized state by a further quarter-wave plate. Rotation of the plane of polarization is a useful method of separating incident and reflected light in a laser disk pickup. Using a quarter-wave plate, the plane of polarization of light leaving the pickup will have been turned 45°, and on return it will be rotated a further 45°, so that it is now at right angles to the plane of polarization of light from the source. The two can easily be separated by a polarizing prism, which acts as a transparent block to light in one plane, but as a prism to light in the other plane, such that reflected light is directed toward the sensor.
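The required thickness of a zero-order quarter-wave plate follows from the birefringence: d = λ/(4Δn). A minimal sketch with illustrative values is given below; the birefringence figure is an assumption (roughly that of quartz), not taken from the text.

# Sketch: thickness of a zero-order quarter-wave plate, d = lambda / (4 * delta_n).
# The wavelength and birefringence values are illustrative assumptions.
def quarter_wave_thickness_um(wavelength_nm, delta_n):
    return wavelength_nm / (4.0 * delta_n) / 1000.0   # result in micrometres

print(quarter_wave_thickness_um(650.0, 0.009))        # ~18 um for a red laser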

image

FIGURE 2.9

Different speeds of light in different planes rotate the plane of polarization in a quarter-wave plate to give a circularly polarized output.

PHOTOMETRIC UNITS

Radiometric and photometric units are different because the latter are affected by the luminous efficiency function of the eye. Figure 2.10 shows the two sets of units for comparison. Figure 2.11 shows an imaginary point light source radiating equally in all directions. An imaginary sphere surrounds the source. The source itself has a power output, measured in Watts, and this power uniformly passes through the area of the sphere, so the power per unit area will follow an inverse square law. Power per unit area is known as intensity, with units of Watts per square metre. Given a surface radiating with a certain intensity, viewed at right angles to the surface the maximum brightness would be measured. Viewed from any other angle the brightness would fall off as a cosine function. The above units are indifferent to wavelength and whether the HVS can see the radiation concerned. In photometry, the equivalent of power is luminous flux, whose unit is the lumen, the equivalent of intensity is luminous intensity measured in candela, and the equivalent of brightness is luminance, measured in nits.
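The inverse square law follows from spreading a fixed power over the sphere's surface area of 4πr². A minimal sketch of that arithmetic, with an arbitrary source power:

# Sketch: power per unit area from a point source falls as the inverse square
# of distance because the fixed power is spread over a sphere of area 4.pi.r^2.
# The 100 W source is an arbitrary example.
import math

def intensity_w_per_m2(source_power_w, radius_m):
    return source_power_w / (4.0 * math.pi * radius_m**2)

for r in (1.0, 2.0, 4.0):
    print(r, intensity_w_per_m2(100.0, r))   # doubling the distance quarters it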

image

FIGURE 2.10

Radiometric and photometric units compared.

image

FIGURE 2.11

An imaginary point source radiating through a spherical area is helpful to visualize the units used to measure light.

It is difficult to maintain a standard of luminous flux, so instead the candela (cd) is defined. The candela replaced the earlier unit of candle power and is defined in such a way as to make the two units approximately the same. One square centimetre of platinum at its freezing point of 2042°K radiates 60 cd. The lumen is defined as the luminous flux radiated over a unit solid angle by a source whose intensity is one candela. The nit is defined as one candela per square metre. As an example, a CRT may reach 200–300 nits.

In an optical system, the power of the source is often concentrated in a certain direction and so for a fixed number of candela the brightness in that direction would rise. This is the optical equivalent of forward gain in an antenna.

image

image

FIGURE 2.12

(a) Three sources producing the same number of lumens produce different amounts of radiant power. (b) Three sources having the same radiant power (not the same number of lumens) appear white to the eye.

The lumen (lm) is a weighted value based on the luminous efficiency function of the HVS. Thus the same numerical value in lumens will appear equally bright to the HVS whatever the colour. If three sources of light, red, green, and blue, each of one lumen, are added, the total luminous flux will be three lumens but the result will not appear white. It is worthwhile discussing this in some detail. Figure 2.12a shows three monochromatic light sources of variable intensity that are weighted by the luminous efficiency function of the HVS to measure the luminous flux correctly. To obtain one lumen from each source, the red and blue sources must be set to produce more radiant power than the green source. This means that the spectral distribution of the combination is no longer uniform and so it will not appear white. In contrast, Figure 2.12b shows three sources that have the same radiant power. After being weighted by the luminous efficiency function, each source produces a different number of lumens, but the eye perceives the effect as white. Essentially the eye has a nonuniform response, but in judging colour it appears to compensate for that, so that a spectrum that is physically white, i.e., having equal radiant power at all visible wavelengths, also appears white to the eye. As a consequence it is more convenient to have a set of units in which equal values result in white. These are known as tristimulus units and are obtained by weighting the value in lumens by a factor that depends on the response of the eye to each of the three wavelengths. The weighting factors add up to unity so that three tristimulus units, one of each colour, when added together produce one lumen.
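The weighting in Figure 2.12 can be made concrete: for a monochromatic source, the luminous flux is the radiant power multiplied by the luminous efficiency function and by 683 lumens per Watt (the value that ties the two scales together at 555 nm). A minimal sketch follows; the V(λ) values are approximate photopic figures and the wavelengths and powers are illustrative, not taken from the figure.

# Sketch: luminous flux of a monochromatic source = 683 lm/W x V(lambda) x radiant power.
# The V(lambda) values are approximate photopic figures; wavelengths and powers
# are illustrative, not taken from Figure 2.12.
V = {450: 0.038, 555: 1.0, 610: 0.503}     # approximate luminous efficiency values

def lumens(wavelength_nm, radiant_power_w):
    return 683.0 * V[wavelength_nm] * radiant_power_w

# Equal radiant power does not give equal lumens...
print([round(lumens(w, 0.1), 1) for w in (450, 555, 610)])
# ...so to obtain one lumen from each source, the blue and red sources need far
# more radiant power than the green one, as in Figure 2.12a.
print([round(1.0 / (683.0 * V[w]), 5) for w in (450, 555, 610)])   # Watts per lumen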

MTF, CONTRAST, AND SHARPNESS

All imaging devices, including the eye, have finite performance, and the modulation transfer function (MTF) is a way of describing the ability of an imaging system to carry detail. The MTF is essentially an optical frequency response and is a function of depth of contrast with respect to spatial frequency. Prior to describing the MTF it is necessary to define some terms used in assessing image quality.

image

image

image

FIGURE 2.13

(a) The definition of contrast index (CI). (b) Frequency sweep test image having constant CI. (c) MTF is the ratio of output and input CIs.

Spatial frequency is measured in cycles per millimetre (mm−1). Contrast index (CI) is shown in Figure 2.13a. The luminance variation across an image has peaks and troughs and the relative size of these is used to calculate the contrast index as shown. A test image can be made having the same contrast index over a range of spatial frequencies as shown in Figure 2.13b. If a non-ideal optical system is used to examine the test image, the output will have a contrast index that falls with rising spatial frequency.

The ratio of the output CI to the input CI is the MTF, as shown in Figure 2.13c. In the special case in which the input CI is unity, the output CI is numerically identical to the MTF. It is common to measure resolution by quoting the frequency at which the MTF has fallen to one-half. This is known as the 50 percent MTF frequency. The limiting resolution is defined as the point at which the MTF has fallen to 10 percent.
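On the assumption that the contrast index of Figure 2.13a takes the Michelson form, (Lmax − Lmin)/(Lmax + Lmin), the MTF calculation can be sketched as follows; the measured output levels are illustrative.

# Sketch: MTF as the ratio of output to input contrast index at each spatial
# frequency. The CI formula assumed here is the Michelson form,
# (Lmax - Lmin) / (Lmax + Lmin); Figure 2.13a defines the book's exact measure.
def contrast_index(l_max, l_min):
    return (l_max - l_min) / (l_max + l_min)

def mtf(ci_out, ci_in):
    return ci_out / ci_in

# Input sweep with constant CI = 1.0; illustrative measured output levels.
measured = {5: (0.95, 0.05), 20: (0.80, 0.20), 50: (0.55, 0.45)}   # cycles/mm
for freq, (out_max, out_min) in measured.items():
    print(freq, "c/mm:", round(mtf(contrast_index(out_max, out_min), 1.0), 2))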

Whilst MTF resolution testing is objective, human vision is subjective and gives an impression we call sharpness. However, the assessment of sharpness is affected by contrast. Increasing the contrast of an image will result in an increased sensation of sharpness even though the MTF is unchanged. When CRTs having black areas between the phosphors were introduced, it was found that the improved contrast resulted in subjectively improved sharpness even though the MTF was unchanged.

Similar results are obtained with CRTs having non-reflective coatings. The perceived contrast of a display is also a function of the surroundings. Displays viewed in dark surroundings, such as cinema film and transparencies, appear to lack contrast, whereas when the same technical contrast is displayed with light surroundings, the contrast appears correct. This is known as the surround effect. It can be overcome by artificially expanding the contrast prior to the display. This will be considered later when the subject of gamma is treated.

THE HUMAN VISUAL SYSTEM

The HVS evolved as a survival tool. A species that could use vision to sense an impending threat or to locate food or a mate would have an obvious advantage. From an evolutionary standpoint, using the visual system to appreciate art or entertainment media is very recent.

The HVS has two obvious transducers, namely the eyes, coupled to a series of less obvious but extremely sophisticated processes, which take place in the brain. The result of these processes is what we call sight, a phenomenon that is difficult to describe. At an average reading distance of 350 mm, the letters in this book subtend an angle to the eye of about a third of a degree. The lines from which the letters are formed are about one-tenth of a millimetre across and subtend an angle of about one minute (one-sixtieth of a degree). The field of view of the HVS is nearly a hemisphere. A short calculation will reveal how many pixels would be needed to convey that degree of resolution over such a wide field of view. The result is simply staggering. If we add colour and we also wish to update all those pixels to allow motion, it is possible to estimate what bandwidth would be needed. The result is so large that it is utterly inconceivable that the nerves from the eye to the brain could carry so much data, or that the brain could handle it. Clearly the HVS does not work in this way. Instead the HVS does what the species finds most useful. It helps create a model in the mind of the reality around it.
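The "short calculation" mentioned above can be sketched as follows. All of the assumed figures (field of view, update rate, colour depth) are illustrative, chosen only to show the order of magnitude.

# Sketch of the "short calculation": pixels needed to cover a near-hemispheric
# field of view at one-minute-of-arc resolution, then the raw data rate with
# colour and motion added. All of the assumed figures are illustrative.
FIELD_H_DEG = 180          # assumed horizontal field of view
FIELD_V_DEG = 130          # assumed vertical field of view
PIXELS_PER_DEG = 60        # one pixel per minute of arc
FRAME_RATE_HZ = 50         # assumed update rate
BITS_PER_PIXEL = 24        # assumed colour depth

pixels = (FIELD_H_DEG * PIXELS_PER_DEG) * (FIELD_V_DEG * PIXELS_PER_DEG)
bits_per_second = pixels * FRAME_RATE_HZ * BITS_PER_PIXEL
print(f"{pixels/1e6:.0f} million pixels, about {bits_per_second/1e9:.0f} Gbit/s uncompressed")

The answer is of the order of 100 million pixels and around 100 Gbit/s uncompressed, which supports the argument that the HVS cannot work this way.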

Figure 2.14 shows the concept. The model can be considered like a kind of three-dimensional frame store in which objects are stored as the HVS identifies them. Inanimate objects are so-called because they do not move. They can be modelled once and left in the model until there is evidence to suggest that there has been a change. In contrast, animate objects need more attention, because they could be bringing benefit or detriment. The HVS solves both of these requirements with the same mechanism. The eyes can swivel to scan the environment and their owner can move within it. This scanning process allows the model to be built using eyes with a relatively narrow field of view. Within this narrow field of view, the provision of high resolution and colour vision does not require an absurd bandwidth, although it does require good lighting. Although the pixels are close together, the total number is fairly small.

image

FIGURE 2.14

The human concept of reality can be likened to a three-dimensional store in the mind in which objects are placed as they are recognized. Moving objects attract the attention because they need to be updated in the model.

Such narrow vision alone is not useful because events outside the field of vision do not alert the HVS to the need for an update of the model. Thus in addition there is a wider field of view, which has relatively poor resolution and is colourblind, but which works at low light levels and responds primarily to small changes or movements. Sitting at a laptop computer writing these words, I can see only a small part of the screen in detail. The rest of the study is known only from the model. On my right is a mahogany bracket clock, but in peripheral vision it appears as a grey lump. However, in my mind the wood and the brass are still the right colour. The ticking of the clock is coming from the same place in the model as the remembered object, reinforcing the illusion.

If I were to be replaced with a camera and a stereo microphone, and the two then turned to the right toward the clock, the visual image and the sound image would both move left. However, if I myself turn right this doesn't happen. The signals from the balance organs in the ear, the sound image model, and the visual model produce data consistent with the fact that it was I that moved and the result is that the model doesn't move. Instead I have become another object in the model and am moving within it. The advantage of this detached approach is that my limbs are included in the model so that I can see an object and pick it up.

This interaction between the senses is very strong and disparities between the senses are a powerful clue that one is being shown an illusion. In advanced systems for use in electronic cinema or flight simulators, it is vital to maintain accurate tracking between the visual image, the sound image, and the sense of balance. Disparities that are not obvious may result in fatigue or nausea.

One consequence of seeing via a model is that we often see what we expect to see rather than what is before us. Optical illusions demonstrate this, and Maurits Escher turned it into an art form. The technique of camouflage destroys familiar shapes and confuses the modelling process. Animals and birds may freeze when predators approach because their lack of motion doesn't trigger peripheral vision.

THE EYE

All television signals ultimately excite some response in the eye and the viewer can describe the result only subjectively. Familiarity with the operation and limitations of the eye is essential to an understanding of television principles.

The simple representation of Figure 2.15 shows that the eyeball is nearly spherical and is swivelled by muscles so that it can track movement. This has a large bearing on the way moving pictures are reproduced. The space between the cornea and the lens is filled with transparent fluid known as aqueous humour. The remainder of the eyeball is filled with a transparent jelly known as vitreous humour. Light enters the cornea, and the amount of light admitted is controlled by the pupil in the iris. Light entering is involuntarily focused on the retina by the lens in a process called visual accommodation. The lens is the only part of the eye that is not nourished by the bloodstream and its centre is technically dead. In a young person the lens is flexible and muscles distort it to perform the focusing action. In old age the lens loses some of its flexibility, causing presbyopia, or limited accommodation. In some people the length of the eyeball is incorrect, resulting in myopia (shortsightedness) or hypermetropia (longsightedness). The cornea should have the same curvature in all meridia, and if this is not the case, astigmatism results.

image

FIGURE 2.15

A simple representation of an eyeball. See text for details.

The retina is responsible for light sensing and contains a number of layers. The surface of the retina is covered with arteries, veins, and nerve fibres and light has to penetrate these to reach the sensitive layer. This contains two types of discrete receptors known as rods and cones from their shape. The distribution and characteristics of these two receptors are quite different. Rods dominate the periphery of the retina, whereas cones dominate a central area known as the fovea, outside which their density drops off. Vision using the rods is monochromatic and has poor resolution but remains effective at very low light levels, whereas the cones provide high resolution and colour vision but require more light. Figure 2.16 shows how the sensitivity of the retina slowly increases in response to entering darkness. The first part of the curve is the adaptation of cone or photopic vision. This is followed by the greater adaptation of the rods in scotopic vision. At such low light levels the fovea is essentially blind and small objects that can be seen in the peripheral rod vision disappear when stared at.

image

FIGURE 2.16

Retinal sensitivity changes after sudden darkness. The initial curve is due to adaptation of cones. At very low light levels cones are blind and monochrome rod vision takes over.

The cones in the fovea are densely packed and directly connected to the nervous system, allowing the highest resolution. Resolution then falls off away from the fovea. As a result the eye must move to scan large areas of detail. The image perceived is not just a function of the retinal response, but is also affected by processing of the nerve signals. The overall acuity of the eye can be displayed as a graph of the response plotted against the degree of detail being viewed. Image detail is generally measured in lines per millimetre or cycles per picture height, but this takes no account of the distance from the image to the eye. A better unit for eye resolution is one based upon the subtended angle of detail, as this will be independent of distance. Units of cycles per degree are then appropriate. Figure 2.17 shows the response of the eye to static detail. Note that the response to very low frequencies is also attenuated. An extension of this characteristic allows the vision system to ignore the fixed pattern of shadow on the retina due to the nerves and arteries.
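Converting the distance-dependent unit (cycles per picture height) into the distance-independent one (cycles per degree) needs only the angle that the picture height subtends at the eye. A minimal sketch follows; the screen height and viewing distance are illustrative.

# Sketch: converting cycles per picture height to cycles per degree using the
# angle subtended by the picture height at the eye. Screen height and viewing
# distance are illustrative values.
import math

def cycles_per_degree(cycles_per_picture_height, picture_height_m, viewing_distance_m):
    subtended_deg = math.degrees(2.0 * math.atan(picture_height_m / (2.0 * viewing_distance_m)))
    return cycles_per_picture_height / subtended_deg

# 300 c/ph on a 0.5 m high screen viewed from 2 m:
print(round(cycles_per_degree(300, 0.5, 2.0), 1))   # ~21 cycles per degree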

The retina does not respond instantly to light, but requires between 0.15 and 0.3 second before the brain perceives an image. The resolution of the eye is primarily a spatio-temporal compromise. The eye is a spatial sampling device; the spacing of the rods and cones on the retina represents a spatial sampling frequency. The measured acuity of the eye exceeds the value calculated from the sample site spacing because a form of oversampling is used.

image

FIGURE 2.17

Response of the eye to different degrees of detail.

The eye is in a continuous state of unconscious vibration called saccadic motion. This causes the sampling sites to exist in more than one location, effectively increasing the spatial sampling rate provided there is a temporal filter that is able to integrate the information from the various positions of the retina.

This temporal filtering is responsible for “persistence of vision.” Flashing lights are perceived to flicker until the critical flicker frequency (CFF) is reached; the light appears continuous for higher frequencies. The CFF is not constant but varies with brightness. Note that the field rate of European television at 50 fields per second is marginal with bright images. Film projected at 48 Hz works because cinemas are darkened and the screen brightness is actually quite low. Figure 2.18 shows the two-dimensional or spatiotemporal response of the eye.

If the eye were static, a detailed object moving past it would give rise to temporal frequencies, as Figure 2.19a shows. The temporal frequency is given by the detail in the object, in lines per millimetre, multiplied by the speed. Clearly a highly detailed object can reach high temporal frequencies even at slow speeds, yet Figure 2.18 shows that the eye cannot respond to high temporal frequencies.

However, the human viewer has an interactive visual system, which causes the eyes to track the movement of any object of interest. Figure 2.19b shows that when eye tracking is considered, a moving object is rendered stationary with respect to the retina so that temporal frequencies fall to zero and much the same acuity to detail is available despite motion. This is known as dynamic resolution and it is how humans judge the detail in real moving pictures. Dynamic resolution will be considered in the next section.

image

FIGURE 2.18

The response of the eye shown with respect to temporal and spatial frequencies. Note that even slow relative movement causes a serious loss of resolution. The eye tracks moving objects to prevent this loss.

image

image

FIGURE 2.19

In (a) a detailed object moves past a fixed eye, causing temporal frequencies beyond the response of the eye. This is the cause of motion blur. In (b) the eye tracks the motion and the temporal frequency becomes zero. Motion blur cannot then occur.

GAMMA

The true brightness of a television picture can be affected by electrical noise on the video signal. As contrast sensitivity is proportional to brightness, noise is more visible in dark picture areas than in bright areas. For economic reasons, video signals have to be made nonlinear to render noise less visible. An inverse gamma function takes place at the camera so that the video signal is nonlinear for most of its journey. Figure 2.20 shows a reverse gamma function. As a true power function requires infinite gain near black, a linear segment is substituted. It will be seen that contrast variations near black result in larger signal amplitude than variations near white. The result is that noise picked up by the video signal has less effect on dark areas than on bright areas. After a gamma function at the display, noise at near-black levels is compressed with respect to noise at near-white levels. Thus a video transmission system using gamma has a lower perceived noise level than one without. Without gamma, vision signals would need around 30 dB better signal-to-noise ratio for the same perceived quality and digital video samples would need 5 or 6 extra bits.

In practice the system is not rendered perfectly linear by gamma correction and a slight overall exponential effect is usually retained to reduce further the effect of noise in the darker parts of the picture. A gamma correction factor of 0.45 may be used to achieve this effect.
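The Rec. 709 camera transfer function shown in Figure 2.20 combines just such a 0.45 power law with a linear segment near black. A minimal sketch of that function:

# Sketch: Rec. 709-style camera (inverse gamma) transfer function, combining a
# 0.45 power law with a linear segment near black so that the gain does not
# become infinite at zero, as described in the text and Figure 2.20.
def rec709_oetf(luminance):
    """luminance is linear light in the range 0..1; returns nonlinear luma 0..1."""
    if luminance < 0.018:
        return 4.5 * luminance                       # linear segment near black
    return 1.099 * luminance ** 0.45 - 0.099         # power-law segment

for l in (0.0, 0.01, 0.018, 0.1, 0.5, 1.0):
    print(l, round(rec709_oetf(l), 3))

Note how the slope near black is much greater than near white, which is why modulation near black produces the larger output amplitude mentioned in the figure caption.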

Clearly image data that are intended for display on a video system must have the correct gamma characteristic or the grey scale will not be correctly reproduced. Image data from computer systems often have gamma characteristics that are incompatible with the standards adopted in video and a gamma conversion process will be required to obtain a correct display. This may take the form of a lookup table.

Electrical noise has no DC component and so cannot shift the average video voltage. However, on extremely noisy signals, the nonlinear effect of gamma is to exaggerate the white-going noise spikes more than the black-going spikes. The result is that the black level appears to rise and the picture loses contrast.

There is a strong argument for retaining gamma in the digital domain, not least for analog compatibility. Although transmission noise is eliminated in the digital domain, the conversion process introduces quantizing noise instead, so the advantage remains and gamma is retained.

image

FIGURE 2.20

CCIR Rec. 709 inverse gamma function used at camera has a straight line approximation at the lower part of the curve to avoid boosting camera noise. Note that the output amplitude is greater for modulation near black.

Figure 2.21 shows that digital luma can be considered in several equivalent ways. In Figure 2.21a a linear analog luminance signal is passed through a gamma corrector to create luma and this is then quantized uniformly. In (b) the linear analog luminance signal is fed directly to a nonuniform quantizer. In (c) the linear analog luminance signal is uniformly quantized to produce digital luminance. This is converted to digital luma by a digital process having a nonlinear transfer function.

Whilst the three techniques shown give the same result, (a) is the simplest, (b) requires a special ADC with gamma-spaced quantizing steps, and (c) requires a high-resolution ADC of perhaps 14 or 16 bits because it works in the linear luminance domain where noise is highly visible. Technique (c) is used in digital processing cameras, in which long word length is common practice.
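Technique (c) can be illustrated with a lookup table that maps high-resolution linear luminance codes to 8-bit luma. A minimal sketch follows, reusing the Rec. 709-style function sketched earlier; full-range 8-bit coding is assumed for simplicity.

# Sketch of technique (c): a lookup table converting 14-bit linear digital
# luminance into 8-bit digital luma, using a Rec. 709-style transfer function.
# Full-range 8-bit coding is assumed for simplicity.
IN_BITS, OUT_BITS = 14, 8
IN_MAX, OUT_MAX = (1 << IN_BITS) - 1, (1 << OUT_BITS) - 1

def oetf(l):
    return 4.5 * l if l < 0.018 else 1.099 * l ** 0.45 - 0.099

# Build the table once; the conversion is then a simple array index per pixel.
luma_lut = [round(oetf(code / IN_MAX) * OUT_MAX) for code in range(IN_MAX + 1)]

print(luma_lut[0], luma_lut[IN_MAX // 2], luma_lut[IN_MAX])   # 0, ~180, 255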

image

image

image

FIGURE 2.21

(a) Analog gamma correction prior to ADC. (b) Non-uniform quantizer gives direct gamma conversion.

(c) Digital gamma correction using lookup table.

As digital luma with 8-bit resolution gives the same subjective performance as digital luminance with 14-bit resolution it will be seen that gamma can also be considered an effective perceptive compression technique.

As all television signals, analog and digital, are subject to gamma correction, it is technically incorrect to refer to the Y signal as luminance, because this parameter is defined as linear in colorimetry. Charles Poynton proposed that the term luma should be used to describe luminance that has been gamma corrected.

The contrast sensitivity of the eye is defined as the smallest brightness difference that is visible. In fact the contrast sensitivity is not constant, but increases in proportion to brightness. Thus whatever the brightness of an object, if that brightness changes by about 1 percent it will be equally detectable.

MOTION PORTRAYAL AND DYNAMIC RESOLUTION

As the eye uses involuntary tracking at all times, the criterion for measuring the definition of moving image portrayal systems has to be dynamic resolution, defined as the apparent resolution perceived by the viewer in an object moving within the limits of accurate eye tracking. The traditional metric of static resolution in film and television has to be abandoned as unrepresentative.

Figure 2.22a shows that when the moving eye tracks an object on the screen, the viewer is watching with respect to the optic flow axis, not the time axis, and these are not parallel when there is motion. The optic flow axis is defined as an imaginary axis in the spatio-temporal volume that joins the same points on objects in successive frames. When many objects move independently each will have its own optic flow axis.

The optic flow axis is identified by motion-compensated standards convertors to eliminate judder and also by MPEG compressors because the greatest similarity from one picture to the next is along that axis. The success of these devices is testimony to the importance of the theory.

Figure 2.22b shows that when the eye is tracking, successive pictures appear in different places with respect to the retina. In other words if an object is moving down the screen and followed by the eye, the raster is actually moving up with respect to the retina. Although the tracked object is stationary with respect to the retina and temporal frequencies are zero, the object is moving with respect to the sensor and the display and in those units high temporal frequencies will exist. If the motion of the object on the sensor is not correctly portrayed, dynamic resolution will suffer.

image

image

FIGURE 2.22

(a) The optic flow axis joins points on a moving object in successive pictures.

(b) When a tracking eye follows a moving object on a screen, that screen will be seen in a different place at each picture. This is the origin of background strobing.

In real-life eye tracking, the motion of the background will be smooth, but in an image-portrayal system based on periodic presentation of frames, the background will be presented to the retina in a different position in each frame. The retina separately perceives each impression of the background, leading to an effect called background strobing.

The criterion for the selection of a display frame rate in an imaging system is sufficient reduction of background strobing. It is a complete myth that the display rate simply needs to exceed the critical flicker frequency. Manufacturers of graphics displays that use frame rates well in excess of those used in film and television are doing so for a valid reason: it gives better results! Note that the display rate and the transmission rate need not be the same in an advanced system.

Dynamic resolution analysis confirms that interlaced television and conventionally projected cinema film are both seriously sub-optimal. In contrast, progressively scanned television systems have no such defects.

SCANNING

It is difficult to convey two-dimensional images from one place to another directly, whereas electrical and radio signals are easily carried. The problem is how to convert a two-dimensional image into a single voltage changing with time. The solution is to use the principle of scanning. Figure 2.23a shows that the monochrome camera produces a video signal whose voltage is a function of the image brightness at a single point on the sensor. This voltage is converted back to the brightness of the same point on the display. The points on the sensor and display must be scanned synchronously if the picture is to be re-created properly. If this is done rapidly enough it is largely invisible to the eye. Figure 2.23b shows that the scanning is controlled by a triangular or sawtooth waveform in each dimension, which causes a constant-speed forward scan followed by a rapid return or flyback. As the horizontal scan is much more rapid than the vertical scan the image is broken up into lines that are not quite horizontal.

image

image

FIGURE 2.23

Scanning converts two-dimensional images into a signal that can be sent electrically. In (a) the scanning of camera and display must be identical. (b) The scanning is controlled by horizontal and vertical sawtooth waveforms.

In the example of Figure 2.23b, the horizontal scanning frequency or line rate, Fh, is an integer multiple of the vertical scanning frequency, or frame rate, and a progressive scan system results in which every frame is identical. Figure 2.23c shows an interlaced scan system in which there is an integer number of lines in two vertical scans or fields. The first field begins with a full line and ends on a half line and the second field begins with a half line and ends with a full line. The lines from the two fields interlace or mesh on the screen. Current analog broadcast systems such as PAL (Phase Alternate Line) and NTSC (National Television Systems Committee) use interlace, although in MPEG systems it is not necessary.
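A minimal sketch of the scan-rate arithmetic, using the standard 625/50 figures as an example:

# Sketch: line rate (Fh) from the line count and picture rate. In a progressive
# system Fh = lines x frame rate; in an interlaced system each frame is split
# into two fields, so the field rate is twice the frame rate for the same Fh.
def line_rate_hz(lines_per_frame, frame_rate_hz):
    return lines_per_frame * frame_rate_hz

print(line_rate_hz(625, 25))        # 15625 Hz for 625/50 (50 fields/s interlaced)
print(1e6 / line_rate_hz(625, 25))  # line period of 64 microseconds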

image

FIGURE 2.23

(c) Where two vertical scans are needed to complete a whole number of lines, the scan is interlaced. The frame is now split into two fields.

PROGRESSIVE OR INTERLACED SCAN?

Interlaced scanning is a crude compression technique, which was developed empirically in the 1930s as a way of increasing the picture rate to reduce flicker without a matching increase in the video bandwidth. Instead of transmitting entire frames, the lines of the frame are sorted into odd lines and even lines. Odd lines are transmitted in one field, even lines in the next. A pair of fields is supposed to interlace to produce a frame, but it will be seen that this frequently does not happen. Figure 2.24a shows that the vertical/temporal arrangement of lines in an interlaced system forms a quincunx pattern (like the five spots on a die). Not surprisingly the vertical/temporal spectrum of an interlaced signal shows the same pattern.

Study of the vertical/temporal spectrum allows many of the characteristics of interlace to be deduced. Like quincunxial spatial sampling, interlace has a triangular passband, as Figure 2.24b shows. The highest vertical resolution is obtained at the point shown, and this is obtained only with a temporal frequency of zero, i.e., when there is no motion. This suggests that interlaced systems have poor dynamic resolution, which is what is found in practice.

image

image

FIGURE 2.24

(a) Interlaced systems shift the lines in between pictures. Two pictures, or fields, make a frame. (b) The vertical temporal spectrum of an interlaced system and its triangular passband, allowing motion or vertical resolution but not both.

Although the passband is triangular, a suitable reconstruction filter cannot be implemented in any known display. Figure 2.24c shows that in, for example, a CRT display, there is no temporal filter, only a vertical filter due to the aperture effect of the electron beam. There are two problems: First, fine vertical detail will be displayed at the frame rate. The result is that although the field rate is above the CFF, a significant amount of frame rate energy is still present to cause flicker. Second, in the presence of motion there will be vertical aliasing. Transform duality holds that any phenomenon can be described in both domains. Figure 2.24d shows that vertical detail such as an edge may be present in only one field of the pair and this results in frame rate flicker called “interlace twitter.”

Figure 2.25a shows a dynamic resolution analysis of interlaced scanning. When there is no motion, the optic flow axis and the time axis are parallel and the apparent vertical sampling rate is the number of lines in a frame. However, when there is vertical motion (Figure 2.25b), the optic flow axis turns. In the case shown, the sampling structure due to interlace results in the vertical sampling rate falling to one-half its stationary value. Consequently interlace does exactly what would be expected from a half-bandwidth filter. It halves the vertical resolution when any motion with a vertical component occurs. In a practical television system, there is no anti-aliasing filter in the vertical axis and so when the vertical sampling rate of an interlaced system is halved by motion, high spatial frequencies will alias or heterodyne, causing annoying artifacts in the picture. This is easily demonstrated.

image

image

FIGURE 2.24

(c) With the spectrum of (b) on a real display, the triangular filter is absent, allowing energy at the frame rate to be visible as flicker. (d) The flicker originates on horizontal edges, which appear in only one field.

image

image

FIGURE 2.25

(a) When an interlaced picture is stationary, viewing takes place along the time axis and the full vertical resolution is available. (b) When there is vertical motion, the optic flow axis turns and the vertical sampling rate seen by the tracking eye is halved.

Figure 2.26a shows how a vertical spatial frequency well within the static resolution of the system aliases when motion occurs. In a progressive scan system this effect is absent and the dynamic resolution due to scanning can be the same as the static case. Interlaced systems handle motion transverse to the scanning lines very poorly by aliasing, whereas motion parallel to the scanning lines results in a strange artifact. If the eye is tracking a horizontally moving object, the object itself will be portrayed quite well because the interlace mechanism will work. However, Figure 2.26b shows that the background strobing will appear feathered because only half of the lines are present in each version of the background. Vertical edges in the background appear as shown in the figure.

Feathering is less noticeable than vertical aliasing and for this reason interlaced television systems always have horizontal raster lines. In real life, horizontal motion is more common than vertical.

image

FIGURE 2.26

(a) The halving in sampling rate causes high spatial frequencies to alias. (b) To an eye following a horizontally moving object, vertical lines in the background will appear feathered because each field appears at a different place on the retina.

It is easy to calculate the vertical image motion velocity needed to obtain the half-bandwidth speed of interlace, because it amounts to one raster line per field. In 525/60 (NTSC) there are about 500 active lines, so motion as slow as one picture height in 8 seconds will halve the dynamic resolution. In 625/50 (PAL) there are about 600 lines, so the half-bandwidth speed falls to one picture height in 12 seconds. This is why NTSC, despite having fewer lines and less bandwidth, does not look as soft as might be expected compared with PAL: it actually has better dynamic resolution. Figure 2.27 shows that the situation deteriorates rapidly if an attempt is made to use interlaced scanning in systems with a lot of lines. In 1250/50, the resolution is halved at a vertical speed of just one picture height in 24 seconds. In other words on real moving video a 1250/50 interlaced system has the same dynamic resolution as a 625/50 progressive system. By the same argument a 1080I system has the same performance as a 480P system.
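The speeds quoted can be verified with a one-line calculation, since the half-bandwidth condition corresponds to one raster line of image motion per field:

# Sketch: vertical speed at which interlace halves the dynamic resolution,
# i.e. one raster line of image motion per field, expressed as the time taken
# to move one picture height. The line counts match those quoted in the text.
def seconds_per_picture_height(active_lines, field_rate_hz):
    return active_lines / field_rate_hz

print(seconds_per_picture_height(500, 60))    # ~8 s  (525/60)
print(seconds_per_picture_height(600, 50))    # 12 s  (625/50)
print(seconds_per_picture_height(1200, 50))   # 24 s  (1250/50)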

Now that techniques such as digital compression and spatial oversampling are available, the format used for display need not be the same as the transmission format. Thus it is difficult to justify the use of interlace in a transmission format. In fact interlace causes difficulties that are absent in progressive systems. Progressive systems are separable: vertical filtering need not affect the time axis and vice versa. Interlaced systems are not separable, and two-dimensional filtering is mandatory. A vertical process requires motion compensation in an interlaced system, whereas in a progressive system it does not. Interlace also makes motion estimation more difficult. Compression systems should not be cascaded, and as digital compression techniques based on transforms are now available, it makes no sense to use an interlaced, i.e., compressed, video signal as an input. Better results will be obtained if a progressive scan signal is used.

image

FIGURE 2.27

Interlace works best in systems with few lines, e.g., NTSC. Increasing the number of lines reduces performance if the frame rate is not also raised. Here are shown the vertical velocities at which various interlace standards fail.

Computer-generated images and film are not interlaced, but consist of discrete frames spaced on a time axis. As digital technology brings computers and television closer together, the use of interlaced transmission is an embarrassing source of incompatibility. The future will bring image delivery systems based on computer technology and oversampling cameras and displays that can operate at resolutions much closer to the theoretical limits. Given the level of technology at the time of its introduction, interlace was an appropriate solution, but it now impedes progress. Interlace causes difficulty in any process that requires image manipulation. This includes DVE (digital video effects) generators, standards convertors, and display convertors/scalers. All these devices give better results when working with progressively scanned data, and if the source material is interlaced, a de-interlacing process will be necessary; this will be considered in Chapter 5.

SYNCHRONISING

It is vital that the horizontal and the vertical scanning at the camera are simultaneously replicated at the display. This is the job of the synchronising or sync system, which must send timing information to the display alongside the video signal. In very early television equipment this was achieved using two quite separate or noncomposite signals. Figure 2.28a shows one of the first (U.S.) television signal standards in which the video waveform had an amplitude of 1 Volt peak to peak (pk–pk) and the sync signal had an amplitude of 4 Volts pk–pk. In practice, it was more convenient to combine both into a single electrical waveform, called at the time composite video, which carries the synchronising information as well as the scanned brightness signal. The single signal is effectively shared by using some of the flyback period for synchronising.

The 4 Volt sync signal was attenuated by a factor of 10 and added to the video to produce a 1.4-Volt pk–pk signal. This was the origin of the 10:4 video:sync relationship of U.S. analog television practice. Later the amplitude was reduced to 1 Volt pk–pk so that the signal had the same range as the original noncomposite video. The 10:4 ratio was retained. As Figure 2.28b shows, this ratio results in some rather odd voltages, and to simplify matters, a new unit called the IRE unit (after the Institute of Radio Engineers) was devised. Originally this was defined as 1 percent of the video voltage swing, independent of the actual amplitude in use, but it came in practice to mean 1 percent of 0.714 Volt. In European analog systems, shown in Figure 2.28c, the messy numbers were avoided by using a 7:3 ratio and the waveforms are always measured in millivolts. Whilst such a signal was originally called composite video, today it would be referred to as monochrome video or Ys, meaning “luma carrying syncs,” although in practice the “s” is often omitted.
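A minimal sketch of the arithmetic behind these ratios, assuming full-amplitude 1 Volt pk–pk signals:

# Sketch: video and sync amplitudes implied by the 10:4 (U.S.) and 7:3
# (European) ratios within a 1 Volt pk-pk signal, and the size of one IRE unit.
def split(video_parts, sync_parts, total_v=1.0):
    video = total_v * video_parts / (video_parts + sync_parts)
    return video, total_v - video

us_video, us_sync = split(10, 4)     # ~0.714 V video, ~0.286 V sync
eu_video, eu_sync = split(7, 3)      # 0.700 V video, 0.300 V sync
ire_volts = us_video / 100           # 1 IRE unit ~ 7.14 mV
print(us_video, us_sync, eu_video, eu_sync, ire_volts)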

Figure 2.28d shows how the two signals are separated. The voltage swing needed to go from black to peak white is less than the total swing available. In a standard analog video signal the maximum amplitude is 1 Volt pk–pk. The upper part of the voltage range represents the variations in brightness of the image from black to white. Signals below that range are “blacker than black” and cannot be seen on the display. These signals are used for synchronising.

Figure 2.29a shows the line synchronising system partway through a field or frame. The part of the waveform that corresponds to the forward scan is called the active line and during the active line the voltage represents the brightness of the image. In between the active line periods are horizontal blanking intervals in which the signal voltage will be at or below black. Figure 2.29b shows that in some systems the active line voltage is superimposed on a pedestal or black level setup voltage of 7.5 IRE. The purpose of this setup is to ensure that the blanking interval signal is below black on simple displays so that it is guaranteed to be invisible on the screen. When setup is used, black level and blanking level differ by the pedestal height. When setup is not used, black level and blanking level are one and the same.

image

image

image

image

FIGURE 2.28

(a) Early video used separate vision and sync signals. The U.S. one-Volt video waveform in (b) has a 10:4 video:sync ratio. (c) European systems use a 7:3 ratio to avoid odd voltages. (d) Sync separation relies on two voltage ranges in the signal.

The blanking period immediately after the active line is known as the front porch, which is followed by the leading edge of sync. When the leading edge of sync passes through 50 percent of its own amplitude, the horizontal retrace pulse is considered to have occurred. The flat part at the bottom of the horizontal sync pulse is known as sync tip and this is followed by the trailing edge of sync, which returns the waveform to blanking level. The signal remains at blanking level during the back porch, during which the display completes the horizontal flyback. The sync pulses have sloping edges because if they were square they would contain high frequencies, which would go outside the allowable channel bandwidth on being broadcast.

image

image

FIGURE 2.29

(a) Part of a video waveform with important features named. (b) Use of pedestal or setup.

The vertical synchronising system is more complex because the vertical flyback period is much longer than the horizontal line period and horizontal synchronisation must be maintained throughout it. The vertical synchronising pulses are much longer than horizontal pulses so that they are readily distinguishable. Figure 2.30a shows a simple approach to vertical synchronising. The signal remains predominantly at sync tip for several lines to indicate the vertical retrace, but returns to blanking level briefly immediately prior to the leading edges of the horizontal sync, which continues throughout. Figure 2.30b shows that the presence of interlace complicates matters, as in one vertical interval the vertical sync pulse coincides with a horizontal sync pulse, whereas in the next the vertical sync pulse occurs halfway down a line.

image

image

image

FIGURE 2.30

(a) A simple vertical pulse is longer than a horizontal pulse. (b) In an interlaced system there are two relationships between H and V. (c) The use of equalizing pulses to balance the DC component of the signal.

In practice the long vertical sync pulses were found to disturb the average signal voltage too much, and to reduce the effect extra equalizing pulses were put in, halfway between the horizontal sync pulses. The horizontal time base system can ignore the equalizing pulses because it contains a flywheel circuit, which expects pulses only roughly one line period apart. Figure 2.30c shows the final result of an interlaced system with equalizing pulses. The vertical blanking interval can be seen, with the vertical pulse itself toward the beginning.

In digital video signals it is possible to synchronise simply by digitizing the analog sync pulses. However, this is inefficient because many samples are needed to describe them. In practice the analog sync pulses are used to generate timing reference signals (TRS), which are special codes inserted into the video data that indicate the picture timing. The solution in the digital domain is analogous to the analog approach of dividing the video voltage range into two, one part for syncs: certain bit combinations are reserved for TRS codes and these cannot occur in legal video. TRS codes are detailed in Chapter 10.

It is essential to extract the timing or synchronising information from a sync or Ys signal accurately to control some processes such as the generation of a digital sampling clock. Figure 2.32a shows a block diagram of a simple sync separator. The first stage will generally consist of a black-level clamp, which stabilizes the DC conditions in the separator. Figure 2.32b shows that if this is not done the presence of a DC shift on a sync edge can cause a timing error.

The sync time is defined as the instant when the leading edge passes through the 50 percent level. The incoming signal should ideally have a sync amplitude of either 0.3 Volt pk–pk or 40 IRE, in which case it can be sliced or converted to a binary waveform by using a comparator with a reference of either 0.15 Volt or 20 IRE. However, if the sync amplitude is for any reason incorrect, the slicing level will be wrong. Figure 2.32a shows that the solution is to measure both blanking and sync tip voltages and to derive the slicing level from them with a potential divider. In this way the slicing level will always be 50 percent of the input amplitude. To measure the sync tip and blanking levels, a coarse sync separator is required, which is accurate enough to generate sampling pulses for the voltage measurement system. Figure 2.32c shows the timing of the sampling process.

BLACK-LEVEL CLAMPING

As the synchronising and picture content of the video waveform are separated purely by the voltage range in which they lie, it is clear that if any accidental drift or offset of the signal voltage takes place it will cause difficulty. Unwanted offsets may result from low-frequency interference such as power line hum picked up by cabling. The video content of the signal also varies in amplitude with scene brightness, changing the average voltage of the signal. When such a signal passes down a channel not having a response down to DC, the baseline of the signal can wander. Such offsets can be overcome using a black-level clamp, which is shown in Figure 2.31. The video signal passes through an operational amplifier, which can add a correction voltage or DC offset to the waveform. At the output of the amplifier the video waveform is sampled by a switch, which closes briefly during the back porch when the signal should be at blanking level. The sample is compared with a locally generated reference blanking level and any discrepancy is used to generate an error signal, which drives the integrator producing the correction voltage. The correction voltage integrator will adjust itself until the error becomes zero.

image

FIGURE 2.31

Black-level clamp samples video during blanking and adds offset until the sample is at black level.
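The clamp of Figure 2.31 is a sampled feedback loop, and its behaviour can be sketched numerically. This is illustrative only: the loop gain of 0.5 and the idea of one back-porch sample per line are assumptions, not values from the text.

```python
# Minimal sketch of a black-level clamp as a sampled feedback loop.

def clamp(back_porch_samples, reference=0.0, gain=0.5):
    """Drive the correction offset until the back-porch sample equals the reference."""
    correction = 0.0
    for sample in back_porch_samples:
        error = reference - (sample + correction)  # discrepancy at the clamp point
        correction += gain * error                 # integrator accumulates the error
    return correction

# A signal sitting 0.1 V too high converges toward a -0.1 V correction.
print(clamp([0.1] * 10))
```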

Once a binary signal has been extracted from the analog input, the horizontal and vertical synchronising information can be separated. All falling edges are potential horizontal sync leading edges, but some are due to equalizing pulses and these must be rejected. This is easily done because equalizing pulses occur partway down the line. A flywheel oscillator or phase-locked loop will lock to genuine horizontal sync pulses because they always occur exactly one line period apart. Edges at other spacings are eliminated. Vertical sync is detected with a timer whose period exceeds that of a normal horizontal sync pulse. If the sync waveform is still low when the timer expires, there must be a vertical pulse present. Once again a phase-locked loop may be used, which will continue to run if the input is noisy or disturbed. This may take the form of a counter, which counts the number of lines in a frame before resetting.

image

FIGURE 2.32

(a) Sync separator block diagram; see text for details. (b) Slicing at the wrong level introduces a timing error. (c) The timing of the sync separation process.

The sync separator can determine which type of field is beginning because in one the vertical and horizontal pulses coincide, whereas in the other the vertical pulse begins in the middle of a line.
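The classification of pulses by width and position described above can be summarised in a short sketch. The pulse widths used here are nominal 625-line values assumed for illustration, and the threshold and helper functions are hypothetical rather than part of any standard separator design.

```python
# Sketch of sync classification by pulse width and position (625-line nominal
# timings in microseconds, assumed for illustration).

LINE_PERIOD = 64.0        # us
H_SYNC_WIDTH = 4.7        # us, normal horizontal sync
EQ_PULSE_WIDTH = 2.35     # us, equalizing pulse (roughly half width)
V_PULSE_THRESHOLD = 10.0  # us, anything longer is part of vertical sync

def classify(width_us: float) -> str:
    if width_us > V_PULSE_THRESHOLD:
        return "vertical"
    if width_us > (EQ_PULSE_WIDTH + H_SYNC_WIDTH) / 2:
        return "horizontal"
    return "equalizing"

def field_type(v_start_us: float, last_h_start_us: float) -> int:
    """Field 1 if vertical sync coincides with a line start, field 2 if it
    begins roughly half a line later."""
    offset = (v_start_us - last_h_start_us) % LINE_PERIOD
    return 1 if offset < LINE_PERIOD / 4 else 2

print(classify(4.7), classify(2.3), classify(27.0))  # horizontal equalizing vertical
print(field_type(0.0, 0.0), field_type(32.0, 0.0))   # 1 2
```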

BANDWIDTH AND DEFINITION

As the conventional analog television picture is made up of lines, the line structure determines the definition or the fineness of detail that can be portrayed in the vertical axis. The limit is reached in theory when alternate lines show black and white. In a 625-line picture there are roughly 600 unblanked lines. If 300 of these are white and 300 are black then there will be 300 complete cycles of detail in one picture height. One unit of resolution, which is a unit of spatial frequency, is c/ph or cycles per picture height. In practical displays the contrast will have fallen to virtually nothing at this ideal limit and the resolution actually achieved is around 70 percent of the ideal, or about 210 c/ph. The degree to which the ideal is met is known as the Kell factor of the display.

Definition in one axis is wasted unless it is matched in the other and so the horizontal axis should be able to offer the same performance. As the aspect ratio of conventional television is 4:3 then it should be possible to display 400 cycles in one picture width, reduced to about 300 cycles by the Kell factor. As part of the line period is lost due to flyback, 300 cycles per picture width becomes about 360 cycles per line period.

In 625-line television, the frame rate is 25 Hz and so the line rate Fh will be

Fh = 625 × 25 = 15,625 Hz.

If 360 cycles of video waveform must be carried in each line period, then the bandwidth required will be given by

15,625 × 360 = 5.625 MHz.

In the 525-line system, there are roughly 500 unblanked lines allowing 250 c/ph theoretical definition, or about 175 c/ph allowing for the Kell factor. Allowing for the aspect ratio, equal horizontal definition requires about 230 cycles per picture width. Allowing for horizontal blanking this requires about 280 cycles per line period.

In 525-line video, Fh = 525 × 30 = 15,750 Hz. Thus the bandwidth required is

15,750 × 280 = 4.4 MHz.

If it is proposed to build a high-definition television system, one might start by doubling the number of lines and hence double the definition. Thus in a 1250-line format about 420 c/ph might be obtained. To achieve equal horizontal definition, bearing in mind the aspect ratio is now 16:9, then nearly 750 cycles per picture width will be needed. Allowing for horizontal blanking, then around 890 cycles per line period will be needed. The line frequency is now given by

Fh = 1250 × 25 = 31,250 Hz

and the bandwidth required is given by

31,250 × 890 = 28 MHz.

Note the dramatic increase in bandwidth. In general the bandwidth rises as the square of the resolution because there are more lines and more cycles needed in each line. It should be clear that, except for research purposes, high-definition television will never be broadcast as a conventional analog signal because the bandwidth required is simply uneconomic. If and when high-definition broadcasting becomes common, digital compression techniques will make it economical.
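The estimate used in these worked examples can be captured in a short routine: vertical resolution with the Kell factor, matched horizontally via the aspect ratio, increased to allow for line blanking, then multiplied by the line rate. The blanking allowance of about 1.2 is an assumption inferred from the 300-to-360 step above, and the results agree with the figures in the text within rounding.

```python
# Sketch of the analog bandwidth estimate used in the worked examples above.

def analog_bandwidth_hz(total_lines, frame_rate, active_lines, aspect,
                        kell=0.7, blanking_factor=1.2):
    c_per_ph = (active_lines / 2) * kell          # achievable cycles per picture height
    c_per_pw = c_per_ph * aspect                  # matched cycles per picture width
    c_per_line = c_per_pw * blanking_factor       # cycles per whole line period
    return total_lines * frame_rate * c_per_line

for name, args in [("625/50", (625, 25, 600, 4 / 3)),
                   ("525/60", (525, 30, 500, 4 / 3)),
                   ("1250/50", (1250, 25, 1200, 16 / 9))]:
    print(name, round(analog_bandwidth_hz(*args) / 1e6, 1), "MHz")
# roughly 5.3, 4.4 and 28 MHz respectively
```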

APERTURE EFFECT

The aperture effect will show up in many aspects of television in both the sampled and the continuous domains. The image sensor has a finite aperture function. In tube cameras and in CRTs, the beam will have a finite radius with a Gaussian distribution of energy across its diameter. This results in a Gaussian spatial frequency response. Tube cameras often contain an aperture corrector, which is a filter designed to boost the higher spatial frequencies that are attenuated by the Gaussian response. The horizontal filter is simple enough, but the vertical filter will require line delays to produce points above and below the line to be corrected. Aperture correctors also amplify aliasing products and an over-corrected signal may contain more vertical aliasing than resolution.

image

FIGURE 2.33

Frequency response with 100 percent aperture nulls at multiples of sampling rate. The area of interest is up to half the sampling rate.

Some digital-to-analog convertors keep the signal constant for a substantial part of or even the whole sample period. In CCD cameras, the sensor is split into elements that may almost touch in some cases. The element integrates light falling on its surface. In both cases the aperture will be rectangular. The case in which the pulses have been extended in width to become equal to the sample period is known as a zero-order hold system and has a 100 percent aperture ratio.

Rectangular apertures have a sin x/x spectrum, which is shown in Figure 2.33. With a 100 percent aperture ratio, the frequency response falls to a null at the sampling rate and as a result is about 4 dB down at the edge of the baseband.
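The figure of about 4 dB can be checked directly from the sin x/x response. This is a minimal sketch of that calculation, nothing more:

```python
# Loss of a 100 percent (zero-order hold) aperture at a frequency expressed
# as a fraction of the sampling rate.

import math

def aperture_loss_db(f_over_fs: float) -> float:
    x = math.pi * f_over_fs
    if x == 0:
        return 0.0
    return 20 * math.log10(abs(math.sin(x) / x))

print(round(aperture_loss_db(0.5), 2))  # about -3.92 dB at half the sampling rate
print(aperture_loss_db(1.0) < -100)     # effectively a null at the sampling rate
```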

The temporal aperture effect varies according to the equipment used. Tube cameras have a long integration time and thus a wide temporal aperture. Whilst this reduces temporal aliasing, it causes smear on moving objects. CCD cameras do not suffer from lag and as a result their temporal response is better. Some CCD cameras deliberately have a short temporal aperture as the time axis is resampled by a shutter. The intention is to reduce smear, hence the popularity of such devices for sporting events, but there will be more aliasing on certain subjects.

The eye has a temporal aperture effect, which is known as persistence of vision, and the phosphors of CRTs continue to emit light after the electron beam has passed. These produce further temporal aperture effects in series with those in the camera.

SCANNING FORMATS FOR SD AND HDTV

The scanning format is defined as the parameters by which time and the image plane are divided up by the scanning process. The parameters were originally the frame rate, the number of line periods in the frame period, and whether the scanning was interlaced or progressive. The number of line periods in the frame period includes those lines in which flyback takes place in a CRT and which are blanked.

Now that the majority of cameras and displays use pixel-based structures, having no tangible flyback mechanism, the definition has changed. Recent scanning formats are defined by the number of active lines. This makes more sense as these are the lines that are actually visible and correspond to the vertical pixel count.

It might be thought that the scanning parameters of television would be based on psycho-optics, but this has yet to happen. The 525/60 scanning of U.S. SDTV, having 2:1 interlace, chose a field rate identical to the local power frequency. The power frequency of 50 Hz was chosen as the basis for the European SDTV scanning formats. For economy of scale, the line frequency was chosen to be close to that of the U.S. system so that the same CRT scanning transformers could be used. This led to 625 lines being specified, as 625 × 50 is close to 525 × 60.

COLOUR VISION

Colour vision is made possible by the cones on the retina, which occur in three different types, responding to different colours. Figure 2.20 showed that human vision is restricted to a range of light wavelengths from 400 to 700 nm. Shorter wavelengths are called ultraviolet and longer wavelengths are called infrared. Note that the response is not uniform, but peaks in the area of green. The response to blue is very poor and makes a nonsense of the traditional use of blue lights on emergency vehicles.

Figure 2.34 shows an approximate response for each of the three types of cone. If light of a single wavelength is observed, the relative responses of the three sensors allow us to discern what we call the colour of the light. Note that at both ends of the visible spectrum there are areas in which only one receptor responds; all colours in those areas look the same. There is a great deal of variation in receptor response from one individual to the next and the curves used in television are the average of a great many tests. In a surprising number of people the single receptor zones are extended and discrimination between, for example, red and orange is difficult.

image

FIGURE 2.34

All human vision takes place over this range of wavelengths. The response is not uniform, but has a central peak. The three types of cone approximate to the three responses shown to give colour vision.

The full resolution of human vision is restricted to brightness variations. Our ability to resolve colour details is only about a quarter of that.

COLORIMETRY

The triple-receptor characteristic of the eye is extremely fortunate as it means that we can generate a range of colours by adding together light sources having just three different wavelengths in various proportions. This process is known as additive colour matching, which should be clearly distinguished from the subtractive colour matching that occurs with paints and inks. Subtractive matching begins with white light and selectively removes parts of the spectrum by filtering. Additive matching uses coloured light sources that are combined.

An effective colour television system can be made in which only three pure, or single-wavelength, colours, known as primaries, need be generated. The primaries need to be similar in wavelength to the peaks of the three receptor responses, but need not be identical. Figure 2.35 shows a rudimentary colour television system. Note that the colour camera is in fact three cameras in one, of which each is fitted with a different coloured filter. Three signals, R, G, and B, must be transmitted to the display, which produces three images that must be superimposed to obtain a colour picture.

In practice the primaries must be selected from available phosphor compounds. Once the primaries have been selected, the proportions needed to reproduce a given colour can be found using a colorimeter. Figure 2.36 shows a colorimeter that consists of two adjacent white screens. One screen is illuminated by three light sources, one of each of the selected primary colours. Initially, the second screen is illuminated with white light and the three sources are adjusted until the first screen displays the same white. The sources are then calibrated. Light of a single wavelength is then projected onto the second screen. The primaries are once more adjusted until both screens appear to have the same colour. The proportions of the primaries are noted. This process is repeated for the whole visible spectrum, resulting in the colour mixture curves shown in Figure 2.37. In some cases it will not be possible to find a match because an impossible negative contribution is needed. In this case we can simulate a negative contribution by shining some primary colour onto the test screen until a match is obtained. If the primaries were ideal monochromatic (single-wavelength) sources, it would be possible to find three wavelengths at which two of the primaries were completely absent. However, practical phosphors are not monochromatic, but produce a distribution of wavelengths around the nominal value, and to make them spectrally pure other wavelengths have to be subtracted.

image

FIGURE 2.35

Simple colour television system. Camera image is split by three filters. Red, green, and blue video signals are sent to three primary-coloured displays whose images are combined.

image

FIGURE 2.36

Simple colorimeter. Intensities of primaries on the right screen are adjusted to match the test colour on the left screen.

image

FIGURE 2.37

Colour mixture curves show how to mix primaries to obtain any spectral colour.

The colour mixture curves dictate what the response of the three sensors in the colour camera must be. The primaries are determined in this way because it is easier to make camera filters to suit available CRT phosphors than the other way round.

As there are three signals in a colour television system, they can be simultaneously depicted only in three dimensions. Figure 2.38 shows the RGB colour space, which is basically a cube with black at the origin and white at the diagonally opposite corner. Figure 2.39 shows the colour mixture curves plotted in RGB space. For each visible wavelength a vector exists whose direction is determined by the proportions of the three primaries. If the brightness is allowed to vary it will affect all three primaries, and thus the length of the vector, in the same proportion.

image

FIGURE 2.38

RGB colour space is three-dimensional and not easy to draw.

image

FIGURE 2.39

Colour mixture curves plotted in RGB space result in a vector whose locus moves with wavelength in three dimensions.

Depicting and visualizing the RGB colour space are not easy, and it is also difficult to take objective measurements from it. The solution is to modify the diagram to allow it to be rendered in two dimensions on flat paper. This is done by eliminating luminance (brightness) changes and depicting only the colour at constant brightness. Figure 2.40a shows how a constant luminance unit plane intersects the RGB space at unity on each axis. At any point on the plane the three components add up to 1. A two-dimensional plot results when vectors representing all colours intersect the plane. Vectors may be extended if necessary to allow intersection. Figure 2.40b shows that the 500 nm vector has to be produced (extended) to meet the unit plane, whereas the 580 nm vector naturally intersects. Any colour can now be specified uniquely in two dimensions.

image

image

FIGURE 2.40

(a) A constant luminance plane intersects RGB space, allowing colours to be studied in two dimensions only. (b) The intersection of the unit plane by vectors joining the origin and the spectrum locus produces the locus of spectral colours, which requires negative values of R, G, and B to describe it.

The points at which the unit plane intersects the axes of RGB space form a triangle on the plot. The horseshoe-shaped locus of pure spectral colours goes outside this triangle because, as was seen above, the colour mixture curves require negative contributions for certain colours.
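The unit-plane idea of Figure 2.40a amounts to nothing more than scaling an RGB vector so that its components sum to one, which discards brightness and leaves only colour. A minimal sketch:

```python
# Project an RGB vector onto the unit plane R + G + B = 1.

def rgb_chromaticity(r, g, b):
    total = r + g + b
    if total == 0:
        raise ValueError("black has no defined chromaticity")
    return (r / total, g / total, b / total)

# Doubling the brightness does not move the point on the unit plane.
print(rgb_chromaticity(0.2, 0.5, 0.3))
print(rgb_chromaticity(0.4, 1.0, 0.6))
```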

Having the spectral locus outside the triangle is a nuisance, and a larger triangle can be created by postulating new coordinates called X, Y, and Z representing hypothetical primaries that cannot exist. This representation is shown in Figure 2.40c.

image

FIGURE 2.40

In (c) a new coordinate system, X, Y, and Z, is used so that only positive values are required. The spectrum locus now fits entirely in the triangular space where the unit plane intersects these axes. To obtain the CIE chromaticity diagram (d), the locus is projected onto the XY plane.

The Commission Internationale de l'Éclairage (CIE) standard chromaticity diagram shown in Figure 2.40d is obtained in this way by projecting the unity luminance plane onto the XY plane. This projection has the effect of bringing the red and blue primaries closer together. Note that the curved part of the locus is due to spectral or single-wavelength colours. The straight base is due to nonspectral colours obtained by additively mixing red and blue.
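The projection that produces the chromaticity coordinates x and y works exactly like the unit-plane normalisation above, but on X, Y, and Z. The sketch below uses the sRGB/Rec. 709 RGB-to-XYZ matrix purely as a familiar example; it is not the NTSC or SMPTE C primary set discussed in this chapter.

```python
# CIE xy chromaticity from tristimulus values, with an example conversion
# from linear Rec. 709 RGB (an assumption, chosen only for illustration).

def xyz_to_xy(X, Y, Z):
    s = X + Y + Z
    return (X / s, Y / s)

def rec709_linear_rgb_to_xyz(r, g, b):
    X = 0.4124 * r + 0.3576 * g + 0.1805 * b
    Y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    Z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return X, Y, Z

# White lands near the centre of the diagram (close to x = 0.31, y = 0.33).
print(xyz_to_xy(*rec709_linear_rgb_to_xyz(1.0, 1.0, 1.0)))
```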

image

FIGURE 2.41

The colour range of television compares well with printing and photography.

As negative light is impossible, only colours within the triangle joining the primaries can be reproduced and so practical television systems cannot reproduce all possible colours. Clearly efforts should be made to obtain primaries that embrace as large an area as possible. Figure 2.41 shows how the colour range or gamut of television compares with paint and printing inks and illustrates that the comparison is favourable. Most everyday scenes fall within the colour gamut of television. Exceptions include saturated turquoise and spectrally pure iridescent colours formed by interference in a duck's feathers or reflections in a Compact Disc. For special purposes displays have been made having four primaries to give a wider colour range, but these are uncommon.

Figure 2.42 shows the primaries initially selected for NTSC. However, manufacturers looking for brighter displays substituted more efficient phosphors having a smaller colour range. This was later standardised as the SMPTE C phosphors, which were also adopted for PAL.

image

FIGURE 2.42

The primary colours for NTSC were initially as shown. These were later changed to more efficient phosphors, which were also adopted for PAL. See text.

Whites appear in the centre of the chromaticity diagram, corresponding to roughly equal amounts of primary colour. Two terms are used to describe colours: hue and saturation. Colours having the same hue lie on a straight line between the white point and the perimeter of the primary triangle. The saturation of the colour increases with distance from the white point. As an example, pink is a desaturated red.

The apparent colour of an object is also a function of the illumination. The “true colour” will be revealed only under ideal white light, which in practice is uncommon. An ideal white object reflects all wavelengths equally and simply takes on the colour of the ambient illumination. Figure 2.43 shows the locations of three “white” sources or illuminants on the chromaticity diagram. Illuminant A corresponds to a tungsten filament lamp, illuminant B corresponds to midday sunlight, and illuminant C corresponds to typical daylight, which is bluer because it consists of a mixture of sunlight and light scattered by the atmosphere. In everyday life we accommodate automatically to the change in apparent colour of objects as the sun's position or the amount of cloud changes and as we enter artificially lit buildings, but colour cameras accurately reproduce these colour changes. Attempting to edit a television program from recordings made at different times of day or indoors and outdoors would result in obvious and irritating colour changes unless some steps are taken to keep the white balance reasonably constant.

image

FIGURE 2.43

Positions of three common illuminants on chromaticity diagram.

COLOUR DIFFERENCE SIGNALS

There are many different ways in which television signals can be carried and these will be considered here. A monochrome camera produces a single luma signal, Y or Ys, whereas a colour camera produces three signals, or components, R, G, and B, which are essentially monochrome video signals representing an image in each primary colour. In some systems sync is present on a separate signal (RGBs); rarely it is present on all three components; most commonly it is present only on the green component, leading to the term RGsB. The use of the green component for sync has led to suggestions that the components should be called GBR. As the original and long-standing term RGB or RGsB correctly reflects the sequence of the colours in the spectrum, it remains to be seen whether GBR will achieve common usage. Like luma, RGsB signals may use 0.7- or 0.714-Volt signals, with or without setup.

RGB and Y signals are incompatible, yet when colour television was introduced it was a practical necessity that it should be possible to display colour signals on a monochrome display and vice versa.

Creating or transcoding a luma signal from R, Gs, and B is relatively easy. Figure 2.34 showed the spectral response of the eye, which has a peak in the green region. Green objects will produce a larger stimulus than red objects of the same brightness, with blue objects producing the least stimulus. A luma signal can be obtained by adding R, G, and B together, not in equal amounts, but in a sum that is weighted by the relative response of the eye. Thus:

Y = 0.299R + 0.587G + 0.114B.

Syncs may be regenerated, but will be identical to those on the Gs input; when added to Y they result in Ys as required.

If Ys is derived in this way, a monochrome display will show nearly the same result as if a monochrome camera had been used in the first place. The results are not identical because of the nonlinearities introduced by gamma correction.
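The weighted sum is trivial to express in code. This sketch ignores gamma correction, which is precisely why, as noted above, the result only approximates a true monochrome camera output:

```python
# Luma as the eye-weighted sum of R, G and B.

def luma(r, g, b):
    return 0.299 * r + 0.587 * g + 0.114 * b

print(luma(1.0, 1.0, 1.0))  # white -> 1.0
print(luma(0.0, 1.0, 0.0))  # green -> 0.587, the largest single contribution
print(luma(0.0, 0.0, 1.0))  # blue  -> 0.114, the smallest
```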

As colour pictures require three signals, it should be possible to send Ys and two other signals, which a colour display could arithmetically convert back to R, G, and B. There are two important factors that restrict the form the other two signals may take. One is to achieve reverse compatibility. If the source is a monochrome camera, it can produce only Ys and the other two signals will be completely absent. A colour display should be able to operate on the Ys signal only and show a monochrome picture. The other is the requirement to conserve bandwidth for economic reasons.

These requirements are met by sending two colour difference signals along with Ys. There are three possible colour difference signals: R − Y, B − Y, and G − Y. As the green signal makes the greatest contribution to Y, the amplitude of G − Y would be the smallest and therefore the most susceptible to noise. Consequently R − Y and B − Y are used in practice, as Figure 2.44 shows.

image

FIGURE 2.44

Colour components are converted to colour difference signals by the transcoding shown here.

R and B are readily obtained by adding Y to the two colour difference signals. G is obtained by rearranging the expression for Y above such that

G = (Y − 0.299R − 0.114B)/0.587.

If a colour CRT is being driven, it is possible to apply inverted luma to the cathodes and the R−Y and B−Y signals directly to two of the grids so that the tube performs some of the matrixing. It is then necessary only to obtain G−Y for the third grid, using the expression

G−Y = −0.51(R−Y) − 0.19(B−Y).
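These weights follow directly from the luma equation: substituting R = (R−Y) + Y and B = (B−Y) + Y into Y = 0.299R + 0.587G + 0.114B and solving for G−Y gives coefficients of about −0.509 and −0.194. A quick sketch of that arithmetic:

```python
# Derive the G-Y weighting from the luma coefficients alone.

kr, kg, kb = 0.299, 0.587, 0.114
a = -kr / kg   # coefficient of (R-Y)
b = -kb / kg   # coefficient of (B-Y)
print(round(a, 3), round(b, 3))  # -0.509 -0.194

def g_minus_y(r_minus_y, b_minus_y):
    return a * r_minus_y + b * b_minus_y
```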

If a monochrome source having only a Ys output is supplied to a colour display, R−Y and B−Y will be zero. It is reasonably obvious that if there are no colour difference signals the colour signals cannot be different from one another and R = G = B. As a result the colour display can produce only a neutral picture.

The use of colour difference signals is essential for compatibility in both directions between colour and monochrome, but it has a further advantage, which follows from the way in which the eye works. To produce the highest resolution in the fovea, the eye will use signals from all types of cone, regardless of colour. To determine colour the stimuli from three cones must be compared. There is evidence that the nervous system uses some form of colour difference processing to make this possible. As a result the acuity of the human eye is available only in monochrome. Differences in colour cannot be resolved so well. A further factor is that the lens in the human eye is not achromatic and this means that the ends of the spectrum are not well focused. This is particularly noticeable on blue.

If the eye cannot resolve colour very well, there is no point in expending valuable bandwidth on sending high-resolution colour signals. Colour difference working allows the luma to be sent separately at full bandwidth. This determines the subjective sharpness of the picture. The colour difference signals can be sent with considerably reduced bandwidth, as little as one-quarter that of luma, and the human eye is unable to tell the difference.

In practice analog component signals are never received perfectly, but suffer from slight differences in relative gain. In the case of RGB a gain error in one signal will cause a colour cast on the received picture. A gain error in Y causes no colour cast and gain errors in R−Y or B−Y cause much smaller perceived colour casts. Thus colour difference working is also more robust than RGB working.

The overwhelming advantages obtained by using colour difference signals mean that in broadcast and production facilities RGB is seldom used. The outputs from the RGB sensors in the camera are converted directly to Y, R−Y, and B−Y in the camera control unit and output in that form. Standards exist for both analog and digital colour difference signals to ensure compatibility between equipment from various manufacturers. The M-II and Betacam formats record analog colour difference signals, and there are a number of colour difference digital formats.

Whilst signals such as Y, R, G, and B are unipolar or positive only, it should be stressed that colour difference signals are bipolar and may meaningfully take on levels below 0 Volts.

The wide use of colour difference signals has led to the development of test signals and equipment to display them. The most important of the test signals is the ubiquitous colour bars. Colour bars are used to set the gains and timing of signal components and to check that matrix operations are performed using the correct weighting factors. Further details will be found in Chapter 4. The origin of the colour bar test signal is shown in Figure 2.45. In 100 percent amplitude bars, peak amplitude binary RGB signals are produced, having one, two, and four cycles per screen width. When these are added together in a weighted sum, an eight-level luma staircase results because of the unequal weighting. The matrix also produces two colour difference signals, R−Y and B−Y, as shown. Sometimes 75 percent amplitude bars are generated by suitably reducing the RGB signal amplitude. Note that in both cases the colours are fully saturated; it is only the brightness that is reduced to 75 percent. Sometimes the white bar of a 75 percent bar signal is increased to 100 percent to make calibration easier. Such a signal is sometimes erroneously called a 100 percent bar signal.

image

FIGURE 2.45

Origin of colour difference signals representing colour bars. Adding R, G, and B according to the weighting factors produces an irregular luminance staircase.
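The unequal steps of the luma staircase in Figure 2.45 can be reproduced by applying the usual weighting to 100 percent binary RGB bars. This is only a sketch of that arithmetic; the bar ordering is the conventional one from white to black.

```python
# Luma levels of 100 percent colour bars: eight unequal steps from 1.0 to 0.

BARS = {  # bar name: (R, G, B)
    "white": (1, 1, 1), "yellow": (1, 1, 0), "cyan": (0, 1, 1),
    "green": (0, 1, 0), "magenta": (1, 0, 1), "red": (1, 0, 0),
    "blue": (0, 0, 1), "black": (0, 0, 0),
}

for name, (r, g, b) in BARS.items():
    y = 0.299 * r + 0.587 * g + 0.114 * b
    print(f"{name:8s} Y = {y:.3f}")   # 1.000, 0.886, 0.701, 0.587, 0.413, 0.299, 0.114, 0.000
```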

image

image

FIGURE 2.46

(a) 100 percent colour bars represented by SMPTE/EBU standard colour difference signals. (b) Level comparison is easier in waveform monitors if the B−Y and R−Y signals are offset upward.

Figure 2.46a shows a SMPTE/EBU standard colour difference signal set in which the signals are called Ys, Pb, and Pr. Syncs of 0.3 Volt are on luma only and all three video signals have a 0.7-Volt pk–pk swing with 100 percent bars. To obtain these voltage swings, the following gain corrections are made to the components:

Pr = 0.71327(R−Y) and Pb = 0.56433(B−Y).

Within waveform monitors, the colour difference signals may be offset by 350 mV as in Figure 2.46b to match the luma range for display purposes.
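The gain corrections quoted above are simply 0.5/(1 − 0.299) and 0.5/(1 − 0.114), chosen so that the colour difference components just fill the same swing as luma. The sketch below assumes a 0.7-Volt video range with black at 0 V, purely for illustration, and confirms the 0.7-Volt pk–pk swing on 100 percent bars.

```python
# Where the Pb and Pr gain factors come from, and the resulting swings.

kr, kb = 0.299, 0.114
pr_gain = 0.5 / (1 - kr)   # 0.71327...
pb_gain = 0.5 / (1 - kb)   # 0.56433...
print(round(pr_gain, 5), round(pb_gain, 5))

def ypbpr(r, g, b, video_range=0.7):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    pb = pb_gain * (b - y)
    pr = pr_gain * (r - y)
    return y * video_range, pb * video_range, pr * video_range

# The extremes of Pr occur on the red and cyan bars: +/-0.35 V, i.e., 0.7 V pk-pk.
print(ypbpr(1, 0, 0))  # red
print(ypbpr(0, 1, 1))  # cyan
```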
