
INTRODUCTION

The approach taken here must necessarily be broad and must include in principle any system that can deliver data over distance. There appears to be an unwritten rule that anything to do with communications has to be described entirely using acronyms, a rule which this chapter intends to break in the interests of clarity. Figure 10.1 shows some of the classifications of communications systems. The simplest is a uni-directional point-to-point signal path shown in Figure 10.1a. This is common in digital production equipment and includes the AES/EBU (Audio Engineering Society/European Broadcasting Union) digital audio interface and the SDI (serial digital interface) for digital video. Bi-directional point-to-point signals include the RS-232 and RS-422 duplex systems. Bi-directional signal paths may be symmetrical, i.e., have the same capacity in both directions (b), or asymmetrical, having more capacity in one direction than the other (c). In this case the low-capacity direction may be known as a back channel.

Back channels are useful in a number of applications. Video-on-demand and interactive video are both systems in which the inputs from the viewer are relatively small, but result in extensive data delivery to the viewer. Archives and databases have similar characteristics.

When more than two devices can be interconnected in such a way that any one can communicate at will with any other, the result is a network as in Figure 10.1d. The traditional telephone system is a network, and although the original infrastructure assumed analog speech transmission, subsequent developments in modems have allowed data transmission.

image

FIGURE 10.1

Classes of communication systems. (a) The uni-directional point-to-point connection used in many digital audio and video interconnects. (b) Symmetrical bi-directional point-to-point system. (c) Asymmetrical point-to-point system. (d) A network must have some switching or addressing ability in addition to delivering data. (e) Networks can be connected by gateways.

The computer industry has developed its own network technology, a long-serving example being the Ethernet. Computer networks can work over various distances, giving rise to LANs (local area networks), MANs (metropolitan area networks), and WANs (wide area networks). Such networks can be connected together to form internetworks or internets for short, including the Internet. A private network, linking all employees of a given company, for example, may be referred to as an intranet.

Figure 10.1e shows that networks are connected together by gateways. In this example a private network (typically a local area network within an office block) is interfaced to an access network (typically a metropolitan area network with a radius on the order of a few kilometres), which in turn connects to the transport network. The access networks and the transport network together form a public network.

The different requirements of networks of different sizes have led to different protocols being developed. Where a gateway exists between two such networks, the gateway will often be required to perform protocol conversion. Such a device may be referred to as network termination equipment. Protocol conversion represents unnecessary cost and delay, and recent protocols such as ATM (Asynchronous Transfer Mode) are sufficiently flexible that they can be adopted in any type of network to avoid conversion.

 

(a)

Adaptation layer: reformats host data block structures to suit the packet structure; error correction.

Protocol layer: priority and quality-of-service control and arbitration of requests.

Transmission convergence layer: packet structure and ID, synchronising, stuffing.

Physical medium-dependent layer: medium and connectors, voltages, impedances, and modulation techniques.

(b)

Addressing or direction of packets
Packet counting
Packet delay control
Arbitration between requests

FIGURE 10.2

(a) Layers are important in communications because they have a degree of independence such that one can be replaced by another, leaving the remainder undisturbed. (b) The functions of a network protocol. See text.

Networks that are optimized for storage devices also exist. These range from the standard buses linking hard drives with their controllers to SANs (storage area networks) in which distributed storage devices behave as one large store.

Communication must also include broadcasting, which initially was analog, but has also adopted digital techniques so that transmitters effectively radiate data. Traditional analog broadcasting was uni-directional, but with the advent of digital techniques, various means for providing a back channel have been developed.

To have an understanding of communications it is important to appreciate the concept of layers shown in Figure 10.2a. The lowest layer is the physical medium-dependent layer. In the case of a cabled interface, this layer would specify the dimensions of the plugs and sockets, so that a connection could be made, and the use of a particular type of conductor such as coaxial, STP (screened twisted pair), or UTP (unscreened twisted pair). The impedance of the cable may also be specified. The medium may also be optical fibre, which will need standardisation of the terminations and the wavelength(s) in use.

Once a connection is made, the physical medium-dependent layer standardises the voltage of the transmitted signal and the frequency at which the voltage changes (the channel bit rate). This may be fixed at a single value, chosen from a set of fixed values, or, rarely, variable. Practical interfaces need some form of channel coding (see Chapter 8) to embed a bit clock in the data transmission.

The physical medium-dependent layer allows binary transmission, but this needs to be structured or formatted. The transmission convergence layer takes the binary signalling of the physical medium-dependent layer and builds a packet or cell structure. This consists at least of some form of synchronisation system, so that the start and end of serialized messages can be recognized, and an addressing or labelling scheme, so that packets can reliably be routed and recognized. Most real cables and optical fibres run at fixed bit rates, and a further function of the transmission convergence layer is the insertion of null or stuffing packets when insufficient user data exist, or to allow those data to be asynchronous with the cable bit rate.

In broadcasting, the physical medium-dependent layer may be one that contains some form of radio signal and a modulation scheme. The modulation scheme will be a function of the kind of service. For example, a satellite modulation scheme would be quite different from one used in a terrestrial service.

In all real networks requests for transmission will arise randomly. Network resources need to be applied to these requests in a structured way to prevent chaos, data loss, or lack of throughput. This raises the requirement for a protocol layer. TCP (Transmission Control Protocol) and ATM are protocols. A protocol is an agreed set of actions in given circumstances. In a point-to-point interface the protocol is trivial, but in a network it is complex. Figure 10.2b shows some of the functions of a network protocol. There must be an addressing mechanism, so that the sender can direct the data to the desired location, and a mechanism by which the receiving device confirms that all the data have been correctly received. In more advanced systems the protocol may allow variations in quality of service whereby the user can select (and pay for) various criteria such as packet delay and delay variation and the packet error rate. This allows the system to deliver isochronous (near-real-time) MPEG data alongside asynchronous (non-time-critical) data such as email by appropriately prioritising packets.

The protocol layer arbitrates between demands on the network and delivers packets at the required quality of service. The user data will not necessarily have been packetised or, if it was, the packet size may be different from those used in the network. This situation arises, for example, when MPEG transport packets are to be sent via ATM. The solution is to use an adaptation layer. Adaptation layers re-format the original data into the packet structure needed by the network at the sending device and reverse the process at the destination device. Practical networks must have error checking/correction. Figure 10.3 shows some of the possibilities. In short interfaces, no errors are expected and a simple parity check or checksum with an error indication is adequate. In bi-directional applications a checksum failure would result in a re-transmission request or cause the receiver to fail to acknowledge the transmission so that the sender would try again.

1 No correction or checking

2 Detection only

3 Error detection and re-transmit request

4 Error detection and FEC to handle random errors

5 FEC and interleaving to handle packet loss

6 Automatic re-routing following channel failure

FIGURE 10.3

Different approaches to error checking used in various communications systems.

In real-time systems, there may not be time for a re-transmission, and an FEC (forward error correction) system will be needed in which enough redundancy is included with every data block to permit on-the-fly correction at the receiver. The sensitivity to error is a function of the type of data, and so it is a further function of the adaptation layer to take steps such as interleaving and the addition of FEC codes.
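To make the interleaving idea concrete, the following Python sketch (illustrative only, not taken from any particular transmission standard) shows how a row/column block interleaver spreads a burst of lost symbols into isolated errors that a modest FEC code can then correct.

```python
# Illustrative block interleaver: write symbols into rows, read them
# out in columns, so adjacent channel symbols come from different rows.

def interleave(symbols, rows, cols):
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(24))
sent = interleave(data, rows=4, cols=6)
for i in range(8, 12):          # a burst wipes out four adjacent symbols
    sent[i] = None
received = deinterleave(sent, rows=4, cols=6)
print(received)                 # the burst reappears as isolated errors
```

After de-interleaving, the four lost symbols land in four different rows (positions 2, 8, 14, and 20 here), so each code word contains only a single, correctable error.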

PRODUCTION-RELATED INTERFACES

As audio and video production equipment first made the transition from analog to digital technology, computers and networks were still another world, and the potential of the digital domain was largely neglected: the digital video interfaces that were developed simply copied analog practice, but transmitted parallel binary numbers in real time instead of the original video waveform. These simple uni-directional interfaces had no addressing or handshaking ability. Creating a network required switching devices called routers, controlled independently of the signals themselves.

The AES/EBU interface was developed to provide a short distance point-to-point connection for PCM digital audio. Using serial transmission, the signals could be passed over existing balanced analog audio cabling. Subsequently the standard evolved to handle compressed and surround-sound audio data.

Parallel interfaces using 25-pin D-connectors were not really practical for routers. The SDI was initially developed for interlaced standard definition only, to allow up to 10-bit samples of component or PAL/NTSC composite digital video to be communicated serially on coaxial cable.1 The 16:9 format component signals with 18 MHz sampling rate can also be handled. As if to emphasize the gulf that then existed between television and computing, the SDI as first standardised had no error-detection ability at all. This was remedied by a later option known as EDH (error detection and handling). The interface allows ancillary data including transparent conveyance of embedded AES/EBU digital audio channels during video blanking periods.

Subsequently the electrical and channel coding layer of SDI was used to create SDTI (serial data transport interface), which is used for transmitting, among other things, elementary streams from video compressors. ASI (asynchronous serial interface) uses only the electrical interface of SDI, with a different channel code and protocol, and is used for transmitting MPEG transport streams through SDI-based equipment.

The SDI format was later supplanted by HD-SDI, which uses a fixed bit rate of 1.485 Gbps independent of the video standard carried. Later came the 2.97 Gbps version, which allowed the larger frame-size progressively scanned formats to be carried, as well as the very large format images required for digital cinema at 24 Hz frame rate.

SERIAL DIGITAL VIDEO INTERFACES

The serial interfaces described here have a great deal of commonality. Any differences will be noted subsequently. All of the SD standards allow up to 10-bit samples to be communicated serially,1 whereas the HD standards allow a 12-bit option. If the input word length is less than the interface capacity, the missing bits are forced to zero for transmission except for the all-ones condition during synchronising. The interfaces are transparent to ancillary data in the parallel domain, including conveyance of AES/EBU digital audio channels.

The video signals to be carried over serial interfaces must be sampled and quantized in carefully standardised ways as was seen in Chapter 4. Serial transmission uses concepts that were introduced in Chapter 8. At the high bit rates of digital video, the cable is a true transmission line in which a significant number of bits are actually in the cable at any one time, having been sent but not yet received. Under these conditions cable loss is significant. These interfaces operate with cable losses up to 30 dB. The losses increase with frequency and so the bit rate in use and the grade of cable employed both affect the maximum distance the signal will safely travel. Figure 10.4 gives some examples of cable lengths that can be used in SD. In HD there are only two bit rates. Using Belden 1649A or equivalent, a distance of 140 m can be achieved with the lower one.

Serial transmission uses a waveform that is symmetrical about ground and has an initial amplitude of 800 mV pk–pk across a 75-Ohm load. This signal can be fed down 75-Ohm coaxial cable having BNC connectors. Serial interfaces are restricted to point-to-point links. Unlike analog video practice, serial digital receivers contain correct termination that is permanently present, and passive loop through is not possible. In permanent installations, no attempt should be made to drive more than one load using T-pieces, as this will result in signal reflections that seriously compromise the data integrity. On the test bench with very short cables, however, systems with all manner of compromises may still function.

The range of waveforms that can be received without gross distortion is quite small and raw data produce waveforms outside this range. The solution is the use of scrambling, or pseudo-random coding. The serial interfaces use convolutional scrambling, as was described in Chapter 8. This is simpler to implement in a cable installation because no separate synchronising of the randomizing is needed. The scrambling process at the transmitter spreads the signal spectrum and makes that spectrum reasonably constant and independent of the picture content. It is possible to assess the degree of equalization necessary by comparing the energy in a low-frequency band with that in higher frequencies. The greater the disparity, the more equalization is needed. Thus fully automatic cable equalization at the receiver is easily achieved.

System           Clock     Fundamental   Crash knee length   Practical length
NTSC Composite   143 MHz   71.5 MHz      400 m               320 m
PAL Composite    177 MHz   88.5 MHz      360 m               290 m
Component 601    270 MHz   135 MHz       290 m               230 m
Component 16:9   360 MHz   180 MHz       210 m               170 m

CABLE: BICC TM3205, PSF1/2, BELDEN 8281, or any cable with a loss of 8.7 dB/100 m at 100 MHz.

FIGURE 10.4

Suggested maximum cable lengths as a function of cable type and data rate to give a loss of no more than 30 dB. It is unwise to exceed these lengths due to the “crash knee” characteristic of SDI.

The essential parts of a serial link are shown in Figure 10.5. Parallel data are fed to a 10-bit shift register, which is clocked at 10 times the input word rate: 2.97 GHz, 1.485 GHz, 360 MHz, or 270 MHz (composite interfaces use 40 × Fsc). The serial data emerge from the shift register LSB first and are then passed through the scrambler, in which a given bit is converted to the exclusive-OR of itself and two bits that are five and nine clocks ahead. This is followed by another stage, which converts channel ones into transitions. The transition encoder ensures that the signal is polarity independent. The resulting logic level signal is converted to a 75-Ohm source impedance signal at the cable driver.

The receiver must regenerate a bit clock at 2.97 GHz, 1.485 GHz, 360 MHz, or 270 MHz from the input signal, and this clock drives the input sampler and slicer, which converts the cable waveform back to serial binary. The local bit clock also drives a circuit that simply reverses the scrambling at the transmitter. The first stage returns transitions to ones. The second stage is a mirror image of the encoder and reverses the exclusive-OR calculation to output the original data. Such descrambling results in error extension, but this is not a practical problem because link error rates must be near zero.
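Both directions of the link can be illustrated with a minimal Python sketch. The tap positions follow the description above (“five and nine clocks”); the exact tap numbering in the SDI standard depends on how the register delays are counted, so treat this as a sketch of the technique rather than a bit-exact model.

```python
# Self-synchronising (convolutional) scrambler plus transition (NRZI)
# coding, and the mirror-image decoder described in the text.

def scramble(bits, taps=(5, 9)):
    shift = [0] * max(taps)              # scrambler shift register
    out = []
    for b in bits:
        s = b ^ shift[taps[0] - 1] ^ shift[taps[1] - 1]
        shift = [s] + shift[:-1]         # clock the register
        out.append(s)
    return out

def nrzi(bits):
    # Second stage: a data 1 becomes a transition, which makes the
    # transmitted signal polarity independent.
    level, out = 0, []
    for b in bits:
        level ^= b
        out.append(level)
    return out

def nrzi_decode(bits):
    level, out = 0, []
    for b in bits:
        out.append(b ^ level)            # a transition decodes as a 1
        level = b
    return out

def descramble(bits, taps=(5, 9)):
    # Mirror image of the scrambler: the register is fed with received
    # bits, so it self-synchronises without any separate reset.
    shift = [0] * max(taps)
    out = []
    for s in bits:
        out.append(s ^ shift[taps[0] - 1] ^ shift[taps[1] - 1])
        shift = [s] + shift[:-1]
    return out

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
assert descramble(nrzi_decode(nrzi(scramble(data)))) == data
```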

SYNCHRONISING

As with all serial transmissions it is necessary to identify the position of word boundaries so that correct de-serialization can take place at the receiver. The component interfaces carry a multiplex of luminance and colour difference samples and it is also necessary to synchronise the demultiplexing process so that the components are not inadvertently transposed. Only the active line data are sent, so horizontal and vertical synchronising must also be provided. These functions are performed by special bit patterns known as timing reference and identification signals (TRS-ID) sent with each line. TRS-ID differs only slightly between formats. Figure 10.6 shows the location of TRS-ID. Immediately before the digital active line location is the SAV (start of active video) TRS-ID pattern, and immediately after is the EAV (end of active video) TRS-ID pattern. These unique patterns occur on every line and continue throughout the vertical interval.

Each TRS-ID pattern consists of four symbols: the same length as the component multiplex repeating structure. In this way the presence of a TRS-ID does not alter the phase of the multiplex. Three of the symbols form a sync pattern for de-serializing and demultiplexing (TRS) and one is an identification symbol (ID) that replaces the analog sync signals. The first symbol contains all ones and the next two contain all zeros. This bit sequence cannot occur in active video, even due to concatenation of successive pixel values, so its detection is reliable. As the transition from a string of ones to a string of zeros occurs at a symbol boundary, it is sufficient to enable unambiguous de-serialization, location of the ID symbol, and demultiplexing of the components. Whatever the word length of the system, all bits should be either ones or zeros during TRS.
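The hunt for the preamble can be sketched in a few lines; the video sample values here are arbitrary, but an all-ones symbol followed by two all-zeros symbols is the genuine TRS preamble in a 10-bit system.

```python
# Scan a stream of 10-bit symbols for the all-ones, all-zeros,
# all-zeros TRS preamble; the following symbol is the ID word.

def find_trs(symbols):
    for i in range(len(symbols) - 3):
        if symbols[i] == 0x3FF and symbols[i + 1] == 0 and symbols[i + 2] == 0:
            yield i + 3                    # index of the ID symbol

stream = [0x204, 0x1A0, 0x3FF, 0x000, 0x000, 0x274]  # ...video, then EAV
print(list(find_trs(stream)))              # -> [5]
```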

image

FIGURE 10.5

Major components of a serial scrambled link. Input samples are converted to serial form in a shift register clocked at 10 times the sample rate. The serial data are then scrambled for transmission. On reception, a phase-locked loop re-creates the bit rate clock and drives the descrambler and serial-to-parallel conversion. On detection of the sync pattern, the divide-by-10 counter is re-phased to load parallel samples correctly into the latch. For composite working the bit rate will be 40 times subcarrier, and a sync pattern generator (top left) is needed to inject TRS-ID into the composite data stream.

image

FIGURE 10.6

The active line data are bracketed by TRS-ID codes called SAV and EAV.

The fourth symbol is the ID, which contains three data bits, H, F, and V. These bits are protected by four redundancy bits and together form a seven-bit Hamming code word.

Figure 10.7a shows how the Hamming code is generated. Single-bit errors can be corrected and double-bit errors can be detected according to the decoding table in Figure 10.7b.

Figure 10.8a shows the structure of the TRS-ID. The data bits have the following meanings:

H is used to distinguish between SAV, where it is set to 0, and EAV, where it is set to 1.

F defines the state of interlace (if used) and is 0 during the first field and 1 during the second field. F is allowed to change only at EAV. In interlaced systems, one field begins at the centre of a line, but as there is no sync pattern at that location, the field bit changes at the end of the line in which the change took place.

V is 1 during vertical blanking and 0 during the active part of the field. It can change only at EAV.
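As an illustration, the ID word can be assembled as follows. The protection-bit equations used here are the ones published for the component interfaces in ITU-R BT.656; the two least-significant bits are zero in a 10-bit system.

```python
# Build the 10-bit TRS ID (XYZ) symbol from the F, V and H flags.
# Protection bits per ITU-R BT.656: P3 = V^H, P2 = F^H, P1 = F^V,
# P0 = F^V^H; a fixed leading 1 completes the word.

def trs_id(f, v, h):
    p3, p2, p1, p0 = v ^ h, f ^ h, f ^ v, f ^ v ^ h
    word8 = (1 << 7) | (f << 6) | (v << 5) | (h << 4) \
            | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0
    return word8 << 2              # two LSBs are zero in a 10-bit system

print(hex(trs_id(f=0, v=0, h=0)))  # SAV, field 1, active: 0x200
print(hex(trs_id(f=0, v=0, h=1)))  # EAV, field 1, active: 0x274
```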

Figure 10.8b, at the top, shows the relationship between the sync pattern bits and 625-line analog timing, whilst at the bottom is the relationship for 525 lines.

image

FIGURE 10.7

The data bits in the TRS are protected with a Hamming code, which is calculated according to the table in (a). Received errors are corrected according to the table in (b), in which a dot shows an error that is detected but not correctable.

Figure 10.9 shows a decode table for SD TRS, which is useful when interpreting logic analyser displays.

The same TRS-ID structure is used in SMPTE 274M and 296M HD. It differs in that the HD formats can support progressive scan in which the F bit is always set to zero.

SD-SDI

This interface supports 525/59.94 2:1 and 625/50 2:1 scanning standards in component and composite. The component interfaces use a common bit rate of 270 MHz for 4:3 pictures with an option of 360 MHz for 16:9. In component, the TRS codes are already present in the parallel domain and SDI does no more than serialize the parallel signal protocol unchanged.

image

FIGURE 10.8

(a) The four-byte synchronising pattern, which precedes and follows every active line sample block, has this structure. (b) The relationships between analog video timing and the information in the digital timing reference signals for 625/50 (top) and 525/60 (bottom).

image

FIGURE 10.9

Decode table for component TRS.

Composite digital samples at four times the subcarrier frequency and so the bit rate is different between the PAL and the NTSC variants. The composite parallel interface signal is not a multiplex and also carries digitized analog syncs. Consequently there is no need for TRS codes. For serial transmission it is necessary to insert TRS at the serializer and subsequently to strip it out at the serial-to-parallel convertor. The TRS-ID is inserted during blanking, and the serial receiver can detect the patterns it contains. Composite TRS-ID is different from the one used in component signals and consists of five words inserted just after the leading edge of analog video sync. Figure 10.10a shows the location of TRS-ID at samples 967–971 in PAL and Figure 10.10b shows the location at samples 790–794 in NTSC.

Of the five words in TRS-ID, the first four are for synchronising and consist of a single word of all ones, followed by three words of all zeros. Note that the composite TRS contains an extra word of zeros compared with the component TRS, and this could be used for signal identification in multi-standard devices. The fifth word is for identification and carries the line and field numbering information shown in Figure 10.11. The field numbering is colour-framing information useful for editing. In PAL the field numbering will go from 0 to 7, whereas in NTSC it will only reach 3.

On detection of the synchronising symbols, a divide-by-10 circuit is reset, and the output of this will clock words out of the shift register at the correct times. This circuit will also provide the output word clock.

image

FIGURE 10.10

In composite digital it is necessary to insert a sync pattern during analog sync tip to ensure correct de-serialization. The location of TRS-ID is shown in (a) for PAL and in (b) for NTSC.

image

FIGURE 10.11

The contents of the TRS-ID pattern, which is added to the transmission during the horizontal sync pulse just after the leading edge. The field number conveys the composite colour framing field count, and the line number carries a restricted line count intended to give vertical positioning information during the vertical interval. This count saturates at 31 for lines of that number and above.

HD-SDI

The SD serial interface runs at a variety of bit rates according to the television standard being sent. At the high bit rates of HD, variable speed causes too many difficulties, so the HD serial interface2 runs at only two bit rates: 2.97 and 1.485 Gbps, although it is possible to reduce these by 0.1 percent so that the interface can lock to traditional 59.94 Hz equipment. Apart from the bit rate, the HD serial interface has as much in common with the SDI standard as possible. Although the impedance, signal level, and channel coding are the same, the HD serial interface has a number of detail differences in the protocol.

The original parallel HD interface had two channels, one for luma and one for multiplexed colour-difference data. Each of these had a symbol rate of 74.25 MHz and its own TRS-ID structure. Essentially the HD serial interface is transparent to these data, as it simply multiplexes between the two channels at symbol rate. As far as the active line is concerned, the result is the same as for SD: a sequence of CB, Y, CR, Y, etc. However, in HD, the TRS-IDs of the two channels are also multiplexed. A further difference is that the HD interface has a line number and a CRC (cyclic redundancy check) for each active line inserted immediately after EAV. Figure 10.12a shows the EAV and SAV structure of each channel, with the line count and CRC, whereas Figure 10.12b shows the resultant multiplex.

To keep the interface bit rate constant, variable amounts of packing are placed between the active lines but the result is that the interface no longer works in real time at all frame rates and requires buffering at source and destination. The interface symbol rate has been chosen to be a common multiple of 24, 25, and 30 times 1125 Hz, so that there can always be an integer number of interface symbol periods in a line period.

image

FIGURE 10.12

The HD parallel data are in two channels, each having their own TRS, shown in (a). The EAV is extended by line number and CRC. (b) When the two channels are multiplexed, the TRS codes are interleaved.

A receiver can work out which format is being sent by counting the number of blanking periods between the active lines.

For example, if used at 30 Hz frame rate interlaced, there would be 1125 × 30 = 33,750 lines per second. Figure 10.13a shows that the luma sampling rate is 74.25 MHz and there are 2200 cycles of this clock in one line period. Of these, 1920 cycles correspond to the active line and 280 remain for blanking and TRS. The colour difference sampling rate is one-half that of luma at 37.125 MHz and 960 cycles cover the active line. As there are two colour difference signals, when multiplexed together the symbol rate will be 74.25 + 37.125 + 37.125 = 148.5 MHz. The standard erroneously calls this the interface sampling rate, which is not a sampling rate at all, but a word rate or symbol rate.

The original HD parallel interface had a clock rate of 148.5 MHz. When 10-bit symbols are serialized, the bit rate becomes 1.485 GHz, the lower bit rate of serial HD. If the frame rate is reduced to 25 Hz, as in Figure 10.13b, the line rate falls to 1125 × 25 = 28,125 Hz and the luma sampling rate falls to 2200 × 28,125 = 61.875 MHz. The interface symbol rate does not change, but remains at 148.5 MHz. To carry 50 Hz pictures, time compression is used. At 28,125 lines per second, there will be 2640 cycles of 74.25 MHz, the luma interface rate, per line, rather than the 2200 cycles obtained at 60 Hz. Thus the line still contains 1920 active luma samples, but for transmission, the number of blanking/TRS cycles has been increased to 720.
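This arithmetic can be checked with a few lines of Python; the 24 Hz figures follow from the same reasoning.

```python
# Verifying the arithmetic above: the interface luma rate is fixed at
# 74.25 MHz; lower frame rates keep 2200 true sampling cycles per line
# (1920 active) and make up the difference with extra blanking.
INTERFACE_LUMA_RATE = 74.25e6    # luma words per second on the interface
CYCLES_PER_LINE = 2200           # sampling-clock cycles per line at source

for frame_rate in (30, 25, 24):
    lines_per_s = 1125 * frame_rate
    luma_fs = CYCLES_PER_LINE * lines_per_s          # true sampling rate
    interface_cycles = INTERFACE_LUMA_RATE / lines_per_s
    blanking = interface_cycles - 1920               # blanking/TRS cycles
    print(f"{frame_rate} Hz: luma fs = {luma_fs/1e6:.3f} MHz, "
          f"{interface_cycles:.0f} cycles/line, {blanking:.0f} blanking")
# -> 30 Hz: 74.250 MHz, 2200 cycles/line, 280 blanking
#    25 Hz: 61.875 MHz, 2640 cycles/line, 720 blanking
#    24 Hz: 59.400 MHz, 2750 cycles/line, 830 blanking
```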

image

FIGURE 10.13

In HD interfaces it is the data rate that is standardised, not the sampling rate. In (a) an 1125/30 picture requires a luma sampling rate of 74.25 MHz to have 1920 square pixels per active line. The data rate of the chroma is the same, thus the interface symbol rate is 148.5 MHz. (b) With 25 Hz pictures, the symbol rate does not change. Instead the blanking area is extended so the data rate is maintained by sending more blanking. (c) An extension of this process allows 24 Hz material to be sent.

Although the luma is sampled at 61.875 MHz, for transmission luma samples are placed in a buffer and read out at 74.25 MHz. This means that the active line is sent in rather less than an active line period.

Figure 10.13c shows that a similar approach is taken with 24 Hz material in which the number of blanking cycles is further increased.

ANCILLARY DATA

In component standards, only the active line is transmitted and this leaves a good deal of spare capacity. The two line standards differ on how this capacity is used. In 625 lines, only the active line period may be used on lines 20 to 22 and 333 to 335.3 Lines 20 and 333 are reserved for equipment self-testing.

In 525 lines there is considerably more freedom and ancillary data may be inserted anywhere there is no active video, during either horizontal blanking, where it is known as HANC, or vertical blanking, where it is known as VANC, or both.4 The spare capacity allows many channels of digital audio and considerably simplifies switching.

The all zeros and all ones codes are reserved for synchronising and cannot be allowed to appear in ancillary data. In practice only seven bits of the eight-bit word can be used as data; the 8th bit is redundant and gives the byte odd parity. As all ones and all zeros are even parity, the sync pattern cannot then be generated accidentally.
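A short sketch of this parity rule shows why user data can never alias the sync values:

```python
# Seven data bits plus an eighth bit chosen to give the byte odd
# parity; the all-zeros and all-ones sync values are both even parity,
# so they can never appear in ancillary user data.

def ancillary_byte(data7):
    assert 0 <= data7 < 128
    parity = bin(data7).count("1") & 1
    msb = 0 if parity else 1        # force an odd total number of ones
    return (msb << 7) | data7

for value in (0x00, 0x7F, 0x55):
    b = ancillary_byte(value)
    assert bin(b).count("1") % 2 == 1
    print(f"{value:#04x} -> {b:#04x}")
```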

Ancillary data are always prefaced by a different four-symbol TRS, which is the inverse of the video TRS in that it starts with all zeros and then has two symbols of all ones followed by the information symbol.

SDTI

SDI is closely specified and is suitable only for transmitting 2:1 interlaced 4:2:2 digital video in 525/60 or 625/50 systems. Since the development of SDI, it has become possible economically to compress digital video and the SDI standard cannot handle this. SDTI (serial data transport interface) is designed to overcome that problem by converting SDI into an interface that can carry a variety of data types whilst retaining compatibility with existing SDI router infrastructures. SDTI5 sources produce a signal that is electrically identical to an SDI signal and that has the same timing structure. However, the digital active line of SDI becomes a data packet or item in SDTI.

Figure 10.14 shows how SDTI fits into the existing SDI timing. Between EAV and SAV (horizontal blanking in SDI) an ancillary data block is incorporated. The structure of this meets the SDI standard, and the data within describe the contents of the following digital active line. The data capacity of SDTI is about 200 Mbps because some of the 270 Mbps is lost due to the retention of the SDI timing structure. Each digital active line finishes with a CRCC (cyclic redundancy check character) to check for correct transmission.

SDTI raises a number of opportunities, including the transmission of compressed data at faster than real time. If a video signal is compressed at 4:1, then one-quarter as much data would result. If sent in real time the bandwidth required would be one-quarter of that needed by uncompressed video. However, if the same bandwidth is available, the compressed data could be sent in one-quarter of the usual time. This is particularly advantageous for data transfer between compressed camcorders and non-linear editing workstations. Alternatively, four different 50 Mbps signals could be conveyed simultaneously.

image

FIGURE 10.14

SDTI is a variation of SDI that allows transmission of generic data. This can include compressed video and non-real-time transfer.

Thus an SDTI transmitter takes the form of a multiplexer that assembles packets for transmission from input buffers. The transmitted data can be encoded according to MPEG, Motion JPEG, Digital Betacam, or DVC formats and all that is necessary is that compatible devices exist at each end of the interface. In this case the data are transferred with bit accuracy and so there is no generation loss associated with the transfer. If the source and destination are different, i.e., having different formats or, in MPEG, different group structures, then a conversion process with attendant generation loss would be needed.

ASI

The asynchronous serial interface is designed to allow MPEG transport streams to be transmitted over standard SDI cabling and routers. ASI offers higher performance than SDTI because it does not adhere to the SDI timing structure. Transport stream data do not have the same statistics as PCM video and so the scrambling technique of SDI cannot be used. Instead ASI uses an 8/10 group code to eliminate DC components and ensure adequate clock content.

SDI equipment is designed to run at a closely defined bit rate of 270 Mbps and has phase-locked loops in receiving and repeating devices, which are intended to remove jitter. These will lose lock if the channel bit rate changes. Transport streams are fundamentally variable in bit rate, and to retain compatibility with SDI routing equipment ASI uses stuffing bits to keep the transmitted bit rate constant.

The use of an 8/10 code means that although the channel bit rate is 270 Mbps, the data bit rate is only 80 percent of that, i.e., 216 Mbps. A small amount of this is lost to overheads.

AES/EBU

The AES/EBU digital audio interface, originally published in 1985, was proposed to embrace all the functions of existing formats in one standard. The goal was to ensure interconnection of professional digital audio equipment irrespective of origin. The EBU ratified the AES proposal with the proviso that the optional transformer coupling was made mandatory, which led to the term AES/EBU interface, also called EBU/AES by some Europeans and standardised as IEC-958.

The interface has to be self-clocking and self-synchronising, i.e., the single signal must carry enough information to allow the boundaries between individual bits, words, and blocks to be detected reliably. To fulfil these requirements, the FM channel code is used (see Chapter 8), which is DC-free, strongly self-clocking, and capable of working with a changing sampling rate. Synchronisation of de-serialization is achieved by violating the usual encoding rules.

image

FIGURE 10.15

Recommended electrical circuit for use with the standard two-channel interface.

The use of FM means that the channel frequency is the same as the bit rate when sending data ones. Tests showed that in typical analog audio cabling installations, sufficient bandwidth was available to convey two digital audio channels in one twisted pair. The standard driver and receiver chips for RS-422A6 data communication (or the equivalent CCITT-V.11) are employed for professional use, but work by the BBC7 suggested that equalization and transformer coupling were desirable for longer cable runs, particularly if several twisted pairs occupy a common shield. Successful transmission up to 350 m has been achieved with these techniques.8

Figure 10.15 shows the standard configuration. The output impedance of the drivers will be about 110 Ohms, and the impedance of the cable and receiver should be similar at the frequencies of interest. The driver was specified in AES-3-1985 to produce between 3 and 10V pk–pk into such an impedance but this was changed to between 2 and 7 V in AES-3-1992 to reflect better the characteristics of actual RS-422 driver chips.

In Figure 10.16, the specification of the receiver is shown in terms of the minimum eye pattern that can be detected without error. It will be noted that the voltage of 200 mV specifies the height of the eye opening at a width of half a channel bit period. The actual signal amplitude will need to be larger than this, and even larger if the signal contains noise. Figure 10.17 shows the recommended equalization characteristic that can be applied to signals received over long lines.

The purpose of the standard is to allow the use of existing analog cabling, and as an adequate connector in the shape of the XLR is already in wide service, the connector made to IEC-268 Part 12 has been adopted for digital audio use. Effectively, existing analog audio cables having XLR connectors can be used without alteration for digital connections.

image

FIGURE 10.16

The minimum eye pattern acceptable for correct decoding of standard two-channel data.

image

FIGURE 10.17

Equalization characteristic recommended by the AES to improve reception in the case of long lines.

There is a separate standard9 for a professional interface using coaxial cable for distances of around 1000 m. This is simply the AES/EBU protocol but with a 75-Ohm coaxial cable carrying a 1 V signal so that it can be handled by analog video distribution amplifiers. Impedance-converting transformers allow balanced 110 Ohm to unbalanced 75 Ohm matching.

In Figure 10.18 the basic structure of the professional and consumer formats can be seen. One subframe consists of 32 bit cells, of which four will be used by a synchronising pattern. Subframes from the two audio channels, A and B, alternate on a time-division basis, with the least significant bit sent first. Up to 24-bit sample word length can be used, which should cater to all conceivable future developments, but normally 20-bit maximum length samples will be available with four auxiliary data bits, which can be used for a voice-grade channel in a professional application.

image

FIGURE 10.18

The basic subframe structure of the AES/EBU format. The sample can be 20 bits, with four auxiliary bits, or 24 bits. The LSB is transmitted first.

The format specifies that audio data must be in two's complement coding. If different word lengths are used, the MSBs must always be in the same bit position, otherwise the polarity will be misinterpreted. Thus the MSB has to be in bit 27 irrespective of word length. Shorter words are leading-zero-filled up to the 20-bit capacity. From AES-3-1992, the channel-status data include signalling of the actual audio word length used, so that receiving devices can adjust the digital dithering level needed to shorten a received word that is too long, or pack samples onto a storage device more efficiently.

Four status bits accompany each subframe. The validity flag will be reset if the associated sample is reliable. Whilst there have been many aspirations regarding what the V bit could be used for, in practice a single bit cannot specify much, and if combined with other V bits to make a word, the time resolution is lost. AES-3-1992 described the V bit as indicating that the information in the associated subframe is “suitable for conversion to an analog signal.” Thus it might be reset if the interface was being used for non-PCM audio data such as the output of an audio compressor.

The parity bit produces even parity over the subframe, such that the total number of ones in the subframe is even. This allows for simple detection of an odd number of bits in error, but its main purpose is that it makes successive sync patterns have the same polarity, which can be used to improve the probability of detection of sync. The user and channel-status bits are discussed later. Two of the subframes described above make one frame, which repeats at the sampling rate in use. The first subframe will contain the sample from channel A, or from the left channel in stereo working. The second subframe will contain the sample from channel B, or the right channel in stereo. At 48 kHz, the bit rate will be 3.072 MHz, but as the sampling rate can vary, the clock rate will vary in proportion.
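A minimal sketch of subframe assembly, assuming the bit layout of Figure 10.18: time slots 4 to 31 carry the sample (LSB first, with any auxiliary bits in the first four slots), the V, U, and C flags, and the parity bit; the four-cell preamble is added afterwards by the channel coder.

```python
# Assemble the 28 data cells of one AES/EBU subframe (slots 4-31).
# The parity cell makes the total number of ones in these slots even.

def subframe(sample24, valid=0, user=0, cstatus=0):
    bits = [(sample24 >> i) & 1 for i in range(24)]   # LSB first
    bits += [valid, user, cstatus]                    # V, U, C flags
    bits.append(sum(bits) & 1)                        # even parity
    return bits

sf = subframe(0x123456)
assert len(sf) == 28 and sum(sf) % 2 == 0
```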

image

FIGURE 10.19

Three different preambles (X, Y, and Z) are used to synchronise a receiver at the start of subframes.

To separate the audio channels on receipt the synchronising patterns for the two subframes are different, as Figure 10.19 shows. These sync patterns begin with a run length of 1.5 bits, which violates the FM channel coding rules and so cannot occur due to any data combination.

The type of sync pattern is denoted by the position of the second transition, which can be 0.5, 1.0, or 1.5 bits away from the first. The third transition is designed to make the sync patterns DC-free. The channel-status and user bits in each subframe form serial data streams with one bit of each per audio channel per frame. The channel-status bits are given a block structure and synchronised every 192 frames, which at 48 kHz gives a block rate of 250 Hz, corresponding to a period of 4 ms. To synchronise the channel-status blocks, the channel A sync pattern is replaced for one frame only by a third sync pattern, which is also shown in Figure 10.19. The AES standard refers to these as X, Y, and Z, whereas IEC-958 calls them M, W, and B. As stated, there is a parity bit in each subframe, which means that the binary level at the end of a subframe will always be the same as at the beginning. Because the sync patterns have the same characteristic, the effect is that sync patterns always have the same polarity and the receiver can use that information to reject noise. The polarity of transmission is not specified, and indeed an accidental inversion in a twisted pair is of no consequence, because it is only the transition that is of importance, not the direction.

In both the professional and the consumer formats, the sequence of channel-status bits over 192 subframes builds up a 24-byte channel-status block. However, the contents of the channel-status data are completely different between the two applications. The professional channel-status structure is shown in Figure 10.20. Byte 0 determines the use of emphasis and the sampling rate. Byte 1 determines the channel usage mode, i.e., whether the data transmitted are a stereo pair, two unrelated mono signals, or a single mono signal, and details the user bit handling, and byte 2 determines word length. Byte 3 is applicable only to multichannel applications. Byte 4 indicates the suitability of the signal as a sampling rate reference.

image

FIGURE 10.20

Overall format of the professional channel-status block.

There are two slots of four bytes each, which are used for alphanumeric source and destination codes. These can be used for routing. The bytes contain seven-bit ASCII characters (printable characters only) sent LSB first, with the 8th bit set to 0 according to AES-3-1992. The destination code can be used to operate an automatic router.

Bytes 14–17 convey a 32-bit sample address, which increments every channel status frame. It effectively numbers the samples in a relative manner from an arbitrary starting point. Bytes 18–21 convey a similar number, but this is a time-of-day count, which starts from zero at midnight. As many digital audio devices do not have real-time clocks built in, this cannot be relied upon. AES-3-92 specified that the time-of-day bytes should convey the real time at which a recording was made, making it rather like timecode. There are enough combinations in 32 bits to allow a sample count over 24 hours at 48 kHz. The sample count has the advantage that it is universal and independent of local supply frequency.

In theory, if the sampling rate is known, conventional hours, minutes, seconds, frames timecode can be calculated from the sample count, but in practice it is a lengthy computation and users have proposed alternative formats in which the data from the EBU or SMPTE timecode are transmitted directly in these bytes. Some of these proposals are in service as de facto standards.

The penultimate byte contains four flags, which indicate that certain sections of the channel-status information are unreliable. This allows the transmission of an incomplete channel-status block for which the entire structure is not needed or the information is not available. The final byte in the message is a CRCC, which converts the entire channel-status block into a code word. The channel-status message takes 4 ms at 48 kHz and in this time a router could have switched to another signal source. This would damage the transmission, but will also result in a CRCC failure, so the corrupt block is not used.
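The CRCC check can be sketched as follows. The generator polynomial x^8 + x^4 + x^3 + x^2 + 1 with the register preset to all ones is the one commonly quoted for AES3 byte 23, but the bit ordering used here is an assumption for illustration only.

```python
# Compute the CRCC over the first 23 channel-status bytes; a receiver
# recomputes this and compares it with the transmitted byte 23.

def cs_crc(block23, poly=0x1D):          # x^8 + x^4 + x^3 + x^2 + 1
    reg = 0xFF                           # register preset to all ones
    for byte in block23:
        reg ^= byte
        for _ in range(8):
            reg = ((reg << 1) ^ poly) & 0xFF if reg & 0x80 else (reg << 1) & 0xFF
    return reg

block = bytes(23)                        # an illustrative all-zero block
print(hex(cs_crc(block)))
```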

TELEPHONE-BASED SYSTEMS

The success of the telephone has led to a vast number of subscribers being connected with copper wires and this is a valuable network infrastructure. As technology has developed, the telephone has become part of a global telecommunications industry. Simple economics suggests that in many cases improving the existing telephone cabling with modern modulation schemes is a good way of providing new communications services.

The development of electronics revolutionized telephone exchanges. Whilst the loop current, AC ringing, and hook switch sensing remained for compatibility, the electro-mechanical exchange gave way to electronic exchanges in which the dial pulses were interpreted by digital counters, which then drove crosspoint switches to route the call. The communication remained analog.

The next advance permitted by electronic exchanges was touch-tone dialling, also called DTMF. Touch-tone dialling is based on seven discrete frequencies. The telephone contains tone generators and tuned filters in the exchange can detect each frequency individually. The numbers 0 through 9 and two non-numerical symbols, asterisk and hash, can be transmitted using 12 unique tone pairs. A tone pair can be reliably detected in about 100 ms and this makes dialling much faster than the pulse system.
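The DTMF frequency assignments are standardised: rows at 697, 770, 852, and 941 Hz and columns at 1209, 1336, and 1477 Hz, giving the seven frequencies and 12 pairs just described. A sketch of tone-pair synthesis:

```python
import math

# Standard DTMF keypad: each key selects one row and one column tone.
ROWS = (697, 770, 852, 941)          # Hz
COLS = (1209, 1336, 1477)            # Hz
KEYPAD = ["123", "456", "789", "*0#"]

def dtmf(key, fs=8000, duration=0.1):
    """Return roughly 100 ms of the two-tone waveform for one key."""
    r = next(i for i, row in enumerate(KEYPAD) if key in row)
    f_low, f_high = ROWS[r], COLS[KEYPAD[r].index(key)]
    n = int(fs * duration)
    return [0.5 * math.sin(2 * math.pi * f_low * t / fs)
            + 0.5 * math.sin(2 * math.pi * f_high * t / fs)
            for t in range(n)]

samples = dtmf("5")                  # the exchange detects 770 + 1336 Hz
```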

The first electronic exchanges simply used digital logic to perform the routing function. The next step was to use a fully digital system in which the copper wires from each subscriber terminate in an interface or line card containing ADCs and DACs. The sampling rate of 8 kHz retains the traditional analog bandwidth, and eight-bit quantizing is used. This is not linear, but uses logarithmically sized quantizing steps so that the quantizing error is greater on larger signals. The result is a 64 kbps data rate in each direction.
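The logarithmic quantizing can be illustrated with the mu-law companding curve used in North American telephony (European systems use the similar A-law); the encoder in a real line card is table-driven, so this is only a sketch of the principle.

```python
import math

def mu_law_encode(x, mu=255):
    """Map a sample in [-1, 1] to an 8-bit code with log-sized steps."""
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * 255))

# Small signals get fine steps, large signals coarse ones:
for x in (0.001, 0.002, 0.5, 0.501):
    print(x, mu_law_encode(x))       # the same 0.001 change moves the
                                     # code several steps near zero,
                                     # none at half amplitude
```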

Packets of data can be time-division multiplexed into high bit rate data buses that can carry many calls simultaneously. The routing function becomes simply one of watching the bus until the right packet comes along for the selected destination. Sixty-four kilobits per second data switching came to be known as IDN (integrated digital network). As a data bus does not care whether it carries 64 kbps of speech or 64 kbps of something else, communications systems based on IDN tend to be based on multiples of that rate. Such a system is called ISDN (integrated services digital network), which is basically a use of the telephone system that allows dial-up data transfer between subscribers in much the same way as a conventional phone call is made.

As it is based on IDN, ISDN works on units of 64 kbps, known as “B channels,” so that the communications channel carries the ISDN data just as easily as a voice call. However, for many applications, this bit rate is not enough and ISDN joins together more than one B channel to raise the bit rate. In the lowest cost option, known as Basic Rate ISDN, two B channels are available, allowing 128 kbps communication. Physically, the ISDN connection between the subscriber and the exchange consists of two twisted pairs; one for transmit and one for receive. The existing telephone wiring cannot be used. The signalling data, known as the D channel and running at 16 kbps, is multiplexed into the bitstream. A Basic Rate ISDN link has two B channels and one D channel multiplexed into the twisted pair. The B channels can be used for separate calls or ganged together.

Each twisted pair carries 2 × 64 plus 1 × 16 kbps of data, plus synchronising patterns that allow the B and D information to be de-serialized and separated. This results in a total rate of 192 kbps. The network echoes the D bits sent by the terminal. This is used to prove the connection exists in both directions and to detect if more than one terminal has tried to get on the lines at the same time. Figure 10.21 shows what the signalling waveform of ISDN looks like.

A three-level channel code called AMI (alternate mark inversion) is used. The outer two levels (positive or negative voltage) both represent data 0, whereas the centre level (0 V) represents a data 1. Successive zeros must use alternating polarity. Whatever the data bit pattern, AMI coding means that the transmitted waveform is always DC-free because ones cause no offset and any 0 is always balanced by the next 0, which has opposite polarity.
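A sketch of the encoder; because the polarity of the zeros alternates, the running sum of the line signal never exceeds one step, so there is no DC build-up:

```python
# AMI (alternate mark inversion) as used by basic rate ISDN: a data 1
# is sent at the centre (0 V) level; data 0s alternate between + and -.

def ami_encode(bits):
    level, out = +1, []
    for b in bits:
        if b:
            out.append(0)
        else:
            out.append(level)
            level = -level           # next 0 takes the opposite polarity
    return out

signal = ami_encode([0, 1, 1, 0, 0, 1, 0])
print(signal)                        # -> [1, 0, 0, -1, 1, 0, -1]
assert abs(sum(signal)) <= 1         # the waveform carries no net DC
```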

For wider bandwidth, the Primary Rate ISDN system allows, in many parts of the world, up to 30 B channels in a system called E1, whereas in North America a system called T1 is used, which offers 23 or 24 B channels. Naturally the more bit rate that is used, the more the call costs.

For compatibility with IDN, E1 and T1 still use individual 64-kbit channels and the provision of wider bandwidth depends upon units called inverse multiplexers (I-MUXes), which distribute the source data over several B channels. The set of B channels used in an ISDN call do not necessarily all pass down the same route. Depending on how busy lines are, some B channels may pass down a physically different path between subscribers. The data arrive unchanged, but the time axis will be disrupted because the different paths may introduce different delays.

image

FIGURE 10.21

ISDN uses a modulation scheme known as AMI to deliver data over telephone-type twisted pairs.

image

FIGURE 10.22

ISDN works on combining channels of fixed bit rate to approximate the bit rate needed for the application.

Figure 10.22 shows that the multiplexer at the receiving end has to combine the data from a number of B channels and apply suitable delays to each so that the final result is the original bitstream. The I-MUX has to put special time-variant codes in each B channel signal so that the multiplexer can time-align them. An alternative exists where the telco has made full use of the synchronising means within the network. Where suitable control systems are implemented, once a single B channel call has been connected, the remaining B channels are logically attached so that they must follow the same routing, avoiding differential delays. With the subsequent development of broadband networks (B-ISDN), the original ISDN is now known as N-ISDN, in which the N stands for narrowband. B-ISDN is the ultimate convergent network, able to carry any type of data, and uses the well-known ATM protocol. Broadband and ATM are considered in a later section.

One of the difficulties of the AMI coding used in N-ISDN is that the data rate is limited and new cabling to the exchange is needed. ADSL (asymmetric digital subscriber line) is an advanced coding scheme that obtains high bit rate delivery and a back channel down existing subscriber telephone wiring. ADSL works on frequency-division multiplexing using 4 kHz-wide channels, and 249 of these provide the delivery or downstream channel and another 25 provide the back channel. Figure 10.23a shows that the existing bandwidth used by the traditional analog telephone is retained. The back channel occupies the lowest-frequency channels, with the downstream channels above. Figure 10.23b shows that at each end of the existing telephone wiring a device called a splitter is needed. This is basically a high-pass/low-pass filter that directs audio-frequency signals to the telephones and high-frequency signals to the modems.

Telephone wiring was never designed to support high-frequency signalling and is non-ideal. There will be reflections due to impedance mismatches, which will cause an irregular frequency response in addition to high-frequency losses and noise, which will all vary with cable length. ADSL can operate under these circumstances because it constantly monitors the conditions in each channel. If a given channel has adequate signal level and low noise, the full bit rate can be used, but in another channel there may be attenuation and the bit rate will have to be reduced. By independently coding the channels, the optimum data throughput for a given cable is obtained.
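The adaptation can be illustrated with a simple bit-loading rule; the Shannon-style formula and the 9.8 dB margin here are assumptions for illustration, not the ADSL specification.

```python
import math

# Illustrative per-channel bit loading: each 4 kHz channel is assigned
# as many bits per symbol as its measured SNR supports, up to the
# 15-bit ceiling quoted later in the text; bad channels carry nothing.

def bits_for_channel(snr_db, margin_db=9.8, ceiling=15):
    snr = 10 ** ((snr_db - margin_db) / 10)   # apply implementation margin
    return max(0, min(ceiling, int(math.log2(1 + snr))))

snrs = [45, 38, 22, 9, 0]                     # measured SNR per channel, dB
bits = [bits_for_channel(s) for s in snrs]
print(bits, "->", sum(b * 4000 for b in bits), "bps total")
```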

image

image

FIGURE 10.23

(a) ADSL allows the existing analog telephone to be retained, but adds delivery and back channels at higher frequencies. (b) A splitter is needed at each end of the subscriber's line.

Each channel is modulated using DMT (discrete multitone technique), in which combinations of discrete frequencies are used. Within one channel symbol, 15 bits can be conveyed, so the coding achieves 15 bps/Hz. With a symbol rate of 4 kHz, each channel can deliver 60 kbps, resulting in 14.9 Mbps for the downstream channel and 1.5 Mbps for the back channel. It should be stressed that these figures are theoretical maxima, which are not reached in real cables. Practical ADSL systems deliver multiples of the ISDN channel rate up to about 6 Mbps, enough to deliver MPEG-2 coded video.

Over shorter distances, VDSL can reach up to 50 Mbps. Where ADSL and VDSL are being referred to as a common technology, the term xDSL will be found.

DIGITAL TELEVISION BROADCASTING

Digital television broadcasting relies on the combination of a number of fundamental technologies. These are MPEG-2 compression to reduce the bit rate, multiplexing to combine picture and sound data into a common bitstream, digital modulation schemes to reduce the RF bandwidth needed by a given bit rate, and error correction to reduce the error statistics of the channel down to a value acceptable to MPEG data. MPEG compressed video is highly sensitive to bit errors, primarily because they confuse the recognition of variable-length codes so that the decoder loses synchronisation. However, MPEG is a compression and multiplexing standard and does not specify how error correction should be performed. Consequently a transmission standard must define a system that has to correct essentially all errors such that the delivery mechanism is transparent.

Essentially a transmission standard specifies all the additional steps needed to deliver an MPEG transport stream from one place to another. This transport stream will consist of a number of elementary streams of video and audio, in which the audio may be coded according to MPEG audio standards or AC-3. In a system working within its capabilities, the picture and sound quality will be determined only by the performance of the compression system and not by the RF transmission channel. This is the fundamental difference between analog and digital broadcasting. In analog television broadcasting, the picture quality may be limited by composite video encoding artifacts as well as transmission artifacts such as noise and ghosting. In digital television broadcasting the picture quality is determined instead by the compression artifacts and interlace artifacts if interlace has been retained.

If the received error rate increases for any reason, once the correcting power is used up, the system will degrade rapidly as uncorrected errors enter the MPEG decoder. In practice, decoders will be programmed to recognize the condition and mute or freeze to avoid outputting garbage. As a result digital receivers tend either to work well or not to work at all. It is important to realise that the signal strength in a digital system does not translate directly to picture quality. A poor signal will increase the number of bit errors. Provided that this is within the capability of the error-correction system, there is no visible loss of quality. In contrast, a very powerful signal may be unusable because of similarly powerful reflections due to multipath propagation.

Whilst in one sense an MPEG transport stream is only data, it differs from generic data in that it must be presented to the viewer with a particular time base. Generic data are usually asynchronous, whereas baseband video and audio are synchronous. However, after compression and multiplexing audio and video are no longer precisely synchronous and so the term isochronous is used. This refers to a signal that was at one time synchronous and will be displayed synchronously, but which uses buffering at transmitter and receiver to accommodate moderate timing errors in the transmission.

Clearly another mechanism is needed so that the time axis of the original signal can be re-created on reception. The time stamp and program clock reference system of MPEG does this.

Figure 10.24 shows that the concepts involved in digital television broadcasting exist at various levels that have an independence not found in analog technology. In a given configuration a transmitter can radiate a given payload data bit rate. This represents the useful bit rate and does not include the necessary overheads needed by error correction, multiplexing, or synchronising. It is fundamental that the transmission system does not care what this payload bit rate is used for. The entire capacity may be used up by one high-definition channel, or a large number of heavily compressed channels may be carried. The details of this data usage are the domain of the transport stream. The multiplexing of transport streams is defined by the MPEG standards, but these do not define any error-correction or transmission technique.

At the lowest level in Figure 10.25, the source coding scheme, in this case MPEG compression, results in one or more elementary streams, each of which carries a video or audio channel. Elementary streams are multiplexed into a transport stream. The viewer then selects the desired elementary stream from the transport stream. Metadata in the transport stream ensure that when a video elementary stream is chosen, the appropriate audio elementary stream will automatically be selected.

image

FIGURE 10.24

Source coder does not know delivery mechanism and delivery does not need to know what the data mean.

image

FIGURE 10.25

Program-specific information helps the demultiplexer to select the required program.

MPEG PACKETS AND TIME STAMPS

The video elementary stream is an endless bitstream representing pictures that take variable lengths of time to transmit. Bi-directional coding means that pictures are not necessarily in the correct order. Storage and transmission systems prefer discrete blocks of data and so elementary streams are packetised to form a PES (packetised elementary stream). Audio elementary streams are also packetised. A packet is shown in Figure 10.26. It begins with a header containing a unique packet start code and a code that identifies the type of data stream.

Optionally the packet header also may contain one or more time stamps, which are used for synchronising the video decoder to real time and for obtaining lip-sync. Figure 10.27 shows that a time stamp is a sample of the state of a counter, which is driven by a 90 kHz clock. This is obtained by dividing down the master 27 MHz clock of MPEG-2. This 27 MHz clock must be locked to the video frame rate and the audio sampling rate of the program concerned. There are two types of time stamp: PTS and DTS. These are abbreviations for presentation time stamp and decode time stamp. A presentation time stamp determines when the associated picture should be displayed on the screen, whereas a decode time stamp determines when it should be decoded. In bi-directional coding these times can be quite different.
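
As a concrete illustration, the time-stamp arithmetic can be sketched in a few lines of Python. This is a toy model rather than encoder code: the function name is invented here, and only the division ratio (27 MHz/300 = 90 kHz) and the 33-bit wrap of MPEG-2 time stamps are taken as given.

```python
# Toy model of MPEG-2 time-stamp arithmetic; the function name is illustrative.

def time_stamp(master_clock_count: int) -> int:
    """Derive a 90 kHz time stamp from a 27 MHz master clock count."""
    return (master_clock_count // 300) % (1 << 33)   # 27 MHz/300; 33-bit wrap

# Two pictures 40 ms apart (25 Hz video) differ by exactly 3600 ticks.
pts_a = time_stamp(27_000_000)               # clock count at t = 1 s
pts_b = time_stamp(27_000_000 + 1_080_000)   # clock count 40 ms later
assert pts_b - pts_a == 3600
```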

image

FIGURE 10.26

A PES packet structure is used to break up the continuous elementary stream.

image

FIGURE 10.27

Time stamps are the result of sampling a counter driven by the encoder clock.

Audio packets are not reordered and have only presentation time stamps. Clearly if lip-sync is to be obtained, the audio sampling rate of a given program must have been locked to the same master 27 MHz clock as the video and the time stamps must have come from the same counter driven by that clock. In practice the time between input pictures is constant and so there is a certain amount of redundancy in the time stamps. Consequently PTS/DTS need not appear in every PES packet. Time stamps can be up to 100 ms apart in transport streams. As each picture type (I, P, or B) is flagged in the bitstream, the decoder can infer the PTS/DTS for every picture from the ones actually transmitted.

The MPEG-2 transport stream is intended to be a multiplex of many TV programs with their associated sound and data channels, although a single program transport stream (SPTS) is possible. The transport stream is based upon packets of constant size so that multiplexing, adding error-correction codes, and interleaving in a higher layer are eased. Figure 10.28 shows that these are always 188 bytes long.

image

FIGURE 10.28

Transport stream packets are always 188 bytes long to facilitate multiplexing and error correction.

Transport stream packets always begin with a header. The remainder of the packet carries data known as the payload. For efficiency, the normal header is relatively small, but for special purposes the header may be extended. In this case the payload gets smaller so that the overall size of the packet is unchanged. Transport stream packets should not be confused with PES packets, which are larger and which vary in size. PES packets are broken up to form the payload of the transport stream packets.

The header begins with a sync byte, which is a unique pattern detected by a demultiplexer. A transport stream may contain many different elementary streams and these are identified by giving each a unique 13-bit packet identification code or PID, which is included in the header. A demultiplexer seeking a particular elementary stream simply checks the PID of every packet and accepts only those that match.

In a multiplex there may be many packets from other programs in between packets of a given PID. To help the demultiplexer, the packet header contains a continuity count. This is a four-bit value that increments at each new packet having a given PID. This approach allows statistical multiplexing, as it does not matter how many or how few packets have a given PID; the demux will still find them. Statistical multiplexing has the problem that it is virtually impossible to make the sum of the input bit rates constant. Instead the multiplexer aims to make the average data bit rate slightly less than the maximum and the overall bit rate is kept constant by adding “stuffing” or null packets. These packets have no meaning, but simply keep the bit rate constant. Null packets always have a PID of 8191 (all ones) and the demultiplexer discards them.
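
The header fields just described are easy to pick out of a packet. The following sketch assumes the standard MPEG-2 systems-layer bit positions for the sync byte, PID, and continuity count; the other header flags are ignored for brevity.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Minimal transport stream header parse (sync byte, PID, continuity)."""
    assert len(packet) == 188 and packet[0] == 0x47        # sync byte
    pid = ((packet[1] & 0x1F) << 8) | packet[2]            # 13-bit PID
    return {
        "pid": pid,
        "null": pid == 8191,                               # stuffing packet
        "continuity": packet[3] & 0x0F,                    # 4-bit count per PID
    }
```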

PROGRAM-SPECIFIC INFORMATION (PSI)

In a real transport stream, each elementary stream has a different PID, but the demultiplexer has to be told what these PIDs are and what audio belongs with what video before it can operate. This is the function of PSI, which is a form of metadata. Figure 10.30 shows the structure of PSI.

When a decoder powers up, it knows nothing about the incoming transport stream except that it must search for all packets with a PID of 0. PID 0 is reserved for the program association table (PAT) packets. The PAT is transmitted at regular intervals and contains a list of all the programs in this transport stream. Each program is further described by its own program map table (PMT) and the PIDs of the PMTs are contained in the PAT.

Figure 10.30 also shows that the PMTs fully describe each program. The PID of the video elementary stream is defined, along with the PID(s) of the associated audio and data streams. Consequently when the viewer selects a particular program, the demultiplexer looks up the program number in the PAT, finds the right PMT, and reads the audio, video, and data PIDs. It then selects elementary streams having these PIDs from the transport stream and routes them to the decoders.
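
In outline, the lookup chain is PAT, then PMT, then elementary PIDs. The sketch below abstracts the table sections as dictionaries, since their byte-level syntax is not described here; the field names are invented for illustration.

```python
# PSI lookup chain sketch; table parsing is abstracted into dictionaries.

def select_program(pat: dict, pmts: dict, program_number: int):
    pmt_pid = pat[program_number]        # PAT: program number -> PMT PID
    pmt = pmts[pmt_pid]                  # PMT: the program's own description
    return pmt["video_pid"], pmt["audio_pids"]

pat = {1: 0x100}                                          # one program
pmts = {0x100: {"video_pid": 0x101, "audio_pids": [0x102]}}
assert select_program(pat, pmts, 1)[0] == 0x101
```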

Program 0 of the PAT contains the PID of the network information table (NIT). This contains information about what other transport streams are available. For example, in the case of a satellite broadcast, the NIT would detail the orbital position, polarization, carrier frequency, and modulation scheme. Using the NIT a set-top box could automatically switch between transport streams.

Apart from 0 and 8191, a PID of 1 is also reserved for the conditional access table. This is part of the access control mechanism needed to support pay-per-view or subscription viewing.

PROGRAM CLOCK REFERENCE

A transport stream is a multiplex of several TV programs and these may have originated from widely different locations. It is impractical to expect all the programs in a transport stream to be genlocked and so the stream is designed from the outset to allow unlocked programs. A decoder running from a transport stream has to genlock to the encoder and the transport stream has to have a mechanism to allow this to be done independently for each program. The synchronising mechanism is called program clock reference (PCR).

Figure 10.29 shows how the PCR system works. The goal is to re-create at the decoder a 27 MHz clock that is synchronous with that at the encoder. The encoder clock drives a 48-bit counter, which continuously counts up to the maximum value before overflowing and beginning again.

A transport stream multiplexer will periodically sample the counter and place the state of the count in an extended packet header as a PCR (see Figure 10.26). The demultiplexer selects only the PIDs of the required program, and it will extract the PCRs from the packets in which they were inserted. The PCR codes are used to control a numerically locked loop (NLL) described in Chapter 4. The NLL contains a 27 MHz VCXO (voltage-controlled crystal oscillator), a variable-frequency oscillator based on a crystal, which has a relatively small frequency range.

The VCXO drives a 48-bit counter in the same way as in the encoder. The state of the counter is compared with the contents of the PCR and the difference is used to modify the VCXO frequency. When the loop reaches lock, the decoder counter arrives at the same value as is contained in the PCR and no further change in the VCXO frequency occurs. In practice the transport stream packets will suffer from transmission jitter and this will create phase noise in the loop. This is removed by the loop filter so that the VCXO effectively averages a large number of phase errors.
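
A heavily simplified software model of this loop is shown below. The loop gain and update rule are placeholders chosen only to make the behaviour visible; a real NLL uses a VCXO with a narrow pull range and the damped loop filter described in the text.

```python
# Simplified NLL model: the "VCXO" is a frequency variable nudged by PCR errors.

class ClockRecovery:
    def __init__(self, gain: float = 0.1):
        self.freq = 27_000_000.0     # current oscillator frequency, Hz
        self.count = 0.0             # local counter driven by the "VCXO"
        self.gain = gain             # illustrative loop gain

    def run(self, seconds: float):
        self.count += self.freq * seconds            # counter advances

    def on_pcr(self, pcr: float, interval: float):
        error = pcr - self.count                     # ticks ahead of/behind encoder
        self.freq += self.gain * error / interval    # nudge frequency toward lock
```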

image

FIGURE 10.29

Program or system clock reference codes regenerate a clock at the decoder. See text for details.

A heavily damped loop will reject jitter well, but will take a long time to lock. Lockup time can be reduced when switching to a new program if the decoder counter is jammed to the value of the first PCR received in the new program. The loop filter may also use shorter time constants during lockup.

Once a synchronous 27 MHz clock is available at the decoder, this can be divided down to provide the 90 kHz clock, which drives the time stamp mechanism. The entire time-base stability of the decoder is no better than the stability of the clock derived from PCR. MPEG-2 sets standards for the maximum amount of jitter that can be present in PCRs in a real transport stream.

Clearly if the 27 MHz clock in the receiver is locked to one encoder it can receive only elementary streams encoded with that clock. If an attempt is made to decode, for example, an audio stream generated from a different clock, the result will be periodic buffer overflows or underflows in the decoder. Thus MPEG defines a program in a manner that relates to timing. A program is a set of elementary streams that have been encoded with the same master clock.

image

FIGURE 10.30

MPEG-2 PSI is used to tell a de-multiplexer what the transport stream contains.

TRANSPORT STREAM MULTIPLEXING

A transport stream multiplexer is a complex device because of the number of functions it must perform. A fixed multiplexer will be considered first. In a fixed multiplexer, the bit rate of each of the programs must be specified so that the sum does not exceed the payload bit rate of the transport stream. The payload bit rate is the overall bit rate less the packet headers and PSI rate. In practice the programs will not be synchronous to one another, but the transport stream must produce a constant packet rate given by the bit rate divided by 188 bytes, the packet length. Figure 10.31 shows how this is handled. Each elementary stream entering the multiplexer passes through a buffer that is divided into payload-sized areas. Note that periodically the payload area is made smaller because of the requirement to insert PCR.

MPEG-2 decoders also have a quantity of buffer memory. The challenge to the multiplexer is to take packets from each program in such a way that neither its own buffers nor the buffers in any decoder either overflow or underflow. This requirement is met by sending packets from all programs as evenly as possible rather than bunching together a lot of packets from one program. When the bit rates of the programs are different, the only way this can be handled is to use the buffer contents indicators. The fuller a buffer is, the more likely it should be that a packet will be read from it. Thus a buffer content arbitrator can decide which program should have a packet allocated next.
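
In its simplest deterministic form, the arbitration rule amounts to serving the fullest buffer first. A one-line sketch, with hypothetical program names:

```python
def next_program(buffer_fill: dict) -> str:
    """Serve the program whose buffer is fullest (most packets waiting)."""
    return max(buffer_fill, key=buffer_fill.get)

assert next_program({"news": 3, "film": 7, "sport": 5}) == "film"
```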

image

FIGURE 10.31

A transport stream multiplexer can handle several programs that are asynchronous to one another and to the transport stream clock. See text for details.

If the sum of the input bit rates is correct, the buffers should all slowly empty because the overall input bit rate has to be less than the payload bit rate. This allows for the insertion of program-specific information. Whilst PATs and PMTs are being transmitted, the program buffers will fill up again. The multiplexer can also fill the buffers by sending more PCRs as this reduces the payload of each packet. In the event that the multiplexer has sent enough of everything but still cannot fill a packet, it will send a null packet with a PID of 8191. Decoders will discard null packets and, as they convey no useful data, the multiplexer buffers will all fill whilst null packets are being transmitted.

The use of null packets means that the bit rates of the elementary streams do not need to be synchronous with one another or with the transport stream bit rate. As each elementary stream can have its own PCR, it is not necessary for the different programs in a transport stream to be genlocked to one another; in fact they do not even need to have the same frame rate. This approach allows the transport stream bit rate to be accurately defined and independent of the timing of the data carried. This is important because the transport stream bit rate determines the spectrum of the transmitter and this must not vary.

In a statistical multiplexer or STATMUX, the bit rate allocated to each program can vary dynamically. Figure 10.32 shows that there must be a tight connection between the STATMUX and the associated compressors. Each compressor has a buffer memory, which is emptied by a demand clock from the STATMUX. In a normal, fixed bit rate coder, the buffer content feeds back and controls the requantizer. In statmuxing this process is less severe and takes place only if the buffer is very close to full, because the degree of coding difficulty is also fed to the STATMUX.

image

FIGURE 10.32

A statistical multiplexer contains an arbitrator, which allocates bit rate to each program as a function of program difficulty.

REMULTIPLEXING

In real life a program creator may produce a transport stream that carries all its programs simultaneously. A service provider may take in several such streams and create its own transport stream by selecting different programs from different sources. In an MPEG-2 environment this requires a remultiplexer, also known as a transmultiplexer. Figure 10.33 shows what a remultiplexer does. Remultiplexing is easier when all the incoming programs have the same bit rate. If a suitable combination of programs is selected it is obvious that the output transport stream will always have sufficient bit rate. When statistical multiplexing has been used, there is a possibility that the sum of the bit rates of the selected programs will exceed the bit rate of the output transport stream. To avoid this, the remultiplexer will have to employ recompression.

Recompression requires a partial decode of the bitstream to identify the DCT (discrete cosine transform) coefficients. These will then be requantized to reduce the bit rate until it is low enough to fit the output transport stream. Remultiplexers have to edit the program-specific information (PSI) such that the PAT (program association table) and the PMTs (program map tables) correctly reflect the new transport stream content. It may also be necessary to change the PIDs (packet identification codes) because the incoming transport streams could inadvertently have used the same values.

When PCR (Program Clock Reference) data are included in an extended packet header, they represent a real-time clock count, and if the associated packet is moved in time the PCR value will be wrong. Remultiplexers have to re-create a new multiplex from a number of other multiplexes and it is inevitable that this process will result in packets being placed in locations in the output transport stream that are different from those they had in the input. In this case the remultiplexer must edit the PCR values so that they reflect the value the clock counter would have had at the location at which the packet now resides.

image

FIGURE 10.33

A remultiplexer creates a new transport stream from selected programs in other transport streams.

The STATMUX contains an arbitrator, which allocates more packets to the program with the greatest coding difficulty. Thus if a particular program encounters difficult material it will produce large prediction errors and begin to fill its output buffer. As the STATMUX has allocated more packets to that program, more data will be read out of that buffer, preventing overflow. Of course this is possible only if the other programs in the transport stream are handling typical video.

In the event that several programs encounter difficult material at once, clearly the buffer contents will rise and the requantizing mechanism will have to operate.

BROADCAST MODULATION TECHNIQUES

A key difference between analog and digital transmission is that the transmitter output is switched between a number of discrete states rather than continuously varying. The process is called channel coding, which is the digital equivalent of modulation. A good code minimizes the channel bandwidth needed for a given bit rate. This quality of the code is measured in bits per second per hertz (bps/Hz) and is roughly the equivalent of the density ratio in recording. Figure 10.34 shows, not surprisingly, that the less bandwidth required, the better the signal-to-noise ratio has to be. The figure shows the theoretical limit as well as the performance of a number of codes that offer different balances of bandwidth/noise performance.

image

FIGURE 10.34

Where a better SNR exists, more data can be sent down a given bandwidth channel.

image

FIGURE 10.35

Differential quadrature phase-shift keying (DQPSK).

Where the SNR is poor, as in satellite broadcasting, the amplitude of the signal will be unstable, and phase modulation is used. Figure 10.35 shows that phase-shift keying (PSK) can use two or more phases. When four phases in quadrature are used, the result is quadrature phase-shift keying or QPSK. Each period of the transmitted waveform can have one of four phases and therefore conveys the value of two data bits. Eight-PSK uses eight phases and can carry three bits per symbol where the SNR is adequate. PSK is generally encoded in such a way that a knowledge of absolute phase is not needed at the receiver. Instead of encoding the signal phase directly, the data determine the magnitude of the phase shift between symbols. A QPSK coder is shown in Figure 10.36.
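
The differential principle can be sketched as follows. The mapping from dibit to phase shift is an assumption of this sketch (real standards fix a particular Gray-coded mapping); the essential point is that only phase changes, not absolute phases, carry the data.

```python
import cmath, math

SHIFT_DEG = {0b00: 0, 0b01: 90, 0b11: 180, 0b10: 270}   # illustrative mapping

def dqpsk(dibits):
    """Differential QPSK: each dibit selects a phase change, not a phase."""
    phase = 0.0
    for d in dibits:
        phase = (phase + SHIFT_DEG[d]) % 360
        yield cmath.exp(1j * math.radians(phase))   # unit-amplitude carrier state

symbols = list(dqpsk([0b01, 0b01, 0b11]))   # phases: 90, 180, then 0 degrees
```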

In terrestrial transmission more power is available than, for example, from a satellite, and so a stronger signal can be delivered to the receiver. Where a better SNR exists, an increase in data rate can be had using multilevel signalling or m-ary coding instead of binary. Figure 10.37 shows that the ATSC system uses an eight-level signal (8-VSB), allowing three bits to be sent per symbol. Four of the levels exist with normal carrier phase and four exist with inverted phase so that a phase-sensitive rectifier is needed in the receiver. Clearly the data separator must have a three-bit ADC, which can resolve the eight signal levels. The gain and offset of the signal must be precisely set so that the quantizing levels register precisely with the centres of the eyes. The transmitted signal contains sync pulses that are encoded using specified code levels so that the data separator can set its gain and offset.

Multilevel signalling systems have the characteristic that the bits in the symbol have different error probability. Figure 10.38 shows that a small noise level will corrupt the low-order bit, whereas twice as much noise will be needed to corrupt the middle bit and four times as much will be needed to corrupt the high-order bit. In ATSC the solution is that the lower two bits are encoded together in an inner error-correcting scheme so that they represent only one bit with reliability similar to that of the top bit. As a result the 8-VSB system actually delivers two data bits per symbol even though eight-level signalling is used.

image

FIGURE 10.36

A QPSK coder conveys two bits for each modulation period. See text for details.

image

FIGURE 10.37

In eight-VSB the transmitter operates in eight different states enabling three bits to be sent per symbol.

image

FIGURE 10.38

In multi-level signalling the error probability is not the same for each bit.

The modulation of the carrier results in a double-sideband spectrum, but following analog TV practice most of the lower sideband is filtered off, leaving a vestigial sideband only, hence the term 8-VSB. A small DC offset is injected into the modulator signal so that the four in-phase levels are slightly higher than the four out-of-phase levels. This has the effect of creating a small pilot at the carrier frequency to help receiver locking.

Multilevel signalling can be combined with PSK to obtain multilevel quadrature amplitude modulation (QUAM). Figure 10.39 shows the example of 64-QUAM. Incoming six-bit data words are split into two three-bit words and each is used to amplitude modulate a pair of sinusoidal carriers that are generated in quadrature. The modulators are four-quadrant devices such that 2³ amplitudes are available, four of which are in phase with the carrier and four in antiphase. The two AM carriers are linearly added and the result is a signal that has 2⁶, or 64, combinations of amplitude and phase. There is a great deal of similarity between QUAM and the colour subcarrier used in analog television in which the two colour difference signals are encoded into one amplitude- and phase-modulated waveform. On reception, the waveform is sampled twice per cycle in phase with the two original carriers and the result is a pair of eight-level signals. Sixteen-QUAM is also possible, delivering only four bits per symbol but requiring a lower SNR.
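
A sketch of the symbol construction follows, assuming the common ±1, ±3, ±5, ±7 level set (the text does not specify the levels):

```python
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]   # 2^3 amplitudes; sign = carrier phase sense

def quam64(word6: int) -> complex:
    """Split a six-bit word into two three-bit words, one per quadrature carrier."""
    return complex(LEVELS[word6 >> 3], LEVELS[word6 & 0b111])   # I + jQ

assert len({quam64(w) for w in range(64)}) == 64   # 64 unique amplitude/phase states
```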

The data bit patterns to be transmitted can have any combination whatsoever, and if nothing were done, the transmitted spectrum would be non-uniform. This is undesirable because peaks cause interference with other services, whereas energy troughs allow external interference in. The randomizing technique of Chapter 8 is used to overcome the problem. The process is known as energy dispersal. The signal energy is spread uniformly throughout the allowable channel bandwidth so that it has less energy at a given frequency.

image

FIGURE 10.39

In 64-QUAM, two carriers are generated with a quadrature relationship. These are independently amplitude modulated to eight discrete levels in four quadrant multipliers. Adding the signals produces a QUAM signal having 64 unique combinations of amplitude and phase. Decoding requires the waveform to be sampled in quadrature like a colour TV subcarrier.

A pseudo-random sequence generator is used to generate the randomizing sequence. Figure 10.40 shows the randomizer used in DVB. This 16-bit device has a maximum sequence length of 65,535 bits and is preset to a standard value at the beginning of each set of eight transport stream packets. The serialized data are XORed with the LSB of the Galois field sequence, and the randomized output then goes to the modulator. The spectrum of the transmission is now more uniform.

On reception, the de-randomizer must contain the identical ring counter, which must also be set to the starting condition to bit accuracy. Its output is then added to the data stream from the demodulator. The randomizing will effectively then have been added twice to the data in modulo-2 and, as a result, is cancelled out, leaving the original serial data.
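
The structure, though not the exact polynomial, can be sketched as a feedback shift register. The taps and preset value below are placeholders, not the DVB values; the point of the sketch is that applying the identical, identically preset sequence a second time cancels it in modulo-2.

```python
def randomize(bits, taps=(16, 14), preset=0xACE1, width=16):
    """Energy-dispersal sketch: XOR data with a preset LFSR sequence.
    Tap positions and preset are illustrative, not the DVB polynomial."""
    state = preset
    for b in bits:
        yield b ^ (state & 1)                       # XOR data with sequence LSB
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1            # feedback parity from taps
        state = (state >> 1) | (fb << (width - 1))

data = [1, 0, 1, 1, 0, 0, 1]
assert list(randomize(randomize(data))) == data     # applied twice: cancels out
```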

image

FIGURE 10.40

The randomizer of DVB is pre-set to the initial condition once every eight transport stream packets. The maximum length of the sequence is 65535 bits, but only the first 12024 bits are used before resetting again (b).

OFDM

The way that radio signals interact with obstacles is a function of the relative magnitude of the wavelength and the size of the object. AM sound radio transmissions, with a wavelength of several hundred metres, can easily diffract around large objects. The shorter the wavelength of a transmission, the larger objects in the environment appear to it, and these objects can then become reflectors. Reflecting objects produce a delayed signal at the receiver in addition to the direct signal. In analog television transmissions this causes the familiar ghosting. In digital transmissions, the symbol rate may be so high that the reflected signal may be one or more symbols behind the direct signal, causing intersymbol interference. As the reflection may be continuous, the result may be that almost every symbol is corrupted. No error-correction system can handle this.

Raising the transmitter power is no help at all as it simply raises the power of the reflection in proportion. The only solution is to change the characteristics of the RF channel in some way to either prevent the multi-path reception or prevent it being a problem. The RF channel includes the modulator, transmitter, antennae, receiver, and demodulator.

As with analog UHF TV transmissions, a directional antenna is useful with digital transmission as it can reject reflections. However, directional antennae tend to be large and they require a skilled permanent installation. Mobile use on a vehicle or vessel is simply impractical. Another possibility is to incorporate a ghost canceller into the receiver. The transmitter periodically sends a standardised known waveform called a training sequence. The receiver knows what this waveform looks like and compares it with the received signal. In theory it is possible for the receiver to compute the delay and relative level of a reflection and so insert an opposing one. In practice if the reflection is strong it may prevent the receiver from finding the training sequence.

The most elegant approach is to use a system in which multi-path reception conditions cause only a small increase in error rate, which the error-correction system can manage. This approach is used in DVB. Figure 10.41a shows that when one carrier with a high bit rate is used, reflections can easily be delayed by one or more bit periods, causing interference between the bits. Figure 10.41b shows that, instead, OFDM sends many carriers, each having a low bit rate. When a low bit rate is used, the energy in the reflection will arrive during the same bit period as the direct signal. Not only is the system immune to multi-path reflections, but also the energy in the reflections can actually be used. This characteristic can be enhanced by using guard intervals, shown in Figure 10.41c. These reduce multi-path bit overlap even more.

Note that OFDM is not a modulation scheme, and each of the carriers used in an OFDM system still needs to be modulated using any of the digital coding schemes described above. What OFDM does is provide an efficient way of packing many carriers close together without mutual interference. A serial data waveform basically contains a train of rectangular pulses. The transform of a rectangle is the function sin x/x and so the baseband pulse train has a sin x/x spectrum. When this waveform is used to modulate a carrier the result is a symmetrical sin x/x spectrum centred on the carrier frequency.
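
In practice the modulation of thousands of carriers at once is performed by an inverse FFT. The sketch below uses toy sizes (64 carriers rather than 2K or 8K) and realises the guard interval as a cyclic extension of the symbol, which is one common implementation of the idea.

```python
import numpy as np

def ofdm_symbol(carrier_points: np.ndarray, guard: int) -> np.ndarray:
    """One constellation point per carrier in, one time-domain symbol out."""
    t = np.fft.ifft(carrier_points)            # sum of orthogonal carriers
    return np.concatenate([t[-guard:], t])     # guard interval (cyclic prefix)

points = np.exp(2j * np.pi * np.random.randint(0, 4, 64) / 4)  # QPSK-like points
sym = ofdm_symbol(points, guard=16)
assert np.allclose(np.fft.fft(sym[16:]), points)   # receiver FFT recovers the data
```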

image

FIGURE 10.41

(a) High-bit rate transmissions are prone to corruption due to reflections. (b) If the bit rate is reduced the effect of reflections is eliminated; in fact, reflected energy can be used. (c) Guard intervals may be inserted between symbols.

image

FIGURE 10.42

In OFDM the carrier spacing is critical, but when it is correct the carriers become independent and the most efficient use is made of the spectrum. (a) Spectrum of bitstream has regular nulls. (b) Peak of one carrier occurs at null of another.

Figure 10.42a shows that nulls in the spectrum appear spaced at multiples of the bit rate away from the carrier. Further carriers can be placed at spacings such that each is centred at the null of another, as is shown in (b). The distance between the carriers is equal to 90° or one quadrant of sin x. Owing to the quadrant spacing, these carriers are mutually orthogonal, hence the term orthogonal frequency division. A large number of such carriers (in practice several thousand) will be interleaved to produce an overall spectrum that is almost rectangular and that fills the available transmission channel.

When guard intervals are used, the carrier returns to an unmodulated state between bits for a period greater than the period of the reflections. Then the reflections from one transmitted bit decay during the guard interval before the next bit is transmitted. The use of guard intervals reduces the bit rate of the carrier because for some of the time it is radiating carrier not data. A typical reduction is to around 80 percent of the capacity without guard intervals.

This capacity reduction does, however, improve the error statistics dramatically, such that much less redundancy is required in the error correction system. Thus the effective transmission rate is improved. The use of guard intervals also moves more energy from the sidebands back to the carrier. The frequency spectrum of a set of carriers is no longer perfectly flat but contains a small peak at the centre of each carrier.

The ability to work in the presence of multi-path cancellation is one of the great strengths of OFDM. In DVB, more than 2000 carriers are used in single transmitter systems. Provided there is exact synchronism, several transmitters can radiate exactly the same signal so that a SFN (single-frequency network) can be created throughout a whole country. SFNs require a variation on OFDM that uses over 8000 carriers.

With OFDM, directional antennae are not needed and, given sufficient field strength, mobile reception is perfectly feasible. Of course, directional antennae may still be used to boost the received signal outside normal service areas or to enable the use of low-powered transmitters.

An OFDM receiver must perform fast Fourier transforms (FFTs) on the whole band at the symbol rate of one of the carriers. The amplitude and/or phase of the carrier at a given frequency effectively reflects the state of the transmitted symbol at that time slot and so the FFT partially demodulates as well. To assist with tuning in, the OFDM spectrum contains pilot signals. These are individual carriers that are transmitted with slightly more power than the remainder. The pilot carriers are spaced apart through the whole channel at agreed frequencies, which form part of the transmission standard.

Practical reception conditions, including multi-path reception, will cause a significant variation in the received spectrum and some equalization will be needed. Figure 10.43 shows what the possible spectrum looks like in the presence of a powerful reflection. The signal has almost been cancelled at certain frequencies. However, the FFT performed in the receiver is effectively a spectral analysis of the signal, and so the receiver computes the received spectrum for free.

As in a flat spectrum the peak magnitude of all the coefficients would be the same (apart from the pilots), equalization is easily performed by multiplying the coefficients by suitable constants until this characteristic is obtained.
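
That step is a per-carrier multiplication. In the sketch below the flat-spectrum reference is a single constant; a real receiver would interpolate the reference from the pilot carriers, and the small epsilon guards against a fully cancelled bin.

```python
import numpy as np

def equalize(bins: np.ndarray, reference: float = 1.0) -> np.ndarray:
    """Scale each FFT bin so the spectrum magnitude is flat again (phase kept)."""
    gains = reference / np.maximum(np.abs(bins), 1e-12)   # per-carrier constants
    return bins * gains
```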

image

FIGURE 10.43

Multi-path reception can place notches in the channel spectrum. This will require equalization at the receiver.

Although the use of transform-based receivers appears complex, when it is considered that such an approach simultaneously allows effective equalization, the complexity is not significantly higher than that of a conventional receiver, which needs a separate spectral analysis system just for equalization purposes.

The only drawback of OFDM is that the transmitter must be highly linear to prevent intermodulation between the carriers. This is readily achieved in terrestrial transmitters by derating the transmitter so that it runs at a lower power than it would in analog service. This is not practicable in satellite transmitters, which are optimized for efficiency, so OFDM is not really suitable for satellite use.

ERROR CORRECTION IN DIGITAL TELEVISION BROADCASTING

As in recording, broadcast data suffer from both random and burst errors and the error-correction strategies of digital television broadcasting have to reflect that. Figure 10.44 shows a typical system in which inner and outer codes are employed. The Reed–Solomon codes are universally used for burst-correcting outer codes, along with an interleave, which will be convolutional rather than the block-based interleave used in recording media. The inner codes will not be R-S, as more suitable codes exist for the statistical conditions prevalent in broadcasting. DVB uses a parity-based variable-rate system in which the amount of redundancy can be adjusted according to reception conditions. ATSC uses a fixed-rate parity-based system along with trellis coding to overcome co-channel interference from analog NTSC transmitters.

DVB

The DVB system is subdivided into systems optimized for satellite, cable, and terrestrial delivery. This section concentrates on the terrestrial delivery system. Figure 10.45 shows a block diagram of a terrestrial (DVB-T) transmitter. Incoming transport stream packets of 188 bytes each are first subject to R-S outer coding. This adds 16 bytes of redundancy to each packet, resulting in 204 bytes. Outer coding is followed by interleaving. The interleave mechanism is shown in Figure 10.46. Outer code blocks are commutated on a byte basis into 12 parallel channels. Each channel contains a different amount of delay, typically achieved by a ring-buffer RAM. The delays are integer multiples of 17 bytes, designed to skew the data by one outer block (12 × 17 = 204). Following the delays, a commutator re-assembles interleaved outer blocks.

image

FIGURE 10.44

Error-correcting strategy of digital television broadcasting systems.

image

FIGURE 10.45

DVB-T transmitter block diagram. See text for details.

image

FIGURE 10.46

The interleaver of DVB uses 12 incrementing delay channels to re-order the data. The sync byte passes through the undelayed channel and so is still at the head of the packet after interleave. However, the packet now contains non-adjacent bytes from 12 different packets.

image

FIGURE 10.47

(a) The mother inner coder of DVB produces 100 percent redundancy, but this can be punctured by subsampling the X and Y data to give five different code rates, as (b) shows.

These have 204 bytes as before, but the effect of the interleave is that adjacent bytes in the input are 17 bytes apart in the output. Each output block contains data from 12 input blocks, making the data resistant to burst errors.
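
Functionally, the interleaver of Figure 10.46 is twelve FIFOs of growing length visited in rotation, and can be modelled directly; the zero fill below stands in for the initial contents of the hardware delay RAM.

```python
from collections import deque

class ConvInterleaver:
    """12 branches, delays growing in 17-byte steps; branch 0 is undelayed."""
    def __init__(self, branches: int = 12, step: int = 17):
        self.fifos = [deque([0] * (i * step)) for i in range(branches)]
        self.branch = 0

    def push(self, byte: int) -> int:
        f = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)
        if not f:                   # the undelayed branch (carries sync bytes)
            return byte
        f.append(byte)
        return f.popleft()
```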

Following the interleave, the energy-dispersal process takes place. The pseudorandom sequence runs over eight outer blocks and is synchronised by inverting the transport stream packet sync symbol in every eighth block. The packet sync symbols are not randomized. The inner coding process of DVB is shown in Figure 10.47. Input data are serialized and pass down a shift register. Exclusive-OR gates produce convolutional parity symbols X and Y, such that the output bit rate is twice the input bit rate. Under the worst reception conditions, this 100 percent redundancy offers the most powerful correction, with the penalty that a low data rate is delivered. However, Figure 10.47 also shows that a variety of inner redundancy factors can be used from 1/2 down to 1/8 of the transmitted bit rate. The X, Y data from the inner coder are subsampled, such that the coding is punctured.
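
A sketch of the mother coder and one puncturing pattern follows. The generator polynomials (171 and 133 octal) and the 3/4-rate pattern are the values widely used in DVB, but treat them here as assumptions of the sketch rather than a restatement of the standard.

```python
G_X, G_Y = 0o171, 0o133         # constraint-length-7 generators (assumed values)

def mother_code(bits):
    """Rate-1/2 convolutional coder: two output bits (X, Y) per input bit."""
    state = 0
    for b in bits:
        state = ((state << 1) | b) & 0x7F          # 7-bit shift register
        yield (bin(state & G_X).count("1") & 1,    # X = parity of tapped bits
               bin(state & G_Y).count("1") & 1)    # Y likewise

def puncture_3_4(xy):
    """Subsample X, Y to rate 3/4: keep X1 Y1 Y2 X3 from every three pairs."""
    out = []
    for i, (x, y) in enumerate(xy):
        out += [x, y] if i % 3 == 0 else ([y] if i % 3 == 1 else [x])
    return out
```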

The DVB standard allows the use of QPSK, 16-QUAM, or 64-QUAM coding in an OFDM system. There are five possible inner code rates, and four different guard intervals, which can be used with each modulation scheme. Thus for each modulation scheme there are 20 possible transport stream bit rates in the standard DVB channel, each of which requires a different receiver SNR. The broadcaster can select any suitable balance between transport stream bit rate and coverage area. For a given transmitter location and power, reception over a larger area may require a channel code with a smaller number of bits per second per hertz, and this reduces the bit rate that can be delivered in a standard channel. Alternatively, a higher amount of inner redundancy means that the proportion of the transmitted bit rate that is data goes down. Thus for wider coverage the broadcaster will have to send fewer programs in the multiplex or use higher compression factors.

THE DVB RECEIVER

Figure 10.48 shows a block diagram of a DVB receiver. The off-air RF signal is fed to a mixer driven by the local oscillator. The IF output of the mixer is band-pass filtered and supplied to the ADC, which outputs a digital IF signal for the FFT stage. The FFT is initially analysed to find the higher-level pilot signals. If these are not in the correct channels the local oscillator frequency is incorrect and it will be changed until the pilots emerge from the FFT in the right channels.

image

FIGURE 10.48

DVB receiver block diagram. See text for details.

The data in the pilots will be decoded to tell the receiver how many carriers and what inner redundancy rate, guard interval, and modulation scheme are in use in the remaining carriers. The FFT magnitude information is also a measure of the equalization required.

The FFT outputs are demodulated into 2 K or 8 K bitstreams and these are multiplexed to produce a serial signal. This is subject to inner error correction, which corrects random errors. The data are then de-interleaved to break up burst errors and then the outer R-S error correction operates.

The output of the R-S correction will then be derandomized to become an MPEG transport stream once more. The derandomizing is synchronised by the transmission of inverted sync patterns. The receiver must select a PID of 0 and wait until a PAT (program association table) is transmitted. This will describe the available programs by listing the PIDs of the PMTs (program map tables). By looking for these packets the receiver can determine what PIDs to select to receive any video and audio elementary streams. When an elementary stream is selected, some of the packets will have extended headers containing a PCR (program clock reference). These codes are used to synchronise the 27 MHz clock in the receiver to the one in the MPEG encoder of the desired program.

The 27 MHz clock is divided down to drive the time stamp counter so that audio and video emerge from the decoder at the correct rate and with lip sync. It should be appreciated that time stamps are relative, not absolute. The time stamp count advances by a fixed amount each picture, but the exact count is meaningless. Thus from time stamps alone the decoder can establish the frame rate of the video, but not its absolute timing. In practice the receiver has finite buffering memory between the demultiplexer and the MPEG decoder. If the displayed video timing is too late, the buffer will tend to overflow, whereas if the displayed video timing is too early the decoding may not be completed.

The receiver can advance or retard the time stamp counter during lockup so that it places the output timing midway between these extremes.

ATSC

The ATSC system is an alternative way of delivering a transport stream, but it is considerably cruder than DVB and supports only one transport stream bit rate of 19.28 Mbps. If any change in the service area is needed, this will require a change in transmitter power. Figure 10.49 shows a block diagram of an ATSC transmitter. Incoming transport stream packets are randomized, except for the sync pattern, for energy dispersal. Figure 10.50 shows the randomizer. The outer correction code includes the whole packet except for the sync byte. Thus there are 187 bytes of data in each code word, and 20 bytes of R-S redundancy are added to make a 207-byte code word. After outer coding, a convolutional interleave shown in Figure 10.51 is used. This re-orders data over a time span of about 4 ms. Interleave simply exchanges content between packets, but without changing the packet structure.

image

FIGURE 10.49

Block diagram of ATSC transmitter. See text for details.

image

FIGURE 10.50

The randomizer of ATSC. The twisted ring counter is preset to the initial state shown for each data field. It is then clocked once per byte and the eight outputs D0–D7 are X-ORed with the data byte.

image

FIGURE 10.51

The ATSC convolutional interleaver spreads adjacent bytes over a period of about 4 ms.

image

FIGURE 10.52

The ATSC data frame is transmitted one segment at a time. Segment sync denotes the beginning of each segment and the segments are counted from the field sync signals.

Figure 10.52 shows that the result of outer coding and interleave is a data frame that is divided into two fields of 313 segments each. The frame is transmitted by scanning it horizontally a segment at a time. There is some similarity with a traditional analog video signal here, because there is a sync pulse at the beginning of each segment and a field sync that occupies two segments of the frame. Data segment sync repeats every 77.3 µs, a segment rate of 12,933 Hz, whereas a frame has a period of 48.4 ms. The field sync segments contain a training sequence to drive the adaptive equalizer in the receiver.

The data content of the frame is subject to trellis coding, which converts each pair of data bits into three channel bits inside an inner interleave. The trellis coder is shown in Figure 10.53 and the interleave in Figure 10.54. Figure 10.53 also shows how the three channel bits map to the eight signal levels in the 8-VSB modulator. Figure 10.55 shows the data segment after eight-level coding. The sync pattern of the transport stream packet, which was not included in the error-correction code, has been replaced by a segment sync waveform.

This acts as a timing reference to allow de-serializing of the segment, but as the two levels of the sync pulse are standardised, it also acts as an amplitude reference for the eight-level slicer in the receiver. The eight-level signal is subject to a DC offset so that some transmitter energy appears at the carrier frequency to act as a pilot. Each eight-level symbol carries 2 data bits and so there are 832 symbols in each segment. As the segment rate is 12,933 Hz, the symbol rate is 10.76 MHz and so this will require 5.38 MHz of bandwidth in a single sideband. Figure 10.56 shows the transmitter spectrum. The lower sideband is vestigial and an overall channel width of 6 MHz results.
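
The quoted rates follow directly from the segment structure, as this arithmetic check shows (using the rounded figures from the text, so the results are approximate):

```python
segment_rate = 12_933                          # segments per second (from the text)
symbol_rate = 832 * segment_rate               # symbols/segment x segments/s
payload = 187 * 8 * segment_rate * 312 / 313   # data bits/segment, less field sync
print(symbol_rate / 1e6, payload / 1e6)        # ~10.76 Msym/s and ~19.3 Mbps,
                                               # the 19.28 Mbps quoted, within rounding
```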

image

FIGURE 10.53

(a) The precoder and trellis coder of ATSC converts 2 data bits, X1 and X2, to 3 output bits, Z0, Z1, and Z2. (b) The Z0, Z1, and Z2 output bits map to the eight-level code as shown.

image

FIGURE 10.54

The inner interleave (a) of ATSC makes the trellis coding operate as 12 parallel channels working on every 12th byte to improve error resistance. The interleave is byte-wise, and, as (b) shows, each byte is divided into four di-bits for coding into the tri-bits Z0, Z1, and Z2.

image

FIGURE 10.55

ATSC data segment. Note the sync pattern, which acts as a timing and amplitude reference. The eight levels are shifted up by 1.25 to create a DC component resulting in a pilot at the carrier frequency.

image

FIGURE 10.56

The spectrum of ATSC and its associated bit and symbol rates. Note pilot at carrier frequency created by DC offset in multi-level coder.

image

FIGURE 10.57

An ATSC receiver. Double conversion can be used so that the second conversion stage can be arranged to lock to the transmitted pilot.

Figure 10.57 shows an ATSC receiver. The first stages of the receiver are designed to lock to the pilot in the transmitted signal. This then allows the eight-level signal to be sampled at the right times. This process will allow location of the segment sync and then the field sync signals. Once the receiver is synchronised, the symbols in each segment can be decoded. The inner or trellis decoder corrects random errors and then, following de-interleave, the R-S decoder corrects burst errors. After de-randomizing, standard transport stream sync patterns are added to the output data.

In practice ATSC transmissions will experience co-channel interference from NTSC transmitters and the ATSC scheme allows the use of an NTSC rejection filter. Figure 10.58 shows that most of the energy of NTSC is at the carrier, sub-carrier, and sound carrier frequencies. A comb filter with a suitable delay can produce nulls or notches at these frequencies. However, the delay-and-add process in the comb filter also causes another effect. When two eight-level signals are added together, the result is a 16-level signal. This will be corrupted by noise of half the level that would corrupt an eight-level signal. However, the 16-level signal contains redundancy because it corresponds to the combinations of four bits, whereas only two bits are being transmitted. This allows a form of error correction to be used.

The ATSC inner precoder results in a known relationship existing between symbols independent of the data. The time delays in the inner interleave are designed to be compatible with the delay in the NTSC rejection comb filter. This limits the number of paths the received waveform can take through a time/voltage graph called a trellis. Where a signal is in error it takes a path sufficiently near to the correct one that the correct one can be implied. ATSC uses a training sequence sent once every data field, but is otherwise helpless against multipath reception, as tests have shown. In urban areas, ATSC must have a correctly oriented directional antenna to reject reflections. Unfortunately the American viewer has been brought up to believe that television reception is possible with a pair of “rabbit ears” on top of the TV set and ATSC will not work like this. Mobile reception is not practicable. As a result the majority of the world's broadcasters appear to be favouring an OFDM-based system.

image

FIGURE 10.58

(a) Spectrum of typical analog transmitter showing maximum power at carrier, subcarrier, and audio carrier. (b) A comb filter with a suitable delay can notch out NTSC interference. The precoding of ATSC is designed to work with the necessary receiver delay.

NETWORKS

A network is basically a communication resource that is shared for economic reasons. Like any shared resource, decisions have to be made somewhere and somehow about how the resource is to be used. In the absence of such decisions the resultant chaos would be such that the resource might as well not exist. In communications networks the resource is the ability to convey data from any node or port to any other. On a particular cable, clearly only one transaction of this kind can take place at any one instant, even though in practice many nodes will simultaneously want to transmit data. Arbitration is needed to determine which node is allowed to transmit.

There are a number of different arbitration protocols and these have evolved to support the needs of different types of networks. In small networks, such as LANs, a single point failure that halts the entire network may be acceptable, whereas in a public transport network owned by a telecommunications company, the network will be redundant so that if a particular link fails data may be sent via an alternative route. A link that has reached its maximum capacity may also be supplanted by transmission over alternative routes. In physically small networks, arbitration may be carried out in a single location. This is fast and efficient, but if the arbitrator fails it leaves the system completely crippled. The processor buses in computers work in this way. In centrally arbitrated systems the arbitrator needs to know the structure of the system and the status of all the nodes. Following a configuration change, due perhaps to the installation of new equipment, the arbitrator needs to be told what the new configuration is or to have a mechanism that allows it to explore the network and learn the configuration. Central arbitration is suitable only for small networks that change their configuration infrequently.

In other networks the arbitration is distributed so that some decision-making ability exists in every node. This is less efficient but it does allow at least some of the network to continue operating after a component failure. Distributed arbitration also means that each node is self-sufficient and so no changes need to be made if the network is reconfigured by adding or deleting a node. This is the only possible approach in wide area networks in which the structure may be very complex and change dynamically in the event of failures or overload.

Ethernet uses distributed arbitration. FireWire is capable of using both types of arbitration. A small amount of decision-making ability is built into every node so that distributed arbitration is possible. However, if one of the nodes happens to be a computer, it can run a centralized arbitration algorithm.

The physical structure of a network is subject to some variation, as Figure 10.59 shows. In radial networks (Figure 10.59a), each port has a unique cable connection to a device called a hub. The hub must have one connection for every port and this limits the number of ports. However, a cable failure will result in the loss of only one port. In a ring system (b) the nodes are connected like a daisy chain, with each node acting as a feedthrough. In this case the arbitration requirement must be distributed. With some protocols, a single cable break does not stop the network operating.

Depending on the protocol, simultaneous transactions may be possible provided they do not require the same cable. For example, in a storage network a disk drive may be outputting data to an editor whilst another drive is backing up data to a tape streamer. For the lowest cost, all nodes are physically connected in parallel to the same cable. Figure 10.59c shows that a cable break would divide the network into two halves, but it is possible that the impedance mismatch at the break could stop both halves working.

image

FIGURE 10.59

Network configurations. (a) The radial system uses one cable to each node. (b) The ring system uses less cable than radial. (c) The linear system is simple but has no redundancy.

NETWORK ARBITRATION

One of the concepts involved in arbitration is priority, which is fundamental to providing an appropriate quality of service. If two processes both want to use a network, the one with the highest priority would normally go first. Attributing priority must be done carefully because some of the results are non-intuitive. For example, it may be beneficial to give a high priority to a humble device that has a low data rate for the simple reason that if it is given use of the network it will not need it for long. In a television environment transactions concerned with on-air processes would have priority over file transfers concerning production and editing.

When a device gains access to the network to perform a transaction, generally no other transaction can take place until it has finished. Consequently it is important to limit the amount of time that a given port can stay on the bus.

image

FIGURE 10.60

Receiving a file that has been divided into packets allows for the re-transmission of just the packet in error.

In this way when the time limit expires, a further arbitration must take place. The result is that the network resource rotates between transactions rather than one transfer hogging the resource and shutting out everyone else.

It follows from the presence of a time (or data quantity) limit that ports must have the means to break large files up into frames or cells and reassemble them on reception. This process is sometimes called adaptation. If the data to be sent originally exist at a fixed bit rate, some buffering will be needed so that the data can be time-compressed into the available frames. Each frame must be contiguously numbered and the system must transmit a file size or word count so that the receiving node knows when it has received every frame in the file.

The error-detection system interacts with this process because if any frame is in error on reception, the receiving node can ask for a re-transmission of the frame. This is more efficient than re-transmitting the whole file. Figure 10.60 shows the flowchart for a receiving node.
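
The receiving-node flowchart reduces to a short loop. In the sketch below, receive(), check(), and request_resend() are stand-ins for the physical reception, the error-detection code, and the back channel, respectively.

```python
def receive_file(frame_count, receive, check, request_resend):
    """Collect numbered frames, re-requesting only those that arrive in error."""
    frames = {}
    while len(frames) < frame_count:          # until every frame is present
        seq, payload = receive()
        if check(payload):
            frames[seq] = payload
        else:
            request_resend(seq)               # re-transmit just this frame
    return b"".join(frames[i] for i in range(frame_count))
```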

Breaking files into frames helps to keep down the delay experienced by each process using the network. Figure 10.61 shows that each frame may be stored ready for transmission in a silo memory. It is possible to make the priority a function of the number of frames in the silo, as this is a direct measure of how long a process has been kept waiting. Isochronous systems must do this to meet maximum delay specifications. Once frame transmission has completed, the arbitrator will determine which process sends a frame next by examining the depth of all the frame buffers. MPEG transport stream multiplexers and networks delivering MPEG data must work in this way because the transfer is isochronous and the amount of buffering in a decoder is limited for economic reasons.

image

FIGURE 10.61

Files are broken into frames or packets for multiplexing with packets from other users. Short packets minimize the time between the arrival of successive packets. The priority of the multiplexing must favour isochronous data over asynchronous data.

A central arbitrator is relatively simple to implement because when all decisions are taken centrally there can be no timing difficulty (assuming a well-engineered system). In a distributed system, there is an extra difficulty due to the finite time taken for signals to travel down the data paths between nodes.

Figure 10.62 shows the structure of Ethernet, which uses a protocol called CSMA/CD (carrier sense multiple access with collision detect) developed by DEC and Xerox. This is a distributed arbitration network in which each node follows some simple rules. The first of these is not to transmit if an existing bus signal is detected. The second is not to transmit more than a certain quantity of data before releasing the bus. Devices wanting to use the bus will see bus signals and so will wait until the present bus transaction finishes. This must happen at some point because of the frame size limit. When the frame is completed, signalling should cease. The first device to sense the bus becoming free and to assert its own signal will prevent any other nodes transmitting according to the first rule. Where numerous devices are present it is possible to give them a priority structure by providing a delay between sensing the bus coming free and beginning a transaction. High-priority devices will have a short delay so they get in first. Lower-priority devices will be able to start a transaction only if the high-priority devices do not need to transfer.

It might be thought that these rules would be enough and everything would be fine. Unfortunately the finite signal speed means that there is a flaw in the system. Figure 10.62 shows why. Device A is transmitting and devices B and C both want to transmit and have equal priority. At the end of A's transaction, devices B and C see the bus become free at the same instant and start a transaction. With two devices driving the bus, the resultant waveform is meaningless. This is known as a collision and all nodes must have means to recover from it. First, each node will read the bus signal at all times. When a node drives the bus, it will also read back the bus signal and compare it with what was sent. Clearly if the two are the same all is well, but if there is a difference, this must be because a collision has occurred and two devices are trying to influence the bus voltage at once.

If a collision is detected, both colliding devices will sense the disparity between the transmitted and readback signals, and both will release the bus to terminate the collision. However, there is no point in adhering to the simple protocol to reconnect because this will simply result in another collision. Instead each device has a built-in delay, which must expire before another attempt is made to transmit. This delay is not fixed, but is controlled by a random-number generator and so changes from transaction to transaction.

image

FIGURE 10.62

In Ethernet collisions can occur because of the finite speed of the signals. A “back-off” algorithm handles collisions, but they do reduce the network throughput.

The probability of two devices arriving at the same delay is extremely small.

Consequently, if a collision does occur, both devices will drop the bus, and they will start their back-off timers. When the first timer expires, that device will transmit and the other will see the transmission and remain silent. In this way the collision is not only handled, but also prevented from happening again. The performance of Ethernet is usually specified in terms of the bit rate at which the cabling runs. However, this rate is academic because it is not available all the time. In a real network bit rate is lost by the need to send headers and error-correction codes and by the loss of time due to interframe spaces and collision handling. As the demand goes up, the number of collisions increases and throughput goes down. Collision-based arbitrators do not handle congestion well.
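In practice Ethernet refines the random delay into truncated binary exponential back-off, in which the range from which the random delay is drawn doubles after each successive collision, so the expected delay automatically grows with congestion. A minimal sketch, with an assumed slot time:

```python
import random

SLOT_TIME_US = 51.2  # contention slot on classic 10 Mbps Ethernet

def backoff_delay(collision_count):
    """Truncated binary exponential back-off.

    After the nth successive collision a station waits a random
    number of slot times drawn from 0 .. 2**min(n, 10) - 1, so two
    stations rarely pick the same value twice in a row.
    """
    slots = random.randrange(2 ** min(collision_count, 10))
    return slots * SLOT_TIME_US
```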

An alternative method of arbitration developed by IBM is shown in Figure 10.63. This is known as a token ring system. All the nodes have an input and an output and are connected in a ring that must be complete for the system to work. Data circulate in one direction only. If data are not addressed to a node that receives them, the data will be passed on. When the data arrive at the addressed node, that node will capture the data as well as passing them on with an acknowledge symbol added. Thus the data packet travels right around the ring back to the sending node. When the sending node receives the acknowledge, it will transmit a token packet. This token packet passes to the next node, which will pass it on if it does not wish to transmit.

image

FIGURE 10.63

In a token ring system only the node in possession of the token can transmit, so collisions are impossible. In very large rings the token circulation time causes loss of throughput.

If no device wishes to transmit, the token will circulate endlessly. However, if a device has data to send, it simply waits until the token arrives again and captures it. This node can now transmit data in the knowledge that there cannot be a collision because no other node has the token. In simple token ring systems, the transmitting node transmits idle characters after the data packet has been sent in order to maintain synchronisation. The idle character transmission will continue until the acknowledge arrives. In the case of long packets the acknowledge will arrive before the packet has all been sent and no idle characters are necessary. However, with short packets idle characters will be generated. These idle characters use up ring bandwidth.

Later token ring systems use ETR (early token release). After the packet has been transmitted, the sending node sends a token straight away. Another node wishing to transmit can do so as soon as the current packet has passed.

It might be thought that the nodes on the ring would transmit in their physical order, but this is not the case because a priority system exists. Each node can have a different priority if necessary. If a high-priority node wishes to transmit, when a packet from elsewhere passes through that node, the node will set reservation bits with its own priority level. When the sending node finishes and transmits a token, it will copy that priority level into the token. In this way nodes with a lower priority level will pass the token on instead of capturing it. The token will ultimately arrive at the high-priority node.
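The reservation mechanism can be modelled in a few lines. The sketch below is illustrative only (the class and field names are invented): a waiting node raises the reservation level in passing frames, and a token stamped with that level is passed on by any node of lower priority.

```python
from dataclasses import dataclass

@dataclass
class Token:
    priority: int = 0     # minimum priority needed to capture this token
    reservation: int = 0  # highest priority requested by a waiting node

class Node:
    def __init__(self, priority):
        self.priority = priority
        self.wants_to_send = False

    def on_passing_frame(self, frame_token):
        # A waiting node sets the reservation bits in frames from
        # elsewhere as they pass through it.
        if self.wants_to_send:
            frame_token.reservation = max(frame_token.reservation, self.priority)

    def on_token(self, token):
        # Only a node at or above the stamped priority may capture;
        # lower-priority nodes pass the token on.
        if self.wants_to_send and self.priority >= token.priority:
            return "capture"
        return "pass"
```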

The token ring system has the advantage that it does not waste throughput with collisions and so the full capacity is always available. However, if the ring is broken the entire network fails.

In Ethernet the performance is degraded by the number of transactions, not the number of nodes, whereas in token ring the performance is degraded by the number of nodes.

FIREWIRE

FireWire [10] is actually an Apple Computer, Inc., trade name for the interface formally known as IEEE 1394-1995. It was originally intended as a digital audio network, but grew out of all recognition. FireWire is more than just an interface, as it can be used to form networks and, if used with a computer, effectively extends the computer's data bus. Figure 10.64 shows that devices are simply connected together in any combination of daisy-chain or star network.

image

FIGURE 10.64

FireWire supports radial (star) or daisy-chain connection. Two-port devices pass on signals destined for a more distant device—they can do this even when powered down.

Any pair of devices can communicate in either direction, and arbitration ensures that only one device transmits at a time. Intermediate devices simply pass on transmissions. This can continue even if the intermediate device is powered down, as the FireWire cable carries power to keep the repeater functions active. Communications are divided into cycles, which have a period of 125 μs. During a cycle there are 64 time slots. During each time slot, any one node can communicate with any other, but in the next slot a different pair of nodes may communicate. Thus FireWire is best described as a TDM (time-division multiplexed) system. There will be a new arbitration between the nodes for each cycle.

FireWire is eminently suitable for video/computer convergent applications because it can simultaneously support asynchronous transfers of non-real-time computer data and isochronous transfers of real-time audio/video data. It can do this because the arbitration process allocates a fixed proportion of slots for isochronous data (about 80 percent) and these have a higher priority in the arbitration than the asynchronous data. The higher the data rate a given node needs, the more time slots it will be allocated. Thus a given bit rate can be guaranteed throughout a transaction; a prerequisite of real-time A/V data transfer.
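A toy model of the cycle budget illustrates the guarantee. The figures below follow the text (125 μs cycles, roughly 80 percent reserved for isochronous traffic); everything else, including the allocation policy, is an invented simplification of the real arbitration.

```python
CYCLE_US = 125.0    # FireWire cycle period
ISO_FRACTION = 0.8  # approximate share reserved for isochronous data

def allocate_cycle(iso_requests_us):
    """Grant isochronous time first; asynchronous data gets the rest.

    iso_requests_us maps node -> requested time per cycle in microseconds.
    Returns (grants, async_time_us).
    """
    budget = CYCLE_US * ISO_FRACTION
    grants = {}
    for node, request in sorted(iso_requests_us.items()):
        granted = min(request, budget)
        grants[node] = granted
        budget -= granted
    async_time = CYCLE_US - sum(grants.values())
    return grants, async_time
```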

It is the sophistication of the arbitration system that makes FireWire remarkable. Some of the arbitration is in hardware at each node, but some is in software that only needs to be at one node. The full functionality requires a computer, somewhere in the system, which runs the isochronous bus management arbitration. Without this only asynchronous transfers are possible. It is possible to add or remove devices whilst the system is working. When a device is added the system will recognize it through a periodic learning process. Essentially every node on the system transmits in turn so that the structure becomes clear.

The electrical interface of FireWire is shown in Figure 10.65. It consists of two twisted pairs for signalling and a pair of power conductors. The twisted pairs carry differential signals of about 220 mV swinging around a common-mode voltage of about 1.9 V, with an impedance of 112 ohms.

Figure 10.66 shows how the data are transmitted. The host data are simply serialized and used to modulate twisted pair A. The other twisted pair (B) carries a signal called a strobe, which is the exclusive-OR of the data and the clock. Thus whenever a run of identical bits results in no transitions in the data, the strobe signal will carry transitions. At the receiver another exclusive-OR gate adds data and strobe to re-create the clock.

image

FIGURE 10.65

FireWire uses twin twisted pairs and a power pair.

image

FIGURE 10.66

The strobe signal is the X-OR of the data and the bit clock. The data and strobe signals together form a self-clocking system.
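The scheme of Figure 10.66 can be expressed at the bit level. In the sketch below (illustrative, with the bit clock modelled as a square wave that toggles every bit period), the strobe is the exclusive-OR of data and clock, so exactly one of the two pairs carries a transition in every bit period, and XORing data with strobe at the receiver recovers the clock.

```python
def ds_encode(bits):
    """Data/strobe encoding: strobe = data XOR clock."""
    data_line, strobe_line = [], []
    clock = 0
    for bit in bits:
        data_line.append(bit)
        strobe_line.append(bit ^ clock)
        clock ^= 1  # bit clock toggles every bit period
    return data_line, strobe_line

def recover_clock(data_line, strobe_line):
    """At the receiver, data XOR strobe re-creates the bit clock."""
    return [d ^ s for d, s in zip(data_line, strobe_line)]
```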

This signalling technique is subject to skew between the two twisted pairs and this limits cable lengths to about 10 m between nodes. Thus FireWire is not a long-distance interface technique; instead it is very useful for interconnecting a large number of devices in close proximity. Using a copper interconnect, FireWire can run at 100, 200, or 400 Mbps, depending on the specific hardware.

BROADBAND NETWORKS AND ATM

Broadband ISDN (B-ISDN) is the successor to N-ISDN, and in addition to offering more bandwidth, it gives practical solutions to the delivery of any conceivable type of data. The flexibility with which ATM operates means that intermittent, or one-off, data transactions that require only asynchronous delivery can take place alongside isochronous MPEG video delivery. This is known as application independence, whereby the sophistication of isochronous delivery does not raise the cost of asynchronous data. In this way, generic data, video, speech, and combinations thereof can co-exist.

ATM is multiplexed, but it is not time-division multiplexed. TDM is inefficient because if a transaction does not fill its allotted bandwidth, the capacity is wasted. ATM does not offer fixed blocks of bandwidth, but instead allows each transaction virtually any required bandwidth. This is done by converting all host data into small, fixed-size cells at the adaptation layer. The greater the bandwidth needed by a transaction, the more cells per second are allocated to that transaction. This approach is superior to the fixed-bandwidth approach because, if the bit rate of a particular transaction falls, the cells released can be used for other transactions, so that the full bandwidth is always available.
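The cell arithmetic behind this is straightforward. Each cell carries a 48-byte payload (plus a 5-byte header), so the bandwidth of a transaction is simply its cell rate; a hedged illustration:

```python
HEADER_BYTES = 5
PAYLOAD_BYTES = 48  # every ATM cell carries exactly 48 payload bytes

def cells_per_second(payload_bit_rate):
    """Cell rate needed for a given payload bit rate.

    For example, a 4 Mbps MPEG stream needs about 10,417 cells per
    second; if its bit rate falls, the freed cell slots immediately
    become available to other transactions.
    """
    return payload_bit_rate / (PAYLOAD_BYTES * 8)
```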

As all cells are identical in size, a multiplexer can assemble cells from many transactions in an arbitrary order. The exact order is determined by the quality of service required, where the time positioning of isochronous data would be determined first, with asynchronous data filling the gaps.

Figure 10.67 shows how a broadband system might be implemented. The transport network would typically be optical-fibre based, using SONET (synchronous optical network) or SDH (synchronous digital hierarchy). These standards differ in minor respects. Figure 10.68 shows the bit rates available in each. Lower bit rates will be used in the access networks, which will use different technologies, such as xDSL. SONET and SDH assemble ATM cells into a structure, known as a container, in the interests of efficiency. Containers are passed intact between exchanges in the transport network. The cells in a container need not belong to the same transaction; they simply need to be going the same way for at least one transport network leg.

image

FIGURE 10.67

Structure and terminology of a broadband network. See text.

image

FIGURE 10.68

Bit rates available in SONET and SDH.

The cell-routing mechanism of ATM is unusual and deserves explanation. In conventional networks, a packet must carry the complete destination address so that at every exchange it can be routed closer to its destination. The exact route by which the packet travels cannot be anticipated, and successive packets in the same transaction may take different routes. This is known as a connectionless protocol. In contrast, ATM is a connection-oriented protocol. Before data can be transferred, the network must set up an end-to-end route. Once this is done, the ATM cells do not need to carry a complete destination address. Instead they need carry only enough addressing for an exchange or switch to distinguish between all the expected transactions. This is effectively compression applied to the address space.

image

FIGURE 10.69

(a) The ATM cell carries routing information in the header. (b) ATM paths carrying a group of channels can be switched in a virtual path switch. (c) Individual channel switching requires a virtual channel switch, which is more complex and causes more delay.

The end-to-end route is known as a virtual channel, which consists of a series of virtual links between switches. The term “virtual channel” is used because the system acts like a dedicated channel even though physically it is not. When the transaction is completed the route can be dismantled so that the bandwidth is freed for other users. In some cases, such as delivery of a TV station's output to a transmitter, or as a replacement for analog cable TV, the route can be set up continuously to form what is known as a permanent virtual channel.

The addressing in the cells ensures that all cells with the same address take the same path, but owing to the multiplexed nature of ATM, a completely different routing scheme may exist at other times and for other cells. Thus the routing structure for a particular transaction always passes cells by the same route, but the next cell may belong to another transaction and have a different address, causing it to be routed in another way.

The addressing structure is hierarchical. Figure 10.69a shows the ATM cell and its header. The cell address is divided into two fields, the virtual channel identifier and the virtual path identifier. Virtual paths are logical groups of virtual channels that happen to be going the same way. An example would be the output of a video-on-demand server travelling to the first switch. The virtual path concept is useful because all cells in the same virtual path can share the same container in a transport network. A virtual path switch, shown in Figure 10.69b, can operate at the container level, whereas a virtual channel switch (c) would need to dismantle and reassemble containers.

When a route is set up, a table is created at each switch. When a cell is received at a switch, the VPI (virtual path identifier) and/or VCI (virtual channel identifier) codes are looked up in the table and used for two purposes. First, the configuration of the switch is obtained, so that this switch will correctly route the cell; second, the VPI and/or VCI codes may be updated so that they correctly control the next switch. This process repeats until the cell arrives at its destination. To set up a path, the initiating device will initially send cells containing an ATM destination address, the bandwidth, and the quality of service required. The first switch will reply with a message containing the VPI/VCI codes that are to be used for this channel. The message from the initiator will propagate to the destination, creating lookup tables in each switch. At each switch the logic will add the requested bandwidth to the existing bandwidth in use to check that the requested quality of service can be met. If this succeeds for the whole channel, the destination will reply with a connect message, which propagates back to the initiating device as confirmation that the channel has been set up.
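The label-swapping behaviour of a switch can be sketched directly from this description. The code below is an invented, minimal model (real switches do this in hardware on fixed-format headers): the set-up phase installs table entries, and each arriving cell has its VPI/VCI looked up, rewritten for the next switch, and forwarded on the stored output port.

```python
class AtmSwitch:
    """Minimal model of VPI/VCI label swapping."""

    def __init__(self):
        self.table = {}  # (in_port, vpi, vci) -> (out_port, new_vpi, new_vci)

    def set_up(self, in_port, vpi, vci, out_port, new_vpi, new_vci):
        # Installed once, at call set-up time.
        self.table[(in_port, vpi, vci)] = (out_port, new_vpi, new_vci)

    def route(self, in_port, cell):
        # Look up the labels, rewrite them for the next switch,
        # and forward the cell on the configured output port.
        out_port, new_vpi, new_vci = self.table[(in_port, cell["vpi"], cell["vci"])]
        cell["vpi"], cell["vci"] = new_vpi, new_vci
        return out_port, cell
```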

The connect message contains a unique call reference value, which identifies this transaction. This is necessary because an initiator such as a file server may be initiating many channels and the connect messages will not necessarily return in the same order as the set-up messages were sent. The last switch will confirm receipt of the connect message to the destination and the initiating device will confirm receipt of the connect message to the first switch.

ATM AALs

ATM works by dividing all real data messages into cells of 48 bytes each. At the receiving end, the original message must be re-created. This can take many forms. Figure 10.70 shows some possibilities. The message may be a generic data file having no implied timing structure, or a serial bitstream with a fixed clock frequency, the latter being known as UDT (unstructured data transfer). It may be a burst of data bytes from a TDM system.

image

FIGURE 10.70

Types of data that may need adapting to ATM: a generic data file having no timebase, a constant-bit-rate serial data stream, audio/video data requiring a timebase, compressed A/V data with fixed bit rate, and compressed A/V data with variable bit rate.

image

FIGURE 10.71

The ATM adaptation layer has two sublayers: segmentation and reassembly, which divides data into cells for transport and reassembles the original data format, and convergence, which recovers the timing of the original data.

The adaptation layer in ATM has two sublayers, shown in Figure 10.71. The first is the SAR (segmentation and reassembly) sublayer, which must divide the message into cells and rebuild it to get the binary data right. The second is the CS (convergence sublayer), which recovers the timing structure of the original message. It is this feature that makes ATM so appropriate for the delivery of audio/visual material. Conventional networks such as the Internet do not have this ability.

To deliver a particular quality of service, the adaptation layer and the ATM layer work together. Effectively the adaptation layer will place constraints on the ATM layer, such as cell delay, and the ATM layer will meet those constraints without needing to know why. Provided the constraints are met, the adaptation layer can rebuild the message. The variety of message types and timing constraints leads to the adaptation layer having a variety of forms.

The adaptation layers that are most relevant to MPEG applications are AAL-1 and AAL-5. AAL-1 is suitable for transmitting MPEG-2 multiprogram transport streams at constant bit rate and is standardised for this purpose in ETS 300814 for DVB application. AAL-1 has an integral FEC (forward error correction) scheme. AAL-5 is optimized for SPTS (single-program transport streams) at a variable bit rate and has no FEC.

AAL-1 takes as an input the 188-byte transport stream packets that are created by a standard MPEG-2 multiplexer. The transport stream bit rate must be constant but it does not matter if statistical multiplexing has been used within the transport stream.

AES 47

AES 47 is a standard designed to facilitate transmission of digital audio over ATM. It supports multiple channels of uncompressed AES/EBU digital audio and transparently carries the entire AES/EBU bitstream. Using the networking techniques explained in this chapter to the full, it exploits the bandwidth reservation technology of the ATM Quality of Service mechanism to ensure synchronous audio sample delivery with low latency.

Multiple AES/EBU channels can be carried, but they do not need to have the same sampling rate. AES 53 describes how time stamps (see MPEG Packets and Time Stamps) can be used to ensure isochronous reception.

The isochronous capability of AES 47 means that the master audio sampling clock used at the data source is re-created at the destination. For real-time use, the destination master sampling clock must be synchronous with the source clock. This may be achieved using a common reference available to both sites, such as GPS. Alternatively, if the source and destination clocks remain unlocked, a sampling rate convertor may be employed, although this will destroy the transparency of the link. Accordingly, some equipment will instead drop or repeat samples during silent or very quiet passages to avoid rate conversion.

The Reed–Solomon FEC of AAL-1 uses a code word size of 128, so that the code words consist of 124 bytes of data and four bytes of redundancy, making 128 bytes in all. Thirty-one 188-byte packets are restructured into this format. The 128-byte code words are then subject to a block interleave. Figure 10.72 shows that 47 such code words are assembled in rows in RAM and then columns are read out. These columns are 47 bytes long and, with the addition of an AAL header byte, make up a 48-byte ATM packet payload. In this way the interleave block is transmitted in 128 ATM cells. The result of the FEC and interleave is that the loss of up to four cells in 128 can be corrected, or a random error of up to two bytes can be corrected in each cell. This FEC system allows most errors in the ATM layer to be corrected so that no retransmissions are needed. This is important for isochronous operation.
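The interleave itself is a simple row/column transpose. The sketch below follows the figures given in the text (47 code words of 128 bytes, read out as 47-byte columns, each completed to a 48-byte payload by the AAL header byte); it is illustrative only.

```python
ROWS, COLS = 47, 128  # 47 Reed-Solomon (128,124) code words per block

def interleave(code_words):
    """Write 47 code words as rows, read out 128 columns of 47 bytes.

    Each column fills one ATM cell payload (with the AAL header byte),
    so a whole interleave block occupies 128 cells and the loss of one
    cell costs only a single byte per code word.
    """
    assert len(code_words) == ROWS and all(len(w) == COLS for w in code_words)
    return [bytes(code_words[r][c] for r in range(ROWS)) for c in range(COLS)]
```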

The AAL header has a number of functions. One of these is to identify the first ATM cell in the interleave block of 128 cells. Another is to run a modulo-8 cell counter to detect missing or out-of-sequence ATM cells. If a cell simply fails to arrive, the sequence jump can be detected and used to flag the FEC system so that it can correct the missing cell by erasure (see Chapter 8). In a manner similar to the use of the program clock reference in MPEG, AAL-1 embeds a timing code in its cell headers. This is called the synchronous residual time stamp (SRTS) and, in conjunction with the ATM network clock, allows the receiving AAL device to reconstruct the original data bit rate. This is important because in MPEG applications it prevents the PCR jitter specification from being exceeded.
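The modulo-8 count makes loss detection a one-line calculation; a minimal sketch (the function name is invented):

```python
def lost_cells(sequence_counts):
    """Count missing cells from the modulo-8 sequence counter.

    A jump of n in the counter implies n - 1 cells were lost; the
    positions can then be flagged to the FEC as erasures.
    """
    lost = 0
    for prev, cur in zip(sequence_counts, sequence_counts[1:]):
        lost += (cur - prev - 1) % 8
    return lost
```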

image

FIGURE 10.72

The interleave structure used in AAL-1.

image

FIGURE 10.73

The AAL-5 adaptation layer can pack MPEG transport packets in this way.

In AAL-5 there is no error correction and the adaptation layer simply reformats MPEG transport stream blocks into ATM cells. Figure 10.73 shows one way in which this can be done. Two transport stream blocks of 188 bytes are associated with an eight-byte trailer known as CPCS (common part convergence sublayer). The presence of the trailer makes a total of 384 bytes that can be carried in eight ATM cells. AAL-5 does not offer constant delay, and external buffering will be required, controlled by reading the MPEG PCRs to reconstruct the original time axis.
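The packing arithmetic works out exactly, as a short sketch shows. This is an illustration only; the real CPCS trailer carries fields such as a length and a CRC, which are zero-filled here for simplicity.

```python
TS_PACKET = 188
TRAILER = 8        # CPCS trailer (length and CRC in reality; zeroed here)
CELL_PAYLOAD = 48

def pack_aal5(ts_packets):
    """Pack pairs of 188-byte transport packets plus an 8-byte trailer
    into ATM cell payloads: (2 * 188 + 8) / 48 = exactly 8 cells."""
    cells = []
    for i in range(0, len(ts_packets) - 1, 2):
        pdu = ts_packets[i] + ts_packets[i + 1] + bytes(TRAILER)
        cells += [pdu[j:j + CELL_PAYLOAD] for j in range(0, len(pdu), CELL_PAYLOAD)]
    return cells
```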

References

1. SMPTE 259M, 10-bit 4:2:2 Component and 4Fsc NTSC Composite Digital Signals—Serial Digital Interface.

2. SMPTE 292M, Bit-Serial Digital Interface for High Definition Television Systems.

3. EBU Doc. Tech. 3246.

4. SMPTE 125M, Television—Bit Parallel Digital Interface—Component Video Signal 4:2:2.

5. SMPTE 305M, Serial Data Transport Interface.

6. EIA RS-422A. Electronic Industries Association, 2001 Eye Street NW, Washington, DC 20006, USA.

7. Smart, D.L. Transmission performance of digital audio serial interface on audio tie lines. BBC Designs Department Technical Memorandum 3.296/84.

8. European Broadcasting Union. Specification of the digital audio interface. EBU Doc. Tech. 3250.

9. Rorden, B., and Graham, M. A proposal for integrating digital audio distribution into TV production. J. SMPTE, 606–608 (1992).

10. Wickelgren, I.J. The facts about FireWire. IEEE Spectrum, 19–25 (1997).
