4

Dedicated Audio Interfaces

For the purposes of this book, digital audio interfaces will be divided into two types. This chapter is concerned with those that are dedicated point-to-point audio interfaces (e.g. AES/EBU), designed specifically to carry audio and little else. The next chapter deals with those that are protocols running over general purpose data networks or are computer interfaces that can also carry other data (e.g. USB).

Some interfaces have been internationally standardized whereas others are associated principally with one manufacturer. Some interfaces of the latter type have become widely used by other manufacturers, either because no alternative existed at the time or in order to provide compatibility between devices. It is important to distinguish de facto standards, which have arisen because of commercial predominance, from the standards formulated by independent international bodies. Proprietary interfaces may be the subject of licences or patents, although most agree that wide adoption is beneficial for all concerned. A further subdivision of interface types is also useful, and that is between interfaces carrying one or two channels of audio data and those carrying a large number of channels.

Naturally, since standards are published and widely available, these chapters should be viewed as commentaries upon or illuminations of the published documents, together with guidelines on their implementation and discussions of the problems involved when attempting to interconnect devices digitally. Although some details of standards will be given here, readers are encouraged to read the information contained herein in conjunction with the standards documents themselves (see the references at the end of each chapter), and to note any additions or modifications to the standards which may have arisen since this book was written. As this book is designed to aid understanding of standards in real situations it is not worded like a standards document and therefore does not use the official language of such documents. It is not a substitute for the standards themselves and any intending implementer would be wise not to rely solely on the text and diagrams contained herein.

4.1  Background to Dedicated Audio Interfaces

When digital audio interfaces were first introduced it was assumed that digital audio signals would need to be carried between devices on connections similar to those used for analog signals. In other words, there would need to be individual point-to-point connections between each device, carrying little other than audio and using standard connectors and cables. The AES3 interface, as described below, is a good example of such an interface. It was intended to be used in as similar a way as possible to the method used for connecting pieces of analog audio equipment together. It used XLR connectors with relatively standard cables so that existing installations could be converted to digital applications. In practice, the only significant difference was that a single connection carried two channels instead of one. Such dedicated interfaces carry one or more channels of audio data, normally sample-locked to the transmitting device’s sampling rate, and operate in a real-time ‘streaming’ fashion. They generally do not operate using an addressing structure, and so are normally used for connections between a single transmitting device and a single receiving device (hence ‘point-to-point’). This method of interconnection is still in wide use today and is likely to continue to be so for some time, but the increasing ubiquity of high-speed data networks and computer-based audio systems is likely to have an increasing effect on the way audio is carried as time goes by (see the next chapter).

4.2  Background to Internationally Standardized Interfaces

Within the audio field the Audio Engineering Society (AES) has been a lead body in determining digital interconnect standards. Although the AES is a professional society, and not a standards body as such, its recommendations first published in the AES3-1985 document [1] have formed the basis for many international standards documents concerning a two-channel digital audio interface. The Society has been instrumental in coordinating professional equipment manufacturers’ views on interface standards although it has tended to ignore consumer applications to some extent, preferring to leave those to the IEC (see below). A consumer interface was initially developed by Sony and Philips, subsequently to be standardized by the IEC and EIAJ, and as a result there are many things in common between the professional and consumer implementations. Before setting out to describe the international standard two-channel interface it is important to give a summary of the history of the standard, since it will then be realized how difficult it is to call this interface by one definitive title.

Other organizations that based standards on AES3 recommendations were the American National Standards Institute (ANSI), the European Broadcasting Union (EBU), the International Radio Consultative Committee (CCIR) (now the ITU-R), the International Electrotechnical Commission (IEC), the Electronic Industries Association of Japan (EIAJ) and the British Standards Institution (BSI). Each of these organizations formulated a document describing a standard for a two-channel digital audio interface, and although these documents were all very similar there were often also either subtle or (in some cases) not-so-subtle differences between them. As time has gone by some of the most glaring anomalies have been addressed. A useful overview of the evolutionary process that resulted in each of these standards may be found in Finger [2]. The documents concerned were as follows: AES3-1985 [1], ANSI S4.40-1985, EBU Tech. 3250-E (1985) [3], CCIR Rec. 647 (1986) [4], CCIR Rec. 647 (1990) [5], IEC 958 (1989) [6] (with subsequent annexes), EIAJ CP-340 (1987) [7], EIAJ CP-1201 (1992) [8], and BS 7239 (1989) [9].

As mentioned above, the roots of the consumer format interface were in a digital interface implemented by Sony and Philips for the CD system in 1984. This interface was modelled on the data format of AES3, but used different electrical characteristics (see section 4.3.4) and is often called the SPDIF (Sony–Philips Digital Interface). Although audio data was in the same format as AES3, there were significant differences in the format of non-audio data. In 1987 the EIAJ CP-340 standard combined professional (Type I) and consumer (Type II) versions of the interface within one document and included specifications for non-audio data which aimed to ensure compatibility between different consumer devices such as DAT and CD players. This interface was by no means identical to the original SPDIF and led to some differences between early CD players and later digital devices conforming to CP-340. CP-340 has now been renumbered and is called CP-1201.

Slightly later than CP-340 the IEC produced a document that eventually appeared in 1989 as IEC 958. The consumer version (with its subsequent annexes) was an extension of the SPDIF to allow for wider applications than just the CD, and the professional version was essentially the same as AES3. It also allowed for, but did not describe in detail, an optical connection. As will be seen later, interpreting this standard to the letter seemingly allowed the manufacturer to combine either consumer or professional data formats with either ‘consumer’ or ‘professional’ electrical interfaces. It did not originally state that a particular electrical interface had to be used in conjunction with a particular data format, although the situation was made clearer in the revised IEC 60958 standard [10]. Some confusion therefore existed in the industry over whether consumer devices could be interconnected with professional and vice versa, and the answer to this problem is by no means straightforward, as will be discussed.

Concerning key similarities and differences between the other documents listed above, one should note that EBU Tech. 3250-E is a professional standard, the only effective difference from AES3 being the insistence on the use of transformer coupling. Tech. 3250-E was revised in 1992 to define more aspects of the channel status bits, to allow a speech quality coordination channel in the auxiliary bits, to give an improved electrical specification and to specify which aspects of the interface should be implemented in standard broadcast equipment. CCIR Rec. 647 was a professional standard that did not insist on transformers, but was otherwise similar to the EBU standard. The 1990 revision contained some further definitions of certain non-audio bits, including use of the auxiliary bits for a low quality coordination channel (see section 4.5). CCIR Rec. 647 became ITU-R BS.647 (1992) when the CCIR was reborn as the ITU-R. BS 7239 was identical to IEC 958. ANSI S4.40 was identical to AES3. (The formal relationship between ANSI and AES standards has recently been broken.)

It may reasonably be concluded from the foregoing discussion that a device which claimed conformity to AES3, ANSI S4.40, EBU 3250 or CCIR 647 would have been a professional device, but that one which claimed conformity to IEC 958, EIAJ CP-340, CP-1201 or BS 7239 could have been either consumer or professional. It was conventional in the latter case to specify Type I or II, for professional or consumer. (In modern IEC nomenclature, the consumer application is defined in 60958-3 and the professional application is in 60958-4.) Because it will be necessary to refer to differences between these standards in the following text, the cumbersome but necessary term ‘standard two-channel interface’ will be used wherever generalization is appropriate.

Some revisions of the original documents have taken place, the most important of which will be found in AES3-1992 (with revisions in 1997 and amendments up to 1999) [11] and IEC 60958 (superseding IEC 958 and covering general specifications, software delivery mode, consumer and professional applications in four parts). Because these two standards are the most comprehensive and form the primary focal points for international standards activity relating to two-channel interfaces, most of the following discussion will deal primarily with AES3 for professional interfaces and IEC 60958 for all other purposes. (The British Standard now follows the IEC standard directly in content and nomenclature, being denoted BS EN 60958 (2000).)

Owing mainly to the efforts of four UK companies, a further standard was devised to accommodate up to 56 audio channels. Originally called MADI (Multichannel Audio Digital Interface), it is based on the AES3 data format and has now been standardized as AES10-1991 [12]. It also appears as an American Standard: ANSI S4.43-1991. This is only a professional interface and is covered further in section 4.11.

4.3  Standard Two-Channel Interface – Principles

Common to all the international standards for a two-channel interface is the data format of the subframe containing samples of audio data for each channel. There are two principal electrical approaches used for the standard two-channel interface: one is unbalanced and uses relatively low voltages, the other is balanced and uses higher voltages. AES-3id-2001 also describes an unbalanced coaxial link for use over distances beyond 100 m (see section 4.3.6).

4.3.1 Data Format

The interface is serial and self-clocking. That is to say that two channels of audio data are carried in a multiplexed fashion over the same communications channel, and the data is combined with a clock signal in such a way that the clock may be extracted at the receiver and used to synchronize reception. As shown in Figure 4.1, one frame of data is divided into two subframes, handling channels 1 and 2 respectively. Channels 1 and 2 may be independent mono signals or they may be the left and right channels of a stereo pair, and they are separately identified by the preamble that takes up the first four clock periods of each subframe. Samples of channels 1 and 2 are transmitted alternately and in real time, such that two subframes are transmitted within the time period of one audio sample – thus the data rate of the interface depends on the prevailing audio sampling rate.

Figure 4.1 Format of the standard two-channel interface frame.

The subframe format consists of a sync preamble, four auxiliary bits (which may be used for additional audio resolution), 20 audio sample bits in linear two’s complement form, a validity bit (V), a user bit (U), a channel status bit (C) and a parity bit (P). The audio data is transmitted least significant bit first, and any unused LSBs are set to zero; thus the MSB of the audio sample, whatever the resolution, is always in the MSB position. The remaining non-audio bits are discussed in later sections.
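This layout is easy to illustrate in code. The following Python sketch assembles the 28 data time slots of one subframe (aux nibble, 20 audio bits LSB first, then V, U, C and even parity); the four-slot sync preamble is omitted because it is a deliberate channel-code violation rather than data. The function name and structure are illustrative, not taken from the standard.

```python
def build_subframe(sample, bits=20, aux=0, v=0, u=0, c=0):
    """Assemble the 28 data time slots of a subframe (slots 4-31).

    Returns a list of 0/1 values: four aux bits, twenty audio bits
    (LSB first), then V, U, C and an even-parity bit. The four-slot
    sync preamble is omitted: it violates the channel code by design
    and carries no data.
    """
    # Two's complement audio, MSB-justified so any unused LSBs stay zero
    word = (sample & ((1 << bits) - 1)) << (20 - bits)
    slots = [(aux >> i) & 1 for i in range(4)]       # aux nibble, LSB first
    slots += [(word >> i) & 1 for i in range(20)]    # audio data, LSB first
    slots += [v, u, c]
    slots.append(sum(slots) & 1)   # P: force an even number of ones overall
    return slots

# A 16-bit sample occupies the top 16 of the 20 audio slots:
slots = build_subframe(-12345, bits=16)
assert len(slots) == 28 and sum(slots) % 2 == 0
```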

The data is combined with a clock signal of twice the bit rate using a simple coding scheme known as bi-phase mark, in which a transition is caused to occur at the boundary of each bit cell (see Figure 4.2). An additional transition is also introduced in the middle of any bit cell that is set to binary state ‘1’. Such a scheme eliminates almost all DC content from the signal, making it possible to use transformer coupling if necessary and allowing for phase inversion of the data signal. (It is only the transition that matters, not the direction of the transition.) This channel code is the same as that used for SMPTE/EBU timecode.

Figure 4.2 An example of the bi-phase mark channel code.
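As a rough illustration of the channel code, here is a minimal Python encoder for bi-phase mark. It returns the logic level of each half-cell; a decoder cares only about where transitions occur, not about absolute polarity, which is why phase inversion is harmless.

```python
def biphase_mark(bits, level=0):
    """Bi-phase mark: a transition at every bit-cell boundary, plus a
    mid-cell transition for each '1'. Returns half-cell levels."""
    half_cells = []
    for bit in bits:
        level ^= 1                 # mandatory transition at cell boundary
        half_cells.append(level)
        if bit:
            level ^= 1             # extra mid-cell transition encodes '1'
        half_cells.append(level)
    return half_cells

# Inverting the starting level flips every output level but preserves
# the transition pattern, so the data survives phase inversion:
a, b = biphase_mark([0, 1, 1, 0], 0), biphase_mark([0, 1, 1, 0], 1)
assert all(x != y for x, y in zip(a, b))
```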

As shown in Figure 4.3, there are three possible subframe preambles in time slots 1 to 4 which violate the rules of the modulation scheme in order to provide a clearly recognizable sync point when the data is decoded. These preambles cannot be confused with the data portion of the subframe. In AES3 these are called ‘X’, ‘Y’ and ‘Z’ preambles, but in IEC 60958 they are primarily labelled ‘M’, ‘W’ and ‘B’. As the diagram shows, X and Y preambles identify subframes of channels 1 and 2 respectively, whereas the Z preamble occurs once every 192 frames in place of the X preamble in order to mark the beginning of a new channel status block (see section 4.8). Since the parity bit which ends the previous subframe is ‘even parity’, the transition at the start of each preamble will always be in the same (positive) direction, but a phase inverted preamble must still be decoded properly.

Figure 4.3 Three different preambles (X, Y and Z) are used to synchronize a receiver at the starts of subframes.

The parity bit is set such that the number of ones in the subframe, excluding the preamble, is even, and thus it may be used to detect single bit errors but not correct them. Such a parity scheme cannot detect an even number of errors in the subframe, since parity would appear to be correct in this case. As discussed in section 6.9 there are more effective ways of detecting poor links than using the parity bit.
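A short sketch makes the limitation concrete: flipping one bit breaks even parity and is detected, while flipping a second bit restores it and goes unnoticed.

```python
def parity_ok(slots):
    """Even parity across the 28 data slots of a subframe (preamble excluded)."""
    return sum(slots) % 2 == 0

good = [0] * 28                      # trivially even parity
single = good[:]; single[5] ^= 1     # one error: parity now odd, detected
double = single[:]; double[9] ^= 1   # second error: parity even again, missed
assert not parity_ok(single) and parity_ok(double)
```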

4.3.2 Audio Resolution

In normal operation only the 20-bit chunk of the subframe is used for audio data. This is adequate for most professional and consumer purposes but the standard allows for the four auxiliary bits to be replaced by additional audio LSBs if necessary, taking the maximum resolution up to 24 bits. AES3-1992 provides a facility within the channel status data for signalling the actual number of audio bits used in the transmitted data, such that receiving equipment may adjust to decode them appropriately. This will be of considerable importance in ensuring optimum transfer of audio quality between devices of different resolutions during post-production, as discussed in section 6.8.

In consumer formats, the category code that describes the source device (see section 4.8.6) may also imply a fixed audio word length, because certain categories only operate at a particular resolution. The Compact Disc, for example, always uses a 16-bit word length.

4.3.3 Balanced Electrical Interface

All the standards referring to a professional or ‘broadcast use’ interface specify a balanced electrical interface conforming to CCITT Rec. V.11 [13]. There are distinct similarities between this and the RS-422A standard [14] but they are not identical, although RS-422 drivers and receivers are used in many cases. Figure 4.4 shows a circuit designed for better isolation and electrical balance than the basic CCITT specification, as suggested in AES3-1992. Although transformers are not a mandatory feature of all the standards, they are advisable because they provide true electrical isolation between devices and help to reduce electromagnetic interference problems. (Manufacturers often connect an RS-422 driver directly between the two legs of the source, which makes it balanced but not floating. Alternatively an RS-485 driver is used, which is a tri-state version of RS-422 giving a typical output voltage of 4 V ±5%, going to a high impedance state when turned off.) The standards specify that the connector to be used is the conventional audio three-pin XLR (IEC 268-12), using pin 1 as the shield and pins 2 and 3 as the balanced data signal. Polarity is not really important, since the channel code is designed to allow phase inversion, although the convention is that pin 2 is ‘+’ and pin 3 is ‘−’.

Figure 4.4 Recommended electrical circuit for use with the standard two-channel interface.

Although the original AES3 standard allowed for up to four receivers to be connected to one transmitter, this is now regarded as inadvisable due to the impedance mismatch which arises. Originally the standard called for the output impedance of the transmitter to be 110 ohms ±20% over the range 0.1 to 6 MHz, and for that of receivers to be 250 ohms, but this has been changed in AES3-1992 so that the receiver’s impedance should now be the same as that of the transmitter and the transmission line. Amendment 3 (1999) modifies the specification to accommodate the increasingly common use of higher frame rates than originally envisaged, as a result of the use of high sampling frequencies such as 96 kHz. The standard now specifies that impedance should be maintained within the defined limits between 100 kHz and 128 times the maximum frame rate. Only one receiver should be connected across each line and distribution amplifiers should be used for feeding large numbers of receivers from a single source. The cable’s characteristic impedance, originally specified as between 90 and 120 ohms, is now specified as 110 ohms as well. It should be a balanced, screened pair, and although standard audio cables are often used successfully it is worthwhile considering cable with better controlled characteristics for large installations and long distances, in order to improve the reliability and integrity of the link (see section 6.7.2). This is especially true when using sampling frequencies above 48 kHz where the selection of cables and maximum lengths will become increasingly critical.

There is a difference in driver voltage levels between the original and later versions of the standard. AES3-1985 and all the related standards specified a peak-to-peak amplitude of between 3 and 10 volts when measured across a 110 ohm resistor without the connecting cable present. The 1992 revision changed it to be between 2 and 7 volts in order to conform more closely to the specifications of the RS-422 driver chips used in many systems. (RS-422A in fact specifies that receiver inputs should not be damaged by voltages of less than 12 volts.)

At the receiving end, the standards all indicate that correct decoding of the data should be possible provided that the eye pattern (see section 3.2.4) of the received data is no worse than shown in Figure 4.5. This suggests a minimum peak-to-peak amplitude of 200 mV and allows for the toleration of a certain amount of jitter in the time domain. Without equalization the balanced interface should be capable of error-free communication over distances of at least 100 m at 48 kHz sampling frequency, and often further. This depends to some extent on the type of cable, the electromagnetic environment, the integrity of the transmission line, the frame rate and the quality of the data recovery in the receiver. One should expect maximum cable lengths to be shorter at high frame rates, all other factors being equal. Receivers vary quite widely in respect of their ability to lock to an unstable data signal which has suffered distortion over the link, and an interconnect which works badly with one receiver may be satisfactory with another. Devices are available which will give some idea of the quality of the received data signal, in order that the user may tell how close the link is to failure (see section 6.9).

Figure 4.5 The minimum eye pattern acceptable for correct decoding of standard two-channel data.

It is possible to equalize the signal at the receiver in order to compensate for high-frequency losses over long links and the standards suggest the curve shown in Figure 4.6 for use at the 48 kHz sampling frequency. It has been suggested [15], though, that as cable lengths increase the loss characteristic approaches a second order curve before problems occur, and that therefore a second order equalization characteristic is often more effective.

Figure 4.6 EQ characteristic recommended by the AES to improve reception in the case of long lines (basic sampling rate).

4.3.4 Unbalanced Electrical Interface

The unbalanced interface described in this section is commonly found on consumer and semi-professional equipment and has become widely used as a stereo interface on computer sound cards, probably because of the compact size of the connector. The unbalanced electrical interface specified originally in IEC 958 and EIAJ CP-340/1201 is not a feature of professional standards such as AES3. IEC 958 did not originally state explicitly that the unbalanced interface was intended for consumer use – it simply called it ‘unbalanced line (two-wire transmission)’ – but operational convention and the origin of the SPDIF interface on which it was based established that the unbalanced two-wire interface, terminating in RCA phono connectors, was for consumer applications. Interestingly, EIAJ CP-340 took the step of noting that the unbalanced interface and the optical fibre interface applied only to Type II transmissions (consumer), although it did not say anything about the balanced interface being only for professional purposes. These confusions are resolved in IEC 60958 which clearly indicates the use of the unbalanced or optical interfaces for consumer applications in Part 3.

The unbalanced interface is shown in Figure 4.7. IEC 60958 (1999) specifies a source impedance of 75 ohms ±20% for this interface, between 0.1 and 6 MHz, and a termination impedance of 75 ohms ±5%. Like AES3, it is being revised to account for higher sampling frequencies and so will in future state an upper limit of 128 times the maximum frame rate. It specifies a characteristic cable impedance of 75 ohms ±35%. The cable is normally a standard audio coaxial cable and this interface is typically used for interconnecting consumer equipment over the sorts of distances involved in hi-fi systems. It does not specify a maximum length over which communication may be expected to be successful but it does give an eye pattern limit for correct decoding and specifies a minimum peak-to-peak input voltage at the receiver of 200 mV (the minimum eye pattern for correct decoding is essentially the same as AES3). A significant difference between this interface and the balanced interface is that the source signal amplitude should be only 0.5 V ±20%, peak-to-peak, which is much lower than the balanced interface. It should be noted, though, that video-type 75 ohm coaxial cable exhibits very low losses below about 10 MHz and so one might expect to be able to cover significant distances without the signal level falling below the minimum specified.

Figure 4.7 The consumer electrical interface (transformer and capacitor are optional but may improve the electrical characteristics of the interface).

It used to be said by some that because the unbalanced interface was a coaxial transmission line with well-controlled impedances it formed a better link than the balanced interface. This was always offset by the advantages of a balanced line in rejecting interference and the higher voltages used in the balanced interface. Now that the balanced interface specifies source and termination impedances to be the same, requires point-to-point connection, and recommends 110 ohm cable (rather than anything between 90 and 120 ohms), the balanced interface has the benefits of a good transmission line as well as its other advantages.

4.3.5 Optical Interface

An optical interface was introduced as a possibility in IEC 958 but was left ‘under consideration’. Surprisingly, perhaps, this still seems to be the case in 60958. It was specified more explicitly in EIAJ CP-340 (or CP-1201) as applying only to Type II data and consisting of a transmitter with a wavelength of 660 nm ±30 nm and a power of between –15 and –21 dBm. Receivers should still correctly interpret the data when the optical input power is –27 dBm. The connector indicated conforms to the specification laid out in EIAJ RCZ-6901.

Typically the optical interface is found in consumer equipment such as DAT recorders, CD players, computer sound cards, stand-alone convertors, and amplifiers with built-in D/A convertors. It usually makes use of an LED transmitter (see section 1.7) and a fibre optic cable, connected to a photodetector in the receiver. The ‘TOSLink’ style of fibre optic interface is popular in consumer equipment, and is driven from a TTL level (0–5 volt) unbalanced source, with a data format identical to that used with the electrical interface. The advantages of optical links in rejecting interference have already been stated in section 1.7 but there are also dangers in using cheap optical interfaces. Their limited bandwidth and high dispersion may actually result in a poorer transmission channel than a normal electrical interface, resulting in a high degree of timing instability in the positions of data transitions (see Chapter 6). For a comprehensive introduction to fibre optics in audio the reader is referred to Ajemian and Grundy [16].

4.3.6 Coaxial Interface

A coaxial method of transmission for the professional AES3 interface, described in AES-3id [17], makes use of 75 ohm video-style coaxial cable to carry digital audio signals over distances up to around 1000 m. A similar but not identical description of this is to be found in SMPTE 276M [18]. A signal level similar to that of a video signal (1 volt) is used, although the signal is not formatted to look like a video waveform (it still uses basically the same bi-phase mark channel code as AES3). Easy conversion is possible between the balanced form of AES3 and this coaxial form and a number of manufacturers make balanced-to-coax adaptors (although the voltage level is much lower after transformation from 1 volt/75 ohms back to 110 ohms than might be expected from a standard AES3 balanced output stage). Some simple conversion networks are illustrated in the information document.

The advantages of this interface include the ease of distribution of audio within a television studio environment, using video distribution amplifiers and cabling, and improved electromagnetic radiation characteristics when compared with the balanced twisted pair, as discussed in Rorden and Graham [19]. Tests of such an interface have been successful, showing in one test that equalized video lines could carry an AES-format digital audio signal over a distance of more than 800 miles (1300 km) without any noticeable corruption.

4.3.7 Multipin Connector

A multipin connector version of AES3 is described in the information document AES2-ID [20] for use in circumstances in which there is not sufficient space for multiple XLR connectors. Such a connector allows multiple channel interfacing without going to the lengths of implementing the MADI standard (see section 4.11) and could be a lower cost solution than MADI where fewer than 56 channels are needed. The described configuration carries 16 channels of audio data on a single 50-pin D-type connector.

4.4  Sampling Rate Related to Data Rate

The standard two-channel interface was originally designed to accommodate digital audio signals with sampling rates between 32 and 48 kHz, with a margin of ±12.5% to allow for varispeed operations. Since the interface carries audio data in real time, normally transferring two audio samples (channel 1 and channel 2) in the time of one sampling period, the data rate of the interface depends on the audio sampling rate. It is normally 64 times the sampling rate, since there are 64 bits in a frame (= two subframes). At a sampling rate of 48 kHz the data rate is 64 times 48 000, which is 3.072 Mb/s, whereas at 32 kHz it is only 2.048 Mb/s. If the source is varispeeded by a certain percentage then the data rate will change by the same percentage, and although it can usually be tracked by a receiver this presents problems in a system where all devices must be locked to a common, fixed sampling frequency reference (see Chapter 6), since the receiver may not change its sampling rate to follow a varispeeded source.
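The relationship is simple enough to express directly; this snippet merely reproduces the arithmetic above (a helper for illustration, not anything defined by the standard).

```python
def interface_bit_rate(fs_hz, varispeed=0.0):
    """64 time slots per frame, one frame per sample period."""
    return 64 * fs_hz * (1.0 + varispeed)

assert interface_bit_rate(48000) == 3_072_000           # 3.072 Mb/s
assert interface_bit_rate(32000) == 2_048_000           # 2.048 Mb/s
assert interface_bit_rate(48000, +0.125) == 3_456_000   # +12.5% varispeed
```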

In recent years there has been a demand for interfaces capable of handling audio at increased sampling frequencies up to 192 kHz. For this reason a situation can arise in which one AES3 interface is used in a single-channel-double-sampling-frequency mode. Here the two subframes within a single AES frame carry successive samples of the same audio channel, making the audio sampling frequency twice the AES frame rate. The sampling frequency indicated in byte 0 of channel status (see below) remains as if the interface was operating at normal sampling frequency, because this is the frame rate of the interface. For example, in this mode a single interface could carry a single channel of audio at 96 kHz sampling frequency whilst continuing to operate at the same data transmission rate as a stereo interface running at 48 kHz. This is a useful alternative to doubling the overall interface transmission rate to accommodate the higher audio sampling frequency (giving rise to the need for better and possibly shorter cables). These modes are indicated in channel status byte 1, described below.

4.5  Auxiliary Data in the Standard Two-Channel Interface

As stated earlier, the four bits of auxiliary data in each subframe may be used for additional LSBs of audio resolution, if more than 20 bits of audio data are needed per sample. Alternatively the auxiliary data may be used to carry information associated with the audio channel. In many items of equipment manufactured to date they remain unused.

It was proposed to the CCIR in 1987 that the aux bits would prove useful for a good voice quality channel which could be used for coordination (talkback) purposes in broadcasting [21]. Typically in a radio broadcast studio the programme source (say a studio) sends a stereo programme to a destination (say a continuity suite) along with a good voice quality link for coordination purposes (see Figure 4.8). A feed of cue programme (normally mono), together with a coordination channel and perhaps additional data, is returned from the destination. It was proposed that in digital studio environments all of these signals could be carried over a single standard two-channel interface in each direction by sampling the coordination voice channel at exactly one-third of the main audio’s sampling frequency and coding it linearly at 12 bits per sample, resulting in a data rate exactly one-fifth that of the main audio channel. (Main audio channel @ 48 kHz, 20 bits; Data rate = 960 000 bits per second; Coordination channel @ 16 kHz, 12 bits; Data rate = 192 000 bits per second.) At such a data rate, a main sampling rate of 48 kHz would allow for a coordination channel bandwidth of about 7 kHz.

Figure 4.8 In broadcasting a coordination link often accompanies the main programme, and cue programme is fed back to the source, also with coordination.

Capacity exists in the two-channel interface for two 12-bit coordination samples, one in channel 1’s subframe (‘A’ coordination) and one in channel 2’s (‘B’ coordination). They are inserted four bits at a time, as shown in Figure 4.9, with the four LSBs of the ‘A’ signal going into the first aux word in the block (designated by the Z preamble), followed by the four LSBs of the ‘B’ signal in the next subframe, and so on for three frames, whereon all 12 bits will have been transmitted for each signal. The process then starts over again. The Z preamble thus acts as a sync point for the coordination channels, and the sampling frequencies of the coordination channels are locked to that of the main audio.

Figure 4.9 The coordination signal is of lower bit rate than the main audio and thus may be inserted in the auxiliary nibble of the interface subframe, taking three subframes per coordination sample.
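The nibble-at-a-time insertion is easy to mimic. This sketch (function names are illustrative) splits one 12-bit coordination sample into the three aux nibbles that would occupy one channel's subframes across three successive frames, least significant nibble first, starting at the frame carrying the Z preamble.

```python
def coordination_nibbles(sample):
    """Split a 12-bit coordination sample into three 4-bit aux nibbles,
    LSBs first, one nibble per frame over three frames."""
    s = sample & 0xFFF
    return [(s >> shift) & 0xF for shift in (0, 4, 8)]

def reassemble(nibbles):
    """Receiver side: rebuild the 12-bit sample from three nibbles."""
    return nibbles[0] | (nibbles[1] << 4) | (nibbles[2] << 8)

assert reassemble(coordination_nibbles(0xABC)) == 0xABC
```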

This arrangement proves very satisfactory because the correct talkback always accompanies a programme signal, the cue programme may be in stereo, and only two hard links are necessary. The user bit channel of the interface can be used to carry any additional control data. In the 1990 revision of CCIR Rec. 647 this modification was taken on board, and forms part of that standard. It was also adopted by the EBU in the 1992 revision of Tech. 3250-E, and appears as an annex to AES3-1992 (for information purposes only). In order to indicate that the aux bits are being used for this purpose, byte 2 of the channel status word (see section 4.8) is used. The first three bits of this byte are set to ‘010’ when a channel’s aux bits carry a coordination signal of this type.

4.6  The Validity (V) Bit

The application and value of the validity bit in each subframe were debated widely during the formulation of standards. Originally the V bit was designed to indicate whether the audio sample in that subframe was ‘valid’, ‘reliable’ or ‘secure and error free’ – in other words, to show either whether it contained valid audio (rather than something else, or nothing), or whether it was in error. It was set to ‘0’ if the sample was reliable, and ‘1’ if unreliable (so really it is an ‘invalidity’ flag). Since it is only a single bit, there is no opportunity for signalling the extent or severity of the error. What has never been clear is what devices should do in the case of an invalid sample, and this is largely left up to the manufacturer. The most common use for the V bit is to signal errors that occurred in the transmitting device, such as when an uncorrectable error is encountered when replaying a recording. But not all devices treat this in the same way, since some signal any off-tape CRC error whether it was corrected or not, whereas others only set the V bit if the error was uncorrectable, resulting in interpolation or even muting in the convertors of most systems (the latter seems the more appropriate solution). But this is not the only use. Interactive CD (CD-I) players, for example, use the V bit to indicate that the audio data part of the subframe has been replaced by non-audio data. This is because otherwise there would be a potential delay of up to one block (192 frames) before the receiving device realized that the channel use had changed from audio to non-audio (this is normally signalled in channel status, which is only updated once per block).

In AES3-1992 the description of this bit was modified and now indicates whether the audio information in the subframe is ‘suitable for conversion to an analog signal’, which may or may not amount to the same thing as before. (A binary ‘1’ still represents the error state.) For example, the V bit may be set if an uncorrectable error arises on a DAT machine which would normally result in error concealment by interpolation rather than muting at the output. Any subsequent device receiving audio data from this machine over the digital interface would see the V bit set true and, if interpreting it literally, would assume that the audio was unsuitable for conversion and mute its output, yet the user might still want to hear it, assuming that the interpolation sounds better than a mute! Receivers vary in this respect, and there are some that always mute on seeing the V bit set true. A further and potentially more serious problem, highlighted by Finger [2], is that a recording device will usually store incoming audio data but has no means of storing the validity flag. In such a case the replayed audio would then be transmitted with no indication of invalidity, even if recorded errors existed.

Another problem arises in devices which simply process the data signal, such as the sample rate convertors or interface processors described in Chapter 6. Should such devices take any action in the case of invalid samples, or should they simply pass the data through untouched? One such device takes the approach of carrying through the V bit state and holding the last sample value in the case of an error, but the solution is less clear when two digital signals are to be mixed together, one which is erroneous and the other not. In such a case it is difficult to decide whether the single mixed data stream should be valid or invalid and there are no clear guidelines on the matter (indeed there is no ‘catch-all’ solution to such a problem).

In truth, the most appropriate actions to and uses of the V bit are application and product dependent. AES2-ID discusses some of these issues and concludes that it is largely the responsibility of the manufacturer to determine the most appropriate treatment of the V bit in accordance with its general definition. Since it is now recommended that all products implementing AES3 are provided with an implementation table, notes on the treatment of the V bit should be incorporated in a similar place in the relevant manual. It is not recommended that the V bit be used as a permanent alternative to the ‘audio/non-audio’ bit in the channel status block, as an indicator of ongoing non-audio data within the subframe, but this could be a temporary solution until the start of the next channel status block as noted above. IEC 60958 describes this as a good temporary use of the V bit for consumer applications involving non-audio data.

4.7  The User (U) Channel

The U bit of each subframe has a multiplicity of uses, many of which have remained hidden from the user of commercial equipment, such as the carrying of text, subcode, and other non-audio data. It is most widely used in consumer equipment and there is now a rather complicated AES standard for its use in professional applications (AES18, see below). There is also a Philips method for inserting data into the user channel, called ITTS (Interactive Text Transmission System), on which the CD system relies for the transferring of subcode and other non-audio data over the consumer interface. The U bit is only a single bit in each subframe, potentially allowing a user channel to accompany each audio channel, and its definition in the various standards is normally ‘for any other information’. The user bits are not normally aggregated over the same block length as channel status data (192 subframes), although they may be, but are often aggregated over different block lengths depending on the application, or may simply be used as individual flags. Many devices, especially professional ones, do not use them at all, although this may change in the future.

In the following sections a number of the most common applications for user data are outlined, although the standards do not really prohibit users or manufacturers using this capacity for alternative purposes. AES3-1992 signals the use of the user bits in byte 1, bits 4–7 of channel status as shown in Table 4.1. IEC 60958 recommends a common format for the application of user bits, suggesting that the user bits in each subframe should be combined to make a single user bitstream for each interface.

Table 4.1 Indication of user bits format in channel status byte 1

Bits 4–7    User bits format

0000        Default, no user information
0001        192-bit block structure
0010        AES18
0011        User defined

(It may be noted that the terminology used to describe the bit number of a message in the user channel can be confusing, depending on whether the message is considered as running ‘MSB to LSB’ or vice versa. We shall refer to the first transmitted bit of a message byte as ‘bit 0’, but some documents refer to this as ‘bit 7’.)

4.7.1 HDLC Packet Scheme (AES18-1992)

Unlike channel status data, user data may consist of a wide variety of different message types, and the AES working group on ‘labels’ decided that the best approach to the problem for professional users was to allow the user channel to be handled in a ‘free format’, such that its maximum capacity of 48 kbit/s could be shared between applications, with user data multiplexed into ‘packets’ of information which would share the interface. The history of this goes back to 1986, and a proposal by Roger Lagadec of Sony suggested that user data was very different to channel status data and would not suit the same block structure, requiring a more flexible approach in which some messages could be sent once with minimal delay, whereas others might be repeated at regular time intervals, and yet others might have to be time-specific. It required that the data rate in the user channel be independent of the audio sampling rate, whereas the actual rate of user bits depends on the interface frame rate and thus on the sampling rate.

It is not necessary to document the whole history here, except to say that the direction of the work was influenced considerably by proposals from TDF (Télédiffusion de France) and others, well documented by Alain Komly [22], suggesting the use of an asynchronous frame format already well established in the telecommunications and computer industries called HDLC (High-level Data Link Control). This is an internationally standardized way of transferring data at a bit-oriented level around networks (ISO 3309-2) [23], and there are a number of commercial chips available which do the job of inserting data into the correct packet structure. The working group finally recommended a structure for carrying user data in AES18-1992 [24], and a useful commentary on this may be found in Nunn [25]. The AES18 standard was revised in 1996 [26] to include recommendations for coding the data carried over the user channel. If this particular way of treating the user bits is implemented then it is indicated in byte 1 of the channel status information, bits 4–7, as shown in Table 4.1.

Although this is a flexible and versatile way of treating the user bit channel it is possibly overcomplicated for some applications, leaving it up to the user to build his or her own applications around the protocol. It treats the channel rather like a transport stream on an asynchronous computer network and is probably best suited to large broadcasting installations and systems, although it may quickly be overtaken by protocols that use standard high speed computer networks for both audio and data communications. This approach is not part of the consumer format.

Among the key features of this standard are that the data rate of the user channel can be kept constant over a defined range of sampling rates (but only between 42 and 54 kHz in AES18), that a precise timing relationship can be maintained between audio and user data, that time-critical data may be transmitted within a specified and guaranteed period, and that the channel may be used simultaneously by a number of users. User data to be transmitted is formed into packets which are preceded by a header containing the address of the destination, and the packet is then inserted into the user data stream as soon as there is room. In order to ensure that the user data rate remains constant down to an audio sampling rate of 42 kHz (which is 48 kHz minus 12.5%) extra packing bits are added at the end of each block of packets which can be disposed of as the sampling frequency is lowered. At audio sampling rates below 42 kHz the data rate will be lower, and thus some information would be lost if 48 kHz data were to be sample rate converted to, say, 44.1 or 32 kHz, but it is expected that some form of data management would be implemented to ensure that important data gets the highest priority in these circumstances.

Data is formatted at a number of levels before being transmitted over the interface, starting at the highest level – the ‘application level’ – and ending at the lowest level – the ‘physical level’ – at which the data is actually inserted bit by bit into the audio interface subframe structure. It is not intended to cover the process by which this is achieved here, since this would constitute needless repetition of available documentation. What is important is some commentary on the handling of different types of message, particularly time-specific messages, and on the insertion of additional messages at later points in the interface chain.

AES18 allows for the handling of time-specific messages by formatting the user data packets into blocks, normally of fixed but definable length, and repeating these at a user-definable rate which can be set to correspond to time intervals pertinent in the application concerned. An optional ‘system packet’ may also be transmitted at block intervals which may contain timecode data among other things, and sets priorities for different types of message which may have more or less urgency. It recommends some useful repetition rates of blocks, which correspond to the timing intervals of frames in audio and video applications, as shown in Table 4.2. In some applications variable block lengths may be necessary, such as when using 48 kHz audio with NTSC video (which runs at 29.97 fps) where there is not an integer number of audio samples per video frame.

Table 4.2 Some useful repetition rates of blocks

Blocks per second    Duration (ms)    Application

24                   41.67            Film
25                   40               PAL, SECAM video or 50 frame per second (fps) HDTV
29.97                33.37            NTSC video
30                   33.33            60 fps HDTV
33.33                30               DAT

In order to allow for the insertion of messages of varying importance at different points in the system, the standard sets down comprehensive rules governing the way in which messages should be prioritized. The maximum delay involved in inserting a packet of data depends on its priority (from 0 to 3), and the block length involved. The highest priority packet (level 3) may be inserted once per block, and as the priority is decreased the packets are inserted only once per so many blocks. Since the shortest practical block length is 10 ms, this is the minimum delay one might anticipate.

The original version of the standard defined packet structures and transport stream protocols, but said little or nothing about the format or structure of messages. AES18-1996 describes a means of addressing for messages that defines the application area and purpose of the message. Examples include messages about programme description, engineering notebook and switching information. Collaboration is claimed with the EBU regarding the format and structure of such messages.

4.7.2 Consumer Applications of the User Bit

IEC 60958 is more specific about recommended protocol for the user bitstream than the original IEC 958, probably because the range of uses of the interface has grown greatly in the intervening period. In essence it suggests that the user bits for the two subframes should be combined to make a single data stream for the interface concerned. The basis for the recommended structure lies in the Compact Disc application of the user bits to transfer Q–W-channel subcode data, but it has been generalized to other product categories.

The relevant bits should be formed into information units (IUs) of eight bits, starting with a binary 1 and followed by seven information bits. Probably because of the historical link with CD subcode the eight bits of an IU are called the P–W bits, although the P bit bears no relationship to the P-channel subcode data on CD. IUs are typically separated by four ‘0’ bits, but can be separated by between none and eight. More than eight ‘0’ bits in a row signifies the start of a new message. An example is shown in the next section, relating to the CD.
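A sketch of how a receiver might split such a stream into messages under the rules just described (a ‘1’ start bit, seven information bits, up to eight packing zeros between IUs, a longer run of zeros starting a new message). The function is illustrative, not part of the standard.

```python
def parse_messages(bits):
    """Split a consumer user bitstream into messages of seven-bit IUs.

    An IU is a '1' start bit followed by seven information bits; IUs in
    the same message may be separated by zero to eight '0' bits, and a
    longer run of '0's marks the boundary between messages.
    """
    messages, current, zeros, i = [], [], 0, 0
    while i < len(bits):
        if bits[i] == 1:
            current.append(bits[i + 1:i + 8])   # the seven information bits
            i += 8
            zeros = 0
        else:
            zeros += 1
            i += 1
            if zeros > 8 and current:           # long gap ends the message
                messages.append(current)
                current = []
    if current:
        messages.append(current)
    return messages

# Two IUs, then twelve packing zeros, then a new single-IU message:
stream = [1,0,0,1,0,1,1,0] + [0]*4 + [1,1,1,0,0,0,0,1] + [0]*12 + [1]*8
assert [len(m) for m in parse_messages(stream)] == [2, 1]
```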

Three classes of equipment are indicated, each with a different role in relation to user bits, being essentially those that originally generate user bits (Class I), those that pass them through or are ‘transparent’ to user bits (Class II), and mixed-mode equipment such as may combine signals or process them (Class III). In the case of Class II equipment that delays the audio signal it is recommended that the user bits are similarly delayed in order to preserve time alignment.

Because of the somewhat disorganized history of user bit application in consumer products, different categories of products (indicated by the category code in channel status) have their own message structures. However, there is now a general structure that is recommended for new products that can start from scratch in their implementation. In the general structure a message may be made up of a minimum of three and a maximum of 129 IUs, although messages of 96 IUs are reserved specifically for certain classes of laser products. The first three IUs have a specific function, as shown in Figure 4.10, indicating the type of message, the number of remaining IUs in the message (after the first three) and the category code of the originating equipment. Message data is then carried in successive IUs. Because of the complexity and number of possibilities for user bit messaging in consumer products, the reader is referred to the standard and its annexes for further details, although most of the annexes show relatively little in this respect.

Figure 4.10 Format of the first three IUs of the consumer user bitstream.

4.7.3 Applications of the User Bit in Compact Disc and MiniDisc Systems

The following is given as a common example of the application of user bits in two consumer product categories. Compact Disc and MiniDisc players having consumer format digital outputs normally place subcode data in the user bits. This is in addition to the control bits of the Q-channel subcode from the disc which are transmitted within channel status (see section 4.8.9) and formed the only full specification for user bits implementation in the original IEC 958 document, as Annex A.

In this application the user bits for the left and right subframes are treated as one channel, and the Q to W subcode data is multiplexed between them (the P flag is not transmitted since it only represents positioning information for the transport). The subcode data block is built up over 1176 samples, formed into sync blocks of 12 samples each (making 98 subcode symbols, which include two symbols for block sync). There are eight subcode bits in each of these sync blocks (P–W), but only seven of them are transmitted (Q–W) over the interface. Because the subcode data rate is lower than the user bit rate, zeros are used as packing between the groups of subcode bits. The number of zeros is variable, principally to allow for variable speed replay to ±25%. As shown in Figure 4.11, the subcode block begins with a minimum of 16 zeros, followed by a start bit (a binary ‘1’ which some documents call ‘P’ although this might be misleading since it is always a ‘1’ and thus contains no additional information). There then follow seven subcode bits (Q1 to W1), after which there may be up to eight zeros before the next start bit and the next seven bits of subcode data. Only four packing zeros are shown in this diagram. This pattern is repeated 98 times, after which a new intermessage sync pattern of at least 16 zeros is expected.

Figure 4.11 An example of user bits formatting in the CD system.

The Q data in the subcode stream can be used to identify track starts and ends, among other things (see the full CD specification in IEC 908), so it is useful when transferring CDs to DAT or vice versa (for professional purposes, of course), or from a CD player to a CD recorder, since the audio data and the track IDs are duplicated together and the copy is a true clone of the original. Between CD machines there is usually little problem in copying subcode data, since the two machines are of the same format, but between CD and DAT a special processor unit is normally required to convert DAT track IDs to CD track IDs or vice versa and there are occasional discrepancies. Since the P flag is not transferred over the interface the copy may only rely on Q subcode information and there is usually a gap between the start of the P flag on the CD and the Q subcode track number increment. Some CD players increment the track number on their own displays at the start of the P flag and then count down to the true track start using the Q data, whereas a copy of such a recording would only increment the track number at the true track start. There is also occasionally a small delay in the assertion of the track start flag on DAT recordings, due to the automatic start ID facility used in many machines which writes a new start ID when the audio level rises above a certain point, which may sometimes be compensated for in the transfer.

4.7.4 Applications of the User Bit in DAT Systems

As with the CD/MD, the consumer interface on DAT machines also carries some additional information in the user bits. The first edition of IEC 958 suggested that subcode data would be carried in the four auxiliary bits rather than the user bits but IEC 60958 now shows subcode in the user bits, with nothing in the aux bits. Considering the subcode information which could be sent in the user bits, the actual implementation is remarkably crude, as it simply indicates the presence or lack of start and skip (shortening) IDs on the tape. This approach was in fact inherent in the DAT design standard right from the start [27].

As shown in Figure 4.12, sync, start ID and skip ID are transmitted over the interface with relation to the DAT frame rate of 33.33 frames per second. As with CD, the user bits of the left and right channel subframes are considered together and (differently from the ‘general’ format described above) each ‘message’ consists of only one IU with only two active bits (Q and R). The sync ID (P bit) is transmitted once per frame by setting the user bit of the first left channel sample (L0) of that frame true – this simply indicates where the frame begins and could be used for crude synchronization of two machines. (In other words, the user bit of the interface subframe corresponding to the first sample of the DAT frame is always set true.) When a start ID is present on the tape the user bit of the following interface subframe (which is the Q bit of the IU or the first right channel sample of the DAT frame, or R0) is also set true, and this lasts for 300 ± 30 frames, or about 10 seconds (the same duration as the start ID information on the tape). When a skip ID is encountered in normal play mode (without actually skipping) the user bit of the next left channel subframe (the R bit or L1) is set true, and this is repeated for 33 ± 3 frames, or about 1 second. When the DAT machine is programmed to act on skip IDs it will skip to the next start ID, and in this case the user bit of the L1 frame is set true only once – in the first frame that it is encountered. All the other user bits are set to zero.

Figure 4.12 Signalling of DAT start and skip IDs in user bits (user bits only shown).

The number of samples corresponding to a DAT frame depends on the sampling rate, and therefore this dictates the distance between sync, start and skip IDs in the user bits of the interface. At 48 kHz there are 1440 left and right samples per frame, making 2880 subframes between the occurrence of these user bits. At 44.1 kHz this gap is reduced to 2646 words, and in the 32 kHz long play mode found on some players it is 3840 words.
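The per-frame flag positions described above can be written out as a short sketch (assuming 48 kHz and normal play, and ignoring the repetition of the flags over 300 or 33 successive frames; the function name is illustrative).

```python
def dat_frame_user_bits(pairs=1440, start_id=False, skip_id=False):
    """User bits for the 2*pairs interface subframes spanning one DAT
    frame: L0 carries the sync ID every frame, R0 the start ID flag,
    L1 the skip ID flag; all other user bits are zero."""
    u = [0] * (2 * pairs)
    u[0] = 1                        # L0: sync ID, once per DAT frame
    u[1] = 1 if start_id else 0     # R0: start ID present on tape
    u[2] = 1 if skip_id else 0      # L1: skip ID present
    return u

assert len(dat_frame_user_bits()) == 2880   # 48 kHz: 2880 subframes per frame
```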

4.8  Channel Status Data

Channel status data (represented by the C bit in each subframe) is commonly a problematic area with implementations of the standard two-channel interface. It is here that a number of incompatibilities arise between devices and it is here that the main differences exist between professional and consumer formats, because the usage of channel status in consumer and professional equipment is almost entirely different. In this section the principles of channel status usage will be explained, together with an introduction to potential problem areas, although discussion of practical situations is largely reserved until Chapter 7.

4.8.1 Format of Channel Status

Although there is only one channel status (C) bit in each subframe, these are aggregated over a period of time to form a large data word that contains information about the audio signal being transmitted. The two audio channels theoretically have independent channel status data, although commonly the information is identical, since most applications are for stereo audio. Starting with the frame containing the Z preamble, channel status bits are collected for 192 frames (called a channel status block), resulting in 192 channel status bits for each channel. This long word is subdivided into 24 bytes, each bit of which has a designated function, although the functions of these bits depend on whether the application is consumer or professional. The channel status information is updated at block rate, i.e. every 4 ms at a sampling rate of 48 kHz, and pro rata longer at lower sampling rates.
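The aggregation process can be pictured as follows. This is a minimal sketch which assumes, purely for illustration, that channel status bit i of the block maps to bit i mod 8 (counting from the LSB) of byte i//8; the standard numbers bits by transmission order and does not mandate any in-memory layout:

```python
def assemble_channel_status(c_bits) -> bytes:
    """One C bit is collected per frame for 192 frames, starting at the
    frame carrying the Z preamble, giving the 24-byte block."""
    assert len(c_bits) == 192
    block = bytearray(24)
    for i, bit in enumerate(c_bits):
        if bit:
            block[i // 8] |= 1 << (i % 8)
    return bytes(block)
```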

Standards such as AES3 only cover the professional application of channel status, but the EIAJ, IEC and British standards all include a section on consumer applications as well. In the early days of digital interfaces relatively little notice was taken of channel status data by some products, and implementations could be sparse, leading to incompatibilities. In AES3 (1992) the format of professional channel status data was extended and explained more carefully, with the intention of setting down more clearly what devices should do with this data if they were to conform properly with the standard. More recent amendments to AES3 have added yet further detail to the channel status implementation, to accommodate such things as new channel modes and higher sampling frequencies. These revisions should ensure that recent professional devices are more compatible with each other, although they will not help the situation with older equipment. The problem of incompatibility in channel status data is covered further in Chapter 7.

4.8.2 Professional and Consumer Usage Compared

It is both a blessing and a curse that the formats of professional and consumer data are so similar. It is a blessing because the user may often wish to transfer material from one to the other and the similarity would appear to make this possible (see section 6.7), but it is a curse because the usage of channel status is so different between the two that the potential for problems is very high, due to the likely misinterpretation of this information by a receiver of the other format. (There is no technical reason, though, why a device should not be designed to interpret both formats correctly.) The first bit of the channel status block indicates whether the usage is consumer (0) or professional (1), and this bit should be interpreted strictly in order to avoid difficulties. (It should be noted that before the consumer format was standardized by the IEC the first bit of the consumer channel status data actually represented ‘four-channel mode’, not ‘consumer/professional’. Some early devices may still exhibit this feature.)

Figure 4.13 shows a graphical comparison of the beginnings of the channel status blocks in professional and consumer modes. Clearly bit 0 is the same and indicates the usage, and bit 1 is also the same, indicating linear PCM audio (0) or ‘other purposes’ (1) usage of the interface (such as data-compressed audio, as discussed in section 4.9), but here the similarity stops. In the professional version three bits (bits 2, 3 and 4) are used to signal pre-emphasis, but in the consumer version only two (bits 3 and 4) are used for emphasis indication in the linear PCM mode (although only one is actually used in practice). Bit 2 is used to signify copyright status in the consumer format. Already there is room for incompatibility since a professional device trying to interpret a consumer signal or vice versa could confuse copy protection states with emphasis states. Since copy protection is not normally an issue in professional applications, there is no provision for signalling it in the professional interface.

Figure 4.13 Comparison of (a) professional and (b) consumer channel status, bits 1–16.

It is clearly crucial to ensure that the first bit of channel status is correctly interpreted before any further interpretation of channel status takes place. Devices that ignore this fundamental difference in implementation, or that simply ignore most channel status information, may work adequately with other devices some of the time. It is, however, likely that the ever-increasing number of possible device configurations will lead to communication difficulties if the channel status implementation is not interpreted strictly.
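In code, the safe order of interpretation might look like the following sketch. The two parser functions are hypothetical stubs, and mapping channel status bit 0 to the LSB of byte 0 is a representation choice for illustration only:

```python
def parse_professional(block: bytes) -> dict:
    # Hypothetical stub: bits 2-4 signal pre-emphasis in professional usage.
    return {"usage": "professional", "emphasis_bits": (block[0] >> 2) & 0b111}

def parse_consumer(block: bytes) -> dict:
    # Hypothetical stub: bit 2 is the copyright (Cp) bit in consumer usage.
    return {"usage": "consumer", "cp_bit": (block[0] >> 2) & 1}

def interpret_block(block: bytes) -> dict:
    # Dispatch on bit 0 of byte 0 before reading anything else.
    return parse_professional(block) if block[0] & 1 else parse_consumer(block)
```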

4.8.3 Professional Usage

Usage of Channel Status and Channel Modes

Figure 4.14 shows the format of the professional channel status block, and Figures 4.15 and 4.16 show the functions of the first (byte 0) and second (byte 1) bytes in more detail. These indicate basic things about the nature of the source, such as its sampling frequency (a root of problems, as discussed in section 6.7), its pre-emphasis, the mode of operation (mono, stereo, user defined, etc.) and the way in which the user bits are handled (see section 4.7 and Table 4.1). The sampling frequencies indicated in bits 6–7 of byte 0 are the original and basic ones. If one of the new rates is to be used, as now indicated in byte 4, the byte 0 indication can be set to 00 and the actual rate indicated in byte 4. Indication of the sampling frequency here is not, in any case, a requirement for operation of the interface. Recent versions of the standard use byte 1 to indicate that the interface is operating in single-channel-double-sampling-frequency mode, as discussed in section 4.4. In such a case the sampling frequency indicated in byte 0 is essentially the frame rate of the interface rather than the audio sampling frequency (which would be twice that).

Figure 4.14 Overview of the professional channel status block.

Figure 4.15 Format of byte 0 of professional channel status.

Figure 4.16 Format of byte 1 of professional channel status.

The EBU has made recommendations concerning the primary/secondary mode of the interface for broadcasting applications in document R72-199928. This identifies three possible uses for the primary and secondary channels as shown in Table 4.3. The means by which these will be indicated elsewhere in channel status has not yet been formally agreed.

Table 4.3 EBU R72 recommendation for use of primary/secondary modes

Primary                     Secondary
Complete monophonic mix     Reverse talkback
Mono signal (M)             Stereo difference signal (S)
Commentary                  International sound

Use of Aux Bits and Audio Resolution

Figure 4.17 shows how byte 2 is split up, with the first three bits describing the use of the auxiliary bits (see section 4.5), and Table 4.4 shows how bits 3–5 are used to indicate audio word length. In Amendment 4 to AES3-1992, bits 6 and 7 are used to indicate the audio alignment level, as shown in Table 4.5. Byte 2 thus allows the source to indicate the number of bits actually used for audio resolution in the main part of the subframe. Whatever the audio resolution, the MSB of the audio sample should always be placed in the MSB position of the interface subframe. Signalling the resolution allows receiving devices to adapt their signal processing to handle the incoming signal appropriately, such as when a 16-bit device receives a 20-bit signal, perhaps redithering the audio at the appropriate level for the new resolution in order to avoid distortion. This byte was less comprehensively used in the original version of AES3, indicating only whether the audio word length was a maximum of 20 bits or 24 bits (in other words, whether the aux bits were available for other purposes or not). A sketch showing how these fields might be decoded follows Table 4.5.

Figure 4.17 Format of byte 2 of professional channel status.

Table 4.4 Use of byte 2 to represent audio resolution

Bit states (3 4 5)   Audio word length (24-bit mode)   Audio word length (20-bit mode)
0 0 0                Not indicated                     Not indicated
0 0 1                23 bits                           19 bits
0 1 0                22 bits                           18 bits
0 1 1                21 bits                           17 bits
1 0 0                20 bits                           16 bits
1 0 1                24 bits                           20 bits

Table 4.5 Alignment levels indicated in byte 2, bits 6–7

Bits 6–7   Alignment level
00         Not indicated
01         −20 dB FS (SMPTE RP155)
10         −18.06 dB FS (EBU R68)
11         Reserved
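A decoding sketch covering both tables is shown below. Reading each table's bit order left to right, and mapping channel status bit n to bit n from the LSB of the byte, are illustrative conventions only; whether the 20- or 24-bit column applies is signalled by the aux-bit usage field in bits 0–2, supplied here as a ready-made flag:

```python
WORD_LENGTH = {           # Table 4.4: bit states 3 4 5 -> (24-bit, 20-bit)
    "000": (None, None),  # not indicated
    "001": (23, 19), "010": (22, 18), "011": (21, 17),
    "100": (20, 16), "101": (24, 20),
}
ALIGNMENT = {             # Table 4.5: bits 6-7
    "00": None, "01": "-20 dB FS (SMPTE RP155)",
    "10": "-18.06 dB FS (EBU R68)", "11": "reserved",
}

def decode_byte2(byte2: int, max_24_bit: bool):
    bit = lambda n: str((byte2 >> n) & 1)
    mode24, mode20 = WORD_LENGTH.get(bit(3) + bit(4) + bit(5), (None, None))
    word_length = mode24 if max_24_bit else mode20
    alignment = ALIGNMENT[bit(6) + bit(7)]
    return word_length, alignment
```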

Multichannel Modes

Byte 3 is used to describe multichannel modes of the interface, as shown in Figure 4.18. Although the interface can only carry two channels, these may be specific channels of a bundle of associated signals carried over a group of interfaces. The use of the remaining bits depends on the state of bit 7, so that either the whole byte is used to denote the channel number or part of it is used to indicate a particular multichannel mode and the other part to indicate the channel number. The latter is likely to be used for applications like the signalling of particular surround sound configurations, as there are numerous ways in which numbered channels can be assigned to speaker locations (e.g. Left, Right, Centre, Left Surround, Right Surround)29.

Figure 4.18 Format of byte 3 of professional channel status.

Sampling Frequency Status

Byte 4 is illustrated in Figure 4.19. The first two bits indicate whether the signal can be used as a sampling frequency reference, according to the AES11 standard on synchronization. The '00' state indicates that the signal is not a reference, whilst '01' represents a Grade 1 reference and '10' a Grade 2 reference. This topic is discussed in greater detail in Chapter 6. Bits 3–6, in recent amendments, are used to describe sampling frequencies not originally mentioned in the standard, as detailed in Table 4.6. These are audio sampling frequencies, not necessarily interface frame rates, so they are not dependent on the interface modes described in byte 1. Bit 7 is a sampling frequency scaling flag used to signify (in the '1' state) the so-called 'drop-frame' or 'pull-down' frequencies of 1/1.001 times the basic frequency that sometimes arise in post-production operations when digital audio equipment is synchronized to NTSC television signals. A sketch showing how this byte might be decoded follows Table 4.6.

Figure 4.19 Format of byte 4 of professional channel status.

Table 4.6 New sampling frequencies indicated in byte 4, bits 3–6

Bits 6–3 (in that order)   Sampling frequency
0000                       Not indicated (default)
0001                       24 kHz
0010                       96 kHz
0011                       192 kHz
1001                       22.05 kHz
1010                       88.2 kHz
1011                       176.4 kHz
1111                       User defined
All other states           Reserved
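Byte 4's sampling frequency fields might be decoded as in the sketch below; the bit-to-LSB mapping is again only an illustrative convention, and reserved states simply return None:

```python
BASE_FS = {              # Table 4.6, keyed by bits read in 6-5-4-3 order
    "0000": None, "0001": 24_000, "0010": 96_000, "0011": 192_000,
    "1001": 22_050, "1010": 88_200, "1011": 176_400, "1111": "user defined",
}

def decode_byte4_fs(byte4: int):
    bit = lambda n: str((byte4 >> n) & 1)
    fs = BASE_FS.get(bit(6) + bit(5) + bit(4) + bit(3))
    if isinstance(fs, int) and (byte4 >> 7) & 1:
        fs = fs / 1.001    # bit 7: 'pull-down' scaling for NTSC working
    return fs
```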

Source and Destination Identification

In bytes 6 to 13 it is possible to transmit information concerning the source and destination of the signal in the ASCII text format used widely in information technology (see ISO 646). ASCII characters are normally seven bits long (although extended character sets exist which use eight bits) and can represent alphanumeric information. The eighth bit is often used as a parity bit in telecommunications, but it is not used in this application and is set to zero. (AES3-1985 and IEC 958 specified odd parity, but this was changed in AES3-1992 to no parity.) Some ASCII characters are non-printing symbols called 'control characters' (the first 31, hex 01–1F, and the last one, hex 7F), and these are not permitted in this application either. Using this part of channel status the user can 'stamp' the audio signal with a four-character label indicating the name of the source (bytes 6–9) and of the destination (bytes 10–13). Destination labelling can be used in automatic routers so that a signal may control its own routing. These ASCII messages are transmitted LSB first, with the first character of each message in bytes 6 and 10 respectively.
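Packing such a label is straightforward, as the sketch below shows. Padding unused positions with NUL is an assumption here (the text above only excludes the printing control characters), and the LSB-first transmission belongs to the bit-level serializer rather than to this byte-level packing:

```python
def pack_label(text: str) -> bytes:
    # Four ASCII characters for bytes 6-9 (source) or 10-13 (destination).
    data = text.encode("ascii")[:4].ljust(4, b"\x00")
    for ch in data:
        if 0x01 <= ch <= 0x1F or ch == 0x7F:
            raise ValueError("ASCII control characters are not permitted")
    return data

print(pack_label("MIC1"))   # b'MIC1', destined for channel status bytes 6-9
```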

Sample Address Codes

Bytes 14–21 carry what are called 'sample address codes', which are a form of timecode counting in audio samples rather than hours, minutes, seconds and frames. Bytes 14–17 are a so-called 'local sample address', which can be used rather like a tape counter to indicate progress through a recording from an arbitrary start point, and bytes 18–21 are a time-of-day code indicating the number of samples elapsed since midnight. In practice this is usually the time since the device was reset or turned on, because most devices have no facility for setting the sample address code to the true time of day; AES3-1992, however, contains a note stating that it should be the time of day laid down during the original source encoding of the signal, and should not be changed thereafter, implying that it should be derived from off-tape timecode if that exists.

The four bytes for each sample address code are treated as a 32-bit number, with the LSB sent first; 32 bits allows for a day of 4 294 967 296 samples, which represents just over 24 hours at a sampling rate of 48 kHz. (At the maximum sampling frequency normally allowed for over the interface (54 kHz) 32 bits are not quite enough to represent a whole day’s worth of samples, allowing a count of just over 22 hours, but this sampling frequency is normally only used as an upward varispeed of 48 kHz and thus is a non-real-time situation in any case.) If the sampling frequency is known then it is a straightforward matter to convert the sample address to a time of day. The sample address code is updated once per channel status block; thus it is incremented in steps of 4 ms at 48 kHz, and represents the sample address of the first sample of that block.
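Converting the time-of-day code back to clock time is then a matter of integer division, as in this sketch (taking the LSB-first transmission to imply little-endian byte order, an assumption for illustration):

```python
def time_of_day(code: bytes, fs_hz: int) -> str:
    samples = int.from_bytes(code, "little")   # bytes 18-21 of the block
    seconds = samples // fs_hz
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

# 09:30:00 at 48 kHz corresponds to 9.5 * 3600 * 48000 samples:
print(time_of_day((1_641_600_000).to_bytes(4, "little"), 48000))  # 09:30:00
```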

Byte 22 is used to indicate whether the data contained in certain channel status bytes is reliable, by setting the appropriate bit to ‘1’ in the case of unreliable information, as shown in Table 4.7.

Table 4.7 Use of byte 22 to indicate reliability of channel status data bytes

Bits   Bytes whose reliability is indicated
0–3    Reserved
4      Bytes 0–5
5      Bytes 6–13
6      Bytes 14–17
7      Bytes 18–21

The last byte of the channel status block (byte 23) is a CRC (cyclic redundancy check) designed to detect errors in channel status information only, and a simple serial method of producing CRC information is given in AES3-1992. (Most modern AES/EBU interface chips generate channel status CRC automatically.) The CRC byte was not made mandatory in AES3-1985 and a number of systems did not implement it, but in AES3-1992 it was made mandatory in the 'standard' implementation of the interface (although there is a 'minimum' mode in which it can be left out). The presence or absence of the CRC byte in different devices is a root of incompatibility in older equipment, because devices expecting to see it may refuse to interpret channel status data that lacks a CRC byte, assuming it to be permanently in error.
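For orientation, a byte-wise CRC-8 of the kind described might look like the sketch below. The generator polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x1D) with the register preset to all ones is the arrangement commonly described for AES3 byte 23, but bit-ordering conventions vary between implementations and should be verified against the standard's own serial example before use:

```python
def channel_status_crc(block: bytes) -> int:
    # CRC-8 over channel status bytes 0-22, result destined for byte 23.
    crc = 0xFF                       # register preset to all ones
    for byte in block[:23]:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1D) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc
```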

Any bits in channel status that are either not used or reserved should be set to the default state of binary ‘0’ – another important factor in ensuring compatibility between devices.

4.8.4 Levels of Professional Channel Status Implementation

AES3-1985 was rather unclear as to what manufacturers should do with channel status if they were not implementing certain features. Receivers could be found anywhere between two extremes: either interpreting the standard so literally that they expected to see every single bit set 'correctly' before they would work, or interpreting it so loosely that virtually anything would work, often when it should not. (To be fair, much of the time communication worked without a problem.) In the 1992 revision it was decided to recommend three levels of implementation at the transmitter end, encouraging manufacturers to state the level at which they were working. These levels have been called 'minimum', 'standard' and 'enhanced'. At all levels the rest of the frame is intended to be correctly encoded according to the standard.

At the minimum level channel status bits are all set to zero except for the first bit which should be set to signify professional usage. Such an implementation would allow ‘belt and braces’ communication in most cases, but leaves room for many problems since CRC is not included and neither is any indication of pre-emphasis or sampling frequency. It is intended in such a case that the receiver should set itself to the default conditions (48 kHz, no emphasis, two-channel mode) but allow manual override.

At the standard level the transmitter is expected to implement bytes 0 to 2 and 23 of channel status. This then allows for all the information about the source signal and use of different bits in the frame to be signalled, as well as including the CRC.

An enhanced mode is also allowed which is basically the standard data plus any additional data such as sample address and source identification.

As far as receivers are concerned, the standards currently say little about how receivers should behave when presented with particular data combinations, except that the manufacturer should state clearly the data recognized and the actions that will be taken. This situation is clarified in AES2-ID, in which a number of equipment classifications are specified, indicating the level to which the device processes and modifies channel status. Manufacturers should publish implementation charts for their devices, much as manufacturers of MIDI-controlled equipment publish such charts in operators' manuals, allowing users to identify the roots of any incompatibility. Such charts, examples of which are found in AES2-ID, should also indicate the treatment of the user bit and validity bit, as well as the supported sampling frequencies and resolutions.

In AES2-ID, the implementation of channel status is grouped into A, B and C classes according to the sophistication and nature of the device. Group A devices behave like a wire, simply passing whatever they receive and not storing or acting on it in any way (e.g. a crosspoint switcher). Group B devices all do something with the data they receive, to varying degrees. B1 only decodes audio, not channel status (highly unlikely and possibly risky); B2 decodes channel status but does not store or pass it on; B3 stores and/or passes on channel status data indicated in the implementation chart, modifying it as necessary to reflect the correct state of the audio signal it accompanies. Group C is terminal end equipment (e.g. D/A convertor) that either does not decode channel status (C1) or does and acts on it as indicated (C2).

4.8.5 Overview of Channel Status in Consumer Applications

In a way the use of channel status is rather more complicated in consumer applications, because of the many types of consumer device and the wide variety of data types that may be transmitted. The format of the basic block is shown in Figure 4.20 and a more detailed breakdown of the first byte (byte 0) is shown in Figure 4.21. (For a comparison with byte 0 of the professional format see Figure 4.15.) Bits 6 and 7 in the consumer format define the 'mode' of use of the channel status block, and so far only mode 0, with these bits set to '00', is standardized. There was a proposal for a mode 1, with the bits set to '10', to be used for 'software information delivery', allowing the transmission of information about production details on prerecorded media. This was written up as a technical report listed as IEC 60958-2 (1994) but is not considered to be formally part of the standard.

Figure 4.20 Overview of the consumer channel status block.

Figure 4.21 Format of byte 0 of consumer channel status.

The usage of channel status in consumer equipment depends on the mode, and also on the category code which defines the type of device transmitting the data in the second byte (byte 1) of channel status (see the next section). In mode 0, byte 2 of channel status is used to indicate a source number from 1 to 16 and a channel number (see Figure 4.22), so that in the case of sources with multiple audio channels it is possible to signal which two are being transmitted. Byte 3 is used to indicate the sampling frequency of the source and the clock accuracy of the source (see Table 4.8).

Figure 4.22 Format of byte 2 of consumer channel status.

Table 4.8 Sampling frequency and clock accuracy in IEC60958 channel status

Bits 24–27   Sampling frequency (kHz)
0000         44.1
0001         88.2
0010         22.05
0011         176.4
0100         48
0101         96
0110         24
0111         192
1000         Not indicated
1100         32

Bits 28–29   Clock accuracy
00           Level II
01           Level III
10           Level I
11           Frame rate not matched to fs

Byte 4 of consumer channel status now contains an indication of the audio word length and the original sampling frequency of the signal. It used to be set to all zeros, so older equipment may show this characteristic. Bit 32 indicates whether the sample is maximum 20 or 24 bits (0 and 1 states respectively) and bits 33–35 indicate the resolution in the same way as the professional interface (Table 4.4). The original sampling frequency indication in bits 36–39 can be used to show the sampling frequency of a signal before sample rate conversion in a consumer playback system. This might be the case for applications such as computer games where sounds with low sampling frequencies could be internally converted to 44.1 kHz for replay through a common convertor, or with DVD players where high sampling frequency material is down-converted to 44.1 or 48 kHz for transmission over a standard digital interface. (Currently it is not permitted to transfer high sampling frequency material over an IEC 60958 digital interface in the DVD standard and digital outputs are limited to basic rates. This is an initial barrier to piracy of high resolution master material, although other procedures are being developed involving watermarking and encryption.)

4.8.6 Category Codes in Consumer Channel Status

The eight-bit category code identifies the source device, allowing subsequent channel status and user data to be interpreted correctly. Some examples of the most common category codes are shown in Table 4.9. (Note that category codes are normally written this way round, with the LSB first, which may be confusing since binary numbers are normally written down MSB first.)

Table 4.9 Category codes of common products

Category code (bits 8–15; LSB to MSB)   Device type
00000000                                General
10000000                                CD player
1001001L                                MiniDisc
1001100L                                DVD
1100000L                                DAT
010XXXXL                                DSP devices and digital/digital convertors

As can be seen, all except the General and CD category codes have bit 15 as the ‘L’ bit. This is used in conjunction with the copyright bit to manage copy protection as explained in the next section. The CD standard was introduced before the copy protection issue became such a ‘hot potato’ and was stuck with the category code and copyright indication it first used. However, a workaround has been introduced whereby the copyright bit in byte 0 of channel status can be made to alternate states at a rate between 4 and 10 Hz to indicate a ‘home copy’ of original material of generation 1 or higher.

4.8.7 SCMS and Copy Protection

The method of coping with copy management is called SCMS (Serial Copy Management System) and is now implemented on all consumer digital recording equipment. SCMS applies when signals are copied digitally across a consumer format interface and has no meaning in the professional format. Although SCMS appears not to apply when a signal is copied via an analog interface, in fact there are problems as discussed below.

The principle of SCMS is that a copyright prerecorded signal may be copied only once (provided that the source device is one of the so-called 'white list' of product types from which limited copying is allowed), but no further generations are allowed. This is supposed to allow home users to make a single copy of something they have bought, for their own purposes, while preventing large-scale piracy. The copyright protection (Cp) bit that already existed in the consumer format (bit 2) was not sufficient on its own because it gave no indication of the generation of the copy. The so-called 'L bit' was therefore introduced in the category code (see the previous section). The Cp bit in conjunction with the L bit can be used to prevent more than one generation of copying of copyright material. The state of the L bit in effect signifies the 'generation' of the signal, whether '0th generation' (an original prerecorded work) or a 1st generation or higher copy. The Cp bit now signifies whether the source material is copyright or not ('0' = copyright). Unfortunately the complication does not stop here: although the L bit is normally set so that the '1' state represents an original (commercially released, prerecorded software) rather than a copy, with laser optical products and broadcast receivers it is the other way around!

The upshot of all this is that if a recording device sees that the C bit of a digital source is '0' (copyright material) and the L bit indicates that the source is original prerecorded material, it will allow the copy. If the L bit indicates that the source is already a copy it will disallow the recording. When copyright material is copied from a prerecorded source, a flag is recorded on the copy to state 'I am a copy of a copyright source'; when this recording is replayed the L bit will be set to show that it is not an original, thus disallowing further copies. Extremely thorough coverage of SCMS is to be found in an AES journal article by Sanchez30.
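The receiver-side decision reduces to a few lines, as in this sketch of the rules just described (the inverted L-bit sense applies to laser optical products and broadcast receivers):

```python
def copy_permitted(c_bit: int, l_bit: int, l_sense_inverted: bool) -> bool:
    # c_bit: consumer channel status bit 2, '0' = copyright asserted.
    # l_bit: generation bit from the category code.
    if c_bit == 1:
        return True                 # not copyright: unrestricted copying
    original = (l_bit == 0) if l_sense_inverted else (l_bit == 1)
    return original                 # copyright: only an original may be copied

# A CD (laser optical, L = '0' for prerecorded originals) with copyright material:
print(copy_permitted(c_bit=0, l_bit=0, l_sense_inverted=True))   # True
```

A recorder making such a copy would of course mark the recording so that its own digital output signals a first-generation copy, stopping the chain there.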

4.8.8 SCMS in DAT Machines

SCMS was first introduced because of the perceived threat of copying with DAT machines. It works in such machines as follows. There are two DAT category codes: one called simply 'DAT' (code 11000000) and one called 'DAT-P' (code 11000001); the difference between them is the L bit. DAT recordings also contain two bits in the subcode recorded on the tape that control copy protection, called the 'ID6' bits. An SCMS DAT machine looks at the ID6 bits recorded on the tape it is playing to determine how it should set the combination of L and C bits on the digital interface.

If the ID6 on the source tape is 00 (copies allowed) it will set the category to ‘DAT’ and the C bit to show ‘non-copyright’. An SCMS machine receiving that signal would also set the recorded ID6 of the copy to 00, since there would be no reason to prevent further copies, and any number of serial digital copies might then be made.

If a DAT recorder sees a ‘DAT-P’ source it will allow copies no matter what the status of the C bit, whereas if it sees a straightforward ‘DAT’ source it will only allow a copy if the source is not copy protected. So if the ID6 on the source tape is 10 (copies not allowed), the source machine will set the category to ‘DAT’ and the C bit to ‘copyright’, then the recorder will not be able to copy that tape. If the ID6 on the source tape is 11 (one copy allowed), the machine will set the category to ‘DAT-P’ and the C bit to ‘copyright’, then a receiver would be able to record the signal (since DAT-P allows copies no matter what the © status). It would know, though, that it was copying © material. The ID6 of the copy would then automatically be set to 10 to prevent further copies.

Concerning copies made between DAT machines and other systems, or between pre-SCMS and SCMS machines, the following applies:

1 Digital Copies from CD

The CD category code (10000000) when recognized by an SCMS DAT machine will result in a copy whose ID6 is set to 10 (copies not allowed). (The L bit is ‘0’ for original prerecorded material in laser optical products.)

2 Digital Copies from Other Digital Sources

Copies made from sources having the ‘General’ category code (00000000) will have their ID6 set to 11, whatever the © status, allowing one further copy only. Sources asserting this code are likely to be such things as A/D convertors and some older DAT machines. It acknowledges that the source of the material is unclear, and might be copyright or might not.

3 Digital Copies to SCMS DAT Machines from Pre-SCMS Machines

Pre-SCMS machines may use either the ‘General’ category or the ‘DAT’ category, depending on when they were made and by whom. They will not normally be able to recognize the difference between a recorded ID6 of 11 and an ID6 of 10, because prior to SCMS the machine only had to look at one bit to detect © status. Therefore such a machine will normally interpret both codes as indicating that the recording is copy protected (not even allowing one copy), and set the © flag on the digital output.

Whether or not the receiver will record the data depends on whether the category is ‘General’ or ‘DAT’. If it is ‘General’ then see 2 above. If it is ‘DAT’, then not even a single copy will be allowed. The only case in which unlimited copies will be allowed is when the source tape has an ID6 of 00, which is likely to be the case with many tapes recorded on pre-SCMS machines.

4 Digital Copies of Recordings Made from Analog Inputs

Unfortunately, SCMS DAT machines will set the ID6 of analog-sourced recordings to 11, thus allowing only one digital copy. This is a nuisance when the source is a perfectly legitimate non-copyright signal, such as one of your own private recordings.

5 Digital Copies of Prerecorded DAT Tapes

The ID6 of prerecorded tapes is set to 11, thus allowing one further copy if using an SCMS machine. If using a pre-SCMS replay machine, the 11 will be interpreted as 10 (see 3 above) and the © bit will be asserted on the interface. A copy will only be possible if the category of the source machine is ‘General’, but not if it is ‘DAT’.

6 Recording Non-Copy-Protected Material on SCMS Machines

There is no way to record completely unprotected material on an SCMS machine, except by feeding it with a digital source having a category code other than ‘General’ and a recorded ID6 of 00. This might be feasible if you have an early DAT machine. Even material recorded via the SCMS machine’s analog inputs cannot be copied beyond a single generation.

7 Digital Copying from SCMS Machines to Pre-SCMS Machines

Such copies will only be possible at 48 kHz (or 32 kHz if you have such a tape); 44.1 kHz recordings will be blocked on unmodified machines. Source tapes with ID6 set to either 11 or 10 will cause the CP status to be asserted on the digital interface, and, since pre-SCMS machines tend to ignore the L bit, the copy will not be allowed at all. Source tapes with ID6 set to 00 may be copied.

8 Manual Setting of ID6 Status

Consumer machines will not allow the ID6 status of tapes to be set, but some recent professional machines will allow this.

4.8.9 Channel Status in Consumer CD Machines

It was originally intended that the first four bits of the Q subcode from the CD would be copied into the first four bits of the channel status data. The first four bits of Q subcode are as follows:

Bit 0   Two or four channel (two channel = '0')
Bit 1   Undefined
Bit 2   Copy protect
Bit 3   Pre-emphasis

Apart from bit 0 these are compatible with IEC 60958. Since CDs have never implemented a four-channel mode, bit 0 remains in the ‘0’ state which is compatible with the ‘consumer’ status of bit 0 of IEC 60958. Other than this, the channel status format of the CD category code is the same as the general format, with the sampling frequency bits set to ‘0000’ to indicate 44.1 kHz. The user bits contain the subcode information as discussed in section 4.7.2.

4.9  Data-Reduced Audio Over Standard Two-Channel Interfaces

4.9.1 General Principles

The standard two-channel interface was originally designed for linear PCM audio samples, but in recent years there has been increasing use of data-reduced audio coding systems such as Dolby Digital (AC-3), DTS and MPEG. Because consumer systems in particular needed the ability to transfer such signals digitally, the non-audio mode of the interface has been adapted to the purpose. This is described in a relatively new IEC standard numbered 6193731. In addition to a general specification detailing the principles, it has a number of parts that describe the handling of specific data-reduced audio formats, some of which are not yet finalized at the time of writing. A similar but not identical SMPTE standard (337M)32 describes professional non-audio applications of the interface, including its use for carrying Dolby E data (see below). SMPTE 338M and 339M specify data types to be used with this standard. The SMPTE standard is also more generic than the IEC standard, being designed to deal with a variety of data uses of the interface, not just low bit-rate audio. It can also carry time-stamp data in the form of SMPTE 12M timecode. The reader is referred to the standards for more precise details regarding implementation of specific formats.

In both SMPTE and IEC versions the low bit-rate audio data is carried in bursts in place of the normal linear PCM audio information, with bit 1 of channel status set to the 'other uses' or 'non-audio' state. (The SMPTE standard makes it clear, though, that professional devices should not rely on this. Some digital video tape recorders, for example, cannot control the non-audio bit, yet it is desirable that they should receive, transmit and store low bit-rate audio such as Dolby E.) Sometimes the validity bit is also set to indicate 'invalid' or 'unsuitable for conversion to analog', as a further measure. In the IEC standard the data is carried in the 16 most significant bits of the audio data slot, bits 12 to 27 in IEC nomenclature. In the SMPTE standard the data may occupy 16, 20 or 24 bits. In the IEC standard the two interface subframes are treated as conveying a single data stream, whereas in the SMPTE standard the subframes can be handled together or separately (for example, one subframe could carry PCM and the other data-reduced audio).

A data burst of low bit-rate audio (representing the encoded frame of a number of original PCM samples) typically occupies a number of consecutive subframes, the last subframe of the burst being packed with zeros if required. Each burst is preceded by a preamble of four 16-bit words: the first two are a synchronization pattern, the third indicates the type of data being carried and the fourth the length of the burst, as shown in Figure 4.23. In the IEC standard up to eight independent bitstreams can be carried in this way, each identified in the third word of the preamble. The SMPTE standard can carry up to 14 independent streams in the independent subframe mode, and its data type preambles are not identical to those of IEC 61937. Because the total data rate may be lower than that required for linear PCM audio, packing zero bits can be used between bursts of low bit-rate audio data, and there is a requirement for at least four subframes to have bits 12–27 set to zero every 4096 frames.

Figure 4.23 Format of the data burst in IEC 61937.
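A sketch of the burst framing is shown below. The sync words Pa = 0xF872 and Pb = 0x4E1F are as published for IEC 61937; the Pc word here carries only a data-type code in its low bits (the value 1 is commonly cited for AC-3), whereas the real Pc also contains error-flag and stream-number fields, so this should be checked against the relevant part of the standard:

```python
import struct

PA, PB = 0xF872, 0x4E1F              # 16-bit burst preamble sync words

def burst(data_type: int, payload: bytes) -> bytes:
    pc = data_type & 0x1F            # simplified Pc: data type only
    pd = len(payload) * 8            # Pd: burst length in bits
    if len(payload) % 2:
        payload += b"\x00"           # pad the final 16-bit word with zeros
    return struct.pack(">4H", PA, PB, pc, pd) + payload
```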

4.9.2 Data-Reduced Consumer Formats

A number of parts of IEC 61937 describe the transmission of audio signals encoded to different standards, some of which are manufacturer- or system-specific and others are internationally standardized. Data-reduced bitstreams of current relevance here are AC-3 (Dolby Digital), DTS (Digital Theatre Systems), MPEG 1 and 2-BC, MPEG 2-AAC and Sony ATRAC. The most commonly encountered applications in consumer systems at the moment are for the transfer of encoded multichannel surround sound data from DVD players to home cinema systems, using either Dolby Digital or DTS encoding. The digital output of DVD players is typically an IEC 60958 interface on a phono connector or optical interface that can be used to carry 5.1-channel surround sound data for decoding and D/A conversion in a separate surround sound processor and amplifier.

4.9.3 Data-Reduced Professional Formats

The SMPTE 337M standard allows a number of data-reduced audio formats to be transmitted over the interface. The most commonly used of these are Dolby AC-3 and Dolby E, but data types are also specified for MPEG 1 and 2.

Dolby E is a data reduction system designed for professional purposes, using mild data reduction in order to minimize generation losses. It was introduced to satisfy a need to transfer production multichannel surround sound signals over two-channel media such as digital interfaces and video tape recorder audio tracks, in order to ease the transition from two-channel operations to 5.1-channel operations in broadcasting and post-production environments. It packs the audio data into the two-channel frame in a similar way to that described in the previous section. The resolution can be adapted to fit 16-, 20- or 24-bit media, the most common implementation using 20-bit frame format mode (both subframes used together) at a data rate, including overheads, of about 1.92 Mbit/s. (The 16- and 24-bit modes run at data rates of 1.536 and 2.304 Mbit/s respectively.) Dolby E packets are aligned with video frames so that the audio can be switched or edited synchronously with video. For example, there are 25 Dolby E packets per second when synchronized with 25 fps video.
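These quoted rates follow directly from filling both subframes of a 48 kHz frame at the stated resolution, which is easily checked:

```python
for bits in (16, 20, 24):
    print(bits, "bit mode:", 48000 * 2 * bits / 1e6, "Mbit/s")
# 16 bit mode: 1.536, 20 bit mode: 1.92, 24 bit mode: 2.304
```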

4.10  AES42 Digital Microphone Interface

This digital microphone interface is based on the AES3 two-channel interface and includes options for powering and synchronization of microphones. Most digital microphones currently in existence employ conventional capsule technology with A/D conversion very close to the microphone capsule rather than direct digital transduction of the acoustic waveform. There are nonetheless some advantages to be had in using digital transmission of audio signals from microphones, principally the potential for higher quality and lower noise as a result of conversion close to the capsule and the avoidance of long cable runs in the analog domain at low signal level. A microphone that conforms to this standard is typically referred to as an AES3-MIC.

The AES42 standard33 notes that some patent rights may relate to the interface in question and that licensing of some elements may be required.

4.10.1 Principles

The AES42 (AES3-MIC) interface is a standard AES3 interface that also carries power for the microphone. There is also a proposal to adopt a slightly different XLR connector from the normal one, termed the XLD connector, intended to avoid the possibility of damaging equipment not designed for the power-supplying capacity of this interface. The XLD connector is identified with a striped 'zebra' ring to distinguish it, but this is not mandatory and there is some disagreement about the need for it (some say that studio practice has managed adequately for years with some XLR connectors carrying phantom power and others not). A combination of coded grooves and keys enables XLD connectors to be used in a variety of combinations with ordinary XLR connectors, or, in the fully coded form, prevents one connector from being inserted into a socket of the other type.

If the microphone is monophonic both subframes of the digital interface carry identical information, except in single-channel-double-sampling-frequency modes where they carry successive samples of the one channel.

4.10.2 Powering

The form of phantom powering in this standard is not the same as the 48 volt system used with analog microphones. In this standard the so-called ‘digital phantom power’ (DPP) is 10 volts applied to both legs of the balanced AES3 cable via a centre tap on the cable side of the transformer. Maximum continuous load is specified as 250 mA, with a peak load of 300 mA when charging additional load capacitance.

4.10.3 Remote Control and Status Reporting

An AES3-MIC may be remote controlled using pulsed modulation of the power supply voltage (see below), with positive-going pulses of 2 ±0.2 volts that carry data at a rate of 750 bits per second (at 48 kHz sampling frequency or multiples) or proportionally lower for 44.1 kHz and multiples. The remote control information can indicate changes of microphone settings such as directivity pattern (omni, cardioid, etc.), attenuation, limiting, gain, muting and high-pass filtering. There is also the option for manufacturer-specific settings and extended instructions involving changes of more sophisticated features such as dither type, sampling frequency and so forth.

The microphone’s status can be reported back to the receiver by means of the user bit channel of the AES3 interface. In this application the user bits are assembled into a 24-byte structure, in the same way as channel status information, synchronized by the same Z preamble that indicates the start of a 192-bit channel status block.

4.10.4 Synchronization

There are two modes of operation of an AES3-MIC. In Mode 1 the microphone is self-clocking and generates its own sampling frequency reference. As a consequence all the microphones in a studio would be unsynchronized, each having a slightly different sampling frequency, and any mixing console dealing with their digital signals would have to apply sampling frequency conversion. In Mode 2, microphones can be synchronized to a common reference signal, achieved by transmitting additional data in the remote control information to the microphone. As shown in Figure 4.24, it is intended that a phase comparator be present in each AES3-MIC receiver (say a mixing console input) that compares the phase of the word clock extracted from the incoming microphone signal with that of a reference signal. A binary value is returned to the microphone in the remote control data, which adjusts the frequency of its internal clock accordingly, using a D/A convertor to convert the remote control data into a DC voltage and a voltage-controlled crystal oscillator to generate the word clock.

Figure 4.24 Conceptual example of AES42 digital microphone interface.

The resolution of the sync information can be extended from eight to 13 bits for use in high resolution applications where clock accuracy and low jitter are crucial.

4.11  The Standard Multichannel Interface (MADI)

Originally proposed in the UK in 1988 by four manufacturers of professional audio equipment (Sony, Neve, Mitsubishi and Solid State Logic), the so-called 'MADI' interface is now an AES and ANSI standard. It was designed to simplify cabling in large installations, especially between multitrack recorders and mixers, and has a lot in common with the format of the two-channel interface. The standard concerned is AES10-199112 (ANSI S4.43-1991), and a recent draft revision has been issued. This interface was intentionally designed to be transparent to standard two-channel data, making the incorporation of two-channel signals into a MADI multiplex a relatively straightforward matter. The original channel status, user and auxiliary data remain intact within the multichannel format.

MADI stands for Multichannel Audio Digital Interface; in the original standard 56 channels of audio are transferred serially in asynchronous form, and consequently the data rate is much higher than that of the two-channel interface. For this reason the data is transmitted either over a coaxial transmission line with 75 ohm termination (not more than 50 m) or over a fibre optic link. The protocol is based closely on the FDDI (Fibre Distributed Data Interface) protocol, suggesting that fibre optics would be a natural next step34. The recent draft revision proposes a means of allowing higher sampling frequencies and an extension of the channel capacity.

4.11.1 Format of the Multichannel Interface

The serial data structure is as shown in Figure 4.25. It is divided into subframes which, apart from the preamble area, are identical to AES3 subframes. The preamble is not required here because the interface is synchronized in a different way, so the four-bit slot is replaced with four 'mode bits', the functions of which are labelled in the diagram. There are 56 subframes in a frame. Bit 0 signifies the start of channel 0 (it is set true only in that subframe); bit 1 indicates whether a particular subframe or audio channel is active (1 for active); bit 2 indicates whether the subframe is the A or B channel of a two-channel pair derived from an AES3 source (1 for 'B'); and bit 3 indicates the start of a new channel status block for the channel concerned. The audio part of the frame is handled in the same way as in the two-channel interface, and the V, U, C and P bits apply on a per-channel basis, with parity applying over bits 4–31.

Figure 4.25 Format of the MADI frame.

The channel code is different from that used in the two-channel version, and another important contrast is that the link transmission rate is independent of the audio sampling frequency and of the number of channels involved. In the original standard the highest data transfer rate corresponds to the highest sampling rate (54 kHz) multiplied by the number of channels (56) and the number of bits per subframe (32), that is 54 000 × 56 × 32 = 96.768 Mbit/s. It is assumed that the transmitter and receiver will be independently synchronized to a common sampling frequency reference in order that they operate at identical sampling frequencies. In the recent draft revision sampling frequencies up to 96 kHz are allowed, and the maximum channel capacity is extended to 64 by limiting the sampling frequency to no more than 48 kHz (removing the varispeed tolerance, in other words). Sampling rates of 96 kHz are handled by reducing the channel capacity to 28 and by either using an approach similar to the AES3 single-channel-double-sampling-frequency mode described earlier or transmitting two sets of samples successively within one 20.8 μs frame.

The MADI link itself does not synchronize the receiver's sampling clock. The channel-coding process involves two stages: first the 32-bit subframe is divided into groups of 4 bits, then each group is encoded into a 5-bit word chosen to minimize the DC content of the data signal, according to Table 4.10 (4/5-bit encoding).

Table 4.10 4/5-bit encoding in MADI

4-bit group   5-bit code
0000          11110
0001          01001
0010          10100
0011          10101
0100          01010
0101          01011
0110          01110
0111          01111
1000          10010
1001          10011
1010          10110
1011          10111
1100          11010
1101          11011
1110          11100
1111          11101

The actual transmission rate of the data is thus 25% higher than the original data rate, and 32-bit subframes are transmitted as 40 bits. To carry the 4/5-encoded data over the link a ‘1’ is represented by a transition (in either direction) and a ‘0’ by no transition, as shown in the example of Figure 4.26.

Figure 4.26 An example of the NRZI channel code.

Special synchronization symbols are inserted by the transmitter between encoded subframes, taking the binary form 11000 10001, transmitted from the left (a pattern which does not otherwise arise). These have the function of synchronizing the receiver (but not its sample clock) and may be inserted between subframes or at the end of the frame in order to fill the total data capacity of the link, which is 125 Mb/s ± 100 ppm. The prototype MADI interfaces were designed around AMD's TAXI (Transparent Asynchronous Xmitter/Receiver Interface) chips, which were becoming more widely used in high-speed computer networks; these chips normally take care of the insertion of synchronizing symbols so that the transmission rate of the link remains constant.
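Putting the two coding stages and the sync symbol together gives the following sketch; taking the most significant nibble of the subframe first is an illustrative assumption, since the standard defines the actual serial ordering:

```python
FIVE_BIT = {                # Table 4.10: 4-bit group -> 5-bit line code
    0x0: "11110", 0x1: "01001", 0x2: "10100", 0x3: "10101",
    0x4: "01010", 0x5: "01011", 0x6: "01110", 0x7: "01111",
    0x8: "10010", 0x9: "10011", 0xA: "10110", 0xB: "10111",
    0xC: "11010", 0xD: "11011", 0xE: "11100", 0xF: "11101",
}
SYNC = "1100010001"         # sync symbol inserted between subframes

def encode_subframe(subframe: int) -> str:
    # 4/5-encode a 32-bit subframe into 40 line bits.
    return "".join(FIVE_BIT[(subframe >> s) & 0xF] for s in range(28, -1, -4))

def nrzi(bits: str, level: int = 0) -> str:
    # A '1' is sent as a transition, a '0' as no transition.
    out = []
    for b in bits:
        level ^= int(b)
        out.append(str(level))
    return "".join(out)

line = nrzi(SYNC + encode_subframe(0x12345678))
```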

4.11.2 Electrical Characteristics

The coaxial version of the interface consists of a 75 ohm transmission line terminated in BNC connectors, using cable with a characteristic impedance of 75 ± 2 ohms and losses of <0.1 dB/m (1–100 MHz). Suggested driver and receiver circuits are illustrated in the standard, and the receiver is expected to decode data with a minimum eye pattern as shown in Figure 4.27. Equalization is not permitted at the receiver, and distances of up to 50 metres may be covered.

Figure 4.27 The minimum eye pattern acceptable for correct decoding of MADI data.

The block diagram of transmitter-to-receiver communication is shown in Figure 4.28. Here the source data is buffered, 4/5 encoded and then formatted with the sync symbols before being transmitted. The receiver extracts the sync symbols, decodes the 4/5 symbols and a buffer handles any short-term variation in timing due to the asynchronous nature of the interface. An external sync reference ensures that the two sampling clocks are locked.

Figure 4.28 Block diagram of MADI transmission and reception.

4.12  Manufacturer-Specific Interfaces

Interfaces other than the internationally standardized two-channel and multichannel types will be described and discussed in this section. For example, a number of interfaces have been introduced by specific manufacturers and are normally only found on that manufacturer’s products, or are licensed for use by others. Some of these proprietary interfaces have become quite widely used in commercial products – for example, the ADAT ‘lightpipe’ interface is widely encountered on computer sound cards because it is a small-sized optical connector capable of carrying eight channels of digital audio. Some of the technology described in this chapter is the subject of patents and implementers may need to enter into licensing agreements.

MIDI, included briefly in the previous editions of this book, is not strictly a digital audio interface and coverage of it has been removed in this edition. The interested reader is referred to MIDI Systems and Control35 or the forthcoming Desktop Audio Technology by Francis Rumsey.

4.12.1 Sony Digital Interface for LPCM (SDIF-2)

Sony’s original interface for linear PCM data was SDIF-2. It was designed for the transfer of one channel of digital audio information per physical link at a resolution of up to 20 bits (although most devices only make use of 16). The interface has also been used on equipment other than Sony’s, for the sake of compatibility, but the use of this interface is declining as the standard two-channel interface becomes more widely used.

The interface is unbalanced and uses 75 ohm coaxial cable terminating in 75 ohm BNC-type connectors, one for each audio channel. TTL-compatible electrical levels (0–5 V) are used. The audio data is accompanied by a word clock signal on a separate physical link (see Figure 4.29), which is a square wave at the sampling frequency used to synchronize the receiver’s sample clock. Sony’s multitrack machines use SDIF also, but with a differential electrical interface conforming to RS-422 standards (see section 1.7.2) and using 50 pin D-type multiway connectors, the pinouts of which are shown in Table 4.11. A single BNC connector carries the word clock as before.

Figure 4.29 SDIF-2 interconnection for two audio channels.

Table 4.11 Pinouts for differential SDIF multichannel interface

Pin      Function
1, 2     Ch. 1 (−/+)
3, 4     Ch. 2 (−/+)
5, 6     Ch. 3 (−/+)
etc.     etc.
47, 48   Ch. 24 (−/+)
49, 50   NC

In each audio sample period, the equivalent of 32 bits of data is transmitted over each physical link, although only the first 29 bits of the word are considered valid, since the last three bit-cell periods are divided into two cells of one-and-a-half times the normal duration, violating the NRZ code in order to act as a synchronizing pattern. As shown in Figure 4.30, 20 bits of audio data are transmitted MSB first (although typically only 16 bits are used), followed by nine control or user bits (the user bits are rarely employed). A block structure is created for the control/user bits which repeats once every 256 sample periods, signalled using the block sync flag in bit 29 of the first word of the block. The resulting data rate is 1.536 Mb/s at a 48 kHz sampling rate and 1.41 Mb/s at 44.1 kHz.

Figure 4.30 At (a) is the clock content of the SDIF signal; (b) shows the synchronizing pattern used for data reception. At (c) user bits form a block that is synchronized every 256 sample periods.
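The quoted rates are simply 32 bit-cell periods per sample period:

```python
for fs in (48000, 44100):
    print(fs, "Hz:", 32 * fs / 1e6, "Mb/s")   # 1.536 and 1.4112
```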

The SDIF-2 interface was originally used mainly for the transfer of audio data between Sony professional digital audio products, particularly the PCM-1610 and 1630 PCM adaptors, but also from semi-professional Sony equipment which had been modified to give digital inputs and outputs (such as the PCM-701 and various DAT machines). It has also been used on a number of disk-based workstations. It is not recommended for use over long distances, and it is important that the coaxial leads for each channel and the word clock are kept to the same length, otherwise timing errors may arise. Problems occasionally arise with third-party implementations of this interface that do not use the 1.5-bit cell sync pattern at the end of words, requiring some trial and error involving delays of the data signal in relation to the separate word clock in order for the link to function correctly.

4.12.2 Sony Digital Interface for DSD (SDIF-3)

Sony has recently introduced a high-resolution digital audio format known as ‘Direct Stream Digital’ or DSD. This encodes audio using one-bit sigma–delta conversion at a very high sampling frequency of typically 2.8224 MHz (64 times 44.1 kHz). There are no internationally agreed interfaces for this format of data, but Sony has released some preliminary details of an interface that can be used for the purpose, known as SDIF-3. Some early DSD equipment used a data format known as ‘DSD-raw’ which was simply a stream of DSD samples in non-return-to-zero (NRZ) form, as shown in Figure 4.31(a).

Figure 4.31 Direct Stream Digital interface data is either transmitted ‘raw’ as shown at (a) or phase modulated as in the SDIF-3 format shown at (b).

In SDIF-3 data is carried over 75 ohm unbalanced coaxial cables, terminating in BNC connectors. The bit rate is twice the DSD sampling frequency (or 5.6448 Mbit/s at the sampling frequency given above) because phase modulation is used for data transmission as shown in Figure 4.31(b). A separate word clock at 44.1 kHz is used for synchronization purposes. It is also possible to encounter a DSD clock signal connection at 64 times 44.1 kHz (2.8224 MHz).

4.12.3 Sony Multichannel DSD Interface (MAC-DSD)

Sony has also developed a multichannel interface for DSD signals, capable of carrying 24 channels over a single physical link36. The transmission method is based on the same technology as used for the Ethernet 100BASE-TX (100 Mbit/s) twisted-pair physical layer (PHY), but it is used in this application to create a point-to-point audio interface. Category 5 cabling is used, as for Ethernet, consisting of eight conductors. Two pairs are used for bi-directional audio data and the other two pairs for clock signals, one in each direction.

Twenty-four channels of DSD audio require a total bit rate of 67.7 Mbit/s, leaving appreciable spare capacity for additional data. In the MAC-DSD interface this is used for error correction (parity) data, frame headers and auxiliary information. Data is formed into frames that can contain Ethernet MAC headers and optional network addresses for compatibility with network systems. Audio data within the frame is formed into 352 32-bit blocks; in each block 24 bits carry individual channel samples (one bit for each of the 24 channels), six bits are parity and two are auxiliary.

In a recent enhancement of this interface, Sony has introduced ‘SuperMAC’ which is capable of handling either DSD or PCM audio with very low latency (delay), typically less than 50 μs. The number of channels carried depends on the sampling frequency. Twenty-four DSD channels can be handled, or 48 PCM channels at 44.1/48 kHz, reducing proportionately as the sampling frequency increases. In conventional PCM mode the interface is transparent to AES3 data including user and channel status information.

4.12.4 Tascam Digital Interface (TDIF)

Tascam’s interfaces have become popular owing to the widespread use of the company’s DA-88 multitrack recorder and more recent derivatives. The primary TDIF-1 interface uses a 25 pin D-sub connector to carry eight channels of audio information in two directions (in and out of the device), sampling frequency and pre-emphasis information (on separate wires, two for fs and one for emphasis) and a synchronizing signal. The interface is unbalanced and uses CMOS voltage levels. Each data connection carries two channels of audio data, odd channel and MSB first, as shown in Figure 4.32. As can be seen, the audio data can be up to 24 bits long, followed by two bits to signal the word length, one bit to signal emphasis and one for parity. There are also four user bits per channel that are not usually used. This resembles a modified form of the AES3 interface frame format. An accompanying left/right clock signal is high for the odd samples and low for the even samples of the audio data. It is difficult to find information about this interface, but the output channel pairs appear to be on pins 1–4 with the left/right clock on pin 5, while the inputs are on pins 13–10 with the left/right clock on pin 9. Pins 7, 14–17 (these seem to be related to output signals) and 22–25 (related to the input signals) are grounded. The unbalanced, multi-conductor, non-coaxial nature of this interface makes it suitable only for covering short distances, up to about 5 metres.

Figure 4.32 Format of TDIF data and LRsync signal.
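To make the frame description above concrete, the following sketch packs one channel’s worth of TDIF-style data into a 32-bit word. The field positions and the parity convention are assumptions for illustration; Figure 4.32 and Tascam’s own documentation remain the authority:

    # Hypothetical layout of one TDIF channel word: 24 audio bits (MSB first),
    # two word-length bits, one emphasis bit, four user bits, one parity bit.
    def tdif_channel_word(sample_24bit, wordlen_code=0, emphasis=0, user=0):
        word = (sample_24bit & 0xFFFFFF) << 8   # 24 audio bits, MSB first
        word |= (wordlen_code & 0x3) << 6       # 2 word-length bits
        word |= (emphasis & 0x1) << 5           # 1 emphasis bit
        word |= (user & 0xF) << 1               # 4 user bits (normally zero)
        parity = bin(word).count("1") & 1       # even parity over the word
        return word | parity                    # parity bit in the LSB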

4.12.5 Alesis Digital Interface

The ADAT multichannel optical digital interface, commonly referred to as the ‘light pipe’ interface or simply ‘ADAT Optical’, is a serial, self-clocking, optical interface that carries eight channels of audio information. It is described in US Patent 5,297,181: ‘Method and apparatus for providing a digital audio interface protocol’. The interface is capable of carrying up to 24 bits of digital audio data for each channel, and the eight channels of data are combined into one serial frame that is transmitted at the sampling frequency. The data is encoded in NRZI format for transmission, with forced ones inserted every five bits (except during the sync pattern) to provide clock content. This can be used to synchronize the sampling clock of a receiving device if required, although some devices require the use of a separate 9-pin ADAT sync cable for synchronization. The sampling frequency is normally limited to 48 kHz with varispeed up to 50.4 kHz, and TOSLINK optical connectors are typically employed (Toshiba TOCP172 or equivalent). In order to operate at 96 kHz sampling frequency some implementations use a ‘double-speed’ mode in which two channels are used to transmit one channel’s audio data (naturally halving the number of channels handled by one serial interface). Although 5 metre lengths of optical fibre are the maximum recommended, longer distances may be covered if all the components of the interface are of good quality and kept clean. Experimentation is required.

As shown in Figure 4.33, the frame begins with an 11-bit sync pattern: ten zeros followed by a forced one. This is followed by four user bits (not normally used and set to zero), a further forced one, then the first audio channel sample (with forced ones every five bits), the second audio channel sample, and so on.

Figure 4.33 Basic format of ADAT data.
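A minimal sketch of the frame assembly just described is given below, following Figure 4.33. NRZI line coding is omitted, and the MSB-first ordering within each four-bit group is an assumption:

    # One ADAT frame, built as a list of bits (256 bits per sample period):
    # ten sync zeros, a forced one, four user bits, a forced one, then eight
    # 24-bit samples with a forced one after every four data bits.
    def adat_frame(samples, user_bits=(0, 0, 0, 0)):
        assert len(samples) == 8
        bits = [0] * 10 + [1]               # sync: ten zeros, then a forced one
        bits.extend(user_bits)              # four user bits (normally zero)
        bits.append(1)                      # forced one before the first sample
        for sample in samples:
            for nibble in range(5, -1, -1): # six 4-bit groups, MSB first (assumed)
                group = (sample >> (nibble * 4)) & 0xF
                bits.extend((group >> i) & 1 for i in (3, 2, 1, 0))
                bits.append(1)              # forced one every five bits
        return bits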

4.12.6 Roland R-Bus

Roland has recently introduced its own proprietary multichannel audio interface that, like TDIF (although not directly compatible with it), uses a 25-way D-type connector to carry eight channels of audio in two directions. Called R-bus, it is increasingly used on Roland’s digital audio products, and convertor boxes are available to mediate between R-bus and other interface formats. Little technical information about R-bus is publicly available at the time of writing.

4.12.7 Mitsubishi Digital Interfaces

This section is included primarily for historical completeness, as Mitsubishi no longer manufactures digital audio equipment. Mitsubishi’s ProDigi format tape machines used a digital interface similar to SDIF but not compatible with it. Separate electrical interconnections were used for each audio channel. Interfaces labelled ‘Dub A’ and ‘Dub B’ were 16-channel interfaces found on multitrack machines, handling respectively tracks 1–16 and 17–32. These interfaces terminated in 50-way D-type connectors and utilized differential balanced drivers and receivers. One sample period was divided into 32 bit cells, only the first 16 of which were used for sample data (MSB first), the rest being set to zero. There was no sync pattern within the audio data (such as there is between bits 30 and 32 in the SDIF-2 format). The audio data was accompanied by a separate bit clock (a 1.536 MHz square wave at 48 kHz sampling rate) and a word clock signal which went low only for the first bit cell of each 32-bit audio data word (unlike SDIF, which uses a sampling-rate square wave), as shown in Figure 4.34. Status information was passed over two separate channels, which took the same format as an audio channel but carried information about the record status of each of the 32 channels of the ProDigi tape machine. One status channel (Rec ‘A’) handled tracks 1–16, and the other (Rec ‘B’) handled tracks 17–32 of a multitrack machine. The pin assignments for these connectors are shown in Table 4.12.

Figure 4.34 Data format of the Mitsubishi multitrack interface.

Table 4.12 (a) Pinouts for Mitsubishi ‘Dub A’ connector

Table 4.12 (b) Pinouts for Mitsubishi ‘Dub B’ connector
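Given the word structure described above (16 active bits in a 32-cell word, word clock low for the first cell only), one channel word and its clocks can be sketched as follows; the representation as lists of logic levels is purely illustrative:

    # One Mitsubishi 32-cell data word and its word clock, as described above.
    # The separate bit clock runs at 32 x fs (1.536 MHz at fs = 48 kHz).
    def mitsubishi_word(sample_16bit):
        data = [(sample_16bit >> i) & 1 for i in range(15, -1, -1)]  # MSB first
        data += [0] * 16           # remaining 16 cells set to zero
        wclk = [0] + [1] * 31      # low for the first bit cell only
        return data, wclk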

Mitsubishi interfaces labelled ‘Dub C’ were two-channel interfaces. These terminated in 25-way D-type connectors and utilized unbalanced drivers and receivers. One sample period was divided into 24 bit cells, only the first 16 or 20 of which were normally used, depending on the resolution of the recording in question. Again the audio data was accompanied by a separate bit clock (a 1.152 MHz square wave at 48 kHz sampling rate) and a word clock signal taking the same form as the multitrack version. No record status information was carried over this interface, but an additional ‘master clock’ was offered at 2.304 MHz. The pin assignments are shown in Table 4.13.

Table 4.13 Pinouts for Mitsubishi ‘Dub C’ connectors

Pin      Function
1, 14    Left (+/−)
2, 15    Right (+/−)
5, 18    Bit clock (+/−)
6, 19    WCLK (+/−)
7, 20    Master clock (+/−)
12, 25   GND
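The clock figures quoted for ‘Dub C’ follow directly from the 24-cell word structure; a quick check (names illustrative only):

    # 'Dub C' clock-rate arithmetic at fs = 48 kHz.
    FS = 48_000
    bit_clock = 24 * FS             # 24 bit cells per sample period -> 1.152 MHz
    master_clock = 2 * bit_clock    # 2.304 MHz 'master clock'
    print(bit_clock, master_clock)  # 1152000 2304000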

4.12.8 Sony to Mitsubishi Conversion

Comparing the SDIF format with the Mitsubishi multichannel format, it may be appreciated that interconnecting the two would require only minor modifications to the signals. Both have a 32-bit structure, MSB first, with only 16 bits being used for audio resolution, the differences being the sync pattern in Sony’s bits 30–32, plus the fact that the Mitsubishi format does not include control and user bits. The word clock of the Sony interface is a square wave at the sampling frequency, whereas that of the Mitsubishi only goes low for one bit period, but simple logic could convert from one to the other. If transferring from Sony to Mitsubishi it would also be necessary to derive a bit clock, and this could be multiplied up from the word clock using a suitable phase-locked loop. Commercial interfaces are available which perform this task neatly.
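The ‘simple logic’ word-clock conversion can be illustrated in software terms as below, treating each clock as a sequence of logic levels sampled once per bit cell (32 cells per sample period). The phase-locked loop needed to multiply up a bit clock is not modelled, and the function name is illustrative:

    # Derive a Mitsubishi-style word clock (low for one bit cell at each word
    # boundary) from a Sony-style word clock (square wave at fs, i.e. 32 cells
    # high followed by 32 cells low).
    def sony_to_mitsubishi_wclk(sony_levels):
        out, prev = [], sony_levels[0]
        for level in sony_levels:
            out.append(0 if level != prev else 1)  # pulse low on any transition
            prev = level
        return out

    # Example: two sample periods of Sony word clock.
    sony = [1] * 32 + [0] * 32
    mitsubishi = sony_to_mitsubishi_wclk(sony)     # low at cell 32 only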

4.12.9 Yamaha Interface

Yamaha digital audio equipment often carries a ‘cascade’ connector to allow a number of devices to be operated in cascade, such that the two-channel mix outputs of one mixer, for example, may be fed into a further mixer to be combined with another mixed group of channels. The so-called Y1 format is monophonic and the Y2 format carries two channels in serial form. The two types are very similar, Y1 simply ignoring the second channel of data.

The two-channel cascade interface terminates in an eight-pin DIN-type connector, as shown in Figure 4.35, and carries two channels of 24-bit audio data over an RS-422-standard differential line. The two channels of data are multiplexed over a single serial link, with a 32-bit word of left-channel data followed by a 32-bit word of right-channel data (the 24 bits of audio are sent LSB first, followed by eight zeros). The word clock alternates between a low state for the left channel and a high state for the right channel, as shown in Figure 4.36. Coils of 20 H are connected between pins 6 and 7 and ground to enable suppression of radio frequency interference. The OUT socket is enabled only when its pin 8 is connected to ground.

Figure 4.35 Pinouts of the Yamaha two-channel cascade interface.

Figure 4.36 Data format of the Yamaha ‘cascade’ interface.
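Following Figures 4.35 and 4.36, the word structure just described can be sketched as follows; the list-of-levels representation is illustrative only:

    # One Yamaha cascade sample pair: each channel occupies a 32-bit slot,
    # 24 audio bits sent LSB first followed by eight zeros; the word clock is
    # low for the left word and high for the right word.
    def yamaha_cascade_pair(left_24bit, right_24bit):
        def word_bits(sample):
            return [(sample >> i) & 1 for i in range(24)] + [0] * 8  # LSB first
        data = word_bits(left_24bit) + word_bits(right_24bit)
        wclk = [0] * 32 + [1] * 32      # low = left, high = right
        return data, wclk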

References

1.  AES, AES3-1985 (ANSI S4.40-1985). Serial transmission format for linearly represented digital audio data. Journal of the Audio Engineering Society, vol. 33, pp. 975–984 (1985)

2.  Finger, R., AES3-1992: the revised two channel digital audio interface. Journal of the Audio Engineering Society, vol. 40, March, pp. 107–116 (1992)

3.  EBU, Tech. 3250-E. Specification of the digital audio interface. Technical Centre of the European Broadcasting Union, Brussels (1985)

4.  CCIR, Rec. 647. A digital audio interface for broadcasting studios. Green Book, vol. 10, pt. 1. International Radio Consultative Committee, Geneva (1986)

5.  CCIR, Rec. 647 (Mod. F). Draft digital audio interface for broadcast studios. CCIR, Geneva (1990)

6.  IEC, IEC 958. Digital audio interface, first edition. International Electrotechnical Commission, Geneva (1989)

7.  EIAJ, CP-340. A digital audio interface. Electronic Industries Association of Japan, Tokyo (1987)

8.  EIAJ, CP-1201. Digital audio interface (revised). Electronic Industries Association of Japan, Tokyo (1992)

9.  British Standards Institute, BS 7239. Specification for digital audio interface. British Standards Institute, London (1989)

10.  IEC, IEC 60958-1 to 4. Digital audio interface, parts 1–4. International Electrotechnical Commission, Geneva (1999)

11.  AES, AES3-1992 (r1997) (ANSI S4.40-1992). Serial transmission format for linearly represented digital audio data (1992)

12.  AES, AES10-1991 (ANSI S4.43-1991). Serial multichannel audio digital interface (MADI). Journal of the Audio Engineering Society, vol. 39, pp. 369–377 (1991)

13.  CCITT, Rec. V.11. Electrical characteristics for balanced double-current interchange circuits for general use with integrated circuit equipment in the field of data communications. International Telegraph and Telephone Consultative Committee (1976, 1980)

14.  EIA, Industrial electronics bulletin no. 12. EIA standard RS-422A. Electronics Industries Association, Engineering Dept., Washington, DC

15.  Dunn, J., Considerations for interfacing digital audio equipment to the standards AES3, AES5 and AES11. In Proceedings of the AES 10th International Conference, 7–9 September, p. 122, Audio Engineering Society (1991)

16.  Ajemian, R.G. and Grundy, A.B., Fiber optics – the new medium for audio: a tutorial. Journal of the Audio Engineering Society, vol. 38, March, pp. 160–175 (1990)

17.  AES, AES-3-ID. AES information document for digital audio engineering – transmission of AES3 formatted data by unbalanced coaxial cable (2001)

18.  SMPTE, SMPTE 276M: AES/EBU audio over coaxial cable (1995)

19.  Rorden, B. and Graham, M., A proposal for integrating digital audio distribution into TV production. JSMPTE, September, pp. 606–608 (1992)

20.  AES, AES-2-ID. AES information document for digital audio engineering – guidelines for the use of the AES3 interface (1996)

21.  Gilchrist, N., Coordination signals in the professional digital audio interface. In Proceedings of the AES/EBU Interface Conference, 12–13 September, pp. 13–15, Audio Engineering Society British Section (1989)

22.  Komly, A. and Viallevieille, A., Programme labelling in the user channel. In Proceedings of the AES/EBU Interface Conference, 12–13 September, pp. 28–51, Audio Engineering Society British Section (1989)

23.  ISO 3309, Information processing systems – data communications – high level data link frame structure. International Organization for Standardization (1984)

24.  AES18, Format for the user data channel of the AES digital audio interface. Journal of the Audio Engineering Society, vol. 40, no. 3, March, pp. 167–183 (1992)

25.  Nunn, J.P., Ancillary data in the AES/EBU digital audio interface. In Proceedings of the 1st NAB Radio Montreux Symposium, 10–13 June, pp. 29–41 (1992)

26.  AES, AES18-1996. Format for the user data channel of the AES digital audio interface (1996)

27.  DAT Conference Part V, Digital audio taperecorder system (RDAT). Recommended design standard (1986)

28.  EBU, EBU Technical Recommendation R72-1999: Allocation of the audio modes in the digital audio interface (EBU document Tech. 3250) (1999)

29.  Rumsey, F., Spatial Audio. Focal Press (2001)

30.  Sanchez, An understanding and implementation of the SCMS serial copy management system for digital audio transmission. Journal of the Audio Engineering Society, vol. 42, no. 3, March, pp. 162–186 (1994)

31.  IEC, IEC 61937 Digital audio – interface for non-linear PCM encoded audio bitstreams applying IEC 60958 (2000)

32.  SMPTE, SMPTE 337M: Television – format for non-PCM audio and data in AES3 serial digital audio interface (2000)

33.  AES, AES42-2001: AES standard for acoustics – digital interface for microphones (2001)

34.  Wilton, P., MADI (Multichannel audio digital interface). In Proceedings of the AES/EBU Interface Conference, 12–13 September, pp. 117–130, Audio Engineering Society British Section (1989)

35.  Rumsey, F.J., MIDI Systems and Control, 2nd ed. Focal Press (1994)

36.  Page, M. et al., Multichannel audio connection for Direct Stream Digital. Presented at the AES 113th Convention, Los Angeles, 5–8 October (2002)
