CHAPTER 10

Digital Audio Formats and Interchange

CHAPTER CONTENTS

Audio File Formats for Digital Workstations

File formats in general

Sound Designer formats

AIFF and AIFF-C formats

RIFF WAVE format

MPEG audio file formats

DSD-IFF file format

Edit decision list (EDL) files and project interchange

AES-31 format

MXF – the material exchange format

AAF – the advanced authoring format

Disk pre-mastering formats

Consumer Optical Disk Formats

Compact discs and drives

DVD

Super Audio CD (SACD)

Blu-Ray disk

Interconnecting Digital Audio Devices

Introduction

Dedicated audio interface formats

The AES/EBU interface (AES-3)

Standard consumer interface (IEC 60958-3)

Proprietary digital interfaces

Data networks and computer interconnects

Audio network requirements

Storage area networks

Protocols for the internet

Wireless networks

Audio over firewire (IEEE 1394)

Audio over Universal Serial Bus (USB)

AES-47: audio over ATM

This chapter provides further details about the main formats in which digital audio data is stored and moved between systems. This includes coverage of audio file formats, consumer disk standards, digital interfaces and networked audio interchange, concentrating mainly on those issues of importance for professional applications.

AUDIO FILE FORMATS FOR DIGITAL WORKSTATIONS

There used to be almost as many file formats for audio as there are days in the year. In the computer games field, for example, this is still true to some extent. For a long time the specific file storage strategy used for disk-based digital audio was the key to success in digital workstation design, because disk drives were relatively slow and needed clever strategies to ensure that they were capable of handling a sufficiently large number of audio channels. Manufacturers also worked very much in isolation and the size of the market was relatively small, leading to virtually every workstation or piece of software using a different file format for audio and edit list information.

There are still advantages in the use of filing structures specially designed for real-time applications such as audio and video editing, because one may obtain better performance from a disk drive in this way, but the need is not as great as it used to be. Interchange is becoming at least as important as, if not more important than, ultimate transfer speed and the majority of hard disk drives available today are capable of replaying many channels of audio in real time without needing to use a dedicated storage strategy. As the use of networked workstations grows, the need for files to be transferred between systems also grows and either by international standardization or by sheer force of market dominance certain file formats are becoming the accepted means by which data is exchanged. This is not to say that we will only be left with one or two formats, but that systems will have to be able to read and write files in the common formats if users are to be able to share work with others.

The recent growth in the importance of metadata (data about data), and the representation of audio, video and metadata as ‘objects’, has led to the development of interchange methods that are based on object-oriented concepts and project ‘packages’ as opposed to using simple text files and separate media files. There is increasing integration between audio and other media in multimedia authoring and some of the file formats mentioned below are closely related to international efforts in multimedia file exchange.

It is not proposed to attempt to describe all of the file formats in existence, because that would be a relatively pointless exercise and would not make for interesting reading. It is nonetheless useful to have a look at some examples taken from the most commonly encountered file formats, particularly those used for high-quality audio by desktop and multimedia systems, since these are amongst the most widely used in the world and are often handled by audio workstations even if not their native format. It is not proposed to investigate the large number of specialized file formats developed principally for computer music on various platforms, nor the files used for internal sounds and games on many computers.

File formats in general

A data file is simply a series of data bytes formed into blocks and stored either contiguously or in fragmented form. Files themselves are largely independent of the operating system and filing structure of the host computer, because a file can be transferred to another platform and still exist as an identical series of data blocks. It is the filing system that is often the platform- or operating-system-dependent entity. There are sometimes features of data files that relate directly to the operating system and filing system that created them, these being fairly fundamental features, but they do not normally prevent such files being translated by other platforms.

For example, there are two approaches to byte ordering: the so-called little-endian order, in which the least significant byte is stored first (at the lowest memory address), and the big-endian order, in which the most significant byte is stored first. These originally related to the byte ordering used in data processing by the two most common microprocessor families and thereby to the two most common operating systems used in desktop audio workstations. Motorola processors, as originally used in the Apple Macintosh, deal in big-endian byte ordering, and Intel processors, as used in MS-DOS machines, deal in little-endian byte ordering. It is relatively easy to interpret files either way around, but anyone writing software needs to know that there is a need to do so.
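
By way of illustration, the following minimal Python sketch (not tied to any particular audio file format) shows the same two bytes of a 16 bit sample interpreted both ways using the standard struct module:

import struct

raw = bytes([0x12, 0x34])               # two bytes of a 16 bit sample

little = struct.unpack('<h', raw)[0]    # little-endian (Intel-style) interpretation
big = struct.unpack('>h', raw)[0]       # big-endian (Motorola-style) interpretation

print(hex(little), hex(big))            # 0x3412 0x1234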

Second, some Macintosh files may have two parts – a resource fork and a data fork – whereas Windows files only have one part. High-level ‘resources’ are stored in the resource fork (used in some audio files for storing information about the file, such as signal processing to be applied, display information and so forth) whilst the raw data content of the file is stored in the data fork (used in audio applications for audio sample data). The resource fork is not always there, but may be. The resource fork can get lost when transferring such files between machines or to servers, unless Mac-specific protocols are used (e.g. MacBinary or BinHex).

Some data files include a ‘header’, that is, a number of bytes at the start of the file containing information about the data that follows. In audio systems this may include the sampling rate and resolution of the file. Audio replay would normally be started immediately after the header. On the other hand, some files are simply raw data, usually in cases where the format is fixed. ASCII text files are a well-known example of raw data files – they simply begin with the first character of the text. More recently file structures have been developed that are really ‘containers’ for lots of smaller files, or data objects, each with its own descriptors and data. The RIFF structure, described below, is an early example of the concept of a ‘chunk-based’ file structure. Apple’s Bento container structure, used in OMFI, and the container structure of AAF are more advanced examples of such an approach.

The audio data in most common high-quality audio formats is stored in two’s complement form (see Chapter 8) and the majority of files are used for 16 or 24 bit data, thus employing either 2 or 3 bytes per audio sample. Eight bit files use 1 byte per sample.
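
As an illustration, the following hypothetical Python fragment decodes a 3 byte (24 bit) two’s complement sample; little-endian byte order is assumed here, although the order used in practice depends on the file format concerned:

def decode_24bit_le(b):
    """Convert 3 little-endian bytes of two's complement data to a signed integer."""
    value = b[0] | (b[1] << 8) | (b[2] << 16)
    if value & 0x800000:           # sign bit set: extend to a negative value
        value -= 0x1000000
    return value

print(decode_24bit_le(bytes([0xFF, 0xFF, 0xFF])))   # -1
print(decode_24bit_le(bytes([0xFF, 0xFF, 0x7F])))   # 8388607 (largest positive value)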

Sound Designer formats

Sound Designer files originate with the Californian company Digidesign, manufacturer of probably the world’s most widely used digital audio hardware for desktop computers. Many systems handle Sound Designer files because they were used widely for such purposes as the distribution of sound effects on CD-ROM and for other short music sample files.

The Sound Designer I format (SD I) is for mono sounds and it is recommended principally for use in storing short sounds. It originated on the Macintosh, so numerical data is stored in big-endian byte order but it has no resource fork. The data fork contains a header of 1336 bytes, followed by the audio data bytes. The header contains information about how the sample should be displayed in Sound Designer editing software, including data describing vertical and horizontal scaling. It also contains details of ‘loop points’ for the file (these are principally for use with audio/MIDI sampling packages where portions of the sound are repeatedly cycled through while a key is held down, in order to sustain a note). The header contains information on the sample rate, sample period, number of bits per sample, quantization method (e.g. ‘linear’, expressed as an ASCII string describing the method) and size of RAM buffer to be used. The audio data is normally either 8 or 16 bit, and always MS byte followed by LS byte of each sample.

Sound Designer II has been one of the most commonly used formats for audio workstations and has greater flexibility than SD I. Again it originated as a Mac file and unlike SD I it has a separate resource fork which contains the file’s ‘vital statistics’. The data fork contains only the audio data bytes in two’s complement form, either 8 or 16 bits per sample. SD II files can contain audio samples for more than one channel, in which case the samples are interleaved, as shown in Figure 10.1, on a sample by sample basis (i.e. all the bytes for one channel sample followed by all the bytes for the next, etc.). It is unusual to find more than stereo data contained in SD II files and it is recommended that multichannel recordings are made using separate files for each channel.

image

FIGURE 10.1 Sound Designer II files allow samples for multiple audio channels to be interleaved. Four channel, 16 bit example shown.
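
The interleaving principle itself can be sketched very simply. The following illustrative Python fragment works on lists of sample values and is not tied to the exact Sound Designer II chunk layout:

def interleave(channels):
    """Interleave equal-length per-channel sample lists, sample by sample."""
    return [sample for frame in zip(*channels) for sample in frame]

def deinterleave(samples, num_channels):
    """Split an interleaved sample list back into per-channel lists."""
    return [samples[ch::num_channels] for ch in range(num_channels)]

left = [0, 1, 2, 3]
right = [10, 11, 12, 13]
mixed = interleave([left, right])        # [0, 10, 1, 11, 2, 12, 3, 13]
assert deinterleave(mixed, 2) == [left, right]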

AIFF and AIFF-C formats

The AIFF format is widely used as an audio interchange standard, because it conforms to the EA IFF 85 standard for interchange format files used for various other types of information such as graphical images. AIFF is an Apple standard format for audio data and is encountered widely on Macintosh-based audio workstations and some Silicon Graphics systems. Audio information can be stored at a number of resolutions and for any number of channels if required, and the related AIFF-C (file type ‘AIFC’) format also allows for compressed audio data. It consists only of a data fork, with no resource fork, making it easier to transport to other platforms.

All IFF-type files are typically made up of ‘chunks’ of data as shown in Figure 10.2. A chunk consists of a header and a number of data bytes to follow. The simplest AIFF files contain a ‘common chunk’, which is equivalent to the header data in other audio files, and a ‘sound data’ chunk containing the audio sample data. These are contained overall by a ‘form’ chunk as shown in Figure 10.3. AIFC files must also contain a ‘version chunk’ before the common chunk to allow for future changes to AIFC.
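
As an illustration of the chunk principle, the following Python sketch walks the chunks of an AIFF-style file, assuming the usual layout of a 4 byte ID followed by a 4 byte big-endian length (as in Figure 10.2) and the even-byte padding of chunk data used by IFF files; it does not interpret the chunk contents:

import struct

def list_aiff_chunks(path):
    """Print the ID and length of each chunk inside an AIFF 'FORM' container."""
    with open(path, 'rb') as f:
        form_id, form_size, form_type = struct.unpack('>4sI4s', f.read(12))
        assert form_id == b'FORM'
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, size = struct.unpack('>4sI', header)
            print(chunk_id.decode('ascii'), size)
            f.seek(size + (size & 1), 1)   # chunk data is padded to an even length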

RIFF WAVE format

The RIFF WAVE (often called WAV) format is the Microsoft equivalent of Apple’s AIFF. It has a similar structure, again conforming to the IFF pattern, but with numbers stored in little-endian rather than big-endian form. It is used widely for sound file storage and interchange on PC workstations, and for multimedia applications involving sound. Within WAVE files it is possible to include information about a number of cue points, and a playlist to indicate the order in which the cues are to be replayed. WAVE files use the file extension ‘.wav’.

image

FIGURE 10.2 General format of an IFF file chunk.

image

FIGURE 10.3 General format of an AIFF file.

image

FIGURE 10.4 Diagrammatic representation of a simple RIFF WAVE file, showing the three principal chunks. Additional chunks may be contained within the overall structure, for example a ‘bext’ chunk for the Broadcast WAVE file.

A basic WAV file consists of three principal chunks, as shown in Figure 10.4: the RIFF chunk, the FORMAT chunk and the DATA chunk. The RIFF chunk contains 12 bytes: the first four are the ASCII characters ‘RIFF’, the next four indicate the number of bytes in the remainder of the file (after these first eight) and the last four are the ASCII characters ‘WAVE’. The FORMAT chunk contains information about the format of the sound file, including the number of audio channels, sampling rate and bits per sample, as shown in Table 10.1.

The audio data chunk contains a sequence of bytes of audio sample data, divided as shown in the FORMAT chunk. Unusually, if there are only 8 bits per sample or fewer, each value is unsigned and ranges between 0 and 255 (decimal), whereas if the resolution is higher than this the data is signed and ranges both positively and negatively around zero. Audio samples are interleaved by channel in time order, so that if the file contains two channels a sample for the left channel is followed immediately by the associated sample for the right channel. The same is true of multiple channels (one sample for time-coincident sample periods on each channel is inserted at a time, starting with the lowest numbered channel), although basic WAV files were nearly always just mono or two channel.

The RIFF WAVE format is extensible and can have additional chunks to define enhanced functionality such as surround sound and other forms of coding. This is known as ‘WAVE-format extensible’ (see http://www.microsoft.com/hwdev/tech/audio/multichaud.asp). Chunks can include data relating to cue points, labels and associated data, for example. The Broadcast WAVE format is one example of an enhanced WAVE file (see Fact File 10.1), which is used widely in professional applications for interchange purposes.

Table 10.1 Contents of FORMAT chunk in a basic WAVE PCM file

Bytes ID Contents
0–3 ckID ‘fmt_’ (ASCII characters)
4–7 nChunkSize Length of FORMAT chunk (binary, hex value: & 00000010)
8–9 wFormatTag Audio data format (e.g. & 0001 = WAVE format PCM). Other formats are allowed, for example IEEE floating point and MPEG format (& 0050 = MPEG 1)
10–11 nChannels Number of channels (e.g. & 0001 mono, & 0002 stereo)
12–15 nSamplesPerSec Sample rate (binary, in Hz)
16–19 nAvgBytesPerSec Bytes per second
20–21 nBlockAlign Bytes per sample block (one sample period across all channels): e.g. & 0001 8 bit mono; & 0002 8 bit stereo or 16 bit mono; & 0004 16 bit stereo
22–23 nBitsPerSample Bits per sample
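
A minimal Python sketch of reading the fields listed in Table 10.1 is given below. It assumes a straightforward PCM file, little-endian byte order throughout and the chunk ID ‘fmt ’ (a trailing space, shown as ‘fmt_’ in the table), and simply skips any other chunks it encounters:

import struct

def read_wave_format(path):
    """Return (format_tag, channels, sample_rate, bits_per_sample) from a WAV file."""
    with open(path, 'rb') as f:
        riff, _, wave = struct.unpack('<4sI4s', f.read(12))
        assert riff == b'RIFF' and wave == b'WAVE'
        while True:
            ck_id, ck_size = struct.unpack('<4sI', f.read(8))
            if ck_id == b'fmt ':
                fmt, channels, rate, _, _, bits = struct.unpack('<HHIIHH', f.read(16))
                return fmt, channels, rate, bits
            f.seek(ck_size + (ck_size & 1), 1)   # skip other chunks (word aligned)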

MPEG audio file formats

It is possible to store MPEG-compressed audio in AIFF-C or WAVE files, with the compression type noted in the appropriate header field. There are also older MS-DOS file extensions used to denote MPEG audio files, notably .MPA (MPEG Audio) or .ABS (Audio Bit Stream). However, owing to the ubiquity of the so-called ‘MP3’ format (MPEG 1, Layer 3) for audio distribution on the Internet, MPEG audio files are increasingly denoted with the extension ‘.MP3’. Such files are relatively simple, being really no more than MPEG audio frame data in sequence, each frame being preceded by a frame header.
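
The sync pattern of 11 set bits at the start of every MPEG audio frame header is not described above, but it is a standard feature of the frame format. A very rough Python sketch of locating candidate frame headers in such a file follows; a real parser would go on to decode the bit rate and sampling frequency fields and step from frame to frame:

def mpeg_frame_offsets(data):
    """Yield byte offsets that look like MPEG audio frame headers (11 bit sync)."""
    for i in range(len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            yield i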

DSD-IFF file format

The DSD-IFF file format is based on a similar structure to other IFF-type files, described above, except that it is modified slightly to allow for the large file sizes that may be encountered with the high-resolution Direct Stream Digital format used for SuperAudio CD. Specifically the container FORM chunk is labeled ‘FRM8’ and this identifies all local chunks that follow as having ‘length’ indications that are 8 bytes long rather than the normal 4. In other words, rather than a 4 byte chunk ID followed by a 4 byte length indication, these files have a 4 byte ID followed by an 8 byte length indication. This allows for the definition of chunks with a length greater than 2 Gbytes, which may be needed for mastering SuperAudio CDs. There are also various optional chunks that can be used for exchanging more detailed information and comments such as might be used in project interchange. Further details of this file format, and an excellent guide to the use of DSD-IFF in project applications, can be found in the DSD-IFF specification, as described in the ‘Recommended further reading’ at the end of this chapter.
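
A sketch of reading the modified headers, assuming the 4 byte ID plus 8 byte big-endian length layout described above (and assuming, by analogy with Figure 10.3, that a 4 byte form type follows the container’s length field):

import struct

def read_frm8_container(f):
    """Read the DSD-IFF container header: 'FRM8', an 8 byte length, then a form type."""
    frm8, length, form_type = struct.unpack('>4sQ4s', f.read(16))
    assert frm8 == b'FRM8'
    return length, form_type

def read_local_chunk_header(f):
    """Read a local chunk header: 4 byte ID plus 8 byte big-endian length."""
    chunk_id, length = struct.unpack('>4sQ', f.read(12))
    return chunk_id, length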

FACT FILE 10.1 BROADCAST WAVE FORMAT

The Broadcast WAVE format, described in EBU Tech. 3285, was standardized by the European Broadcasting Union (EBU) because of a need to ensure compatibility of sound files and accompanying information when transferred between workstations. It is based on the RIFF WAVE format described above, but contains an additional chunk that is specific to the format (the ‘broadcast_audio_extension’ chunk, ID ‘bext’) and also limits some aspects of the WAVE format. Version 0 was published in 1997 and Version 1 in 2001, the only difference being the addition of an SMPTE UMID (Unique Material Identifier) in version 1 (this is a form of metadata). Such files currently only contain either PCM or MPEG-format audio data.

Broadcast WAVE files contain at least three chunks: the broadcast_audio_extension chunk, the format chunk and the audio data chunk. The broadcast extension chunk contains the data shown in the table below. Optionally files may also contain further chunks for specialized purposes and may contain chunks relating to MPEG audio data (the ‘fact’ and ‘mpeg_audio_extension’ chunks). MPEG applications of the format are described in EBU Tech. 3285, Supplement 1 and the audio data chunk containing the MPEG data normally conforms to the MP3 frame format.

A multichannel extension chunk defines the channel ordering, surround format, downmix coefficients for creating a two-channel mix, and some descriptive information. There are also chunks defined for metadata describing the audio contained within the file, such as the ‘quality chunk’ (ckID ‘qlty’), which together with the coding history contained in the ‘bext’ chunk make up the so-called ‘capturing report’. These are described in Supplement 2 to EBU Tech. 3285. Finally there is a chunk describing the peak audio level within a file, which can aid automatic program level setting and program interchange.

BWF files can be either mono, two-channel or multichannel (sometimes called polyfiles, or BWF-P), and utilities exist for separating polyfiles into individual mono files which some applications require.

Broadcast audio extension chunk format

Field Size (bytes) Description
ckID 4 Chunk ID = ‘bext’
ckSize 4 Size of the chunk data in bytes
Description 256 Description of the sound clip
Originator 32 Name of the originator
OriginatorReference 32 Unique identifier of the originator (issued by the EBU)
OriginationDate 10 ‘yyyy-mm-dd’
OriginationTime 8 ‘hh-mm-ss’
TimeReferenceLow 4 Low-order 32 bits of the first sample count since midnight
TimeReferenceHigh 4 High-order 32 bits of the first sample count since midnight
Version 2 BWF version number, e.g. & 0001 is Version 1
UMID 64 UMID according to SMPTE 330M. If only a 32 byte UMID then the second half should be padded with zeros
Reserved 190 Reserved for extensions. Set to zero in Version 1
CodingHistory Unrestricted A series of ASCII strings, each terminated by CR/LF (carriage return, line feed) describing each stage of the audio coding history, according to EBU R-98
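
The two TimeReference fields together form a single 64 bit sample count since midnight. The following illustrative Python fragment converts it to a time-of-day value, assuming the sampling rate is known from the file’s format chunk:

def bwf_time_reference(low, high, sample_rate):
    """Combine TimeReferenceLow/High into a time-of-day string."""
    samples_since_midnight = (high << 32) | low
    seconds = samples_since_midnight / sample_rate
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return '%02d:%02d:%06.3f' % (h, m, s)

print(bwf_time_reference(0x7D000, 0, 48000))   # 512000 samples -> 00:00:10.667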

Edit decision list (EDL) files and project interchange

EDL formats were historically proprietary but the need for open interchange of project data has increased the use of standardized EDL structures and ‘packaged’ project formats to make projects transportable between systems from different manufacturers. There is an old and widely used format for EDLs in the video world that is known as the CMX-compatible form. CMX is a well-known manufacturer of video editing equipment and most editing systems will read CMX EDLs for the sake of compatibility. These can be used for basic audio purposes, and indeed a number of workstations can read CMX EDL files for the purpose of auto-conforming audio edits to video edits performed on a separate system. The CMX list defines the cut points between source material and the various transition effects at joins, and it can be translated reasonably well for the purpose of defining audio cut points and their timecode locations, provided video frame accuracy is adequate.

Project interchange can involve the transfer of edit list, mixing, effects and audio data. Many of these are proprietary, such as the Digidesign ProTools session format. Software, such as SSL Pro-Convert, can be obtained for audio and video workstations that translates EDLs or projects between a number of different systems to make interchange easier, although it is clear that this process is not always problem-free and good planning of in-house processes is vital. The OMFI (Open Media Framework Interchange) structure, originally developed by Avid, was one early attempt at an open project interchange format and contained a format for interchanging edit list data. Other options include XML-tagged formats that identify different items in the edit list in a text-based form. AES-31 is now gaining considerable popularity among workstation software manufacturers as a simple means of exchanging audio editing projects between systems, and is described in more detail below.

AES-31 format

AES-31 is an international standard designed to enable straightforward interchange of audio files and projects between systems. Audio editing packages are increasingly offering AES-31 as a simple interchange format for edit lists. In Part 1 the standard specifies a disk format that is compatible with the FAT32 file system, a widely used structure for the formatting of computer hard disks. Part 2 describes the use of the Broadcast WAVE audio file format. Part 3 describes simple project interchange, including a format for the communication of edit lists using ASCII text that can be parsed by a computer as well as read by a human. The basis of this is the edit decision markup language (EDML). It is not necessary to use all the parts of AES-31 to make a satisfactory interchange of elements. For example, one could exchange an edit list according to part 3 without using a disk based on part 1. Adherence to all the parts would mean that one could take a removable disk from one system, containing sound files and a project file, and the project would be readable directly by the receiving device.

EDML documents are limited to a 7 bit ASCII character set in which white space delimits fields within records. Standard carriage return (CR) and line-feed (LF) characters can be included to aid the readability of lists but they are ignored by software that might parse the list. An event location is described by a combination of time code value and sample count information. The time code value is represented in ASCII using conventional hours, minutes, seconds and frames (e.g. HH:MM:SS:FF) and the optional sample count is a four figure number denoting the number of samples after the start of the frame concerned at which the event actually occurs. This enables sample-accurate edit points to be specified. It is slightly more complicated than this because the ASCII delimiters between the time code fields are changed to indicate various parameters:

HH:MM delimiter Frame count and timebase indicator (see Table 10.2)

MM:SS delimiter Film frame indicator (if not applicable, use the previous delimiter)

SS:FF delimiter Video field and timecode type (see Table 10.3)

The delimiter before the sample count value is used to indicate the audio sampling frequency, including all the pull-up and pull-down options (e.g. fs times 1/1.001). There are too many of these possibilities to list here and the interested reader is referred to the standard for further information. This is an example of a time code and (after the slash denoting 48 kHz sampling frequency) optional sample count value:

14:57:24.03/0175
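
Purely as an illustration, the following Python sketch converts such a value to an absolute sample address. It assumes 25 frames per second and a 48 kHz sampling frequency, both of which would in practice be derived from the delimiters rather than hard-coded:

import re

def adl_to_samples(timecode, sample_rate=48000, frame_rate=25):
    """Convert 'HH:MM:SS.FF/ssss' to a sample count since midnight (simplified)."""
    tc, _, samples = timecode.partition('/')
    hh, mm, ss, ff = (int(x) for x in re.split(r'[^0-9]', tc))
    frames = ((hh * 60 + mm) * 60 + ss) * frame_rate + ff
    return frames * sample_rate // frame_rate + int(samples or 0)

print(adl_to_samples('14:57:24.03/0175'))   # 2584517935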

Table 10.2 Frame count and timebase indicator coding in AES-31

image

Table 10.3 Video field and timecode type indicator in AES-31

Video Field
Counting mode Field 1 Field 2
PAL . :
NTSC non-drop-frame . :
NTSC drop-frame , ;

The Audio Decision List (ADL) is contained between two ASCII keyword tags <ADL> and </ADL>. It in turn contains a number of sections, each contained within other keyword tags such as <VERSION>, <PROJECT>, <SYSTEM> and <SEQUENCE>. The edit points themselves are contained in the <EVENT_LIST> section. Each event begins with the ASCII keyword ‘(Entry)’, which serves to delimit events in the list, followed by an entry number (32 bit integer, incrementing through the list) and an entry type keyword to describe the nature of the event (e.g. ‘(Cut)’). Each different event type then has a number of fields following that define the event more specifically. The following is an example of a simple cut edit, as suggested by the standard:

(Entry) 0010 (Cut) F "FILE://VOL/DIR/FILE" 1 1 03:00:00;00/0000 01:00:00:00/0000 01:00:10:00/0000 _

This sequence essentially describes a cut edit, entry number 0010, the source of which is the file (F) with the path shown, using channel 1 of the source file (or just a mono file), placed on track 1 of the destination timeline, starting at timecode three hours in the source file, placed to begin at one hour in the destination timeline (the ‘in point’) and to end ten seconds later (the ‘out point’). Some workstation software packages store a timecode value along with each sound file to indicate the nominal start time of the original recording (e.g. BWF files contain a timestamp in the ‘bext’ chunk), otherwise each sound file is assumed to start at time zero.
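
The following illustrative Python fragment splits such an event line into named fields, assuming simple white-space delimiting; a real parser would also need to handle quoted file paths containing spaces and the other event types defined by the standard:

line = '(Entry) 0010 (Cut) F "FILE://VOL/DIR/FILE" 1 1 03:00:00;00/0000 01:00:00:00/0000 01:00:10:00/0000 _'

fields = line.split()
event = {
    'entry_no': int(fields[1]),
    'type': fields[2].strip('()'),
    'src_type': fields[3],               # 'F' indicates a file source
    'source': fields[4].strip('"'),
    'src_channel': int(fields[5]),
    'dest_track': int(fields[6]),
    'src_in': fields[7],
    'dest_in': fields[8],
    'dest_out': fields[9],
}
print(event['type'], event['dest_in'])   # Cut 01:00:00:00/0000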

It is assumed that default crossfades will be handled by the workstation software itself. Most workstations introduce a basic short crossfade at each edit point to avoid clicks, but this can be modified by ‘event modifier’ information in the ADL. Such modifiers can be used to adjust the shape and duration of a fade in or fade out at an edit point. There is also the option to point at a rendered crossfade file for the edit point, as described in Chapter 9.

MXF – the material exchange format

MXF was developed by the Pro-MPEG forum as a means of exchanging audio, video and metadata between devices, primarily in television operations. It is based on the modern concept of media objects that are split into ‘essence’ and ‘metadata’. Essence files are the raw material (i.e. audio and video) and the metadata describes things about the essence (such as where to put it, where it came from and how to process it).

MXF files attempt to present the material in a ‘streaming’ format, that is one that can be played out in real time, but they can also be exchanged in conventional file transfer operations. As such they are normally considered to be finished program material designed for playout in broadcasting environments, rather than material that is to be processed somewhere downstream. The bit stream is also said to be compatible with recording on digital videotape devices.

AAF – the advanced authoring format

AAF is an authoring format for multimedia data that is supported by numerous vendors, including Avid, which has adopted it as a migration path from OMFI. Parts of OMFI 2.0 form the basis for parts of AAF and there are also close similarities between AAF and MXF (described in the previous section). Like the formats to which it has similarities, AAF is an object-oriented format that combines essence and metadata within a container structure. Unlike MXF, it is designed for project interchange such that elements within the project can be modified, post-processed and resynchronized. It is not, therefore, directly suitable as a streaming format but can easily be converted to MXF for streaming if necessary.

Rather like OMFI it is designed to enable complex relationships to be described between content elements, to map these elements onto a timeline, to describe the processing of effects, synchronize streams of essence, retain historical metadata and refer to external essence (essence not contained within the AAF package itself). It has three essential parts: the AAF Object Specification (which defines a container for essence and metadata, the logical contents of objects and rules for relationships between them); the AAF Low-Level Container Specification (which defines a disk filing structure for the data, based on Microsoft’s Structured Storage); and the AAF SDK Reference Implementation (which is a software development kit that enables applications to deal with AAF files). The Object Specification is extensible in that it allows new object classes to be defined for future development purposes.

The basic object hierarchy is illustrated in Figure 10.5, using an example of a typical audio post-production scenario. ‘Packages’ of metadata are defined that describe either compositions, essence or physical media. Some package types are very ‘close’ to the source material (they are at a lower level in the object hierarchy, so to speak) – for example, a ‘file source package’ might describe a particular sound file stored on disk. The metadata package, however, would not be the file itself, but it would describe its name and where to find it. Higher-level packages would refer to these lower-level packages in order to put together a complex program. A composition package is one that effectively describes how to assemble source clips to make up a finished program. Some composition packages describe effects that require a number of elements of essence to be combined or processed in some way.

image

FIGURE 10.5 Graphical conceptualization of some metadata package relationships in AAF – a simple audio post production example.

Packages can have a number of ‘slots’. These are a bit like tracks in more conventional terminology, each slot describing only one kind of essence (e.g. audio, video, graphics). Slots can be static (not time-dependent), timeline (running against a timing reference) or event-based (one-shot, triggered events). Slots have segments that can be source clips, sequences, effects or fillers. A source clip segment can refer to a particular part of a slot in a separate essence package (so it could refer to a short portion of a sound file that is described in an essence package, for example).
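
Very loosely, the package/slot/segment hierarchy can be pictured as nested data structures. The Python sketch below is purely conceptual and does not reflect the actual AAF class names, object model or SDK:

from dataclasses import dataclass, field
from typing import List

@dataclass
class SourceClip:            # a segment referring to part of a slot in another package
    package_ref: str
    slot_id: int
    start: int               # offset into the referenced slot
    length: int

@dataclass
class Slot:                  # roughly a 'track': one kind of essence
    kind: str                # e.g. 'audio', 'video'
    segments: List[SourceClip] = field(default_factory=list)

@dataclass
class Package:               # a composition, essence or physical-media description
    name: str
    package_type: str        # e.g. 'composition', 'file source'
    slots: List[Slot] = field(default_factory=list)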

Disk pre-mastering formats

The original tape format for submitting CD masters to pressing plants was Sony’s audio-dedicated PCM 1610/1630 format on U-matic video tape. This is now ‘old technology’ and has been replaced by alternatives based on more recent data storage media and file storage protocols. These include the PMCD (pre-master CD), CD-R, Exabyte and DLT tape formats. DVD mastering also requires high-capacity media for transferring the many gigabytes of information to mastering houses so that glass masters can be created.

The Disk Description Protocol (DDP), developed by Doug Carson and Associates, is now widely used for describing disk masters. Version 1 of the DDP laid down the basic data structure but said little about higher-level issues involved in interchange, making it more than a little complicated for manufacturers to ensure that DDP masters from one system would be readable on another. Version 2 addressed some of these issues.

DDP is a protocol for describing the contents of a disk and is not medium specific. That said, it is common to interchange CD masters with DDP data on 8 mm Exabyte data cartridges, and DVD masters are typically transferred on DLT Type III or IV compact tapes or on DVD-R(A) format disks with CMF (cutting master format) DDP headers. DDP files can be supplied separately from the audio data if necessary. DDP can be used for interchanging the data for a number of different disk formats, such as CD-ROM, CD-DA, CD-I, CD-ROM-XA, DVD-Video and DVD-Audio, and the protocol is really extremely simple. It consists of a number of ‘streams’ of data, each of which carries different information to describe the contents of the disk. These streams may be either a series of packets of data transferred over a network, files on a disk or tape, or raw blocks of data independent of any filing system. The DDP protocol simply maps its data into whatever block or packet size is used by the medium concerned, provided that the block or packet size is at least 128 bytes. Either a standard computer filing structure can be used, in which case each stream is contained within a named file, or the storage medium is used ‘raw’ with each stream starting at a designated sector or block address.

The ANSI tape labeling specification is used to label the tapes used for DDP transfers. This allows the names and locations of the various streams to be identified. The principal streams included in a DDP transfer for CD mastering are as follows:

1.  DDP ID stream or ‘DDPID’ file. 128 bytes long, describing the type and level of DDP information, various ‘vital statistics’ about the other DDP files and their location on the medium (in the case of physically addressed media), and a user text field (not transferred to the CD).

2.  DDP Map stream or ‘DDPMS’ file. This is a stream of 128 byte data packets which together give a map of the CD contents, showing what types of CD data are to be recorded in each part of the CD, how long the streams are, what types of subcode are included, and so forth. Pointers are included to the relevant text, subcode and main streams (or files) for each part of the CD.

3.  Text stream. An optional stream containing text to describe the titling information for volumes, tracks or index points (not currently stored in CD formats), or for other text comments. If stored as a file, its name is indicated in the appropriate map packet.

4.  Subcode stream. Optionally contains information about the subcode data to be included within a part of the disk, particularly for CD-DA. If stored as a file, its name is indicated in the appropriate map packet.

5.  Main stream. Contains the main data to be stored on a part of the CD, treated simply as a stream of bytes, irrespective of the block or packet size used. More than one of these files can be used in cases of mixed-mode disks, but there is normally only one in the case of a conventional audio CD. If stored as a file, its name is indicated in the appropriate map packet.

CONSUMER OPTICAL DISK FORMATS

Compact discs and drives

The CD is not immediately suitable for real-time audio editing and production, partly because of its relatively slow access time compared with hard disks, but it has considerable value for the storage and transfer of sound material that does not require real-time editing. Broadcasters use CDs for sound effects libraries, and studios and mastering facilities use them for providing customers and record companies with ‘acetates’ or test pressings of a new recording. They have also become quite popular as a means of transferring finished masters to a CD pressing plant in the form of the PMCD (pre-master CD). They are ideal as a means of ‘proofing’ CD-ROMs and other CD formats, and can be used as low-cost backup storage for computer data.

Compact Discs (CDs) are familiar to most people as a consumer read-only optical disc for audio (CD-DA) or data (CD-ROM) storage. Standard audio CDs (CD-DA) conform to the Red Book standard published by Philips and Sony. The CD-ROM standard (Yellow Book) divides the CD into a structure with 2048 byte sectors, adds an extra layer of error protection, and makes it useful for general purpose data storage including the distribution of sound and video in the form of computer data files. It is possible to find discs with mixed modes, containing sections in CD-ROM format and sections in CD-Audio format. The CD Plus is one such example.

CD-R is the recordable CD, and may be used for recording CD-Audio format or other CD formats using a suitable drive and software. The Orange Book, Part 2, contains information on the additional features of CD-R, such as the area in the center of the disc where data specific to CD-R recordings is stored. Audio CDs recorded to the Orange Book standard can be ‘fixed’ to give them a standard Red Book table of contents (TOC), allowing them to be replayed on any conventional CD player. Once fixed into this form, the CD-R may not subsequently be added to or changed, but prior to this there is a certain amount of flexibility, as discussed below. CD-RW discs are erasable and work on phase-change principles, requiring a drive compatible with this technology; they are described in the Orange Book, Part 3.

The degree of reflectivity of CD-RW discs is much lower than that of typical CD-R and CD-ROM. This means that some early drives and players may have difficulties reading them. However, the ‘multi-read’ specification developed by the OSTA (Optical Storage Technology Association) describes a drive that should read all types of CD, so recent drives should have no difficulties here.

DVD

DVD was the natural successor to CD, being a higher density optical disc format aimed at the consumer market, having the same diameter as CD and many similar physical features. It uses a different laser wavelength from CD (635–650 nm as opposed to 780 nm), so multi-standard drives need to be able to accommodate both. Data storage capacity depends on the number of sides and layers to the disc, but ranges from 4.7 Gbytes (single-layer, single-sided) up to about 18 Gbytes (double-layer, double-sided). The data transfer rate at ‘one times’ speed is just over 11 Mbit/s.

DVD can be used as a general purpose data storage medium. Like CD, there are numerous different variants on the recordable DVD, partly owing to competition between the numerous different ‘factions’ in the DVD consortium. These include DVD-R, DVD-RAM, DVD-RW and DVD+RW, all of which are based on similar principles but have slightly different features, leading to a compatibility minefield (see Fact File 10.2). The ‘DVD Multi’ guidelines produced by the DVD Forum are an attempt to foster greater compatibility between DVD drives and discs, and many drives are now available that will read and write most of the DVD formats.

FACT FILE 10.2 RECORDABLE DVD FORMATS

Recordable DVD type Description
DVD-R (A and G) DVD equivalent of CD-R. One-time recordable in sequential manner, replayable on virtually any DVD-ROM drive. Supports ‘incremental writing’ or ‘disc at once’ recording. Capacity either 3.95 (early discs) or 4.7 Gbyte per side. ‘Authoring’ (A) version (recording laser wavelength 635 nm) can be used for pre-mastering DVDs for pressing, including DDP data for disc mastering (see Chapter 6). ‘General’ (G) version (recording laser wavelength 650 nm) intended for consumer use, having various ‘content protection’ features that prevent encrypted commercial releases from being cloned
DVD-RAM Sectored format, rather more like a hard disk in data structure when compared with DVD-R. Uses phase-change (PD-type) principles allowing direct overwrite. Version 2 discs allow 4.7 Gbyte per side (reduced to about 4.2 Gbyte after formatting). Type 1 cartridges are sealed and Type 2 allow the disc to be removed. Double-sided discs only come in sealed cartridges. Can be rewritten about 100 000 times. The recent Type 3 is a bare disc that can be placed in an open cartridge for recording
DVD-RW Pioneer development, similar to CD-RW in structure, involving sequential writing. Does not involve a cartridge. Can be rewritten about 1000 times. 4.7 Gbyte per side
DVD+RW Non-DVD-Forum alternative to DVD-RAM (and not compatible), allowing direct overwrite. No cartridge. Data can be written in either CLV (for video recording) or CAV (for random access storage) modes. There is also a write-once version known as DVD+R

Writeable DVDs are a useful option for backup of large projects, particularly DVD-RAM because of its many-times overwriting capacity and its hard disk-like behavior. It is possible that a format like DVD-RAM could be used as primary storage in a multitrack recording/editing system, as it has sufficient performance for a limited number of channels and it has the great advantage of being removable. Indeed one company has used it in a multichannel location recorder. However, it is likely that hard disks will retain the performance edge for the foreseeable future.

DVD-Video is the format originally defined for consumer distribution of movies with surround sound, typically incorporating MPEG-2 video encoding and Dolby Digital surround sound encoding. It also allows for up to eight channels of 48 or 96 kHz linear PCM audio, at up to 24 bit resolution. DVD-Audio was intended for very high-quality multichannel audio reproduction and allowed for linear PCM sampling rates up to 192 kHz, with numerous configurations of audio channels for different surround modes, and optional lossless data reduction (MLP). However, it has not been widely adopted in the commercial music industry.

DVD-Audio had a number of options for choosing the sampling frequencies and resolutions of different channel groups, it being possible to use a different resolution on the front channels from that used on the rear, for example. The format was more versatile in respect of sampling frequency than DVD-Video, also accommodating multiples of the CD sampling frequency of 44.1 kHz as options (the DVD-Video format allows only for multiples of 48 kHz). Bit resolution could be 16, 20 or 24 bits per channel, and again this could be divided unequally between the channels, according to the channel group split described below.

Meridian Lossless Packing (MLP) was licensed through Dolby Laboratories for DVD-A and is a lossless coding technique designed to reduce the data rate of audio signals without compromising sound quality. It has both a variable bit rate mode and a fixed bit rate mode. The variable mode delivers the optimum compression for storing audio in computer data files, but the fixed mode is important for DVD applications where one must be able to guarantee a certain reduction in peak bit rate.

Super Audio CD (SACD)

Version 1.0 of the SACD specification is described in the ‘Scarlet Book’, available from Philips licensing department. SACD uses DSD (Direct Stream Digital) as a means of representing audio signals, as described in Chapter 8, so requires audio to be sourced in or converted to this form. SACD aims to provide a playing time of at least 74 minutes for both two channel and six channel balances. The disc is divided into two regions, one for two-channel audio, the other for multichannel, as shown in Figure 10.6. A lossless data packing method known as Direct Stream Transfer (DST) can be used to achieve roughly 2:1 data reduction of the signal stored on disc so as to enable high-quality multichannel audio on the same disc as the two channel mix. SACD has only achieved a relatively modest market penetration compared with formats such as CD and DVD-Video, but is used by some specialized high-quality record labels. A considerable number of recordings have nonetheless been issued in the SACD format.

image

FIGURE 10.6 Different regions of a Super Audio CD, showing separate two-channel and multichannel regions.

SACDs can be manufactured as single- or dual-layer discs, with the option of the second layer being a Red Book CD layer (the so-called ‘hybrid disc’ that will also play on a normal CD player). SACDs, not being a formal part of the DVD hierarchy of standards (although using some of the optical disc technology), do not have the same options for DVD-Video objects as DVD-Audio. The disc is designed first and foremost as a super-high-quality audio medium. Nonetheless there is provision for additional data in a separate area of the disc. The content and capacity of this is not specified but could be video clips, text or graphics, for example.

Blu-Ray disk

The Blu-Ray disk is a higher-density optical disk format than DVD, using a shorter-wavelength blue-violet laser (405 nm) to achieve a high packing density of data on the disk surface. Single-layer disks offer 25 Gbytes of storage and dual-layer disks offer 50 Gbytes. The basic transfer rate is also higher than that of DVD, at around 36 Mbit/s, although a higher rate of 54 Mbit/s is required for HD movie replay, achieved by using at least 1.5 times playback speed. Like DVD, a range of read-only, writeable and rewriteable formats is possible. There is an audio-only version of the player specification, known as BD-Audio, which does not have to be able to decode video, making possible a high-resolution surround playback format that might offer an alternative to DVD-Audio or SACD. Audio-only transfer rates of the disk vary depending on the format concerned.

As far as audio formats are concerned, Linear PCM, Dolby Digital and DTS Digital Surround are mandatory in Blu-Ray players and recorders, but it is up to individual studios to decide what formats to include on their disk releases. Alternative optional audio formats include higher-resolution versions of Dolby and DTS formats, known as Dolby Digital Plus and DTS-HD respectively, as well as losslessly encoded versions known as Dolby TrueHD and DTS-HD Master Audio. High sampling frequencies (up to 192 kHz) are possible on Blu-Ray, as are audio sample resolutions of 16, 20 or 24 bits. The standard limits audio reproduction to six channels of 192 kHz, 24 bit uncompressed digital audio, which gives rise to a data transfer rate of 27.7 Mbit/s.
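
The quoted figure follows directly from the channel count, sampling frequency and sample word length, as this trivial calculation shows:

channels, fs, bits = 6, 192_000, 24
rate_mbit = channels * fs * bits / 1e6
print(rate_mbit)   # 27.648, i.e. roughly 27.7 Mbit/s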

INTERCONNECTING DIGITAL AUDIO DEVICES

Introduction

In the case of analog interconnection between devices, replayed digital audio is converted to the analog domain by the replay machine’s D/A convertors, routed to the recording machine via a conventional audio cable and then reconverted to the digital domain by the recording machine’s A/D convertors. The audio is subject to any gain changes that might be introduced by level differences between output and input, or by the record gain control of the recorder and the replay gain control of the player. Analog domain copying is necessary if any analog processing of the signal is to happen in between one device and another, such as gain correction, equalization, or the addition of effects such as reverberation. Most of these operations, though, are now possible in the digital domain.

An analog domain copy cannot be said to be a perfect copy or a clone of the original master, because the data values will not be exactly the same (owing to slight differences in recording level, differences between convertors, the addition of noise, and so on). For a clone it is necessary to make a true digital copy. This can either involve a file copying process, perhaps over a network using a workstation, or a digital interface or network infrastructure may be used for the streamed interconnection of recording systems and other audio devices such as mixers and effects units.

Professional digital audio systems, and some consumer systems, have digital interfaces conforming to one of the standard protocols and allow for a number of channels of digital audio data to be transferred between devices with no loss of sound quality. Any number of generations of digital copies may be made without affecting the sound quality of the latest generation, provided that errors have been fully corrected. (This assumes that the audio is in a linear PCM format and has not been subject to low bit rate decoding and re-encoding.) The digital outputs of a recording device are taken from a point in the signal chain after error correction, which results in the copy being error corrected. The copy does not suffer from any errors that existed in the master, provided that those errors were correctable. This process takes place in real time, requiring the operator to put the receiving device into record mode such that it simply stores the incoming stream of audio data. Any accompanying metadata may or may not be recorded (often most of it is not).

Making a copy of a recording or transferring audio data between devices using any of the digital interface standards involves the connection of appropriate cables between player and recorder, and the switching of the receiver’s input to ‘digital’ as opposed to ‘analog’, since this sets it to accept a signal from the digital input as opposed to the A/D convertor. It is necessary for both machines to be operating at the same sampling frequency (unless a sampling frequency convertor is used) and may require the recorder to be switched to ‘external sync’ mode, so that it can lock its sampling frequency to that of the player. (Some devices such as effects may lock themselves to an incoming digital signal as a matter of course.) Alternatively (and preferably) a common reference (e.g. word clock) signal may be used to synchronize all devices that are to be interconnected digitally. If one of these methods of ensuring a common sampling frequency is not used then either audio will not be decoded at all by the receiver, or regular clicks will be audible at a rate corresponding to the difference between the two sampling frequencies (at which point samples are either skipped or repeated owing to the ‘sample slippage’ that is occurring between the two machines). A receiver should be capable of at least the same quantizing resolution (number of bits per sample) as the source device, otherwise audio resolution will be lost. If there is a difference in resolution between the systems it is advisable to use a processor in between the machines that optimally dithers the signal for the new resolution, or alternatively to use redithering options on the source machine to prepare the signal for its new resolution (see Chapter 8).

Increasingly generic computer data interconnects are used to transfer audio as explained in Fact File 10.3.

FACT FILE 10.3 COMPUTER NETWORKS VS DIGITAL AUDIO INTERFACES

Dedicated ‘streaming’ interfaces, as employed in broadcasting, production and post-production environments, are the digital audio equivalent of analog signal cables, down which signals for one or more channels are carried in real time from one point to another, possibly with some auxiliary information (metadata) attached. An example is the AES-3 interface, described in the main text. Such an audio interface uses a data format dedicated to audio purposes, whereas a computer data network may carry numerous types of information.

Dedicated interfaces are normally unidirectional, point-to-point connections, and should be distinguished from computer data interconnects and networks that are often bidirectional and carry data in a packet format for numerous sources and destinations. With dedicated interfaces sources may be connected to destinations using a routing matrix or by patching individual connections, very much as with analog signals. Audio data are transmitted in an unbroken stream, there is no handshaking process involved in the data transfer, and erroneous data are not retransmitted because there is no mechanism for requesting its retransmission. The data rate of a dedicated audio interface is usually directly related to the audio sampling frequency, word length and number of channels of the audio data to be transmitted, ensuring that the interface is always capable of serving the specified number of channels. If a channel is unused for some reason its capacity is not normally available for assigning to other purposes (such as higher-speed transfer of another channel, for example).

There is an increasing trend towards employing standard computer interconnects and networks to transfer audio information, as opposed to using dedicated audio interfaces. Such computer networks are typically used for a variety of purposes in general data communications and they may need to be adapted for audio applications that require sample-accurate real-time transfer. The increasing ubiquity of computer systems in audio environments makes it inevitable that generic data communication technology will gradually take the place of dedicated interfaces. It also makes sense economically to take advantage of the ‘mass market’ features of the computer industry.

Computer networks are typically general purpose data carriers that may have asynchronous features and may not always have the inherent quality-of-service (QoS) features that are required for ‘streaming’ applications. They also normally use an addressing structure that enables packets of data to be carried from one of a number of sources to one of a number of destinations and such packets will share the connection in a more or less controlled way. Data transport protocols such as TCP/IP are often used as a universal means of managing the transfer of data from place to place, adding overheads in terms of data rate, delay and error handling that may work against the efficient transfer of audio. Such networks may be designed primarily for file transfer applications where the time taken to transfer the file is not a crucial factor – ‘as fast as possible’ will do. This has required some special techniques to be developed for carrying real-time data such as audio information.

Desktop computers and consumer equipment are also increasingly equipped with general purpose serial data interfaces such as USB (Universal Serial Bus) and Firewire (IEEE 1394). These are examples of personal area network (PAN) technology, allowing a number of devices to be interconnected within a limited range around the user. These have a high enough data rate to carry a number of channels of audio data over relatively short distances, either over copper or optical fiber. Audio protocols also exist for these as discussed in the main text.

Dedicated audio interface formats

There are a number of types of digital interface, some of which are international standards and others of which are manufacturer specific. They all carry digital audio for one or more channels with at least 16 bit resolution and will operate at the standard sampling rates of 44.1 and 48 kHz, as well as at 32 kHz if necessary, some having a degree of latitude for varispeed. Some interface standards have been adapted to handle higher sampling frequencies such as 88.2 and 96 kHz. The interfaces vary as to how many physical interconnections are required. Some require one link per channel plus a synchronization signal, whilst others carry all the audio information plus synchronization information over one cable.

The most common interfaces are described below in outline. It is common for subtle incompatibilities to arise between devices, even when interconnected with a standard interface, owing to the different ways in which non-audio information is implemented. This can result in anything from minor operational problems to total non-communication and the causes and remedies are unfortunately far too detailed to go into here. The reader is referred to The Digital Interface Handbook by Rumsey and Watkinson, as well as to the standards themselves, if a greater understanding of the intricacies of digital audio interfaces is required.

The AES/EBU interface (AES-3)

The AES-3 interface, described almost identically in AES-3-1992, IEC 60958 and EBU Tech. 3250E among others, allows for two channels of digital audio (A and B) to be transferred serially over one balanced interface, using drivers and receivers similar to those used in the RS422 data transmission standard, with an output voltage of between 2 and 7 volts as shown in Figure 10.7. The interface allows two channels of audio to be transferred over distances up to 100 m, but longer distances may be covered using combinations of appropriate cabling, equalization and termination. Standard XLR-3 connectors are used, often labeled DI (for digital in) and DO (for digital out).

image

FIGURE 10.7 Recommended electrical circuit for use with the standard two-channel interface.

Each audio sample is contained within a ‘subframe’ (see Figure 10.8), and each subframe begins with one of three synchronizing patterns to identify the sample as either the A or B channel, or to mark the start of a new channel status block (see Figure 10.9). These synchronizing patterns violate the rules of bi-phase mark coding (see below) and are easily identified by a decoder. One frame (containing two audio samples) is normally transmitted in the time period of one audio sample, so the data rate varies with the sampling frequency. (Note, though, that the recently introduced ‘single-channel-double-sampling-frequency’ mode of the interface allows two samples for one channel to be transmitted within a single frame in order to allow the transport of audio at 88.2 or 96 kHz sampling frequency.)

image

FIGURE 10.8 Format of the standard two-channel interface frame.

image

FIGURE 10.9 Three different preambles (X, Y and Z) are used to synchronize a receiver at the starts of subframes.

image

FIGURE 10.10 Overview of the professional channel status block.

Additional data is carried within the subframe in the form of 4 bits of auxiliary data (which may either be used for additional audio resolution or for other purposes such as low-quality speech), a validity bit (V), a user bit (U), a channel status bit (C) and a parity bit (P), making 32 bits per subframe and 64 bits per frame. Channel status bits are aggregated at the receiver to form a 24 byte word every 192 frames, and each bit of this word has a specific function relating to interface operation, an overview of which is shown in Figure 10.10. Examples of bit usage in this word are the signaling of sampling frequency and pre-emphasis, as well as the carrying of a sample address ‘timecode’ and labeling of source and destination. Bit 1 of the first byte signifies whether the interface is operating according to the professional (set to 1) or consumer (set to 0) specification.
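As an illustration of this structure, the following sketch (written in Python purely for explanatory purposes, and not part of any standard) packs a 20 bit audio sample and the V, U and C flags into the 32 bit subframe layout described above. The preamble cannot be represented as ordinary data bits because it deliberately violates the channel code, so it is shown only as a placeholder.

def make_subframe(sample_20bit, v=0, u=0, c=0, aux=0):
    bits = [0] * 32
    # Bits 0-3: preamble placeholder (X, Y or Z pattern on the wire).
    # Bits 4-7: auxiliary data.
    for i in range(4):
        bits[4 + i] = (aux >> i) & 1
    # Bits 8-27: 20 bit audio sample, transmitted LSB first.
    for i in range(20):
        bits[8 + i] = (sample_20bit >> i) & 1
    bits[28], bits[29], bits[30] = v, u, c   # validity, user, channel status
    bits[31] = sum(bits[4:31]) % 2           # parity bit: bits 4-31 carry an even number of ones
    return bits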

Bi-phase mark coding, the same channel code as used for SMPTE/EBU timecode, is used in order to ensure that the data is self-clocking, of limited bandwidth, DC free, and polarity independent, as shown in Figure 10.11. The interface has to accommodate a wide range of cable types and a nominal 110 ohm characteristic impedance is recommended. Originally (AES-3-1985) up to four receivers with a nominal input impedance of 250 ohms could be connected across a single professional interface cable, but a later modification to the standard recommended the use of a single receiver per transmitter, having a nominal input impedance of 110 ohms.
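The following short sketch (again Python, for illustration only) shows the essence of bi-phase mark coding: the line level changes at the start of every bit cell, and changes again in the middle of the cell only when the data bit is a one, which is why the code is self-clocking and polarity independent.

def biphase_mark_encode(data_bits, level=0):
    half_cells = []
    for bit in data_bits:
        level ^= 1              # transition at the start of every bit cell
        half_cells.append(level)
        if bit:
            level ^= 1          # extra mid-cell transition encodes a '1'
        half_cells.append(level)
    return half_cells           # two half-cells per data bit

print(biphase_mark_encode([1, 0, 1, 1]))   # [1, 0, 1, 1, 0, 1, 0, 1]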

image

FIGURE 10.11 An example of the bi-phase mark channel code.

Standard consumer interface (IEC 60958-3)

The most common consumer interface (historically related to SPDIF – the Sony/Philips digital interface) is very similar to the AES-3 interface, but uses unbalanced electrical interconnection over a coaxial cable having a characteristic impedance of 75 ohms, as shown in Figure 10.12. It can be found on many items of semi-professional or consumer digital audio equipment, such as CD players, DVD players and DAT machines, and is also widely used on computer sound cards because of the small physical size of the connectors. It usually terminates in an RCA phono connector, although some equipment makes use of optical fiber interconnects (TOS-link) carrying the same data. Format convertors are available for converting consumer format signals to the professional format, and vice versa, and for converting between electrical and optical formats. Both the professional (AES-3 equivalent) and consumer interfaces are capable of carrying data-reduced stereo and surround audio signals such as MPEG and Dolby Digital as described in Fact File 10.4.

image

FIGURE 10.12 The consumer electrical interface. (Transformer and capacitor are optional but may improve the electrical characteristics of the interface.)

The data format of subframes is the same as that used in the professional interface, but the channel status implementation is almost completely different, as shown in Figure 10.13. The second byte of channel status in the consumer interface has been set aside for the indication of ‘category codes’, these being set to define the type of consumer usage. Examples of defined categories are (00000000) for the General category, (10000000) for Compact Disc and (11000000) for a DAT machine. Once the category has been defined, the receiver is expected to interpret certain bits of the channel status word in a particular way, depending on the category. For example, in CD usage, the four control bits from the CD’s ‘Q’ channel subcode are inserted into the first four control bits of the channel status block (bits 1 – 4). Copy protection can be implemented in consumer-interfaced equipment, according to the Serial Copy Management System (SCMS).
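By way of illustration, the sketch below (Python, not part of any standard implementation) interprets the first two channel status bytes of a consumer stream. The bytes are written here as bit strings in transmission order, so that they match the category codes quoted above.

CATEGORIES = {
    '00000000': 'General',
    '10000000': 'Compact Disc',
    '11000000': 'DAT',
}

def describe_channel_status(byte0_bits, byte1_bits):
    mode = 'consumer' if byte0_bits[0] == '0' else 'professional'
    category = CATEGORIES.get(byte1_bits, 'other or unknown category')
    return mode, category

print(describe_channel_status('00000000', '10000000'))
# ('consumer', 'Compact Disc')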

image

FIGURE 10.13 Overview of the consumer channel status block.

FACT FILE 10.4 CARRYING DATA-REDUCED AUDIO

The increased use of data-reduced multichannel audio has resulted in methods by which such data can be carried over standard two-channel interfaces, for either professional or consumer purposes. This makes use of the ‘non-audio’ or ‘other uses’ mode of the interface, indicated in the second bit of channel status, which tells conventional PCM audio decoders that the information is some other form of data that should not be converted directly to analog audio. Because data-reduced audio has a much lower rate than the PCM audio from which it was derived, a number of audio channels can be carried in a data stream that occupies no more space than two channels of conventional PCM. These applications of the interface are described in SMPTE 337M (concerned with professional applications) and IEC 61937, although the two are not identical. SMPTE 338M and 339M specify data types to be used with this standard. The SMPTE standard packs the compressed audio data into 16, 20 or 24 bits of the audio part of the AES-3 subframe and can use the two subframes independently (e.g. one for PCM audio and the other for data-reduced audio), whereas the IEC standard only uses 16 bits and treats both subframes the same way.

Consumer use of this mode is evident on DVD players, for example, for connecting them to home cinema decoders. Here the Dolby Digital or DTS-encoded surround sound is not decoded in the player but in the attached receiver/decoder. IEC 61937 has parts, either pending or published, dealing with a range of different codecs including ATRAC, Dolby AC-3, DTS and MPEG (various flavors). An ordinary PCM convertor attempting to decode such a signal would reproduce it as loud, unpleasant noise; this should not normally happen provided the second bit of channel status is correctly observed. Professional applications of the mode vary, but are likely to be increasingly encountered in conjunction with Dolby E data reduction – a relatively recent development involving mild data reduction for professional multichannel applications in which users wish to continue making use of existing AES-3-compatible equipment (e.g. VTRs, switchers and routers). Dolby E enables 5.1-channel surround audio to be carried over conventional two-channel interfaces and through AES-3-transparent equipment at a typical rate of about 1.92 Mbit/s (depending on how many bits of the audio subframe are employed). It is designed so that it can be switched or edited at video frame boundaries without disturbing the audio.
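A rough calculation shows where a figure of this order comes from, assuming (for illustration only) that 20 bits of both subframes carry the payload at a 48 kHz frame rate:

fs = 48000          # AES-3 frames per second
bits_used = 20      # bits of each audio subframe assumed to carry the payload
subframes = 2       # both subframes of the frame are used
print(fs * bits_used * subframes)   # 1,920,000 bit/s, i.e. about 1.92 Mbit/s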

The user bits of the consumer interface are often used to carry information derived from the subcode of recordings, such as track identification and cue point data. This can be used when copying CDs and DAT tapes, for example, to ensure that track start ID markers are copied along with the audio data. This information is not normally carried over AES/EBU interfaces.

Proprietary digital interfaces

Tascam’s interfaces became popular owing to the widespread use of the company’s DA-88 multitrack recorder and derivatives. The primary TDIF-1 interface uses a 25-pin D-sub connector to carry eight channels of audio information in two directions (in and out of the device), sampling frequency and pre-emphasis information (on separate wires, two for fs and one for emphasis) and a synchronizing signal. The interface is unbalanced and uses CMOS voltage levels. Each data connection carries two channels of audio data, odd channel and MSB first, as shown in Figure 10.14. As can be seen, the audio data can be up to 24 bits long, followed by 2 bits to signal the word length, 1 bit to signal emphasis and 1 bit for parity. There are also 4 user bits per channel that are not usually used.

image

FIGURE 10.14 Basic format of TDIF data and LRsync signal.

The Alesis ADAT multichannel optical digital interface, commonly referred to as the ‘light pipe’ interface or simply ‘ADAT Optical’, is a serial, self-clocking, optical interface that carries eight channels of audio information. It is described in US Patent 5,297,181: ‘Method and apparatus for providing a digital audio interface protocol’. The interface is capable of carrying up to 24 bits of digital audio data for each channel and the eight channels of data are combined into one serial frame that is transmitted at the sampling frequency. The data is encoded in NRZI format for transmission, with forced ones inserted every 5 bits (except during the sync pattern) to provide clock content. This can be used to synchronize the sampling clock of a receiving device if required, although some devices require the use of a separate 9-pin ADAT sync cable for synchronization. The sampling frequency is normally limited to 48 kHz with varispeed up to 50.4 kHz and TOSLINK optical connectors are typically employed (Toshiba TOCP172 or equivalent). In order to operate at 96 kHz sampling frequency some implementations use a ‘double-speed’ mode in which two channels are used to transmit one channel’s audio data (naturally halving the number of channels handled by one serial interface). Although 5 m lengths of optical fiber are the maximum recommended, longer distances may be covered if all the components of the interface are of good quality and clean. Experimentation is required.

As shown in Figure 10.15 the frame consists of an 11 bit sync pattern consisting of 10 zeros followed by a forced one. This is followed by 4 user bits (not normally used and set to zero), the first forced one, then the first audio channel sample (with forced ones every 5 bits), the second audio channel sample, and so on.
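The sketch below (Python, illustrative only) assembles one such frame from eight 24 bit samples, following the description above. The exact position of each forced one relative to its 4 bit data group is assumed here for the purposes of illustration, and the NRZI line coding stage is not shown.

def adat_frame(samples_24bit):
    assert len(samples_24bit) == 8
    bits = [0] * 10 + [1]              # 11 bit sync: ten zeros then a forced one
    bits += [0, 0, 0, 0]               # 4 user bits, normally set to zero
    bits += [1]                        # the first forced one
    for sample in samples_24bit:
        for group in range(6):         # 24 bits sent as six 4 bit groups, MSB first
            nibble = (sample >> (20 - 4 * group)) & 0xF
            bits += [(nibble >> b) & 1 for b in (3, 2, 1, 0)]
            bits += [1]                # forced one every 5 bits for clock content
    return bits

print(len(adat_frame([0] * 8)))        # 256 bits, transmitted once per sample period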

image

FIGURE 10.15 Basic format of ADAT data.

image

FIGURE 10.16 Direct Stream Digital interface data is either transmitted ‘raw’, as shown at (a), or phase modulated, as in the SDIF-3 format shown at (b).

SDIF is the original Sony interface for digital audio, most commonly encountered in SDIF-2 format on BNC connectors, along with a word clock signal. However, this is not often used these days. SDIF-3 is Sony’s interface for high-resolution DSD data (see Chapter 8), although some early DSD equipment used a data format known as ‘DSD-raw’, which was simply a stream of DSD samples in non-return-to-zero (NRZ) form, as shown in Figure 10.16(a). (The latter is essentially the same as SDIF-2.) In SDIF-3 data is carried over 75 ohm unbalanced coaxial cables, terminating in BNC connectors. The bit rate is twice the DSD sampling frequency (or 5.6448 Mbit/s at the standard DSD sampling frequency of 2.8224 MHz) because phase modulation is used for data transmission as shown in Figure 10.16(b). A separate word clock at 44.1 kHz is used for synchronization purposes. It is also possible to encounter a DSD clock signal connection at 64 times 44.1 kHz (2.8224 MHz).

Sony also developed a multichannel interface for DSD signals, capable of carrying 24 channels over a single physical link. The transmission method is based on the same technology as used for the Ethernet 100BASE-TX (100 Mbit/s) twisted-pair physical layer (PHY), but it is used in this application to create a point-to-point audio interface. Category 5 cabling is used, as for Ethernet, consisting of eight conductors. Two pairs are used for bi-directional audio data and the other two pairs for clock signals, one in each direction.

Twenty-four channels of DSD audio require a total bit rate of 67.7 Mbit/s, leaving an appreciable spare capacity for additional data. In the MAC-DSD interface this is used for error correction (parity) data, frame header and auxiliary information. Data is formed into frames that can contain Ethernet MAC headers and optional network addresses for compatibility with network systems. Audio data within the frame is formed into 352 blocks of 32 bits, of which 24 bits are individual channel samples, six are parity bits and two are auxiliary bits.
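The quoted payload figure can be checked with simple arithmetic, taking the DSD sampling frequency of 64 times 44.1 kHz:

dsd_rate = 64 * 44100        # 2,822,400 one-bit samples per second per channel
channels = 24
print(channels * dsd_rate)   # 67,737,600 bit/s, i.e. about 67.7 Mbit/s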

More recently Sony introduced ‘SuperMAC’, which is capable of handling either DSD or PCM audio with very low latency (delay), typically less than 50 μs, over Cat-5 Ethernet cables using the 100BASE-TX physical layer. The number of channels carried depends on the sampling frequency. Twenty-four bidirectional DSD channels can be handled, or 48 PCM channels at 44.1/48 kHz, reducing proportionately as the sampling frequency increases. In conventional PCM mode the interface is transparent to AES-3 data including user and channel status information. Up to 5 Mbit/s of Ethernet control information can be carried in addition. A means of interchange based on this was standardized by the AES as AES-50. ‘HyperMAC’ runs even faster, carrying up to 384 audio channels on gigabit Ethernet Cat-6 cable or optical fiber, together with 100 Mbit/s Ethernet control data. Recently Sony sold this networking technology to Klark Teknik.

The advantage of these interfaces is that audio data thus formatted can be carried over the physical drivers and cables common to Ethernet networks, carrying a large number of audio channels at high speed. In a sense these interfaces bridge the conceptual gap between dedicated audio interfaces and generic computer networks, as they use some of the hardware and the physical layer of a computer network to transfer audio in a convenient form. They do not, however, employ all the higher layers of computer network protocols as mentioned in the next section. This means that the networking protocol overhead is relatively low, minimal buffering is required and latency can be kept to a minimum. Dedicated routing equipment is, however, required. One of the main applications so far has been in a Midas router for live performance mixing.

Data networks and computer interconnects

A network carries data either on wire or optical fiber, and is normally shared between a number of devices and users. The sharing is achieved by containing the data in packets of a limited number of bytes (usually between 64 and 1518), each with an address attached. The packets usually share a common physical link, normally a high-speed serial bus of some kind, being multiplexed in time either using a regular slot structure synchronized to a system clock (isochronous transfer) or in an asynchronous fashion whereby the time interval between packets may be varied or transmission may not be regular, as shown in Figure 10.17. The length of packets may not be constant, depending on the requirements of different protocols sharing the same network. Packets for a particular file transfer between two devices may not be contiguous and may be transferred erratically, depending on what other traffic is sharing the same physical link.

Figure 10.18 shows some common physical layouts for local area networks (LANs). LANs are networks that operate within a limited area, such as an office building or studio center, within which it is common for every device to ‘see’ the same data, each picking off that which is addressed to it and ignoring the rest. Routers and bridges can be used to break up complex LANs into subnets. WANs (wide area networks) and MANs (metropolitan area networks) are larger entities that link LANs within communities or regions. PANs (personal area networks) are typically limited to a range of a few tens of meters around the user (e.g. Firewire, USB, Bluetooth). Wireless versions of these network types are increasingly common. Different parts of a network can be interconnected or extended as explained in Fact File 10.5.

image

FIGURE 10.17 Packets for different destinations (A, B and C) multiplexed onto a common serial bus. (a) Time division multiplexed into a regular time slot structure. (b) Asynchronous transfer showing variable time gaps and packet lengths between transfers for different destinations.

image

FIGURE 10.18 Two examples of computer network topologies. (a) Devices connected by spurs to a common hub, and (b) devices connected to a common ‘backbone’. The former is now by far the most common, typically using CAT 5 cabling.

Network communication is divided into a number of ‘layers’, each relating to an aspect of the communication protocol and interfacing correctly with the layers on either side. The ISO seven-layer model for open systems interconnection (OSI) shows the number of levels at which compatibility between systems needs to exist before seamless interchange of data can be achieved (Figure 10.19). It shows that communication begins at the application layer and is passed down through various stages to the layer most people understand – the physical layer, or the piece of wire over which the information is carried. Layers 3, 4 and 5 can be grouped under the broad heading of ‘protocol’, determining the way in which data packets are formatted and transferred. There is a strong similarity here with the exchange of data on physical media, as discussed earlier, where a range of compatibility layers from the physical to the application determine whether or not one device can read another’s disks.

image

FIGURE 10.19 The ISO model for Open Systems Interconnection is arranged in seven layers, as shown here.

FACT FILE 10.5 EXTENDING A NETWORK

It is common to need to extend a network to a wider area or to more machines. As the number of devices increases so does the traffic, and there comes a point when it is necessary to divide a network into zones, separated by ‘repeaters’, ‘bridges’ or ‘routers’. Some of these devices allow network traffic to be contained within zones, only communicating between the zones when necessary. This is vital in large interconnected networks because otherwise data placed anywhere on the network would be present at every other point on the network, and overload could quickly occur.

A repeater is a device that links two separate segments of a network so that they can talk to each other, whereas a bridge isolates the two segments in normal use, only transferring data across the bridge when it has a destination address on the other side. A router is very selective in that it examines data packets and decides whether or not to pass them depending on a number of factors. A router can be programmed only to pass certain protocols and only certain source and destination addresses. It therefore acts as something of a network policeman and can be used as a first level of ensuring security of a network from unwanted external access. Routers can also operate between different standards of network, such as between FDDI and Ethernet, and ensure that packets of data are transferred over the most time-/cost-effective route.

One could also use some form of router to link a local network to another that was quite some distance away, forming a wide area network (WAN). Data can be routed either over dialed data links such as ISDN, in which the time is charged according to usage just like a telephone call, or over leased circuits. The choice would depend on the degree of usage and the relative costs. The Internet provides a means by which LANs are easily interconnected, although the data rate available will depend on the route, the service provider and the current traffic.

Audio network requirements

The principal application of computer networks in audio systems is in the transfer of audio data files between workstations, or between workstations and a central ‘server’ which stores shared files. The device requesting the transfer is known as the ‘client’ and the device providing the data is known as the ‘server’. When a file is transferred in this way a byte-for-byte copy is reconstructed on the client machine, with the file name and any other header data intact. There are considerable advantages in being able to perform this operation at speeds in excess of real time for operations in which real-time feeds of audio are not the aim. For example, in a news editing environment a user might wish to upload a news story file from a remote disk drive in order to incorporate it into a report, this being needed as fast as the system is capable of transferring it. Alternatively, the editor might need access to remotely stored files, such as sound files on another person’s system, in order to work on them separately. In audio post-production for films or video there might be a central store of sound effects, accessible by everyone on the network, or it might be desired to pass on a completed portion of a project to the next stage in the post-production process.

Wired Ethernet is fast enough to transfer audio data files faster than real time, depending on network loading and speed. For satisfactory operation it is advisable to use 100 Mbit/s or even 1 Gbit/s Ethernet as opposed to the basic 10 Mbit/s version. Switched Ethernet architectures allow the bandwidth to be more effectively utilized, by creating switched connections between specific source and destination devices. Approaches using FDDI or ATM are appropriate for handling large numbers of sound file transfers simultaneously at high speed. Unlike a real-time audio interface, the speed of transfer of a sound file over a packet-switched network (when using conventional file transfer protocols) depends on how much traffic is currently using it. If there is a lot of traffic then the file may be transferred more slowly than if the network is quiet (very much like motor traffic on roads). The file might be transferred erratically as traffic volume varies, with the file arriving at its destination in ‘spurts’. There therefore arises the need for network communication protocols designed specifically for the transfer of real-time data, which serve the function of reserving a proportion of the network bandwidth for a given period of time. This is known as engineering a certain ‘quality of service’.
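To put the raw figures in perspective, the following back-of-envelope calculation (ignoring protocol overheads and the traffic effects just described, which reduce the usable capacity considerably in practice) shows why a 100 Mbit/s link can move multichannel audio faster than real time:

fs = 48000
bits_per_sample = 24
channel_rate = fs * bits_per_sample      # 1,152,000 bit/s per channel of linear PCM
link_rate = 100_000_000                  # nominal 100 Mbit/s Ethernet
print(link_rate / channel_rate)          # about 87 channels' worth of raw capacity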

Without real-time protocols the computer network cannot be relied upon for transferring audio where an unbroken audio output is to be reconstructed at the destination from the data concerned. The faster the network the more likely it is that one would be able to transfer a file fast enough to feed an unbroken audio output, but this should not be taken for granted. Even the highest speed networks can be filled up with traffic! This may seem unnecessarily careful until one considers an application in which a disk drive elsewhere on the network is being used as the source for replay by a local workstation, as illustrated in Figure 10.20. Here it must be possible to ensure guaranteed access to the remote disk at a rate adequate for real-time transfer, otherwise gaps will be heard in the replayed audio.

Storage area networks

An alternative setup involving the sharing of common storage by a number of workstations is the Storage Area Network (SAN). This employs a networking technology known as Fibre Channel that can run at speeds of 4 Gbit/s, and can also employ fiber optic links to allow long connections between shared storage and remote workstations. RAID arrays (see Chapter 9) are typically employed with SANs, and special software such as Apple’s Xsan is needed to enable multiple users to access the files on such common storage.

image

FIGURE 10.20 In this example of a networked system a remote disk is accessed over the network to provide data for real-time audio playout from a workstation used for on-air broadcasting. Continuity of data flow to the on-air workstation is of paramount importance here.

Protocols for the internet

The Internet is now established as a universal means for worldwide communication. Although real-time protocols and quality of service do not sit easily with the idea of a free-for-all networking structure, there is growing evidence of applications that allow real-time audio and video information to be streamed with reasonable quality. The RealAudio format, for example, developed by Real Networks, is designed for coding audio in streaming media applications, achieving respectable quality at the higher data rates. People are also increasingly using the Internet for transferring multimedia projects between sites using FTP (file transfer protocol).

The Internet is a collection of interlinked networks with bridges and routers in various locations, which originally developed amongst the academic and research community. The bandwidth (data rate) available on the Internet varies from place to place, and depends on the route over which data is transferred. In this sense there is no easy way to guarantee a certain bandwidth, nor a certain ‘time slot’, and when there is a lot of traffic it simply takes a long time for data transfers to take place. Users access the Internet through a service provider (ISP), originally using a telephone line and a modem, or ISDN, or these days more likely an ADSL connection. The most intensive users will probably opt for high-speed leased lines giving permanent access to the Internet.

The common protocol for communication on the Internet is called TCP/IP (Transmission Control Protocol/Internet Protocol). This provides a connection-oriented approach to data transfer, allowing for verification of packet integrity, packet order and retransmission in the case of packet loss. At a more detailed level, as part of the TCP/IP structure, there are high-level protocols for transferring data in different ways. There is a file transfer protocol (FTP) used for downloading files from remote sites, a simple mail transfer protocol (SMTP) and a post office protocol (POP) for transferring email, and a hypertext transfer protocol (HTTP) used for interlinking sites on the world wide web (WWW). The WWW is a collection of file servers connected to the Internet, each with its own unique IP address (the method by which devices connected to the Internet are identified), upon which may be stored text, graphics, sounds and other data.

UDP (user datagram protocol) is a relatively low-level connectionless protocol that is useful for streaming audio over the Internet. Being connectionless, it does not require any handshaking between transmitter and receiver, so the overheads are very low and packets can simply be streamed from a transmitter without worrying about whether or not the receiver gets them. If packets are missed by the receiver, or received in the wrong order, there is little to be done about it except mute the output or reproduce distorted audio, but UDP can be efficient when bandwidth is low and quality of service is not the primary issue.
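A minimal sketch of such connectionless streaming is shown below, using Python’s standard socket module. The address, port and packet size are arbitrary examples chosen for illustration; note that there is no handshake and no retransmission, exactly as described above.

import socket

DEST = ('192.0.2.10', 5004)       # example (documentation) address and port
PACKET_SAMPLES = 256              # arbitrary packet size

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_audio_packet(pcm_bytes):
    # Fire and forget: the sender neither knows nor cares whether the
    # packet arrives, or in what order.
    sock.sendto(pcm_bytes, DEST)

send_audio_packet(b'\x00\x00' * PACKET_SAMPLES)   # one packet of 16 bit silence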

Various real-time protocols have also been developed for use on the Internet, such as RTP (real-time transport protocol). Here packets are time-stamped and may be reassembled in the correct order and synchronized with a receiver clock. RTP does not guarantee quality of service or reserve bandwidth but this can be handled by a protocol known as RSVP (reservation protocol). RTSP is the real-time streaming protocol that manages more sophisticated functionality for streaming media servers and players, such as stream control (play, stop, fast-forward, etc.) and multicast (streaming to numerous receivers).

Wireless networks

Increasing use is made of wireless networks these days, the primary advantage being the lack of need for a physical connection between devices. There are various IEEE 802 standards for wireless networking, including 802.11 which covers wireless Ethernet or ‘Wi-Fi’. These typically operate on either the 2.4 GHz or 5 GHz radio frequency bands, at relatively low power, and use various interference reduction and avoidance mechanisms to enable networks to coexist with other services. It should, however, be recognized that wireless networks will never be as reliable as wired networks owing to the differing conditions under which they operate. Any critical application in which real-time streaming is required would do well to stick to wired networks, where the chances of experiencing drop-outs owing to interference or RF fading are almost non-existent. Wireless networks are, however, extremely convenient for mobile applications and when people move around with computing devices, enabling reasonably high data rates to be achieved with the latest technology.

Bluetooth is one example of a wireless personal area network (WPAN) designed to operate over limited range at data rates of up to 1 Mbit/s. Within this there is the capacity for a number of channels of voice quality audio at data rates of 64 kbit/s and asynchronous channels up to 723 kbit/s. Taking into account the overhead for communication and error protection, the actual data rate achievable for audio communication is usually only sufficient to transfer data-reduced audio for a few channels at a time.

Audio over firewire (IEEE 1394)

Firewire is an international standard serial data interface specified in IEEE 1394-1995. One of its key applications has been as a replacement for SCSI (Small Computer Systems Interface) for connecting disk drives and other peripherals to computers. It is extremely fast, running at rates of 100, 200 and 400 Mbit/s in its original form, with later versions of the standard extending this as far as 3.2 Gbit/s. It is intended for optical fiber or copper interconnection, the copper 100 Mbit/s (S100) version being limited to 4.5 m between hops (a hop is the distance between two adjacent devices). The S100 version has a maximum realistic data capacity of 65 Mbit/s, a maximum of 16 hops between nodes and no more than 63 nodes on up to 1024 separate buses. On the copper version there are three twisted pairs – data, strobe and power – and the interface operates in half duplex mode, which means that communications in two directions are possible, but only one direction at a time. The ‘direction’ is determined by the current transmitter which will have arbitrated for access to the bus. Connections are ‘hot pluggable’ with auto-reconfiguration – in other words one can connect and disconnect devices without turning off the power and the remaining system will reconfigure itself accordingly. It is also relatively cheap to implement. A recent implementation, 1394c, allows the use of gigabit Ethernet connectors, which may improve the reliability and usefulness of the interface in professional applications.

Firewire combines features of network and point-to-point interfaces, offering both asynchronous and isochronous communication modes, so guaranteed latency and bandwidth are available if needed for time-critical applications. Communications are established between logical addresses, and the end point of an isochronous stream is called a ‘plug’. Logical connections between devices can be specified as either ‘broadcast’ or ‘point-to-point’. In the broadcast case either the transmitting or receiving plug is defined, but not both, and broadcast connections are unprotected in that any device can start and stop them. A primary advantage for audio applications is that point-to-point connections are protected – only the device that initiated a transfer can interfere with that connection, so once established the data rate is guaranteed for as long as the link remains intact. The interface can be used for real-time multichannel audio interconnections, file transfer, MIDI and machine control, carrying digital video, carrying any other computer data and connecting peripherals (e.g. disk drives).

Originating partly in Yamaha’s ‘m-LAN’ protocol, the 1394 Audio and Music Data Transmission Protocol is now also available as an IEC PAS component of the IEC 61883 standard (a PAS is a publicly available specification that is not strictly defined as a standard but is made available for information purposes by organizations operating under given procedures). It offers a versatile means of transporting digital audio and MIDI control data.

Audio over Universal Serial Bus (USB)

The Universal Serial Bus is not the same as IEEE 1394, but it has some similar implications for desktop multimedia systems, including audio peripherals. USB has been jointly supported by a number of manufacturers including Microsoft, Digital, IBM, NEC, Intel and Compaq. Version 1.0 of the copper interface runs at a lower speed than 1394 (typically either 1.5 or 12 Mbit/s) and is designed to act as a low-cost connection for multiple input devices to computers such as joysticks, keyboards, scanners and so on. USB 2.0 runs at a higher rate up to 480 Mbit/s and is supposed to be backwards-compatible with 1.0.

USB 1.0 supports up to 127 devices for both isochronous and asynchronous communication and can carry data over distances of up to 5 m per hop (similar to 1394). A hub structure is required for multiple connections to the host connector. Like 1394 it is hot pluggable and reconfigures the addressing structure automatically, so when new devices are connected to a USB setup the host device assigns a unique address. Limited power is available over the interface and some devices are capable of being powered solely using this source – known as ‘bus-powered’ devices – which can be useful for field operation of, say, a simple A/D convertor with a laptop computer.

Data transmissions are grouped into frames of 1 ms duration in USB 1.0, but a ‘micro-frame’ of one-eighth of a millisecond (125 μs) was also defined in USB 2.0. A start-of-frame packet indicates the beginning of a cycle and the bus clock is normally at 1 kHz if such packets are transmitted every millisecond, so the USB frame rate is substantially slower than the typical audio sampling rate. The transport structure and different layers of the network protocol will not be described in detail as they are long and complex and can be found in the USB 2.0 specification. However, it is important to be aware that transactions are set up between sources and destinations over so-called ‘pipes’ and that numerous ‘interfaces’ can be defined and run over a single USB cable, dependent only on the available bandwidth.

The way in which audio is handled on USB is well defined and somewhat more clearly explained than the 1394 audio/music protocol. It defines three types of communication: audio control, audio streaming and MIDI streaming. We are concerned primarily with audio streaming applications. Audio data transmissions fall into one of three types. Type 1 transmissions consist of channel-ordered PCM samples in consecutive subframes, whilst Type 2 transmissions typically contain non-PCM audio data that does not preserve a particular channel order in the bitstream, such as certain types of multichannel data-reduced audio stream. Type 3 transmissions are a hybrid of the two such that non-PCM data is packed into pseudo-stereo data words in order that clock recovery can be made easier.

Audio samples are transferred in subframes, each of which can be 1 – 4 bytes long (up to 24 bits resolution). An audio frame consists of one or more subframes, each representing a sample of one of the channels in the cluster (see below). As with 1394, a USB packet can contain a number of frames in succession, each containing a cluster of subframes. Frames are described by a format descriptor header that contains a number of bytes describing the audio data type, number of channels, subframe size, as well as information about the sampling frequency and the way it is controlled (for Type 1 data). An example of a simple audio frame would be one containing only two subframes of 24 bit resolution for stereo audio.
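As a simple illustration of that example, the sketch below packs a stereo pair of 24 bit samples into a Type 1 audio frame of two subframes. Little-endian byte order and two’s complement coding are assumed here for the purposes of the example.

def pack_stereo_frame(left, right, subframe_bytes=3):
    frame = b''
    for sample in (left, right):                  # one subframe per channel
        if sample < 0:
            sample += 1 << (8 * subframe_bytes)   # two's complement wrap
        frame += sample.to_bytes(subframe_bytes, 'little')
    return frame                                  # 6 bytes for 24 bit stereo

print(pack_stereo_frame(0x123456, -1).hex())      # '563412ffffff'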

Audio of a number of different types can be transferred in Type 1 transmissions, including PCM audio (two’s complement, fixed point), PCM-8 format (compatible with original 8 bit WAV, unsigned, fixed point), IEEE floating point, A-law and μ-law (companded audio corresponding to relatively old telephony standards). Type 2 transmissions typically contain data-reduced audio signals such as MPEG or AC-3 streams. Here the data stream contains an encoded representation of a number of channels of audio, formed into encoded audio frames that relate to a large number of original audio samples. An MPEG encoded frame, for example, will typically be longer than a USB packet (a typical MPEG frame might be 8 or 24 ms long), so it is broken up into smaller packets for transmission over USB rather like the way it is streamed over the IEC 60958 interface described in Fact File 10.4. The primary rule is that no USB packet should contain data for more than one encoded audio frame, so a new encoded frame should always be started in a new packet. The format descriptor for Type 2 is similar to Type 1 except that it replaces subframe size and number of channels indication with maximum bit rate and number of audio samples per encoded frame. Currently only MPEG and AC-3 audio are defined for Type 2.

Audio data for closely related synchronous channels can be clustered for USB transmission in Type 1 format. Up to 254 streams can be clustered and there are 12 defined spatial positions for reproduction, to simplify the relationship between channels and the loudspeaker locations to which they relate. (This is something of a simplification of the potentially complicated formatting of spatial audio signals and assumes that channels are tied to loudspeaker locations, but it is potentially useful. It is related to the channel ordering of samples within a WAVE format extensible file, described earlier.) The first six defined streams follow the internationally standardized order of surround sound channels for 5.1 surround, that is left, right, centre, LFE (low frequency effects), left surround, right surround (see Chapter 17). Subsequent streams are allocated to other loudspeaker locations around a notional listener. Not all the spatial location streams have to be present but they are supposed to be presented in the defined order. Clusters are defined in a descriptor field that includes ‘bNrChannels’ (specifying how many logical audio channels are present in the cluster) and ‘wChannelConfig’ (a bit field that indicates which spatial locations are present in the cluster). If the relevant bit is set then the relevant location is present in the cluster. The bit allocations are shown in Table 10.4.

Table 10.4 Channel identification in USB audio cluster descriptor

Data bit Spatial location
D0 Left Front (L)
D1 Right Front (R)
D2 Center Front (C)
D3 Low Frequency Enhancement (LFE)
D4 Left Surround (LS)
D5 Right Surround (RS)
D6 Left of Center (LC)
D7 Right of Center (RC)
D8 Surround (S)
D9 Side Left (SL)
D10 Side Right (SR)
D11 Top (T)
D12–15 Reserved
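As an illustration of how a receiver might interpret this descriptor, the sketch below (Python, using a hypothetical wChannelConfig value) maps the set bits of Table 10.4 to their spatial locations.

SPATIAL_LOCATIONS = [                 # Table 10.4, bits D0-D11
    'Left Front (L)', 'Right Front (R)', 'Center Front (C)',
    'Low Frequency Enhancement (LFE)', 'Left Surround (LS)',
    'Right Surround (RS)', 'Left of Center (LC)', 'Right of Center (RC)',
    'Surround (S)', 'Side Left (SL)', 'Side Right (SR)', 'Top (T)',
]

def decode_channel_config(w_channel_config):
    return [name for bit, name in enumerate(SPATIAL_LOCATIONS)
            if w_channel_config & (1 << bit)]

print(decode_channel_config(0x003F))  # bits D0-D5 set: the standard 5.1 layout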

AES-47: audio over ATM

AES-47 defines a method by which linear PCM data, either conforming to AES-3 format or not, can be transferred over ATM (Asynchronous Transfer Mode) networks. There are various arguments for doing this, not least being the increasing use of ATM-based networks for data communications within the broadcasting industry and the need to route audio signals over longer distances than possible using standard digital interfaces. There is also a need for low latency, guaranteed bandwidth and switched circuits, all of which are features of ATM. Essentially an ATM connection is established in a similar way to making a telephone call. A SETUP message is sent at the start of a new ‘call’ that describes the nature of the data to be transmitted and defines its vital statistics. The AES-47 standard describes a specific professional audio implementation of this procedure that includes information about the audio signal and the structure of audio frames in the SETUP at the beginning of the call.

RECOMMENDED FURTHER READING

1394 Trade Association, 2001. TA Document 2001003: Audio and Music Data Transmission Protocol 2.0.

AES, 2002. AES 47-2002: Transmission of Digital Audio over Asynchronous Transfer Mode Networks.

IEC, 1998. IEC/PAS 61883-6. Consumer Audio/Video Equipment – Digital Interface – Part 6: Audio and Music Data Transmission Protocol.

IEEE, 1995. IEEE 1394: Standard for a High Performance Serial Bus.

Page, M., et al., 2002. Multichannel audio connection for direct stream digital. Presented at AES 113th Convention, Los Angeles, 5–8 October.

Rumsey, F., Watkinson, J., 2003. The Digital Interface Handbook, third ed. Focal Press.

Rumsey, F., 2004. Desktop Audio Technology. Focal Press.

WEBSITES

Audio Engineering Society standards. www.aes.org/

IEEE 1394. www.1394ta.org/

Universal Serial Bus. www.usb.org/
