8  Operational and systems issues

This chapter is concerned with technical, operational and systems issues that may arise when using computer-based audio hardware and software. It is not intended as a ‘how to’ chapter, because all software packages are different and other books are available that explain them, but it does deal with some of the more specialised issues that may arise in high quality professional audio environments and familiarises the reader with preparations for various consumer release formats and media.

8.1 Level control and metering

Typical audio systems today have a very wide dynamic range that equals or exceeds that of the human hearing system. Distortion and noise inherent in the recording or processing of audio are at exceptionally low levels owing to the use of high resolution A/D convertors, up to 24-bit storage, and wide range floating-point signal processing. This is not to say that quality is perfect. It is more intended to support the assertion that level control is less crucial than it used to be in the days when a recording engineer struggled to optimise a recording’s dynamic range between the noise floor and the distortion ceiling (see Figure 8.1).

The dynamic range of a typical digital audio system can now be well over 100 dB and there is room for the operator to allow a reasonable degree of ‘headroom’ between the peak audio signal level and the maximum allowable level. Meters are provided to enable the signal level to be observed, and they are usually calibrated in dB, with zero at the top and negative dBs below this. The full dynamic range is not always shown, and there may be a peak bar that can hold the maximum level permanently or temporarily. As explained in Chapter 2, 0 dBFS (full scale) is the point at which all of the bits available to represent the signal have been used. Above this level the signal clips and the effect of this is quite objectionable, except on very short transients where it may not be noticed. It follows that signals should never be allowed to clip.
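
By way of illustration, the peak level of a block of samples can be expressed in dBFS by comparing its largest magnitude with full scale. The following sketch assumes floating-point samples normalised so that full scale is ±1.0; the function name and details are illustrative only:

    import numpy as np

    def peak_dbfs(samples: np.ndarray) -> float:
        """Return the peak level of a block of samples in dBFS.

        Assumes floating-point samples where full scale is +/-1.0.
        """
        peak = np.max(np.abs(samples))
        if peak == 0:
            return float('-inf')  # digital silence
        return 20 * np.log10(peak)

    # A sine wave at half of full scale peaks at roughly -6 dBFS
    tone = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(48000) / 48000)
    print(round(peak_dbfs(tone), 1))  # -> -6.0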

Figure 8.1 Comparison of analog and digital dynamic range. (a) Analog tape has increasing distortion as the recording level increases, with an effective maximum output level at 3% third harmonic distortion. (b) Modern high-resolution digital systems have wider dynamic range with a noise floor fixed by dither noise and a maximum recording level at which clipping occurs. The linearity of digital systems does not normally become poorer as signal level increases, until 0 dBFS is reached. This makes level control a somewhat less important issue at the initial recording stage, provided sufficient headroom is allowed for peaks

There is a tendency in modern audio production to want to master everything so that it sounds as loud as possible, and to ensure that the signal peaks as close to 0 dBFS as possible. This level maximising or normalising process can be done automatically in most packages, the software searching the audio track for its highest level sample and then adjusting the overall gain so that this just reaches 0 dBFS. In this way the recording can be made to use all the bits available, which can be useful if it is to be released on a relatively low-resolution consumer medium where noise might be more of a problem. (It is important to make sure that correct redithering is used when altering the level and requantising, as explained in Section 2.8.) This does not, of course, take into account any production decisions that might be involved in adjusting the overall levels of individual tracks on an album or other compilation, where relative levels should be adjusted according to the nature of the individual items, their loudness and the producer’s intent.
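
A minimal sketch of this maximising process is shown below, assuming floating-point samples with full scale at ±1.0; the names and details are illustrative rather than those of any particular package:

    import numpy as np

    def normalise(samples: np.ndarray, target_dbfs: float = 0.0) -> np.ndarray:
        """Scale a track so its highest-magnitude sample just reaches target_dbfs."""
        peak = np.max(np.abs(samples))
        if peak == 0:
            return samples  # silence: nothing to normalise
        target = 10 ** (target_dbfs / 20)
        return samples * (target / peak)

    # After any such gain change the signal should be redithered before
    # requantising to the release word length (see Section 2.8).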

A little-known but important fact is that even if the signal is maximised in the automatic fashion, so that the highest sample value just does not clip, subsequent analog electronics in the signal chain may still do so. Some equipment is designed in such a way that the maximum digital signal level is aligned to coincide with the clipping voltage of the analog electronics in a D/A convertor. In fact, owing to the response of the reconstruction filter in the D/A convertor (which reconstructs an analog waveform from the PAM pulse train) inter-sample signal peaks can be created that slightly exceed the analog level corresponding to 0 dBFS, thereby clipping the analog side of the convertor. For this reason it is recommended that digital-side signals are maximised so that they peak a few dB below 0 dBFS, in order to avoid the distortion that might otherwise result on the analog side. Some mastering software provides detailed analysis of the signal showing exactly how many samples occur in sequence at peak level, which can be a useful warning of potential or previous clipping.
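
The inter-sample peaks described here can be estimated by oversampling the signal and measuring the reconstructed or ‘true’ peak, which is broadly what true-peak meters do. A rough sketch, assuming SciPy is available and that 4× oversampling is a sufficient approximation of the reconstruction filter:

    import numpy as np
    from scipy.signal import resample_poly

    def true_peak_dbfs(samples: np.ndarray, oversample: int = 4) -> float:
        """Estimate the reconstructed (inter-sample) peak in dBFS.

        Upsampling approximates the D/A reconstruction filter, revealing
        peaks that lie between the original sample instants.
        """
        upsampled = resample_poly(samples, oversample, 1)
        return 20 * np.log10(np.max(np.abs(upsampled)))

    # A signal whose samples all sit just below 0 dBFS can still return a
    # positive value here, indicating a risk of analog-side clipping.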

8.2 Spatial reproduction formats

Now that two-channel stereo is no longer the ubiquitous release format it once was, users need to understand something about alternative multichannel reproduction formats, such as those used for surround sound. These are used widely on DVD, SACD and for television, games and movie production. Audio applications now provide many facilities for working in these multichannel formats. This section is a short introduction; a more detailed explanation can be found in my book Spatial Audio.

8.2.1 Introduction to multichannel formats

Whereas two-channel stereo normally employs two loudspeakers at ±30° in front of the listener, creating a stereophonic ‘scene’ or ‘panorama’ between the loudspeakers, multichannel stereo or surround sound employs more loudspeakers to increase the sense of realism and spatial complexity. A variety of formats have been tried over the years, including things like quadraphonic sound in the 1970s, but the most common production formats at the time of writing are formats based on movie-style surround sound. These typically employ a number of front channels to enable accurate phantom imaging and a number of side or rear channels that are used for ambience or effects. The centre channel (not present in two-channel stereo) has the effect of widening the listening area and anchoring dialogue or vocals in the centre of the image, even for off-centre listeners.

In international standards describing stereo loudspeaker configurations the nomenclature for the configuration is often in the form ‘n-m stereo’, where n is the number of front channels and m is the number of rear or side channels. This distinction can be helpful as it reinforces the slightly different role of the surround channels, although many simply refer to these formats as x-channel surround, making no distinction between front and rear channels.

Audio in these multichannel formats is often encoded for consumer release using low bit-rate coding algorithms such as Dolby Digital, although it can be stored in linear PCM form if sufficient space or bandwidth is available. Examples of these coding approaches were given in Section 2.12.

8.2.2 4-channel surround (3-1 stereo)

‘3-1 stereo’, or ‘LCRS surround’, is a format used quite widely in analog cinema installations and older home cinema systems that used Dolby matrix encoding and decoding technology. In the 3-1 approach a single ‘effects’ or ‘surround’ channel is routed to a loudspeaker or loudspeakers located behind (and possibly to the sides of) listeners. It was developed first for cinema applications, enabling a greater degree of audience involvement in the viewing/listening experience by providing a channel for ‘wrap-around’ effects. There is no specific intention in 3-1 stereo to use the effects channel as a means of enabling 360° image localisation. In any case, this would be virtually impossible with most configurations, as a single audio channel feeds a larger number of surround loudspeakers, effectively in mono.

Figure 8.2 shows the typical loudspeaker configuration for this format. In the cinema there are usually a large number of surround loudspeakers fed from the single channel, in order to cover a wide audience area. This has the tendency to create a relatively diffuse or distributed reproduction of the effects signal. The surround speakers are sometimes electronically decorrelated to increase the degree of spaciousness or diffuseness of surround effects, in order that they are not specifically localised to the nearest loudspeaker or perceived inside the head. In consumer systems reproducing 3-1 stereo, the mono surround channel is normally fed to two surround loudspeakers located in similar positions to the 3-2 format described below (these are dipoles in the Home THX system, so as to create a more diffuse spatial effect). The gain of the channel is usually reduced by 3 dB so that the summation of signals from the two speakers does not lead to a level mismatch between front and rear.

Figure 8.2 3-1 format reproduction uses a single surround channel usually routed (in cinema environments) to an array of loudspeakers to the sides and rear of the listening area. In consumer reproduction the mono surround channel may be reproduced through only two surround loudspeakers, possibly using artificial decorrelation and/or dipole loudspeakers to emulate the more diffused cinema experience

This surround format is usually matrix encoded using a Dolby Stereo encoder. This takes the four channels and combines them in a manner that creates a two-channel-compatible signal from which the centre and surround information can subsequently be extracted if required. The reason for this was compatibility with two-channel analog media, and to enable surround audio to be encoded on two optical film sound tracks. Matrix decoding often involves some sort of active ‘steering’ to increase the channel separation, such as that employed in Dolby Pro Logic decoders. Because the matrix process is not transparent, Dolby Stereo or Surround material is normally monitored and mixed through an encoder and decoder matrix so that its effect can be heard.

The mono surround channel is the main limitation of this format. Despite the use of multiple loudspeakers to reproduce the surround channel, it is still not possible to create a good sense of envelopment or spaciousness without surround signals that differ on each side of the listener. Most psychoacoustics research suggests that the ears need to be provided with decorrelated signals to create the best sense of envelopment, and effects can be better spatialised using stereo surround channels.

8.2.3 5.1 channel surround (3-2 stereo)

The 3-2 configuration has been standardised for numerous surround sound applications, including cinema, television and consumer applications. Because it is the term in general parlance, though, ‘5.1 surround’ will be used here.

The mono surround limitation is removed in the 5.1-channel system, enabling the provision of stereo effects or room ambience to accompany a primarily front-oriented sound stage. Essentially the front three channels are intended to be used for a conventional three-channel stereo sound image, while the rear/side channels are only intended for generating supporting ambience, effects or ‘room impression’. In this sense, the standard does not directly support the concept of 360° image localisation, although it may be possible to arrive at recording techniques or signal processing methods that achieve this to a degree.

One cannot introduce the 5.1 surround system without explaining the meaning of the ‘.1’ component. This is a dedicated low-frequency effects (LFE) channel or sub-bass channel. It is called ‘.1’ because of its limited bandwidth (normally up to 120 Hz). It is intended for conveying special low-frequency content that requires greater sound pressure levels and headroom than can be handled by the main channels. It is not intended for conveying the low-frequency component of the main channel signals, and its application is likely to be primarily in sound-for-picture applications where explosions and other high-level rumbling noises are commonplace, although it may be used in other circumstances. With cinema reproduction the in-band gain of this channel is usually 10 dB higher than that of the other individual channels. This is achieved by a level increase of the reproduction channel, not by an increased recording level. (This does not mean that the broadband or weighted SPL of the LFE loudspeaker should measure 10 dB higher than any of the other channels – in fact it will be considerably less than this as its bandwidth is narrower.)
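
The reproduction-side alignment of the LFE channel can be sketched as follows; the fourth-order filter and the exact band limit are illustrative assumptions rather than a description of any particular product:

    import numpy as np
    from scipy.signal import butter, lfilter

    def lfe_monitor_feed(lfe: np.ndarray, fs: float) -> np.ndarray:
        """Band-limit the LFE channel to ~120 Hz and apply +10 dB of
        reproduction gain, as cinema alignment requires. Note that the
        +10 dB is applied on replay, not to the recorded signal."""
        b, a = butter(4, 120 / (fs / 2))   # 4th-order low-pass at 120 Hz
        band_limited = lfilter(b, a, lfe)
        return band_limited * 10 ** (10 / 20)  # +10 dB in-band gain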

The loudspeaker layout and channel configuration are specified in the ITU-R BS.775 standard, as shown in Figure 8.3. A display screen is also shown in this diagram for sound-with-picture applications, and the standard includes recommendations concerning the relative size of the screen and the loudspeaker base width. The left and right loudspeakers are located at ±30° for compatibility with two-channel stereo reproduction. In many ways this need for compatibility with 2/0 is a pity, because the centre channel unavoidably narrows the front sound stage in many applications, and the front stage could otherwise take advantage of the wider spacing facilitated by three-channel reproduction. It was nonetheless considered crucial for the same loudspeaker configuration to be usable for all standard forms of stereo reproduction, for reasons most people will appreciate.

Figure 8.3 3-2 format reproduction according to the ITU-R BS.775 standard uses two independent surround channels routed to one or more loudspeakers per channel

In the 5.1 standard there are normally no loudspeakers directly behind the listener, which can make for creative difficulties. This has led to a Dolby proposal called EX (described below) that places an additional speaker at the centre-rear location. (This is not part of the current standard, though.) The ITU standard also allows for additional surround loudspeakers to cover the region around listeners, similar to the 3-1 arrangement described earlier. If these are used then they are expected to be distributed evenly in the angle between ±60° and ±150°.

The limitations of the 5.1 format, particularly in some people’s view for music purposes, have led to various non-standard uses of the five or six channels available on new consumer disk formats such as DVD-A (Digital Versatile Disk – Audio) and SACD (Super Audio Compact Disc). For example, some producers are using the sixth channel (that would otherwise be LFE) in its full-bandwidth form on these media to create a height channel. Others are making a pair out of the ‘LFE’ channel and the centre channel so as to feed a pair of front-side loudspeakers, enabling the rear loudspeakers to be placed further back. These are non-standard uses and should be clearly indicated on any recordings.

8.2.4 Dolby EX

In 1998 Dolby and Lucasfilm THX joined forces to promote an enhanced surround system that added a centre rear channel to the standard 5.1-channel setup. They introduced it, apparently, because of frustrations felt by movie sound designers at not being able to pan sounds properly to the rear of the listener – the surround effect typically being rather diffuse. This system was christened ‘Dolby Digital – Surround EX’, and apparently uses matrix-style centre-channel encoding and decoding between the left and right surround channels of a 5.1-channel mix. The loudspeakers at the rear of the auditorium are then driven separately from those on the left and right sides, using the feed from this new ‘rear centre’ channel, as shown in Figure 8.4.

Figure 8.4 Dolby EX adds a centre-rear channel fed from a matrix-decoded signal that was originally encoded between left and right surround channels in a manner similar to the conventional Dolby Stereo matrix process

8.2.5 7.1 channel surround

Deriving from widescreen cinema formats, the 7.1 channel configuration normally adds two further loudspeakers to the 5.1 channel configuration, located at centre left (CL) and centre right (CR), as shown in Figure 8.5. This is not a format primarily intended for consumer applications, but for large cinema auditoria where the screen width is such that the additional channels are needed to cover the angles between the loudspeakers satisfactorily for all the seats in the auditorium. Sony’s SDDS cinema system is the most common proprietary implementation of this format.

Figure 8.5 Some cinema sound formats for large auditorium reproduction enhance the front imaging accuracy by the addition of two further loudspeakers, centre left and centre right

Figure 8.6 Approximate loudspeaker layout suitable for Lexicon’s Logic 7 reproduction. Notice the additional side loudspeakers that enable a more enveloping image and may enable rear loudspeakers to be placed further to the rear

Lexicon and Meridian have also implemented a 7-channel mode in their consumer surround decoders, but the recommended locations for the loudspeakers are not quite the same as in the cinema application. The additional channels are used to provide a wider side-front component and allow the rear speakers to be moved round more to the rear than in the 5.1 arrangement (see Figure 8.6).

8.2.6 Surround panning and spatial effects

Pairwise amplitude panning, although relatively crude in many ways, is the type of pan control most commonly implemented in simple surround panners, being based on an extension of the two-channel sine/cosine panner to more loudspeakers. It involves adjusting the relative amplitudes between a pair of adjacent loudspeakers with the aim of creating a phantom image at some point between them. Panning between widely spaced side loudspeakers is not particularly successful at creating accurate phantom images though (see Figure 8.7).
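
A minimal sketch of the constant-power (sine/cosine) law applied to one adjacent loudspeaker pair is shown below; the pair-selection logic of a full surround panner is only indicated in a comment:

    import math

    def pairwise_pan(position: float) -> tuple[float, float]:
        """Constant-power pan between an adjacent loudspeaker pair.

        position runs from 0.0 (fully in the first speaker) to 1.0
        (fully in the second). Gains obey g1^2 + g2^2 = 1, so the
        total radiated power stays constant as the source moves.
        """
        angle = position * math.pi / 2
        return math.cos(angle), math.sin(angle)

    # A full surround panner would first select which adjacent pair
    # (e.g. L-C, C-R, R-RS) brackets the desired direction, then apply
    # this law between them.
    print(pairwise_pan(0.5))  # -> (0.707..., 0.707...): centre phantom image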

In some applications designed for five-channel surround work, particularly in the film domain, separate panners are provided for L-C-R, LS-RS and front–back. Combinations of positions of these amplitude panners enable sounds to be moved to various locations, some more successfully than others. For example, sounds panned so that some energy emanates from all loudspeakers (say, panned centrally on all three pots) tend to sound diffuse to centrally seated listeners, and appear in the nearest loudspeaker for those sitting off centre. Joystick panners combine these amplitude relationships under the control of a single ‘lever’. The moving effects made possible by these joysticks are often unconvincing and need to be used with experience and care.

Figure 8.7 Imaging accuracy in five-channel surround sound reproduction

Other more sophisticated panners may involve psychoacoustic filtering, binaural information or Ambisonic principles, and it is possible to encounter advanced spatial audio processing plug-ins that can be used to manipulate stereo images and alter the spatial characteristics of implied environments. Distance and movement can be simulated effectively by changing direct/reverberant ratio, level, high-frequency content and reflections, as well as by Doppler shifts. These sometimes go hand in hand with reverberation processing, as this is one way of adding spatial content to a mix.
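
As a crude illustration of distance simulation by level and direct/reverberant ratio, consider the following sketch; the inverse-distance law and the constant diffuse level are simplifying assumptions:

    def distance_gains(distance_m: float, ref_distance_m: float = 1.0):
        """Return (direct_gain, reverb_gain) for a simple distance cue.

        The direct sound falls off roughly as 1/distance, while diffuse
        reverberation stays roughly constant in a room, so the
        direct/reverberant ratio falls as the source recedes.
        """
        direct = ref_distance_m / max(distance_m, ref_distance_m)
        reverb = 1.0  # assumed constant diffuse level
        return direct, reverb

    # At 1 m the direct/reverb ratio is 1:1; at 8 m it is 1:8, which the
    # ear reads as 'further away'. High-frequency loss and Doppler shift
    # for moving sources would be layered on top of this.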

8.3 Controlling and maintaining sound quality

The sound quality achievable with modern workstations is now exceptionally high. As mentioned earlier in this chapter, there are now few technical reasons why distortion, noise, frequency response and other performance characteristics should not match the limits of human perception. Of course there will always be those who feel further improvements can be made, but the technical performance of digital audio systems is no longer a major limitation today.

If one accepts the foregoing argument, the maintenance of sound quality in computer-based production comes down more to understanding the operational areas in which quality can be compromised. These include ensuring as few A/D and D/A conversions as possible, maintaining audio resolution at 24 bits or more throughout the signal chain (where possible), redithering appropriately at points where requantising is done, and avoiding sampling frequency conversions. The rule of thumb should be to use the highest sampling frequency and resolution that one can afford, but no higher than strictly necessary for the purpose, otherwise storage space and signal processing power will be squandered. The scientific merits of exceptionally high sampling frequencies are dubious, for all but a few aficionados, although the marketing value may be considerable.

The point at which quality can be affected in a digital audio system is at A/D and D/A conversion. In fact the quality of an analog signal is irretrievably fixed at the point of A/D conversion, so this should be done with the best equipment available. There is very little that can be done afterwards to improve the quality of a poorly converted signal. At conversion stages the stability of timing of the sampling clock is crucial, because if it is unstable the audio signal will contain modulation artefacts that give rise to increased distortions and noise of various kinds. This so-called clock jitter is one of the biggest factors affecting sound quality in convertors and high quality external convertors usually have much lower jitter than the internal convertors used on PC sound cards.

The quality of a digital audio signal, provided it stays in the digital domain, is not altered unless the values of the samples are altered. It follows that if a signal is recorded, replayed, transferred or copied without altering sample values then the quality will not have been affected, despite what anyone may say. Sound quality, once in the digital domain, therefore depends entirely on the signal processing algorithms used to modify the program. There is little a user can do about this except choose high-quality plug-ins and other software, written by manufacturers that have a good reputation for DSP that takes care of rounding errors, truncation, phase errors and all the other nasties that can arise in signal processing. This is really no different from the problems of choosing good-sounding analog equipment. Certainly not all digital equaliser plug-ins sound the same, for example, because this depends on the filter design. Storage of digital data, on the other hand, does not affect sound quality at all, provided that no errors arise and that the signal is stored at full resolution in its raw PCM form (in other words, not MPEG encoded or some other form of lossy coding).

The sound quality the user hears when listening to the output of a workstation is not necessarily what the consumer will hear when the resulting program is issued on the release medium. One reason for this is that the sound quality depends on the quality of the D/A convertors used for monitoring. The consumer may hear better or worse, depending on the convertors used, assuming the bit stream is delivered without modification. One hopes that the convertors used in professional environments are better than those used by consumers, but this is not always the case. High-resolution audio may be mastered at a lower resolution for consumer release (e.g. 96 kHz, 24 bit recordings reduced to 44.1 kHz, 16 bits for release on CD), and this can affect sound quality. It is very important that any down-conversion of master recordings be done using the best dithering and/or sampling frequency conversion possible, especially when sampling frequency conversion is of a non-integer ratio.
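
Word-length reduction of this kind is normally performed by adding dither before requantising. The sketch below uses TPDF (triangular) dither at ±1 LSB, assuming floating-point samples with full scale at ±1.0; it illustrates the principle rather than any particular mastering tool:

    import numpy as np

    def requantise_16bit(samples: np.ndarray) -> np.ndarray:
        """Reduce floating-point audio to 16-bit integers with TPDF dither.

        The dither, at +/-1 LSB peak, linearises the requantisation and
        converts what would be low-level distortion into a benign noise
        floor.
        """
        lsb = 1.0 / 32768  # one 16-bit quantising step for +/-1.0 signals
        rng = np.random.default_rng()
        tpdf = (rng.random(samples.shape) - rng.random(samples.shape)) * lsb
        dithered = samples + tpdf
        return np.clip(np.round(dithered * 32767), -32768, 32767).astype(np.int16)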

Low bit-rate coders (e.g. MPEG) can reduce quality in the consumer delivery chain, but it is the content-provider’s responsibility to optimise quality depending on the intended release format. Where there are multiple release formats it may be necessary to master the program differently in each case. For example, really low bit rate Internet streaming may require some enhancement (e.g. compression and equalisation) of the audio to make it sound reasonable under such unfavourable conditions.

When considering the authoring of interactive media such as games or virtual reality audio, there is a greater likelihood that the engineer, author, programmer and producer will have less control over the ultimate sound quality of what the consumer hears. This is because much of the sound material may be represented in the form of encoded ‘objects’ that are rendered at the replay stage, as shown in Figure 8.8. Here the quality depends more on the quality of the consumer’s rendering engine, which may involve resynthesis of some elements based on control data. This is a little like the situation when distributing a song as a MIDI sound file, using General MIDI voices. The audible results, unless one uses downloadable sounds (and even then there is some potential for variation), depend on the method of synthesis and the precise nature of the voices available at the consumer end of the chain.

Figure 8.8 (a) In conventional audio production and delivery, sources are combined and delivered at a fixed quality to the user, who simply has to replay it. The quality is limited by the resolution of the delivery link. (b) In some virtual and synthetic approaches the audio information is coded in the form of described objects that are rendered at the replay stage. Here the quality is strongly dependent on the capabilities of the rendering engine and the accuracy of description

8.4 Preparing for and understanding release media

Consumer release formats such as CD, DVD, SACD and MP3 usually require some form of mastering and pre-release preparation. This can range from subtle tweaks to the sound quality and relative levels on tracks to PQ encoding, DVD authoring, data encoding and the addition of graphics, video and text. Some of these have already been mentioned in other places in this book.

8.4.1 CD-Audio

PQ encoding for CD mastering can often be done in some of the application packages designed for audio editing, such as SADiE and Sonic Solutions. In this case it may involve little more than marking the starts and ends of the tracks in the play list and allowing the software to work out the relevant frame advances and Red Book requirements for the assembly of the PQ code, which will either be written to a CD-R or included in the DDP file for sending to the pressing plant (see Chapter 6). CD audio comes in only one resolution and sampling frequency (16 bit, 44.1 kHz), making release preparation a relatively straightforward matter.

8.4.2 DVD

DVD mastering is considerably more complicated than CD and requires advanced authoring software that can deal with all the different options possible on this multi-faceted release format. DVD-Video allows for 48 or 96 kHz sampling frequency and 16, 20 or 24 bit PCM encoding. A two-channel downmix must be available on the disk in linear PCM form (for basic compatibility), but most disks also include Dolby Digital or possibly DTS surround audio. Dolby Digital encoding usually involves the preparation of a file or files containing the compressed data, and a range of settings have to be made during this process, such as the bit rate, dialogue normalisation level, rear channel phase shift and so on. A typical control screen is shown in Figure 8.9. Then of course there are the pictures, but they are not the topic of this book.

There are at least three DVD player types on the market (audio, universal and video), and there are two types of DVD-Audio disc, one containing only audio objects and the other (the DVD-AudioV) capable of holding video objects as well. The video objects on a DVD-AudioV are just the same as DVD-Video objects and therefore can contain video clips, Dolby AC-3 compressed audio and other information. In addition, there is the standard DVD-Video disc, as shown in Figure 8.10.

DVD-AudioV discs should play back in audio players and universal players. Any video objects on an AudioV disk should play back on video-only players. The requirement for video objects on DVD-AudioV discs to contain PCM audio was dropped at the last moment, so that such objects can contain only AC-3 audio if desired. This means that an audio disc can contain a multichannel AC-3 audio stream in a video object, enabling it to be played in a video player. This is a good way of ensuring that a multichannel audio disc plays back in as many different types of player as possible, but it requires that the content producer remembers to include the AC-3 video object in addition to MLP or PCM audio objects. The video object can also contain a DTS audio bitstream if desired.

Figure 8.9 Screen display of Dolby Digital encoding software options

Figure 8.10 Compatibility of DVD discs and players (Courtesy of DVD working group)

DVD-Audio has a number of options for choosing the sampling frequencies and resolutions of different channel groups, it being possible to use a different resolution on the front channels from that used on the rear, for example. There are also decisions to be made about the bit budget available on the disk, and whether or not the audio data needs to be MLP encoded for release (see below). The format is more versatile in respect of sampling frequency than DVD-Video, having also accommodated multiples of the CD sampling frequency of 44.1 kHz as options (the DVD-Video format allows only for multiples of 48 kHz). Consequently, the allowed sampling frequencies for DVD-Audio are 44.1, 48, 88.2, 96, 176.4 and 192 kHz. The sampling frequencies are split into two groups – multiples of 44.1 kHz and multiples of 48 kHz. While it is possible to split frequencies from one group among the audio channels on a DVD-A (see below), one cannot combine frequencies across the groups, for reasons of simple clock rate division. Bit resolution can be 16, 20 or 24 bits per channel, and again this can be divided unequally between the channels, according to the channel group split described below.

Playing time depends on the way in which producers decide to use the space available on the disc, and this requires the juggling of the available bit budget. DVD-Audio can store at least 74 minutes of stereo audio even at the highest sample rate and resolution (192/24). Other modes are possible, with up to six channels of audio playing for at least 74 minutes, using combinations of sample frequency and resolution, together with MLP. Six-channel audio can only operate at the two lower sample rates of either class (44.1/88.2 or 48/96).
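
The bit-budget arithmetic involved can be illustrated simply. Assuming a maximum audio data rate in the region of 9.6 Mbit/s for linear PCM on DVD (a commonly quoted figure, stated here as an assumption), it is easy to see why six high-resolution channels require MLP or channel-group scaling:

    def pcm_rate_mbps(fs_khz: float, bits: int, channels: int) -> float:
        """Raw PCM data rate in Mbit/s."""
        return fs_khz * 1000 * bits * channels / 1e6

    print(pcm_rate_mbps(192, 24, 2))  # ~9.2 Mbit/s: stereo 192/24 just fits
    print(pcm_rate_mbps(96, 24, 6))   # ~13.8 Mbit/s: exceeds the budget,
                                      # hence MLP or scaled channel groups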

A downmixing technique known as SMART (System Managed Audio Resource Technique) is mandatory in DVD-Audio players but optional for content producers. It enables a stereo downmix of the multichannel material to be made in the player but under content producer control, so this information has to be provided at authoring time. The gains, phases and panning of each audio channel can be controlled in the downmix. A separate two-channel mix (L0/R0) can be included within an MLP bitstream. If a separate stereo mix is provided on the disc then this is automatically used instead of the player downmix.
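
A sketch of the kind of fold-down SMART performs is shown below. The −3 dB (0.7071) coefficients on centre and surrounds are typical defaults only; in SMART the actual gains are set by the content producer at authoring time:

    def downmix_lo_ro(l, c, r, ls, rs, c_gain=0.7071, s_gain=0.7071):
        """Fold a five-channel balance down to stereo (L0/R0).

        Inputs are equal-length NumPy arrays (or plain floats). Per-channel
        gains are under content-producer control in SMART; -3 dB (0.7071)
        on centre and surrounds is merely a typical default, not a rule.
        """
        lo = l + c_gain * c + s_gain * ls
        ro = r + c_gain * c + s_gain * rs
        return lo, ro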

All modes other than mono or two-channel have the option to split the channels into two groups. Group 1 would normally contain the front channels (at least left and right) of the multichannel balance, while Group 2 could contain the remaining channels. This is known as scalable audio. The resolution of Group 2 channels can be lower than that of Group 1, enabling less important channels to be coded at appropriate resolutions to manage the overall bit budget. The exact point of the split between the channel groups depends on the mode, and there are in fact 21 possible ways of splitting the channels.

It is also possible to ‘bit-shift’ channels that do not use the full dynamic range of the channel. For example, surround channels that might typically under-record compared with the front channels can be bit shifted upwards so as to occupy only the 16 MSBs of the channel. On replay they are restored to their original gains.
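
The principle can be illustrated on a single sample word; the function names and the 8-bit shift are purely examples:

    def shift_up(sample: int, bits: int = 8) -> int:
        """Shift a quiet (under-recorded) sample up on encoding so that it
        occupies the upper bits of the word; the shift count is stored so
        that replay can restore the original gain."""
        return sample << bits

    def restore(shifted: int, bits: int = 8) -> int:
        """Inverse shift applied on replay to restore the original level."""
        return shifted >> bits

    assert restore(shift_up(0x001234)) == 0x001234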

It is not mandatory to use the centre channel on DVD-Audio. Some content producers may prefer to omit a centre speaker feed and rely on the more conventional stereo virtual centre. The merits or demerits of this continue to be debated.

Meridian Lossless Packing (MLP) is licensed through Dolby Laboratories and is a lossless coding technique designed to reduce the data rate of audio signals without compromising sound quality. It has both a variable bit rate mode and a fixed bit rate mode. The variable mode delivers the optimum compression for storing audio in computer data files, but the fixed mode is important for DVD applications where one must be able to guarantee a certain reduction in peak bit rate. The use of MLP on DVD-A discs is optional, but is an important tool in the management of bit budget. Using MLP one would be able to store separate two-channel and multichannel mixes on the same disc, avoiding the need to rely on the semiautomatic downmixing features of DVD players. Owing to the so-called Lossless Matrix technology employed, an artistically controlled L0/R0 downmix can be made at the MLP mastering stage, taking up very little extra space on the disc owing to redundancy between the multichannel and two-channel information. MLP is also the key to obtaining high resolution multichannel audio on all channels without scaling.

DVD masters are usually transferred to the pressing plant on DLT tapes, using the Disk Description Protocol (DDP), as described in Chapter 6, or on DVD-R(A) disks as a disk image with a special CMF (cutting master format) header in the disk lead-in area containing the DDP data.

8.4.3 Super Audio CD (SACD)

Version 1.0 of the SACD specification is described in the ‘Scarlet Book’, available from Philips’ licensing department. SACD uses DSD (Direct Stream Digital) as a means of representing audio signals, as described in Chapter 2, so audio must be sourced in or converted to this form. SACD aims to provide a playing time of at least 74 minutes for both two-channel and six-channel balances. The disc is divided into two regions, one for two-channel audio and the other for multichannel, as shown in Figure 8.11. A lossless data packing method known as Direct Stream Transfer (DST) can be used to achieve roughly 2:1 data reduction of the signal stored on disc, so as to enable high quality multichannel audio on the same disc as the two-channel mix.

SACDs can be manufactured as single or dual-layer discs, with the option of the second layer being a Red Book CD layer (the so-called ‘hybrid disc’). SACDs, not being a formal part of the DVD hierarchy of standards (although using some of the optical disc technology), do not have the same options for DVD-Video objects as DVD-Audio. The disc is designed first and foremost as a super-high-quality audio medium. Nonetheless there is provision for additional data in a separate area of the disc. The content and capacity of this is not specified but could be video clips, text or graphics, for example. Authoring software enables the text information to be added, as shown in Figure 8.12. SACD masters are normally submitted to the pressing plant on AIT format data tapes (see Chapter 5).

Sony and Philips have paid considerable attention to copy protection and anti-piracy measures on the disc itself. Comprehensive visible and invisible watermarking are standard features of the SACD. Using a process known as PSP (Pit Signal Processing) the width of the pits cut into the disc surface is modulated in such a fashion as to create a visible image on the surface of the CD layer, if desired by the originator. This provides a visible means of authentication. The invisible watermark is a mandatory feature of the SACD layer and is used to authenticate the disc before it will play on an SACD player. The watermark is needed to decode the data on the disc. Discs without this watermark will simply be rejected by the player. It is apparently not possible to copy this watermark by any known means. Encryption of digital music content is also optional, at the request of software providers.

Figure 8.11 Different regions of a Super Audio CD, showing separate two-channel and multichannel regions

8.4.4 MP3

MP3, as already explained in Section 2.12, is actually MPEG-1, Layer 3 encoded audio, stored in a data file, usually for distribution to consumers either on the Internet or on other release media. Consumer disk players are increasingly capable of replaying MP3 files from CDs, for example. MP3 mastering requires that the two-channel audio signal is MPEG-encoded, using one of the many MP3 encoders available, possibly with the addition of the ID3 tag described in Chapter 6. Some mastering software now includes MP3 encoding as an option.

Some of the choices to be made in this process concern the data rate and audio bandwidth to be encoded, as this affects the sound quality. The lowest bit rates (e.g. below 64 kbit s–1) will tend to sound noticeably poorer than the higher ones, particularly if full audio bandwidth is retained. For this reason some encoders limit the bandwidth or halve the sampling frequency for very low bit rate encoding, because this tends to minimise the unpleasant side-effects of MPEG encoding. It is also possible to select joint stereo coding mode, as this will improve the technical quality somewhat at low bit rates, possibly at the expense of stereo imaging accuracy. As mentioned above, at very low bit rates some audio processing may be required to make sound quality acceptable when squeezed down such a small pipe.
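
By way of example, these choices map onto encoder settings along the following lines. This sketch assumes the widely used LAME command-line encoder is installed, and the file names are placeholders:

    import subprocess

    # Typical full-bandwidth encode: 128 kbit/s, joint stereo
    subprocess.run(['lame', '-b', '128', '-m', 'j',
                    'master.wav', 'release.mp3'], check=True)

    # Very low bit rate: halve the sampling frequency to limit bandwidth
    # and reduce the audible side-effects of coding
    subprocess.run(['lame', '-b', '48', '--resample', '22.05',
                    'master.wav', 'stream.mp3'], check=True)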

Figure 8.12 Example of SACD text authoring screen from SADiE

8.4.5 MPEG-4, web and interactive authoring

Commercial tools for interactive authoring and MPEG-4 encoding are only just beginning to appear at the time of writing. Such tools enable audio scenes to be described and data encoded in a scalable fashion so that they can be rendered at the consumer replay end of the chain, according to the processing power available.

Interactive authoring for games is usually carried out using low-level programming and tools for assembling the game assets, there being few universal formats or standards in this business at the present time. It requires detailed understanding of the features of the games console in question and these platforms differ considerably. Making the most of the resources available is a specialised task, and a number of books have been written on the subject (see Further reading at the end of this chapter). Multimedia programs involving multiple media elements are often assembled using authoring software such as Director, but that will not be covered further here. Preparing audio for web (Internet) delivery is also a highly specialised topic covered very well in other books (see Further reading).

8.5 Synchronisation

There are many cases in which it is necessary to ensure that the recording and replay of audio and/or MIDI data are time-synchronised to an external reference of some sort. Under the heading of synchronisation comes the subject of locking both recording and replay to a source of SMPTE/EBU timecode or MIDI TimeCode (MTC) (see Chapter 4), as well as locking to an external sampling rate clock, video sync reference or digital audio sync reference. This is needed when the audio workstation is to be integrated with other audio and video equipment, and when operating in an all-digital environment where the sampling frequencies of interconnected systems must be the same. The alternative, when the workstation is operating in isolation, is for all operations to be performed with relation to an internal timing reference locked to the prevailing audio sampling frequency.

8.5.1 Requirements for synchronisation

The synchronisation of an audio application requires that the replay or recording speed and sampling frequency are kept in step with an external timing reference, and that there is no long-term drift between this external reference and the passage of time in the replayed audio signal. When lock is required to an external reference there is the possibility that this reference may drift in speed, may have timing jitter or may ‘jump’ in time (if it is a ‘real time’ reference such as timecode). Such situations require that the replay speed and sampling rate of the workstation be adjusted regularly and continuously to follow any variations in the timing reference, or to ‘flywheel’ over them, or even to ignore them in some cases (e.g. timecode discontinuities). Speed variations, depending on their rate, can give rise to audible artefacts due to clock jitter, or to variation of the output sampling rate outside the tolerances acceptable to other digitally interfaced devices in the system, requiring care in system design and implementation.

8.5.2 Timecode synchronisation

The most common synchronisation requirement is for replay to be locked to a source of SMPTE/EBU timecode, because this is used universally as a timing reference in audio and video recording. LTC (longitudinal timecode) is an audio signal that can be recorded on a tape and VITC is contained within lines of a video signal, requiring a suitable reader in the workstation sync interface. A number of desktop workstations that have MIDI features lock to MIDI TimeCode (MTC), which is a representation of SMPTE/EBU timecode in the form of MIDI messages. Details of both types of timecode were given in Chapter 4.

It is important to know what kind of synchronisation is used by your hardware and software. One of the factors that must be considered is whether external timecode is simply used as a timing reference against which sound file replay is triggered, or whether the system continues to slave to external timecode for the duration of replay. In some cases these modes are switchable because they both have their uses. In the first case replay is simply ‘fired off’ when a particular timecode is registered, and in such a mode no long-term relationship is maintained between the timecode and the replayed audio. This may be satisfactory for some basic operations but is likely to result in a gradual drift between audio replay and the external reference if files longer than a few seconds are involved. It may be useful though, because replay remains locked to the workstation’s internal clock reference, which may be more stable than external references, potentially leading to higher audio quality from the system’s convertors. Some cheaper systems do not ‘clean up’ external clock signals very well before using them as the sample clock for D/A conversion, and this can seriously affect audio quality.

In the second mode a continuous relationship is set up between timecode and audio replay, such that long-term lock is achieved and no drift is encountered. This is more difficult to achieve because it involves the continual comparison of timecode to the system’s internal timing references and requires that the system follows any drift or jump in the timecode. Jitter in the external timecode is very common, especially if this timecode derives from a video tape recorder, and this should be minimised in any sample clock signals derived from the external reference. This is normally achieved by the use of a high-quality phase-locked loop, often in two stages. Wow and flutter in the external timecode can be smoothed out using suitable time constants in the software that converts timecode to sample address codes, such that short-term changes in speed are not always reflected in the audio output but longer-term drifts are.
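
The underlying timecode-to-sample-address conversion is straightforward arithmetic, sketched below for non-drop-frame code; the smoothing time constants and jitter rejection described above are omitted:

    def timecode_to_sample(h: int, m: int, s: int, f: int,
                           fps: int = 25, fs: int = 48000) -> int:
        """Convert a SMPTE/EBU timecode value to an audio sample address.

        Assumes non-drop-frame code and an integer number of samples per
        frame (true for 25 fps at 48 kHz: 1920 samples per frame).
        """
        total_frames = ((h * 3600 + m * 60 + s) * fps) + f
        return total_frames * fs // fps

    print(timecode_to_sample(1, 0, 0, 0))  # -> 172800000 (one hour at 48 kHz)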

Sample frequency conversion may be employed at the digital audio outputs of a system to ensure that changes in the internal sample rate caused by synchronisation action are not reflected in the output sampling rate. This may be required if the system is to be interfaced digitally to other equipment in an all-digital studio.

8.5.3 Synchronisation to external audio, film or video references

In all-digital systems it is necessary for there to be a fixed sampling frequency, to which all devices in the system lock. This is so that digital audio from one device can be transferred directly to others without conversion or loss of quality. In systems involving video it is often necessary for the digital audio sampling frequency to be locked to the video frame rate and for timecode to be locked to this as well. The reference signal is likely to be a ‘house sync’ composite video signal that does not necessarily carry time-of-day information. It would be used to lock the internal sampling frequency clock of the workstation. An alternative to this is a digital audio sync signal such as word clock or an AES11 standard sync reference (a stable AES3 format signal, without any audio).

Other sync signals could include tachometer or control track pulses from tape machines or frame rate pulses from film equipment. If a system is to be able to resolve to any or all of these, as well as to timecode and digital audio inputs, a very versatile ‘gearbox’ will be required to perform the relevant multiplications and divisions of synchronisation signals at different rates, such that they can be used to derive the internal sampling rate clock of the system. A stable voltage-controlled oscillator (VCO) and phase-locked loop are commonly used for this purpose.

Figure 8.13 illustrates a conceptual diagram of synchronised operation, with a variety of references and a constant sampling rate output. The sampling frequency convertor is not necessary if suitably constant external relationships can be maintained between the different forms of sync signal and the audio sampling frequency.

8.6 System troubleshooting

8.6.1 Troubleshooting MIDI

When a MIDI system fails to perform as expected, or when devices appear not to be responding to data being transmitted from a controller, it is important to adopt logical fault-finding techniques rather than pressing every button in sight and starting to replug cables. The fault will normally be a simple one and there are only a limited number of possible causes. It is often worth starting at the end of the system nearest to the device that exhibits the problem and working backwards towards the controller, asking a number of questions as you go. You are basically trying to find out either where the control signal is getting lost or why the device is responding in a strange way.

Figure 8.13 Conceptual diagram of replay synchronised to one of a number of timing sources. Blocks of data are fetched from disk at a rate determined by the current sampling clock

Look at the hints in Figure 8.14. Firstly, is MIDI data getting to the device in question? Most devices have some means of indicating that they are receiving MIDI data, either by a flashing light on the front panel or some other form of display. Alternatively it is possible to buy small analysers which in their simplest form may do something like flashing a light if MIDI data is received. If data is getting to the device then the problem is probably either within the device or after its audio output. The most common mistake that people make is to think that they have a MIDI problem when in fact they have an audio problem. Check that the audio output is actually connected to something and that its destination is turned on and faded up. Plug in a pair of headphones to check if the device is responding to MIDI data. If sound comes out of the headphones then the problem most probably lies in the audio system.

If the device is receiving MIDI data but not producing an audio output, try setting the receive mode to ‘omni on’ so that it responds on all channels. If this works then the problem must be related to the way in which a particular channel’s data is being handled. Check that the device is enabled to receive on the MIDI channel in question. Check that the volume is set to something other than zero and that any external MIDI controllers assigned to volume are not forcing the volume to zero (such as any virtual faders in the sequencer package). Check that the voice assigned to the channel in question is actually assigned to an audio output that is connected to the outside world. Check that the main audio output control on the unit itself is turned up. Also try sending note messages for a number of different notes – it may be that the voice in question is not set up to respond over the whole note range.

If no MIDI data is reaching the device then move one step further back down the MIDI signal chain. Check the MIDI cable. Swap it for another one. If the device is connected to a MIDI router of some kind, check that the router input receiving the required MIDI data is routed to the output concerned. Try connecting a MIDI keyboard directly to the input concerned to see if the patch is working. If this works then the problem lies further up the chain, either in the MIDI interface attached to the controller or in the controller itself. If the controller is a computer with an external MIDI interface, it may be possible to test the MIDI port concerned. The setup software for the MIDI interface may allow you to enter a ‘Test’ mode in which you can send unspecified note data directly to the physical port concerned. This should test whether or not the MIDI interface is working. Most interfaces have lights to show when a particular port is receiving or transmitting data, and this can be used for test purposes. It may be that the interface needs to be reconfigured to match a changed studio setup. Now go back to the controller and make sure that you are sending data to the right output on the required MIDI channel and that you are satisfied, from what you know about it, that the software concerned should be transmitting.
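
Where the controller is a general-purpose computer, a few lines of code can stand in for a hardware analyser when checking that note data is being generated at all. A sketch using the Python ‘mido’ library (an assumption: it and a suitable backend must be installed, and port names vary between systems):

    import time
    import mido

    print(mido.get_output_names())   # list the available MIDI outputs

    port = mido.open_output()        # default output; or name one explicitly
    port.send(mido.Message('note_on', note=60, velocity=64, channel=0))
    time.sleep(0.5)
    port.send(mido.Message('note_off', note=60, velocity=0, channel=0))
    port.close()

    # If the receiving device's MIDI-in indicator flashes but no sound is
    # produced, the problem lies inside the device or in the audio chain.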

Figure 8.14 A number of suggestions to be considered when troubleshooting a MIDI system

If no data is getting from the computer to the interface, check the cables to the interface. Then try resetting the interface and the computer. This sometimes re-establishes communication between the two. Reset the interface first, then the computer, so that the computer ‘sees’ the interface (this may involve powering down, then up). Alternatively, a soft reset may be possible using the setup software for the interface. If this does not work, check that no applications are open on the computer which might be taking over the interface ports concerned (some applications will not give up control over particular I/O ports easily). Check the configuration of any software MIDI routers within the computer to make sure that MIDI data is ‘connected’ from the controlling package to the I/O port in question.

Ask yourself the question: ‘Was it working the last time I tried it?’ If it was, it is unlikely that the problem is due to more fundamental reasons such as the wrong port drivers being installed in the system or a specific incompatibility between hardware and software, but it is worth thinking through what you have done to the system configuration since the last time it was used. It is possible that new software extensions or new applications may conflict with your previously working configuration, and removing them will solve the problem. Try using a different software package to control the device which is not responding. If this works then the problem is clearly with the original package. Assuming that the device in question had been responding correctly on a previous occasion, any change in response to MIDI messages such as program and control changes is most likely due either to an altered internal setup or a message getting to the device which was not intended for it.

Most of the internal setup parameters on a MIDI-controlled device are accessible either from the front panel or using system exclusive messages. It is often quite a long-winded process to get to the parameter in question using the limited front-panel displays of many devices, but it may be necessary in order to check the intended response to particular MIDI data. If the problem is one of unusual responses (or no response) to program change messages, then it may be that the program change map has been altered and a different stored voice or patch is being selected from the one intended. Perhaps the program change number in question is not assigned to a stored voice or patch at all. If the device is switching between programs when it should not, it may be that your MIDI routing is at fault. Perhaps the device is receiving program changes intended for another. Check the configuration of your MIDI patcher or multiport interface. A similar process applies to controller messages. Check the internal mapping of controller messages to parameters, and check the external MIDI routing to make sure that devices are receiving only the information intended for them.

When more than one person uses a MIDI-controlled studio, or when you have a lot of different setups yourself, virtually the only way to ensure that you can reset the studio quickly to a particular state is to store system exclusive dumps of the full configuration of each device and to store any patcher or MIDI operating system maps. These can either be kept in separate librarian files or as part of a sequence, to be downloaded to the devices before starting the session. Once you have set up a configuration of a device that works for a particular purpose it should be stored on the computer so that it could be dumped back down again at a later date.

8.6.2 Digital interface troubleshooting

If a digital interface between two devices appears not to be working it could be due to one or more of the following conditions. These are covered in more detail in The Digital Interface Handbook (see Further reading).

Asynchronous sample rates

The two devices must normally operate at the same sampling frequency, preferably locked to a common reference. Ensure that the receiver is in external sync mode and that a synchronising signal (common to the transmitter) is present at the receiver’s sync input. If the incoming signal’s transmitter cannot be locked to the reference, the signal must be resynchronised or sample rate converted. Alternatively, set the receiver to lock to the clock contained in the digital audio input (standard two-channel interfaces only).

A flashing or extinguished ‘sync’ or ‘locked’ indicator on the receiver normally means that no sync reference exists or that it differs from the signal at the digital input. Check that the sync reference and the input are at the correct rate and locked to the same source. Decide whether to use an internal or external sync reference, depending on the application.

If problems with ‘good lock’ or drifting offset arise when locking to other machines or when editing, check that any timecode is synchronous with the video and sampling rate references.

Sampling frequency mode

The transmitter may be operating in the AES3 single-channel double-sampling-frequency mode, in which case successive sub-frames carry adjacent samples of a single channel at twice the normal sampling frequency. This may sound like audio pitch-shifted downwards if decoded and converted by a standard receiver incapable of recognising the mode. Alternatively, the devices may be operating at entirely different sampling frequencies and will therefore not communicate.

Digital input

It may be that the receiver is not switched to accept a digital input.

Data format

Received data is in the wrong format: both transmitter and receiver must operate to the same format. Conflicts may exist in areas such as channel status, where a consumer–professional mismatch is common. Use a format convertor to set the necessary flags.

Non-audio or ‘other uses’ set

The data transmitted over the interface may be data-reduced audio, such as AC-3 or DTS format. It can only be decoded by receivers specially designed for the task. The data will sound like noise if it is decoded and converted by a standard linear PCM receiver, but in such receivers it will normally be muted because of the indication in channel status and/or the validity bit.

Cables and connectors

Cables or connectors may be damaged or incorrectly wired. Cable may be too long, of the wrong impedance, or generally of poor quality. Digital signal may be of poor quality. Check eye height on scope against specification and check for possible noise and interference sources. Alternatively make use of an interface analyser.

SCMS (consumer interface only)

The copy-protect or SCMS flag may be set by the transmitter. For professional purposes, use a format convertor to set the necessary flags, or use the professional interface, which is not subject to SCMS.

Receiver mode

Receiver is not in record or input monitor mode. Some recorders must be at least in record–pause before they will give an audible and metered output derived from a digital input.

8.6.3 Troubleshooting software

This could form a book in its own right and depends a lot on the operating system and applications in question. There are, however, a few rules of thumb to be observed when trying to get software to work.

Firstly, make sure you have all the latest updates and revisions of the current system software and applications; the latest versions of each tend to be reasonably safe together. Patches and updates can often be downloaded from the Internet. Check also that the memory and CPU requirements of the application are met. Begin with a basic set of system extensions, and don’t load any more software or extensions than you need onto an audio workstation. General-purpose extensions and third-party software can sometimes conflict with the smooth operation of audio workstation packages, and many people run only audio software on such platforms rather than trying to use them as general-purpose computers as well.

Make sure that you are using the correct and latest drivers for any sound cards and MIDI interfaces in the system and that the disk interface and drivers are suitable for high speed audio and video operation.

Further reading

Beggs, J. and Thede, D. (2001) Designing Web Audio. O’Reilly and Associates.

Boer, J. (2002) Game Audio Programming. Charles River Media.

Marks, A. (2001) The Complete Guide to Game Audio. CMP Books.

Rumsey, F. (2001) Spatial Audio. Focal Press.

Rumsey, F. and Watkinson, J. (2004) The Digital Interface Handbook, third edition. Focal Press.
