CHAPTER 9

Digital Recording, Editing and Mastering Systems

CHAPTER CONTENTS

Digital Tape Recording

Background to digital tape recording

Channel coding for dedicated tape formats

Error correction

Digital tape formats

Editing digital tape recordings

Mass Storage-based Systems

Magnetic hard disks

Optical discs

Memory cards

Recording audio on to mass storage media

Media formatting

Audio Processing for Computer Workstations

Introduction

DSP resources for audio processing

Audio processing architectures

Integrated sound cards

Mixing ‘in the box’

Mass Storage-based Editing System Principles

Introduction

Sound files and sound segments

Edit point handling

Crossfading

Editing modes

Simulation of ‘reel-rocking’

Editing Software

Mastering and Restoration

Specialized software

Level control and loudness in mastering

Preparing for and Understanding Release Media

Physical media

Downloads and streaming

Mastered for iTunes

 

This chapter describes digital audio recording systems and the principles of digital audio editing and mastering.

DIGITAL TAPE RECORDING

Although it is still possible to find examples of dedicated digital tape recording formats in use, they have largely been superseded by recording systems that use computer mass storage media. The economies of scale of the computer industry have made data storage relatively cheap and there is no longer a strong justification for systems dedicated to audio purposes. Tape has a relatively slow access time, because it is a linear storage medium. However, a dedicated tape format can easily be interchanged between recorders, provided that another machine operating to the same standard can be found. Computer mass storage media, on the other hand, come in a very wide variety of sizes and formats, and there are numerous levels at which compatibility must exist between systems before interchange can take place. This matter is discussed in the next chapter.

Background to digital tape recording

When commercial digital audio recording systems were first introduced in the 1970s and early 1980s it was necessary to employ recorders with sufficient bandwidth for the high data rates involved (a machine capable of handling bandwidths of a few megahertz was required). Analog audio tape recorders were out of the question because their bandwidths extended only up to around 35 kHz at best, so video tape recorders (VTRs) were often utilized because of their wide recording bandwidth. PCM adaptors converted digital audio data into a waveform which resembled a television waveform, suitable for recording on to a VTR. The Denon company of Japan developed such a system in partnership with the NHK broadcasting organization and they released the world’s first PCM recording onto LP in 1971. In the early 1980s, devices such as Sony’s PCM-F1 became available at modest prices, allowing 16 bit, 44.1 kHz digital audio to be recorded on to a consumer VTR, resulting in widespread proliferation of stereo digital recording. Dedicated open-reel digital recorders using stationary heads were also developed (see Fact File 9.1). High-density tape formulations were then manufactured for digital use, and this, combined with new channel codes (see below), improvements in error correction and better head design, led to the use of a relatively low number of tracks per channel, or even single-track recording of a given digital signal, combined with playing speeds of 15 or 30 inches per second. Dedicated rotary-head systems, not based on a VTR, were also developed — the R-DAT format being the most well-known.

FACT FILE 9.1 ROTARY AND STATIONARY HEADS

There are two fundamental mechanisms for the recording of digital audio on tape, one which uses a relatively low linear tape speed and a quickly rotating head, and one which uses a fast linear tape speed and a stationary head. In the rotary-head system the head either describes tracks almost perpendicular to the direction of tape travel, or it describes tracks which are almost in the same plane as the tape travel. The former is known as transverse scanning, and the latter is known as helical scanning, as shown in (a). Transverse scanning uses more tape when compared with helical scanning. It is not common for digital tape recording to use the transverse scanning method. The reason for using a rotary head is to achieve a high head-to-tape speed, since it is this which governs the available bandwidth. Rotary-head recordings cannot easily be splice-edited because of the track pattern, but they can be electronically edited using at least two machines.

Stationary heads allow the design of tape machines that are very similar in many respects to analog transports. With stationary-head recording it is possible to record a number of narrow tracks in parallel across the width of the tape, as shown in (b). Tape speed can be traded off against the number of parallel tracks used for each audio channel, since the required data rate can be made up by a combination of recordings made on separate tracks. This approach was used in the DASH format, where the tape speed could be 30 ips (76 cm/s) using one track per channel, 15 ips using two tracks per channel, or 7.5 ips using four tracks per channel.

image

Digital recording tape is thinner (27.5 microns) than that used for analog recordings, so long playing times can be accommodated on a reel; the thin tape also contacts the machine’s heads more intimately than the stiffer, standard 50 micron tape. Intimate contact is essential for reliable recording and replay of such a densely packed, high bandwidth signal.

FACT FILE 9.2 DATA RECOVERY

Channel-coded data must be decoded on replay, but first the audio data must be separated from the clock information which was combined with it before recording. This process is known as data and sync separation, as shown in (a).

It is normal to use a phase-locked loop for the purpose of regenerating the clock signal from the replayed data, as shown in (b), this being based around a voltage-controlled oscillator (VCO) which runs at some multiple of the off-tape clock frequency. A phase comparator compares the relative phases of the divided VCO output and the clock data off tape, producing a voltage proportional to the error which controls the frequency of the VCO. With suitable damping, the phase-locked oscillator will ‘flywheel’ over short losses or irregularities of the off-tape clock.

Recorded data is usually interspersed with synchronizing patterns in order to give the PLL in the data separator a regular reference in the absence of regular clock data from the encoded audio signal, since many channel codes have long runs without a transition. Even if the off-tape data and clock have timing irregularities, such as might manifest themselves as ‘wow’ and ‘flutter’ in analog reproducers (see Chapter 18), these can be removed in digital systems. The erratic data (from tape or disk, for example) is written into a short-term solid state memory (RAM) and read out again a fraction of a second later under control of a crystal clock (which has an exceptionally stable frequency), as shown in (c). Provided that the average rate of input to the buffer is the same as the average rate of output, and the buffer is of sufficient size to soak up short-term irregularities in timing, the buffer will not overflow or become empty.

image
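The timebase-correction principle in (c) can be sketched in software. The following minimal Python simulation uses invented figures: samples nominally arrive once per clock period but with up to ±3 periods of timing jitter, while a crystal-clocked read drains the buffer at a perfectly regular rate after a short pre-fill delay.

```python
# Sketch of timebase correction with a RAM buffer (figures invented).
import random

random.seed(1)
N = 1000
# Sample n nominally arrives at tick n, but with bounded timing jitter
arrival = sorted(n + random.uniform(-3, 3) for n in range(N))

DELAY = 8        # read-out starts 8 periods late, pre-filling the buffer
occupancy = []
for tick in range(DELAY, N):
    written = sum(1 for t in arrival if t <= tick)  # erratic writes so far
    read = tick - DELAY + 1                 # steady, crystal-clocked reads
    occupancy.append(written - read)

print("buffer occupancy min/max:", min(occupancy), max(occupancy))
# Occupancy fluctuates around the pre-fill level but never reaches zero,
# so the output side sees a continuous, jitter-free sample stream.
```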

Channel coding for dedicated tape formats

Since ‘raw’ binary data is normally unsuitable for recording directly by dedicated digital recording systems, a ‘channel code’ is used which matches the data to the characteristics of the recording system, uses storage space efficiently, and makes the data easy to recover on replay. A wide range of channel codes exists, each with characteristics designed for a specific purpose. The channel code converts a pattern of binary data into a different pattern of transitions in the recording or transmission medium. It is another stage of modulation, in effect. Thus the pattern of bumps in the optical surface of a CD bears little resemblance to the original audio data, and the pattern of magnetic flux transitions on a DAT cassette would be similarly different. Given the correct code book, one could work out what audio data was represented by a given pattern from either of these systems.

image

FIGURE 9.1 Examples of three channel codes used in digital recording. Miller-squared is the most efficient of those shown since it involves the smallest number of transitions for the given data sequence.

Some examples of channel codes used in audio systems are shown in Figure 9.1. FM is the simplest, being an example of binary frequency modulation. It is otherwise known as ‘biphase mark’, one of the Manchester codes, and is the channel code used by SMPTE/EBU timecode (see Chapter 15). MFM and Miller-squared are more efficient in terms of recording density. MFM is more efficient than FM because it eliminates the clock transitions at bit-cell boundaries, retaining them only between successive zeros. Miller-squared eliminates the DC content present in MFM by removing the transition for the last one in an even number of successive ones.
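The simplicity of FM coding can be shown in a few lines. The following Python sketch (not tied to any particular recorder) emits two channel symbols per data bit: a transition at every bit-cell boundary, plus a mid-cell transition for a one.

```python
# Minimal FM ('biphase mark') encoder sketch: the output levels have no
# meaning in themselves; only the transitions carry information.
def biphase_mark(bits, level=0):
    out = []
    for bit in bits:
        level ^= 1           # guaranteed transition at the cell boundary
        out.append(level)
        if bit:
            level ^= 1       # extra mid-cell transition encodes a 1
        out.append(level)
    return out

print(biphase_mark([1, 0, 1, 1, 0, 0, 1]))
# However long the run of zeros or ones, the signal never goes more than
# one cell without a transition, so clock content is always present.
```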

Group codes, such as those used in the Compact Disc and R-DAT, involve the coding of patterns of bits from the original audio data into new codes with more suitable characteristics, using a look-up table or ‘code book’ to keep track of the relationship between recorded and original codes. This has clear parallels with coding as used in intelligence operations, in which the recipient of a message requires the code book to be able to understand the message. CD uses a method known as eight-to-fourteen modulation (EFM), in which 16 bit audio sample words are each split into two 8 bit words, after which a code book is used to generate a new 14 bit word for each of the 256 possible combinations of 8 bits. Since many more words are possible with 14 bits than with 8, it is possible to choose those which have appropriate characteristics for the CD recording channel. In this case, it is those words which have no more than 11 consecutive bits in the same state, and no fewer than three. This limits the bandwidth of the recorded data and makes it suitable for the optical pickup process, whilst retaining the necessary clock content.
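The arithmetic behind the choice of 14 bits can be checked by brute force. In the recorded waveform each ‘one’ in the code word marks a transition, so runs of 3–11 bits in the same state correspond to 2–10 zeros between successive ones. The Python sketch below simply counts the 14 bit words that satisfy this constraint; it is not the real CD code book, and the treatment of word boundaries is simplified:

```python
def valid(word):
    bits = format(word, "014b")
    if "1" not in bits:
        return False                          # a run of 14 identical states
    gaps = bits.strip("0").split("1")[1:-1]   # zero-runs between ones
    if any(not 2 <= len(g) <= 10 for g in gaps):
        return False
    lead = bits.index("1")                    # zeros before the first one
    trail = len(bits) - 1 - bits.rindex("1")  # zeros after the last one
    return lead <= 10 and trail <= 10         # keep end runs short too

print(sum(valid(w) for w in range(2 ** 14)))
# Prints 267: comfortably more than the 256 words needed to represent
# every 8 bit input pattern, which is why 14 bits suffice.
```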

Error correction

There are two stages to the error correction process used in digital tape recording systems. First, the error must be detected, and then it must be corrected. If it cannot be corrected then it must be concealed. In order for the error to be detected it is necessary to build in certain protection mechanisms.

Two principal types of error exist: the burst error and the random error. Burst errors result in the loss of many successive samples and may be due to major momentary signal loss, such as might occur at a tape drop-out, at an instant of impulsive interference such as an electrical spike induced in a cable, or at a piece of dirt on the surface of a CD. Burst error correction capability is usually quoted as the number of consecutive samples which may be corrected perfectly. Random errors result in the loss of single samples in randomly located positions, and are more likely to be the result of noise or poor signal quality. Random error rates are normally quoted as an average rate, for example 1 in 10^6. Error correction systems must be able to cope with the occurrence of both burst and random errors in close proximity.

Audio data is normally interleaved before recording, which means that the order of samples is shuffled (as shown conceptually in Figure 9.2). Samples that had been adjacent in real time are now separated from each other on the tape. The benefit of this is that a burst error, which destroys consecutive samples on tape, will result in a collection of single-sample errors in between good samples when the data is deinterleaved, allowing for the error to be concealed. A common process, associated with interleaving, is the separation of odd and even samples by a delay. The greater the interleave delay, the longer the burst error that can be handled. Redundant data is also added before recording. Redundancy, in simple terms, involves the recording of data in more than one form or place so that if it is damaged in one place it can be retrieved from another.

image

FIGURE 9.2 Interleaving is used in digital recording and broadcasting systems to rearrange the original order of samples for storage or transmission. This can have the effect of converting burst errors into random errors when the samples are deinterleaved.
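A simple block interleaver illustrates the principle shown in Figure 9.2. In this Python sketch, with sizes invented for illustration, samples are written into a matrix by rows and read out by columns, so a burst of consecutive errors on the medium ends up as isolated single-sample errors after de-interleaving:

```python
ROWS, COLS = 4, 8
samples = list(range(ROWS * COLS))

# Interleave: write by rows, read by columns
interleaved = [samples[r * COLS + c] for c in range(COLS) for r in range(ROWS)]

# A burst error destroys four consecutive values on the medium
for i in range(8, 12):
    interleaved[i] = None

# De-interleave: put each surviving value back in its original position
restored = [None] * (ROWS * COLS)
for i, value in enumerate(interleaved):
    c, r = divmod(i, ROWS)
    restored[r * COLS + c] = value

print(restored)
# The four lost samples are now spread out, each surrounded by good
# neighbors, so they can be corrected or concealed individually.
```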

Cyclic redundancy check (CRC) codes, calculated from the original data and recorded along with that data, are used in many systems to detect the presence and position of errors on replay. Complex mathematical procedures are also used to form codewords from audio data which allow for both burst and random errors to be corrected perfectly up to a given limit. Reed-Solomon encoding is another powerful system which is used to protect digital recordings against errors, but it is beyond the scope of this book to cover these codes in detail.
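Although the codes used in commercial formats are more elaborate, the CRC principle itself is compact. The sketch below uses the common CRC-16/CCITT polynomial purely as an example; real recording formats define their own polynomials and codeword layouts:

```python
# CRC sketch using the CRC-16/CCITT polynomial (x^16 + x^12 + x^5 + 1)
def crc16(data, crc=0xFFFF):
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

samples = bytes([0x12, 0x34, 0x56, 0x78])    # pretend audio data
stored = crc16(samples)                      # recorded alongside the data

# On replay the check value is recomputed: any mismatch flags an error
print(crc16(samples) == stored)                          # True: intact
print(crc16(bytes([0x12, 0x35, 0x56, 0x78])) == stored)  # False: error
```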

Up to a certain random error rate or burst error duration an error correction system will be able to reconstitute erroneous samples perfectly. Such corrected samples are indistinguishable from the originals, and sound quality will not be affected. Such errors are often signaled by green lights showing ‘CRC’ failure or ‘Parity’ failure. When the error rate exceeds the limits for perfect correction, interpolation between good samples can be used to arrive at a value for a missing sample. The interpolated value is the mathematical average of the preceding and succeeding samples. This process is also known as concealment or averaging, and the audible effect is not unpleasant, although it will result in a temporary reduction in audio bandwidth. In extreme cases a system may ‘hold’: in other words, it will repeat the last correct sample value. The audible effect of this may not be noticeable in isolated cases, but holding is still a severe error condition. Most systems will not hold for more than a few samples before muting. Hold is normally indicated by a red light. When an error correction system is completely overwhelmed it will usually mute the audio output of the system. The alternative to muting is to hear the output, regardless of the error. Depending on the severity of the error, it may sound like a small ‘spit’, a click, or even a more severe breakup of the sound.
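The hierarchy of interpolation, hold and mute can be summarized in a short sketch. The thresholds and sample values below are invented for illustration:

```python
def conceal(samples, first_bad, last_bad, hold_limit=4):
    """Replace samples[first_bad:last_bad + 1] by interpolation, hold or mute."""
    out = list(samples)
    length = last_bad - first_bad + 1
    if length == 1:                       # isolated error: interpolate
        out[first_bad] = (samples[first_bad - 1] + samples[last_bad + 1]) // 2
    elif length <= hold_limit:            # short burst: repeat last good value
        for i in range(first_bad, last_bad + 1):
            out[i] = samples[first_bad - 1]
    else:                                 # too long to conceal: mute
        for i in range(first_bad, last_bad + 1):
            out[i] = 0
    return out

audio = [100, 200, 300, 999, 500, 600]    # sample 3 is corrupt
print(conceal(audio, 3, 3))               # 999 becomes (300 + 500) // 2 = 400
```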

Digital tape formats

There have been a number of commercial recording formats over the last 20 years, and only a brief summary will be given here of the most common.

Sony’s PCM-1610 and PCM-1630 adaptors dominated the CD-mastering market for a number of years, although by today’s standards they used a fairly basic recording format and relied on 60 Hz/525 line U-matic cassette VTRs (Figure 9.3). The system operated at a sampling rate of 44.1 kHz and used 16 bit quantization, being designed specifically for the making of tapes to be turned into CDs. Recordings made in this format could be electronically edited using the Sony DAE3000 editing system, and the playing time of tapes ran up to 75 minutes using a tape specially developed for digital audio use.

image

FIGURE 9.3 Sony DMR-4000 digital master recorder. (Courtesy of Sony Broadcast and Professional Europe.)

The R-DAT or DAT format was a small stereo, rotary-head, cassette-based format offering a range of sampling rates and recording times, including the professional rates of 44.1 and 48 kHz. Originally, consumer machines operated at 48 kHz to prevent direct digital copying of CDs, but professional versions became available which would record at either 44.1 or 48 kHz. Consumer machines would record at 44.1 kHz, but usually only via the analog inputs. DAT was a 16 bit format, but also had a non-linearly encoded long-play mode, sampled at 32 kHz. Truly professional designs offering editing facilities, external sync and IEC-standard timecode were also developed. The format became exceptionally popular with professionals owing to its low cost, high performance, portability and convenience. Various non-standard modifications were introduced, including a 96 kHz sampling rate machine, and adaptors enabling 20 bit audio to be stored on such a machine (sacrificing the high sampling rate for more bits). The IEC timecode standard for R-DAT was devised in 1990. It allowed SMPTE/EBU timecode of any frame rate to be converted into the internal DAT ‘running-time’ code, and then converted back into any SMPTE/EBU frame rate on replay. A typical machine is pictured in Figure 9.4.

image

FIGURE 9.4 Sony PCM-7030 professional DAT machine. (Courtesy of Sony Broadcast and Professional Europe.)

image

FIGURE 9.5 Nagra-D open-reel digital tape recorder. (Courtesy of Sound PR.)

The Nagra-D recorder (Figure 9.5) was designed as a digital replacement for the world-famous Nagra analog recorders, and as such was intended for professional use in field recording and studios. The format was designed to have considerable commonality with the audio format used in D1- and D2-format digital VTRs, having rotary heads, although it used open reels for operational convenience. Allowing for 20–24 bits of audio resolution, the Nagra-D format was appropriate for use with high-resolution convertors. The error correction and recording density used in this format were designed to make recordings exceptionally robust, and recording time could be up to 6 hours on a 7 inch (18 cm) reel, in two-track mode. The format was also designed for operation in a four-track mode at twice the stereo tape speed, such that in stereo the tape travels at 4.75 cm/s, and in four track at 9.525 cm/s.

The DASH (Digital Audio Stationary Head) format consisted of a whole family of open-reel stationary-head recording formats from two tracks up to 48 tracks. DASH-format machines operated at 44.1 kHz or 48 kHz rates (and sometimes optionally at 44.056 kHz), and they allowed varispeed of ±12.5%. They were designed to allow gapless punch-in and punch-out, splice editing, electronic editing and easy synchronization. Multitrack DASH machines (an example is shown in Figure 9.6) gained wide acceptance in studios, but the stereo machines did not. Later developments resulted in DASH multitracks capable of storing 24 bit audio instead of the original 16 bits.

Subsequently, budget modular multitrack formats were introduced. Most of these were based on eight-track cassettes using rotary-head transports borrowed from consumer video technology. The most widely used were the DA-88 format (based on Hi-8 cassettes) and the ADAT format (based on S-VHS cassettes). These offered most of the features of open-reel machines, and a number of them could be synchronized to expand the channel capacity. An example is shown in Figure 9.7.

image

FIGURE 9.6 An open-reel digital multitrack recorder: the Sony PCM-3348. (Courtesy of Sony Broadcast and Professional Europe.)

image

FIGURE 9.7 A modular digital multitrack machine, Sony PCM-800. (Courtesy of Sony Broadcast and Professional Europe.)

Editing digital tape recordings

Razor blade cut-and-splice editing was possible on open-reel digital formats, and the analog cue tracks were monitored during these operations. The thin tape could easily be damaged during the cut-and-splice edit procedure and this method failed to gain an enthusiastic following, despite its having been the norm in the analog world. Electronic editing was far more desirable, and was the usual method.

Electronic editing normally required the use of two machines plus a control unit, as shown in the example in Figure 9.8. A finished master tape was assembled from source takes on player machines and the original source tape was left unaltered. This was a relatively slow process, as it involved real-time copying of audio from one machine to another, and modifications to the finished master were difficult. A crossfade was introduced at edits to smooth the join.

image

FIGURE 9.8
In electronic tape copy editing selected takes are copied in sequence from player to recorder with appropriate crossfades at joins.

MASS STORAGE-BASED SYSTEMS

Once audio is in a digital form it can be handled by a computer, like any other data. The only real difference is that audio requires a high sustained data rate, substantial processing power and large amounts of storage compared with more basic data such as text. The following is an introduction to some of the technology associated with computer-based audio workstations and audio recording using computer mass storage media such as hard disks. The MIDI-based aspects of such systems are covered in Chapter 14, while file formats and interchange are introduced in Chapter 10.

Magnetic hard disks

Magnetic hard disk drives are random-access systems — in other words any data can be accessed at random and with only a short delay. There exist both removable and fixed media disk drives, but in almost all cases the fixed media drives have a higher performance than removable media drives. This is because the design tolerances can be made much finer when the drive does not have to cope with removable media, allowing higher data storage densities to be achieved. Some disk drives have completely removable drive cartridges containing the surfaces and mechanism, enabling hard disk drives to be swapped between systems for easy project management (an example is shown in Figure 9.9).

The general structure of a hard disk drive is shown in Figure 9.10. It consists of a motor connected to a drive mechanism that causes one or more disk surfaces to rotate at anything from a few hundred to many thousands of revolutions per minute. This rotation may either remain constant or may stop and start, and it may either be at a constant rate or a variable rate, depending on the drive. One or more heads are mounted on a positioning mechanism which can move the head across the surface of the disk to access particular points, under the control of hardware and software called a disk controller. The heads read data from and write data to the disk surface.

image

FIGURE 9.9 A typical removable disk drive system allowing multiple drives to be inserted or removed from the chassis at will. Frame housing multiple removable drives. (Courtesy of Glyph Technologies Inc.)

The disk surface is normally divided up into tracks and sectors, not physically but by means of ‘soft’ formatting (see Figure 9.11). Low-level formatting places logical markers, which indicate block boundaries, amongst other processes. On most hard disks the tracks are arranged as a series of concentric rings, but with some optical discs there is a continuous spiral track.

image

FIGURE 9.10 The general mechanical structure of a disk drive.

Disk drives look after their own channel coding, error detection and correction so there is no need for system designers to devise dedicated audio processes for disk-based recording systems. The formatted capacity of a disk drive is all available for the storage of ‘raw’ audio data, with no additional overhead required for redundancy and error checking codes. ‘Bad blocks’ are mapped out during the formatting of a disk, and not used for data storage. If a disk drive detects an error when reading a block of data it will attempt to read it again. If this fails then an error is normally generated and the file cannot be accessed, requiring the user to resort to one of the many file recovery packages on the market. Disk-based audio systems do not resort to error interpolation or sample hold operations, unlike tape recorders. Replay is normally either correct or not possible.

image

FIGURE 9.11 Disk formatting divides the storage area into tracks and sectors.

RAID arrays enable disk drives to be combined in various ways as described in Fact File 9.3.

Optical discs

There are a number of families of optical disc drive that have differing operational and technical characteristics, although they share the universal benefit of removable media. They are all written and read using a laser, which produces a highly focused beam of coherent light, although the method by which the data is actually stored varies from type to type. Optical discs are sometimes enclosed in a plastic cartridge that protects the disc from damage, dust and fingerprints, and they have the advantage that the pickup never touches the disc surface, making them immune to the ‘head crashes’ that can affect magnetic hard disks. Drives divide broadly into those that handle CD/DVD/BD formats (see Fact File 9.4) and those that handle magneto-optical (M-O) and other cartridge-type ISO standard disc formats. The latter were considered more suitable for ‘professional purposes’ whereas the former are often encountered in consumer equipment. As mass storage media for computers, optical media have declined in importance in recent years, as capacity and speed have failed to keep up with magnetic hard drives, making them less useful for backup and secondary storage. However, they remain important as consumer data storage and distribution media, particularly in recent formats such as Blu-Ray Disc.

FACT FILE 9.3 RAID ARRAYS

Hard disk drives can be combined in various ways to improve either data integrity or data throughput. RAID stands for Redundant Array of Independent Disks, and is a means of linking ordinary disk drives under one controller so that they form an array of data storage space. A RAID array can be treated as a single volume by a host computer. Historically there were a number of basic levels of RAID array, each of which was designed for a slightly different purpose, as summarized in the table. However there are also hybrid and non-standard alternatives that combine different features of multi-drive arrays. Recent classifications divide arrays into three categories: ‘Failure resistant’, ‘Failure tolerant’ and ‘Disaster tolerant’, based on increasing ability to withstand various failure and protection criteria. It’s now left up to manufacturers how to implement these criteria.

RAID level  Features
0 Data blocks split alternately between a pair of disks, but no redundancy so actually less reliable than a single disk. Transfer rate is higher than a single disk. Can improve access times by intelligent controller positioning of heads so that next block is ready more quickly.
1 Offers disk mirroring. Data from one disk is automatically duplicated on another. A form of real-time backup.
2 Uses bit interleaving to spread the bits of each data word across the disks, so that, say, eight disks each hold one bit of each word, with additional disks carrying error protection data. Non-synchronous head positioning. Slow to read data, and designed for mainframe computers.
3 Similar to level 2, but synchronizes heads on all drives, and ensures that only one drive is used for error protection data. Allows high-speed data transfer, because of multiple disks in parallel. Cannot perform simultaneous read and write operations.
4 Writes whole blocks sequentially to each drive in turn, using one dedicated error protection drive. Allows multiple read operations but only single write operations.
5 As level 4 but splits error protection between drives, avoiding the need for a dedicated check drive. Allows multiple simultaneous reads and writes.
6 As level 5 but incorporates RAM caches for higher performance.

FACT FILE 9.4 CONSUMER OPTICAL DISK FORMATS

Compact discs and drives

Compact Discs (CDs) are familiar to most people as a consumer read-only optical disc for audio (CD-DA) or data (CD-ROM) storage. Standard audio CDs (CD-DA) conform to the Red Book standard published by Philips. The CD-ROM standard (Yellow Book) divides the CD into a structure with 2048 byte sectors, adds an extra layer of error protection, and makes it useful for general purpose data storage including the distribution of sound and video in the form of computer data files. It is possible to find discs with mixed modes, containing sections in CD-ROM format and sections in CD-Audio format.

CD-R is the recordable CD, and may be used for recording CD-Audio format or other CD formats using a suitable drive and software. The Orange Book, Part 2, contains information on the additional features of CD-R, such as the area in the center of the disc where data specific to CD-R recordings is stored. Audio CDs recorded to the Orange Book standard can be ‘fixed’ to give them a standard Red Book table of contents (TOC), allowing them to be replayed on any conventional CD player. Once fixed into this form, the CD-R may not subsequently be added to or changed, but prior to this there is a certain amount of flexibility, as discussed below. CD-RW discs are erasable and work on phase-change principles, requiring a drive compatible with this technology, being described in the Orange Book, Part 3.

DVD

DVD was the natural successor to CD, being a higher density optical disc format aimed at the consumer market, having the same diameter as CD and many similar physical features. It uses a different laser wavelength to CD (635–650 nm as opposed to 780 nm), so multi-standard drives need to be able to accommodate both. Data storage capacity depends on the number of sides and layers to the disc, but ranges from 4.7 Gbytes (single-layer, single-sided) up to about 18 Gbytes (double-layer, double-sided). The data transfer rate at ‘one times’ speed is just over 11 Mbit/s.

DVD can be used as a general purpose data storage medium. Like CD, there are numerous different variants on the recordable DVD, partly owing to competition between the numerous different ‘factions’ in the DVD consortium. These include DVD-R, DVD-RAM, DVD-RW and DVD+RW, all of which are based on similar principles but have slightly different features, leading to a compatibility minefield. The ‘DVD Multi’ guidelines produced by the DVD Forum were an attempt to foster greater compatibility between DVD drives and discs, and many drives are now available that will read and write most of the DVD formats.

DVD-Video is the format originally defined for consumer distribution of movies with surround sound, typically incorporating MPEG-2 video encoding and Dolby Digital surround sound encoding. It also allows for up to eight channels of 48 or 96 kHz linear PCM audio, at up to 24 bit resolution. DVD-Audio was intended for very high-quality multichannel audio reproduction and allowed for linear PCM sampling rates up to 192 kHz, with numerous configurations of audio channels for different surround modes, and optional lossless data reduction (MLP). However, it has not been widely adopted in the commercial music industry.

Super Audio CD (SACD)

Version 1.0 of the SACD specification is described in the ‘Scarlet Book’, available from Philips licensing department. SACD uses DSD (Direct Stream Digital) as a means of representing audio signals, as described in Chapter 8, so requires audio to be sourced in or converted to this form. SACD aims to provide a playing time of at least 74 minutes for both two-channel and six-channel balances. The disc is divided into two regions, one for two-channel audio, the other for multichannel. A lossless data packing method known as Direct Stream Transfer (DST) can be used to achieve roughly 2:1 data reduction of the signal stored on disc so as to enable high-quality multichannel audio on the same disc as the two channel mix. SACD has only achieved a relatively modest market penetration compared with formats such as CD and DVD-Video, but is still used by some specialized high-quality record labels. SACDs can be manufactured as single- or dual-layer discs, with the option of the second layer being a Red Book CD layer (the so-called ‘hybrid disc’ that will also play on a normal CD player).

Blu-Ray Disc

The Blu-Ray Disc is a higher density optical disc format than DVD, which uses a shorter wavelength blue-violet laser (wavelength 405 nm) to achieve a high packing density of data on the disc surface. Single-layer discs offer 25 Gbytes of storage and dual-layer discs offer 50 Gbytes, and the basic transfer rate is also higher than DVD at around 36 Mbit/s, although a higher rate of 54 Mbit/s is required for HD movie replay, which is achieved by using at least 1.5 times playback speed. Like DVD, a range of read-only, writeable and rewriteable formats is possible. There is an audio-only version of the player specification, known as BD-Audio, which does not have to be able to decode video, making possible a high-resolution surround playback format that might offer an alternative to DVD-Audio or SACD. Audio-only transfer rates of the disc vary depending on the format concerned.

As far as audio formats are concerned, Linear PCM, Dolby Digital and DTS Digital Surround are mandatory in Blu-Ray players and recorders, but it is up to individual studios to decide what formats to include on their disc releases. Alternative optional audio formats include higher-resolution versions of the Dolby and DTS formats, known as Dolby Digital Plus and DTS-HD respectively, as well as losslessly encoded versions known as Dolby TrueHD and DTS-HD Master Audio. High sampling frequencies (up to 192 kHz) are possible on Blu-Ray, as are audio sample resolutions of 16, 20 or 24 bits. The standard limits audio reproduction to six channels of 192 kHz, 24 bit uncompressed digital audio, which gives rise to a data transfer rate of 27.7 Mbit/s.

Pure Audio Blu-Ray is a variant promoted by msm Studios that has a number of audio-specific features, such as Java code enabling easy remote control, audio format selection and content download over a network. These discs are intended to be playable in any standard Profile 2.0 BD player.

WORM (Write-Once-Read-Many) discs may only be written once by the user, after which the recording is permanent (a CD-R is therefore a type of WORM disc). Other types of optical discs can be written numerous times, either requiring pre-erasure or using direct overwrite methods (where new data is simply written on top of old, erasing it in the process). The read/write process of most current rewritable discs is typically ‘phase change’ or ‘magneto-optical’. The CD-RW is an example of a rewritable disc that now uses direct overwrite principles.

Memory cards

Increasing use is also made in audio systems of small flash memory cards, particularly in portable recorders. These cards are capable of storing many gigabytes of data on a solid state chip with fast access time, and they have no moving parts which makes them relatively robust. Additionally they have the benefit of being removable, which makes them suitable for transfer of some projects between systems, although the capacity and speed limitations still make disks the medium of choice for large professional projects. Such memory cards come in a variety of formats such as Compact Flash (CF), Secure Digital (SD) and Memory Stick, and card readers can be purchased that will read multiple types. There is a limit to the number of times such devices can be rewritten, which is likely to be lower than that for a typical magnetic disk drive.

A number of digital audio recording systems are available that use memory cards as the primary storage medium, having the advantage of minimal mechanical noise pickup by onboard microphones, as well as portability, low power consumption and compactness. One example of a stereo studio recorder using memory cards is shown in Figure 9.12, and these are becoming the natural successor to DAT recorders. Files are typically stored in Broadcast WAVE format (see Chapter 10).

Recording audio on to mass storage media

Mass storage media need to offer at least a minimum level of performance capable of handling the data rates and capacities associated with digital audio, as described in Fact File 9.5. Most standard drives now have no problem meeting this requirement for numerous audio channels.

image

FIGURE 9.12
The Tascam HS-2 is a solid state memory card recorder, capable of stereo recording at up to 192 kHz, 24 bit resolution. (Courtesy of Tascam.)

FACT FILE 9.5 STORAGE REQUIREMENTS OF DIGITAL AUDIO

The table shows the data rates required to support a single channel of digital audio at various resolutions. Media to be used as primary storage would need to be able to sustain data transfer at a number of times these rates to be useful for multimedia workstations. The table also shows the number of megabytes of storage required per minute of audio, showing that the capacity needed for audio purposes is considerably greater than that required for text or simple graphics applications. Storage requirements increase pro rata with the number of audio channels to be handled.

Sampling rate (kHz)  Resolution (bits)  Bit rate (kbit/s)  Capacity (Mbytes/min)  Capacity (Mbytes/hour)
192                  24                 4608               33.0                   1980
96                   24                 2304               16.5                   989
88.2                 16                 1411               10.1                   605
48                   20                 960                6.9                    412
48                   16                 768                5.5                    330
44.1                 16                 706                5.0                    303
44.1                 8                  353                2.5                    151

Data rates and capacities for linear PCM
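The table’s arithmetic is easy to reproduce: the bit rate is simply the sampling rate multiplied by the word length per channel. A minimal Python sketch, taking 1 Mbyte as 2^20 bytes to match the figures above:

```python
def pcm_rates(fs_hz, bits):
    bit_rate = fs_hz * bits                    # bit/s for one channel
    mbytes_per_min = bit_rate / 8 * 60 / 2**20
    return bit_rate / 1000, mbytes_per_min

for fs, bits in [(192000, 24), (96000, 24), (48000, 16), (44100, 16)]:
    kbps, mb_min = pcm_rates(fs, bits)
    print(f"{fs / 1000:g} kHz / {bits} bit: "
          f"{kbps:.0f} kbit/s, {mb_min:.1f} Mbytes/min")
```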

The discontinuous ‘bursty’ nature of recording onto such media usually requires the use of a buffer RAM (Random Access Memory) during replay, which accepts this interrupted data stream and stores it for a short time before releasing it as a continuous stream. It performs the opposite function during recording, as shown in Figure 9.13. Several things cause a delay in the retrieval of information from disks: the time it takes for the head positioner to move across a disk, the time it takes for the required data in a particular track to come around to the pickup head, and the transfer of the data from the disk via the buffer RAM to the outside world, as shown in Figure 9.14. The total delay, or data access time, is usually several milliseconds, depending on the seek distance and rotational speed. The instantaneous rate at which the system can accept or give out data is called the transfer rate, and this varies with the storage device.
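The components of access time can be put into rough numbers. The figures below are invented for illustration (an 8 ms average seek on a 7200 rpm drive), but they show why a replay buffer must hold at least a few thousand bytes per channel simply to ride out each access:

```python
seek_ms = 8.0                          # assumed average head positioning time
rpm = 7200
rotational_ms = 0.5 * 60000 / rpm      # half a revolution on average: ~4.2 ms
transfer_ms = 0.5                      # assumed time to read the block itself
access_ms = seek_ms + rotational_ms + transfer_ms
print(f"average access time: {access_ms:.1f} ms")

# Meanwhile the replay buffer drains continuously: 48 kHz, 16 bit stereo
drain_bytes_per_s = 48000 * 2 * 2
print(f"buffer drained during one access: "
      f"{drain_bytes_per_s * access_ms / 1000:.0f} bytes")
```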

Sound is stored in named data files on the disk, the files consisting of a number of blocks of data stored either separately or together. A directory stored on the disk keeps track of where the blocks of each file are stored so that they can be retrieved in correct sequence. Each file normally corresponds to a single recording of a single channel of audio, although some stereo file formats exist (see Chapter 10).

image

FIGURE 9.13
RAM buffering is used to convert burst data flow to continuous data flow, and vice versa.

image

FIGURE 9.14
The delays involved in accessing a block of data stored on a disk.

Multiple channels are handled by accessing multiple files from the disk in a time-shared manner, with synchronization between the tracks being performed subsequently in RAM. The storage capacity of a disk can be divided between channels in whatever proportion is appropriate, and it is not necessary to pre-allocate storage space to particular audio channels. A feature of the disk system is that unused storage capacity is not necessarily ‘wasted’ as can be the case with a tape system. During recording of a multitrack tape there will often be sections on each track with no information recorded, but that space cannot be allocated elsewhere. On a disk these gaps do not occupy storage space and can be used for additional space on other channels at other times.

The number of audio channels that can be recorded or replayed simultaneously depends on the performance of the storage device, interface, drivers and host computer. Slow systems may only be capable of handling a few channels whereas faster systems with multiple disk drives may be capable of expansion up to a virtually unlimited number of channels. External disks are usually connected using high-speed serial interfaces such as Firewire (IEEE 1394) or Thunderbolt (see Fact File 9.6), and as desktop computers get faster and more capable there is no longer a strong need to have dedicated cards for connecting audio-only disk drives. These days one or more of the host computer’s internal or external disks is usually employed, although it is often recommended that this is not the same disk as used for system software in order to avoid conflicts of demand between system housekeeping and audio needs.

FACT FILE 9.6 PERIPHERAL INTERFACES

A variety of different physical interfaces can be used for interconnecting storage devices and host workstations. Some are internal buses only designed to operate over limited lengths of cable and some are external interfaces that can be connected over several meters. The interfaces can be broadly divided into serial and parallel types, the serial types being by far the most common now. The disk interface can be slower than the drive attached to it in some cases, making it into a bottleneck in some applications. There is no point having a super-fast disk drive if the interface cannot handle data at that rate.

SCSI

For many years the most commonly used interface for connecting mass storage media to host computers was SCSI (the Small Computer Systems Interface), pronounced ‘scuzzy’. It is still used quite widely for very high performance applications but other interfaces have taken over for desktop systems. SCSI is a high-speed parallel interface, originally allowing up to seven peripheral devices to be connected to a host on a single bus. SCSI grew through a number of improvements and revisions, resulting in various Ultra SCSI standards. Serial Attached SCSI (SAS) interfaces retain many of the features of SCSI but use a serial format, while iSCSI is designed to be used over Internet connections, usually based on Ethernet.

ATA/IDE

The ATA and IDE family of interfaces evolved through the years as the primary internal interface for connecting disk drives to PC system buses. It is cheap and ubiquitous, also with various ‘Ultra’ versions running at high speed. ATAPI (ATA Packet Interface) is a variant used for storage media such as CD drives. Serial ATA (SATA) is designed to enable disk drives to be interfaced serially, thereby reducing the physical complexity of the interface. It is intended primarily for internal connection of disks within host workstations, rather than as an external interface like USB or Firewire, although a version known as eSATA is suitable for external drives.

Firewire and USB

Firewire and USB are both serial interfaces for connecting external peripherals. They both enable disk drives to be connected in a very simple manner, with high transfer rates (many hundreds of megabits per second), although USB 1.0 devices are limited to 12 Mbit/s. A key feature of these interfaces is that they can be ‘hot plugged’ (in other words, devices can be connected and disconnected with the power on). The interfaces also supply basic power that enables some simple devices to be powered from the host device. Interconnection cables can usually be run up to between 5 and 10 meters, depending on the cable and the data rate.

Thunderbolt

Thunderbolt is a very high speed serial interface developed by Apple and Intel that combines PCIe (PCI Express) data and DisplayPort data over the same cable, using a meta-protocol, along with 10 watts of power to peripherals. It is capable of 10 Gbit/s transfer rates. Up to six peripherals can be daisy-chained.

Media formatting

The process of formatting a storage device erases all of the information in the volume. (It may not actually do this, but it rewrites the directory and volume map information to make it seem as if the disk is empty again.) Effectively the volume then becomes virgin territory again and data can be written anywhere.

When a disk is formatted at a low level the sector headers are written and the bad blocks mapped out. A map is kept of the locations of bad blocks so that they may be avoided in subsequent storage operations. Low-level formatting can take quite a long time as every block has to be addressed. During a high-level format the disk may be subdivided into a number of ‘partitions’. Each of these partitions can behave as an entirely independent ‘volume’ of information, as if it were a separate disk drive (see Figure 9.15). It may even be possible to format each partition in a different way, such that a different filing system may be used for each partition. Each volume then has a directory created, which is an area of storage set aside to contain information about the contents of the disk. The directory indicates the locations of the files, their sizes and various other vital statistics.

image

FIGURE 9.15 A disk may be divided up into a number of different partitions, each acting as an independent volume of information.

The most common general purpose filing systems in audio workstations are HFS (Hierarchical Filing System) or HFS Plus (for Mac OS), FAT 32 (File Allocation Table, for Windows PCs) and NTFS (for Windows NT and 2000). The Unix operating system is used on some multi-user systems and high-powered workstations and also has its own filing system. These were not designed principally with real-time requirements such as audio and video replay in mind but they have the advantage that disks formatted for a widely used filing system will be more easily interchangeable than those using proprietary systems. Further information about audio file formats and interchange is provided in the next chapter.

image

FIGURE 9.16 At (a) a file is stored in three contiguous blocks and these can be read sequentially without moving the head. At (b) the file is fragmented and is distributed over three remote blocks, involving movement of the head to read it. The latter read operation will take more time.

When an erasable volume like a hard disk has been used for some time there will be a lot of files on the disk, and probably a lot of small spaces where old files have been erased. New files must be stored in the available space and this may involve splitting them up over the remaining smaller areas. This is known as disk fragmentation, and it seriously affects the overall performance of the drive. The reason is clear to see from Figure 9.16. More head seeks are required to access the blocks of a file than if they had been stored contiguously, and this slows down the average transfer rate considerably. It may come to a point where the drive is unable to supply data fast enough for the purpose.
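Rough figures show the scale of the effect. In this sketch (all numbers invented) a file that reads in 100 ms when stored contiguously takes over two seconds once it is scattered into 200 fragments, each costing an extra seek:

```python
file_mb, rate_mb_per_s = 10, 100       # assumed file size and sustained rate
seek_ms, fragments = 10, 200           # assumed per-fragment seek penalty

contiguous_ms = file_mb / rate_mb_per_s * 1000
fragmented_ms = contiguous_ms + fragments * seek_ms
print(contiguous_ms, "ms contiguous vs", fragmented_ms, "ms fragmented")
print(f"effective rate: {file_mb / (fragmented_ms / 1000):.1f} Mbyte/s")
```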

There are only two solutions to this problem: one is to reformat the disk completely (which may be difficult, if one is in the middle of a project), the other is to optimize or consolidate the storage space. Various software utilities exist for this purpose, whose job is to consolidate all the little areas of free space into fewer larger areas. They do this by juggling the blocks of files between disk areas and temporary RAM — a process that often takes a number of hours. Power failure during such an optimization process can result in total corruption of the drive, because the job is not completed and files may be only half moved, so it is advisable to back up the drive before doing this. It has been known for some such utilities to make the files unusable by some audio editing packages, because the software may have relied on certain files being in specific physical places, so it is wise to check first with the manufacturer.

AUDIO PROCESSING FOR COMPUTER WORKSTATIONS

Introduction

Most audio processing now takes place within the workstation, usually relying either on the host computer’s processing power (using the CPU to perform signal processing operations) or on one or more DSP (digital signal processing) cards attached to an expansion bus. Professional systems usually use external A/D and D/A convertors, connected to a ‘core’ card attached to the computer’s expansion bus. This is because it is often difficult to obtain the highest technical performance from convertors mounted on internal sound cards, owing to the relatively ‘noisy’ electrical environment inside most computers. Furthermore, the number of channels required may not fit onto an internal card. As more and more audio work takes place entirely in the digital domain, though, the need for analog convertors decreases. Digital interfaces are also often provided on external ‘breakout boxes’, partly for convenience and partly because of physical size of the connectors. Compact connectors such as the optical connector used for the ADAT eight-channel interface or the two-channel SPDIF phono connector are accommodated on some cards, but multiple AES/EBU connectors cannot be.

It is also becoming increasingly common for substantial audio processing power to exist on integrated sound cards that contain digital interfaces and possibly A/D and D/A convertors. These cards are typically used for consumer or semi-professional applications on desktop computers, although many now have very impressive features and can be used for advanced operations. Such cards are now available in ‘full duplex’ configurations that enable audio to be received by the card from the outside world, processed and/or stored, then routed back to an external device. Full duplex operation usually allows recording and replay simultaneously.

Sound cards and DSP cards are commonly connected to the workstation using the PCI (Peripheral Component Interconnect) expansion bus. Older ISA (PC) buses or NuBus (Mac) slots did not have the same data throughput capabilities and performance was therefore somewhat limited. PCI or the more recent PCI Express (PCIe) bus can be extended to an external expansion chassis that enables a larger number of cards to be connected than allowed for within the host computer. Sufficient processing power can now be installed for the workstation to become the audio processing ‘heart’ of a larger studio system, as opposed to using an external mixing console and effects units. The higher the sampling frequency, the more DSP operations will be required per second, so it is worth bearing in mind that going up to, say, 96 kHz sampling frequency for a project will require double the processing power and twice the storage space compared with 48 kHz. The same is true of increasing the number of channels to which processing is applied.

The issue of latency is important in the choice of digital audio hardware and software, as discussed in Fact File 9.7.

DSP resources for audio processing

DSP cards can be added to widely used workstation packages such as Avid’s Pro Tools. These were termed ‘DSP Farms’ or ‘Mix Farms’ in the past, and are expansion cards that connect to the PCI bus of the workstation and take on much of the ‘number crunching’ work involved in effects processing and mixing. ‘Plug-in’ processing software is now a popular and cost-effective way of implementing effects processing within the workstation, and this is discussed further in Chapter 13. Such plug-ins rely either on DSP cards or on host-based processing (see below) to handle this load.

Avid’s various Pro Tools systems are a useful example of the way in which audio processing can be handled within the workstation. Depending on the ‘level’ of the system, audio processing is either handled ‘natively’, using the host CPU, or by PCI-connected DSP cards. Of the two possible card solutions, the company’s earlier TDM system, now being retired, processed audio using 24-bit fixed point devices, whereas the more recent HDX system uses 32-bit floating point processing. The latter offers the possibility for greater processing speed and capacity (affecting the number of simultaneous operations and audio channels) and lower latency. Floating point processing typically allows a greater internal dynamic range, because it represents numbers in the form of a mantissa and an exponent. (The mantissa is a basic numerical value and the exponent is a power of two by which it is multiplied.) The latest versions of Pro Tools, both native and HDX card-based, claim an impressive 64-bit floating point resolution for internal mix summing. 24-bit fixed point processing, however, does not necessarily mean that the numerical operations are limited to 24-bit resolution. In fact a ‘double precision’ mode can be used in mixing and plug-ins, allowing 48-bit resolution by splitting the samples into two chunks and operating on them separately.
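The practical difference between the two number formats can be seen in a small sketch. A fixed-point accumulator must clip when a sum exceeds full scale, whereas a floating point value carries the overflow in its exponent and can be scaled back down later without damage (figures illustrative only):

```python
import math

FULL_SCALE = 2**23 - 1            # largest positive 24 bit sample value

def fixed_add(a, b):
    # a fixed-point mixer must hard-clip on overflow
    return max(-2**23, min(FULL_SCALE, a + b))

loud = int(0.9 * FULL_SCALE)
print(fixed_add(loud, loud))      # stuck at 8388607: audible distortion

total = 0.9 + 0.9                 # the same sum as floating point fractions
print(total)                      # 1.8: over full scale but recoverable by
                                  # a 0.5 gain before the output convertor

m, e = math.frexp(total)          # mantissa/exponent view of the float
print(m, e)                       # 0.45 x 2**2: value = mantissa * 2**exponent
```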

FACT FILE 9.7 AUDIO PROCESSING LATENCY

Latency is the delay incurred in executing audio operations between input and output of a system. The lower the better is the rule, particularly when operating systems in ‘full duplex’ mode, because processed sound may be routed back to musicians (for foldback purposes) or may be combined with undelayed sound at some point. The management of latency is a software issue and some systems have sophisticated approaches to ensuring that all supposedly synchronous audio reaches the output at the same time no matter what processing it has encountered on the way.

Minimum latency achievable is both a hardware and a software issue. The poorest systems can give rise to tens or even hundreds of milliseconds between input and output whereas the best reduce this to a few milliseconds. Audio I/O that connects directly to an audio processing card can help to reduce latency, otherwise the communication required between host and various cards can add to the delay. Some real-time audio processing software also implements special routines to minimize and manage critical delays and this is often what distinguishes professional systems from cheaper ones. The audio driver software or ‘middleware’ that communicates between applications and sound cards influences latency considerably. One example of such middleware intended for low latency audio signal routing in computers is Steinberg’s ASIO (Audio Stream Input Output).
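Buffer size is usually the dominant term, and the arithmetic is straightforward: each buffer of N samples adds N/fs seconds of delay, and a full-duplex path passes through at least an input and an output buffer. A small Python sketch with typical settings:

```python
def buffer_latency_ms(samples, fs):
    return 1000 * samples / fs

fs = 48000
for n in (64, 256, 1024):
    one_way = buffer_latency_ms(n, fs)
    print(f"{n:5d} samples at {fs} Hz: {one_way:5.2f} ms per buffer, "
          f"~{2 * one_way:5.2f} ms round trip before convertor delays")
```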

An alternative to using dedicated DSP cards is to use the internal processing capacity of a typical desktop computer. The success of such ‘host-based processing’ depends on the number of tasks that the workstation is required to undertake, and this capacity may vary with time and context. Typical modern CPUs, though, can easily run DSP plug-ins implementing equalization, mixing and a wide range of effects for a considerable number of tracks. The ‘multi-core’ (e.g. quad-core) processor architectures of modern computers enable the division of processing power between applications, and in some cases one can allocate a specific number of processor cores to an audio application, leaving, say, one or two for system tasks and other applications. This ensures the greatest degree of hardware independence between processing tasks, and avoids conflicts of demand at times of peak processor load.

The software architecture required to run plug-ins on the host CPU may be different to that used on dedicated DSP cards, so it may be necessary to obtain the correct plug-in for the environment in question. A number of applications, however, enable the integration of host-based (or ‘native’) plug-ins and dedicated DSP cards. With Pro Tools, for example, the same floating point audio processing algorithms are used for AAX plug-ins in both native and HDX forms, so they should sound the same. This is not necessarily true for TDM compared with RTAS plug-ins (see Chapter 13). Audio processing that runs on the host may be subject to greater latency (input to output delay) than when using dedicated signal processing, and it obviously takes up processing power that could be used for running the user interface or other software. It is nonetheless a cost-effective option for many users.

Audio processing architectures

Apple’s Core Audio is an example of that manufacturer’s internal tools and architecture for handling audio in computers using the Mac OS X operating system. Core Audio provides plug-in facilities for audio signal processing and synthesis, as well as audio-to-MIDI synchronization. Its audio plug-ins are called Audio Units (AUs). A number of standard AUs are provided with the OS X operating system, offering a range of audio processing options to other Core Audio compatible software that runs on the platform. Audio workstation packages such as Logic, for example, work closely with Core Audio to implement aspects of their functionality, including plug-ins. Core Audio normally expects to work with audio represented as 32-bit floating point linear PCM, but there are means to translate between this and other PCM formats, as well as to coded formats such as MP3, AAC or Apple Lossless Audio Coding (ALAC). It supports the main audio interchange file formats described in Chapter 10, such as AIFF and WAVE.

Core Audio functions are written in C code (a widely used software authoring language) and can be ‘called’ from other compatible applications using Application Programming Interfaces (APIs) designed for the task. It also uses an internal representation of audio hardware known as a Hardware Abstraction Layer (HAL), which can be used to simplify the interface between Core Audio elements and physical audio devices.

Integrated sound cards

Integrated sound cards typically contain all the components necessary to handle audio for gaming and entertainment purposes within a desktop computer and may be able to operate in full duplex mode (in and out at the same time). They typically incorporate convertors, DSP, a digital interface, and one or more sound synthesis engines. Additionally they may offer 3D audio processing and low bit rate audio decoding, such as Dolby Digital and DTS. They may also include an I/O daughter board that can be connected to a break-out audio interface, increasing the number of possible connectors and the options for external analog conversion. Such cards also tend to sport MIDI/joystick interfaces. A typical example of this type of card is the ‘SoundBlaster’ series from Creative Labs. Any analog audio connections are normally unbalanced and the convertor quality may be lower than that of the best external devices. For professional purposes it is advisable to use high-quality external convertors and balanced analog audio connections.

Mixing ‘in the box’

There has been a lot of debate about the relative merits of audio mixing ‘in the box’ versus using a conventional external mixer. ‘In the box’ is the familiar term used to describe mixing carried out using the DSP facilities of the workstation package, whereas ‘out of the box’ usually means feeding all the workstation’s separate audio outputs to a (possibly analog) mixer. The audio quality of in-the-box signal processing has sometimes been questioned, which may in some cases have been related to the effects of limited-resolution numerical processing or improper dithering in poor systems. With good recent software, though, which uses high-precision internal maths and correct dithering, this is unlikely to be a factor.

Some engineers also simply prefer the sound of analog processing, liking the unique phase and amplitude characteristics of particular equalizers and compressors. Although there is no arguing with the preference for specific external effects, carefully controlled listening tests comparing good in-the-box mixes with external mixes of the same material (avoiding EQ and compression) have not shown reliably detectable differences.

The ergonomic advantages of external mixing can be applied to in-the-box mixing by using a remote control surface designed to interface to the workstation in question, having dedicated faders and knobs, as discussed towards the end of Chapter 5. The argument for in-the-box mixing is taken even further by the availability of digital plug-ins that attempt to emulate the performance of classic analog processors by sampling their impulse responses or otherwise modeling their characteristics. These can provide very convincing plug-in alternatives to the analog hardware they model, and may be badged or marketed by the makers of the original hardware in recognition of the widespread prevalence of entirely in-the-box mixing.

MASS STORAGE-BASED EDITING SYSTEM PRINCIPLES

Introduction

The random access nature of mass storage media led to the coining of the term non-linear editing for the process of audio editing. With non-linear editing the editor may preview a number of possible masters in their entirety before deciding which should be the final one. Even after this, it is a simple matter to modify the edit list to update the master. Edits may also be previewed and experimented with in order to determine the most appropriate location and processing. Crossfades may be modified and adjustments made to equalization and levels, all in the digital domain. Non-linear editing has also come to feature very widely in post-production for video and film.

Non-linear editing is truly non-destructive in that the edited master only exists as a series of instructions to replay certain parts of sound files at certain times, with specified signal processing overlaid, as shown in Figure 9.17. The original sound files remain intact at all times, and a single sound file can be used as many times as desired in different locations and on different tracks without the need for copying the audio data. Editing may involve the simple joining of sections, or it may involve more complex operations such as long crossfades between one album track and the next, or gain offsets between one section and another. All these things are possible without affecting the original source material.

image

FIGURE 9.17
Instructions from an edit decision list (EDL) are used to control the replay of sound file segments from disk, which may be subjected to further processing (also under EDL control) before arriving at the audio outputs.

The modern workstation production technique of compiling or ‘comping’ vocal tracks also relies on this characteristic of random-access storage. Multiple takes of a lead vocal line, for example, can be stored and a final version comped from short sections of different takes arranged in a suitable order with appropriate crossfades or joins.

Sound files and sound segments

In the case of music editing, sound files might be session takes, anything from a few bars to a whole movement, while in picture dubbing they might contain a phrase of dialog or a sound effect. In multitrack production, each separately recorded chunk of an individual track is likely to be stored in a separate sound file. Usually such files are mono, but they can be stereo and occasionally multichannel, as discussed in Chapter 10. If the channels of a recording are to be processed or moved around separately then they should be stored in mono. Specific segments of these sound files can be defined while editing, in order to get rid of unwanted material or to select useful extracts. The terminology varies, but such identified parts of sound files are usually termed either ‘clips’ or ‘segments’. Rather than creating a copy of the segment or clip and storing it as a separate sound file, it is normal simply to store it as a ‘soft’ entity: that is, as commands in an edit list or project file that identify the start and end addresses of the segment concerned and the sound file to which it relates. A segment may be given a name by the operator and subsequently used as if it were a sound file in its own right. An almost unlimited number of these segments can be created from original sound files, without the need for any additional audio storage space. Figure 9.18 shows an example of Logic’s audio ‘bin’ containing both entire wav files, and segments thereof that can be used independently and dragged to appropriate places and tracks in the editing time line.

image

FIGURE 9.18 Example of Logic’s audio bin containing both entire audio files and edited segments thereof.
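A minimal sketch of the ‘soft’ segment idea follows, in Python; the field names and file name are purely illustrative and do not correspond to any particular editing system’s project format.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """A 'soft' clip: a named reference into a stored sound file.

    No audio is copied; start and end are sample addresses within the
    file, so any number of segments can share the same recording.
    """
    name: str
    sound_file: str   # e.g. "take_17.wav" (hypothetical file name)
    start: int        # first sample of the segment within the file
    end: int          # one past the last sample of the segment

# Two segments backed by the same file, with no audio data duplicated.
chorus = Segment("chorus 2 vocal", "take_17.wav", 441_000, 1_323_000)
outro = Segment("outro vocal", "take_17.wav", 2_646_000, 3_087_000)
```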

Edit point handling

Edit points can be simple butt joins or crossfades. A butt join is very simple because it involves straightforward switching from the replay of one sound segment to another. Since replay involves temporary storage of the sound file blocks in RAM (see above) it is a relatively simple matter to ensure that both outgoing and incoming files in the region of the edit are available in RAM simultaneously (in different address areas). Up until the edit, blocks of the outgoing file are read from the disk into RAM and thence to the audio outputs. As the edit point is reached a switch occurs between outgoing and incoming material by instituting a jump in the memory read address, corresponding to the start of the incoming material. Replay then continues by reading subsequent blocks from the incoming sound file. It is normally possible to position edits to single-sample accuracy, giving a timing resolution of a few tens of microseconds if required (one sample period at 44.1 kHz is about 23 microseconds).
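Functionally, the memory-address jump amounts to nothing more than concatenating two sample streams at their respective edit points. A sketch in Python, with NumPy arrays standing in for the buffered audio blocks:

```python
import numpy as np

def butt_join(outgoing: np.ndarray, incoming: np.ndarray,
              out_edit: int, in_edit: int) -> np.ndarray:
    """Sample-accurate butt edit: replay `outgoing` up to its edit
    point, then jump straight to the incoming material's edit point."""
    return np.concatenate([outgoing[:out_edit], incoming[in_edit:]])
```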

The problem with butt joins is that they are quite unsubtle. The discontinuity in the waveform at the join may produce audible clicks and bumps, as shown in Figure 9.19. It is normal, therefore, to use at least a short crossfade at edit points to hide the effect of the join. This is what happens when analog tape is spliced, because the traditional angled cut has the same effect as a short crossfade (of between 5 and 20 ms depending on the tape speed and angle of cut). Most workstations have considerable flexibility with crossfades and are not limited to short durations. It is now common to use crossfades of many shapes and durations (e.g. linear, root cosine, equal power) for different creative purposes. This, coupled with the ability to preview edits and fine-tune their locations, has made it possible to put edits in places previously considered impossible.

image

FIGURE 9.19 (a) A bad butt edit results in a waveform discontinuity. (b) Butt edits can be made to work if there is minimal discontinuity.

The locations of edit points are kept in an edit decision list (EDL) which contains information about the segments and files to be replayed at each time, the in and the out points of each section and details of the crossfade time and shape at each edit point. It may also contain additional information such as signal processing operations to be performed (gain changes, EQ, etc.). EDL interchange formats for moving projects between systems are discussed in Chapter 10.
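The exact contents of an EDL are system specific, but a hypothetical entry might carry fields along these lines (the names are illustrative, not drawn from any real interchange format):

```python
from dataclasses import dataclass

@dataclass
class EDLEntry:
    """One replay event in an edit decision list."""
    timeline_start: int          # destination position, in samples
    segment_name: str            # which segment or clip to replay
    source_in: int               # in point within the source file
    source_out: int              # out point within the source file
    fade_ms: float = 10.0        # crossfade duration into this event
    fade_shape: str = "equal_power"
    gain_db: float = 0.0         # gain offset applied to this event
```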

Crossfading

Crossfading is similar to butt joining, except that it requires access to data from both incoming and outgoing files for the duration of the crossfade. The crossfade calculation involves simple signal processing, during which the values of outgoing samples are multiplied by gradually decreasing coefficients whilst the values of incoming samples are multiplied by gradually increasing coefficients. Time coincident samples of the two files are then added together to produce output samples, as described in the previous chapter. The duration and shape of the crossfade can be adjusted by altering the coefficients involved and the rate at which the process is executed.

Crossfades are either performed in real time, as the edit point passes, or pre-calculated and written to disk as a file. Real-time crossfades can be varied at any time and are simply stored as commands in the EDL, indicating the nature of the fade to be executed. The process is similar to that for the butt edit, except that as the edit point approaches samples from both incoming and outgoing segments are loaded into RAM in order that there is an overlap in time. During the crossfade samples from both incoming and outgoing segments are loaded into their respective areas of RAM, then routed to the crossfade processor, as shown in Figure 9.20. The resulting samples are then available for routing to the output. Alternatively the crossfade can be calculated in non-real time. This incurs a short delay while the system works out the sums, after which a new sound file is stored which contains only the crossfade. Replay of the edit then involves playing the outgoing segment up to the beginning of the crossfade, then the crossfade file, then the incoming segment from after the crossfade, as shown in Figure 9.21. Load on the disk drive is no higher than normal in this case, except that the crossfade file has to be retrieved.

The shape of the crossfade can usually be changed to suit different operational purposes. Standard linear fades (those where the gain changes uniformly with time) are not always the most suitable for music editing, especially when the crossfade is longer than about ten milliseconds. There may be a momentary drop in level at the center of the crossfade, owing to the way in which the sound levels from the two files add together. If there is a random phase difference between the signals, as there often is in music, the rise in level resulting from adding the two signals will normally be around 3 dB; but the linear crossfade is 6 dB down at its center, resulting in an overall level drop of around 3 dB (see Figure 9.22). Exponential crossfades and other such shapes may be more suitable for these purposes, because they have a smaller level drop in the center. It may even be possible to design customized crossfade laws. It is often possible to alter the offset of the start and end of the fade from the actual edit point and to have a faster fade-up than fade-down.
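The arithmetic is straightforward, as the sketch below shows. It is a simplified illustration, assuming floating point sample arrays of equal length; a real system would stream the coefficients against buffered blocks. The ‘equal power’ law keeps the squared gains summing to one, which holds the level roughly constant for uncorrelated material, whereas the linear law dips about 3 dB at the center, as described above.

```python
import numpy as np

def crossfade(outgoing: np.ndarray, incoming: np.ndarray,
              law: str = "equal_power") -> np.ndarray:
    """Crossfade two equal-length blocks of samples.

    'linear':      gains sum to 1.0 (constant voltage), so uncorrelated
                   material dips about 3 dB in power at the center.
    'equal_power': g_out**2 + g_in**2 = 1.0, holding the summed power
                   roughly constant across the edit.
    """
    t = np.linspace(0.0, 1.0, len(outgoing))
    if law == "linear":
        g_in, g_out = t, 1.0 - t
    else:
        g_in = np.sin(0.5 * np.pi * t)   # rises 0 -> 1
        g_out = np.cos(0.5 * np.pi * t)  # falls 1 -> 0
    return outgoing * g_out + incoming * g_in
```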

image

FIGURE 9.20 Conceptual diagram of the sequence of operations which occur during a crossfade. X and Y are the incoming and outgoing sound segments.

image

FIGURE 9.21 Replay of a precalculated crossfade file at an edit point between files X and Y.

image

FIGURE 9.22 Summation of levels at a crossfade. (a) A linear crossfade can result in a level drop if the incoming and outgoing material are non-coherent. (b) An exponential fade, or other similar laws, can help to make the level more constant across the edit.

image

FIGURE 9.23 The system may allow the user to program a gain profile around an edit point, defining the starting gain (A), the fade-down time (B), the fade-up time (D), the point below unity at which the two files cross over (C) and the final gain (E).

Many systems also allow automated gain changes to be introduced as well as fades, so that level differences across edit points may be corrected. Figure 9.23 shows a crossfade profile which has a higher level after the edit point than before it, and different slopes for the in- and out-fades. A lot of the difficulties that editors encounter in making edits work can be solved using a combination of these facilities.

Editing modes

During the editing process the operator will load appropriate sound files and audition them, both on their own and in a sequence with other files. The exact method of assembling the edited sequence depends very much on the user interface, but it is common to present the user with a visual analogy of moving tape, allowing files to be ‘cut and spliced’ or ‘copied and pasted’ into appropriate locations along the virtual tape. These files, or edited clips of them, are then played out at the time locations corresponding to their positions on this ‘virtual tape’. It is also quite common to display a representation of the audio waveform that allows the editor to see as well as hear the signal around the edit point (see Figure 9.24).

In non-linear systems the tape-based approach is often simulated, allowing the user to locate an edit point roughly while playing the virtual tape, followed by a fine trim using simulated reel-rocking or a detailed view of the waveform. Some software presents source and destination streams as well, in further simulation of the tape approach. It is also possible to insert or change sections in the middle of a finished master, provided that the EDL and source files are still available. To take an example, assume that an edited opera has been completed and that the producer now wishes to change a take somewhere in the middle (see Figure 9.25). The replacement take is unlikely to be exactly the same length, but it is possible simply to shuffle all of the following material along or back slightly to accommodate it, this being only a matter of changing the EDL rather than modifying the stored music in any way. The files are then simply played out at slightly different times than in the first version of the edit.

image

FIGURE 9.24 Example from SADiE editing system showing the ‘trim editor’ in which is displayed a detailed view of the audio waveform around the edit point, together with information about the crossfade.

image

FIGURE 9.25 Replacing a take in the middle of an edited programme. (a) Tape based copy editing results in a gap of fixed size, which may not match the new take length. (b) Non-linear editing allows the gap size to be adjusted to match the new take.

It is also normal to allow edited segments to be fixed in time if desired, so that they are not shuffled forwards or backwards when other segments are inserted. This ‘anchoring’ of segments is often used in picture dubbing when certain sound effects and dialog have to remain locked to the picture.

Simulation of ‘reel-rocking’

It is common to simulate the effect of analog tape ‘reel-rocking’ in non-linear editors, providing the user with the sonic impression that reels of analog tape are being ‘rocked’ back and forth. The simulation of variable-speed replay in both directions is usually controlled by a wheel or sideways movement of a mouse, which moves the ‘tape’ in either direction around the current play location. The magnitude and direction of this movement is used to control the rate at which samples are read from the disk file, via the buffer. Good simulation requires very fast, responsive action and an ergonomically suitable control; a mouse is rather unsuitable for the purpose. It also requires a certain amount of DSP to filter the signal correctly, in order to avoid the aliasing that can be caused by varying the sampling rate. Systems differ greatly in the sound quality achieved in this mode, and many current operators, not brought up with tape, prefer to judge edit points ‘on the fly’, trimming or nudging them either way if they are not successful the first time.
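A very rough sketch of the read side of such a scrub function follows, assuming the audio is already buffered in memory as a floating point array. Linear interpolation stands in here for the proper anti-aliasing filtering that a good system would apply.

```python
import numpy as np

def scrub_block(audio: np.ndarray, position: float, rate: float,
                block: int = 256) -> tuple[np.ndarray, float]:
    """Read `block` output samples at a variable rate and direction.

    `rate` follows the wheel or mouse movement: 1.0 is normal speed,
    0.1 a slow crawl, negative values play backwards.
    Returns the output block and the updated play position.
    """
    idx = np.clip(position + rate * np.arange(block), 0, len(audio) - 2)
    base = idx.astype(int)
    frac = idx - base
    out = (1.0 - frac) * audio[base] + frac * audio[base + 1]
    return out, float(idx[-1] + rate)
```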

EDITING SOFTWARE

It is increasingly common for MIDI (see Chapter 14) and digital audio editing to be integrated within one software package, particularly for pop music recording and other multitrack productions where control of electronic sound sources is integrated with recorded natural sounds. Such applications used to be called sequencers, but this is less common now that MIDI sequencing is only one of many tasks that are possible. Although most sequencers contain some form of audio editing these days, there are some software applications more specifically targeted at high-quality audio editing and production. These have tended to come from a professional audio background rather than a MIDI sequencing background, although the two fields have now met in the middle and it is increasingly hard to distinguish a MIDI sequencer with added audio features from an audio editor with added MIDI features.

Audio applications such as those described here are used in contexts where MIDI is not particularly important and where fine control over editing crossfades, dithering, mixing, mastering and post-production functions is required. Here the editor needs tools for such things as: previewing and trimming edits, such as might be necessary in classical music post-production; PQ editing of CD masters; preparing surround sound material for Blu-Ray Disc encoding; MP3 or AAC encoding of audio material; post-production of sound tracks for pictures. The following example, based on the SADiE audio editing system from Prism Sound, demonstrates some of the practical concepts.

SADiE workstations run on the PC platform and most utilize an external audio interface. There are studio versions mounted in rack units as well as location recording systems running on a laptop. There is also a ‘native’ version that runs using the internal processing and storage of the host computer, and can be used with either proprietary or standard computer audio hardware. Both PCM and DSD signal processing options are available and the system makes provision for various mastering and delivery formats, as well as lossless and lossy audio encoding. A typical user interface for SADiE is shown in Figure 9.26. It is possible to see transport controls, the mixer interface and the playlist display. The main part of the screen is occupied by a horizontal display of recording tracks or ‘streams’, and these are analogous to the tracks of a multitrack tape recorder. A record icon associated with each stream is used to arm it ready for recording. As recording proceeds, the empty streams are filled from left to right across the screen in real time, led by a vertical moving cursor. These streams can be displayed either as solid continuous blocks or as waveforms, the latter being the usual mode when editing is undertaken. After recording, extra streams can be recorded if required simply by disarming the record icons of the streams already used and arming the record icons of empty streams below them, making it possible to build up a large number of ‘virtual’ tracks as required. The maximum number that can be replayed simultaneously depends upon the memory and DSP capacity of the system used. A basic two-input/four-output system might allow up to eight streams to be replayed (depending on the amount of DSP being used for other tasks), and a fully equipped system can allow at least 32 simultaneous streams of program material to be recorded and replayed.

image

FIGURE 9.26 SADiE editor displays, showing mixer, playlist, transport controls and project elements.

Replay involves either using the transport control display or clicking the mouse at a desired position on a time-bar towards the top of the screen, thereby positioning the moving cursor (which is analogous to a tape head) where one wishes replay to begin. Editing is performed by means of a razor-blade icon, which will make the cut where the moving cursor is positioned. Alternatively, an edit icon can be loaded to the mouse’s cursor for positioning anywhere on any individual stream to make a cut.

Audio can be arranged in the playlist by the normal processes of placing, dragging, copying and pasting, and there is a range of options for slipping material left or right in the list to accommodate new material (this ensures that all previous edits remain attached in the right way when the list is slipped backwards or forwards in time). Audio to be edited in detail can be viewed in the trim window (shown earlier in Figure 9.24) which shows a detailed waveform display, allowing edits to be previewed either to or from the edit point, or across the edit, using the play controls in the top right-hand corner (this is particularly useful for music editing). The crossfade region is clearly visible, with different colors and shadings used to indicate the ‘live’ audio streams before and after the edit. There are many stages of undo and redo so that nothing need be permanent at this stage. When a satisfactory edit is achieved, it can be written back to the main edit list where it will be incorporated. Scrub and jog actions for locating edit points are also possible. A useful ‘lock to time’ icon is provided which can be activated to prevent horizontal movement of the streams so that they cannot be accidentally moved out of sync with each other during editing.

The mixer section can be thought of in conventional terms, and indeed some systems offer physical mixing controllers with moving fader automation for those who prefer them. As well as mouse control of such things as fader, pan, solo and mute, processing such as EQ, filters, aux send and compression can be selected from an effects ‘rack’, and each can be dragged across and dropped in above a fader where it will become incorporated into that channel. Third party ‘plug-in’ software is also available for many systems to enhance the signal processing features, including CEDAR audio restoration software, as described below. The latest software allows for the use of DirectX plug-ins for audio processing. Automation of faders and other processing is also possible.

MASTERING AND RESTORATION

Specialized software

Some software applications are designed specifically for the mastering and restoration markets. These products are designed either to enable ‘fine tuning’ of master recordings prior to commercial release, involving subtle compression, equalization and gain adjustment (mastering), or to enable the ‘cleaning up’ of old recordings that have hiss, crackle and clicks (restoration).

CEDAR applications or plug-ins are good examples of the restoration group. The company has also introduced advanced visualization tools (known as Retouch) that enable restoration engineers to ‘touch up’ audio material using an interface not dissimilar to that used for photo editing on computers, as shown in Figure 9.27. Audio anomalies (unwanted content) can be seen in the time and frequency domains, highlighted and interpolated based on information either side of the anomaly. CEDAR’s restoration algorithms are typically divided into ‘decrackle’, ‘declick’, ‘dethump’ and ‘denoise’, depending on the nature of the anomaly to be corrected. Some typical user interfaces for controlling these processes are shown in Figure 9.28.

image

FIGURE 9.27 CEDAR Retouch display for SADiE, showing frequency (vertical) against time (horizontal) and amplitude (color/density). Problem areas of the spectrographic display can be highlighted and a new signal synthesized using information from the surrounding region. (a) Harmonics of an interfering signal can be clearly seen. (b) A short-term spike crosses most of the frequency range.

Mastering software usually incorporates advanced dynamics control. An example is the TC Works Master X series, based on TC’s Finalizer products, a user interface of which is pictured in Figure 9.29. Here compressor curves and frequency dependency of dynamics can be adjusted and metered. The display also allows the user to view the number of samples at peak level, to watch for digital overloads that might be problematic.

image

FIGURE 9.28
CEDAR restoration plugins for SADiE, showing (a) Declick and (b) Denoise processes.

Level control and loudness in mastering

Level control, it might be argued, is less crucial than it used to be in the days when a recording engineer struggled to optimize a recording’s dynamic range between the noise floor and the distortion ceiling (see Figure 9.30). However, there are still artistic and technical considerations.

The dynamic range of a typical digital audio system can now be well over 100 dB and there is room for the operator to allow a reasonable degree of ‘headroom’ between the peak audio signal level and the maximum allowable level. Meters are provided to enable the signal level to be observed, and they are usually calibrated in dB, with zero at the top and negative dBs below this. The full dynamic range is not always shown, and there may be a peak bar that can hold the maximum level permanently or temporarily. As explained in Chapter 8, 0dBFS (full scale) is the point at which all of the bits available to represent the signal have been used. Above this level the signal clips and the effect of this is quite objectionable, except on very short transients where it may not be noticed. It follows that signals should never be allowed to clip (see Figure 9.30).

There is a tendency in modern audio production to want to master everything so that it sounds as loud as possible, and to ensure that the signal peaks as close to 0dBFS as possible. This level maximizing or normalizing process can be done automatically in most packages, the software searching the audio track for its highest level sample and then adjusting the overall gain so that this just reaches 0dBFS. In this way the recording can be made to use all the bits available, which can be useful if it is to be released on a relatively low-resolution consumer medium where noise might be more of a problem. (It is important to make sure that correct redithering is used when altering the level and requantizing, as explained in Chapter 8.) This does not, of course, take into account any production decisions that might be involved in adjusting the overall levels of individual tracks on an album or other compilation, where relative levels should be adjusted according to the nature of the individual items, their loudness and the producer’s intent.
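As a minimal illustration of peak normalization, the Python sketch below scales floating point samples so that the highest-magnitude sample reaches a chosen target level. The default target leaves a small margin below full scale; as noted above, any such gain change should be followed by correct redithering before requantizing to the release word length.

```python
import numpy as np

def normalize_peak(audio: np.ndarray, target_dbfs: float = -1.0) -> np.ndarray:
    """Scale so the largest sample magnitude reaches `target_dbfs`.

    `audio` is float PCM in the range -1.0..1.0. Redithering is still
    required afterwards when reducing to a fixed output word length.
    """
    peak = np.max(np.abs(audio))
    if peak == 0.0:
        return audio            # silence: nothing to normalize
    target = 10.0 ** (target_dbfs / 20.0)
    return audio * (target / peak)
```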

image

FIGURE 9.29 TC Works MasterX mastering dynamics plug-in interface.

image

FIGURE 9.30 Comparison of analogue and digital dynamic range. (a) Analogue tape has increasing distortion as the recording level increases, with an effective maximum output level at 3% third harmonic distortion. (b) Modern high resolution digital systems have wider dynamic range with a noise floor fixed by dither noise and a maximum recording level at which clipping occurs. The linearity of digital systems does not normally become poorer as signal level increases, until 0 dBFS is reached. This makes level control a somewhat less important issue at the initial recording stage, provided sufficient headroom is allowed for peaks.

However, even if the signal is maximized in the automatic fashion, so that the highest sample value just does not clip, subsequent stages in the signal chain may still do so, such as low bit rate codecs and oversampling convertors. Some equipment, for example, is designed in such a way that the maximum digital signal level is aligned to coincide with the clipping voltage of the analog electronics in a D/A convertor. In fact, owing to the response of the reconstruction filter in an (oversampling) D/A convertor (which reconstructs an analog waveform from the PAM pulse train), intersample signal peaks can be created that slightly exceed the analog level corresponding to 0 dBFS, thereby clipping the analog side of the convertor. For this reason it is recommended that digital-side signals are maximized so that they peak a few dB below 0 dBFS, in order to avoid the possibility of clipping. Some mastering software, such as the Mastered for iTunes suite discussed below, provides detailed analysis of the signal showing exactly how many samples occur in sequence at peak level, which can be a useful warning of potential or previous clipping.
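Intersample peaks can be estimated by oversampling the signal and measuring the reconstructed maximum, which is essentially what true-peak meters (and tools such as afclip, described later) do. The sketch below is illustrative only, using four-times oversampling and a simple check for runs of ‘pinned’ full-scale samples:

```python
import numpy as np
from scipy.signal import resample_poly

def true_peak_dbfs(audio: np.ndarray, oversample: int = 4) -> float:
    """Estimate the true (inter-sample) peak via oversampling."""
    up = resample_poly(audio, oversample, 1)   # 4x oversampled waveform
    return 20.0 * np.log10(np.max(np.abs(up)))

def longest_pinned_run(audio: np.ndarray) -> int:
    """Longest run of consecutive samples at full scale: a run of
    'pinned' samples suggests on-sample clipping has already occurred."""
    longest = count = 0
    for pinned in (np.abs(audio) >= 1.0):
        count = count + 1 if pinned else 0
        longest = max(longest, count)
    return longest
```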

The recent tendency to master recordings as loud as possible has been challenged by the introduction of loudness normalization, which aims to ensure that program items have a more comparable loudness to each other. It is increasingly applied to audio material, particularly in the broadcast domain, and loudness meters are available that show weighted averages and peaks over a period of time. The ITU and other organizations now publish detailed standards, such as ITU-R BS.1770 and EBU R128, which specify algorithms to measure program loudness, and their use in broadcast delivery is now becoming compulsory in many regions of the world. Dolby’s Dialnorm (dialog normalization) is an example of something similar used in movie sound tracks. Loudness normalization is also an optional feature of some music players such as iTunes (Sound Check), and loudness metadata describing the average loudness level of tracks in LUFS (loudness units relative to full scale) is increasingly added to recordings either at the mastering stage or by player software. This has the effect of causing material that was mastered at very high levels to be automatically reduced during replay, if the feature is turned on in players. Some online music delivery services do this automatically. The results of replay normalization can also expose the unpleasant sound quality artefacts that may arise from over-compressed and potentially clipped masters. A return to producing masters with more headroom and dynamic range is therefore increasingly recommended by professionals in the field.
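Computing the replay gain implied by loudness normalization is simple once the integrated loudness has been measured. The sketch below uses the third-party pyloudnorm package (a BS.1770-style meter for Python) together with soundfile for file reading; the file name and the -16 LUFS target are purely illustrative, since targets differ between services and broadcasters.

```python
import soundfile as sf
import pyloudnorm as pyln   # third-party BS.1770-style loudness meter

TARGET_LUFS = -16.0                         # illustrative target level

data, rate = sf.read("master.wav")          # hypothetical file name
meter = pyln.Meter(rate)                    # K-weighted integrated meter
loudness = meter.integrated_loudness(data)  # measured loudness in LUFS

# A loud master (say -9 LUFS) would be turned down by 7 dB on replay.
gain_db = TARGET_LUFS - loudness
print(f"measured {loudness:.1f} LUFS, replay gain {gain_db:+.1f} dB")
```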

PREPARING FOR AND UNDERSTANDING RELEASE MEDIA

Consumer release formats such as CD, DVD and MP3 usually require some form of mastering and pre-release preparation. This can range from subtle tweaks to the sound quality and relative levels on tracks to PQ encoding, data encoding and the addition of graphics, video and text.

Physical media

PQ encoding for CD mastering can often be done in some of the application packages designed for audio editing, such as SADiE and Pyramix. In this case it may involve little more than marking the starts and ends of the tracks in the playlist and allowing the software to work out the relevant frame advances and Red Book requirements for the assembly of the PQ code, which will either be written to a CD-R or included in a DDP (Disc Description Protocol) file for sending to the pressing plant. The CD comes at only one resolution and sampling frequency (16 bit, 44.1 kHz), making release preparation a relatively straightforward matter.
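PQ timing on a CD is expressed in minutes, seconds and frames, with 75 frames per second, so each frame corresponds to 588 samples at 44.1 kHz. A small sketch of the address arithmetic (illustrative only; a real PQ editor also handles pregaps, offsets and other Red Book details):

```python
CD_RATE = 44_100                 # Red Book sampling frequency
FRAMES_PER_SECOND = 75           # CD frames per second
SAMPLES_PER_FRAME = CD_RATE // FRAMES_PER_SECOND   # 588 samples

def sample_to_msf(sample: int) -> str:
    """Convert a sample address to minutes:seconds:frames, the timing
    notation used in a PQ/DDP track listing."""
    frames = sample // SAMPLES_PER_FRAME
    minutes, rem = divmod(frames, 60 * FRAMES_PER_SECOND)
    seconds, frame = divmod(rem, FRAMES_PER_SECOND)
    return f"{minutes:02d}:{seconds:02d}:{frame:02d}"

print(sample_to_msf(13_230_000))   # "05:00:00": five minutes in
```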

DVD mastering is considerably more complicated than CD and requires advanced authoring software that can deal with all the different options possible on this multi-faceted release format. A number of different combinations of players and discs are possible, although the format devised specifically for high resolution surround audio, known as the DVD-Audio format, was not particularly successful commercially. DVD-Video allows for 48 or 96 kHz sampling frequency and 16, 20 or 24 bit PCM encoding. A two-channel downmix must be available on the disc in linear PCM form (for basic compatibility), but most discs also include Dolby Digital or possibly DTS surround audio. Dolby Digital encoding usually involves the preparation of a file or files containing the compressed data, and a range of settings have to be made during this process, such as the bit rate, dialog normalization level, rear channel phase shift and so on. A typical control screen is shown in Figure 9.31. Then of course there are the pictures, but they are not the topic of this book. DVD masters are usually transferred to the pressing plant on DLT tapes, using the Disc Description Protocol, or on DVD-R(A) discs as a disc image with a special CMF (cutting master format) header in the disc lead-in area containing the DDP data.

SACD authoring software enables text information to be added, as shown in Figure 9.32. SACD masters are normally submitted to the pressing plant on AIT format data tapes.

Pure Audio Blu-Ray is a format pioneered by Stefan Bock of msm Studios. Pure Audio Blu-Ray can work at up to 192 kHz/24 bit resolution, and there are also the losslessly compressed HD formats introduced by Dolby and DTS, as well as FLAC lossless encoding and MP3. It is possible to have all this content, including 7.1 surround, on one Blu-Ray disc without running out of space, as one can store around 50 Gbytes of data. mShuttle is an option introduced for Pure Audio Blu-Ray discs that turns the player into a small web server, enabling audio files to be served from the player over the home network to other devices. That way the content can be used on portable media devices or played out of alternative file-based audio players. This requires a player working to ‘Profile 2.0’, which was introduced in 2009 and includes the provision of network functionality. Mastering for these discs is currently relatively specialized, as it involves the inclusion of Java code to implement the player features specific to the format, such as mShuttle and the ability to select replay mode using the colored buttons on the remote. There are moves to make the format more universally available and standardized.

image

FIGURE 9.31 Screen display of Dolby Digital encoding software options.

image

FIGURE 9.32 Example of SACD text authoring screen from SADiE.

Downloads and streaming

Mastering and preparation of material for online delivery and downloads is now of at least as much importance as the preparation for physical media, as the Internet has become the dominant mode of delivery in many markets. This has led to the introduction of schemes such as Apple’s Mastered for iTunes, which is described in more detail below.

MP3, as already explained in Chapter 8, is actually MPEG-1, Layer 3 encoded audio, stored in a data file, usually for distribution to consumers either on the Internet or on other release media. MP3 mastering requires that the two-channel audio signal is MPEG-encoded, using one of the many MP3 encoders available. Mastering software now usually includes MP3 encoding as an option, as well as other data-reduced formats such as the AAC encoding used for iTunes releases. It is advisable to use a high quality MP3 encoder as the format does not specify how the encoding should be done, only the bit stream and the decoder, so there are definitely good and bad solutions on the market. Fraunhofer and Sonnox joined forces to introduce a plug-in that enables a number of different codecs and bit rates to be compared in real time, so that the mastering engineer can audition the effects of encoding before committing to a final rendering for delivery. This includes blind listening comparison tools for reliable results.

Some of the choices to be made in this process concern the data rate and audio bandwidth to be encoded, as this affects the sound quality. The lowest bit rates (e.g. below 64 kbit/s) will tend to sound noticeably poorer than the higher ones, particularly if full audio bandwidth is retained. For this reason some encoders limit the bandwidth or halve the sampling frequency for very low bit rate encoding, because this tends to minimize the unpleasant side effects of MPEG encoding. It is also possible to select joint stereo coding mode, as this will improve the technical quality somewhat at low bit rates, possibly at the expense of stereo imaging accuracy. As mentioned above, at very low bit rates some audio processing may be required to make sound quality acceptable when squeezed down such a small pipe. For the highest quality it is preferable to use bit rates for MPEG AAC of around 256 kbit/s with constrained variable bit rate, as used in iTunes Plus, for example.

Mastered for iTunes

Apple’s Mastered for iTunes program was introduced as a way of trying to give mastering engineers better control over the sound quality of high resolution source material released as iTunes Plus downloads. Before Mastered for iTunes (MfiT), tracks for iTunes were either simply ripped from CDs or taken from the major record company servers and loaded into iTunes Producer (a software package for preparing iTunes tracks). With MfiT, AAC encoding is done from 24 bit masters, often with lowered level to avoid clipping and achieve a much cleaner result. (The aim is to get the best results out of the 256 kbit/s constrained variable bit rate [CVBR] of the iTunes Plus format.)

All the encoding for an iTunes release is done by Apple, but free Apple software called ‘afconvert’ (see Fact File 9.8) enables users to perform identical encoding themselves before submitting masters. The first process in this software is Sound Check, which looks at the relative loudness levels of songs to be encoded and attempts to determine how much their levels should be raised or lowered on replay to make their loudness comparable. It adds metadata that can be used by players to avoid loudness differences when tracks are played alongside each other. If the track is at a higher sampling frequency than 44.1 kHz it is down-sampled to 44.1 kHz; otherwise it is left alone. There is also a process that will convert the AAC encoded track back to PCM so that the decoded version can be auditioned. ‘afclip’ looks for likely on-sample and inter-sample clips, behaving like a true peak-reading meter, enabling the user to determine the potential for encoder and post-decoder clipping. After the track is transferred to Apple, it is encoded in exactly the same way as the user would have done. ‘Test pressings’ are then returned to the record company to confirm what is about to be released on iTunes. Usually these turn out to be bit-for-bit the same as the final encoding created by the mastering engineer, which confirms the integrity of the process.

FACT FILE 9.8 APPLE’S MFIT TOOLS

The tools contained in the free mastering suite that can be downloaded from Apple include the Master for iTunes Droplet, which is used to automate the creation of iTunes Plus masters. The suite of tools requires at least the Snow Leopard (10.6) version of OS X to run. The droplet needs either AIFF or WAVE files to be provided as source material and converts them temporarily to Apple’s Core Audio Format (CAF) with a Sound Check metadata profile attached that can normalize the relative loudness levels of songs on replay. AAC files are then encoded.

afconvert is a command line utility that enables more direct control over all of the above MfiT encoding operations.

AURoundTripAAC is an Audio Unit (AU) that allows the comparison of encoded audio against the original source file, which also includes clip and peak detection (see screen shots (a)-(c) opposite). There is a listening facility that allows a simple double-blind ABX test to be set up, in order that users can check whether they can reliably tell the difference between source and encoded versions. The plug-in can be used with workstation software that conforms to the AU plugin format, such as Logic, or alternatively the AU Lab application can be used to run the process.

AU Lab is a free standalone digital mixer utility that lets you use AU-type plug-ins without needing an AU-compatible DAW.

afclip is a Unix command line tool that can be used to check a file for on-sample and inter-sample clipping. Inter-sample clipping can arise in oversampling D/A converters used after decoding, for example. (Four-times oversampling is used to estimate sample values in afclip.) When mastering a track for iTunes that peaks very close to digital maximum it’s necessary to check it using this tool and reduce the level slightly until an acceptable number of clips is indicated (which may be zero, unless a small number turn out to be inaudible). If there’s any on-sample clipping the output of this process is an audio file (.wav) where the left channel data is the original audio and the right channel contains impulses where the audio is clipped, so that clips can be quickly located visually in a digital audio editor. There’s also a table that comes up in the Terminal window (see screen shot (d) opposite) to show the timing locations of clips and the amount by which the samples exceed the clipping point. ‘Pinned samples’ can also be reported — that is, any in a series with a digital level of exactly ±1.0 (peak level), which suggests on-sample clipping may have occurred.

Finally, the Audio to WAVE Droplet converts files that are in other audio file formats (any supported by Mac OS X) to the WAVE format.

image

Apple prefers to receive high-resolution masters at sampling frequencies above 44.1 kHz, preferably 96 kHz. That way the encoding process is forced to use its mastering-quality sample rate conversion that generates 32-bit floating point CAF files (see Chapter 10) as the input to AAC encoding. It’s claimed that this avoids the need for redithering and preserves all the dynamic range inherent in the original file, avoiding the potential for aliasing or clipping that can otherwise arise in sample rate conversion. (If you supply 44.1 kHz files to Apple the advantages of the above process are bypassed as the sample rate conversion is not initiated.)

RECOMMENDED FURTHER READING

Collins, M., 2013. Pro Tools 11: Music Production, Recording, Editing, and Mixing. Focal Press.

Katz, B., 2007. Mastering Audio: The Art and Science, second edition. Focal Press.

Katz, B., 2012. iTunes Music: Mastering High Resolution Audio Delivery: Produce Great Sounding Music with Mastered for iTunes. Focal Press.

Leider, C., 2004. Digital Audio Workstation: Mixing, Recording and Mastering Your MAC or PC. McGraw-Hill Professional.

Watkinson, J., 2001. The Art of Digital Audio, third edition. Focal Press.
