3  Recording, replay and editing principles

This chapter is concerned with the principles of audio recording and replay using mass storage media, including various approaches to editing.

3.1 The sound file

In audio workstations, recordings are stored in sound files on mass storage media. The storage medium is normally a disk but other media may be used in certain circumstances such as for backup of disks. For the sake of simplicity, disks are assumed to be the primary means of storage in the following sections. A sound file is an individual recording of any length from a few seconds to a number of hours (within the limits of the system). With tape recording, parts of a tape may be recorded at different times, and in such a situation there will be sections of that tape that represent distinctly separate recordings: they may be ‘tracks’ for an album, ‘takes’ of a recording session, or short individual sounds such as sound effects. This is the closest that tape recording gets to the concept of the sound file: that is, a distinct unit of recorded audio, the size of the unit being anything that fits into the available space.

In the audio workstation the disk can be thought of as a ‘sound store’ in which no one part has any specific time relationship to any other part – no section can be said to be ‘before’ another or ‘after’ another. This is the nature of random- or direct-access storage (some forms of optical disk store data contiguously for all or part of their capacity, although they retain random accessibility). It has led to the use of the somewhat confusing description ‘non-linear recording’, which contrasts with the ‘linear’ recording process that takes place on tape. (To many people, the term ‘non-linear’ means that the audio has been quantised non-linearly, which is not the case in most professional audio systems.)

A disk may accommodate a number of sound files of different lengths. It is possible that one file might be a 10 minute music track whilst another might be a 1 second sound effect. As many sound files can be kept in the store as will fit in the space available, although some operating systems have upper limits on the number of individual files that can be handled by the directory structure. Each sound file is made up of a number of discrete data blocks and normally the block size will limit the minimum size occupied by a file since systems do not normally write partial blocks (see below).

Normally sound files are either mono or stereo – that is either a single channel or two related channels of audio combined into one file. They are rarely more than stereo, since multichannel operation is normally achieved by storing a number of separate mono files, one for each channel. Stereo sound files contain the left and right channels of a stereo pair, usually interleaved on a sample-by-sample basis as described in Chapter 6, and are useful when the two channels will always be replayed together and in a fixed timing relationship. Accessing a stereo file is then no different from accessing a mono file, except that the stereo file requires twice the amount of data to be transferred for the same duration of audio. As far as the user is concerned, the system can present a stereo sound file under a single title and note in the file header that it is stereo. In this case any buffering (see below) would have to be split such that left channel samples would be written to and read from one group of memory addresses, and right channel samples to and from another. As would be expected, stereo files take up twice the amount of disk space of the equivalent mono file.
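The sample-by-sample interleaving described above can be sketched in a few lines. This is a simplified illustration (real systems work on raw byte streams and fixed word lengths, not Python lists), but it shows how time-coincident left and right samples alternate in a stereo file, and why such a file needs twice the data transfer of a mono file of the same duration.

```python
# Sketch of sample-by-sample interleaving for a stereo sound file.
# Samples are held as plain ints here; a real file stores raw words.

def interleave(left, right):
    """Combine left/right channel sample lists into one interleaved stream."""
    assert len(left) == len(right), "stereo channels must be the same length"
    out = []
    for l_sample, r_sample in zip(left, right):
        out.append(l_sample)   # left sample first...
        out.append(r_sample)   # ...then the time-coincident right sample
    return out

def deinterleave(stream):
    """Split an interleaved stereo stream back into left/right channels."""
    return stream[0::2], stream[1::2]
```

On replay the buffering would split the stream the same way, writing even-indexed samples to the left-channel buffer area and odd-indexed samples to the right.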

3.2 RAM buffering

Computer disk drives were not originally designed for recording audio, although they can be made to serve this purpose. As explained in Chapter 5, a disk is normally formatted in sectors, often grouped into blocks, and the blocks making up a file need not be stored contiguously (contiguous means physically adjacent). The result of this is that data transfer to and from such media is not smooth but intermittent or burst-like. Furthermore, editing may involve the joining of sections from files stored in physically separate locations, resulting in breaks in the data flow from disk at edit points whilst the new file is located. Although this burst transfer rarely presents a problem in applications such as text processing (it does not matter if a text file is loaded in bursts) it is unsuitable for the recording and replay of realtime audio. Audio (and video in most cases) requires that samples are transferred to and from convertors or digital interfaces at a constant rate, in an unbroken stream. Consequently digital audio hardware and software must include mechanisms for converting burst data flow into continuous data flow and vice versa. This is achieved by using RAM (random access memory) as a short-term ‘buffer’ or reservoir.

RAM is temporary solid-state memory with a very fast access time and transfer rate. It can be addressed directly by the processing hardware of the audio workstation, and is used as an intermediate store for audio samples on their way to and from the disk drive (see Figure 3.1). During recording, audio samples are written into the RAM at a regular rate and read out again a short time later to be written as blocks of data on the disk. At least one complete sector of audio is transferred in one operation, and usually a number of sectors are written in one operation (see Section 3.4). The transfer is effectively time compressed, since samples acquired over, say, 100 ms, may be written to the disk in a short burst lasting only 20 ms, followed by a gap. During simple replay, data blocks are transferred from the disk into RAM in bursts and then read out at a steady rate for transfer to a D/A convertor or digital interface. The process of transferring out from the buffer normally begins before the file has been transferred completely into the buffer, because otherwise (a) there would be an unacceptable delay between the initiation of replay and the onset of an audible output, and (b) the size of the buffer would have to be great enough to hold the largest sound file entirely.
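The burst-to-continuous conversion can be modelled as a toy simulation: the ‘disk’ delivers whole blocks in bursts whenever the buffer runs low, while the output side drains exactly one sample per tick at a steady rate. The block size and threshold below are illustrative values, not taken from any real system.

```python
from collections import deque

BLOCK = 100          # samples per disk block (illustrative)
LOW_WATER = 150      # fetch another block when the buffer falls below this

def replay(disk_blocks, ticks):
    """Drain one sample per tick while refilling from 'disk' in bursts."""
    buffer = deque()
    output = []
    blocks = iter(disk_blocks)
    for _ in range(ticks):
        if len(buffer) < LOW_WATER:      # buffer running low: burst transfer
            block = next(blocks, None)   # fetch the next block, if any remain
            if block is not None:
                buffer.extend(block)     # burst from disk into the RAM buffer
        if buffer:
            output.append(buffer.popleft())  # steady outflow to the convertor
    return output
```

Note that output begins on the very first tick, before all the blocks have arrived, mirroring the point made above about not waiting for the whole file to be buffered.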

image

Figure 3.1 RAM buffering is used to convert burst data flow to continuous data flow, and vice versa

image

Figure 3.2 RAM buffering may be likened to a water reservoir that acts to convert intermittent filling to continuous outflow

The RAM buffer acts in a similar way to a water reservoir. It allows supply and demand to vary at its input and its output whilst remaining able to provide an unbroken supply, assuming that sufficient water remains in the reservoir. Figure 3.2 shows an analogy with a water bucket that has a hole in the bottom, filled by a tap. One may liken the tap to a disk drive and the water flowing out of the hole to an audio output. The tap may fill the bucket in bursts, but within certain limits this is converted into continuous outflow. Provided that the average flow rate of water entering the bucket is the same as the average rate at which it flows out of the hole, then the bucket will neither empty nor overflow (within the limits of the size of the bucket). If water flows out of the hole faster than it is supplied by the tap then the bucket will eventually become empty. On the other hand, the bucket could overflow if the tap was left on all the time and was filling the bucket faster than the hole could empty it.

image

Figure 3.3 A control system could be added to the simple reservoir to regulate inflow and outflow so that supply and demand are linked

Clearly some control mechanism is called for. Sensors could be attached to the insides of the bucket to detect high and low water levels, as shown in Figure 3.3, connected to control logic which operated a valve in the supply line. The valve would be opened when the water level was getting low, and closed when it was getting high. A tap on the bucket outlet could be added to stop and start the flow (the equivalent of the PLAY button for audio replay). Equivalents of this control mechanism exist in audio workstation software. Pointers are incremented up and down to register the state of fullness of RAM buffers during record and replay operations, and action is taken at certain states of fullness either to transfer new blocks of data to and from the disk or to halt transfer.
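The high/low sensor logic can be expressed as a tiny state machine with hysteresis: the ‘valve’ opens when fullness drops to the low sensor and closes when it reaches the high sensor, and in between it simply keeps its previous state. The thresholds are illustrative, not from any real workstation.

```python
# Sketch of the bucket's control logic: a valve with two sensors.
LOW, HIGH = 25, 75   # per cent full (illustrative thresholds)

class ValveControl:
    def __init__(self):
        self.valve_open = True   # start by filling the empty buffer

    def update(self, fullness):
        """Given buffer fullness in per cent, decide whether to transfer."""
        if fullness <= LOW:
            self.valve_open = True    # buffer getting empty: resume transfer
        elif fullness >= HIGH:
            self.valve_open = False   # buffer nearly full: halt transfer
        # between the sensors the valve keeps its previous state (hysteresis)
        return self.valve_open
```

The hysteresis matters: without it the valve would chatter open and closed around a single threshold, just as a real buffer pointer scheme avoids transferring one block at a time around a single fill level.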

The analogy can be taken further. There might be more than one hole in the bucket (more audio outputs), larger holes in the bucket (higher sampling rates and resolutions) or a tap with low water pressure (a slow storage device). Audio system design is largely a matter of juggling with these parameters and others to optimise the system performance. (The bucket analogy does not hold water if examined too closely, as water will flow faster out of the holes in the bucket the fuller the bucket, and this does not hold true for memory buffers in audio workstations!)

RAM buffering has a number of other uses. Firstly, it can be used to ensure that any short-term timing irregularities in the data coming from the storage device will be ironed out and will not be allowed to affect audio quality. Data written into memory from the store, even if it has timing jitter, can be read out of the buffer at a constant, steady rate, under control of an accurate crystal clock. The only penalty of buffering is that it introduces a small delay between the input to and the output from the buffer, the extent of which depends on the delay between the writing of samples to the RAM and the reading of them out again. The maximum delay is limited by the size of the buffer, as with a small buffer there will come a point where the memory is filled and must be partially emptied before any new samples can be written in. The delay effect of the buffer can be disguised in operation because data can be read from disk ahead of the required time and written at an appropriate time after sample acquisition.

Secondly, the buffer may be used for synchronisation purposes. If audio data is to be synchronised with an external reference such as timecode, then the rate at which data is read out of the buffer can be finely adjusted to ensure that lock is maintained. It is also possible to align the timings of multiple audio channels that are supposed to be nominally in sync with each other.

The size of buffer in a digital audio system may or may not be under the user’s control, but is typically in the region of 0.5–2 Mbytes. An area of operating RAM will be set aside for this purpose, sometimes located on the audio processing board itself rather than being system RAM of the host computer. Generally, the more channels to be handled, the larger the buffer, since each channel requires its own memory space; also a larger buffer can help to compensate for badly fragmented storage space (see Section 5.4), although it cannot make up for a disk drive that is too slow overall.

3.3 Disk drive performance issues

Access time and transfer rate are important features governing the suitability of disk drives for primary digital audio storage. The sustained transfer rate is far more important than the instantaneous rate, since this is more likely to represent the performance in real file transfer operations.

Tables 3.1 and 3.2 show the data rates and capacities required for different resolutions of digital audio, either linear PCM or data-reduced. From this one can begin to work out the performance requirements of storage devices. The data rate for one channel of audio at 48 kHz, 16 bits, amounts to around 0.75 Mbit s–1, thus it might be assumed that a device with a transfer rate of 0.75 Mbit s–1 would be able to handle the replay of one audio channel’s data satisfactorily. If the store were made up of solid state RAM, which has a negligible access time (of the order of tens or hundreds of nanoseconds), then a transfer rate of 0.75 Mbit s–1 would be adequate, but in the usual case where the store is a disk drive, the access time will severely limit the average transfer rate. Although the burst transfer rate from the disk to the buffer may be high, the gaps between transfers as the drive searches for new blocks of data will reduce the effective rate. It is therefore the combination of access time and transfer rate that determines the effective transfer rate. What is needed is a fast transfer rate and a fast access time.
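The raw PCM figure quoted above is simple to reproduce: multiply sample rate by word length by channel count. The helper below is a back-of-envelope sketch (it uses decimal megabits, 1 Mbit = 10^6 bits).

```python
# Back-of-envelope PCM data rate, matching the text's 48 kHz/16 bit example.
def pcm_rate_mbit(sample_rate_hz, bits, channels=1):
    """Raw linear PCM data rate in Mbit/s (1 Mbit = 10**6 bits)."""
    return sample_rate_hz * bits * channels / 1e6

rate = pcm_rate_mbit(48_000, 16)   # one channel at 48 kHz, 16 bits
# 0.768 Mbit/s - the 'around 0.75 Mbit/s' quoted in the text
```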

Table 3.1 Data rates and capacities for linear PCM

image

Table 3.2 Data rates and capacities for data-reduced audio

Bit rate (kbit s–1)   Capacity/min. (Mbytes min–1)   Capacity/hour (Mbytes hour–1)
 64                   0.5                             27
 96                   0.7                             41
128                   0.9                             55
196                   1.4                             84
256                   1.8                            110
384                   2.7                            165
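The arithmetic behind Table 3.2 can be checked with a short function. The rounded figures in the table match if capacities are taken in binary megabytes (1 Mbyte = 2**20 bytes), which is the assumption made here.

```python
# Reproducing Table 3.2: storage consumed per minute and per hour
# for a given data-reduced bit rate. Capacities in binary Mbytes.
def capacity(bit_rate_kbit_s):
    bytes_per_s = bit_rate_kbit_s * 1000 / 8       # kbit/s -> bytes/s
    per_min = bytes_per_s * 60 / 2**20             # Mbytes per minute
    per_hour = bytes_per_s * 3600 / 2**20          # Mbytes per hour
    return round(per_min, 1), round(per_hour)

# e.g. capacity(64) gives roughly half a megabyte per minute
```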

The job of the buffer is to disguise the effects of access time delays, and it may be seen that the size of the buffer will depend on the potential access delay, among other things. If transfer is erratic, that is with long gaps and then extremely fast transfers, the buffer is likely to swing between being very full and very empty, rather than deviating a small amount around a half full position. In the former case it is likely that a larger buffer will be required.

Over a period of time the disk is likely to become fragmented and this will lead to file blocks being stored in a number of physically separate locations. The more fragmented a store becomes the lower the efficiency of data retrieval, as a file will be transferred in a number of short bursts separated by breaks while the next block is accessed. Furthermore, the access time depends on how far apart the blocks are, as the retrieval mechanism will take less time to travel a short distance than to travel a long way. (This is covered further in Chapter 5.) For this reason, figures quoted for access time can only ever be a rough guide.

Certain storage media have different access times and transfer rates when recording (writing) to those encountered when replaying (reading). For example, hard disks use a magnetic recording method that overwrites old information completely without erasing it first. Some magneto-optical drives require a two-stage process in order to rewrite over old data, so the required block must be erased on one revolution and then written on the next. There may also be a ‘verify’ pass after writing. This suggests that recording performance may not always be as good as replay performance, and that a disk drive may be able to replay more channels simultaneously than it can record.

For all the above reasons it is often difficult to calculate how many channels of audio one may expect a disk drive to be able to handle. To take an example, assume an older disk drive with an average access time of 20 milliseconds and a transfer rate of 20 Mbit s–1. If the access time was near zero then the transfer rate of 20 Mbit s–1 would allow around 26 channels of audio to be transferred at the example resolution given above, but the effective transfer rate in real operation will bring this number down to perhaps twelve or fewer channels for safe, reliable operation in a wide variety of operational circumstances. Editing operations also place considerable additional demands on disk drive performance, depending on how edits are carried out. Because of all this, some manufacturers play very safe and limit their systems to a small number of channels per disk drive, even if the drive might be able to handle more under some circumstances. Other software simply leaves it up to the user to determine when the disk drive will fail to perform, or provides a warning when it is getting close to the limit. The effect of using a disk drive beyond the limits of its performance is normally to experience ‘drop-outs’ in replay, and system messages such as ‘drive too slow’ when attempting to replay large numbers of channels with many edits.
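A crude model makes the effect of access time on the example above concrete. Assume each transfer moves one allocation unit and is preceded by one average-length seek; all the numbers are illustrative, not a real drive specification.

```python
# Rough model of how seek delays erode a drive's raw transfer rate.
def effective_rate_mbit(raw_mbit_s, access_ms, block_kbytes):
    """Effective sustained rate when every block transfer costs one seek."""
    block_mbit = block_kbytes * 8 / 1000            # block size in Mbit
    transfer_s = block_mbit / raw_mbit_s            # time moving the data
    total_s = transfer_s + access_ms / 1000         # plus the seek before it
    return block_mbit / total_s

# The text's example drive: 20 Mbit/s raw, 20 ms access, 32 kbyte blocks.
# The effective rate comes out well under half the raw rate, which is
# why ~26 theoretical channels shrinks to perhaps twelve or fewer.
```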

3.4 Allocation units or transfer blocks

Optimising the efficiency of data transfer to and from a storage device will depend on keeping the number of head seeks to a minimum for any given file transfer. This requires careful optimisation of the size and position of the audio transfer blocks or allocation units. Typically, a disk sector (that is the smallest addressable storage unit) contains 512 bytes of information, although some drives use 1024 byte (or greater) sectors. This is very small in relation to the size of a digital audio file of even moderate length and if a file were to be split up into chunks of 512 bytes spread all over the disk then efficiency would be drastically reduced owing to the large number of seeks required to different parts of the disk. For this reason a minimum transfer block or allocation unit is usually defined, which is a certain number of bytes that are transferred together and preferably stored contiguously in order to improve efficiency. It might be that a transfer block would contain 8 kbytes of audio data, which in the case of 512 byte sectors would correspond to 16 sectors. The size of the transfer block must be small enough to permit efficient use of the disk space in cases of fragmentation and large enough to result in efficient data transfer. If the digital audio system stores audio under the native filing system of the host computer then the size of the transfer block may be fixed during the formatting of the disk volume. A common size is 32 kbytes.
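The sector arithmetic above is trivial but worth making explicit, since allocation units must be a whole number of sectors:

```python
# Allocation-unit arithmetic: how many sectors make up one transfer block.
def sectors_per_unit(unit_bytes, sector_bytes=512):
    assert unit_bytes % sector_bytes == 0, "unit must be whole sectors"
    return unit_bytes // sector_bytes

# 8 kbyte unit on 512 byte sectors -> 16 sectors, as in the text;
# a 32 kbyte unit on the same sectors -> 64 sectors.
```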

3.5 Multichannel recording and replay

3.5.1 Multitrack or multichannel?

It is important to understand a fundamental difference between the workstation concept of multichannel operation and the traditional concept of multitrack tape recording. The difference is that ‘tracks’ and ‘channels’ need not necessarily mean the same thing. In a multitrack tape recorder there may be up to 48 tracks of audio recorded onto the tape, each of which is an independent mono track lasting the length of the tape. Each of these tracks feeds a numbered audio output and is fed from a numbered input. Once sound is recorded onto a numbered track it is fixed in time and physical position in relation to other sounds recorded on the same tape and it will be replayed on the same-numbered audio channel at all times (unless the internal wiring of the machine is changed).

image

Figure 3.4 Tracks are represented in this simulated display as horizontal bands containing named sound file segments. The output to which that track is routed is selected at the left-hand side, along with recording and replay muting controls

In workstations the terms ‘track’ and ‘channel’ may be separated from each other, in that a sound file, once stored, may be replayed on any audio channel depending on the user’s choice. It may even be that the concept of the track is done away with altogether, but this depends on the user interface of the system. Most manufacturers have chosen to retain the concept of tracks because it is convenient and well understood. Tracks, in workstation terminology, are just ways of showing which sound elements have been grouped together for replay on the same channel, but they are not fixed as in tape recording. Figure 3.4 shows a simulated display from a multitrack package in which tracks are represented as horizontal bands containing sound file segments. On the left-hand side it is possible to change the physical audio output assigned for replay of that track. The sound segments can be moved around in time on the virtual track by sliding them left or right and they can be copied or moved to other tracks if necessary.

3.5.2 Inputs, outputs, tracks and channels

Because of the looser relationship between tracks, channels and audio inputs and outputs, confusion occasionally arises. Firstly, none of these are necessarily related to each other, although a designer may decide to relate them. In a 24-track tape machine, there are 24 inputs, 24 outputs, 24 tracks and 24 channels, so it is very easy to see a direct relationship between one and the others. It is even possible to say exactly where on the tape track 13 will be recorded at any point in time. In a workstation it is possible, for example, for there to be two inputs, eight outputs, 99 tracks, and eight channels. It is rarely possible to say exactly where track 13 will be recorded at any point, or what information is recorded on it, as it all depends on what the user has decided. In this example it may be that only two inputs have been provided because that is all that the designer is going to allow you to record at any one time, but it is highly likely that these two inputs could be routed to any ‘track’ or any output channel. The two inputs allow for the recording of stereo or mono sound files that will be stored in a free location and given names by the user. Although only two ‘tracks’ may be recorded at once, this operation may be performed many times to build up a large number of sound files in the store.

In some systems, the concept of the track has been considered as important, and in the above example there are 99 tracks (just a virtual concept) but only eight outputs or channels. This is because the user is allowed to record information onto any of the tracks, but he may only replay eight of them simultaneously. The number of simultaneous output channels is limited by the transfer rates of the storage devices, the signal processing capacity of the system and the number of D/A convertors or digital outputs employed. By expanding the system, adding more or faster disks and adding more processing power, more of the 99 tracks could be replayed simultaneously. Many manufacturers have taken this modular approach to system design, allowing the user to start off in a small way, expanding the capabilities of the system as time and money allow.

3.5.3 Track usage, storage capacity and disk assignment

The storage space required for multiple channels increases pro rata with the number of channels, although in fact eight-track recording may not require eight times the storage space of mono recording because many ‘tracks’ may be blank for large amounts of the time. If you think about the average multitrack recording on tape you will realise that many tracks have large gaps with nothing recorded. The total storage space used will depend on the total duration of the mono sound files used in the program, whatever tracks or channels they are assigned to. It has been estimated, for example, that sound effects tracks in feature film production contain about two-thirds silence and that dialogue tracks are only 10–20 per cent utilised.

(It is often said that disk-based systems do not record the silences on tracks and therefore do not use up as much storage space as might be expected, but the only time when silences save storage time is when they exist as blank spaces between the output of sound files, where no sound file is assigned to play (see Figure 3.5). Recorded silence uses as much disk space as recorded music!)

Multichannel disk recording systems sometimes use more than one disk drive, and there is a limit to the number of channels that can be serviced by a single drive. It is necessary, therefore, to determine firstly how many channels a storage device will handle realistically and then to work out how many are needed to give the total capacity required. Some older systems attempted to imitate a multitrack tape recorder in assigning certain disk drives permanently to certain groups of tracks, as shown in Figure 3.6, but this limited operational flexibility. If a sound file from one track were needed on another it might have to be copied to the appropriate drive, which would take time. This approach is becoming much less common now that the performance of disk drives is getting to the point where one can replay perhaps 16 channels simultaneously from a single drive. In modular systems, if one needs greater channel recording and replay capacity one can simply add further disk I/O cards, each connected to a separate disk. If one needs more storage capacity then more disk drives can be attached to the same SCSI bus, as shown in Figure 3.7. It is then relatively unimportant which drive a file is stored on, provided that the software is capable of handling the addressing of multiple drives. There may be some restrictions if the user has constructed a play list which requires more simultaneous file transfers from a certain disk than can be handled.

image

Figure 3.5 The silent section on the upper track does not require any disk space because no recording exists for this time slot. The silent section on the lower track was recorded as part of a file, and so consumes as much space as any other sound

image

Figure 3.6 Some older multitrack disk systems assigned disks permanently to certain tracks, as shown here

3.5.4 Dropping-in

In multitrack music systems the capability to ‘drop-in’ is important. Dropping-in involves instantaneous entry into record mode at the touch of a button and it is expected that a seamless join will result between old and new material, both at the start of the drop-in and at the drop-out into the old material again.

image

Figure 3.7 Arrangement of multiple disks in a typical modular system, showing how a number of disks can be attached to a single SCSI chain to increase storage capacity, and how additional disk I/O cards can be added to increase data throughput for additional audio channels

Dropping in and out are really very similar operations to those involved in editing, where a crossfade must be added between old and new material at the join. In terms of file operations, it may be appreciated that one cannot simply start to write new material half way through a previously written file, making it necessary to write a new file for the ‘dropped-in’ portion. Internally, as part of the replay schedule, the system will then have to keep a record of times at which it must crossfade from one file to the other and back again.

3.6 System latency

Latency is the delay that occurs between one event and another. In workstations the term latency is usually used to describe the delay between inputs and outputs of the audio hardware. It is particularly important because this latency affects the ease with which a workstation may be used as the principal audio signal-processing engine in a studio, this now being a realistic prospect. Large-scale audio workstations with multiple inputs and outputs can now handle most of the operations that would once have been handled by stand-alone mixers, effects and recording equipment. Furthermore they are now capable of real-time signal processing and ‘full duplex’ operation which means that audio signals can be taken from an input, processed and sent to an output, this taking only a few milliseconds. Low latency is therefore highly desirable, in particular when using workstation channels as foldback signal paths to provide cue signals to musicians when overdubbing new material. It may also be important to be able to fix the latency rather than having it change when different operations are undertaken. This issue is raised further at other appropriate points in the book.
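Buffering is one of the main contributors to input-to-output latency, and its share is easy to quantify: each buffer adds its length in samples divided by the sampling rate. The helper below is a hypothetical illustration of that relationship, not a measurement of any real system.

```python
# Latency contributed by one pass through a buffer, in milliseconds.
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    return buffer_samples / sample_rate_hz * 1000

# e.g. a 256-sample buffer at 48 kHz adds roughly 5.3 ms; a full-duplex
# path through both an input and an output buffer would add it twice,
# which is why foldback monitoring demands small, fixed buffer sizes.
```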

3.7 Principles of audio editing

3.7.1 Advantages of non-linear editing

Speed and flexibility of editing are probably among the greatest benefits obtained from non-linear recording. Tape editing had some advantages but with digital audio it was often cumbersome, requiring material to be copied in real time from source tapes to a master tape. Difficulties also arose when making minor adjustments to a finished master. Tape-cut editing was very fast and cheap, being the main method used for years with analog tape, but it was rather unreliable on digital formats and little used in practice. When cut-editing tape, the editor fixed the edited sections in a physical and therefore in a temporal relationship with each other. If he or she desired to change any aspect of the edited master then it would be taken apart and rejoined, there usually only being one final version of the master tape.

The majority of editing is done today using audio workstations. Non-linear editing has also come to feature very widely in post-production for video and film, because it has a lot in common with film post-production techniques involving a number of independent mono sound reels. The editor may preview a number of possible masters in their entirety before deciding which should be the final one. Even after this, it is a simple matter to modify the edit list to update the master. Edits may also be previewed and experimented with in order to determine the most appropriate location and processing – an operation which is less easy with other forms of editing.

This kind of editing is truly non-destructive because the edited master only exists as a series of instructions to replay parts of certain sound files at specified times, with optional signal processing overlaid, as shown in Figure 3.8. The original sound files remain intact at all times and a single sound file can be used as many times as desired in different locations and on different tracks without the need to duplicate the audio data. Editing may involve the simple joining of sections, or it may involve more complex operations such as long crossfades between one album track and the next, or gain offsets between one section and another. The beauty of non-linear editing is that all these things are possible without in any way affecting the original source material.

3.7.2 Sound files and sound segments

Sound files are discussed further in Chapter 6: they are the individual sound recordings contained on a disk, each of which is catalogued in the disk directory. In the case of music editing sound files might be session takes, anything from a few bars to a whole movement, while in picture dubbing they might contain a phrase of dialogue or a sound effect. They are normally stored with a name to identify them. Specific segments of these sound files can be defined by the user while editing, in order to get rid of unwanted material or to select useful extracts. In such cases it is useful to be able to identify the wanted segment as an entity in its own right, so that it can be named and used wherever required. The terminology varies but such identified parts of sound files are usually termed either ‘clips’ or ‘segments’. They require the original sound files as source data and will not usually be replayable independently.

image

Figure 3.8 Instructions from an edit decision list (EDL) are used to control the replay of sound file segments from disk, which may be subjected to further processing (also under EDL control) before arriving at the audio outputs

Rather than creating a copy of the segment or clip and storing it as a separate sound file, it is normal to store it as a ‘soft’ entity – in other words, as commands in an edit list or project file that identify the start and end addresses of the segment concerned and the sound file to which it relates. It may be given a name by the operator and subsequently used as if it were a sound file in its own right. An almost unlimited number of these segments can be created from original sound files, without the need for any additional audio storage space.
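A ‘soft’ segment can be sketched as a small record: nothing but a name, a reference to the source sound file and the in/out addresses within it. The field names here are illustrative inventions, not any real system's file format.

```python
from dataclasses import dataclass

# A 'soft' segment: a reference into a source sound file, holding no
# audio data of its own. Field names are hypothetical.
@dataclass
class Segment:
    name: str
    source_file: str     # which sound file the audio actually lives in
    start_sample: int    # in point within the source file
    end_sample: int      # out point within the source file

    def duration(self, sample_rate=48_000):
        """Segment length in seconds at the given sampling rate."""
        return (self.end_sample - self.start_sample) / sample_rate

# Many segments may reference the same file without copying any audio:
chorus = Segment("chorus", "take_07", 96_000, 480_000)
```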

3.7.3 Edit point handling

Edit points can be simple butt joins or crossfades. A butt join is very simple because it involves straightforward switching from the replay of one sound segment to another. Since replay involves temporary storage of the sound file blocks in RAM (see above) it is a relatively simple matter to ensure that both outgoing and incoming files in the region of the edit are available in RAM simultaneously (in different address areas). Up until the edit, blocks of the outgoing file are read from the disk into RAM and thence to the audio outputs. As the edit point is reached a switch occurs between outgoing and incoming material by instituting a jump in the memory read address corresponding to the start of the incoming material. Replay then continues by reading subsequent blocks from the incoming sound file. It is normally possible to position edits right down to single sample accuracy, making the timing resolution as fine as a few tens of microseconds if required.
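In miniature, the address jump amounts to this: replay follows the outgoing samples up to the edit point and then continues from the chosen in point of the incoming material. This sketch works on sample lists rather than RAM addresses, but the logic is the same.

```python
# A butt join in miniature: sample-accurate splice, no crossfade.
def butt_join(outgoing, incoming, out_point, in_point):
    """Play 'outgoing' up to out_point, then jump to in_point of 'incoming'."""
    return outgoing[:out_point] + incoming[in_point:]
```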

image

Figure 3.9 (a) A bad butt edit results in a waveform discontinuity. (b) Butt edits can be made to work if there is minimal discontinuity

The problem with butt joins is that they are quite unsubtle. Audible clicks and bumps may be caused by the discontinuity in the waveform at the join, as shown in Figure 3.9. It is normal, therefore, to use at least a short crossfade at edit points to hide the effect of the join. This is what happens when analog tape is spliced, because the traditional angled cut has the same effect as a short crossfade (of between 5 and 20 ms, depending on the tape speed and angle of cut). Most workstations have considerable flexibility with crossfades and are not limited to short durations. It is now common to use crossfades of many shapes and durations (e.g. linear, root cosine, equal power) for different creative purposes. This, coupled with the ability to preview edits and fine-tune their locations, has made it possible to put edits in places previously considered impossible.

The locations of edit points are kept in an edit decision list (EDL) which contains information about the segments and files to be replayed at each time, the in and the out points of each section and details of the crossfade time and shape at each edit point. It may also contain additional information such as signal processing operations to be performed (gain changes, EQ, etc.).
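An EDL entry of the kind described can be pictured as a small record per replayed segment. The structure below is a simplified assumption for illustration only; real EDL formats (and their field names) vary considerably between systems.

```python
from dataclasses import dataclass, field

@dataclass
class EDLEntry:
    """One event in a simple edit decision list (illustrative fields only)."""
    timeline_start: int         # where the segment begins on the output timeline (samples)
    source_file: str            # the sound file to replay from
    in_point: int               # first source sample to replay
    out_point: int              # one past the last source sample to replay
    fade_shape: str = "linear"  # crossfade law into this segment
    fade_length: int = 0        # crossfade duration in samples (0 = butt join)
    processing: dict = field(default_factory=dict)  # e.g. {"gain_db": -3.0}

edl = [
    EDLEntry(0, "take1.wav", 0, 2_000_000),
    EDLEntry(2_000_000, "take2.wav", 500, 1_000_000,
             fade_shape="equal_power", fade_length=882),
]
```

Note that the list holds only instructions: changing a fade shape or an in-point edits a few numbers here, never the stored audio.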

3.7.4 Crossfading

Crossfading is similar to butt joining, except that it requires access to data from both incoming and outgoing files for the duration of the crossfade. The crossfade calculation involves simple signal processing, during which the values of outgoing samples are multiplied by gradually decreasing coefficients whilst the values of incoming samples are multiplied by gradually increasing coefficients. Time-coincident samples of the two files are then added together to produce output samples, as described in Chapter 2. The duration and shape of the crossfade can be adjusted by altering the coefficients involved and the rate at which the process is executed.
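The coefficient multiplication and summing can be sketched directly. This minimal example implements a linear law; other shapes simply substitute a different gain curve for the same sample positions.

```python
def linear_crossfade(outgoing, incoming):
    """Crossfade two equal-length sample buffers: outgoing samples are
    scaled by falling coefficients, incoming by rising ones, and
    time-coincident samples are summed to form the output."""
    n = len(outgoing)
    assert len(incoming) == n
    out = []
    for i in range(n):
        g_in = i / (n - 1)    # rises 0 -> 1 across the fade
        g_out = 1.0 - g_in    # falls 1 -> 0 across the fade
        out.append(outgoing[i] * g_out + incoming[i] * g_in)
    return out

# Fading from a constant signal to silence shows the gain law itself:
faded = linear_crossfade([1.0] * 5, [0.0] * 5)
# faded -> [1.0, 0.75, 0.5, 0.25, 0.0]
```

Altering the length of the buffers changes the fade duration; altering the formula for `g_in` changes the fade shape, exactly as the text describes.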

image

Figure 3.10 Conceptual diagram of the sequence of operations that occur during a crossfade. X and Y are the incoming and outgoing sound segments

Crossfades are either performed in real time, as the edit point passes, or pre-calculated and written to disk as a file. There are merits to both approaches. Real-time crossfades can be varied at any time and are simply stored as commands in the EDL, indicating the nature of the fade to be executed. The process is similar to that for the butt edit, except that as the edit point approaches samples from both incoming and outgoing segments are loaded into RAM in order that there is an overlap in time. During the crossfade it is necessary to continue to load samples from both incoming and outgoing segments into their respective areas of RAM, and for these to be routed to the crossfade processor, as shown in Figure 3.10. The resulting samples are then available for routeing to the output. A consequence of this is that a temporary increase in disk activity occurs, because two streams of data rather than one are read during a crossfade. It is important, therefore, to have a disk drive and buffer size capable of handling the additional load present during real-time crossfades, which represents a doubling in the transfer rate required. Eight-channel replay would effectively become sixteen-channel replay for the duration of a crossfade edit on all eight channels, for example. An editing system may consequently be pushed close to its limits if asked to perform long real-time crossfades on multiple channels at the same time.
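The doubling of the required transfer rate is easy to quantify. The arithmetic below uses illustrative figures (16-bit samples at 44.1 kHz) and ignores overheads such as seek time and block headers:

```python
def replay_rate_bytes_per_sec(channels, fs=44_100, bytes_per_sample=2,
                              crossfading_channels=0):
    """Rough sustained disk transfer rate needed for replay.
    Each channel currently in a crossfade reads two streams instead
    of one. Figures are illustrative; real systems have overheads."""
    streams = channels + crossfading_channels
    return streams * fs * bytes_per_sample

normal = replay_rate_bytes_per_sec(8)                          # steady state
peak = replay_rate_bytes_per_sec(8, crossfading_channels=8)    # fade on all 8
# peak is exactly double normal: 8 channels behave like 16 during the fade
```

The buffer sizing discussed earlier must be based on this peak figure, not the steady-state one, if real-time crossfades on all channels are to be guaranteed.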

A common solution to this problem is for the crossfade to be calculated in non-real time when the edit point and crossfade duration are first determined by the user. This incurs a short delay while the system works out the sums, after which a new sound file is stored containing the crossfade period and nothing else. Replay of the edit is then a simpler matter, which involves playing the outgoing segment up to the beginning of the crossfade, then the crossfade file, then the incoming segment from after the crossfade, as shown in Figure 3.11. Load on the disk drive is therefore no higher than normal. This approach has advantages because it makes any number and length of crossfades possible on any combination of tracks, with the sure knowledge that they can be replayed. The slight disadvantages are the need for the system to write a new crossfade file every time the edit is altered, and the disk space taken up by the crossfade files (although this is normally quite small).
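The replay sequence in Figure 3.11 can be sketched as follows: the fade region is rendered once into its own small buffer (standing in for the crossfade file), and the edit then plays back as three sequential reads with no overlap. This is a simplified illustration assuming a linear fade law.

```python
def render_edit(outgoing, incoming, out_point, in_point, fade_len):
    """Pre-calculate the crossfade region as its own 'file', then
    replay the edit as three butt-joined pieces: outgoing material,
    the rendered fade, then incoming material (after Figure 3.11)."""
    fade = []
    for i in range(fade_len):
        g = i / (fade_len - 1)          # incoming gain rises 0 -> 1
        fade.append(outgoing[out_point + i] * (1 - g)
                    + incoming[in_point + i] * g)
    # Replay needs no double disk load: just three sequential reads.
    return outgoing[:out_point] + fade + incoming[in_point + fade_len:]

edited = render_edit([1.0] * 6, [0.0] * 6,
                     out_point=2, in_point=2, fade_len=3)
# edited -> [1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
```

Only the short `fade` buffer would be written to disk; the outgoing and incoming source files remain untouched, which is why altering the edit simply means rendering a replacement fade file.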

image

Figure 3.11 Replay of a precalculated crossfade file at an edit point between files X and Y

The shape of the crossfade can often be changed to suit different operational purposes. Standard linear fades (those where the gain changes uniformly with time) are not always the most suitable for music editing, especially when the crossfade is longer than about ten milliseconds. The result may be a momentary drop in level in the centre of the crossfade, due to the way in which the sound levels from the two files add together. If there is a random phase difference between the signals, as there often is in music, the rise in level resulting from adding the two signals will normally be around 3 dB; but the linear crossfade is 6 dB down at its centre, resulting in an overall level drop of around 3 dB (see Figure 3.12). Exponential crossfades and other such shapes may be more suitable for these purposes, because they have a smaller level drop in the centre. It may even be possible to design customised crossfade laws. Figure 3.13 shows the crossfade editing controls from a system by Sonic Solutions. It is possible to alter the offset of the start and end of the fade from the actual edit point, and to have a faster fade-up than fade-down.
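The 3 dB dip can be checked by a short calculation. Non-coherent (random-phase) signals add on a power basis, so the combined amplitude at the fade centre is the root of the sum of the squared gains. With a linear law both gains are 0.5 at the centre; with an equal-power law both are √0.5:

```python
import math

def db(amplitude):
    """Amplitude ratio expressed in decibels."""
    return 20 * math.log10(amplitude)

# Gains at the centre of the fade, where both signals are equal:
g_linear = 0.5                   # linear law: each signal at -6 dB
g_equal_power = math.sqrt(0.5)   # equal-power law: each at -3 dB

# Non-coherent signals sum on a power basis: sqrt(g1^2 + g2^2)
centre_linear = db(math.sqrt(2 * g_linear ** 2))            # about -3 dB dip
centre_equal_power = db(math.sqrt(2 * g_equal_power ** 2))  # 0 dB, level held
```

This is why equal-power (and similar exponential) laws keep the perceived level roughly constant across the edit when the two signals are uncorrelated, while a linear law does not.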

Many systems also allow automated gain changes to be introduced as well as fades, so that level differences across edit points may be corrected. Figure 3.14 shows a crossfade profile that has a higher level after the edit point than before it, and different slopes for the in and out fades. A lot of the difficulties that editors encounter in making edits work can be solved using a combination of these facilities.

3.7.5 Editing modes

During the editing process the operator will load appropriate sound files and audition them, both on their own and in a sequence with other files. The exact method of assembling the edited sequence depends very much on the user interface, but it is common to present the user with a visual analogy of moving tape, allowing files to be ‘cut-and-spliced’ or ‘copied and pasted’ into appropriate locations along the virtual tape. These files, or edited clips of them, are then played out at the timecode locations corresponding to their positions on this ‘virtual tape’ (an example is shown in Figure 3.15). It is also quite common to display a representation of the audio waveform that allows the editor to see as well as hear the signal around the edit point (see Figure 3.16).

image

Figure 3.12 Summation of levels at a crossfade. (a) A linear crossfade can result in a level drop if the incoming and outgoing material are non-coherent. (b) An exponential fade, or other similar laws, can help to make the level more constant across the edit

image

Figure 3.13 An example of crossfade control in Sonic Studio HD

image

Figure 3.14 The system may allow the user to program a gain profile around an edit point, defining the starting gain (A), the fade-down time (B), the fade-up time (D), the point below unity at which the two files cross over (C) and the final gain (E)

image

Figure 3.15 Example from SADiE editing system, showing audio clips assigned to different tracks on a virtual tape, against a timeline

In the editing of music using digital tape systems it was common to assemble an edited master from the beginning, copying takes from source tapes in sequence onto the master. An example of a typical procedure will serve to illustrate the point. Starting at the beginning of the piece of music, the first take would be copied to the master tape until a short time after the first edit was to be performed. The editor would then locate the edit point on the master tape (the outgoing take) by playing up to the approximate point and marking it, followed by fine trimming of this point, either by nudging it in small time increments, or by the simulation of analog ‘reel-rocking’. The edit point would then be confirmed and the same procedure performed on the source take to be joined at this point (the incoming take). This edit would then be auditioned, with a crossfade between outgoing and incoming material at the edit point, after which any further trimming would be performed before the edit was committed to the master tape by dropping it into record mode at the appropriate time.

image

Figure 3.16 Example from SADiE editing system showing the ‘trim editor’ in which is displayed a detailed view of the audio waveform around the edit point, together with information about the crossfade

In non-linear systems this approach is often simulated, allowing the user to roughly locate an edit point while playing the virtual tape, followed by a fine trim using simulated reel-rocking or a detailed view of the waveform. Some software presents source and destination streams as well, in further simulation of the tape approach. Sound files and segments are treated as the equivalent of the ‘takes’ in the above example, and the system notes the points in each segment at which one is to cease and another is to begin playing, with whatever overlap has been specified for crossfading.

It is also possible to insert or change sections in the middle of a finished master, provided that the EDL and source files are still available. To take an example, assume that an edited opera has been completed and that the producer now wishes to change a take somewhere in the middle (see Figure 3.17). The replacement take is unlikely to be exactly the same length but it is possible simply to shuffle all of the following material along or back slightly to accommodate it, this being only a matter of changing the EDL rather than modifying the stored music in any way. The files are then simply played out at slightly different times than in the first version of the edit.

It is also normal to allow edited segments to be fixed in time if desired, so that they are not shuffled forwards or backwards when other segments are inserted. This ‘anchoring’ of segments is often used in picture dubbing when certain sound effects and dialogue have to remain locked to the picture.
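The take replacement described above amounts to adjusting timeline start times in the EDL, rippling later entries by the length difference while leaving anchored ones fixed. A simplified sketch, with each entry reduced to an illustrative `(timeline_start, length, anchored)` tuple:

```python
def replace_take(edl, index, new_length):
    """Replace the take at `index` with one of `new_length` samples and
    ripple every later, non-anchored entry by the length difference.
    No audio is modified: only the replay times in the list change."""
    start, old_length, anchored = edl[index]
    delta = new_length - old_length
    edl[index] = (start, new_length, anchored)
    for i in range(index + 1, len(edl)):
        s, l, a = edl[i]
        if not a:                       # anchored entries stay put
            edl[i] = (s + delta, l, a)
    return edl

edl = [(0, 100, False), (100, 50, False), (150, 80, True)]
replace_take(edl, 1, 60)   # replacement take is 10 samples longer
# second entry grows; the anchored third entry stays locked at 150
```

In a picture-dubbing context the anchored entry might be a sound effect locked to a visual event, which must not move when dialogue edits elsewhere change length.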

image

Figure 3.17 Replacing a take in the middle of an edited program. (a) Tape based copy editing results in a gap of fixed size, which may not match the new take length. (b) Non-linear editing allows the gap size to be adjusted to match the new take

3.7.6 Simulation of ‘reel-rocking’

It is common to simulate the effect of reel-rocking in non-linear editors, providing the user with the sonic impression that reels of analog tape are being ‘rocked’ back and forth as they are in analog tape editing when fine-searching edit points. Editors are used to the sound of tape moving in this way, and are skilled at locating edit points when listening to such a sound.

The simulation of variable speed replay in both directions (forwards and backwards) is usually controlled by a wheel or sideways movement of a mouse which moves the ‘tape’ in either direction around the current play location. The magnitude and direction of this movement are used to control the rate at which samples are read from the disk file, via the buffer, and this replaces the fixed sampling rate clock as the controller of the replay rate. Systems differ greatly in the sound quality achieved in this mode, because it is in fact quite a difficult task to provide a convincing simulation. Many attempts have been so poor that some editors do not use the feature, preferring to judge edit points accurately ‘on the fly’, followed by trimming or nudging them either way if they are not successful the first time. Good simulation requires very fast, responsive action and an ergonomically suitable control; a mouse is very unsuitable for the purpose. It also requires a certain amount of DSP to filter the signal correctly, in order to avoid the aliasing that can be caused by varying the sampling rate.
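At its crudest, variable-rate replay means stepping a fractional read position through the buffer and interpolating between adjacent samples. The sketch below uses plain linear interpolation for clarity; as noted above, a convincing system needs proper anti-alias filtering, which this deliberately omits.

```python
import math

def varispeed_read(buffer, position, rate):
    """Read one output sample from `buffer` at fractional `position`,
    then advance by `rate` (negative rates play backwards). Linear
    interpolation only: no anti-alias filtering, so this is a toy."""
    i = math.floor(position)          # floor handles both directions
    frac = position - i
    sample = buffer[i] * (1 - frac) + buffer[i + 1] * frac
    return sample, position + rate

buf = [0.0, 1.0, 0.0, -1.0, 0.0]      # a short test waveform
pos = 0.0
out = []
for _ in range(4):
    s, pos = varispeed_read(buf, pos, 0.5)   # half speed, forwards
    out.append(s)
# out -> [0.0, 0.5, 1.0, 0.5]
```

The wheel or mouse movement would continuously set `rate`, replacing the fixed sample clock; the quality problems the text describes stem from doing this interpolation badly or with too little filtering.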
