10

Digital audio editing

Digital audio editing takes advantage of the freedom to store data in any suitable medium and the signal processing techniques developed in computation. This chapter shows how the edit process is achieved using combinations of storage media, processing and control systems.

10.1 Introduction

Editing ranges from a punch-in on a multi-track recorder, or the removal of ‘ums and ers’ from an interview, to the assembly of myriad sound effects and mixing them with timecode-locked dialogue in order to create a film soundtrack. Mastering is a form of editing where various tracks are put together to make a master recording that will be duplicated for general sale. The duration of each musical piece, the length of any pauses between pieces and the relative levels of the pieces on the disk have to be determined at the time of mastering. The master recording will be compiled from source media that may each contain only some of the pieces required on the final CD, in any order. The recordings will vary in level, and may contain several retakes of a passage.

The purpose of the digital mastering editor is to take each piece, and insert sections from retakes to correct errors, and then to assemble the pieces in the correct order, with appropriate pauses between and with the correct relative levels to create the master tape.

Digital audio editors work in two basic ways, by assembling or by inserting sections of audio waveform to build the finished waveform. Both terms have the same meaning as in the context of video recording. Assembly begins with a blank master file or recording. The beginning of the work is copied from the source, and new material is successively appended to the end of the previous material. Figure 10.1 shows how a master recording is made from source recordings by the process of assembly. Insert editing begins with an existing recording in which a section is replaced by the edit process. Punch-in in multi-track recorders is a form of insert-editing.
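The difference between the two modes can be illustrated with a minimal sketch in which a recording is simply a Python list of samples. The function names `assemble` and `insert` are hypothetical and chosen only for this illustration; a real editor would also crossfade at each join.

```python
def assemble(master, source, src_in, src_out):
    """Assembly: append source[src_in:src_out] to the end of the master."""
    if not 0 <= src_in <= src_out <= len(source):
        raise ValueError("source in/out points are outside the source recording")
    return master + source[src_in:src_out]


def insert(master, at, replacement):
    """Insert edit: replace a section of an existing master, keeping its length."""
    end = at + len(replacement)
    if at < 0 or end > len(master):
        raise ValueError("insert must lie wholly within the existing recording")
    return master[:at] + replacement + master[end:]


# Assembly starts from a blank master and grows it.
master = assemble([], [1, 2, 3, 4, 5], 0, 3)      # beginning of the work
master = assemble(master, [10, 11, 12, 13], 1, 3)  # appended to the end
# An insert replaces a section without changing the overall duration.
patched = insert(master, 1, [99, 98])
```

Assembly always lengthens the master, whereas an insert leaves its duration unchanged, which is why punch-in falls into the second category.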

10.2 Editing with random access media

In all types of audio editing the goal is the appropriate sequence of sounds at the appropriate times. In analog audio equipment, editing was almost always performed using tape or magnetically striped film. These media have the characteristic that the time through the recording is proportional to the distance along the track. Editing consisted of physically cutting and splicing the medium, in order to mechanically assemble the finished work, or of copying lengths of source medium to the master.

Figure 10.1 The function of an editor is to perform a series of assembles to produce a master tape from source tapes.

Whilst open-reel digital audio tape formats support splice editing, in all other digital audio editing, samples from various sources are brought from the storage media to various pages of RAM. The edit is performed by crossfading between sample streams retrieved from RAM and by subsequently rewriting on the output medium. Thus the nature of the storage medium does not affect the form of the edit in any way except the amount of time needed to execute it.

Tapes only allow serial access to data, whereas disks and RAM allow random access and so can be much faster. Editing using random access storage devices is very powerful as the shuttling of tape reels is avoided. The technique is often called non-linear editing.

10.3 Editing on recording media

All digital recording media use error correction which requires an interleave, or reordering, of samples to reduce the impact of large errors, and the assembling of many samples into an error correcting codeword. Codewords are recorded in constant-sized blocks on the medium. Audio editing requires the modification of source material in the correct real-time sequence to sample accuracy. This conflicts with the interleaved, block-based codes of real media.

Editing to sample accuracy simply cannot be performed directly on real media. Even if an individual sample could be located in a block, replacing the samples after it would destroy the codeword structure and render the block uncorrectable.

The only solution is to ensure that the medium itself is only edited at block boundaries so that entire error correction codewords are written down. In order to obtain greater editing accuracy, blocks must be read from the medium and deinterleaved into RAM, modified there and re-interleaved for writing back on the medium, the so-called read-modify-write process.

In disks, blocks are often associated into clusters consisting of a fixed number of blocks in order to increase data throughput. When clustering is used, editing on the disk can only take place by rewriting entire clusters.
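The read-modify-write process can be sketched as follows. This is a toy model under stated assumptions: the block size of eight samples and the even/odd interleave are hypothetical stand-ins for the far larger blocks and more elaborate interleaves of real formats, and no error-correction codewords are computed.

```python
BLOCK_SIZE = 8  # samples per block; hypothetical, real media use much larger blocks


def interleave(samples):
    """Toy interleave: even-numbered samples first, then odd-numbered ones."""
    return samples[0::2] + samples[1::2]


def deinterleave(block):
    """Inverse of interleave(): restore the samples to time order."""
    half = (len(block) + 1) // 2
    out = [0] * len(block)
    out[0::2] = block[:half]
    out[1::2] = block[half:]
    return out


def read_modify_write(medium, start, new_samples):
    """Replace samples on the medium to sample accuracy, rewriting whole blocks.

    medium is a list of interleaved blocks. Returns the numbers of the
    blocks that had to be rewritten.
    """
    if not new_samples:
        return []
    end = start + len(new_samples)
    if start < 0 or end > len(medium) * BLOCK_SIZE:
        raise ValueError("edit lies outside the recording")
    first, last = start // BLOCK_SIZE, (end - 1) // BLOCK_SIZE
    # Read: deinterleave every affected block into RAM, in time order.
    ram = []
    for b in range(first, last + 1):
        ram.extend(deinterleave(medium[b]))
    # Modify: the edit itself is made in RAM to sample accuracy.
    offset = start - first * BLOCK_SIZE
    ram[offset:offset + len(new_samples)] = new_samples
    # Write: re-interleave and write back complete blocks only.
    for i, b in enumerate(range(first, last + 1)):
        medium[b] = interleave(ram[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
    return list(range(first, last + 1))


medium = [interleave(list(range(n, n + BLOCK_SIZE))) for n in (0, 8, 16)]
touched = read_modify_write(medium, 6, [100, 101, 102, 103])
restored = [s for block in medium for s in deinterleave(block)]
```

Although only four samples change, two complete blocks are rewritten because the edit straddles a block boundary. With clustering, the same reasoning would apply at the cluster level.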

10.4 The structure of an editor

The digital audio editor consists of three main areas. First, the various contributory recordings must enter the processing stage at the right time with respect to the master recording. This will be achieved using a combination of timecode, transport synchronization and RAM timebase correction. The synchronizer will take control of the various transports during an edit so that one section reaches its out-point just as another reaches its in-point. Second, the audio signal path of the editor must take the appropriate action, such as a crossfade, at the edit point. This requires some digital processing circuitry. Third, the editing operation must be supervised by a control system which coordinates the operation of the transports and the signal processing to achieve the desired result.

Figure 10.2 A digital audio editor requires an audio path to process the samples, and a timing and synchronizing section to control the time alignment of signals from the various sources. A supervisory control system acts as the interface between the operator and the hardware.

Figure 10.2 shows a simple block diagram of an editor. Each source device, be it disk, tape or some other medium, must produce timecode locked to the audio samples. The synchronizer section of the control system uses the timecode to determine the relative timing of sources and sends remote control signals to the transport to make the timing correct. The master recorder is also fed with timecode in such a way that it can make a contiguous timecode track when performing assembly edits. The control system also generates a master sampling rate clock to which contributing devices must lock in order to feed samples into the edit process. The audio signal processor takes contributing sources and mixes them as instructed by the control system. The mix is then routed to the recorder.

10.5 Timecode

Synchronization between timecode and the sampling rate is essential, otherwise there will be a conflict between the need to lock the various sampling rates in the system and the need to lock the timecodes. This can only be resolved with synchronous timecode. The EBU timecode format relates easily to digital audio sampling rates of 48 kHz, 44.1 kHz and 32 kHz, but it is not so easy with the drop-frame SMPTE timecode necessary for NTSC recording due to the 0.1 per cent slip between the actual field rate and 60 Hz.
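The arithmetic behind this difference can be checked with exact fractions. The sketch below assumes a frame rate of 25 Hz for EBU timecode and 30000/1001 Hz for NTSC, the latter being the usual exact expression of the 0.1 per cent slip; the function names are hypothetical.

```python
from fractions import Fraction

EBU_FRAME_RATE = Fraction(25)            # frames per second
NTSC_FRAME_RATE = Fraction(30000, 1001)  # about 29.97 frames per second


def samples_per_frame(sampling_rate, frame_rate):
    """Exact number of audio samples in one timecode frame."""
    return Fraction(sampling_rate) / frame_rate


def frames_until_whole_samples(sampling_rate, frame_rate):
    """Smallest number of frames that contains a whole number of samples."""
    return samples_per_frame(sampling_rate, frame_rate).denominator


for rate in (48000, 44100, 32000):
    print(rate, samples_per_frame(rate, EBU_FRAME_RATE),
          samples_per_frame(rate, NTSC_FRAME_RATE))
```

At 25 Hz every frame holds a whole number of samples at all three sampling rates. At the NTSC rate a 48 kHz frame holds 1601.6 samples, so frame and sample boundaries coincide only every fifth frame.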

The timecode used in the SMPTE standard for 525/60 is shown in Figure 10.3. PAL VTRs use EBU timecode that is basically similar to SMPTE. These store hours, minutes, seconds and frames as binary-coded decimal (BCD) numbers, which are serially encoded along with user bits into an FM channel code (see Chapter 6) which is recorded on one of the linear audio tracks of the tape. Disks also use timecode for audio synchronization, but the timecode forms part of the access mechanism so that samples may be retrieved by specifying the required timecode. This mechanism was detailed in Chapter 9.

A further problem with the use of video-based timecode is that the accuracy to which the edit must be made in audio is much greater than the frame boundary accuracy needed in video. When the exact edit point is chosen in an audio editor, it will be described to great accuracy and is stored as hours, minutes, seconds, frames and the number of the sample within the frame.
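A sample-accurate edit point of this kind might be converted to and from an absolute sample count as sketched below. The sketch assumes non-drop-frame 25 Hz timecode and a sampling rate that is an exact multiple of the frame rate; the function names and the tuple layout are hypothetical.

```python
def timecode_to_samples(hours, minutes, seconds, frames,
                        sample_in_frame=0, fps=25, rate=48000):
    """Convert hours:minutes:seconds:frames plus sample-within-frame to samples."""
    if rate % fps:
        raise ValueError("sampling rate must be a whole multiple of the frame rate")
    per_frame = rate // fps
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError("timecode field out of range")
    if not 0 <= sample_in_frame < per_frame:
        raise ValueError("sample number must lie within one frame")
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
    return total_frames * per_frame + sample_in_frame


def samples_to_timecode(count, fps=25, rate=48000):
    """Inverse conversion: absolute sample count to (h, m, s, f, sample)."""
    per_frame = rate // fps
    total_frames, sample_in_frame = divmod(count, per_frame)
    total_seconds, frames = divmod(total_frames, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes, seconds, frames, sample_in_frame


edit_point = timecode_to_samples(1, 2, 3, 4, 567)
```

The frame field alone resolves the edit to 1920 samples at 48 kHz; the extra sample field supplies the remaining accuracy.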

10.6 Locating the edit point

Digital audio editors must simulate the ‘rock and roll’ process of edit-point location in analog tape recorders where the tape reels are moved to and fro by hand. The solution is to transfer the recording in the area of the edit point to RAM in the editor. RAM access can take place at any speed or direction and the precise edit point can then be conveniently found by monitoring audio from the RAM.

Figure 10.4 shows how the area of the edit point is transferred to the memory. The source device is commanded to play, and the operator listens to replay samples via a DAC in the monitoring system. The same samples are continuously written into a memory within the editor. This memory is addressed by a counter which repeatedly overflows to give the memory a ring-like structure rather like that of a timebase corrector, but somewhat larger. When the operator hears the rough area in which the edit is required, he will press a button. This action stops the memory writing, not immediately, but one half of the memory contents later. The effect is then that the memory contains an equal number of samples before and after the rough edit point. Once the recording is in the memory, it can be accessed at leisure, and the constraints of the source device play no further part in the edit-point location.
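The behaviour of the ring memory can be sketched in a few lines. The class below is an illustration only: the name `RingMemory`, its methods and the tiny memory size are hypothetical, and a real editor would hold several seconds of audio.

```python
class RingMemory:
    """Memory addressed by an overflowing counter, as in a timebase corrector."""

    def __init__(self, size):
        if size < 2 or size % 2:
            raise ValueError("size must be an even number of samples")
        self.size = size
        self.data = [0] * size
        self.address = 0       # write address; wraps round on overflow
        self.written = 0       # total samples written so far
        self.remaining = None  # None while free-running, else writes left

    def press_button(self):
        """Coarse edit point: keep writing for half the memory, then stop."""
        if self.remaining is None:
            self.remaining = self.size // 2

    def write(self, sample):
        """Store one replay sample; returns False once writing has stopped."""
        if self.remaining == 0:
            return False
        self.data[self.address] = sample
        self.address = (self.address + 1) % self.size
        self.written += 1
        if self.remaining is not None:
            self.remaining -= 1
        return True

    def contents(self):
        """Stored samples in time order, oldest first."""
        if self.written < self.size:
            return self.data[:self.written]
        return self.data[self.address:] + self.data[:self.address]


ring = RingMemory(8)
for n in range(100):       # the source device plays samples 0, 1, 2, ...
    if n == 50:
        ring.press_button()  # operator hears the rough edit area
    ring.write(n)
captured = ring.contents()
```

With the button pressed as sample 50 arrives, the memory ends up holding samples 46 to 53: four before the coarse edit point and four from it onwards.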

Figure 10.3 In SMPTE standard timecode, the frame number and time are stored as eight BCD symbols. There is also space for 32 user-defined bits. The code repeats every frame. Note the asymmetrical sync word which allows the direction of tape movement to be determined.

Figure 10.4 The use of a ring memory which overwrites allows storage of samples before and after the coarse edit point.

There are a number of ways in which the memory can be read. If the memory address is supplied from a counter clocked at the appropriate rate, the edit area can be replayed at normal speed, or at some fraction of normal speed repeatedly. In order to simulate the analog method of finding an edit point, the operator is provided with a scrub wheel or rotor, and the memory address will change at a rate proportional to the speed with which the rotor is turned, and in the same direction. Thus the sound can be heard forward or backward at any speed, and the effect is exactly that of manually rocking an analog tape past the heads of an ATR.

The operation of a scrub wheel encoder was shown in Chapter 3. Although a simple device, there are some difficulties to overcome. There are not enough pulses per revolution to create a clock directly and the human hand cannot turn the rotor smoothly enough to address the memory directly without flutter. A phase-locked loop is generally employed to damp fluctuations in rotor speed and multiply the frequency. A standard sampling rate must be recreated to feed the monitor DAC and a rate convertor, or interpolator, is necessary to restore the sampling rate to normal. These items can be seen in Figure 10.5.
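The following sketch gives a rough software analogue of this arrangement. It is a simplification under stated assumptions: a one-pole smoothing filter stands in for the phase-locked loop, and linear interpolation stands in for a proper rate convertor. The function name `scrub` and its parameters are hypothetical.

```python
def scrub(memory, start, rotor_speeds, smoothing=0.1):
    """Read the edit memory at a speed and direction set by the rotor.

    rotor_speeds holds one requested speed per output sample period
    (1.0 is normal speed forwards, negative values play backwards).
    Returns one output sample per period, so the monitor DAC always
    receives samples at the standard rate.
    """
    last = len(memory) - 1
    address = float(start)
    speed = 0.0
    output = []
    for target in rotor_speeds:
        # Damp fluctuations in rotor speed (stand-in for the PLL).
        speed += smoothing * (target - speed)
        # Advance the fractional memory address, staying inside the memory.
        address = min(max(address + speed, 0.0), float(last))
        # Interpolate between the two nearest stored samples.
        index = int(address)
        fraction = address - index
        following = min(index + 1, last)
        output.append((1.0 - fraction) * memory[index]
                      + fraction * memory[following])
    return output


ramp = [float(n) for n in range(100)]
half_speed = scrub(ramp, 10, [0.5] * 4, smoothing=1.0)
reverse = scrub(ramp, 10, [-1.0] * 3, smoothing=1.0)
```

Setting `smoothing` to 1.0 disables the damping, which makes the address arithmetic easy to follow; smaller values make the replay speed lag behind sudden movements of the rotor.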

Figure 10.5 In order to simulate the edit location of analog recorders, the samples are read from memory under the control of a hand-operated rotor.

The act of pressing the coarse edit-point button stores the timecode of the source at that point, which is frame-accurate. As the rotor is turned, the memory address is monitored, and used to update the timecode to sample accuracy. Before assembly can be performed, two edit points must be determined, the out-point at the end of the previously recorded signal, and the in-point at the beginning of the new signal. The editor’s microprocessor stores these in an edit decision list (EDL) in order to control the automatic assemble process.

10.7 Editing with disk drives

Using one or other of the above methods, an edit list can be made which contains an in-point, an out-point and an audio filename for each of the segments of audio which need to be assembled to make the final work, along with a crossfade period and a gain parameter. This edit list will also be stored on the disk. When a preview of the edited work is required, the edit list is used to determine what files will be necessary and when, and this information drives the disk controller.

Figure 10.6 shows the events during an edit between two files. The edit list causes the relevant audio blocks from the first file to be transferred from disk to memory. These blocks will be accessed by the signal processor in order to produce the preview output. As the edit point approaches, the disk controller will also place blocks from the incoming file into the memory. It can do this because the rapid data-transfer rate of the drive allows blocks to be transferred to memory much faster than real time, leaving time for the positioner to seek from one file to another. In different areas of the memory there will be simultaneously the end of the outgoing recording and the beginning of the incoming recording. The signal processor will use the fine edit-point parameters to work out the relationship between the actual edit points and the cluster boundaries. The relationship between the cluster on disk and the RAM address to which it was transferred is known, and this allows the memory address to be computed in order to obtain samples with the correct timing.

Figure 10.6 In order to edit together two audio files, they are brought to memory sequentially. The audio processor accesses file pages from both together, and performs a crossfade between them. The silo produces the final output at a constant, steady sampling rate.

Before the edit point, only samples from the outgoing recording are accessed, but as the crossfade begins, samples from the incoming recording are also accessed, multiplied by the gain parameter and then mixed with samples from the outgoing recording according to the crossfade period required. The output of the signal processor becomes the edited preview material, which can be checked for the required subjective effect. If necessary the in- or out-points can be trimmed, or the crossfade period changed, simply by modifying the edit-list file. The preview can be repeated as often as needed, until the desired effect is obtained. At this stage the edited work does not exist as a file, but is recreated each time by a further execution of the EDL. Thus a lengthy editing session need not fill up the disk.
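Executing an edit list of this kind can be sketched as below. The `Segment` fields mirror the parameters named in the text, but the class, the function `execute_edl`, the linear fade law and the in-memory dictionary standing in for disk files are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    filename: str
    in_point: int        # first sample used from the file
    out_point: int       # one past the last sample used
    crossfade: int = 0   # samples overlapped with the previous segment
    gain: float = 1.0


def execute_edl(edl, files):
    """Build the preview output by reading the source files; none is modified."""
    output = []
    for segment in edl:
        source = files[segment.filename][segment.in_point:segment.out_point]
        incoming = [segment.gain * sample for sample in source]
        overlap = min(segment.crossfade, len(output), len(incoming))
        start = len(output) - overlap
        for k in range(overlap):
            # The incoming weight rises linearly across the crossfade period.
            weight = (k + 1) / (overlap + 1)
            output[start + k] = ((1.0 - weight) * output[start + k]
                                 + weight * incoming[k])
        output.extend(incoming[overlap:])
    return output


files = {"take1": [1.0] * 10, "take2": [0.0] * 10}
edl = [Segment("take1", 0, 6), Segment("take2", 2, 8, crossfade=3)]
preview = execute_edl(edl, files)
```

Trimming an edit point or changing the crossfade period amounts to altering a `Segment` and calling `execute_edl` again, which is the sense in which the edited work is recreated on each preview rather than stored.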

It is important to realize that at no time during the edit process were the original audio files modified in any way. The edit process was performed solely by reading the audio files. The power of this approach is that if an edit list is created wrongly, the original recording is not damaged, and the problem can be put right simply by correcting the edit list. The advantage of a disk-based system for such work is that location of edit points, previews and reviews are all performed almost instantaneously, because of the random access of the disk. This can reduce the time taken to edit a program to a quarter of that needed with a tape machine [1].

During an edit, the disk drive has to provide audio files from two different places on the disk simultaneously, and so it has to work much harder than for a simple playback. If there are many close-spaced edits, the drive may be hard-pressed to keep ahead of real time, especially if there are long crossfades, because during a crossfade the source data rate is twice as great as during replay. A large buffer memory helps this situation because the drive can fill the memory with files before the edit actually begins, and thus the instantaneous sample rate can be met by the memory’s emptying during disk-intensive periods. In practice crossfades measured in seconds can be achieved in a disk-based system, a figure not matched by tape systems.
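A rough estimate of how much audio must already be in the buffer before a crossfade starts can be made as follows. This is a deliberately crude model, not a description of any particular system: it assumes the drive delivers a constant sustained rate including seeks, and that demand is exactly twice the replay rate for the whole crossfade. The function name and the figures in the example are hypothetical.

```python
def prefill_bytes(sample_rate, bytes_per_sample, channels,
                  disk_rate, crossfade_seconds):
    """Bytes that must be buffered in advance to survive one crossfade.

    disk_rate is the sustained transfer rate of the drive in bytes per
    second, averaged over seeks between the two files.
    """
    replay_rate = sample_rate * bytes_per_sample * channels
    demand = 2 * replay_rate                 # two files read at once
    shortfall = max(0, demand - disk_rate)   # what the drive cannot supply
    return shortfall * crossfade_seconds


# 48 kHz, 16-bit stereo needs 192 000 bytes/s for simple replay.
needed = prefill_bytes(48000, 2, 2, disk_rate=300000, crossfade_seconds=5)
```

With these assumed figures the drive falls 84 000 bytes/s short during the crossfade, so a five-second crossfade requires 420 000 bytes to have been loaded beforehand. A drive able to sustain twice the replay rate would need no pre-filling at all.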

Once the editing is finished, it will generally be necessary to transfer the edited material to form a contiguous recording so that the source files can make way for new work. If the source files already exist on tape the disk files can simply be erased. If the disks hold original recordings they will need to be backed up to tape if they will be required again. In large broadcast systems, the edited work can be broadcast directly from the disk file. In smaller systems it will be necessary to output to some removable medium, since the Winchester drives in the editor have fixed media. It is only necessary to connect the AES/EBU output of the signal processor to any type of digital recorder, and then the edit list is executed once more. The edit sequence will be performed again, exactly as it was during the last preview, and the results will be recorded on the external device.

10.8 Editing in DAT

In order to edit a DAT tape, many of the constraints of video editing apply. Editing can only take place at the beginning of an interleave block, known as a frame, which is contained in two diagonal tracks. The transport would need to perform a preroll, starting before the edit point, so that the drum and capstan servos would be synchronized to the tape tracks before the edit was reached. Fortunately, the very small drum means that mechanical inertia is minute by the standards of video recorders, and lock-up can be very rapid.

Although editing can be done on a DAT machine that can only record or play, a better solution, used in professional machines, is to fit two sets of heads in the drum. The standard permits the drum size to be increased and the wrap angle to be reduced provided that the tape tracks are recorded to the same dimensions. In normal recording, the first heads to reach the tape tracks would make the recording, and the second set of heads would be able to replay the recording immediately afterwards for confidence monitoring. For editing, the situation would be reversed. The first heads to meet a given tape track would play back the existing recording, and this would be de-interleaved and corrected, and presented as a sample stream to the record circuitry. The record circuitry would then interleave the samples ready for recording. If the heads are mounted a suitable distance apart in the scanner along the axis of rotation, the time taken for tape to travel from the first set of heads to the second will be equal to the decode/encode delay. If this process goes on for a few blocks, the signal going to the record head will be exactly the same as the pattern already on the tape, so the record head can be switched on at the beginning of an interleave block. Once this has been done, new material can be crossfaded into the sample stream from the advanced replay head, and an edit will be performed.

If insert editing is contemplated, following the above process, it will be necessary to crossfade back to the advanced replay samples before ceasing rerecording at an interleave block boundary. The use of overwrite to produce narrow tracks causes a problem at the end of such an insert. Figure 10.7 shows that this produces a track half the width it should be. Normally the error-correction system would take care of the consequences, but if a series of inserts were made at the same point in an attempt to make fine changes to an edit, the result could be an extremely weak signal for the duration of one track. One solution is to incorporate an algorithm into the editor so that the points at which the tape begins and ends recording change on every attempt. This does not affect the audible result as this is governed by the times at which the crossfader operates.

Figure 10.7 When editing a small track-pitch recording, the last track written will be 1.5 times the normal track width, since that is the width of the head. This erases half of the next track of the existing recording.

10.9 Editing in open-reel digital recorders

On many occasions in studio recording it is necessary to replace a short section of a long recording, because a wrong note was played or something fell over and made a noise. The tape is played back to the musicians before the bad section, and they play along with it. At a musically acceptable point prior to the error, the tape machine passes into record, a process known as punch-in, and the offending section is rerecorded. At another suitable time, the machine ceases recording at the punch-out point, and the musicians can subsequently stop playing.

Once more, a read-modify-write approach is necessary, using a record head positioned after the replay head. The mechanism necessary is shown in Figure 10.8. Prior to the punch-in point, the replay-head signal is de-interleaved, and this signal is fed to the record channel. The record channel re-interleaves the samples, and after some time will produce a signal identical to what is already on the tape. At a block boundary the record current can be turned on, when the existing recording will be rerecorded. At the punch-in point, the samples fed to the record encoder will be crossfaded to samples from the ADC. The crossfade takes place in the noninterleaved domain. The new recording is made to replace the unsatisfactory section, and at the end, punch-out is performed by crossfading to the samples from the replay head. After some time, the record head will once more be rerecording what is already on the tape, and at a block boundary the record current can be switched off. The crossfade duration can be chosen according to the nature of the recorded material. It is possible to rehearse the punch-in process and monitor what it would sound like by feeding headphones from the crossfader, and doing everything described except that the record head is disabled. The punch-in and punch-out points can then be moved to give the best subjective result. The machine can learn the sector addresses at which the punches take place, so the final punch is fully automatic.
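The sequence of events in a punch-in and punch-out can be modelled in the noninterleaved domain as sketched below. The model ignores the interleave and the decode/encode delay, uses a linear fade, and switches the record current at the nearest enclosing block boundaries; the function name `punch_in_out` and its parameters are hypothetical.

```python
def punch_in_out(tape, incoming, punch_in, punch_out, fade, block_size):
    """Return (new tape contents, record-on address, record-off address).

    tape and incoming are equal-length lists of time-aligned samples.
    Recording is only switched on and off at block boundaries.
    """
    total = len(tape)
    if len(incoming) != total:
        raise ValueError("tape and incoming material must be time-aligned")
    if not (0 <= punch_in and punch_in + fade <= punch_out
            and punch_out + fade <= total):
        raise ValueError("punch points and fades must fit within the tape")

    def weight(n):
        """Share of incoming signal fed to the record encoder at sample n."""
        if n < punch_in or n >= punch_out + fade:
            return 0.0                                   # replay samples only
        if n < punch_in + fade:
            return (n - punch_in + 1) / (fade + 1)       # fading in
        if n < punch_out:
            return 1.0                                   # new material only
        return 1.0 - (n - punch_out + 1) / (fade + 1)    # fading back out

    feed = [(1.0 - weight(n)) * tape[n] + weight(n) * incoming[n]
            for n in range(total)]
    # Record current goes on at the block boundary at or before punch-in
    # and off at the first boundary at or after the end of the fade-out.
    record_on = (punch_in // block_size) * block_size
    record_off = min(total, -(-(punch_out + fade) // block_size) * block_size)
    result = tape[:record_on] + feed[record_on:record_off] + tape[record_off:]
    return result, record_on, record_off


tape = [0.0] * 16
new_take = [1.0] * 16
result, rec_on, rec_off = punch_in_out(tape, new_take, 5, 10, 2, block_size=4)
```

Because the encoder is fed pure replay samples outside the fades, the samples rerecorded between the record-on boundary and the punch-in point are identical to those already on the tape. A rehearsal corresponds to listening to `feed` without ever committing `result`.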

Assemble editing, where parts of one or more source tapes are dubbed from one machine to another to produce a continuous recording, is performed in the same way as a punch-in, except that the punch-out never comes. After the new recording from the source machine is faded in, the two machines continue to dub until one of them is stopped. This will be done some time after the next assembly point is reached.

10.10 Jump editing

Conventional splice handling in stationary head recorders was detailed in Chapter 8. In an extension to the principle, suggested by Lagadec [2], the samples from the area of the splice are not heard. Instead an electronic edit is made between the samples before the splice and those after.

Figure 10.8 The four stages of an insert (punch-in/out) with interleaving: (a) rerecord existing samples for at least one constraint length; (b) crossfade to incoming samples (punch-in point); (c) crossfade to existing replay samples (punch-out point); (d) rerecord existing samples for at least one constraint length. An assemble edit consists of steps (a) and (b) only.

In this system, a tape splice is made physically with excess tape adjacent to the intended edit points. The timebase corrector has two read-address generators that can access the memory independently. It will be seen in Figure 10.9 that when the machine plays the tape, the capstan is phase-advanced so that the timebase corrector is causing a long delay to compensate. As the splice is detected, the corruption due to the splice enters the TBC memory and travels towards the output. As the splice nears the end of the memory, the machine output crossfades to a signal from the second TBC output that has been delayed much less. The data in the area of the tape splice are thus omitted. The capstan will now be effectively lagging because the delay has been shortened, and it will speed up slightly for a short period until the lead condition is re-established. This can be done without ill effect since the sample rate from the memory remains constant throughout. Although the splice is an irrevocable mechanical act, the precise edit timing can be changed at will by controlling the sector address at which the TBC jumps, which determines the out-point, and the address difference, which determines the length of tape omitted, and thus controls the in-point. The size of the jump is limited by the available memory.
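The effect of the two read addresses can be sketched by treating the timebase corrector memory as a list of replay samples. The function name `jump_edit`, its parameters and the linear fade are assumptions for illustration; the capstan behaviour and the limited memory size are not modelled.

```python
def jump_edit(samples, out_point, jump, fade):
    """Omit `jump` samples after `out_point`, crossfading across the gap.

    out_point models the address at which the TBC jumps, and jump models
    the address difference between the two read-address generators.
    """
    if min(out_point, jump, fade) < 0:
        raise ValueError("out_point, jump and fade must not be negative")
    if out_point + jump + fade > len(samples):
        raise ValueError("jump extends beyond the available samples")
    output = list(samples[:out_point])
    for k in range(fade):
        weight = (k + 1) / (fade + 1)
        delayed = samples[out_point + k]          # long-delay read address
        advanced = samples[out_point + jump + k]  # short-delay read address
        output.append((1.0 - weight) * delayed + weight * advanced)
    output.extend(samples[out_point + jump + fade:])
    return output


replay = [float(n) for n in range(20)]
hard_cut = jump_edit(replay, 5, 6, 0)
faded = jump_edit(replay, 5, 6, 2)
```

In both cases the output is shorter than the input by exactly the jump distance. Moving `out_point` shifts the out-point of the edit, while changing `jump` alters how much tape is omitted and hence the in-point.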

Figure 10.9 Jump editing. (a) Splice approaches, capstan is advanced, and audio is delayed. (b) Splice passes head, and error burst travels down delay. (c) Crossfader fades to signal after splice. (d) Capstan accelerates, and delay increases. When the delay tap reaches the end, the crossfader can switch back ready for the next splice.

If only a short section of audio is to be removed, no splice is necessary at all as a memory jump can be used to omit a short length of the recording. Such a system would be excellent for news broadcasts where it is often necessary to remove many short sections of tape to eliminate hesitations and unwanted pauses from interviews. Control of the jumping could be by programming a CPU to recognize timecode or sector addresses and insert the commands, or by inserting the jump distance in the reference track prior to the splice. In either case machines not equipped to jump would handle any splices with mechanically determined timing.

Jump editing can also be used in rotary head recorders such as DAT and the Nagra-D. Rotary head machines have a low linear tape speed and so can accelerate the tape to omit quite long sections whilst replay continues from memory.

References

1. Todoroki, S., et al., New PCM editing system and configuration of total professional digital audio system in near future. Presented at the 80th Audio Engineering Society Convention (Montreux, 1986), Preprint 2319(A8).

2. Lagadec, R., Current status in digital audio. Presented at the IERE Video and Data Recording Conference (Southampton, 1984).
