CHAPTER 15

Synchronization

CHAPTER CONTENTS

SMPTE/EBU Timecode

Recording Timecode

Machine Synchronizers

Overview

Chase synchronizer

Full-featured synchronizer

Digital Audio Synchronization

Requirements for digital audio synchronization

Timecode synchronization of audio workstations

Digital audio signal synchronization

Sample clock jitter and effects on sound quality

MIDI and Synchronization

Introduction to MIDI synchronization

Music-related timing data

MIDI timecode (MTC)

Synchronizing Audio/MIDI Computer Applications

In this chapter the basics of timecode and synchronization are discussed. In the days of analog recording the need for synchronization of audio signals was not obvious, whereas it has always been an issue for video systems. This was because analog recordings were not divided up into samples, blocks or frames that had to happen at specific instants in time – they were time-continuous entities with no explicit time structure. There was nonetheless a requirement to synchronize the speeds of recording and replay machines in some cases, particularly when it became necessary to run them alongside video machines, or to lock two analog recorders together. This was essentially what was meant by machine synchronization, and SMPTE/EBU timecode of some sort, based on video timing structures, was usually used as a timing reference. A form of this timecode is still used as a positional reference in digital audio and video systems, and a MIDI equivalent is also possible, as described below.

In these days of digital audio and video, the use of signal synchronization is unavoidable. For example, in order to handle multiple streams of either type of signal in a mixer or recording system it is usually necessary for them to be running at the same speed, having the same sampling frequency, and often with their frames, blocks or samples aligned in time. If not, all sorts of problems can arise, ranging from a complete failure to operate, to glitches, clicks and speed errors. In order to transfer audio from one machine to another the machines generally have to be operating at the same sampling frequency, and may need to be locked to a common reference. At the very least the receiving device needs to be able to lock to the sending device’s sample clock. In such cases timecode is not usually adequate as a reference signal, and a more accurate clock signal that relates to digital audio samples is required. In many cases timecode and sampling frequency synchronization go hand in hand, the timecode providing a positional reference and the sample or word clock providing a fine-grained reference point for the individual audio samples.

SMPTE/EBU TIMECODE

The Society of Motion Picture and Television Engineers proposed a system to facilitate the accurate editing of video tape in 1967. This became known as SMPTE (‘simpty’) code, and it is basically a continuously running eight-digit clock registering time from an arbitrary start point (which may be the time of day) in hours, minutes, seconds and frames, against which the program runs. The clock information was encoded into a signal which could be recorded on the audio track of a tape. Every single frame on a particular video tape had its own unique number, called the timecode address, and this could be used to pinpoint a precise editing position.

A number of frame rates are used, depending on the television standard to which they relate, the frame rate being the number of still frames per second used to give the impression of continuous motion: 30 frames per second (fps), or true SMPTE, was used for monochrome American television and for CD mastering in the Sony 1630 format (see Chapter 9); 29.97 fps is used for color NTSC television (mainly USA, Japan and parts of the Middle East), and is called ‘SMPTE drop-frame’ (see Fact File 15.1); 25 fps is used for PAL and SECAM TV and is called ‘EBU’ (Europe, Australia, etc.); and 24 fps is used for some film work.

FACT FILE 15.1 DROP-FRAME TIMECODE

When color TV (NTSC standard) was introduced in the USA it proved necessary to change the frame rate of TV broadcasts slightly in order to accommodate the color information within the same spectrum. The 30 fps of monochrome TV, originally chosen so as to lock to the American mains frequency of 60 Hz, was thus changed to 29.97 fps, since there was no longer a need to maintain synchronism with the mains owing to improvements in oscillator stability. In order that 30 fps timecode could be made synchronous with the new frame rate it became necessary to drop two frame numbers at the start of every minute, except for every tenth minute, which resulted in minimal long-term drift between timecode and picture (75 ms over 24 hours). The drift in the short term gradually increased towards the minute boundaries and was then reset.

A flag is set in the timecode word to denote NTSC drop-frame timecode. This type of code should be used for all applications where the recording might be expected to lock to an NTSC video program.
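The dropped-frame arithmetic can be illustrated with a short sketch (Python is used here purely for illustration). It assumes the standard rule that frame numbers 0 and 1 are omitted at the start of every minute except minutes divisible by ten:

```python
def df_to_frame_count(h, m, s, f):
    """Convert a 29.97 fps drop-frame address to a true frame count.

    Frame numbers 0 and 1 are skipped at the start of each minute,
    except for every tenth minute (00, 10, 20, 30, 40, 50).
    """
    total_minutes = h * 60 + m
    dropped = 2 * (total_minutes - total_minutes // 10)
    return (h * 3600 + m * 60 + s) * 30 + f - dropped

# 00:01:00:02 directly follows 00:00:59:29, because frames
# 00:01:00:00 and 00:01:00:01 do not exist in drop-frame code.
assert df_to_frame_count(0, 1, 0, 2) == 1800

# Over one hour, 2 frames x 54 'dropping' minutes = 108 frames are skipped.
assert df_to_frame_count(1, 0, 0, 0) == 3600 * 30 - 108
```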

Each timecode frame is represented by an 80-bit binary ‘word’, split principally into groups of 4 bits, with each 4-bit group representing a particular parameter such as tens of hours, units of hours, and so forth, in BCD (binary-coded decimal) form (see Figure 15.1). Sometimes not all 4 bits per group are required – the hours only go up to ‘23’, for example – and in these cases the remaining bits are either used for special control purposes or set to zero (unassigned). In total, 26 bits are used for time address information to give each frame its unique hours, minutes, seconds, frames value; 32 are ‘user bits’ and can be used for encoding information such as reel number, scene number, day of the month and the like; bit 10 denotes drop-frame mode if a binary 1 is encoded there, and bit 11 similarly denotes color frame mode. The end of each word consists of 16 bits in a unique sequence, called the ‘sync word’, which marks the boundary between one frame and the next. It also allows the reader to tell in which direction the code is being read, since the sync word begins with 11 in one direction and 10 in the other.
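As an illustration of the BCD grouping, the following sketch splits a time address into the tens and units digits that occupy the word’s 4-bit groups. This is a simplified model: the user bits, flag bits and sync word are deliberately omitted.

```python
def bcd_groups(hours, minutes, seconds, frames):
    """Split each two-digit time value into (tens, units) digits,
    as carried in the 4-bit BCD groups of the timecode word.
    User bits, flag bits and the sync word are omitted here."""
    def split(v):
        return v // 10, v % 10
    return {"frames": split(frames), "seconds": split(seconds),
            "minutes": split(minutes), "hours": split(hours)}

# The tens-of-hours digit never exceeds 2, so its group needs only
# 2 of the 4 available bits; the spare bits carry flags or are unassigned.
assert bcd_groups(23, 59, 59, 24)["hours"] == (2, 3)
```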

If this data is to be recorded as an audio signal it is modulated in a simple scheme known as ‘bi-phase mark’, or FM, such that a transition from one state to the other (low to high or high to low) occurs at the edge of each bit period, but an additional transition is forced within the period to denote a binary 1 (see Figure 15.2). The result looks like a square wave with two frequencies, depending on the presence of ones and zeros in the code. Depending on the frame rate, the maximum frequency of square wave contained within the timecode signal is either 2400 Hz (80 bits × 30 fps) or 2000 Hz (80 bits × 25 fps), and the lowest frequency is either 1200 Hz or 1000 Hz, and thus it may easily be recorded on an audio machine. The code can be read forwards or backwards, and is unaffected by phase inversion. Readers are available which will read timecode over a very wide range of speeds, from around 0.1 to 200 times play speed. The rise-time of the signal, that is the time it takes to swing between its two extremes, is specified as 25 μs ± 5 μs, and this requires an audio bandwidth of about 10 kHz.
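The bi-phase mark rule is easy to model: the level always toggles at each bit-cell boundary, and toggles once more in mid-cell for a binary 1. A minimal sketch, representing each cell as a pair of half-cell levels:

```python
def biphase_mark(bits):
    """Encode a bit sequence using bi-phase mark (FM).

    The output level toggles at every bit-cell boundary, and toggles
    again in the middle of the cell when the bit is a 1. Each cell is
    returned as a (first_half, second_half) pair of levels.
    """
    level = 0
    cells = []
    for bit in bits:
        level ^= 1            # transition at the cell boundary
        first = level
        if bit:
            level ^= 1        # extra mid-cell transition for a '1'
        cells.append((first, level))
    return cells

# A '0' holds one level for the whole cell (the lower frequency);
# a '1' splits the cell in two (the higher frequency).
cells = biphase_mark([0, 1])
assert cells[0][0] == cells[0][1]   # 0: no mid-cell transition
assert cells[1][0] != cells[1][1]   # 1: mid-cell transition
```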

image

FIGURE 15.1 The data format of an SMPTE/EBU longitudinal timecode frame.

There is another form of timecode known as VITC (Vertical Interval Timecode), used widely in VTRs. VITC is recorded not on an audio track, but in the vertical sync period of a video picture, such that it can always be read when video is capable of being read, such as in slow-motion and pause modes. This will not be covered further here.

RECORDING TIMECODE

In the days of tape, timecode was recorded or ‘striped’ onto tape as an audio signal before, during or after the program material was recorded, depending on the application. In the case of disk-based digital systems the timecode is not usually recorded as an audio signal, but a form of time-stamping can be used to indicate the start times of audio files, from which an ongoing timecode can be synthesized, locked to the sampling frequency. The timecode should be locked to the same speed reference as that used to lock the speed of the tape machine or the sampling frequency of a digital system, otherwise a long-term drift can build up between one and the other. Such a reference is usually provided in the form of a video composite sync signal (or black and burst signal), and video sync inputs are often provided on digital tape recorders for this purpose. An alternative is to use a digital audio word clock signal, and this should also be locked to video syncs if they are present.

Timecode generators are available in a number of forms, either as stand-alone devices (such as that pictured in Figure 15.3), as part of a synchronizer or editor, or as part of a recording system. In large centers timecode is sometimes centrally distributed and available on a jackfield point. When generated externally, timecode normally appears as an audio signal on an XLR connector or jack, and this should be routed to the timecode input of any slave systems (slaves are devices expected to lock to the master timecode). Most generators allow the user to preset the start time and the frame-rate standard.

On tape, timecode was often recorded onto an outside track of a multitrack machine (usually track 24), or a separate timecode or cue track was provided on digital audio or video machines. The signal was recorded at around 10 dB below reference level, and crosstalk between tracks or cables was often a problem due to the very audible mid-frequency nature of timecode. Some quarter-inch analog machines had a facility for recording timecode in a track which ran down the center of the guard band in the NAB track format (see Chapter 6). This was called ‘center-track timecode’, and a head arrangement similar to that shown in Figure 15.4 was used for recording and replay. Normally timecode was recorded using separate heads from those used for audio, to avoid crosstalk, although some manufacturers circumvented this problem and used the same heads. In the former case a delay line was used to synchronize timecode and audio on the tape.

image

FIGURE 15.2 Linear timecode data is modulated before recording using a scheme known as ‘bi-phase mark’ or FM (frequency modulation). A transition from high to low or low to high occurs at every bit-cell boundary, and a binary ‘1’ is represented by an additional transition within a bit cell.

Professional R-DAT machines were capable of recording timecode, this being converted internally into a DAT running-time code which was recorded in the subcode area of the digital recording. On replay, any frame rate of timecode could be derived, no matter what was used during recording, which was useful in mixed-standard environments.

In mobile film and video work which often employs separate machines for recording sound and picture it is necessary to stripe timecode on both the camera’s tape or film and on the audio tape. This can be done by using the same timecode generator to feed both machines, but more usually each machine will carry its own generator and the clocks will be synchronized at the beginning of each day’s shooting, both reading absolute time of day. Highly stable crystal control ensures that sync between the clocks will be maintained throughout the day, and it does not then matter whether the two (or more) machines are run at different times or for different lengths of time because each frame has a unique time of day address code which enables successful post-production syncing.

The code should run for around 20 seconds or more before the program begins in order to give other machines and computers time to lock in. If the program is spread over several reels, the timecode generator should be set and run such that no number repeats itself anywhere throughout the reels, thus avoiding confusion during post-production. Alternatively the reels can be separately numbered.

image

FIGURE 15.3 A stand-alone timecode generator. (Courtesy of Avitel Electronics Ltd.)

image

FIGURE 15.4 The center-track timecode format on quarter-inch tape. (a) Delays are used to record and replay a timecode track in the guard band using separate heads. (Alternatively, specially engineered combination heads may be used.) (b) Physical dimensions of the center-track timecode format.

MACHINE SYNCHRONIZERS

Overview

A machine synchronizer was a device that read timecode from two or more machines and controlled the speeds of ‘slave’ machines so that their timecodes ran at the same rate as the ‘master’ machine. It did this by modifying the capstan speed of the slave machines, using an externally applied speed reference signal, usually in the form of a 19.2 kHz square wave whose frequency is used as a reference in the capstan servo circuit (see Figure 15.5). Such a synchronizer would be microprocessor controlled, and could incorporate offsets between the master and slave machines, programmed by the user. Often it would store pre-programmed points for such functions as record drop-in, drop-out, looping and autolocation, for use in post-production. Now that dedicated audio tape recorders are not so common these synchronization functions are often found as integral features of digital workstation software or dedicated disk recorders.

image

FIGURE 15.5 Capstan speed control is often effected using a servo circuit similar to this one. The frequency of a square wave pulse generated by the capstan tachometer is compared with an externally generated pulse of nominally the same frequency. A signal based on the difference between the two is used to drive the capstan motor faster or slower.

image

FIGURE 15.6 A simple chase synchronizer will read timecode, direction and tachometer information from the master, compare it with the slave’s position and control the slave accordingly until the two timecodes are identical (plus or minus any entered offset).

Chase synchronizer

A simple chase synchronizer could simply be a box with a timecode input for master and slave machines and a remote control interface for each machine (see Figure 15.6). Such a synchronizer was designed to cause the slave to follow the master wherever it went, like a faithful hound. If the master went into fast forward so did the slave, the synchronizer keeping the position of the slave as close as possible to the master, and when the master went back into play the synchronizer parked the slave as close as possible to the master position and then dropped it into play, adjusting the capstan speed to lock the two together.
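The decision logic of a chase synchronizer can be sketched roughly as follows. The thresholds and the returned ‘commands’ are illustrative assumptions, not a real controller interface:

```python
def chase_action(master_frames, slave_frames, fps=25, window_s=2):
    """Decide what a simple chase synchronizer should do, given the
    master and slave positions in frames (a hypothetical sketch).

    Far from the master, the slave is spooled towards it; close to
    the master, the slave plays and its capstan speed is trimmed by
    an amount related to the remaining position error.
    """
    error = master_frames - slave_frames
    if abs(error) > window_s * fps:
        return ("spool", error)      # locate towards the master
    return ("play", error)           # lock by trimming capstan speed

# Far away: locate at speed; nearly parked: drop into play and lock.
assert chase_action(1000, 100)[0] == "spool"
assert chase_action(1000, 995) == ("play", 5)
```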

Systems varied as to what they would do if the master timecode dropped out or jumped in time. In the former case most synchronizers waited a couple of seconds or so before stopping the slave, and in the latter case they tried to locate the slave to the new position (this depends on the type of lock employed, as discussed in Fact File 15.2).

FACT FILE 15.2 TYPES OF LOCK

Frame lock or absolute lock

This term or a similar term is used to describe the mode in which a synchronizer works on the absolute time values of master and slave codes. If the master jumps in time, due to a discontinuous edit, for example, then so does the slave, often causing the slave to spool off the end of the reel if it does not have such a value on the tape.

Phase lock or sync lock

These terms are often used to describe a mode in which the synchronizer initially locks to the absolute value of the timecode on master and slaves, switching thereafter to a mode in which it simply locks to the frame edges of all machines, looking at the sync word in the timecode and ignoring the absolute value. This is useful if discontinuities in the timecode track are known or anticipated, and ensures that a machine will not suddenly drop into a fast spool mode during a program.

Slow and fast relock

After initial lock is established, a synchronizer may lose lock due to a timecode drop-out or discontinuity in timecode phase. In fast relock mode the synchronizer will attempt to relock the machines as quickly as possible, with no concern for the audible effects of pitch slewing. In slow relock mode, the machines will relock more slowly at a rate intended to be inaudible.

Full-featured synchronizer

In post-production operations a controller was often required which offered more facilities than the simple chase synchronizer. Such a device allowed multiple machines to be controlled from a single controller, perhaps using a computer network link to communicate commands from the controller to the individual tape machines. In some ‘distributed intelligence’ systems, each tape machine had a local chase synchronizer which communicated with the controller, the controller not being a synchronizer but a ‘command center’. The ESbus, a standardized remote control bus for audio and video equipment, was often used in such applications.

The sync controller in such a system offered facilities for storing full edit decision lists (EDLs) containing the necessary offsets for each slave machine and the record drop-in and drop-out points for each machine. This could be used for jobs such as automatic dialog replacement (ADR), in which sections of a program could be set to loop with a pre-roll (see Fact File 15.3) and drop-in at the point where dialog on a film or video production was to be replaced. A multitrack recorder could be used as a slave, being dropped in on particular tracks to build up a sound master tape. Music and effects would then be overdubbed.

In locked systems involving video equipment the master machine was normally the video machine, and the slaves were audio machines. This was because it was easier to synchronize audio machines, and because video machines needed to be locked to a separate video reference which dictated their running speed. In cases involving multiple video or digital audio machines, none of the machines was designated the master; instead all machines slaved to the synchronizer, which acted as the master. Its timecode generator was locked to the house video or audio reference, and all machines locked to its timecode generator. This technique was also used in video tape editing systems.

FACT FILE 15.3 SYNCHRONIZER TERMINOLOGY

Pre-roll

The period prior to the required lock point, during which machines play and are synchronized. Typically machines park about 5 seconds before the required lock point and then pre-roll for 5 seconds, after which it is likely that the synchronizer will have done its job. It is rare not to be able to lock machines in 5 seconds, and often it can be faster.

Post-roll

The period after a programmed record drop-out point during which machines continue to play in synchronized fashion.

Loop

A programmed section of tape which is played repeatedly under automatic control, including a pre-roll to lock the machines before each pass over the loop.

Drop-in and drop-out

Points at which the controller or synchronizer executes a pre-programmed record drop-in or drop-out on a selected slave machine. This may be at the start and end of a loop.

Offset

A programmed timecode value which offsets the position of a slave in relation to the master, in order that they lock at an offset. Often each slave may have a separate offset.

Nudge

Occasionally it is possible to nudge a slave’s position frame by frame with relation to the master once it has gained lock. This allows for small adjustments to be made in the relative positions of the two machines.

Bit offset

Some synchronizers allow for offsets of less than one frame, with resolution down to one-eightieth of a frame (one timecode bit).

DIGITAL AUDIO SYNCHRONIZATION

Requirements for digital audio synchronization

Unlike analog audio, digital audio has a discrete-time structure, because it is a sampled signal in which the samples may be further grouped into frames and blocks having a certain time duration. If digital audio devices are to communicate with each other, or if digital signals are to be combined in any way, then they need to be synchronized to a common reference in order that the sampling frequencies of the devices are identical and do not drift with relation to each other. It is not enough for two devices to be running at nominally the same sampling frequency (say, both at 44.1 kHz). Between the sampling clocks of professional audio equipment it is possible for differences in frequency of up to ± 10 parts per million (ppm) to exist and even a very slow drift means that two devices are not truly synchronous. Consumer devices can exhibit an even greater range of sampling frequencies that are nominally the same.
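The practical effect of such a small frequency offset is easy to quantify. The short sketch below shows that two ‘identical’ 44.1 kHz devices whose clocks are 10 ppm apart slip by well over a thousand samples every hour:

```python
def drift_samples(fs, ppm, seconds):
    """Samples of slip accumulated between two nominally identical
    clocks whose frequencies differ by ppm parts per million."""
    return fs * ppm * 1e-6 * seconds

# At 44.1 kHz a 10 ppm offset slips about 0.44 samples per second --
# roughly one sample every 2.3 seconds, or about 1588 per hour.
assert round(drift_samples(44100, 10, 3600)) == 1588
```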

The audible effect resulting from a non-synchronous signal drifting with relation to a sync reference or another signal is usually the occurrence of a glitch or click at the difference frequency between the signal and the reference, typically at an audio level around 50 dB below the signal, due to the repetition or dropping of samples. This will appear when attempting to mix two digital audio signals whose sampling rates differ by a small amount, or when attempting to decode a signal such as an unlocked consumer source by a professional system which is locked to a fixed reference. This said, it is not always easy to detect asynchronous operation by listening, even though sample slippage is occurring, as it depends on the nature of the audio signal at the time. Some systems may not operate at all if presented with asynchronous signals.

Furthermore, when digital audio is used with analog or digital video, the sampling rate of the audio needs to be locked to the video reference signal and to any timecode signals which may be used. In single studio operations the problem of ensuring lock to a common clock is not as great as it is in a multi-studio center, or where digital audio signals arrive from remote locations. In distributed system cases either the remote signals must be synchronized to the local sample clock as they arrive, or the remote studio must somehow be fed with the same reference signal as the local studio.

Timecode synchronization of audio workstations

The most common synchronization requirement is for replay to be locked to a source of SMPTE/EBU timecode, because this is used universally as a timing reference in audio and video recording. A number of desktop workstations that have MIDI features lock to MIDI TimeCode (MTC), which is a representation of SMPTE/EBU timecode in the form of MIDI messages (see below).

Systems vary as to the nature of timecode synchronization. External timecode may simply be used as a timing reference against which sound file replay is triggered, or the system may slave to external timecode for the duration of replay. In some cases these modes are switchable because they both have their uses. In the first case replay is simply ‘fired off’ when a particular timecode is registered, and in such a mode no long-term relationship is maintained between the timecode and the replayed audio. This may be satisfactory for some basic operations but is likely to result in a gradual drift between audio replay and the external reference if files longer than a few seconds are involved. It may be useful though, because replay remains locked to the workstation’s internal clock reference, which may be more stable than external references, potentially leading to higher audio quality from the system’s convertors. Some cheaper systems do not ‘clean up’ external clock signals very well before using them as the sample clock for D/A conversion, and this can seriously affect audio quality. In the second mode a continuous relationship is set up between timecode and audio replay, such that long-term lock is achieved and no drift is encountered. This is more difficult to achieve because it involves the continual comparison of timecode to the system’s internal timing references and requires that the system follows any drift or jump in the timecode. Jitter in the external timecode is very common, especially if this timecode derives from a video tape recorder, and this should be minimized in any sample clock signals derived from the external reference. This is normally achieved by the use of a high-quality phase-locked loop, often in two stages. 
Wow and flutter in the external timecode can be smoothed out using suitable time constants in the software that converts timecode to sample address codes, such that short-term changes in speed are not always reflected in the audio output but longer-term drifts are.
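The kind of time-constant smoothing described above can be modelled as a simple one-pole low-pass filter acting on the measured timecode rate. The filter coefficient here is an arbitrary illustrative value, not taken from any particular product:

```python
def smooth_rate(raw_rates, alpha=0.05):
    """One-pole low-pass sketch of time-constant smoothing: short-term
    speed wobble in incoming timecode is averaged out, while long-term
    drift still passes through. alpha sets the effective time constant."""
    est = raw_rates[0]
    out = []
    for r in raw_rates:
        est += alpha * (r - est)   # move a small step towards each reading
        out.append(est)
    return out

# Raw readings wobble by +/-2%; the smoothed estimate barely moves.
rates = [1.0, 1.02, 0.98, 1.02, 0.98]
smoothed = smooth_rate(rates)
assert max(smoothed) - min(smoothed) < (max(rates) - min(rates)) / 4
```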

Sample frequency conversion can be employed at the digital audio outputs of a system to ensure that changes in the internal sample rate caused by synchronization action are not reflected in the output sampling rate. This may be required if the system is to be interfaced digitally to other equipment in an all-digital studio.

Digital audio signal synchronization

In all-digital systems it is necessary for there to be a fixed sampling frequency, to which all devices in the system lock. This is so that digital audio from one device can be transferred directly to others without conversion to analog or loss of quality, or so that signals from different sources can be processed together. In systems involving video it is often necessary for the digital audio sampling frequency to be locked to the video frame rate and for timecode to be locked to this as well. The relationship between audio sampling rates and video frame rates is discussed in Fact File 15.4.

In very simple digital audio systems it is possible to use one device in the system, such as a mixing console, to act as the sampling frequency reference for the other devices. For example, many digital audio devices will lock to the sample clock contained in AES-3-format signals (see Chapter 10) arriving at their input. This is sometimes called ‘genlock’. This can work if the system primarily involves signal flow in one direction, or is a daisy-chain of devices locked to each other. However, such setups can become problematic when loops are formed and it becomes unclear what is synchronizing what.

FACT FILE 15.4 RELATIONSHIPS BETWEEN VIDEO FRAME RATES AND AUDIO SAMPLING RATES

People using the PAL or SECAM television systems are fortunate in that there is a simple integer relationship between the sampling frequency of 48 kHz used in digital audio systems for TV and the video frame rate of 25 Hz (there are 1920 samples per frame). There is also a simple relationship between the other standard sampling frequencies of 44.1 and 32 kHz and the PAL/SECAM frame rate. Users of NTSC TV systems (such as the USA and Japan) are less fortunate because the TV frame rate is 30/1.001 (roughly 29.97) frames per second, resulting in a non-integer relationship with standard audio sampling frequencies. The sampling frequency of 44.056 kHz was introduced in early digital audio recording systems that used NTSC VTRs, as this resulted in an integer relationship with the frame rate. For a variety of historical reasons it is still quite common to encounter so-called ‘pull-down’ sampling frequencies in video environments using the NTSC frame rate, these being 1/1.001 times the standard sampling frequencies.
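These relationships can be checked with exact rational arithmetic. The sketch below confirms the integer 48 kHz/PAL ratio, the non-integer NTSC ratio, and the effect of the 1/1.001 pull-down:

```python
from fractions import Fraction

def samples_per_frame(fs, frame_rate):
    """Audio samples per video frame, kept as an exact fraction."""
    return Fraction(fs) / Fraction(frame_rate)

# PAL/SECAM: a simple integer relationship (1920 samples per frame)
assert samples_per_frame(48000, 25) == 1920

# NTSC at 30/1.001 fps: no integer relationship with 48 kHz
ntsc = Fraction(30000, 1001)
assert samples_per_frame(48000, ntsc).denominator != 1

# The 'pull-down' rate of 48 kHz / 1.001 restores an integer ratio
pulled_down = Fraction(48000) / Fraction("1.001")
assert samples_per_frame(pulled_down, ntsc) == 1600
```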

Most professional audio equipment now has external sync inputs to enable each device to lock to an external reference signal of some kind. Typical sync inputs are: word clock (WCLK), which is normally a square-wave TTL-level signal (0–5 V) at the sampling rate, usually available on a BNC-type connector; ‘composite video’, which is a video reference signal consisting of either normal picture information or just ‘black and burst’ (a video signal with a blacked-out picture); or a proprietary sync signal such as the optional Alesis sync connection or the LRCK in the Tascam interface (see Chapter 10). WCLK may be ‘daisy-chained’ (looped through) between devices in cases where the AES/EBU interface is not available. Digidesign’s Pro Tools system also uses a so-called ‘Super Clock’ signal at 256 times the sampling rate for slaving devices together with low sampling jitter; this is a TTL-level (0–5 V) signal on a BNC connector. In all cases one machine or source must be considered to be the ‘master’, supplying the sync reference to the whole system, and the others as ‘slaves’. An alternative is a digital audio sync signal such as word clock or an AES-11 standard sync reference (a stable AES-3-format signal, without any audio). Such house sync signals are usually generated by a central sync pulse generator (SPG) that resides in a machine room, whose outputs are widely distributed using digital distribution amplifiers to equipment requiring reference signals. In large systems a central SPG is really the only satisfactory solution. A diagram showing the typical relationship between synchronization signals is shown in Figure 15.7.

image

FIGURE 15.7 In video environments all synchronization signals should be locked to a common clock, as shown here.

For integrating basic audio devices without external clock reference inputs into synchronized digital systems it is possible to employ an external sample frequency convertor that is connected to the digital audio outputs of the device. This convertor can then be locked to the clock reference so that audio from the problematic device can be made to run at the same sampling frequency as the rest of the system.

Sample clock jitter and effects on sound quality

Short-term timing irregularities in sample clocks may affect sound quality in devices such as A/D and D/A convertors and sampling frequency convertors. This is due to modulation of the sample instant in the time domain, resulting in unwanted distortion and noise products within the audio spectrum. This makes it crucial to ensure stable, jitter-free clock signals at points in a digital audio system where conversion to and from the analog domain is carried out. In a professional digital audio system, especially in areas where high-quality conversion is required, it is advisable either to re-clock any reference signal or to use a local high-quality reference generator, slaved to the central SPG, with which to clock any A/D or D/A convertors.

MIDI AND SYNCHRONIZATION

Introduction to MIDI synchronization

An important aspect of MIDI control is the handling of timing and synchronization data. MIDI timing data takes the place of the various older standards for synchronization on drum machines and sequencers that used separate ‘sync’ connections carrying a clock signal at one of a number of rates, usually described in pulses-per-quarter-note (ppqn). There used to be a considerable market for devices to convert clock signals from one rate to another, so that one manufacturer’s drum machine could lock to another’s sequencer, but MIDI has supplanted these by specifying standard synchronization data that shares the same data stream as note and control information.

Not all devices in a MIDI system will need access to timing information – it depends on the function fulfilled by each device. A sequencer, for example, will need some speed reference to control the rate at which recorded information is replayed and this speed reference could either be internal to the computer or provided by an external device. On the other hand, a normal synthesizer, effects unit or sampler is not usually concerned with timing information, because it has no functions affected by a timing clock. Such devices do not normally store rhythm patterns, although there are some keyboards with onboard sequencers that do need to recognize timing data.

As MIDI equipment has become more integrated with audio and video systems the need has arisen to incorporate timecode handling into the standard and into software. This has allowed sequencers to operate relative either to musical time (e.g. bars and beats) or to ‘real’ time (e.g. minutes and seconds). Using timecode, MIDI applications can be run in sync with the replay of an external audio or video machine, in order that the long-term speed relationship between the MIDI replay and the machine remains constant. Also relevant to the systems integrator is the MIDI Machine Control standard that specifies a protocol for the remote control of devices such as external recorders using a MIDI interface.

Music-related timing data

This section describes the group of MIDI messages that deals with ‘music-related’ synchronization – that is, synchronization related to the passing of bars and beats as opposed to ‘real’ time in hours, minutes and seconds. It is normally possible to choose which type of sync data will be used by a software package or other MIDI receiver when it is set to ‘external sync’ mode.

A group of system messages called the ‘system real-time’ messages control the execution of timed sequences in a MIDI system and these are often used in conjunction with the song position pointer (SPP, which is really a system common message) to control autolocation within a stored song. The system real-time messages concerned with synchronization, all of which are single bytes, are:

&F8   Timing clock

&FA   Start

&FB   Continue

&FC   Stop

The timing clock (often referred to as ‘MIDI beat clock’) is a single status byte (&F8) issued by the controlling device six times per MIDI beat. A MIDI beat is equivalent to a musical semiquaver or sixteenth note (see Table 15.1), so the increment of time represented by a MIDI clock byte is related to the duration of a particular musical value, not directly to a unit of real time. Twenty-four MIDI clocks are therefore transmitted per quarter note, unless the definition is changed. (Some software packages allow the user to redefine the notated musical increment represented by MIDI clocks.) At any one musical tempo a MIDI beat represents a fixed increment of time, but this time increment changes if the tempo changes.
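The relationship between tempo and the real-time spacing of timing-clock bytes can be sketched as follows (the function name is illustrative, not part of any MIDI specification):

```python
def clock_interval_seconds(tempo_bpm: float, clocks_per_quarter: int = 24) -> float:
    """Duration in seconds of one MIDI timing-clock tick (&F8) at a given tempo.

    By default 24 clocks are sent per quarter note (6 per MIDI beat).
    """
    quarter_note_seconds = 60.0 / tempo_bpm
    return quarter_note_seconds / clocks_per_quarter

# At 120 bpm a quarter note lasts 0.5 s, so clock bytes arrive
# roughly every 20.83 milliseconds.
print(round(clock_interval_seconds(120) * 1000, 2))  # → 20.83
```

Doubling the tempo halves the interval between clock bytes, which is why the clock represents musical rather than real time.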

The ‘start’, ‘stop’ and ‘continue’ messages are used to control the receiver’s replay remotely. A receiver should only begin to increment its internal clock or song pointer after it receives a start or continue message, even though some devices may continue to transmit MIDI clock bytes in the intervening periods. For example, a sequencer may be controlling a number of keyboards, but it may also be linked to a drum machine that is playing back an internally stored sequence. The two need to be locked together, so the sequencer (running in internal sync mode) would send the drum machine (running in external sync mode) a ‘start’ message at the beginning of the song, followed by MIDI clocks at the correct intervals thereafter to keep the timing between the two devices correctly related. If the sequencer was stopped it would send ‘stop’ to the drum machine, whereafter ‘continue’ would carry on playing from the stopped position, and ‘start’ would restart at the beginning. This method of synchronization is fairly basic, as it allows for only two options: playing the song from the beginning or continuing from the point at which it was stopped.
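The receiver behaviour described above can be sketched as a minimal state machine (the class and attribute names are hypothetical; only the status-byte values come from the MIDI specification):

```python
# System real-time status bytes from the MIDI specification
CLOCK, START, CONTINUE, STOP = 0xF8, 0xFA, 0xFB, 0xFC

class ClockSlave:
    """Minimal sketch of a device in external sync mode.

    It counts incoming timing clocks only while running; clocks
    received while stopped are ignored, as the text describes.
    """
    def __init__(self) -> None:
        self.running = False
        self.position = 0  # elapsed MIDI clocks since the start of the song

    def receive(self, status: int) -> None:
        if status == START:        # restart replay from the beginning
            self.position = 0
            self.running = True
        elif status == CONTINUE:   # resume from the stopped position
            self.running = True
        elif status == STOP:
            self.running = False
        elif status == CLOCK and self.running:
            self.position += 1
```

A sequencer stopping and restarting would thus leave the slave’s position intact for ‘continue’ but reset it to zero for ‘start’.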

Table 15.1 Musical durations related to MIDI timing data

Note value                    Number of MIDI beats   Number of MIDI clocks
Semibreve (whole note)                 16                     96
Minim (half note)                       8                     48
Crotchet (quarter note)                 4                     24
Quaver (eighth note)                    2                     12
Semiquaver (sixteenth note)             1                      6

SPPs are used when one device needs to tell another where it is in a song. A sequencer or synchronizer should be able to transmit song pointers to other synchronizable devices when a new location is required or detected. For example, one might ‘fast-forward’ through a song and start again 20 bars later, in which case the other timed devices in the system would have to know where to restart. An SPP would be sent, followed by ‘continue’ and then regular clocks. An SPP represents the position in a stored song in terms of the number of MIDI beats (not clocks) from the start of the song. It uses two data bytes, so it can specify up to 16 384 MIDI beats. SPP is a system common message, not a real-time message. It is often used in conjunction with &F3 (song select), which defines which of a collection of stored song sequences (in a drum machine, say) is to be replayed.

SPPs are fine for directing the movements of an entirely musical system, in which every action is related to a particular beat or subdivision of a beat, but less suitable when actions must occur at a particular point in real time. If, for example, one was using a MIDI system to dub music and effects to a picture in which an effect was intended to occur at a particular visual event, that effect would have to maintain its position in time no matter what happened to the music. If the effect was to be triggered by a sequencer at a particular number of beats from the beginning of the song, this point could change in real time if the tempo of the music was altered slightly to fit a particular visual scene. Clearly some means of real-time synchronization is required, either instead of or as well as the clock and song pointer arrangement, such that certain events in a MIDI-controlled system may be triggered at specific times in hours, minutes and seconds.
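The two-data-byte encoding of an SPP can be sketched as follows. The function name is illustrative; the status byte &F2 and the 14-bit, LSB-first beat count are as defined for the song position pointer message:

```python
def song_position_pointer(midi_beats: int) -> bytes:
    """Encode an SPP message: status &F2 plus a 14-bit beat count, LSB first.

    Each of the two data bytes carries 7 bits (the top bit of a MIDI
    data byte must be zero), giving 2^14 = 16 384 possible positions.
    """
    if not 0 <= midi_beats <= 0x3FFF:
        raise ValueError("SPP can address at most 16 384 MIDI beats")
    lsb = midi_beats & 0x7F          # low 7 bits
    msb = (midi_beats >> 7) & 0x7F   # high 7 bits
    return bytes([0xF2, lsb, msb])

# Bar 21 of a 4/4 song = 20 bars x 16 sixteenth-note beats = 320 MIDI beats.
print(song_position_pointer(320).hex())  # → f24002
```

On receipt, a device would multiply the beat count by six to convert it into a position in MIDI clocks.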

Recent software may recognize and be able to generate the bar marker and time signature messages. The bar marker message can be used where it is necessary to indicate the point at which the next musical bar begins. It takes effect at the next & F8 clock. Some MIDI synchronizers will also accept an audio input or a tap switch input so that the user can program a tempo track for a sequencer based on the rate of a drum beat or a rate tapped in using a switch. This can be very useful in synchronizing MIDI sequences to recorded music, or fitting music which has been recorded ‘rubato’ to bar intervals.
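A tap-tempo input of the kind just described boils down to averaging the intervals between taps. A minimal sketch (the function name is my own, not taken from any product):

```python
def tempo_from_taps(tap_times_seconds: list[float]) -> float:
    """Estimate tempo in bpm from tap timestamps, assuming one tap per beat."""
    if len(tap_times_seconds) < 2:
        raise ValueError("need at least two taps to measure an interval")
    intervals = [b - a for a, b in zip(tap_times_seconds, tap_times_seconds[1:])]
    mean_interval = sum(intervals) / len(intervals)  # seconds per beat
    return 60.0 / mean_interval

# Taps every 0.5 s correspond to a tempo of 120 bpm.
print(tempo_from_taps([0.0, 0.5, 1.0, 1.5]))
```

A real synchronizer would apply such an estimate continuously, updating the sequencer’s tempo track as the tapped or detected beat rate drifts.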

MIDI timecode (MTC)

MIDI timecode has two specific functions. First, to provide a means for distributing conventional SMPTE/EBU timecode data (see above) around a MIDI system in a format that is compatible with the MIDI protocol. Second, to provide a means for transmitting ‘setup’ messages that may be downloaded from a controlling computer to receivers in order to program them with cue points at which certain events are to take place. The intention is that receivers will then read incoming MTC as the program proceeds, executing the pre-programmed events defined in the setup messages. Sequencers and some digital audio systems often use MIDI timecode derived from an external synchronizer or MIDI peripheral when locking to video or to another sequencer. MTC is an alternative to MIDI clocks and song pointers, for use when real-time synchronization is important.

There are two types of MTC synchronizing message: one that updates a receiver regularly with running timecode and another that transmits one-time updates of the timecode position. The latter can be used during high-speed cueing, where regular updating of each single frame would involve too great a rate of transmitted data. The former is known as a quarter-frame message (see Fact File 15.5), denoted by the status byte (&F1), whilst the latter is known as a full-frame message and is transmitted as a universal real-time SysEx message.

SYNCHRONIZING AUDIO/MIDI COMPUTER APPLICATIONS

It is increasingly common for multiple audio/MIDI applications to run simultaneously on the same workstation, and in such cases they may need to run in sync with each other. Furthermore, it may be necessary to have audio and MIDI connections between the applications. Some while back a number of these functions were carried out by proprietary ‘middleware’ such as the Opcode Music System (OMS), but a more recent system that handles many of these functions, as well as full audio routing and synchronization between applications, is ‘ReWire’. This technology, developed by Propellerhead Software, enables the internal real-time streaming of up to 256 audio channels and 4080 MIDI channels between applications, as well as high-precision inter-application synchronization and the communication of common transport commands such as stop, play, record and rewind. One application is designated as the master and others as slaves.

FACT FILE 15.5 QUARTER-FRAME MTC MESSAGES

One timecode frame is represented by too much information to be sent in one standard MIDI message, so it is broken down into eight separate messages. Each message of the group of eight represents a part of the timecode frame value, as shown in the figure below, and takes the general form:

&[F1][DATA]

The data byte begins with zero (as all MIDI data bytes must), and the next 7 bits of the data word are made up of a 3-bit code defining whether the message represents hours, minutes, seconds or frames, MSnibble or LSnibble, followed by the 4 bits representing the binary value of that nibble. In order to reassemble the correct timecode value from the eight quarter-frame messages, the LS and MS nibbles of hours, minutes, seconds and frames are each paired within the receiver to form 8-bit words as follows:

Frames: rrr qqqqq

where ‘rrr’ is reserved for future use and ‘qqqqq’ represents the frames value from 0 to 29;

Seconds: rr qqqqqq

where ‘rr’ is reserved for future use and ‘qqqqqq’ represents the seconds value from 0 to 59;

Minutes: rr qqqqqq

as for seconds; and

Hours: r qq ppppp

where ‘r’ is undefined, ‘qq’ represents the timecode type and ‘ppppp’ is the hours value from 0 to 23. The timecode frame rate is denoted as follows in the ‘qq’ part of the hours value: 00 = 24 fps; 01 = 25 fps; 10 = 30 fps drop-frame; 11 = 30 fps non-drop-frame. Unassigned bits should be set to zero.

[Figure: breakdown of one timecode frame value into the eight quarter-frame messages]
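The nibble layout described above can be sketched in code. The function name is illustrative; the status byte &F1, the message-type codes 0–7 and the bit positions follow the quarter-frame format just described:

```python
def quarter_frame_messages(hours: int, minutes: int, seconds: int,
                           frames: int, rate_code: int) -> list[bytes]:
    """Build the eight &F1 quarter-frame messages for one timecode frame.

    rate_code: 00 = 24 fps, 01 = 25 fps, 10 = 30 fps drop-frame,
               11 = 30 fps non-drop-frame (carried in the hours MS nibble).
    """
    pieces = [
        frames & 0x0F,                 # 0: frames LS nibble
        (frames >> 4) & 0x01,          # 1: frames MS nibble ('rrr' bits zero)
        seconds & 0x0F,                # 2: seconds LS nibble
        (seconds >> 4) & 0x03,         # 3: seconds MS nibble ('rr' bits zero)
        minutes & 0x0F,                # 4: minutes LS nibble
        (minutes >> 4) & 0x03,         # 5: minutes MS nibble ('rr' bits zero)
        hours & 0x0F,                  # 6: hours LS nibble
        # 7: rate code in bits 2-1, top bit of the 5-bit hours value in bit 0
        ((rate_code & 0x03) << 1) | ((hours >> 4) & 0x01),
    ]
    # Data byte = 3-bit message type followed by the 4-bit nibble value
    return [bytes([0xF1, (i << 4) | nibble]) for i, nibble in enumerate(pieces)]
```

A receiver performs the reverse pairing, shifting each MS nibble left by four bits and combining it with the matching LS nibble to recover hours, minutes, seconds and frames.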

RECOMMENDED FURTHER READING

Ratcliff, J., 1999. Timecode: A User’s Guide, third ed. Focal Press.
