4  MIDI and synthetic audio control

MIDI is the Musical Instrument Digital Interface, a control protocol and interface standard for electronic musical instruments that has also been used widely in other music and audio products. Although it is relatively dated by modern standards it is still used extensively, which is something of a testament to its success. Even if the MIDI hardware interface is used less these days, either because more synthesis, sampling and processing takes place using software within the workstation, or because other data interfaces such as USB and FireWire are becoming popular, the protocol for communicating events and other control information is still widely encountered. A lot of computer software uses MIDI as a basis for controlling the generation of sounds and the operation of external devices.

Synthetic audio is used increasingly in audio workstations and mobile devices as a very efficient means of audio representation, because it only requires control information and sound object descriptions to be transmitted. Standards such as MPEG-4 Structured Audio enable synthetic audio to be used as an alternative or an addition to natural audio coding and this can be seen as a natural evolution of the MIDI concept in interactive multimedia applications.

4.1 Background

Electronic musical instruments existed widely before MIDI was developed in the early 1980s, but no universal means existed of controlling them remotely. Many older musical instruments used analog voltage control, rather than being controlled by a microprocessor, and thus used a variety of analog remote interfaces (if indeed any facility of this kind was provided at all). Such interfaces commonly took the form of one port for timing information, such as might be required by a sequencer or drum machine, and another for pitch and key triggering information, as shown in Figure 4.1. The latter, commonly referred to as ‘CV and gate’, consisted of a DC (direct current) control line carrying a variable control voltage (CV) proportional to the pitch of the note, and a separate line carrying a trigger pulse. A common scaling for the CV was 1 volt per octave (although this was by no means the only approach) and notes on a synthesiser could be triggered remotely by setting the CV to the correct pitch and sending a ‘note on’ trigger pulse which would initiate a new cycle of the synthesiser’s envelope generator. Such an interface could deal with only one note at a time, but many older synths were only monophonic in any case (that is, they were only capable of generating a single voice).


Figure 4.1 Prior to MIDI control, electronic musical instruments tended to use a DC remote interface for pitch and note triggering. A second interface handled a clock signal to control tempo and trigger pulses to control the execution of a stored sequence

Instruments with onboard sequencers would need a timing reference in order that they could be run in synchronisation with other such devices, and this commonly took the form of a square pulse train at a rate related to the current musical tempo, often connected to the device using a DIN-type connector, along with trigger lines for starting and stopping a sequence’s execution. There was no universal agreement over the rate of this external clock, and frequencies measured in pulses per musical quarter note (ppqn), such as 24 ppqn and 48 ppqn, were used by different manufacturers. A number of conversion boxes were available that divided or multiplied clock signals in order that devices from different manufacturers could be made to work together.

As microprocessor control began to be more widely used in musical instruments a number of incompatible digital control interfaces sprang up, promoted by the large synthesiser manufacturers, some serial and some parallel. Needless to say, the plethora of non-standardised approaches to remote control made it difficult to construct an integrated system, especially when integrating equipment from different manufacturers. Owing to collaboration between the major parties in America and Japan, the way was cleared for agreement over a common hardware interface and command protocol, resulting in the specification of the MIDI standard in late 1982/early 1983. This interface grew out of an amalgamation of a proposed universal interface called USI (the Universal Synthesiser Interface), which was intended mainly for note on and off commands, and a Japanese specification which was rather more complex and proposed an extensive protocol to cover other operations as well. Since MIDI’s introduction the use of older remote interfaces has died away very quickly, but a number of specialised interfaces remain available for connecting non-MIDI equipment to MIDI systems by converting the digital MIDI commands into the type of analog information described above.

The standard has been subject to a number of addenda, extending the functionality of MIDI far beyond the original. The original specification was called the MIDI 1.0 specification, to which has been added such addenda as the MIDI Sample Dump protocol, MIDI Files, General MIDI (1 and 2), MIDI TimeCode, MIDI Show Control, MIDI Machine Control and Downloadable Sounds. The MIDI Manufacturers Association (MMA) seems now to be the primary association governing formal extensions to the standard, liaising closely with a Japanese association called AMEI (Association of Musical Electronics Industry).

4.2 What is MIDI?

MIDI is a digital remote control interface for music systems. It follows that MIDI-controlled equipment is normally based on microprocessor control, with the MIDI interface forming an I/O port. It is a measure of the popularity of MIDI as a means of control that it has now been adopted in many other audio and visual systems, including the automation of mixing consoles, the control of studio outboard equipment, the control of lighting equipment and of other studio machinery. Although many of its standard commands are music related, it is possible either to adapt music commands to non-musical purposes or to use command sequences designed especially for alternative methods of control.

The adoption of a serial standard for MIDI was dictated largely by economic and practical considerations, as it was intended that it should be possible for the interface to be installed on relatively cheap items of equipment and that it should be available to as wide a range of users as possible. A parallel system might have been more professionally satisfactory, but would have involved a considerable manufacturing cost overhead per MIDI device, as well as parallel cabling between devices, which would have been more expensive and bulky than serial interconnection. The simplicity and ease of installation of MIDI systems has been largely responsible for its rapid proliferation as an international standard.

Unlike its analog predecessors, MIDI integrates timing and system control commands with pitch and note triggering commands, such that everything may be carried in the same format over the same piece of wire. MIDI makes it possible to control musical instruments polyphonically in pseudo real time: that is, the speed of transmission is such that delays in the transfer of performance commands are not audible in the majority of cases. It is also possible to address a number of separate receiving devices within a single MIDI data stream, and this allows a controlling device to determine the destination of a command.

4.3 MIDI and digital audio contrasted

For many the distinction between MIDI and digital audio may be a clear one, but those new to the subject often confuse the two. Any confusion is often due to both MIDI and digital audio equipment appearing to perform the same task – that is, the recording of multiple channels of music using digital equipment – and is not helped by the way in which some manufacturers refer to MIDI sequencing as digital recording.


Figure 4.2 (a) Digital audio recording and (b) MIDI recording contrasted. In (a) the sound waveform itself is converted into digital data and stored, whereas in (b) only control information is stored, and a MIDI-controlled sound generator is required during replay

Digital audio involves a process whereby an audio waveform (such as the line output of a musical instrument) is sampled regularly and then converted into a series of binary words that represent the sound waveform, as described in Chapter 2. A digital audio recorder stores this sequence of data and can replay it by passing the original data through a digital-to-analog convertor that turns the data back into a sound waveform, as shown in Figure 4.2. A multitrack recorder has a number of independent channels that work in the same way, allowing a sound recording to be built up in layers. MIDI, on the other hand, handles digital information that controls the generation of sound. MIDI data does not represent the sound waveform itself. When a multitrack music recording is made using a MIDI sequencer (see Chapter 7) this control data is stored, and can be replayed by transmitting the original data to a collection of MIDI-controlled musical instruments. It is the instruments that actually reproduce the recording.

A digital audio recording, then, allows any sound to be stored and replayed without the need for additional hardware. It is useful for recording acoustic sounds such as voices, where MIDI is not a great deal of help. A MIDI recording is almost useless without a collection of sound generators. An interesting advantage of the MIDI recording is that, since the stored data represents event information describing a piece of music, it is possible to change the music by changing the event data. MIDI recordings also consume a lot less memory space than digital audio recordings. It is also possible to transmit a MIDI recording to a different collection of instruments from those used during the original recording, thus resulting in a different sound. It is now common for MIDI and digital audio recording to be integrated in one software package, allowing the two to be edited and manipulated in parallel.

4.4 Basic MIDI principles

4.4.1 System specifications

The MIDI hardware interface and connections are described in Chapter 5. MIDI is a serial interface, running at a relatively slow rate by modern standards (31.25 kbit/s), over which control messages are sent as groups of bytes. Each byte is preceded by one start bit and followed by one stop bit in order to synchronise reception of the data, which is transmitted asynchronously, as shown in Figure 4.3. The addition of start and stop bits means that each 8-bit word actually takes 10 bit periods to transmit: at 32 μs per bit this lasts a total of 320 μs, so a typical three-byte message occupies nearly a millisecond of bus time. Standard MIDI messages typically consist of one, two or three bytes, although there are longer messages for some purposes that will be covered later in this book.


Figure 4.3 A MIDI message consists of a number of bytes, each transmitted serially and asynchronously by a UART in this format, with a start and stop bit to synchronise the receiving UART. The total period of a MIDI data byte, including start and stop bits, is 320 μs


Figure 4.4 The simplest form of MIDI interconnection involves connecting two instruments together as shown

4.4.2 Simple interconnection

In the simplest MIDI system, one instrument could be connected to another as shown in Figure 4.4. Here, instrument 1 sends information relating to actions performed on its own controls (notes pressed, pedals pressed, etc.) to instrument 2, which imitates these actions as far as it is able. This type of arrangement can be used for ‘doubling-up’ sounds, ‘layering’ or ‘stacking’, such that a composite sound can be made up from two synthesisers’ outputs. (The audio outputs of the two instruments would have to be mixed together for this effect to be heard.) Larger MIDI systems could be built up by further ‘daisy-chaining’ of instruments, such that instruments further down the chain all received information generated by the first (see Figure 4.5), although this is not a very satisfactory way of building a large MIDI system. In large systems some form of central routing helps to avoid MIDI ‘traffic jams’ and simplifies interconnection.


Figure 4.5 Further instruments can be added using THRU ports as shown, in order that messages from instrument 1 may be transmitted to all the other instruments

4.4.3 MIDI channels

MIDI messages are made up of a number of bytes. Each part of the message has a specific purpose, and one of these is to define the receiving channel to which the message refers. In this way, a controlling device can make data device-specific – in other words it can define which receiving instrument will act on the data sent. This is most important in large systems that use a computer sequencer as a master controller, when a large amount of information will be present on the MIDI data bus, not all of which is intended for every instrument. If a device is set in software to receive on a specific channel or on a number of channels it will act only on information which is ‘tagged’ with its own channel numbers. Everything else it will usually ignore. There are 16 basic MIDI channels and instruments can usually be set to receive on any specific channel or channels (omni off mode), or to receive on all channels (omni on mode). The latter mode is useful as a means of determining whether anything at all is being received by the device.

Later it will be seen that the limit of 16 MIDI channels can be overcome easily by using multiport MIDI interfaces connected to a computer. In such cases it is important not to confuse the MIDI data channel with the physical port to which a device may be connected, since each physical port will be capable of transmitting on all 16 data channels.

4.4.4 Message format

There are two basic types of MIDI message byte: the status byte and the data byte. The first byte in a MIDI message is normally a status byte. Standard MIDI messages can be up to three bytes long, but not all messages require three bytes, and there are some fairly common exceptions to the rule which are described below. Table 4.1 shows the format and content of MIDI messages under each of the statuses.

Status bytes always begin with a binary one to distinguish them from data bytes, which always begin with a zero. Because the most significant bit (MSB) of each byte is reserved to denote the type (status or data) there are only seven active bits per byte, which allows 2^7 (that is, 128) possible values. As shown in Figure 4.6, the first half of the status byte denotes the message type and the second half denotes the channel number. Because four bits of the status byte are set aside to indicate the channel number, this allows for 2^4 (or 16) possible channels. There are only three bits to denote the message type, because the first bit must always be a one. This theoretically allows for eight message types, but there are some special cases in the form of system messages (see below).
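As an illustration of this bit layout (a sketch in Python, not part of the MIDI specification or of any particular library), a receiver’s first step in decoding might look like this:

# A receiver might split a status byte like this. Illustrative sketch only.
def parse_status(byte):
    """Return (message type, channel 1-16), or None for a data byte."""
    if byte < 0x80:
        return None                     # MSB clear: data byte
    kinds = {0x8: 'note off', 0x9: 'note on', 0xA: 'poly key pressure',
             0xB: 'control change', 0xC: 'program change',
             0xD: 'channel pressure', 0xE: 'pitch bend'}
    high, low = byte >> 4, byte & 0x0F
    if high == 0xF:
        return ('system', low)          # low nibble identifies the message
    return (kinds[high], low + 1)       # channels displayed as 1-16

print(parse_status(0x94))               # ('note on', 5)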

Table 4.1 MIDI messages summarised



Figure 4.6 General format of a MIDI message. The ‘sss’ bits are used to define the message type, the ‘nnnn’ bits define the channel number, whilst the ‘xxxxxxx’ and ‘yyyyyyy’ bits carry the message data. See text for details

4.5 MIDI messages in detail

In this section the MIDI communication protocol will be examined in detail. The majority of the basic message types and their meanings will be explained. The descriptions here are not intended as an alternative to reading the MIDI documentation itself, but rather as a commentary on it and an explanation of it. It follows that examples will be given, but that the reader should refer to the standard for a full description of the protocol. The standard has been extended and refined over the years and the following is to be regarded as an introduction to the basic messages. The prefix ‘&’ will be used to indicate hexadecimal values throughout the discussion; individual MIDI message bytes will be delineated using square brackets, e.g. [&45], and channel numbers will be denoted using ‘n’ to indicate that the value may be anything from &0 to &F (channels 1 to 16).

The MMA has defined Approved Protocols (APs) and Recommended Practices (RPs). An AP is a part of the standard MIDI specification and is used when the standard is further defined or when a previously undefined command is defined, whereas an RP is used to describe an optional new MIDI application that is not a mandatory or binding part of the standard. Not all MIDI devices will have all the following commands implemented, since it is not mandatory for a device conforming to the MIDI standard to implement every possibility.

4.5.1 Channel and system messages contrasted

Two primary classes of message exist: those that relate to specific MIDI channels and those that relate to the system as a whole. One should bear in mind that it is possible for an instrument to be receiving in ‘omni on’ mode, in which case it will ignore the channel label and attempt to respond to anything that it receives.

Channel messages start with status bytes in the range &8n to &En (they start at hexadecimal eight because the MSB must be a one for a status byte). System messages all begin with &F, and do not contain a channel number. Instead the least significant nibble of the system status byte is used for further identification of the system message, such that there is room for 16 possible system messages running from &F0 to &FF. System messages are themselves split into three groups: system common, system exclusive and system realtime. The common messages may apply to any device on the MIDI bus, depending only on the device’s ability to handle the message. The exclusive messages apply to whichever manufacturer’s devices are specified later in the message (see below) and the realtime messages are intended for devices which are to be synchronised to the prevailing musical tempo. (Some of the so-called realtime messages do not really seem to deserve this appellation, as discussed below.) The status byte &F1 is used for MIDI TimeCode.

MIDI channel numbers are usually referred to as ‘channels one to sixteen’, but it can be appreciated that in fact the binary numbers that represent these run from zero to fifteen (&0 to &F), as fifteen is the largest decimal number which can be represented with four bits. Thus the note on message for channel 5 is actually &94 (nine for note on, and four for channel 5).

4.5.2 Note on and note off messages

Much of the musical information sent over a typical MIDI interface will consist of these two message types. As indicated by the titles, the note on message turns on a musical note, and the note off message turns it off. Note on takes the general format:

[&9n] [Note number] [Velocity]

and note off takes the form:

[&8n] [Note number] [Velocity] (although see Section 4.5.3)

A MIDI instrument will generate note on messages at its MIDI OUT corresponding to whatever notes are pressed on the keyboard, on whatever channel the instrument is set to transmit. Also, any note which has been turned on must subsequently be turned off in order for it to stop sounding, thus if one instrument receives a note on message from another and then loses the MIDI connection for any reason, the note will continue sounding ad infinitum. This situation can occur if a MIDI cable is pulled out during transmission.
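As a simple sketch, the two messages could be assembled as raw bytes like this (the function names are illustrative; channels are given as 1 to 16 and converted to the 0 to 15 values actually transmitted):

# Sketch: assembling note on and note off messages as raw bytes.
def note_on(channel, note, velocity):
    return bytes([0x90 | (channel - 1), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=64):
    return bytes([0x80 | (channel - 1), note & 0x7F, velocity & 0x7F])

print(note_on(1, 60, 100).hex())        # middle C on channel 1: '903c64'
print(note_off(1, 60).hex())            # and its release: '803c40'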

Table 4.2 MIDI note numbers related to the musical scale

Musical note MIDI note number
C-2 0
C-1 12
C0 24
C1 36
C2 48
C3 (middle C) 60 (Yamaha convention)
C4 72
C5 84
C6 96
C7 108
C8 120
G8 127

MIDI note numbers relate directly to the western musical chromatic scale and the format of the message allows for 128 note numbers, which cover a range of a little over ten octaves – adequate for the full range of most musical material. This quantisation of the pitch scale is geared very much towards keyboard instruments, being less suitable for other instruments and cultures where the definition of pitches is not so clear cut. Nonetheless, means have been developed of adapting control to situations where unconventional tunings are required. Note numbers normally relate to the musical scale as shown in Table 4.2, although there is a certain degree of confusion here: Yamaha established the use of C3 for middle C, whereas others have used C4. Some software allows the user to decide which convention will be used for display purposes.
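Assuming twelve-tone equal temperament with note 69 (the A above middle C) tuned to 440 Hz, which is the usual convention, the relationship between note number and frequency can be sketched as:

# Sketch: note number to frequency, assuming equal temperament with
# note 69 (A above middle C) tuned to concert pitch of 440 Hz.
def note_to_hz(note):
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

print(round(note_to_hz(60), 2))         # middle C: 261.63 Hz
print(round(note_to_hz(69), 2))         # concert A: 440.0 Hz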

4.5.3 Velocity information

Note messages are associated with a velocity byte that is used to represent the speed at which a key was pressed or released. The former corresponds to the force exerted on the key as it is depressed: in other words, ‘how hard you hit it’ (called ‘note on velocity’). It is used to control parameters such as the volume or timbre of the note at the audio output of an instrument and can be applied internally to scale the effect of one or more of the envelope generators in a synthesiser. This velocity value has 128 possible states, but not all MIDI instruments are able to generate or interpret the velocity byte, in which case they will set it to a value halfway between the limits, i.e. 64 (decimal). Some instruments may act on velocity information even if they are unable to generate it themselves. It is recommended that a logarithmic rather than linear relationship should be established between the velocity value and the parameter which it controls, since this corresponds more closely to the way in which musicians expect an instrument to respond, although some instruments allow customised mapping of velocity values to parameters. The note on, velocity zero value is reserved for the special purpose of turning a note off, for reasons which will become clear in Section 4.5.4. If an instrument sees a note on message with a velocity of zero, its software should interpret this as a note off message.

Note off velocity (or ‘release velocity’) is not widely used, as it relates to the speed at which a note is released, which is not a parameter that affects the sound of many normal keyboard instruments. Nonetheless it is available for special effects if a manufacturer decides to implement it.

4.5.4 Running status

Running status is an accepted method of reducing the amount of data transmitted. It involves the assumption that once a status byte has been asserted by a controller there is no need to reiterate this status for each subsequent message of that status, so long as the status has not changed in between. Thus a string of note on messages could be sent with the note on status only sent at the start of the series of note data, for example:

[&9n] [Note number] [Velocity] [Note number] [Velocity] [Note number] [Velocity]

For a long string of notes this could reduce the amount of data sent by nearly one third. In most music, however, each note on is quickly followed by a note off for the same note number, so the method would break down as the status changed from note on to note off very regularly, eliminating most of the advantage gained by running status. This is the reason for the adoption of note on, velocity zero as equivalent to a note off message: it allows a string of what appears to be note on messages, but which is, in fact, both note on and note off.
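A minimal sketch of such an encoder follows, exploiting the velocity zero convention so that the status byte need be sent only once:

# Sketch of a running-status encoder. Because note offs are sent as note on
# with velocity zero, the status never changes and is sent only once.
def encode_running(events, channel=1):
    """events: (note, velocity) pairs; velocity 0 means note off."""
    out = [0x90 | (channel - 1)]
    for note, velocity in events:
        out.extend([note & 0x7F, velocity & 0x7F])
    return bytes(out)

# Note 60 on, note 64 on, note 60 off: seven bytes instead of nine.
print(encode_running([(60, 100), (64, 90), (60, 0)]).hex())
# '903c64405a3c00'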

Running status is not used at all times for a string of same-status messages and will often only be called upon by an instrument’s software when the rate of data exceeds a certain point. Indeed, an examination of the data from a typical synthesiser indicates that running status is not used during a large amount of ordinary playing.

4.5.5 Polyphonic key pressure (aftertouch)

The key pressure messages are sometimes called ‘aftertouch’ by keyboard manufacturers. Aftertouch is perhaps a slightly misleading term as it does not make clear what aspect of touch is referred to, and many people have confused it with note off velocity. This message refers to the amount of pressure placed on a key at the bottom of its travel, and it is used to instigate effects based on how much the player leans onto the key after depressing it. It is often applied to performance parameters such as vibrato.

The polyphonic key pressure message is not widely used, as it transmits a separate value for every key on the keyboard and thus requires a separate sensor for every key. This can be expensive to implement and is beyond the scope of many keyboards, so most manufacturers have resorted to the use of the channel pressure message (see below). The message takes the general format:

[&An] [Note number] [Pressure]

Implementing polyphonic key pressure messages involves the transmission of a considerable amount of data that might be unnecessary, as the message will be sent for every note in a chord every time the pressure changes. As most people do not maintain a constant pressure on the bottom of a key whilst playing, many redundant messages might be sent per note. A technique known as ‘controller thinning’ may be used by a device to limit the rate at which such messages are transmitted and this may be implemented either before transmission or at a later stage using a computer. Alternatively this data may be filtered out altogether if it is not required.

4.5.6 Control change

As well as note information, a MIDI device may be capable of transmitting control information that corresponds to the various switches, control wheels and pedals associated with it. These come under the control change message group and should be distinguished from program change messages. The controller messages have proliferated enormously since the early days of MIDI and not all devices will implement all of them. The control change message takes the general form:

[&Bn] [Controller number] [Data]

so a number of controllers may be addressed using the same type of status byte by changing the controller number.

Although the original MIDI standard did not lay down any hard and fast rules for the assignment of physical control devices to logical controller numbers, there is now common agreement amongst manufacturers that certain controller numbers will be used for certain purposes. These are assigned by the MMA. There are two distinct kinds of controller: the switch type and the analog type. The analog controller is any continuously variable wheel, lever, slider or pedal that might have any one of a number of positions and these are often known as continuous controllers. There are 128 controller numbers available and these are grouped as shown in Table 4.3. Table 4.4 shows a more detailed breakdown of some of these, as found in the majority of MIDI-controlled musical instruments, although the full list is regularly updated by the MMA. The control change messages have become fairly complex and interested users are referred to the relevant standards.

Table 4.3 MIDI controller classifications

Controller number (hex) Function
&00–1F 14 bit controllers, MSbyte
&20–3F 14 bit controllers, LSbyte
&40–65 7 bit controllers or switches
&66–77 Originally undefined
&78–7F Channel mode control

Table 4.4 MIDI controller functions

Controller number (hex) Function
00 Bank select
01 Modulation wheel
02 Breath controller
03 Undefined
04 Foot controller
05 Portamento time
06 Data entry slider
07 Main volume
08 Balance
09 Undefined
0A Pan
0B Expression controller
0C Effect control 1
0D Effect control 2
0E–0F Undefined
10–13 General purpose controllers 1–4
14–1F Undefined
20–3F LSbyte for 14 bit controllers (same function order as 00–1F)
40 Sustain pedal
41 Portamento on/off
42 Sostenuto pedal
43 Soft pedal
44 Legato footswitch
45 Hold 2
46–4F Sound controllers
50–53 General purpose controllers 5–8
54 Portamento control
55–5A Undefined
5B–5F Effects depth 1–5
60 Data increment
61 Data decrement
62 NRPC LSbyte (non-registered parameter controller)
63 NRPC MSbyte
64 RPC LSbyte (registered parameter controller)
65 RPC MSbyte
66–77 Undefined
78 All sounds off
79 Reset all controllers
7A Local on/off
7B All notes off
7C Omni receive mode off
7D Omni receive mode on
7E Mono receive mode
7F Poly receive mode

The first 64 controller numbers (that is up to &3F) relate to only 32 physical controllers (the continuous controllers). This is to allow for greater resolution in the quantisation of position than would be feasible with the seven bits that are offered by a single data byte. Seven bits would only allow 128 possible positions of an analog controller to be represented and this might not be adequate in some cases. For this reason the first 32 controllers handle the most significant byte (MSbyte) of the controller data, while the second 32 handle the least significant byte (LSbyte). In this way, controller numbers &06 and &26 both represent the data entry slider, for example. Together, the data values can make up a 14-bit number (because the first bit of each data byte has to be a zero), which allows the quantisation of a control’s position to be one part in 2^14 (16 384 decimal). Clearly, not all controllers will require this resolution, but it is available if needed. Only the LSbyte would be needed for small movements of a control. If a system opts not to use the extra resolution offered by the second byte, it should send only the MSbyte for coarse control. In practice this is all that is transmitted on many devices.
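The following sketch shows the bit manipulation involved in combining and splitting the two data values:

# Sketch: combining MSbyte and LSbyte controller values into one 14-bit
# number, and splitting a value back into two 7-bit bytes for transmission.
def combine(msb, lsb):
    return ((msb & 0x7F) << 7) | (lsb & 0x7F)

def split(value):
    return (value >> 7) & 0x7F, value & 0x7F

print(combine(0x7F, 0x7F))              # 16383, the maximum 14-bit value
print(split(8192))                      # (64, 0): mid-range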

On/off switches can be represented easily in binary form (0 for OFF, 1 for ON), and it would be possible to use just a single bit for this purpose, but, in order to conform to the standard format of the message, switch states are normally represented by data values between &00 and &3F for OFF and between &40 and &7F for ON. In other words, switches are now considered as 7-bit continuous controllers. In older systems it may be found that only &00 = OFF and &7F = ON.

The data increment and decrement buttons that are present on many devices are assigned to two specific controller numbers (&60 and &61) and an extension to the standard defines four controllers (&62 to &65) that effectively expand the scope of the control change messages. These are the registered and non-registered parameter controllers (RPCs and NRPCs).

The ‘all notes off’ command (frequently abbreviated to ‘ANO’) was designed to be transmitted to devices as a means of silencing them, but it does not necessarily have this effect in practice. What actually happens varies between instruments, especially if the sustain pedal is held down or notes are still being pressed manually by a player. All notes off is supposed to put all note generators into the release phase of their envelopes, and clearly the result of this will depend on what a sound is programmed to do at this point. The exception should be notes which are being played while the sustain pedal is held down, which should only be released when that pedal is released. ‘All sounds off’ was designed to overcome the problems with ‘all notes off’, by turning sounds off as quickly as possible. ‘Reset all controllers’ is designed to reset all controllers to their default state, in order to return a device to its ‘standard’ setting.

4.5.7 Channel modes

Although grouped with the controllers, under the same status, the channel mode messages differ somewhat in that they set the mode of operation of the instrument receiving on that particular channel.

‘Local on/off’ is used to make or break the link between an instrument’s keyboard and its own sound generators. Effectively there is a switch between the output of the keyboard and the control input to the sound generators which allows the instrument to play its own sound generators in normal operation when the switch is closed (see Figure 4.7). If the switch is opened, the link is broken: the output from the keyboard feeds the MIDI OUT while the sound generators are controlled from the MIDI IN. In this mode the instrument acts as two separate devices: a keyboard without any sound, and a sound generator without a keyboard. This configuration can be useful when the instrument in use is the master keyboard for a large sequencer system, where it may not always be desired that everything played on the master keyboard results in sound from the instrument itself.

Figure 4.7 The ‘local off’ switch disconnects a keyboard from its associated sound generators in order that the two parts may be treated independently in a MIDI system

‘Omni off’ ensures that the instrument will only act on data tagged with its own channel number(s), as set by the instrument’s controls. ‘Omni on’ sets the instrument to receive on all of the MIDI channels. In other words, the instrument will ignore the channel number in the status byte and will attempt to act on any data that may arrive, whatever its channel. Devices should power up in this mode according to the original specification, but more recent devices will tend to power up in the mode in which they were left. Mono mode sets the instrument such that it will only reproduce one note at a time, as opposed to ‘poly’ (phonic) mode, in which a number of notes may be sounded together.

In older devices the mono mode came into its own as a means of operating an instrument in a ‘multitimbral’ fashion, whereby MIDI information on each channel controlled a separate monophonic musical voice. This used to be one of the only ways of getting a device to generate more than one type of voice at a time. The data byte that accompanies the mono mode message specifies how many voices are to be assigned to adjacent MIDI channels, starting with the basic receive channel. For example, if the data byte is set to 4, then four voices will be assigned to adjacent MIDI channels, starting from the basic channel which is the one on which the instrument has been set to receive in normal operation. Exceptionally, if the data byte is set to 0, all 16 voices (if they exist) are assigned each to one of the 16 MIDI channels. In this way, a single multitimbral instrument can act as 16 monophonic instruments, although on cheaper systems all of these voices may be combined to one audio output.

Mono mode tends to be used mostly on MIDI guitar synthesisers because each string can then have its own channel and each can control its own set of pitch bend and other parameters. The mode also has the advantage that it is possible to play in a truly legato fashion – that is, with a smooth takeover between the notes of a melody – because the arrival of a second note message acts simply to change the pitch if the first one is still being held down, rather than re-triggering the start of a note envelope. The legato switch controller (see Table 4.4) allows a similar type of playing in polyphonic modes by allowing new note messages only to change the pitch.

In poly mode the instrument will sound as many notes as it is able at the same time. Instruments differ as to the action taken when the number of simultaneous notes is exceeded: some will release the first note played in favour of the new note, whereas others will refuse to play the new note. Some may be able to route excess note messages to their MIDI OUT ports so that they can be played by a chained device. The more intelligent of them may look to see whether the same note already exists among the notes currently sounding and only accept a new note if it is not already sounding. Even more intelligently, some devices may release the quietest note (that with the lowest velocity value), or the note furthest through its amplitude envelope, to make way for a later arrival. It is also common to run a device in poly mode on more than one receive channel, provided that the software can handle the reception of multiple polyphonic channels. A multitimbral sound generator may well have this facility, commonly referred to as ‘multi’ mode, making it act as if it were a number of separate instruments each receiving on a separate channel. In multi mode a device may be able to assign its polyphony dynamically between the channels and voices in order that the user does not need to assign a fixed polyphony to each voice.

4.5.8 Program change

The program change message is used most commonly to change the ‘patch’ of an instrument or other device. A patch is a stored configuration of the device, describing the setup of the tone generators in a synthesiser and the way in which they are interconnected. Program change is channel-specific and there is only a single data byte associated with it, specifying to which of 128 possible stored programs the receiving device should switch. On non-musical devices such as effects units, the program change message is often used to switch between different effects and the different effects programs may be mapped to specific program change numbers. The message takes the general form:

[&Cn] [Program number]

If a program change message is sent to a musical device it will usually result in a change of voice, as long as this facility is enabled. Exactly which voice corresponds to which program change number depends on the manufacturer. It is quite common for some manufacturers to implement this function in such a way that a data value of zero gives voice number one. This results in a permanent offset between the program change number and the voice number, which should be taken into account in any software. On some instruments, voices may be split into a number of ‘banks’ of 8, 16 or 32, and higher banks can be selected over MIDI by setting the program change number to a value which is 8, 16 or 32 higher than the lowest bank number. For example, bank 1, voice 2, might be selected by program change &01, whereas bank 2, voice 2, would probably be selected in this case by program change &11, where there were 16 voices per bank.
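As a sketch of this offset arithmetic, assuming a hypothetical device with 16 voices per bank on which program number zero selects the first voice:

# Sketch of the bank/voice offset scheme described above, assuming a
# hypothetical device with 16 voices per bank where program 0 = bank 1, voice 1.
def program_number(bank, voice, voices_per_bank=16):
    return (bank - 1) * voices_per_bank + (voice - 1)

print(hex(program_number(1, 2)))        # bank 1, voice 2: 0x1
print(hex(program_number(2, 2)))        # bank 2, voice 2: 0x11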

There are also a number of other approaches used in commercial sound modules. Where more than 128 voices need to be addressed remotely, the more recent ‘bank select’ command may be implemented.

4.5.9 Channel aftertouch

Most instruments use a single sensor, often in the form of a pressure-sensitive conductive plastic bar running the length of the keyboard, to detect the pressure applied to keys at the bottom of their travel. In the case of channel aftertouch, one message is sent for the entire instrument and this will correspond to an approximate total of the pressure over the range of the keyboard, the strongest influence being from the key pressed the hardest. (Some manufacturers have split the pressure detector into upper and lower keyboard regions, and some use ‘intelligent’ zoning.) The message takes the general form:

[&Dn] [Pressure value]

There is only one data byte, so there are 128 possible values and, as with the polyphonic version, many messages may be sent as the pressure is varied at the bottom of a key’s travel. Controller ‘thinning’ may be used to reduce the quantity of these messages, as described above.

4.5.10 Pitch bend wheel

The pitch wheel message has a status byte of its own, and carries information about the movement of the sprung-return control wheel on many keyboards which modifies the pitch of any note(s) played. It uses two data bytes in order to give 14 bits of resolution, in much the same way as the continuous controllers, except that the pitch wheel message carries both bytes together. Fourteen data bits are required so that the pitch appears to change smoothly, rather than in steps (as it might with only seven bits). The pitch bend message is channel specific so ought to be sent separately for each individual channel. This becomes important when using a single multi-timbral device in mono mode (see above), as one must ensure that a pitch bend message only affects the notes on the intended channel. The message takes the general form:

[&En] [LSbyte] [MSbyte]

The value of the pitch bend controller should be halfway between the lower and upper range limits when it is at rest in its sprung central position, thus allowing bending both down and up. This corresponds to a hex value of &2000, transmitted as [&En] [&00] [&40]. The range of pitch controlled by the bend message is set on the receiving device itself, or using the RPC designated for this purpose (see Section 4.6.7).
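A sketch of the encoding is shown below; the wheel position is carried as a 14-bit value from 0 to 16383, with 8192 representing the rest position:

# Sketch: encoding a pitch bend message. The 14-bit value is split into
# LSbyte and MSbyte, transmitted LSbyte first.
def pitch_bend(channel, value=8192):
    value = max(0, min(16383, value))
    return bytes([0xE0 | (channel - 1), value & 0x7F, (value >> 7) & 0x7F])

print(pitch_bend(1).hex())              # 'e00040': wheel at rest (&2000)
print(pitch_bend(1, 16383).hex())       # 'e07f7f': maximum upward bend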

4.5.11 System exclusive

A system exclusive message is one that is unique to a particular manufacturer and often a particular instrument. The only thing that is defined about such messages is how they are to start and finish, with the exception of the use of system exclusive messages for universal information, as discussed elsewhere. System exclusive messages generated by a device will naturally be produced at the MIDI OUT, not at the THRU, so a deliberate connection must be made between the transmitting device and the receiving device before data transfer may take place. Occasionally it is necessary to make a return link from the OUT of the receiver to the IN of the transmitter so that two-way communication is possible and so that the receiver can control the flow of data to some extent by telling the transmitter when it is ready to receive and when it has received correctly (a form of handshaking).

The message takes the general form:

[&F0] [ident.] [data] [data] … [&F7]

where [ident.] identifies the relevant manufacturer ID, a number defining which manufacturer’s message is to follow. Originally, manufacturer IDs were a single byte but the number of IDs has been extended by setting aside the [00] value of the ID to indicate that two further bytes of ID follow. Manufacturer IDs are therefore either one or three bytes long. A full list of manufacturer IDs is available from the MMA.

Data of virtually any sort can follow the ID. It can be used for a variety of miscellaneous purposes that have not been defined in the MIDI standard and the message can have virtually any length that the manufacturer requires. It is often split into packets of a manageable size in order not to cause receiver memory buffers to overflow. Exceptions are data bytes that look like other MIDI status bytes (except realtime messages), as they will naturally be interpreted as such by any receiver, which might terminate reception of the system exclusive message. The message should be terminated with &F7, although this is not always observed, in which case the receiving device should ‘time-out’ after a given period, or terminate the system exclusive message on receipt of the next status byte. It is recommended that some form of error checking (typically a checksum) is employed for long system exclusive data dumps, and many systems employ means of detecting whether the data has been received accurately, asking for re-tries of sections of the message in the event of failure, via a return link to the transmitter.
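The framing might be sketched as follows; the checksum scheme shown (data bytes plus checksum summing to zero modulo 128) is only one manufacturer-style convention, not part of the MIDI standard, and the identifier and data bytes used here are arbitrary:

# Sketch of system exclusive framing with a simple 7-bit checksum.
def sysex(manufacturer_id, data):
    assert all(b < 0x80 for b in data), 'data bytes must have MSB clear'
    checksum = (128 - sum(data) % 128) % 128
    return (bytes([0xF0]) + bytes(manufacturer_id) + bytes(data)
            + bytes([checksum, 0xF7]))

print(sysex([0x41], [0x10, 0x42, 0x12]).hex())   # 'f0411042121cf7'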

Examples of applications for such messages can be seen in the form of sample data dumps (from a sampler to a computer and back again for editing purposes), although this is painfully slow, and voice data dumps (from a synthesiser to a computer for storage and editing of user-programmed voices). There are now an enormous number of uses of system exclusive messages, both in the universal categories and in the manufacturer categories.

4.5.12 Universal system exclusive messages

The three highest numbered IDs within the system exclusive message have been set aside to denote special modes. These are the ‘universal non-commercial’ messages (ID: &7D), the ‘universal non-realtime’ messages (ID: &7E) and the ‘universal realtime’ messages (ID: &7F). Universal sysex messages are often used for controlling device parameters that were not originally specified in the MIDI standard and that now need addressing in most devices. Examples are things like ‘chorus modulation depth’, ‘reverb type’ and ‘master fine tuning’.

Universal non-commercial messages are set aside for educational and research purposes and should not be used in commercial products. Universal non-realtime messages are used for universal system exclusive events which are not time critical and universal realtime messages deal with time critical events (thus being given a higher priority). The two latter types of message normally take the general form of:

[&F0] [ID] [dev. ID] [sub-ID #1] [sub-ID #2] [data] … [&F7]

Device ID used to be referred to as ‘channel number’, but this did not really make sense since a whole byte allows for the addressing of 128 channels and this does not correspond to the normal 16 channels of MIDI. The term ‘device ID’ is now used widely in software as a means of defining one of a number of physical devices in a large MIDI system, rather than defining a MIDI channel number. It should be noted, though, that it is allowable for a device to have more than one ID if this seems appropriate. Modern MIDI devices will normally allow their device ID to be set either over MIDI or from the front panel. The use of &7F in this position signifies that the message applies to all devices as opposed to just one.

The sub-IDs are used to identify firstly the category or application of the message (sub-ID #1) and secondly the type of message within that category (sub-ID #2). For some reason, the original MIDI sample dump messages do not use the sub-ID #2, although some recent additions to the sample dump do.

4.5.13 Tune request

Older analog synthesisers tended to drift somewhat in pitch over the time that they were turned on. The tune request is a request for these synthesisers to re-tune themselves to a fixed reference. (It is advisable not to transmit pitch bend or note on messages to instruments during a tune up because of the unpredictable behaviour of some products under these conditions.)

4.5.14 Active sensing

Active sensing messages are single status bytes sent roughly three times per second by a controlling device when there is no other activity on the bus. They act as a means of reassuring the receiving devices that the controller has not disappeared. Not all devices transmit active sensing information, and a receiver’s software should be able to detect the presence or lack of it. If a receiver has come to expect active sensing bytes then it will generally act by turning off all notes if these bytes disappear for any reason. This can be a useful function when a MIDI cable has been pulled out during a transmission, as it ensures that notes will not be left sounding for very long. If a receiver has not seen active sensing bytes since last turned on, it should assume that they are not being used.

4.5.15 Reset

This message resets all devices on the bus to their power-on state. The process may take some time and some devices mute their audio outputs, which can result in clicks, therefore the message should be used with care.

4.6 MIDI control of sound generators

4.6.1 MIDI note assignment in synthesisers and samplers

Many of the replay and signal processing aspects of synthesis and sampling now overlap so that it is more difficult to distinguish between the two. In basic terms a sampler is a device that stores short clips of sound data in RAM, enabling them to be replayed subsequently at different pitches, possibly looped and processed. A synthesiser is a device that enables signals to be artificially generated and modified to create novel sounds. Wavetable synthesis is based on a similar principle to sampling, though, and stored samples can form the basis for synthesis. A sound generator can often generate a number of different sounds at the same time. It is possible that these sounds could be entirely unrelated (perhaps a single drum, an animal noise and a piano note), or that they might have some relationship to each other (perhaps a number of drums in a kit, or a selection of notes from a grand piano). The method by which sounds or samples are assigned to MIDI notes and channels is defined by the replay program.

The most common approach when assigning note numbers to samples is to program the sampler with the range of MIDI note numbers over which a certain sample should be sounded. Akai, one of the most popular sampler manufacturers, calls these ‘keygroups’. It may be that this ‘range’ is only one note, in which case the sample in question would be triggered only on receipt of that note number, but in the case of a range of notes the sample would be played on receipt of any note in the range. In the latter case transposition would be required, depending on the relationship between the note number received and the original note number given to the sample (see above). A couple of examples highlight the difference in approach, as shown in Figure 4.8. In the first example, illustrating a possible approach to note assignment for a collection of drum kit sounds, most samples are assigned to only one note number, although it is possible for tuned drum sounds such as tom-toms to be assigned over a range in order to give the impression of ‘tuned toms’. Each MIDI note message received would replay the particular percussion sound assigned to that note number in this example.

In the second example, illustrating a suggested approach to note assignment for an organ, notes were originally sampled every musical fifth across the organ’s note range. The replay program has been designed so that each of these samples is assigned to a note range of a fifth, centred on the original pitch of each sample, resulting in a maximum transposition of a third up or down. Ideally, of course, every note would have been sampled and assigned to an individual note number on replay, but this requires very large amounts of memory and painstaking sample acquisition in the first place.
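A sketch of this kind of keygroup assignment follows; the note ranges, root notes and sample names are purely illustrative:

# Sketch of keygroup selection. The ratio is the playback-speed change
# needed to transpose the stored sample to the requested pitch.
keygroups = [
    # (low note, high note, root note, sample name) - illustrative values
    (54, 60, 57, 'organ_A2'),
    (61, 67, 64, 'organ_E3'),
]

def select(note):
    for low, high, root, name in keygroups:
        if low <= note <= high:
            return name, 2.0 ** ((note - root) / 12.0)
    return None

print(select(60))    # ('organ_A2', 1.189...): transposed up a minor third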

In further pursuit of sonic accuracy, some devices provide the facility for introducing a cross-fade between note ranges. This is used where an abrupt change in the sound at the boundary between two note ranges might be undesirable, allowing the takeover from one sample to another to be more gradual. For example, in the organ scenario introduced above, the timbre could change noticeably when playing musical passages that crossed between two note ranges because replay would switch from the upper limit of transposition of one sample to the lower limit of the next (or vice versa). In this case the ranges for the different samples are made to overlap (as illustrated in Figure 4.9). In the overlap range the system mixes a proportion of the two samples together to form the output. The exact proportion depends on the range of overlap and the note’s position within this range. Very accurate tuning of the original samples is needed in order to avoid beats when using positional crossfades. Clearly this approach would be of less value when each note was assigned to a completely different sound, as in the drum kit example.


Figure 4.8 (a) Percussion samples are often assigned to one note per sample, except for tuned percussion which sometimes covers a range of notes. (b) Organ samples could be transposed over a range of notes, centred on the original pitch of the sample

Crossfades based on note velocity allow two or more samples to be assigned to one note or range of notes. This requires at least a ‘loud sample’ and a ‘soft sample’ to be stored for each original sound, and some systems may allow four or more to be assigned over the velocity range. The terminology may vary, but the principle is that a velocity value is set at which the replay switches from one stored sample to another, as many instruments sound quite different when played loudly than when played softly (it is more than just the volume that changes: it is the timbre also). If a simple switching point is set, then the change from one sample to the other will be abrupt as the velocity crosses either side of the relevant value. This can be illustrated by storing two completely different sounds as the loud and soft samples, in which case the output changes from one to the other at the switching point. A more subtle effect is achieved by using velocity crossfading, in which the proportion of loud and soft samples varies depending on the received note velocity value. At low velocity values the proportion of the soft sample in the output would be greatest and at high values the output content would be almost entirely made up of the loud sample (see Figure 4.10).
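The principle can be sketched as follows, with hypothetical fade limits:

# Sketch of velocity crossfading between stored 'soft' and 'loud' samples.
# Outside the fade region only one of the two samples sounds.
def velocity_mix(velocity, fade_low=40, fade_high=90):
    """Return (soft gain, loud gain) for a MIDI velocity of 0-127."""
    if velocity <= fade_low:
        return 1.0, 0.0
    if velocity >= fade_high:
        return 0.0, 1.0
    loud = (velocity - fade_low) / (fade_high - fade_low)
    return 1.0 - loud, loud

print(velocity_mix(65))                 # (0.5, 0.5): halfway through the fade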


Figure 4.9 Overlapped sample ranges can be crossfaded in order that a gradual shift in timbre takes place over the region of takeover between one range and the next


Figure 4.10 Illustration of velocity switch and velocity crossfade between two stored samples (‘soft’ and ‘loud’) over the range of MIDI note velocity values

4.6.2 Polyphony, voice and note assignment

Modern sound modules (synthesisers and samplers) tend to be multi-note polyphonic. When the polyphony of a device is exceeded the device should follow a predefined set of rules to determine what to do with the extra notes. Typically a sound module will either release the ‘oldest’ notes first, or possibly release the quietest. Alternatively, new notes that exceed the polyphony will simply not be sounded until others are released. Rules for this are defined in some of the recent General MIDI specifications (see Section 4.8), and composers may now even be able to exercise some control over what happens in devices with limited polyphony.

It is important to distinguish between the degree of polyphony offered by a device and the number of simultaneous voices it can generate. Sometimes these may be traded off against each other in multi-timbral devices, by allocating a certain number of notes to each voice, with the total adding up to the total polyphony. Either 16 notes could be allocated to one voice or four notes to each of four voices, for example. Dynamic allocation is often used to distribute the polyphony around the voices depending on demand and this is a particular feature of General MIDI sound modules.

A multi-timbral sound generator is one that is capable of generating more than one voice at a time, independent of polyphony considerations. A voice is a particular sound type, such as ‘grand piano’ or ‘accordion’. This capability is now the norm for modern sound modules. Older synthesisers used to be able to generate only one or two voices at a time, possibly allowing a keyboard split, and could sometimes make use of MIDI channel mode 4 (monophonic, omni off) to allow multiple monophonic voices to be generated under MIDI control. They tended only to receive polyphonically on one MIDI channel at a time. More recent systems are capable of receiving on all 16 MIDI channels simultaneously, with each channel controlling an entirely independent polyphonic voice.

4.6.3 MIDI functions of sound generators

The MIDI implementation for a particular sound generator should be described in the manual that accompanies it. A MIDI implementation chart such as the one shown in Figure 4.11 indicates which message types are received and transmitted, together with any comments relating to limitations or unusual features. Functions such as note off velocity and polyphonic aftertouch, for example, are quite rare. It is quite common for a device to be able to accept certain data and act upon it, even if it cannot generate such data from its own controllers. The note range available under MIDI control compared with that available from a device’s keyboard is a good example of this, since many devices will respond to note data over a full ten octave range yet still have only a limited (or no) keyboard. This approach can be used by a manufacturer who wishes to make a cheaper synthesiser that omits the expensive physical sensors for such things as velocity and aftertouch, while retaining these functions in software for use under MIDI control. Devices conforming to the General MIDI specification described in Section 4.8 must conform to certain basic guidelines concerning their MIDI implementation and the structure of their sound generators.

4.6.4 MIDI data buffers and latency

All MIDI-controlled equipment uses some form of data buffering for received MIDI messages. Such buffering acts as a temporary store for messages that have arrived but have not yet been processed and allows for a certain prioritisation in the handling of received messages. Cheaper devices tend to have relatively small MIDI input buffers and these can overflow easily unless care is taken in the filtering and distribution of MIDI data around a large system (usually accomplished by a MIDI router or multiport interface). When a buffer overflows it will normally result in an error message displayed on the front panel of the device, indicating that some MIDI data is likely to have been lost. More advanced equipment can store more MIDI data in its input buffer, although this is not necessarily desirable because many messages that are transmitted over MIDI are intended for ‘real-time’ execution and one would not wish them to be delayed in a temporary buffer. Such buffer delay is one potential cause of latency in MIDI systems. A more useful solution would be to speed up the rate at which incoming messages are processed.

image

Figure 4.11 A typical MIDI implementation chart for a synthesiser sound module. (Yamaha TG100, with permission)

4.6.5 Handling of velocity and aftertouch data

Sound generators able to respond to note on velocity will use the value of this byte to control assigned functions within the sound generation process. It is common for the user to be able to program the device so that the velocity value affects certain parameters to a greater or lesser extent. For example, it might be decided that the 'brightness' of the sound should increase with greater key velocity, in which case it would be necessary to program the device so that the envelope generator affecting the brightness was subject to control by the velocity value. This would usually mean that the maximum effect of the envelope generator was limited by the velocity value, such that it could only reach its full programmed effect (that which it would give if not subject to velocity control) if the velocity was also at maximum. The exact law of this relationship is up to the manufacturer and may be used to simulate different types of 'keyboard touch'. A device may offer a number of laws or curves relating changes in velocity to changes in the control value, or the received velocity value may be used to scale the preset parameter rather than replace it.
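The following Python sketch illustrates the general principle of such velocity laws; the function name and the three curve shapes are invented for the example, since the actual laws are manufacturer specific.

def velocity_to_scale(velocity, law='linear'):
    """Map note on velocity (0-127) to an envelope scaling factor (0.0-1.0)."""
    v = max(0, min(127, velocity)) / 127.0
    if law == 'linear':
        return v            # effect proportional to velocity
    if law == 'soft':
        return v ** 0.5     # a light touch reaches the full effect sooner
    if law == 'hard':
        return v ** 2.0     # the full effect is reached only near maximum
    raise ValueError('unknown law')

# The programmed envelope amount reaches its full value only at velocity 127:
effective_brightness = 0.8 * velocity_to_scale(100, law='hard')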

Another common application of velocity value is to control the amplitude envelope of a particular sound, such that the output volume depends on how hard the key is hit. In many synthesiser systems that use multiple interacting digital oscillators, these velocity-sensitive effects can all be achieved by applying velocity control to the envelope generator of one or more of the oscillators, as indicated earlier in this chapter.

Note off velocity is not implemented in many keyboards, and most musicians are not used to thinking about what they do as they release a key, but this parameter can be used to control such factors as the release time of the note or the duration of a reverberation effect. Aftertouch (either polyphonic or channel, as described in Section 4.5) is often used in synthesisers to control the application of low frequency modulation (tremolo or vibrato) to a note. Sometimes aftertouch may be applied to other parameters, but this is less common.

4.6.6 Handling of controller messages

The controller messages that begin with a status of &Bn, as listed in Table 4.4, turn up in various forms in sound generator implementations. It should be noted that although there are standard definitions for many of these controller numbers it is often possible to remap them either within sequencer software or within sound modules themselves. Fourteen-bit continuous controllers are rarely encountered for any parameter and often only the MSbyte of the controller value (which uses the first 32 controller numbers) is sent and used. For most parameters the 128 increments that result are adequate.

Controllers &07 (Volume) and &0A (Pan) are particularly useful with sound modules as a means of controlling the internal mixing of voices. These controllers work on a per channel basis, and are independent of any velocity control that may be related to note volume. There are two universal realtime system exclusive messages that handle similar functions to these, but for the device as a whole rather than for individual voices or channels. The 'master volume' and 'master balance' controls are accessed using:

&[F0] [7F] [dev. ID] [04] [01 or 02] [data] [data] [F7]

where the sub-ID #1 of &04 represents a 'device control' message and sub-ID #2s of &01 or &02 select volume or balance respectively. The [data] values allow 14-bit resolution for the parameters concerned, transmitted LSbyte first. Balance is different to pan because pan sets the stereo positioning (the split in level between left and right) of a mono source, whereas balance sets the relative levels of the left and right channels of a stereo source (see Figure 4.12). Since a pan or balance control is used to shift the stereo image either left or right from a centre detent position, the MIDI data values representing the setting are ranged either side of a mid-range value that corresponds to the centre detent. The channel pan controller is thus normally centred at a data value of 64 (&40), assuming that only a single 7-bit controller value is sent. There may be fewer steps in these controls than there are values of the MIDI controller, depending on the device in question, resulting in a range of controller values that will give rise to the same setting.
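By way of a sketch, the 14-bit value in the master volume message shown above might be packed as follows (Python; the function name is illustrative, and a device ID of &7F, addressing all devices, is assumed):

def master_volume_message(value14, device_id=0x7F):
    """Build the universal realtime 'master volume' SysEx message.
    value14 is 0-16383, split into two 7-bit bytes and sent LSbyte first."""
    lsb = value14 & 0x7F
    msb = (value14 >> 7) & 0x7F
    # sub-ID #1 = &04 (device control); sub-ID #2 = &01 (volume) or &02 (balance)
    return bytes([0xF0, 0x7F, device_id, 0x04, 0x01, lsb, msb, 0xF7])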

image

Figure 4.12 (a) A pan control takes a mono input and splits it two ways (left and right), the stereo position depending on the level difference between the two channels. The attenuation law of pan controls is designed to result in a smooth movement of the source across the stereo ‘picture’ between left and right, with no apparent rise or fall in overall level when the control is altered. A typical pan control gain law is shown here. (b) A balance control simply adjusts the relative level between the two channels of a stereo signal so as to shift the entire stereo image either left or right

Some manufacturers have developed alternative means of expressive control for synthesisers, such as the 'breath controller', a device that responds to the blowing effort applied by the player's mouth. It was intended to allow wind players more control over expression in performance. Plugged into the synthesiser, it can be applied to various envelope generator or modulator parameters to affect the sound, and it has its own MIDI controller number. There is also a portamento controller (&54) that defines a note number from which the next note should slide. It is normally transmitted between two note on messages to create an automatic legato portamento effect between the two notes.

The ‘effects’ and ‘sound’ controllers have been set aside as a form of general purpose control over aspects of the built-in effects and sound quality of a device. How they are applied will depend considerably on the architecture of the sound module and the method of synthesis used, but they give some means by which a manufacturer can provide a more abstracted form of control over the sound without the user needing to know precisely which voice parameters to alter. In this way, a user who is not prepared to get into the increasingly complicated world of voice programming can modify sounds to some extent.

The effects controllers occupy five controller numbers from &5B to &5F and are defined as Effects Depths 1–5. The default names for the effects to be controlled by these messages are respectively 'External Effects Depth', 'Tremolo Depth', 'Chorus Depth', 'Celeste (Detune) Depth' and 'Phaser Depth', although these definitions are open to interpretation and change by manufacturers. There are also ten sound controllers that occupy controller numbers from &46 to &4F. Again these are user- or manufacturer-definable, but five defaults were originally specified (listed in Table 4.5). They are principally intended as real-time controllers to be used during performance, rather than as a means of editing internal voice patches (the RPNs and NRPNs can be used for this, as described below).

The sound variation controller is interesting because it is designed to allow the selection of one of a number of variants on a basic sound, depending on the data value that follows the controller number. For example, a piano sound might have variants of 'honky tonk', 'soft pedal', 'lid open' and 'lid closed'. The data value in the message is not intended to act as a continuous controller for voice parameters; rather, the different data values possible in the message are used to select certain pre-programmed variations on the voice patch. If there are fewer than the 128 possible variants on the voice then the variants should be spread evenly over the data value range, so that they are separated by equal intervals.

Table 4.5 Sound controller functions (byte 2 of status &Bn)

MIDI controller number Function (default)
&46 Sound variation
&47 Timbre/harmonic content
&48 Release time
&49 Attack time
&4A Brightness
&4B–&4F No default

The timbre and brightness controllers can be used to alter the spectral content of the sound. The timbre controller is intended to be used specifically for altering the harmonic content of a sound, whilst the brightness controller is designed to control its high frequency content. The envelope controllers can be used to modify the attack and release times of certain envelope generators within a synthesiser. Data values less than &40 attached to these messages should result in progressively shorter times, whilst values greater than &40 should result in progressively longer times.

4.6.7 Registered and non-registered parameter numbers

The MIDI standard was extended a few years ago to allow for the control of individual internal parameters of sound generators by using a specific control change message. This meant, for example, that any aspect of a voice, such as the velocity sensitivity of an envelope generator, could be assigned a parameter number that could then be accessed over MIDI and its setting changed, making external editing of voices much easier. Parameter controllers are a subset of the control change message group, and they are divided into the registered and non-registered numbers (RPNs and NRPNs). RPNs are intended to apply universally and should be registered with the MMA, whilst NRPNs may be manufacturer specific. Only five parameter numbers were originally registered as RPNs, as shown in Table 4.6, but more may be added at any time and readers are advised to check the most recent revisions of the MIDI standard.

Parameter controllers operate by specifying the address of the parameter to be modified, followed by a control change message to increment or decrement the setting concerned. It is also possible to use the data entry slider controller to alter the setting of the parameter. The address of the parameter is set in two stages, with an MSbyte and then an LSbyte message, so as to allow for 16 384 possible parameter addresses. The controller numbers &62 and &63 are used to set the LS- and MSbytes respectively of an NRPN, whilst &64 and &65 are used to address RPNs. The sequence of messages required to modify a parameter is as follows:

Table 4.6 Some examples of RPN definitions

RPN number (hex) Parameter
00 00 Pitch bend sensitivity
00 01 Fine tuning
00 02 Coarse tuning
00 03 Tuning program select
00 04 Tuning bank select
7F 7F Cancels RPN or NRPN (usually follows Message 3)

Message 1

&[Bn] [62 or 64] [LSB]

Message 2

&[Bn] [63 or 65] [MSB]

Message 3

&[Bn] [60 or 61] [7F] or &[Bn] [06] [DATA] [26] [DATA]

Message 3 represents either data increment (&60) or decrement (&61), or a 14-bit data entry slider control change with MSbyte (&06) and LSbyte (&26) parts (assuming running status). If the control has not moved very far, it is possible that only the MSbyte message need be sent.
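Putting the sequence together, a complete parameter change might be assembled as in this illustrative Python sketch, which sets pitch bend sensitivity (RPN 00 00) by data entry and then cancels the parameter address with the null RPN value of &7F &7F:

def set_rpn(channel, rpn_msb, rpn_lsb, data_msb, data_lsb=None):
    """Build the control change sequence to set a registered parameter."""
    status = 0xB0 | (channel & 0x0F)
    msgs = [bytes([status, 0x64, rpn_lsb]),    # Message 1: RPN LSbyte
            bytes([status, 0x65, rpn_msb]),    # Message 2: RPN MSbyte
            bytes([status, 0x06, data_msb])]   # Message 3: data entry MSbyte
    if data_lsb is not None:
        msgs.append(bytes([status, 0x26, data_lsb]))  # data entry LSbyte
    msgs += [bytes([status, 0x64, 0x7F]),      # null RPN cancels the address
             bytes([status, 0x65, 0x7F])]
    return msgs

# Set pitch bend sensitivity to +/-2 semitones on channel 1 (numbered 0):
sequence = set_rpn(0, 0x00, 0x00, 0x02)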

4.6.8 Voice selection

The program change message was adequate for a number of years as a means of selecting one of a number of stored voice patches on a sound generator. Program change on its own allows for up to 128 different voices to be selected and a synthesiser or sound module may allow a program change map to be set up in order that the user may decide which voice is selected on receipt of a particular message. This can be particularly useful when the module has more than 128 voices available, but no other means of selecting voice banks. A number of different program change maps could be stored, perhaps to be selected under system exclusive control.

Modern sound modules tend to have very large patch memories – often too large to be adequately addressed by 128 program change messages. Although some older synthesisers used various odd ways of providing access to further banks of voices, most modern modules have implemented the standard ‘bank select’ approach. In basic terms, ‘bank select’ is a means of extending the number of voices that may be addressed by preceding a standard program change message with a message to define the bank from which that program is to be recalled. It uses a 14-bit control change message, with controller numbers &00 and &20, to form a 14-bit bank address, allowing 16 384 banks to be addressed. The bank number is followed directly by a program change message, thus creating the following general message:

&[Bn] [00] [MSbyte (of bank)]

&[Bn] [20] [LSbyte]

&[Cn] [Program number]
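In sketch form (Python, with an illustrative function name), the complete voice selection sequence is simply:

def select_voice(channel, bank, program):
    """Bank select (14-bit, controllers &00 and &20) then program change."""
    cc = 0xB0 | (channel & 0x0F)
    return [bytes([cc, 0x00, (bank >> 7) & 0x7F]),   # bank MSbyte
            bytes([cc, 0x20, bank & 0x7F]),          # bank LSbyte
            bytes([0xC0 | (channel & 0x0F), program & 0x7F])]

# Select program 5 from bank 2 on channel 1 (numbered 0):
messages = select_voice(0, 2, 5)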

4.7 MIDI tuning control

Conventional equal-tempered tuning is the norm in western musical environments, but there may be cases when alternative tuning standards are required in order to conform to other temperaments or to non-western musical styles. Many devices now have the capability to store a number of alternative tuning maps, or to be retuned 'on the fly'. A number of manufacturer-specific methods were used in the past (prior to the MIDI Tuning Standard), typically SysEx messages preceded by the relevant manufacturer ID, but the MIDI Tuning Standard now forms the basis for communicating information about alternative tunings.

image

Figure 4.13 MIDI tuning messages indicate the pitches to which MIDI notes should be tuned using three bytes, as shown here

The tuning standard assumes that any note on a sound generator can be tuned over the entire range from 8.1758 Hz to 13 289.73 Hz. It then allows the tuning of individual notes to be adjusted in fractions of a semitone above a conventional MIDI note's pitch (which is based on the equal temperament convention). A semitone is divided into 100 cents; a cent does not represent a constant frequency increment in hertz but a fixed proportion of the frequency of the note concerned, so as the pitch of the basic note rises the frequency increment represented by a cent also increases. Two MIDI data bytes are used to indicate the fraction of a semitone above the basic note pitch, so the maximum resolution possible is 100 cents/2^14, which equals approximately 0.0061 cents.

Tuning of individual notes is represented by three MIDI data bytes in total. The first specifies a numbered semitone in the MIDI note range on which the fractional tuning is to be based (the same as the MIDI note number in a note on message) and the second and third form a pair containing a 14-bit number (the MSbit of each byte is 0), transmitted MSbyte first. This 14-bit number is used as described in the previous paragraph, with each increment representing a change of 0.0061 cents upwards in pitch from the basic semitone (see Figure 4.13). A sound generator that is not capable of tuning to the accuracy contained in the message should tune to the nearest possible value, but it is recommended that it stores the full resolution tuning value in its tuning memories, in case the data is to be transmitted to other devices that are capable of full resolution. The value &[7F] [7F] [7F] is reserved to indicate no change to the tuning of a particular note.
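The relationship between a tuning value and the resulting frequency can be expressed compactly. This Python sketch (with an illustrative function name) assumes the equal temperament reference described above, with MIDI note 0 at 8.1758 Hz:

def tuning_to_frequency(semitone, fraction14):
    """Convert a base semitone (0-127) plus a 14-bit fraction of a
    semitone to a frequency in hertz."""
    return 8.1758 * 2.0 ** ((semitone + fraction14 / 16384.0) / 12.0)

# Each increment of the fraction is 100/16384 (about 0.0061) cents.
# Note 69 with a zero fraction gives concert A (approximately 440 Hz):
print(tuning_to_frequency(69, 0))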

A number of MIDI messages are associated with tuning. These break down into bulk dumps of tuning data (to retune a complete instrument), single note retuning messages and the selection of prestored tuning programs and banks of programs. The only one of these that is currently a real-time message is the single note retuning. Users may select stored tuning programs and banks of tuning programs using the RPN messages shown in Table 4.6. A device may request a bulk tuning dump from another using the general SysEx non-realtime form:

&[F0] [7E] [dev. ID] [08] [00] [tt] [F7]

where the sub-ID #1 of &08 indicates a MIDI tuning standard message and the sub-ID #2 of &00 indicates a bulk dump request. &[tt] defines the tuning program which is being requested. Such a request should result in the transmission of a bulk dump if such a tuning program exists, and the dump should take the form:

&[F0] [7E] [dev. ID] [08] [01] [tt] [tuning name] … … [tuning data] … … … [LL] [F7]

where [tuning name] is 16 bytes to name the tuning program (each byte holds a 7-bit ASCII character) and [tuning data] consists of 128 groups of 3 bytes to define the tuning of each note, in the format described above. &LL is a checksum byte.

A single note may be retuned using the SysEx realtime message:

&[F0] [7F] [dev. ID] [08] [02] [tt] [ll] ([kk] [tuning data]) … … [F7]

where &[ll] indicates the number of notes to be retuned, followed by that number of groups of tuning data. Each group of tuning data is preceded by &[kk] which defines the note to be retuned.
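An illustrative Python sketch of how such a message might be assembled, following the byte layout just described (the device ID, tuning program and note values in the example are arbitrary):

def retune_notes(device_id, tuning_program, retunes):
    """Build a realtime single note tuning change message.
    retunes is a list of (key, semitone, fraction14) tuples."""
    msg = bytearray([0xF0, 0x7F, device_id, 0x08, 0x02,
                     tuning_program, len(retunes)])
    for key, semitone, fraction14 in retunes:
        msg += bytes([key, semitone,
                      (fraction14 >> 7) & 0x7F,   # MS 7 bits of the fraction
                      fraction14 & 0x7F])         # LS 7 bits
    msg.append(0xF7)
    return bytes(msg)

# Retune key 60 to a pitch 50 cents (half a semitone) above semitone 60:
message = retune_notes(0x7F, 0, [(60, 60, 8192)])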

4.8 General MIDI

One of the problems with MIDI sound generators is that although voice patches can be selected using MIDI program change commands, there is no guarantee that a particular program change number will recall a particular voice on more than one instrument. In other words, program change 3 may correspond to ‘alto sax’ on one instrument and ‘grand piano’ on another. This makes it difficult to exchange songs between systems with any hope of the replay sounding the same as intended by the composer. General MIDI is an approach to the standardisation of a sound generator’s behaviour, so that songs can be exchanged more easily between systems and device behaviour can be predicted by controllers. It comes in three flavours: GM 1, GM Lite and GM 2.

General MIDI Level 1 specifies a standard voice map and a minimum degree of polyphony, and requires that a sound generator be able to receive MIDI data on all 16 channels simultaneously and polyphonically, with a different voice on each channel. There is also a requirement that the sound generator should support percussion sounds in the form of drum kits, so that a General MIDI sound module is capable of acting as a complete 'band in a box'.

Dynamic voice allocation is the norm in GM sound modules, with a requirement either for at least 24 dynamically allocated voices in total, or 16 for melody and 8 for percussion. Voices should all be velocity sensitive and should respond at least to the controller messages 1, 7, 10, 11, 64, 121 and 123 (decimal), RPNs 0, 1 and 2 (see above), pitch bend and channel aftertouch. In order to ensure compatibility between sequences that are replayed on GM modules, percussion sounds are always allocated to MIDI channel 10. Program change numbers are mapped to specific voice names, with ranges of numbers allocated to certain types of sounds, as shown in Table 4.7. Precise voice names may be found in the GM documentation. Channel 10, the percussion channel, has a defined set of note numbers on which particular sounds are to occur, so that the composer may know, for example, that key 39 will always be a 'hand clap'.

Table 4.7 General MIDI program number ranges (except channel 10)

Program change (decimal) Sound type
0–7 Piano
8–15 Chromatic percussion
16–23 Organ
24–31 Guitar
32–39 Bass
40–47 Strings
48–55 Ensemble
56–63 Brass
64–71 Reed
72–79 Pipe
80–87 Synth lead
88–95 Synth pad
96–103 Synth effects
104–111 Ethnic
112–119 Percussive
120–127 Sound effects

General MIDI sound modules may operate in modes other than GM, where voice allocations may be different, and there are two universal non-realtime SysEx messages used to turn GM on or off. These are:

&[F0] [7E] [dev. ID] [09] [01] [F7]

to turn GM on, and:

&[F0] [7E] [dev. ID] [09] [02] [F7]

to turn it off.

There is some disagreement over the definition of ‘voice’, as in ‘24 dynamically allocated voices’ – the requirement that dictates the degree of polyphony supplied by a GM module. The spirit of the GM specification suggests that 24 notes should be capable of sounding simultaneously, but some modules combine sound generators to create composite voices, thereby reducing the degree of note polyphony.

General MIDI Lite (GML) is a cut-down GM 1 specification designed mainly for use on mobile devices with limited processing power. It can be used for things like ring tones on mobile phones and for basic music replay from PDAs. It specifies a fixed polyphony of 16 simultaneous notes, with 15 melodic instruments and 1 percussion kit on channel 10. The voice map is the same as GM Level 1. It also supports basic control change messages and the pitch-bend sensitivity RPN. As a rule, GM Level 1 songs will usually replay on GM Lite devices with acceptable quality, although some information may not be reproduced. An alternative to GM Lite is SPMIDI (see Section 4.9) which allows greater flexibility.

GM Level 2 is backwards compatible with Level 1 (GM 1 songs will replay correctly on GM 2 devices) but allows the selection of voice banks and extends polyphony to 32 voices. Percussion kits can run on channel 11 as well as the original channel 10. It adds MIDI tuning, RPN controllers and a range of universal system exclusive messages to the MIDI specification, enabling a wider range of control and greater versatility.

4.9 Scalable polyphonic MIDI (SPMIDI)

SPMIDI, rather like GM Lite, is designed principally for mobile devices that have issues with battery life and processing power. It has been adopted by the 3GPP wireless standards body for structured audio control of synthetic sounds in ring tones and multimedia messaging. It was developed primarily by Nokia and Beatnik. The SPMIDI basic specification for a device is based on GM Level 2, but a number of selectable profiles are possible, with different levels of sophistication.

The idea is that, rather than fixing the polyphony at 16 voices, the polyphony should be scalable according to the device profile (a description of the current capabilities of the device). SPMIDI also allows the content creator to decide what should happen when polyphony is limited – for example, when only four voices are available instead of 16. Conventional 'note stealing' approaches work by stealing notes from sounding voices to supply newly arrived notes, and the outcome of this can be somewhat arbitrary. In SPMIDI this is made more controllable. A process known as channel masking is used, whereby certain channels have a higher priority than others, enabling the content creator to put high priority material on particular channels. The channel priority order and maximum instantaneous polyphony are signalled to the device in a setup message at the initialisation stage.

4.10 Standard MIDI files (SMF)

Sequencers and notation packages typically store data on disk in their own unique file formats. The standard MIDI file was developed in an attempt to make interchange of information between packages more straightforward and it is now used widely in the industry in addition to manufacturers' own file formats. It is rare now to find a sequencer or notation package incapable of importing and exporting standard MIDI files. MIDI files are most useful for the interchange of performance and control information. They are less useful for music notation, where it is necessary to communicate greater detail about the way music appears on the stave and other notational concepts. For the latter purpose a number of different file formats have been developed, including MusicXML, which is among the most widely used of the universal interchange formats today. Further information about MusicXML resources and other notation formats may be found in the Further reading at the end of this chapter.

Three types of standard MIDI file exist, designed to encourage the interchange of sequencer data between software packages. A MIDI file contains data representing events on individual sequencer tracks, together with labels such as track names, instrument names and time signatures.

4.10.1 General structure of MIDI files

There are three MIDI file types. File type 0 is the simplest and is used for single-track data, whilst file type 1 supports multiple tracks which are ‘vertically’ synchronous with each other (such as the parts of a song). File type 2 contains multiple tracks that have no direct timing relationship and may therefore be asynchronous. Type 2 could be used for transferring song files made up of a number of discrete sequences, each with a multiple track structure.

The basic file format consists of a number of 8-bit words formed into chunk-like parts, very similar to the RIFF and AIFF audio file formats described in Chapter 6. SMFs are not exactly RIFF files though, because they do not contain the highest level FORM chunk. (To encapsulate SMFs in a RIFF structure, use the RMID format, described in Section 4.12.) The header chunk, which always heads a MIDI file, contains global information relating to the whole file, whilst subsequent track chunks contain event data and labels relating to individual sequencer tracks. Track data should be distinguished from MIDI channel data, since a sequencer track may address more than one MIDI channel. Each chunk is preceded by a preamble of its own, which specifies the type of chunk (header or track) and the length of the chunk in terms of the number of data bytes that are contained in the chunk. There then follow the designated number of data bytes (see Figure 4.14). The chunk preamble contains 4 bytes to identify the chunk type using ASCII representation and 4 bytes to indicate the number of data bytes in the chunk (the length). The number of bytes indicated in the length does not include the preamble (which is always 8 bytes).
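A sketch of a chunk reader (Python, illustrative) shows how little is involved in walking this structure; the length field is big-endian, as in the SMF specification:

import struct

def read_chunks(data):
    """Yield (type, payload) pairs from standard MIDI file data.
    Each preamble is a 4-byte ASCII type plus a 4-byte big-endian length."""
    pos = 0
    while pos + 8 <= len(data):
        ctype = data[pos:pos + 4].decode('ascii')
        (length,) = struct.unpack('>I', data[pos + 4:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 8 + length          # the length excludes the 8-byte preamble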

4.10.2 Header chunk

The header chunk takes the format shown in Figure 4.15. After the 8-byte preamble will normally be found 6 bytes containing header data, considered as three 16-bit words, the first of which (‘format’) defines the file type as 0, 1 or 2 (see above), the second of which (‘ntrks’) defines the number of track chunks in the file, and the third of which (‘division’) defines the timing format used in subsequent track events.

image

Figure 4.14 The general format of a MIDI file chunk. Each chunk has a preamble consisting of a 4-byte ASCII ‘type’ followed by 4 bytes to represent the number of data bytes in the rest of the message (the ‘length’)

image

Figure 4.15 The header chunk has the type ‘MThd’ and the number of data bytes indicated in the ‘length’ is 6 (see text)

A zero in the MSB of the ‘division’ word indicates that events will be represented by ‘musical’ time increments of a certain number of ‘ticks per quarter note’ (the exact number is defined in the remaining bits of the word), whilst a one in the MSB indicates that events will be represented by real-time increments in number-of-ticks-per-timecode-frame. The frame rate of the timecode is given in the remaining bits of the most significant byte of ‘division’, being represented using negative values in twos complement form, so the standard frame rates are represented by one of the decimal values –24, –25, –29 (for 30 drop frame) or –30.

When a real-time format is specified in the header chunk, the least significant byte of 'division' is used to specify the subdivisions of a frame to which events may be timed. For example, a value of 4 (decimal) in this position would mean that events were timed to an accuracy of a quarter of a frame, corresponding to the arrival frequency of MIDI quarter-frame timecode messages, whilst a value of 80 (decimal) would allow events to be timed to bit accuracy within the timecode frame (there are 80 bits representing a single timecode frame value in the SMPTE/EBU longitudinal timecode format).
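The two interpretations of the 'division' word can be summarised in a short illustrative sketch (Python):

def parse_division(division):
    """Interpret the 16-bit 'division' word of an SMF header chunk."""
    if division & 0x8000 == 0:
        return ('musical', division)     # ticks per quarter note
    # Real-time format: the MSbyte holds a negative frame rate in twos
    # complement form (-24, -25, -29 or -30).
    frames_per_second = 256 - ((division >> 8) & 0xFF)
    ticks_per_frame = division & 0xFF    # e.g. 4 or 80, as described above
    return ('real-time', frames_per_second, ticks_per_frame)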

4.10.3 Track chunks

Following the header come a number of track chunks (see Figure 4.16), the number depending on the file type and the number of tracks. File type 0 represents a single track and will only contain a header and one track chunk, whilst file types 1 and 2 may have many track chunks. Track chunks contain strings of MIDI events, each labelled with a delta-time at which the event is to occur. Delta-times represent the number of ‘ticks’ since the last event, as opposed to the absolute time since the beginning of a song. The exact time increment specified by a tick depends on the definition of a tick contained in the ‘division’ word of the header (see above).

Delta-time values are represented in 'variable length format', which is a means of representing numbers up to &0FFFFFFF as compactly as possible. Variable length values represent the number in question using one, two, three or four bytes, depending on the size of the number. Each byte of the variable length value has its MSB set to one, except for the last byte, whose MSB should be zero. (This distinguishes the last byte of the value from the others, so that the computer reading the data knows when to stop compiling the number.) Seven bits of each byte are therefore available for the representation of numeric data (rather like the MIDI status and data bytes). A software routine must be written to convert normal hex values into this format and back again, as sketched after Table 4.8. The standard document gives some examples of hex numbers and their variable length equivalents, as shown in Table 4.8.

image

Figure 4.16 A track chunk has the type ‘MTrk’ and the number of data bytes indicated in the ‘length’ depends on the contents of the chunk. The data bytes which follow are grouped into events as shown

Table 4.8 Examples of numbers in variable length format

Original number (hex) Variable length format (hex)
00000000 00
00000040 40
0000007F 7F
00000080 81 00
00002000 C0 00
00100000 C0 80 00
0FFFFFFF FF FF FF 7F
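The conversion routines mentioned above are straightforward to write. The following illustrative Python sketch reproduces the examples in Table 4.8:

def encode_vlq(value):
    """Encode a number (up to &0FFFFFFF) in variable length format."""
    out = [value & 0x7F]                  # the last byte has its MSB clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80) # earlier bytes have their MSB set
        value >>= 7
    return bytes(reversed(out))

def decode_vlq(data, pos=0):
    """Decode a variable length value; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:               # MSB clear marks the last byte
            return value, pos

# Matches the entries in Table 4.8:
assert encode_vlq(0x0FFFFFFF) == bytes([0xFF, 0xFF, 0xFF, 0x7F])
assert decode_vlq(bytes([0xC0, 0x00]))[0] == 0x2000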

4.10.4 MIDI file track events

The track events that occur at specified delta-times fall into the categories of ‘MIDI event’, ‘SysEx event’ and ‘meta-event’. In the case of the MIDI event, the data bytes that follow the delta-time are simply those of a MIDI channel message, with running status used if possible.

System exclusive (SysEx) events are used for holding MIDI system exclusive dumps that occur during a sequence. The event data is normally identical to the system exclusive data packet to be transmitted, except that the length of the packet is specified after the initial &[F0] byte that signals the beginning of a SysEx message and before the normal manufacturer ID, as follows:

&[F0] [length] [SysEx data] … …

The ‘length’ value should be encoded in variable length format, and the standard requires that &[F7] be used to terminate a SysEx event in a MIDI file. (Some software omits this when transmitting such data over MIDI.) It is also possible to have a special ‘SysEx’ event, as follows:

&[F7] [length] [data] … …

The standard says that this can be used as a form of ‘escape’ event, in order that data may be included in a standard MIDI file that would not normally be part of a sequencer file, such as real-time messages or MTC messages. The &F7 byte is also used as an identifier for subsequent parts of a system exclusive message that is to be transmitted in timed packets (some instruments require this). In such a case the first packet of the SysEx message uses the &F0 identifier and subsequent packets use the &F7 identifier, preceded by the appropriate delta-times to ensure correct timing of the packets.

The meta-event is used for information such as time signature, key signature, text, lyrics, instrument names and tempo markings. Its general format consists of a delta-time followed by the identifier &FF, as follows:

&[FF] [type] [length] [data] … …

The byte following &FF defines the type of meta-event, and the ‘length’ value is a variable length number describing the number of data bytes in the message which follows it. The number of bytes taken up by ‘length’ therefore depends on the message length to be represented.

Many meta-events exist and it is not intended to describe them all here, although some of the most common type identifiers are listed in Table 4.9. A full list of current meta-events can be obtained from the MMA. It is allowable for a manufacturer to include meta-events specific to a particular software package in a MIDI file, although this is only recommended if the standard MIDI file is to be used as the normal storage format by the software. In such a case the ‘type’ identifier should be set to &7F. A software package should expect to encounter events in MIDI files that it cannot deal with, and be able simply to ignore them, since either new event types may be defined after a package has been written, or a particular feature may be unimplemented.

A standardised meta-event format for lyrics has been published by the MMA as Recommended Practice RP-017, avoiding the confusion that used to be widespread regarding the way that lyrics were represented in MIDI files. There is also a recommended practice for including device names and program names in meta-events, called RP-019. This enables specific destination devices to be identified in MIDI files, so that a track’s events can subsequently be routed to that particular device. This is an alternative to identifying cable numbers on multi-port MIDI interfaces. The program name meta-event allows specific program or voice names to be included in the file so that the rather anonymous bank select and program change messages that are used to select sound generator voices can be identified by name.

Table 4.9 A selection of common meta-event type identifiers

Type (hex) Length Description
00 02 Sequence number
01 Var Text event
02 Var Copyright notice text
03 Var Sequence or track name
04 Var Instrument name
05 Var Lyric text (normally one syllable per event)
06 Var Marker text (rehearsal letters, etc.)
07 Var Cue point text
20 01 MIDI channel prefix (ties subsequent events to a particular channel, until the next channel event, MIDI or meta)
2F 00 End of track. (No data follows)
51 03 Set tempo (μs per quarter note)
54 05 Timecode location (hh:mm:ss:ff:100ths) of track start (following the MTC convention for hours)
58 04 Time signature (see below)
59 02 Key signature. First data byte denotes number of sharps (+ve value) or flats (–ve value). Second data byte denotes major (0) or minor (1) key
7F Var Sequencer-specific meta-event (see above)

image

Figure 4.17 Meaning of the data bytes in the time signature meta-event

4.10.5 Time signatures and tempo maps

The format of time signature meta-events needs further explanation, as it is somewhat arcane. The event consists of four data bytes following the 'length' identifier, as shown in Figure 4.17. The first two of these define the conventional time signature (e.g. 4/4 or 6/8) and the second two define the relationship between MIDI clocks and the notated music. The denominator of the time signature is represented as the power of two required to produce the number concerned. For example, this value would be &03 if the denominator were 8, because 2^3 equals 8. The third data byte defines the number of MIDI clocks per metronome click (the metronome may click at intervals other than a quarter note, depending on the time signature) and the final byte allows the user to define the number of 32nd notes actually notated per 24 MIDI clocks. This last, perhaps unusual-sounding definition allows for a redefinition of the tempo unit represented by MIDI clocks (which would normally run at a rate of six per 16th note), in order to accommodate software packages that allow this relationship to be altered.

The tempo map of a song may need to be transferred between one machine and another, and the MIDI file format may be used for this purpose. Such a file could be a type 0 file consisting solely of meta-events describing tempo changes, but otherwise the map must be contained in the first track chunk of a larger file. This is where reading devices will expect to find it.
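As an illustrative sketch (Python), the time signature and set tempo meta-events might be assembled as follows (delta-times omitted for clarity):

import math

def time_signature_event(numerator, denominator,
                         clocks_per_click=24, notated_32nds=8):
    """Build a time signature meta-event (&FF &58)."""
    dd = int(math.log2(denominator))     # denominator as a power of two
    return bytes([0xFF, 0x58, 0x04, numerator, dd,
                  clocks_per_click, notated_32nds])

def set_tempo_event(bpm):
    """Build a set tempo meta-event (&FF &51): microseconds per quarter note."""
    return bytes([0xFF, 0x51, 0x03]) + (60_000_000 // bpm).to_bytes(3, 'big')

# 6/8 time with the metronome clicking every dotted quarter (36 MIDI clocks):
events = time_signature_event(6, 8, clocks_per_click=36) + set_tempo_event(120)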

4.11 Downloadable Sounds (DLS) and SoundFonts

A gradual convergence may be observed in the industry between the various different methods by which synthetic sounds can be described. These have been variously termed 'Downloadable Sounds', 'SoundFonts' and more recently 'MPEG 4 Structured Audio Sample Bank Format'. Downloadable Sounds is an MMA specification for synthetic voice description that enables synthesisers to be programmed using voice data downloaded from a variety of sources. In this way a content creator can not only define the musical structure of the content in a universally usable way, using standard MIDI files, but can also define the nature of the sounds to be used. Content creators can thus specify more precisely how synthetic audio should be replayed, so that the end result is more easily predicted across multiple rendering platforms.

The success of the most recent of these approaches depends to a large extent on the agreement around a common method of sound synthesis known as ‘wavetable synthesis’. Here basic sound waveforms are stored in wavetables (simply tables of sample values) in RAM, to be read out at different rates and with different sample skip values, for replay at different pitches. Subsequent signal processing and envelope shaping can be used to alter the timbre and temporal characteristics. Such synthesis capabilities exist on the majority of computer sound cards, making it a realistic possibility to implement the standard widely.
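As an illustration of the principle (rather than of any particular sound card's implementation), this Python sketch reads a stored single-cycle waveform at a variable phase increment to replay it at different pitches:

import math

def wavetable_oscillator(table, frequency, sample_rate, n_samples):
    """Replay a stored waveform at a given pitch by varying the rate
    (the 'skip' value) at which the table is read."""
    increment = frequency * len(table) / sample_rate
    phase = 0.0
    out = []
    for _ in range(n_samples):
        out.append(table[int(phase) % len(table)])
        phase += increment
    return out

# One cycle of a sine wave stored as a 256-entry table, replayed at 440 Hz:
table = [math.sin(2 * math.pi * i / 256) for i in range(256)]
samples = wavetable_oscillator(table, 440.0, 44100, 1000)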

Downloadable Sounds Level 1, version 1.1a, was published in 1999 and contains a specification for devices that can deal with DLS as well as a file format for containing the sound descriptions. The basic idea is that a minimal synthesis engine should be able to replay a looped sample from a wavetable, apply two basic envelopes for pitch and volume, use low frequency oscillator control for tremolo and vibrato, and respond to basic MIDI controls such as pitch bend and modulation wheel. There is no option to implement velocity crossfading or layering of sounds in DLS Level 1, but keyboard splitting into 16 ranges is possible.

DLS Level 1 requires a minimum specification of the sound card or rendering device that can be used to replay or render the synthetic sounds described. This includes a minimum of the following: 512 KB of wavetable storage (assuming 16 bit samples); 128 instruments, sets of articulation data, regions and samples at once; 24 simultaneous voices and 22.05 kHz sampling rate. The DLS Level 1 file format is based on the RIFF structure, described in Chapter 6. It is based on chunks containing instrument definitions and WAVE file data (containing the sampled audio for the individual wavetables). So-called articulation information describes things like loop points and envelope shapes that indicate how the sound is to be replayed. The WAVE data is mono and stored either in 16-bit twos complement form, or in 8-bit unsigned form. DLS RIFF files (file extension ‘.dls’) may contain wave and articulation data for a number of instruments and for a number of note regions within those instruments. Information chunks provide textual information describing the instruments defined in the file, and instrument list chunks and sub-chunks contain the data for the instruments themselves – pointing to relevant wavetable data stored in WAVE chunks.

DLS Level 2 is somewhat more advanced, requiring two six-segment envelope generators, two LFOs, a low-pass filter with resonance and dynamic cut off frequency controls. It requires more memory for wavetable storage (2 MB), 256 instruments and 1024 regions, among other things. DLS Level 2 has been adopted as the MPEG-4 Structured Audio sample bank format (see Chapter 2 for more information about MPEG).

E-mu developed so-called SoundFonts for Creative Labs and these have many characteristics in common with downloadable sounds. They have been used widely to define synthetic voices for Sound Blaster and other computer sound cards. In fact the formats have now largely been harmonised with the issue of DLS Level 2, which contains many of the advanced features of SoundFonts. SoundFont 2 descriptions are normally stored in RIFF files with the extension '.sf2'.

image

Figure 4.18 Structure of a basic RMID file containing both standard MIDI file (SMF) and downloadable sound (DLS) data. All header elements are 4 bytes long. The MIDI file data and the downloadable sound data occupy whatever lengths are indicated in their respective length fields

4.12 RMID and XMF files

RMID is a version of the RIFF file structure that can be used to combine a standard MIDI file and a downloadable sound file within a single structure. In this way all of the data required to replay a song using synthetic sounds can be contained within one file. As shown in Figure 4.18, it adds a 20-byte header to the file before the start of the SMF data, which contains the standard 4-byte ASCII ‘RIFF’ identification, followed by a 4-byte length indication and a 4-byte ‘RMID’ identifier. The ‘data’ chunk that follows contains the SMF data. A DLS chunk can be appended to the end of the SMF chunk within the overall RMID chunk length.

RMID seems to have been superseded by another file format known as XMF (eXtensible Music Format) that is designed to contain all of the assets required to replay a music file. It is based on Beatnik’s RMF (Rich Music Format) which was designed to incorporate standard MIDI files and audio files such as MP3 and WAVE so that a degree of interactivity could be added to audio replay. RMF can also address a Special Bank of MIDI sounds (an extension of GM) in the Beatnik Audio Engine. XMF is now the MMA’s recommended way of combining such elements. It is more extensible than RMID and can contain WAVE files and other media elements for streamed or interactive presentations. XMF introduces concepts such as looping and branching into standard MIDI files. RMF included looping but did not incorporate DLS into the file format. In addition to the features just described, XMF can incorporate 40-bit encryption for advanced data security as well as being able to compress standard MIDI files by up to 5:1 and incorporate metadata such as rights information. So far, XMF Type 0 and Type 1 have been defined, both of which contain SMF and DLS data, and which are identical except that Type 0 MIDI data may be streamed.

4.13 SAOL and SASL in MPEG 4 Structured Audio

SAOL is the Structured Audio Orchestra Language of MPEG 4 Structured Audio (a standard for low bit rate representation of digital audio). SASL is the Structured Audio Score Language. A SASL ‘score’ controls SAOL ‘instruments’. SAOL is an extension of CSound, a synthesis language developed over many years, primarily at MIT, and is more advanced than MIDI DLS (which is based only on simple wavetable synthesis). Although there is a restricted profile of Structured Audio that uses only wavetable synthesis (essentially DLS Level 2 for use in devices with limited processing power), a full implementation allows for a variety of other synthesis types such as FM, and is extensible to include new ‘unit generators’ (the CSound name for the elements of a synthesis patch).

SASL is more versatile than standard MIDI files in its control of SAOL instruments. There is a set of so-called ‘MIDI semantics’ that enables the translation of MIDI commands and controllers into SAOL events, so that MIDI commands can either be used instead of a SASL score, or in addition to it. If MPEG 4 Structured Audio (SA) gains greater ground and authoring tools become more widely available, the use of MIDI control and DLS may decline as they are inherently less versatile. MIDI, however, is inherently simpler than SA and could well continue to be used widely when the advanced features of SA are not required.

4.14 MIDI and synchronisation

4.14.1 Introduction to MIDI synchronisation

An important aspect of MIDI control is the handling of timing and synchronisation data. MIDI timing data takes the place of the various older standards for synchronisation on drum machines and sequencers that used separate ‘sync’ connections carrying a clock signal at one of a number of rates, usually described in pulses-per-quarter-note (ppqn). There used to be a considerable market for devices to convert clock signals from one rate to another, so that one manufacturer’s drum machine could lock to another’s sequencer, but MIDI has supplanted these by specifying standard synchronisation data that shares the same data stream as note and control information.

Not all devices in a MIDI system will need access to timing information – it depends on the function fulfilled by each device. A sequencer, for example, will need some speed reference to control the rate at which recorded information is replayed and this speed reference could either be internal to the computer or provided by an external device. On the other hand, a normal synthesiser, effects unit or sampler is not normally concerned with timing information, because it has no functions affected by a timing clock. Such devices do not normally store rhythm patterns, although there are some keyboards with onboard sequencers that ought to recognise timing data.

As MIDI equipment has become more integrated with audio and video systems the need has arisen to incorporate timecode handling into the standard and into software. This has allowed sequencers either to operate relative to musical time (e.g. bars and beats) or to ‘real’ time (e.g. minutes and seconds). Using timecode, MIDI applications can be run in sync with the replay of an external audio or video machine, in order that the long-term speed relationship between the MIDI replay and the machine remains constant. Also relevant to the systems integrator is the MIDI Machine Control standard that specifies a protocol for the remote control of devices such as external recorders using a MIDI interface.

4.14.2 Music-related timing data

This section describes the group of MIDI messages that deals with 'music-related' synchronisation – that is, synchronisation related to the passing of bars and beats as opposed to 'real' time in hours, minutes and seconds. It is normally possible to choose which type of sync data will be used by a software package or other MIDI receiver when it is set to 'external sync' mode.

A group of system messages called the ‘system realtime’ messages control the execution of timed sequences in a MIDI system and these are often used in conjunction with the song pointer (which is really a system common message) to control autolocation within a stored song. The system realtime messages concerned with synchronisation, all of which are single bytes, are:

&F8 Timing clock

&FA Start

&FB Continue

&FC Stop

The timing clock (often referred to as ‘MIDI beat clock’) is a single status byte (&F8) to be issued by the controlling device six times per MIDI beat. A MIDI beat is equivalent to a musical semiquaver or sixteenth note (see Table 4.10) so the increment of time represented by a MIDI clock byte is related to the duration of a particular musical value, not directly to a unit of real time. Twenty-four MIDI clocks are therefore transmitted per quarter note, unless the definition is changed. (As mentioned in the discussion of time signature format in MIDI files (see Section 4.10.5) some software packages allow the user to redefine the notated musical increment represented by MIDI clocks.) At any one musical tempo, a MIDI beat could be said to represent a fixed increment of time, but this time increment would change if the tempo changed.
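The real-time interval between clock bytes therefore follows directly from the current tempo, as this small illustrative sketch (Python) shows:

def clock_interval(bpm):
    """Seconds between MIDI timing clock bytes (24 per quarter note)."""
    return 60.0 / (bpm * 24)

# At 120 bpm a clock byte is due roughly every 20.8 ms:
print(clock_interval(120))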

The timing clock byte, like other system realtime messages, may temporarily interrupt other MIDI messages, the status reverting to the previous status automatically after the realtime message has been handled by a receiver. This is necessary because of the very nature of the timing clock as a synchronising message: if it were made to wait for other messages to finish, it would lose its ability to represent a true increment of time. There could still be a small amount of error in the timing of any clock byte within the data stream if a large amount of other data were present, because the timing byte may not interrupt until at least the break between one byte and the next, but this timing error cannot be greater than plus or minus half the duration of a MIDI byte, i.e. ±160 μs (one byte takes 320 μs at the standard MIDI data rate).

Table 4.10 Musical durations related to MIDI timing data

Note value Number of MIDI beats Number of MIDI clocks
Semibreve (whole note) 16 96
Minim (half note) 8 48
Crotchet (quarter note) 4 24
Quaver (eighth note) 2 12
Semiquaver (sixteenth note) 1 6

So the &F8 byte might appear between the two data bytes of a note on message, for example, but it would not be necessary to repeat either the entire message or the ‘note on’ status after &F8 had passed. &F8 may also interrupt running status in the same way, without the need for reiteration of the status after the timing byte has been received. MIDI clocks should be given a very high priority by receiving software, since the degree of latency in the handling of this data will affect the timing stability of synchronised replay. On receipt of &F8, a device that handles timing information should increment its internal clock by the relevant amount. This in turn will increment the internal song pointer after six MIDI clocks (i.e. one MIDI beat) have passed. Any device controlling the sequencing of other instruments should generate clock bytes at the appropriate intervals and any changes of tempo within the system should be reflected in a change in the rate of MIDI clocks. In systems where continuously varying changes have been made in the tempo, perhaps to imitate rubato effects or to add ‘human feel’ to the music, the rate of the clock bytes will reflect this.

The ‘start’, ‘stop’ and ‘continue’ messages are used to remotely control the receiver’s replay. A receiver should only begin to increment its internal clock or song pointer after it receives a start or continue message, even though some devices may continue to transmit MIDI clock bytes in the intervening periods. For example, a sequencer may be controlling a number of keyboards, but it may also be linked to a drum machine that is playing back an internally stored sequence. The two need to be locked together, so the sequencer (running in internal sync mode) would send the drum machine (running in external sync mode) a ‘start’ message at the beginning of the song, followed by MIDI clocks at the correct intervals thereafter to keep the timing between the two devices correctly related. If the sequencer was stopped it would send ‘stop’ to the drum machine, whereafter ‘continue’ would carry on playing from the stopped position, and ‘start’ would restart at the beginning. This method of synchronisation appears to be fairly basic, as it allows only for two options: playing the song from the beginning or playing it from where it has been stopped.

Song position pointers (SPPs) are used when one device needs to tell another where it is in a song. (The term 'song' is used widely in MIDI parlance to refer to any stored sequence.) A sequencer or synchroniser should be able to transmit song pointers to other synchronisable devices when a new location is required or detected. For example, one might 'fast-forward' through a song and restart twenty bars later, in which case the other timed devices in the system would have to know where to restart. An SPP would be sent, followed by 'continue' and then regular clocks. Originally it was recommended that a gap of at least 5 seconds be left between sending an SPP and restarting the sequence, in order to give the receiver time to locate to the new position, but revisions state that a receiver should be able to register a 'continue' message and count subsequent MIDI clocks even while still locating, even if it is not possible to start playing immediately. Replay should begin as soon as possible, taking into account the clocks elapsed since the 'continue' message was received.

An SPP represents the position in a stored song in terms of number of MIDI beats (not clocks) from the start of the song. It uses two data bytes so can specify up to 16 384 MIDI beats. SPP is a system common message, not a realtime message. It is often used in conjunction with &F3 (song select), which is used to define which of a collection of stored song sequences (in a drum machine, say) is to be replayed. SPPs are fine for directing the movements of an entirely musical system, in which every action is related to a particular beat or subdivision of a beat, but not so fine when actions must occur at a particular point in real time. If, for example, one was using a MIDI system to dub music and effects to a picture in which an effect was intended to occur at a particular visual event, that effect would have to maintain its position in time no matter what happened to the music. If the effect was to be triggered by a sequencer at a particular number of beats from the beginning of the song, this point could change in real time if the tempo of the music was altered slightly to fit a particular visual scene. Clearly some means of real-time synchronisation is required either instead of, or as well as the clock and song pointer arrangement, such that certain events in a MIDI controlled system may be triggered at specific times in hours, minutes and seconds.
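A sketch of the message construction (Python, illustrative); the example location assumes 4/4 time, giving 16 MIDI beats (sixteenth notes) per bar:

def song_position_pointer(midi_beats):
    """Build a song position pointer (&F2) with a 14-bit beat count,
    LSbyte first."""
    return bytes([0xF2, midi_beats & 0x7F, (midi_beats >> 7) & 0x7F])

# Locate to the start of bar 21 (20 complete bars of 16 MIDI beats):
msg = song_position_pointer(20 * 16)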

Recent software may recognise and be able to generate the bar marker and time signature messages. The bar marker message can be used where it is necessary to indicate the point at which the next musical bar begins. It takes effect at the next &F8 clock. Some MIDI synchronisers will also accept an audio input or a tap switch input so that the user can program a tempo track for a sequencer based on the rate of a drum beat or a rate tapped in using a switch. This can be very useful in synchronising MIDI sequences to recorded music, or fitting music which has been recorded ‘rubato’ to bar intervals.

4.14.3 Timecode and synchronisation

There are a number of ways of organising real-time synchronisation in a workstation, but they all depend on the use of timecode in one form or another. In this section the principles of timecode and its relationship to MIDI are explained.

Timecode is more correctly referred to as SMPTE/EBU time and control code. It is often just referred to as SMPTE (‘simpty’) in studios. It comes in two forms: linear timecode (LTC), which is an audio signal capable of being recorded on a tape recorder, and vertical interval timecode (VITC), which is recorded in the vertical interval of a television picture. Timecode is basically a binary data signal registering time from an arbitrary start point (which may be the time of day) in hours, minutes, seconds and frames, against which the program runs. It was originally designed for video editing, and every single frame on a particular video tape has its own unique number called the timecode address. This can be used to pinpoint a precise editing position. More recently timecode has found its way into audio, where TV frames have less meaning but are still used as a convenient subdivision of a second. Sometimes a sample offset is added to a timecode value to indicate the precise point of an edit in audio samples from the start of a frame.

A number of frame rates are available, depending on the television standard to which they relate, the frame rate being the number of still frames per second used to give the impression of continuous motion in the TV picture. Thirty frames per second (fps), or true SMPTE, was used for monochrome American television; 29.97 fps is used for colour NTSC television (mainly USA, Japan and parts of the Middle East), and is called ‘SMPTE drop-frame’; 25 fps is used for PAL and SECAM TV and is called ‘EBU’ (Europe, Australia, etc.); and 24 fps is used for some film work. SMPTE drop frame timecode is so called because in order to maintain sync with NTSC colour television pictures running at 29.97 fps it is necessary to use the 30 fps SMPTE code but to drop two frames at the start of each minute, except every tenth minute. This is a compromise solution which has the effect of introducing a short term sync error between timecode and real time, whilst maintaining reasonable control over the long-term drift.

image

Figure 4.19 Data format of the SMPTE/EBU longitudinal timecode frame. Note the sync word 0011111111111101 which occurs at the end of each frame to mark the boundary. This pattern does not occur elsewhere in the frame and its asymmetry allows a timecode reader to determine the direction in which the code is being played

image

Figure 4.20 The FM or biphase-mark channel code is used to modulate the timecode data so that it can be recorded as an audio signal

An LTC frame value is represented by an 80-bit binary ‘word’, split principally into groups of 4 bits, with each 4 bits representing a particular parameter such as tens of hours, units of hours, and so forth, in BCD (binary-coded decimal) form (see Figure 4.19). Sometimes, not all four bits per group are required – the hours only go up to ‘23’, for example – and in these cases the remaining bits are either used for special control purposes or set to zero (unassigned): 26 bits in total are used for time address information to give each frame its unique hours, minutes, seconds, frame value; 32 are ‘user bits’ and can be used for encoding information such as reel number, scene number, day of the month and the like; bit 10 denotes drop-frame mode if a binary 1 is encoded there, and bit 11 can denote colour frame mode if a binary 1 is encoded (used in video editing). The end of each word consists of 16 bits in a unique sequence, called the ‘sync word’, and this is used to mark the boundary between one frame and the next. It also allows a timecode reader to tell in which direction the code is being read, since the sync word begins with ‘11’ in one direction and ‘10’ in the other.

This binary information cannot be recorded directly as an audio signal, since raw data of this kind contains frequency components extending down to DC, which an analogue audio channel cannot carry. It is therefore modulated in a simple scheme known as ‘biphase mark’, or FM, in which a transition from one state to the other (low to high or high to low) occurs at the edge of each bit period, with an additional transition forced within the period to denote a binary 1 (see Figure 4.20). The result looks rather like a square wave alternating between two frequencies, depending on the pattern of ones and zeros in the code. The code can be read forwards or backwards, and phase inverted. Readers are available that will read timecode over a very wide range of speeds, from around 0.1 to 200 times play speed. The rise-time of the signal, that is the time it takes to swing between its two extremes, is specified as 25 μs ± 5 μs, which requires an audio bandwidth of about 10 kHz.
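The channel code itself reduces to two rules, which the short Python sketch below demonstrates (the names are illustrative; each input bit becomes two half-period output levels): always toggle at the cell boundary, and toggle again mid-cell for a binary 1.

```python
def biphase_mark_encode(bits, level=0):
    """FM/biphase-mark encode a bit sequence. Each bit period is returned
    as two half-period levels: the level always toggles at the start of a
    cell, and toggles again mid-cell when the bit is a 1."""
    out = []
    for bit in bits:
        level ^= 1          # mandatory transition at every bit boundary
        out.append(level)
        if bit:
            level ^= 1      # extra mid-cell transition encodes a binary 1
        out.append(level)
    return out

# e.g. biphase_mark_encode([0, 1, 1]) -> [1, 1, 0, 1, 0, 1]
```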

VITC is recorded not on an audio track but in the vertical sync period of a video picture, so that it can be read whenever the picture can, including in slow-motion and pause modes. It is useful in applications where slow-motion cueing is to be used in locating sync or edit points, and it is extracted directly from the video signal by a timecode reader. Some MIDI synchronisers can accept VITC, but this is much less common than the ability to read and write LTC.

In audio workstations timecode is not usually recorded as an audio signal on a specific ‘track’ but is derived from the system clock in relation to the replay rate of an audio or MIDI sequence. Its use as an audio signal (LTC) will probably decline as more and more synchronisation of audio, video and MIDI takes place within the workstation itself. LTC will remain useful for synchronisation with external recorders.

4.14.4 MIDI timecode (MTC)

MIDI timecode has two specific functions. The first is to provide a means for distributing conventional SMPTE/EBU timecode data around a MIDI system in a format that is compatible with the MIDI protocol. The second is to provide a means for transmitting ‘setup’ messages that may be downloaded from a controlling computer to receivers in order to program them with cue points at which certain events are to take place. The intention is that receivers will then read incoming MTC as the program proceeds, executing the pre-programmed events defined in the setup messages. Sequencers and some digital audio systems often use MIDI timecode derived from an external synchroniser or MIDI peripheral when locking to video or to another sequencer. MTC is an alternative to MIDI clocks and song pointers, for use when real-time synchronisation is important.

In an LTC timecode frame, two binary data groups are allocated to each of hours, minutes, seconds and frames, these groups representing the tens and units of each, so there are eight binary groups in total representing the time value of a frame. In order to transmit this information over MIDI, it has to be turned into a format that is compatible with other MIDI data (i.e. a status byte followed by relevant data bytes). There are two types of MTC synchronising message: one that updates a receiver regularly with running timecode and another that transmits one-time updates of the timecode position. The latter can be used during high speed cueing, where regular updating of each single frame would involve too great a rate of transmitted data. The former is known as a quarter-frame message, denoted by the status byte (&F1), whilst the latter is known as a full-frame message and is transmitted as a universal realtime SysEx message.

One timecode frame is represented by too much information to be sent in one standard MIDI message, so it is broken down into eight separate messages. Each message of the group of eight represents a part of the timecode frame value, as shown in Figure 4.21, and takes the general form:

&[F1] [DATA]

image

Figure 4.21 General format of the quarter-frame MTC message

The data byte begins with zero (as always), and the next seven bits of the data word are made up of a 3-bit code defining whether the message represents hours, minutes, seconds or frames, MSnibble or LSnibble, followed by the four bits representing the binary value of that nibble. In order to reassemble the correct timecode value from the eight quarter-frame messages, the LS and MS nibbles of hours, minutes, seconds and frames are each paired within the receiver to form 8-bit words as follows:

Frames: rrr qqqqq

where ‘rrr’ is reserved for future use and ‘qqqqq’ represents the frames value from 0 to 29;

Seconds: rr qqqqqq

where ‘rr’ is reserved for future use and ‘qqqqqq’ represents the seconds value from 0 to 59;

Minutes: rr qqqqqq

as for seconds; and

Hours: r qq ppppp

where ‘r’ is undefined, ‘qq’ represents the timecode type, and ‘ppppp’ is the hours value from 0 to 23. The timecode frame rate is denoted as follows in the ‘qq’ part of the hours value: 00 = 24 fps; 01 = 25 fps; 10 = 30 fps drop-frame; 11 = 30 fps non-drop-frame. Unassigned bits should be set to zero. A sketch of how a transmitter splits a timecode value into the eight quarter-frame messages follows below.
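The nibble packing can be illustrated in Python as follows. The function name, rate-code table and ordering are the author's reading of the format described above, not code from any standard:

```python
RATE_CODES = {24: 0b00, 25: 0b01, '30df': 0b10, 30: 0b11}  # the 'qq' bits

def mtc_quarter_frames(hours, minutes, seconds, frames, rate_code):
    """Build the eight quarter-frame messages for one timecode value.
    Message 0 carries the frames LSnibble; message 7 carries the hours
    MSnibble together with the 'qq' frame-rate bits."""
    nibbles = [
        frames & 0x0F,   (frames >> 4) & 0x01,    # frames LS, MS (qqqqq)
        seconds & 0x0F,  (seconds >> 4) & 0x03,   # seconds LS, MS (qqqqqq)
        minutes & 0x0F,  (minutes >> 4) & 0x03,   # minutes LS, MS
        hours & 0x0F,    ((hours >> 4) & 0x01) | (rate_code << 1),  # qq + hours
    ]
    # data byte: 0nnn dddd -- 3-bit message type, then the 4-bit nibble
    return [bytes([0xF1, (msg_type << 4) | n])
            for msg_type, n in enumerate(nibbles)]

# e.g. 01:02:03:04 at 25 fps:
# mtc_quarter_frames(1, 2, 3, 4, RATE_CODES[25])
```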

At a frame rate of 30 fps, quarter-frame messages would be sent over MIDI at a rate of 120 messages per second. As eight messages are needed fully to represent a frame, it can be appreciated that 30 × 8 = 240 messages really ought to be transmitted per second if the receiving device were to be updated every frame, but this would involve too great an overhead in transmitted data, so the receiving device is updated every two frames. If MTC is transmitted continuously over MIDI it takes up approximately 7.5 per cent of the available data bandwidth. Quarter-frame messages can be transmitted in forward or reverse order, to emulate timecode running either forwards or backwards, with the ‘frames LSnibble’ message transmitted on the frame boundary of the timecode frame that it represents.

The receiver must in fact maintain a two-frame offset between displayed timecode and received timecode, since the frame value has taken two frames to transmit completely. For real-time synchronisation purposes, the receiver may wish simply to note that time has advanced another quarter of a frame on receipt of each quarter-frame message, rather as it advances by one-sixth of a MIDI beat (a sixteenth note) on receipt of each MIDI clock. Internal synchronisation software should normally be able to flywheel or interpolate between received synchronisation messages in order to obtain higher internal resolution than that implied by the rate of the messages. For all except the fastest musical tempo values, MIDI timecode messages arrive more frequently than MIDI clocks would, so they might be considered a more reliable timing reference. Nonetheless, MIDI clocks are still needed when synchronisation is based on musical time increments.

The format of the full-frame message is as follows, falling into the group of messages known as the sysex universal realtime messages:

&[F0] [7F] [dev. ID] [01] [01] [hh] [mm] [ss] [fr] [F7]

The device ID would normally be set to &7F, which signifies that the message is intended for the whole system; the sub-ID #1 of &01 denotes an MTC message, and the sub-ID #2 of &01 denotes a full-frame message. Thereafter hours, minutes, seconds and frames take the same form as for quarter-frame messages.
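Put into code, the message could be assembled as follows (a minimal sketch; the argument names and the default ‘all call’ device ID are illustrative). Note how the two ‘qq’ rate bits share the hours byte, just as in the reassembled quarter-frame word:

```python
def mtc_full_frame(hours, minutes, seconds, frames,
                   rate_code, device_id=0x7F):
    """Build an MTC full-frame universal realtime SysEx message:
    F0 7F <dev> 01 01 hh mm ss fr F7."""
    hh = ((rate_code & 0x03) << 5) | (hours & 0x1F)   # 0 qq ppppp
    return bytes([0xF0, 0x7F, device_id, 0x01, 0x01,
                  hh, minutes & 0x3F, seconds & 0x3F,
                  frames & 0x1F, 0xF7])
```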

4.15 MIDI machine control (MMC)

MIDI may be used for remotely controlling tape machines and other studio equipment, as well as musical instruments. MMC uses universal realtime SysEx messages with a sub-ID #1 of either &06 or &07 and has a lot in common with a remote control protocol known as ‘ESbus’ which was devised by the EBU and SMPTE as a universal standard for the remote control of tape machines, VTRs and other studio equipment. The ESbus standard uses an RS422 remote control bus running at 38.4 kbaud, whereas the MMC standard uses the MIDI bus for similar commands. Although MMC and ESbus are not the same and the message protocols are not identical, the command types and reporting capabilities required of machines are very similar. There are a number of levels of complexity at which MMC can be made to operate, making it possible for people to implement it at anything from a very simple level (i.e. cheaply) to a very complicated level involving all the finer points.

MMC is designed to work in either open- or closed-loop modes (see Figure 4.22). This is similar to other system exclusive applications that make use of handshaking between the transmitter and the receiver. Communication can be considered as occurring between a ‘controller’ and a ‘controlled device’, with commands flowing from the controller to the controlled device and responses returning in the opposite direction. Since a controller may address more than one controlled device at a time it is possible for a number of responses to be returned, and this situation requires careful handling, as discussed below. It is expected that MMC devices and applications will default to the closed-loop condition, but a controller should be able to detect an open-loop situation by timing out if it does not receive a response within two seconds after it has sent a message which requires one. From then on, an open loop should be assumed. Alternatively, a controller could continue to check for the completion of a closed loop by sending out regular requests for a response, changing modes after a response was received. In the closed-loop mode a simple handshaking protocol is defined, again similar in concept to the sample and file dump modes, but involving only two messages – WAIT and RESUME. These handshaking messages are used to control the flow of data between controller and controlled device in both directions, in order to prevent the overflowing of MIDI receive buffers (which would result in loss of data). Handshaking is discussed further below.

image

Figure 4.22 A closed-loop MMC arrangement. The controller should receive a response from the controlled device within two seconds of issuing a command which expects a response. If it does not, it should assume an open loop

Typical MMC communications involve the transmission of a command from the controller to a particular device, using its device ID as a means of identifying the destination of the command. It is also possible to address all controlled devices on the bus using the &7F device ID in place of the individual ID. Commands take the general format:

&[F0] [7F] [dev. ID] [06] [data] … … [F7]

Note that only sub-ID #1 is used here, following the device ID, and there is no sub-ID #2 in order to conserve data bandwidth. The sub-ID #1 of &06 denotes an MMC command. [data] represents the data messages forming the command, and may be from one to many bytes in length.

The amount of data making up a command depends on its type. Commands that consist of only a single byte, such as the ‘play’ command (&02), occupy the range from &01 to &3F (&00 is reserved to be used for future extensions to the command set). A typical command of this type (e.g. ‘play’) would thus be transmitted as:

&[F0] [7F] [dev. ID] [06] [02] [F7]
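A sketch of how such commands might be assembled in Python (the function and dictionary names are hypothetical; the command values are those of Table 4.11 below):

```python
MMC_COMMANDS = {                        # single-byte transport commands
    'stop': 0x01, 'play': 0x02, 'deferred_play': 0x03,
    'fast_fwd': 0x04, 'rewind': 0x05, 'record_strobe': 0x06,
    'record_exit': 0x07, 'record_pause': 0x08, 'pause': 0x09,
}

def mmc_command(name, device_id=0x7F):
    """Build an MMC command message (sub-ID #1 = &06). A device ID of
    &7F addresses every controlled device on the bus."""
    return bytes([0xF0, 0x7F, device_id, 0x06, MMC_COMMANDS[name], 0xF7])

# e.g. mmc_command('play') -> F0 7F 7F 06 02 F7
```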

The handshaking messages, WAIT (&7C) and RESUME (&7F), can be issued by either the controller or any of its controlled devices. Handshaking depends on the use of a closed loop. When issued by the controller the message would normally be a command addressed to any device trying to send data back to it, and thus the device ID attached to controller handshaking messages is &7F (‘all call’). For example, a controller whose receive data buffer was approaching overflow would wish to send out a general ‘everybody WAIT’ command, to suspend MMC transmission from controlled devices until it had reduced the contents of the buffer, after which an ‘everybody RESUME’ command would be transmitted. Such a command would take the form:

&[F0] [7F] [7F] [06] [7C or 7F] [F7]

When issued by a controlled device, handshaking messages should be a response tagged with the device’s own ID, as a means of indicating to the controller which device is requesting a WAIT or RESUME. On receipt of a WAIT from a particular device ID the controller would suspend transmissions to that device but continue to transmit commands to others. Such a message would take the form:

&[F0] [7F] [dev. ID] [07] [7C or 7F] [F7]

Table 4.11 gives a list of the single byte transport commands used in the MMC protocol. The list of MMC commands and their accompanying data occupies many pages so it is not proposed to describe them in detail here. Readers should refer to the MIDI Machine Control section of the MIDI standard for the latest information. There is no mandatory set of commands or responses defined in the standard, although there are some guidelines concerning possible minimum sets for certain applications. It is possible to tell which MMC commands and responses have been implemented in a particular device by analysing the ‘signature’ of the device. The signature will normally be both published in written form in the manual, and available as a response from the controlled device. It exists in the form of a bit map in which each bit corresponds to a certain MMC function. If the bit is set to ‘1’ then the function is implemented. The signature comes in two parts: the first describing the commands implemented and the second describing the responses implemented. It also contains a header describing the version of MMC used in the device. The exact format of the signature is described in the MMC standard.

Table 4.11 Basic MMC transport controls

Command Hex value Comment
Stop 01
Play 02
Deferred play 03 Play after autolocate achieved
Fast fwd 04
Rewind 05
Record strobe 06 Drop into or out of record (depending on rec. ready state)
Record exit 07
Record pause 08 Enters record-pause mode
Pause 09  

4.16 MIDI over USB

USB (Universal Serial Bus) is a computer peripheral interface that carries data at a much faster rate than MIDI (up to 12 Mbit s–1 or up to 480 Mbit s–1, depending on the version). It is very widely used on workstations and peripherals these days and it is logical to consider using it to transfer MIDI data between devices as well. The USB Implementers Forum has published a ‘USB Device Class Definition for MIDI Devices’, version 1.0, that describes how MIDI data may be handled in a USB context. It preserves the protocol of MIDI messages but packages them in such a way as to enable them to be transferred over USB. It also ‘virtualises’ the concept of MIDI IN and OUT jacks, enabling USB to MIDI conversion, and vice versa, to take place in software within a synthesiser or other device. Physical MIDI ports can also be created for external connections to conventional MIDI equipment (see Figure 4.23).

image

Figure 4.23 A USB MIDI function contains a USB-to-MIDI convertor that can communicate with both embedded (internal) and external MIDI jacks via MIDI IN and OUT endpoints. Embedded jacks connect to internal elements that may be synthesisers or other MIDI data processors. XFER in and out endpoints are used for bulk dumps such as DLS and can be dynamically connected with elements as required for transfers

image

Figure 4.24 USB MIDI packets have a 1-byte header that contains a cable number to identify the MIDI jack destination and a code index number to identify the contents of the packet and the number of active bytes

A so-called ‘USB MIDI function’ (a device that receives USB MIDI events and transfers) may contain one or more ‘elements’. These elements can be synthesisers, synchronisers, effects processors or other MIDI-controlled objects.

A USB to MIDI convertor within a device will typically have MIDI in and out endpoints as well as what are called ‘transfer’ (XFER) endpoints. The former are used for streaming MIDI events whereas the latter are used for bulk dumps of data such as those needed for downloadable sounds (DLS). MIDI messages are packaged into 32-bit USB MIDI events, which involve an additional byte at the head of a typical MIDI message. This additional byte contains a cable number address and a code index number (CIN), as shown in Figure 4.24. The cable number enables the MIDI message to be targeted at one of 16 possible ‘cables’, thereby overcoming the 16-channel limit of conventional MIDI messages, in a similar way to that used in the addressing of multiport MIDI interfaces. The CIN allows the type of MIDI message to be identified (e.g. System Exclusive; Note On), which to some extent duplicates the MIDI status byte. MIDI messages with fewer than three bytes should be padded with zeros.
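As a concrete illustration of the packet format in Figure 4.24, the Python sketch below (function name illustrative) packs a complete channel message into a 32-bit USB MIDI event. For the common channel voice messages the CIN value happens to match the upper nibble of the status byte, while SysEx and system messages use other CIN values defined in the class specification:

```python
def usb_midi_packet(midi_msg, cable=0):
    """Pack one complete MIDI channel message (1-3 bytes) into a 32-bit
    USB MIDI event: header byte = cable number + code index number,
    followed by the MIDI bytes padded with zeros to three bytes."""
    cin = midi_msg[0] >> 4                    # CIN for channel voice messages
    body = bytes(midi_msg) + bytes(3 - len(midi_msg))   # zero padding
    return bytes([(cable << 4) | (cin & 0x0F)]) + body

# e.g. Note On, channel 1, on cable 2:
# usb_midi_packet([0x90, 60, 100], cable=2) -> 29 90 3C 64
```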

The USB message transport protocol and interfacing requirements are not the topic of this book, so users are referred to the relevant USB standards for further information about implementation issues.

4.17 MIDI over IEEE 1394

IEEE 1394, or ‘Firewire’ as it is sometimes known, is another high speed serial interface encountered widely on workstations and media equipment. It is capable of data rates up to many hundreds of megabits per second. The MMA and AMEI have published a ‘MIDI Media Adaptation Layer for IEEE 1394’ that describes how MIDI data may be transferred over 1394. This is also referred to in 1394 TA (Trade Association) documents describing the ‘Audio and Music Data Transmission Protocol’ and IEC standard 61883-6 that deals with the audio part of 1394 interfaces.

The approach is similar to that used with USB, described in Section 4.16, but has somewhat greater complexity. MIDI 1.0 data streams can be multiplexed into a 1394 ‘MIDI conformant data channel’ that contains eight independent MIDI streams called ‘MPX-MIDI data channels’. This way each MIDI conformant data channel can handle 8 × 16 = 128 MIDI channels (in the original sense of MIDI channels). The data are transferred in AM824 format (see Section 6.7.1), using groups of ‘quadlets’ (four bytes). The first version of the standard limits the transmission of packets to the MIDI 1.0 data rate of 31.25 kbit s–1 for compatibility with other MIDI devices; however, provision is made for transmission at substantially faster rates for use in equipment that is capable of it, including options for 2X and 3X MIDI 1.0 speed.

1394 cluster events can be defined that contain both audio and MIDI data. This enables the two types of information to be kept together and synchronised.

4.18 After MIDI?

Various alternatives have been proposed over the years, aiming to improve upon MIDI’s relatively limited specification and flexibility when compared with modern music control requirements and computer systems. That said, MIDI has shown surprising robustness to such ‘challenges’ and has been extended over the years so as to ameliorate some of its basic problems. Perhaps the simplicity and ubiquity of MIDI have made it more attractive for developers to keep working with an old technology they know than to experiment with untried but more sophisticated alternatives.

ZIPI was a networked control approach proposed back in the early 1990s that aimed to break free from MIDI’s limitations and take advantage of faster computer network technology, but it never really gained widespread favour in commercial equipment. It has now been overtaken by more recent developments and communication buses such as USB and 1394.

Open Sound Control is currently a promising alternative to MIDI that is gradually seeing greater adoption in the computer music and musical instrument control world. Developed by Matt Wright at CNMAT (Center for New Music and Audio Technologies) in Berkeley, California, it aims to offer a transport-independent, message-based protocol for communication between computers, musical instruments and multimedia devices. It does not specify a particular hardware interface or network for the transport layer, but initial implementations have tended to use UDP (user datagram protocol) over Ethernet or other fast networks as a transport means. It is not proposed to describe the protocol in detail here; further details can be found at the website indicated at the end of this chapter. A short summary follows, however.

OSC uses a form of device addressing that is very similar to an Internet URL (uniform resource locator): in other words, a text address with sub-addresses that relate to lower levels in the device hierarchy. For example (not a real address), ‘/synthesiser2/voice1/oscillator3/frequency’ might refer to a particular device called ‘synthesiser2’, within which is contained voice 1, within which is oscillator 3, whose frequency value is being addressed. The minimum ‘atomic unit’ of OSC data is 4 bytes (32 bits) long, so all values are 32-bit aligned and transmitted packets are made up of multiples of 32-bit information. Packets of OSC data contain either individual messages or so-called ‘bundles’. Bundles contain elements that are either messages or further bundles, each preceded by a size designation indicating the length of the element. Bundles have time tags associated with them, indicating that the actions described in the bundle are to take place at a specified time; individual messages are supposed to be executed immediately. Devices are expected to have access to a representation of the correct current time so that bundle timing can be related to a clock.
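The 32-bit alignment rule is easy to demonstrate. The following Python sketch encodes a simple message with floating-point arguments; it assumes the OSC 1.0 conventions (null-terminated strings padded to 4-byte boundaries and a type-tag string beginning with ‘,’), which the summary above does not cover in detail:

```python
import struct

def osc_string(s: str) -> bytes:
    """OSC strings are null-terminated, then padded to a 4-byte boundary."""
    b = s.encode('ascii') + b'\x00'
    return b + b'\x00' * (-len(b) % 4)

def osc_message(address: str, *args: float) -> bytes:
    """Encode an OSC message with float32 arguments: padded address,
    padded type-tag string, then big-endian 32-bit values."""
    typetags = osc_string(',' + 'f' * len(args))
    payload = b''.join(struct.pack('>f', a) for a in args)
    return osc_string(address) + typetags + payload

# e.g. osc_message('/synthesiser2/voice1/oscillator3/frequency', 440.0)
# (the hypothetical address used above); every field is a multiple of 4 bytes
```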

Further reading

Hewlett, W. and Selfridge-Field, E. (eds) (2001) The Virtual Score: Representation, Retrieval, Restoration. MIT Press.

MMA (1999) Downloadable Sounds Level 1. V1.1a, January. MIDI Manufacturers Association.

MMA (2000) RP-027: MIDI Media Adaptation Layer for IEEE 1394. MIDI Manufacturers Association.

MMA (2000) RP-029: Bundling SMF and DLS data in an RMID file. MIDI Manufacturers Association.

MMA (2001) XMF Specification Version 1.0. MIDI Manufacturers Association.

MMA (2002) The Complete MIDI 1.0 Detailed Specification. MIDI Manufacturers Association.

MMA (2002) Scalable polyphony MIDI specification and device profiles. MIDI Manufacturers Association.

Scheirer, E.D. and Vercoe, B. (1999) SAOL: the MPEG-4 Structured Audio Orchestra Language. Computer Music Journal, 23, 2, pp. 31–51.

Selfridge-Field, E., Byrd, D. and Bainbridge, D. (1997) Beyond MIDI: The Handbook of Musical Codes. MIT Press.

USB Implementers Forum (1999) USB Device Class Definition for MIDI Devices, version 1.0. Available from www.usb.org.

Useful websites

MIDI Manufacturers Association: www.midi.org

MusicXML: www.musicxml.org

Open Sound Control: cnmat.cnmat.berkeley.edu/OpenSoundControl/
