CHAPTER 14

MIDI and Remote Control

CHAPTER CONTENTS

Background

What is MIDI?

MIDI and Digital Audio Contrasted

Basic Principles

The interface

Simple interconnection

Interfacing a Computer to a MIDI System

Adding MIDI ports

Drivers and I/O software

How MIDI Control Works

MIDI channels

Channel and system messages contrasted

Note on and note off messages

Velocity information

Running status

Polyphonic key pressure (aftertouch)

Control change

Channel modes

Program change

Channel aftertouch

Pitch bend wheel

System exclusive

Universal system exclusive messages

Tune request

Active sensing

Reset

MIDI Control of Sound Generators

MIDI note assignment in synthesizers and samplers

MIDI functions of sound generators

MIDI data buffers and latency

Handling of velocity and aftertouch data

Handling of controller messages

Voice selection

General MIDI

Scalable Polyphonic MIDI (SPMIDI)

RMID and XMF Files

MIDI over USB

MIDI over IEEE 1394

MIDI over Ethernet

Open Sound Control

Sequencing Software

Introduction

Tracks, channels, instruments and environments

Input and output filters

Timing resolution

Displaying, manipulating and editing information

Quantization of rhythm

Automation and non-note MIDI events

MIDI mixing and external control

Synchronization

Synchronized digital video

Audio Remote Control Using Computer Networks

Open control architecture

Actuators, sensors and blocks

AES64–2012

Types of message

EuCon

Node objects

Protocol

Processor types

Summary

 

MIDI is the Musical Instrument Digital Interface, a control protocol and interface standard for electronic musical instruments which has also been used widely in other music and audio products. Although it is relatively dated by modern standards it is still used extensively, which is a testament to its simplicity and success. Even if the MIDI hardware interface is used less these days, either because more synthesis, sampling and processing takes place in software within the workstation, or because other data interfaces such as USB, Firewire and Ethernet have become popular, the original protocol for communicating events and other control information is still widely encountered. A great deal of computer software uses MIDI as a basis for controlling the generation of sounds and external devices.

Synthetic audio is used increasingly in audio workstations and mobile devices as a very efficient means of audio representation, because it only requires control information and sound object descriptions to be transmitted. Standards such as MPEG-4 Structured Audio enable synthetic audio to be used as an alternative or an addition to natural audio coding and this can be seen as a natural evolution of the MIDI concept in interactive multimedia applications.

We also include in this chapter coverage of some recent developments in networked remote control of audio systems.

BACKGROUND

Electronic musical instruments existed widely before MIDI was developed in the early 1980s, but no universal means existed of controlling them remotely. Many older musical instruments used analog voltage control, rather than being controlled by a microprocessor, and thus used a variety of analog remote interfaces (if indeed any facility of this kind was provided at all). Such interfaces commonly took the form of one port for timing information, such as might be required by a sequencer or drum machine, and another for pitch and key triggering information, as shown in Figure 14.1. The latter, commonly referred to as ‘CV and gate’, consisted of a DC (direct current) control line carrying a variable control voltage (CV) which was proportional to the pitch of the note, and a separate line to carry a trigger pulse. A common increment for the CV was 1 volt per octave. Notes on a synthesizer could be triggered remotely by setting the CV to the correct pitch and sending a ‘note on’ trigger pulse which would initiate a new cycle of the synthesizer’s envelope generator. Such an interface would deal with only one note at a time, but many older synths were only monophonic in any case (that is, they were only capable of generating a single voice).
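The 1 volt per octave CV convention described above can be sketched in a few lines of code. This is purely illustrative: the reference note corresponding to 0 V varied between instruments, so the function below simply works in semitones above an arbitrary reference.

```python
# Sketch of the 1 V/octave 'CV and gate' convention described above.
# The 0 V reference note is an assumption; real instruments differed.

def note_cv(semitones_above_ref):
    """Control voltage for a note, at 1 volt per octave (12 semitones)."""
    return semitones_above_ref / 12.0

# One octave above the reference note needs +1 V;
# a perfect fifth (7 semitones) needs roughly +0.583 V.
print(note_cv(12))            # 1.0
print(round(note_cv(7), 3))   # 0.583
```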

Instruments with onboard sequencers needed a timing reference in order that they could be run in synchronization with other such devices, and this commonly took the form of a square pulse train at a rate related to the current musical tempo, often connected to the device using a DIN-type connector, along with trigger lines for starting and stopping a sequence’s execution. There was no universal agreement over the rate of this external clock, and frequencies measured in pulses per musical quarter note (ppqn), such as 24 ppqn and 48 ppqn, were used by different manufacturers. A number of conversion boxes were available which divided or multiplied clock signals in order that devices from different manufacturers could be made to work together.
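The incompatibility between clock standards is easy to quantify. The sketch below (an illustration, not taken from any particular device) computes the pulse rate a device expects for a given tempo and resolution, showing why a divider or multiplier box was needed between, say, a 24 ppqn and a 48 ppqn machine.

```python
# Clock pulse rate in Hz for a given tempo (quarter notes per minute)
# and resolution in pulses per quarter note (ppqn).

def clock_rate_hz(tempo_bpm, ppqn):
    return tempo_bpm / 60.0 * ppqn

# At 120 bpm, a 24 ppqn device expects 48 pulses per second,
# while a 48 ppqn device expects 96 - hence the conversion boxes.
print(clock_rate_hz(120, 24))   # 48.0
print(clock_rate_hz(120, 48))   # 96.0
```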

image

FIGURE 14.1 Prior to MIDI control, electronic musical instruments tended to use a DC remote interface for pitch and note triggering. A second interface handled a clock signal to control tempo and trigger pulses to control the execution of a stored sequence.

As microprocessor control began to be more widely used in musical instruments, a number of incompatible digital control interfaces sprang up, promoted by the large synthesizer manufacturers, some serial and some parallel. Needless to say, the plethora of non-standardized approaches to remote control made it difficult to construct an integrated system, especially when combining equipment from different manufacturers. Collaboration between the major parties in America and Japan cleared the way for agreement on a common hardware interface and command protocol, resulting in the specification of the MIDI standard in late 1982/early 1983. This interface grew out of an amalgamation of a proposed universal interface called USI (the Universal Synthesizer Interface), which was intended mainly for note on and off commands, and a Japanese specification which was rather more complex and proposed an extensive protocol to cover other operations as well.

The standard has been subject to a number of addenda, extending the functionality of MIDI far beyond the original. The original specification was called the MIDI 1.0 specification, to which have been added such addenda as the MIDI Sample Dump protocol, MIDI Files, General MIDI (1 and 2), MIDI TimeCode, MIDI Show Control, MIDI Machine Control and Downloadable Sounds. A new ‘HD’ (High Definition) version of the standard was originally planned for release in 2009, and was expected to include support for more channels and controllers, as well as greater controller resolution using single messages. Additional aims included direct control of note pitch and transport of the protocol using UDP over network systems such as Ethernet, while remaining compatible with existing hardware and software. At the time of writing this seventh edition, in January 2013, the HD Working Group was just about to demonstrate prototype hardware and software, along with a working draft of the new standard, but this was not available for public consumption. The MIDI Manufacturers Association (MMA) is now the primary association governing formal extensions to the standard, liaising closely with a Japanese association called AMEI (the Association of Musical Electronics Industry).

WHAT IS MIDI?

MIDI is a digital remote control interface for music systems, but has come to relate to a wide range of standards and specifications to ensure interoperability between electronic music systems. MIDI-controlled equipment is normally based on microprocessor control, with the MIDI interface forming an I/O port. It is a measure of the popularity of MIDI as a means of control that it has now been adopted in many other audio and visual systems, including the automation of mixing consoles, the control of studio outboard equipment, lighting equipment and other machinery. Although many of its standard commands are music related, it is possible either to adapt music commands to non-musical purposes or to use command sequences designed especially for alternative methods of control.

The adoption of a serial communication standard for MIDI was dictated largely by economic and practical considerations, enabling it to be installed on relatively cheap items of equipment and available to as wide a range of users as possible. The simplicity and ease of installation of MIDI systems was largely responsible for its rapid proliferation as an international standard.

Unlike its analog predecessors, MIDI integrates timing and system control commands with pitch and note triggering commands, such that everything may be carried in the same format over the same piece of wire. MIDI makes it possible to control musical instruments polyphonically in pseudo real time: that is, the speed of transmission is such that delays in the transfer of performance commands are not audible in the majority of cases. It is also possible to address a number of separate receiving devices within a single MIDI data stream, and this allows a controlling device to determine the destination of a command.

MIDI AND DIGITAL AUDIO CONTRASTED

For many the distinction between MIDI and digital audio may be a clear one, but those new to the subject often confuse the two. The confusion arises because both MIDI and digital audio equipment appear to perform the same task — that is, the recording of multiple channels of music using digital equipment — and it is not helped by the way in which some manufacturers refer to MIDI sequencing as digital recording.

Digital audio involves a process whereby an audio waveform (such as the line output of a musical instrument) is sampled regularly and then converted into a series of binary words that represent the sound waveform, as described in Chapter 8. A digital audio recorder stores this sequence of data and can replay it by passing the original data through a digital-to-analog convertor that turns the data back into a sound waveform, as shown in Figure 14.2. A multitrack recorder has a number of independent channels that work in the same way, allowing a sound recording to be built up in layers. MIDI, on the other hand, handles digital information that controls the generation of sound. MIDI data does not represent the sound waveform itself. When a multitrack music recording is made using a MIDI sequencer (described later) this control data is stored, and can be replayed by transmitting the original data to a collection of MIDI-controlled musical instruments. It is the instruments that actually reproduce the recording.

A digital audio recording, then, allows any sound to be stored and replayed without the need for additional hardware. It is useful for recording acoustic sounds such as voices. A MIDI recording is almost useless without a collection of sound generators. An interesting advantage of the MIDI recording is that, since the stored data represents event information describing a piece of music, it is possible to change the music by changing the event data. MIDI recordings also consume a lot less memory space than digital audio recordings. It is also possible to transmit a MIDI recording to a different collection of instruments from those used during the original recording, thus resulting in a different sound. It is common for MIDI and digital audio recording to be integrated in one software package, allowing the two to be edited and manipulated in parallel. In some cases simple audio information can be converted into MIDI commands (for example a solo melody line converted into the nearest equivalent in terms of MIDI note and controller messages). This preserves essential information about the musical line but not usually anything about the subtleties of acoustics, sound quality or timbre in a digital recording.

image

FIGURE 14.2
(a) Digital audio recording and (b) MIDI recording contrasted. In (a) the sound waveform itself is converted into digital data and stored, whereas in (b) only control information is stored, and a MIDI-controlled sound generator is required during replay.

BASIC PRINCIPLES

The interface

The MIDI standard specifies a unidirectional serial interface (see Fact File 8.6, Chapter 8) running at 31.25 kbit/s ± 1%. The rate was defined at a time when the clock speeds of microprocessors were typically much slower than they are today, this rate being a convenient division of the typical 1 or 2 MHz master clock rate. The rate had to be slow enough to be carried without excessive losses over simple cables and interface hardware, but fast enough to allow musical information to be transferred from one instrument to another without noticeable delays. Control messages are sent as groups of bytes. Each byte is preceded by one start bit and followed by one stop bit in order to synchronize reception of the data, which is transmitted asynchronously, as shown in Figure 14.3. The addition of start and stop bits means that each 8-bit word actually takes ten bit periods to transmit (lasting a total of 320 μs). Standard MIDI messages typically consist of one, two or three bytes, although there are longer messages for some purposes.

The hardware interface is shown in Fact File 14.1. In the MIDI specification, the opto-isolator is defined as having a rise time of no more than 2μs. The rise time affects the speed with which the device reacts to a change in its input and if slow will tend to distort the leading edge of data bit cells. If a large number of MIDI devices are wired in series (that is from THRU to IN a number of times) the data will be forced to pass through a number of opto-isolators and thus will suffer the combined effects of a number of stages of rise-time distortion. The better the specification of the opto-isolator, the more stages of device cascading will be possible before unacceptable distortion is introduced. The delay in data passed between IN and THRU is only a matter of microseconds, so this contributes little to any audible delays perceived in the musical outputs of some instruments in a large system. The bulk of any perceived delay will be due to other factors like processing delay, buffer delays and traffic.

image

FIGURE 14.3
A MIDI message consists of a number of bytes, each transmitted serially and asynchronously by a UART in this format, with a start and stop bit to synchronize the receiving UART. The total period of a MIDI data byte, including start and stop bits, is 320μs.

FACT FILE 14.1 MIDI HARDWARE INTERFACE

Most equipment using MIDI has three interface connectors: IN, OUT and THRU. The OUT connector carries data that the device itself has generated. The IN connector receives data from other devices and the THRU connector is a direct throughput of the data that is present at the IN. As can be seen from the hardware interface diagram, it is simply a buffered feed of the input data, and it has not been processed in any way. A few cheaper devices do not have THRU connectors, but it is possible to obtain ‘MIDI THRU boxes’ which provide a number of ‘THRUs’ from one input. Occasionally, devices without a THRU socket allow the OUT socket to be switched between OUT and THRU functions.

The interface incorporates an opto-isolator between the MIDI IN (that is the receiving socket) and the device’s microprocessor system. This is to ensure that there is no direct electrical link between devices and helps to reduce the effects of any problems which might occur if one instrument in a system were to develop an electrical fault. An opto-isolator is an encapsulated device in which a light-emitting diode (LED) can be turned on or off depending on the voltage applied across its terminals, illuminating a photo-transistor which consequently conducts or not, depending on the state of the LED. Thus the data is transferred optically, rather than electrically.

image

The specification of cables and connectors is described in Fact File 14.2. This form of hardware interface is increasingly referred to as ‘MIDI-DIN’ to distinguish it from other means of transferring MIDI data.

Implementations of MIDI that work over other hardware interfaces such as Ethernet (using Internet Protocol/UDP), USB and Firewire (IEEE 1394) have also been introduced, sometimes in proprietary form. These are described briefly later in the chapter.

FACT FILE 14.2 MIDI CONNECTORS AND CABLES

The connectors used for MIDI interfaces are 5-pin DIN types. Only three of the pins of a 5-pin DIN plug are actually used in most equipment (the three innermost pins). In the cable, pin 5 at one end should be connected to pin 5 at the other, and likewise pin 4 to pin 4, and pin 2 to pin 2. The cable should be a shielded twisted pair. Within the receiver the MIDI IN does not have pin 2 connected to earth. This is to avoid earth loops and makes it possible to use a cable either way round. Professional microphone cable terminated in DIN connectors may be used as a higher-quality solution, because domestic cables will not always be a shielded twisted pair and thus are more susceptible to external interference, as well as radiating more themselves which could interfere with adjacent audio signals. A 5mA current loop is created between a MIDI OUT or THRU and a MIDI IN, when connected with the appropriate cable, and data bits are signaled by the turning on and off of this current by the sending device. This principle is shown in the diagram.

It is recommended that no more than 15m of cable is used for a single cable run in a simple MIDI system and investigation of typical cables indicates that corruption of data does indeed ensue after longer distances, although this is gradual and depends on the electromagnetic interference conditions, the quality of cable and the equipment in use. Longer distances may be accommodated with the use of buffer or ‘booster’ boxes that compensate for some of the cable losses and retransmit the data. It is also possible to extend a MIDI system by using a data network with an appropriate interface.

image

Simple interconnection

In the simplest MIDI system, one instrument could be connected to another as shown in Figure 14.4. Here, instrument 1 sends information relating to actions performed on its own controls (notes pressed, pedals pressed, etc.) to instrument 2, which imitates these actions as far as it is able. This arrangement can be used for ‘doubling-up’ sounds, ‘layering’ or ‘stacking’, such that a composite sound can be made up from two synthesizers’ outputs. (The audio outputs of the two instruments would have to be mixed together for this effect to be heard.) Larger MIDI systems could be built up by further ‘daisy-chaining’ of instruments, such that instruments further down the chain all received information generated by the first (see Figure 14.5), although this is not a very satisfactory way of building a large MIDI system. In large systems some form of central routing helps to avoid MIDI ‘traffic jams’ and simplifies interconnection.

INTERFACING A COMPUTER TO A MIDI SYSTEM

Adding MIDI ports

In order to use a workstation as a central controller for a MIDI system it must have at least one MIDI interface, consisting of at least an IN and an OUT port. (THRU is not strictly necessary in most cases.) Unless the computer has a built-in interface, as found on the old Atari machines, some form of third-party hardware interface must be added, and there are many available, ranging from simple single-port devices to complex multiport products.

image

FIGURE 14.4 The simplest form of MIDI interconnection involves connecting two instruments together as shown.

image

FIGURE 14.5 Further instruments can be added using THRU ports as shown, in order that messages from instrument 1 may be transmitted to all the other instruments.

A typical single port MIDI interface can be connected either to one of the spare I/O ports of the computer (a serial, Firewire or USB port, for example), or can be installed as an expansion slot card (perhaps as part of an integrated sound card). Multiport interfaces have become widely used in MIDI systems where more than 16 MIDI channels are required, and they are also useful as a means of limiting the amount of data sent or received through any one MIDI port. (A single port can become ‘overloaded’ with MIDI data if serving a large number of devices, resulting in data delays.) Multiport interfaces are normally more than just a parallel distribution of a single MIDI data stream, typically handling a number of independent MIDI data streams that can be separately addressed by the operating system drivers or sequencer software. USB and Firewire MIDI protocols allow a particular stream or ‘cable’ to be identified so that each stream controlling 16 MIDI channels can be routed to a particular physical port or instrument.

Emagic’s Unitor8 interface is pictured in Figure 14.6. It has RS-232 and RS-422 serial ports as well as a USB port to link with the host workstation. There are eight MIDI ports, with two on the front panel for easy connection of ‘guest’ devices or controllers that are not installed at the back. This device also has VITC and LTC timecode ports in order that synchronization information can be relayed to and from the computer. A multi-device MIDI system is pictured in Figure 14.7, showing a number of multitimbral sound generators connected to separate MIDI ports and a timecode connection to an external video tape recorder for use in synchronized post-production. As more of these functions are now provided within the workstation (e.g. synthesis, video, mixing), the number of devices connected in this way will reduce.

image

FIGURE 14.6 (a) Front and (b) back panels of the Emagic Unitor 8 interface, showing USB port, RS-422 port, RS-232 port, LTC and VITC ports and multiple MIDI ports.

image

FIGURE 14.7 A typical multi-machine MIDI system interfaced to a computer via a multiport interface connected by a high-speed link (e.g. USB).

Drivers and I/O software

Most audio and MIDI hardware requires ‘driver’ software of some sort to enable the operating system (OS) to ‘see’ the hardware and use it correctly. These are now designed as ‘hardware abstraction layers’ (HALs) that enable applications to communicate more effectively with I/O hardware. Whereas previously it would have been necessary to install a third-party MIDI HAL such as OMS (Opcode’s Open Music System) or MIDI Manager to route MIDI data to and from multiport interfaces and applications, these features are now included within the Mac and Windows operating system. For example Apple has its Core MIDI specification for OS X, which is a collection of APIs (application programming interfaces) that communicate with MIDI devices. Core Audio is its audio specification (see Chapter 9). Microsoft has also done something similar for Windows systems, with the Windows Driver Model (WDM). DirectSound is the Microsoft equivalent of Apple’s OS X Core Audio, while DirectMusic is the equivalent for MIDI data. Steinberg’s ASIO (Audio Stream Input Output) is a third-party alternative that handles digital audio and MIDI. It handles a range of audio sampling frequencies and bit depths, as well as multiple channel I/O, and many sound cards and applications are ASIO-compatible.

HOW MIDI CONTROL WORKS

MIDI channels

MIDI messages are made up of a number of bytes as explained in Fact File 14.3. Each part of the message has a specific purpose, and one of these is to define the receiving channel to which the message refers. In this way, a controlling device can make data device specific — in other words it can define which receiving instrument will act on the data sent. If a device is set in software to receive on a specific channel or on a number of channels it will act only on information ‘tagged’ with its own channel numbers. Everything else it will usually ignore. There are 16 basic MIDI channels and instruments can usually be set to receive on any specific channel or channels (omni off mode), or to receive on all channels (omni on mode). The latter mode is useful as a means of determining whether anything at all is being received by the device.

The limit of 16 MIDI channels can be overcome easily by using multi-port MIDI interfaces connected to a computer. In such cases it is important not to confuse the MIDI data channel with the physical port to which a device may be connected, since each physical port will be capable of transmitting on all 16 data channels.

Channel and system messages contrasted

Two primary classes of message exist: those that relate to specific MIDI channels and those that relate to the system as a whole.

FACT FILE 14.3 MIDI MESSAGE FORMAT

There are two basic types of MIDI message byte: the status byte and the data byte. The first byte in a MIDI message is normally a status byte. Standard MIDI messages can be up to 3 bytes long, but not all messages require 3 bytes, and there are some fairly common exceptions to the rule which are described below. The standard has been extended and refined over the years and the following is only an introduction to the basic messages. The prefix ‘&’ will be used to indicate hexadecimal values (see Table 8.1); individual MIDI message bytes will be delineated using square brackets, e.g. [&45], and channel numbers will be denoted using ‘n’ to indicate that the value may be anything from &0 to &F (channels 1 to 16). The table shows the format and content of MIDI messages under each of the statuses.

Status bytes always begin with a binary one to distinguish them from data bytes, which always begin with a zero. Because the most significant bit (MSB) of each byte is reserved to denote the type (status or data), there are only 7 active bits per byte, which allows 2⁷ (that is, 128) possible values. As shown in the figure below, the first half of the status byte denotes the message type and the second half denotes the channel number. Because 4 bits of the status byte are set aside to indicate the channel number, this allows for 2⁴ (or 16) possible channels. There are only 3 bits to denote the message type, because the first bit must always be a one. This theoretically allows for eight message types, but there are some special cases in the form of system messages (see below).

image

Not all MIDI devices will have all the commands implemented, since it is not mandatory for a device conforming to the MIDI standard to implement every possibility.

Message Status Data 1 Data 2
Note off &8n Note number Velocity
Note on &9n Note number Velocity
Polyphonic aftertouch &An Note number Pressure
Control change &Bn Controller number Data
Program change &Cn Program number
Channel aftertouch &Dn Pressure
Pitch wheel &En LSbyte MSbyte
System exclusive
System exclusive start &F0 Manufacturer ID Data, (Data), (Data)
End of SysEx &F7
System common
Quarter frame &F1 Data
Song pointer &F2 LSbyte MSbyte
Song select &F3 Song number
Tune request &F6
System real time
Timing clock &F8
Start &FA
Continue &FB
Stop &FC
Active sensing &FE
Reset &FF

Channel messages start with status bytes in the range &8n to &En (they start at hexadecimal eight because the MSB must be a one for a status byte). System messages all begin with &F, and do not contain a channel number. Instead the least significant nibble of the system status byte is used for further identification of the system message, such that there is room for 16 possible system messages running from &F0 to &FF. System messages are themselves split into three groups: system common, system exclusive and system real time. The common messages may apply to any device on the MIDI bus, depending only on the device’s ability to handle the message. The exclusive messages apply to whichever manufacturers’ devices are specified later in the message (see below), and the real-time messages are for synchronizing devices to the prevailing musical tempo. (Some of the so-called real-time messages do not really seem to deserve this appellation, as discussed below.) The status byte &F1 is used for MIDI TimeCode.

MIDI channel numbers are usually referred to as ‘channels one to 16’, but the binary numbers representing these run from zero to 15 (&0 to &F), as 15 is the largest decimal number which can be represented with 4 bits. Thus the note on message for channel 5 is actually &94 (nine for note on, and four for channel 5).
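The split of the status byte into a type nibble and a channel nibble can be sketched as follows. The function names are illustrative only; the byte values follow the fact file above.

```python
# Packing and unpacking a channel status byte as described above:
# top bit set, a 3-bit message type, then a 4-bit channel number
# (0-15 internally, 'channels 1 to 16' to the user).

NOTE_ON = 0x9   # message-type nibble for note on

def status_byte(msg_type, channel):
    """Build a status byte; channel is the user-facing 1-16 value."""
    return (msg_type << 4) | (channel - 1)

def split_status(status):
    """Recover (message type nibble, user-facing channel)."""
    return status >> 4, (status & 0x0F) + 1

# Note on for channel 5 is &94, matching the example in the text.
s = status_byte(NOTE_ON, 5)
print(hex(s))            # 0x94
print(split_status(s))   # (9, 5)
```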

Note on and note off messages

Much of the musical information sent over a typical MIDI interface will consist of note messages. As indicated by the titles, the note on message turns on a musical note, and the note off message turns it off. Note on takes the general format:

[&9n] [note number] [velocity]

and note off takes the form:

[&8n] [note number] [velocity]

A MIDI instrument will generate note on messages at its MIDI OUT corresponding to whatever notes are pressed on the keyboard, on whatever channel the instrument is set to transmit. Any note which has been turned on must subsequently be turned off in order for it to stop sounding; thus if one instrument receives a note on message from another and then loses the MIDI connection for any reason, the note will continue sounding ad infinitum. This situation can occur if a MIDI cable is pulled out during transmission.
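Assembling the three bytes of a note message is straightforward. The helper functions below are a minimal sketch (not from any real MIDI library), masking the data bytes to 7 bits as the standard requires:

```python
# Building note messages as laid out in Fact File 14.3:
# [status] [note number] [velocity], data bytes limited to 0-127.

def note_on(channel, note, velocity):
    return bytes([0x90 | (channel - 1), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=64):
    # Velocity 64 is the conventional default where release
    # velocity is not sensed.
    return bytes([0x80 | (channel - 1), note & 0x7F, velocity & 0x7F])

# Middle C (note 60) on channel 1, played moderately hard:
print(note_on(1, 60, 100).hex())   # '903c64'
print(note_off(1, 60).hex())       # '803c40'
```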

Table 14.1 MIDI Note Numbers Related to the Musical Scale

Musical Note MIDI Note Number
C−2 0
C−1 12
C0 24
C1 36
C2 48
C3 (middle C) 60 (Yamaha convention)
C4 72
C5 84
C6 96
C7 108
C8 120
G8 127

MIDI note numbers relate directly to the western musical chromatic scale and the format of the message allows for 128 note numbers which cover a range of a little over ten octaves — adequate for the full range of most musical material. This quantization of the pitch scale is geared very much towards keyboard instruments, being less suitable for other instruments and cultures where the definition of pitches is not so black and white. Nonetheless, means have been developed of adapting control to situations where unconventional tunings are required. It also seems likely that the upcoming HD MIDI standard will allow for direct control of note pitch. Note numbers normally relate to the musical scale as shown in Table 14.1, although there is a certain degree of confusion here. Yamaha established the use of C3 for middle C, whereas others have used C4. Some software allows the user to decide which convention will be used for display purposes.
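The mapping in Table 14.1 can be expressed in a few lines. The sketch below uses the Yamaha convention (note 0 = C−2, so middle C, note 60, is C3); shifting the octave offset by one would give the alternative C4 convention mentioned above.

```python
# MIDI note number to name, under the Yamaha convention of Table 14.1.

NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def note_name(number):
    octave = number // 12 - 2   # Yamaha: note 0 is C-2
    return f"{NAMES[number % 12]}{octave}"

print(note_name(60))    # 'C3' - middle C
print(note_name(127))   # 'G8' - the top of the 128-note range
```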

Velocity information

Note messages are associated with a velocity byte that is used to represent the speed at which a key was pressed or released. The former will correspond to the force exerted on the key as it is depressed: in other words, ‘how hard you hit it’ (called ‘note on velocity’). It is used to control parameters such as the volume or timbre of the note at the audio output of an instrument and can be applied internally to scale the effect of one or more of the envelope generators in a synthesizer. This velocity value has 128 possible states, but not all MIDI instruments are able to generate or interpret the velocity byte, in which case they will set it to a value half way between the limits, i.e. 64. Some instruments may act on velocity information even if they are unable to generate it themselves. It is recommended that a logarithmic rather than linear relationship should be established between the velocity value and the parameter which it controls, since this corresponds more closely to the way in which musicians expect an instrument to respond, although some instruments allow customized mapping of velocity values to parameters. The note on, velocity zero value is reserved for the special purpose of turning a note off, for reasons that will become clear under ‘Running status’, below.

Note off velocity (or ‘release velocity’) is not widely used, as it relates to the speed at which a note is released, which is not a parameter that affects the sound of many normal keyboard instruments. Nonetheless it is available for special effects if a manufacturer decides to implement it.
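One plausible form of the logarithmic velocity-to-level mapping recommended above is sketched below. The 40 dB dynamic range is an arbitrary choice for illustration; real instruments use their own curves and often let the user edit them.

```python
# A logarithmic (dB-based) velocity-to-gain mapping, as an
# illustration of the recommendation in the text. The 40 dB
# range is an assumption, not part of the MIDI standard.

def velocity_to_gain(velocity, range_db=40.0):
    """Map velocity 1-127 to a linear gain; 0 means note off."""
    if velocity <= 0:
        return 0.0
    db = (velocity - 127) / 126 * range_db   # 0 dB at 127
    return 10 ** (db / 20)

print(velocity_to_gain(127))            # 1.0 - full level
print(round(velocity_to_gain(64), 3))   # 0.1 - i.e. -20 dB
```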

Running status

Running status is an accepted method of reducing the amount of data transmitted. It involves the assumption that once a status byte has been asserted by a controller there is no need to reiterate this status for each subsequent message of that status, so long as the status has not changed in between. Thus a string of notes on messages could be sent with the note on status only sent at the start of the series of note data, for example:

&[9n] [note 1] [velocity 1] [note 2] [velocity 2] [note 3] [velocity 3] …

For a long string of notes this could reduce the amount of data sent by nearly one-third. But in most music each note on is almost always followed quickly by a note off for the same note number, so note on, velocity zero (see above) allows a string of what appears to be note on messages to act as both note on and note off.

Running status is not used at all times for a string of same-status messages and will often only be called upon by an instrument’s software when the rate of data exceeds a certain point. Indeed, an examination of the data from a typical synthesizer indicates that running status is not used during a large amount of ordinary playing.
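As a sketch of the technique, the following encoder applies running status to a stream of channel messages (system messages and real-time interleaving are ignored for simplicity):

```python
def encode_running_status(messages):
    """Encode (status, data...) tuples, omitting repeated status bytes.

    Each message is a tuple such as (0x90, note, velocity). The status
    byte is only emitted when it differs from the previous one, which
    is the essence of running status.
    """
    out = []
    last_status = None
    for status, *data in messages:
        if status != last_status:
            out.append(status)
            last_status = status
        out.extend(data)
    return out

# Three note ons on channel 1: 9 bytes without running status, 7 with.
stream = [(0x90, 60, 64), (0x90, 64, 64), (0x90, 67, 64)]
encoded = encode_running_status(stream)
# encoded == [0x90, 60, 64, 64, 64, 67, 64]
```

Combined with note on, velocity zero standing in for note off, an entire legato passage can stay under a single &9n status byte.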

Polyphonic key pressure (aftertouch)

The key pressure messages are sometimes called ‘aftertouch’ by keyboard manufacturers. This message refers to the amount of pressure placed on a key at the bottom of its travel, and it is often applied to performance parameters such as vibrato.

The polyphonic key pressure message is not widely used, as it transmits a separate value for every key on the keyboard and thus requires a separate sensor for every key. This can be expensive to implement and is beyond the scope of many keyboards, so most manufacturers have resorted to the use of the channel pressure message (see below). The message takes the general format:

&[An] [note number] [pressure value]

Implementing polyphonic key pressure messages involves the transmission of a considerable amount of data that might be unnecessary, as the message will be sent for every note in a chord every time the pressure changes. As most people do not maintain a constant pressure on the bottom of a key whilst playing, many redundant messages might be sent per note. A technique known as ‘controller thinning’ may be used by a device to limit the rate at which such messages are transmitted and this may be implemented either before transmission or at a later stage using a computer. Alternatively this data may be filtered out altogether if it is not required.
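Controller thinning is not specified by the standard, but a simple form of it might drop messages whose value has barely moved since the last one passed, as in this sketch (the threshold is an arbitrary choice):

```python
def thin_controller_stream(values, min_change=3):
    """Pass a controller/pressure value only when it has moved by at
    least min_change since the last value that was passed. The first
    and last values are always kept so the final position is correct.
    """
    if not values:
        return []
    kept = [values[0]]
    for v in values[1:-1]:
        if abs(v - kept[-1]) >= min_change:
            kept.append(v)
    if len(values) > 1:
        kept.append(values[-1])
    return kept

# A slowly wobbling pressure stream collapses to a few messages.
pressure = [10, 11, 12, 14, 20, 21, 22, 30]
# thin_controller_stream(pressure) -> [10, 14, 20, 30]
```

A rate-based version (passing at most so many messages per second) works on the same principle, keyed on time rather than value.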

Control change

As well as note information, a MIDI device transmits information that corresponds to the various switches, control wheels and pedals associated with it. These come under the control change message group and should be distinguished from program change messages. The controller messages have proliferated enormously since the early days of MIDI and not all devices will implement all of them. The control change message takes the general form:

&[Bn] [controller number] [data value]

so a number of controllers may be addressed using the same type of status byte by changing the controller number.

Although the original MIDI standard did not lay down any hard and fast rules for the assignment of physical control devices to logical controller numbers, there is now common agreement amongst manufacturers that certain controller numbers will be used for certain purposes. These are assigned by the MMA. There are two distinct kinds of controller: the switch type and the analog type. The analog controller is any continuously variable wheel, lever, slider or pedal that might have any one of a number of positions and these are often known as continuous controllers. There are 128 controller numbers available and these are grouped as shown in Table 14.2. Table 14.3 shows a more detailed breakdown of some of these, as found in the majority of MIDI-controlled musical instruments, although the full list is regularly updated by the MMA.

Table 14.2 Controller Classifications

Controller Number (hex) Function
&00–1F 14 bit controllers, MSbyte
&20–3F 14 bit controllers, LSbyte
&40–65 7 bit controllers or switches
&66–77 Originally undefined
&78–7F Channel mode control

The first 64 controller numbers (that is up to &3F) relate to only 32 physical controllers (the continuous controllers). This is to allow for greater resolution in the quantization of position than would be feasible with the 7 bits that are offered by a single data byte. The first 32 controllers handle the most significant byte (MSbyte) of the controller data, whilst the second 32 handle the least significant byte (LSbyte). In this way, controller numbers &06 and &26 both represent the data entry slider, for example. Together, the data values can make up a 14 bit number (because the first bit of each data word has to be a zero), which allows the quantization of a control’s position to be one part in 2¹⁴ (16 384). If a system opts not to use the extra resolution offered by the second byte, it should send only the MSbyte for coarse control. In practice this is all that is transmitted on many devices.
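The arithmetic for combining the two bytes can be sketched as:

```python
def combine_14bit(msbyte, lsbyte=0):
    """Combine coarse (MSbyte) and fine (LSbyte) controller data into
    a single 14 bit value (0-16383). If only the MSbyte was sent for
    coarse control, the fine part defaults to zero."""
    assert 0 <= msbyte < 128 and 0 <= lsbyte < 128
    return (msbyte << 7) | lsbyte

def split_14bit(value):
    """Split a 14 bit value into (MSbyte, LSbyte) controller data."""
    assert 0 <= value < 16384
    return (value >> 7) & 0x7F, value & 0x7F

# Full scale: combine_14bit(0x7F, 0x7F) == 16383
# Coarse control halfway: combine_14bit(0x40) == 8192
```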

On/off switches can be represented easily in binary form (0 for OFF, 1 for ON), and it would be possible to use just a single bit for this purpose, but, in order to conform to the standard format of the message, switch states are normally represented by data values between &00 and &3F for OFF and &40 and &7F for ON. In other words switches are now considered as 7 bit continuous controllers. In older systems it may be found that only &00 = OFF and &7F = ON.

The data increment and decrement buttons that are present on many devices are assigned to two specific controller numbers (&60 and &61) and an extension to the standard defines four controllers (&62 to &65) that effectively expand the scope of the control change messages. These are the registered and non-registered parameter controllers (RPCs and NRPCs).

Table 14.3 MIDI Controller Functions

Controller Number (hex) Function
00 Bank select
01 Modulation wheel
02 Breath controller
03 Undefined
04 Foot controller
05 Portamento time
06 Data entry slider
07 Main volume
08 Balance
09 Undefined
0A Pan
0B Expression controller
0C Effect control 1
0D Effect control 2
0E–0F Undefined
10–13 General purpose controllers 1–4
14–1F Undefined
20–3F LSbyte for 14 bit controllers (same function order as 00–1F)
40 Sustain pedal
41 Portamento on/off
42 Sostenuto pedal
43 Soft pedal
44 Legato footswitch
45 Hold 2
46–4F Sound controllers
50–53 General purpose controllers 5–8
54 Portamento control
55–5A Undefined
5B–5F Effects depth 1–5
60 Data increment
61 Data decrement
62 NRPC LSbyte (non-registered parameter controller)
63 NRPC MSbyte
64 RPC LSbyte (registered parameter controller)
65 RPC MSbyte
66–77 Undefined
78 All sounds off
79 Reset all controllers
7A Local on/off
7B All notes off
7C Omni receive mode off
7D Omni receive mode on
7E Mono receive mode
7F Poly receive mode

The ‘all notes off’ command (frequently abbreviated to ‘ANO’) was designed as a means of silencing devices, but it does not necessarily have this effect in practice. What actually happens varies between instruments, especially if the sustain pedal is held down or notes are still being pressed manually by a player. All notes off is supposed to put all note generators into the release phase of their envelopes, and clearly the result of this will depend on what a sound is programmed to do at this point. The exception should be notes which are being played whilst the sustain pedal is held down, which should only be released when that pedal is released. ‘All sounds off’ was designed to overcome the problems with ‘all notes off’, by turning sounds off as quickly as possible. ‘Reset all controllers’ is designed to reset all controllers to their default state, in order to return a device to its ‘standard’ setting.

Channel modes

Although grouped with the controllers, under the same status, the channel mode messages differ somewhat in that they set the mode of operation of the instrument receiving on that particular channel.

‘Local on/off’ is used to make or break the link between an instrument’s keyboard and its own sound generators. Effectively there is a switch between the output of the keyboard and the control input to the sound generators which allows the instrument to play its own sound generators in normal operation when the switch is closed (see Figure 14.8). If the switch is opened, the link is broken and the output from the keyboard feeds the MIDI OUT whilst the sound generators are controlled from the MIDI IN. In this mode the instrument acts as two separate devices: a keyboard without any sound, and a sound generator without a keyboard. This configuration can be useful when the instrument in use is the master keyboard for a large sequencer system, where it may not always be desired that everything played on the master keyboard results in sound from the instrument itself.

image

FIGURE 14.8 The ‘local off’ switch disconnects a keyboard from its associated sound generators in order that the two parts may be treated independently in a MIDI system.

‘Omni off’ ensures that the instrument will only act on data tagged with its own channel number(s), as set by the instrument’s controls. ‘Omni on’ sets the instrument to receive on all of the MIDI channels. In other words, the instrument will ignore the channel number in the status byte and will attempt to act on any data that may arrive, whatever its channel. Devices should power up in this mode according to the original specification, but more recent devices will tend to power up in the mode in which they were left. Mono mode sets the instrument such that it will only reproduce one note at a time, as opposed to ‘Poly’ (phonic) in which a number of notes may be sounded together.

Mono mode tends to be used mostly on MIDI guitar synthesizers because each string can then have its own channel and each can control its own set of pitch bend and other parameters. The mode also has the advantage that it is possible to play in a truly legato fashion — that is, with a smooth takeover between the notes of a melody — because the arrival of a second note message acts simply to change the pitch if the first one is still being held down, rather than retriggering the start of a note envelope. The legato switch controller allows a similar type of playing in polyphonic modes by allowing new note messages only to change the pitch.

In poly mode the instrument will sound as many notes as it is able to at the same time. Instruments differ as to the action taken when the number of simultaneous notes is exceeded: some will release the first note played in favor of the new note, whereas others will refuse to play the new note. Some may be able to route excess note messages to their MIDI OUT ports so that they can be played by a chained device. The more intelligent of them may look to see if the same note already exists in the notes currently sounding and only accept a new note if it is not already sounding. Even more intelligently, some devices may release the quietest note (that with the lowest velocity value), or the note furthest through its velocity envelope, to make way for a later arrival. It is also common to run a device in poly mode on more than one receive channel, provided that the software can handle the reception of multiple polyphonic channels. A multi-timbral sound generator may well have this facility, commonly referred to as ‘multi’ mode, making it act as if it were a number of separate instruments each receiving on a separate channel. In multi mode a device may be able to dynamically assign its polyphony between the channels and voices in order that the user does not need to assign a fixed polyphony to each voice.
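One of the more intelligent allocation strategies described above, releasing the quietest note to make way for a new arrival, might be sketched as follows (the data structure and polyphony limit are illustrative):

```python
def allocate_voice(active, note, velocity, max_polyphony=4):
    """Add a (note, velocity) pair to the active list, stealing the
    quietest voice (lowest velocity) when polyphony is exhausted.
    A note that is already sounding is simply retriggered."""
    active = [v for v in active if v[0] != note]   # same note: retrigger
    if len(active) >= max_polyphony:
        active.remove(min(active, key=lambda v: v[1]))  # steal quietest
    active.append((note, velocity))
    return active

voices = []
for note, vel in [(60, 100), (64, 40), (67, 90), (72, 80), (76, 110)]:
    voices = allocate_voice(voices, note, vel)
# The quietest voice (note 64, velocity 40) has been stolen:
# voices == [(60, 100), (67, 90), (72, 80), (76, 110)]
```

A device favoring the ‘release the oldest note’ policy would steal the head of the list instead of the minimum-velocity entry.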

Program change

The program change message is used most commonly to change the ‘patch’ of an instrument or other device. A patch is a stored configuration of the device, describing the setup of the tone generators in a synthesizer and the way in which they are interconnected. Program change is channel specific and there is only a single data byte associated with it, specifying to which of 128 possible stored programs the receiving device should switch. On non-musical devices such as effects units, the program change message is often used to switch between different effects and the different effects programs may be mapped to specific program change numbers. The message takes the general form:

&[Cn] [program number]

If a program change message is sent to a musical device it will usually result in a change of voice, as long as this facility is enabled. Exactly which voice corresponds to which program change number depends on the manufacturer. It is quite common for some manufacturers to implement this function in such a way that a data value of zero gives voice number one. This results in a permanent offset between the program change number and the voice number, which should be taken into account in any software. On some instruments, voices may be split into a number of ‘banks’ of 8, 16 or 32, and higher banks can be selected over MIDI by setting the program change number to a value which is 8, 16 or 32 higher than the lowest bank number. For example, bank 1, voice 2, might be selected by program change &01, whereas bank 2, voice 2, would probably be selected in this case by program change &11, where there were 16 voices per bank. Where more than 128 voices need to be addressed remotely, the more recent ‘bank select’ command may be implemented.
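The offset and bank arithmetic can be expressed as a small helper, assuming the common convention that program 0 selects voice 1 and the 16-voices-per-bank layout of the example:

```python
def program_number(bank, voice, voices_per_bank=16):
    """Map a 1-based bank and voice number to the 0-based MIDI
    program change data value, assuming program 0 selects voice 1."""
    return (bank - 1) * voices_per_bank + (voice - 1)

# Bank 1, voice 2 -> program change &01; bank 2, voice 2 -> &11.
assert program_number(1, 2) == 0x01
assert program_number(2, 2) == 0x11
```

Where a device uses banks of 8 or 32 voices, only the voices_per_bank value changes; beyond 128 programs the bank select controller takes over.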

Channel aftertouch

Most instruments use a single sensor, often in the form of a pressure-sensitive conductive plastic bar running the length of the keyboard, to detect the pressure applied to keys at the bottom of their travel. In the case of channel aftertouch, one message is sent for the entire instrument and this will correspond to an approximate total of the pressure over the range of the keyboard, the strongest influence being from the key pressed the hardest. (Some manufacturers have split the pressure detector into upper and lower keyboard regions, and some use ‘intelligent’ zoning.) The message takes the general form:

&[Dn] [pressure value]

There is only one data byte, so there are 128 possible values and, as with the polyphonic version, many messages may be sent as the pressure is varied at the bottom of a key’s travel. Controller ‘thinning’ may be used to reduce the quantity of these messages, as described above.

Pitch bend wheel

The pitch wheel message has a status byte of its own, and carries information about the movement of the sprung-return control wheel on many keyboards which modifies the pitch of any note(s) played. It uses two data bytes in order to give 14 bits of resolution, in much the same way as the continuous controllers, except that the pitch wheel message carries both bytes together. Fourteen data bits are required so that the pitch appears to change smoothly, rather than in steps (as it might with only 7 bits). The pitch bend message is channel specific so ought to be sent separately for each individual channel. This becomes important when using a single multi-timbral device in mono mode (see above), as one must ensure that a pitch bend message only affects the notes on the intended channel. The message takes the general form:

&[En] [LSbyte] [MSbyte]

The value of the pitch bend controller should be halfway between the lower and upper range limits when it is at rest in its sprung central position, thus allowing bending both down and up. This corresponds to a hex value of &2000, transmitted as &[En] [00] [40]. The range of pitch controlled by the bend message is set on the receiving device itself, or using the RPC designated for this purpose (see ‘Control change’, p. 424).
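Packing and unpacking the pitch bend data bytes (LSbyte first, rest position &2000) can be sketched as:

```python
def encode_pitch_bend(value):
    """Turn a 14 bit pitch bend value (0-16383, center 0x2000) into
    the two data bytes, transmitted LSbyte first."""
    assert 0 <= value < 16384
    return value & 0x7F, (value >> 7) & 0x7F

def decode_pitch_bend(lsbyte, msbyte):
    """Reassemble the 14 bit value from the two data bytes."""
    return (msbyte << 7) | lsbyte

# The at-rest position &2000 is sent as data bytes [00] [40]:
assert encode_pitch_bend(0x2000) == (0x00, 0x40)
assert decode_pitch_bend(0x00, 0x40) == 0x2000
```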

System exclusive

A system exclusive message is one that is unique to a particular manufacturer and often a particular instrument. The only thing that is defined about such messages is how they are to start and finish, with the exception of the use of system exclusive messages for universal information, as discussed elsewhere. System exclusive messages generated by a device will naturally be produced at the MIDI OUT, not at the THRU, so a deliberate connection must be made between the transmitting device and the receiving device before data transfer may take place.

Occasionally it is necessary to make a return link from the OUT of the receiver to the IN of the transmitter so that two-way communication is possible and so that the receiver can control the flow of data to some extent by telling the transmitter when it is ready to receive and when it has received correctly (a form of handshaking).

The message takes the general form:

&[F0] [ident.] [data] [data] … [F7]

where [ident.] is the manufacturer ID, a number defining which manufacturer’s message is to follow. Originally, manufacturer IDs were a single byte but the number of IDs has been extended by setting aside the [00] value of the ID to indicate that two further bytes of ID follow. Manufacturer IDs are therefore either 1 or 3 bytes long. A full list of manufacturer IDs is available from the MMA.

Data of virtually any sort can follow the ID. It can be used for a variety of miscellaneous purposes that have not been defined in the MIDI standard and the message can have virtually any length that the manufacturer requires. It is often split into packets of a manageable size in order not to cause receiver memory buffers to overflow. Exceptions are data bytes that look like other MIDI status bytes (except real-time messages), as they will naturally be interpreted as such by any receiver, which might terminate reception of the system exclusive message. The message should be terminated with &F7, although this is not always observed, in which case the receiving device should ‘time-out’ after a given period, or terminate the system exclusive message on receipt of the next status byte. It is recommended that some form of error checking (typically a checksum) is employed for long system exclusive data dumps, and many systems employ means of detecting whether the data has been received accurately, asking for retries of sections of the message in the event of failure, via a return link to the transmitter.
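One typical checksum scheme, in which the data bytes and the checksum sum to zero in 7 bit arithmetic (this particular form is used in several manufacturers’ system exclusive dumps, notably Roland’s), can be sketched as:

```python
def sysex_checksum(data_bytes):
    """Compute a checksum byte such that (sum of data + checksum) is
    a multiple of 128, i.e. zero in 7 bit arithmetic. The result is
    itself a valid data byte (0-127)."""
    return (128 - sum(data_bytes) % 128) % 128

def verify_dump(data_bytes, checksum):
    """A receiver confirms a dump by checking that everything sums
    to zero modulo 128."""
    return (sum(data_bytes) + checksum) % 128 == 0

payload = [0x10, 0x02, 0x00, 0x20, 0x41]   # arbitrary example data
chk = sysex_checksum(payload)
assert verify_dump(payload, chk)
assert not verify_dump(payload, (chk + 1) % 128)   # corruption detected
```

A single-byte checksum only detects errors, of course; the retry mechanism mentioned above is what actually corrects them.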

Universal system exclusive messages

The three highest numbered IDs within the system exclusive message have been set aside to denote special modes. These are the ‘universal non-commercial’ messages (ID: &7D), the ‘universal non-real-time’ messages (ID: &7E) and the ‘universal real-time’ messages (ID: &7F). Universal sysex messages are often used for controlling device parameters that were not originally specified in the MIDI standard and that now need addressing in most devices. Examples are things like ‘chorus modulation depth’, ‘reverb type’ and ‘master fine tuning’.

Universal non-commercial messages are set aside for educational and research purposes and should not be used in commercial products. Universal non-real-time messages are used for universal system exclusive events which are not time critical and universal real-time messages deal with time critical events (thus being given a higher priority). The two latter types of message normally take the general form of:

&[F0] [7E or 7F] [device ID] [sub-ID #1] [sub-ID #2] [data] … [F7]

Device ID used to be referred to as ‘channel number’, but this did not really make sense since a whole byte allows for the addressing of 128 channels and this does not correspond to the normal 16 channels of MIDI. The term ‘device ID’ is now used widely by software as a means of defining one of a number of physical devices in a large MIDI system, rather than defining a MIDI channel number. It should be noted, though, that it is allowable for a device to have more than one ID if this seems appropriate. Modern MIDI devices will normally allow their device ID to be set either over MIDI or from the front panel. The use of &7F in this position signifies that the message applies to all devices as opposed to just one.

The sub-IDs are used to identify first the category or application of the message (sub-ID #1) and second, the type of message within that category (sub-ID #2). For some reason, the original MIDI sample dump messages do not use the sub-ID #2, although some recent additions to the sample dump do.

Tune request

Older analog synthesizers tended to drift somewhat in pitch over the time that they were turned on. The tune request is a request for these synthesizers to retune themselves to a fixed reference. (It is advisable not to transmit pitch bend or note on messages to instruments during a tune-up because of the unpredictable behavior of some products under these conditions.)

Active sensing

Active sensing messages are single status bytes sent roughly three times per second by a controlling device when there is no other activity on the bus. They act as a means of reassuring the receiving devices that the controller has not disappeared. Not all devices transmit active sensing information, and a receiver’s software should be able to detect the presence or lack of it. If a receiver has come to expect active sensing bytes then it will generally act by turning off all notes if these bytes disappear for any reason. This is a useful function when a MIDI cable has been pulled out during a transmission, as it ensures that notes will not be left sounding for very long. If a receiver has not seen active sensing bytes since it was last turned on, it should assume that they are not being used.
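The receiver behavior described here can be sketched as a watchdog. It is driven by explicit timestamps so that the logic is easy to follow; the commonly quoted 300 ms timeout is used, though an implementation is free to choose its own margin:

```python
class ActiveSensingWatchdog:
    """Track active sensing (&FE) bytes. Once one has been seen, a
    gap longer than the timeout means the controller (or the cable)
    has gone away and all notes should be silenced."""

    def __init__(self, timeout=0.3):   # ~300 ms is the usual figure
        self.timeout = timeout
        self.last_seen = None          # None: sensing never used

    def byte_received(self, now):
        self.last_seen = now

    def should_silence(self, now):
        if self.last_seen is None:     # never seen: assume not in use
            return False
        return (now - self.last_seen) > self.timeout

dog = ActiveSensingWatchdog()
assert not dog.should_silence(now=10.0)   # sensing never started
dog.byte_received(now=10.0)
assert not dog.should_silence(now=10.2)   # within the timeout
assert dog.should_silence(now=10.5)       # cable pulled out?
```

In a real receiver any incoming MIDI byte, not just &FE, would reset the timer.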

Reset

This message resets all devices on the bus to their power-on state. The process may take some time, and some devices mute their audio outputs, which can result in clicks, so the message should be used with care.

MIDI CONTROL OF SOUND GENERATORS

MIDI note assignment in synthesizers and samplers

Many of the replay and signal processing aspects of synthesis and sampling now overlap so that it is more difficult to distinguish between the two. In basic terms a sampler is a device that stores short clips of sound data in RAM, enabling them to be replayed subsequently at different pitches, possibly looped and processed. A synthesizer is a device that enables signals to be artificially generated and modified to create novel sounds. Wavetable synthesis is based on a similar principle to sampling, though, and stored samples can form the basis for synthesis. A sound generator can often generate a number of different sounds at the same time. It is possible that these sounds could be entirely unrelated (perhaps a single drum, an animal noise and a piano note), or that they might have some relationship to each other (perhaps a number of drums in a kit, or a selection of notes from a grand piano). The method by which sounds or samples are assigned to MIDI notes and channels is defined by the replay program.

The most common approach when assigning note numbers to samples is to program the sampler with the range of MIDI note numbers over which a certain sample should be sounded. Akai, one of the most popular sampler manufacturers, called these ‘keygroups’. It may be that this ‘range’ is only one note, in which case the sample in question would be triggered only on receipt of that note number, but in the case of a range of notes the sample would be played on receipt of any note in the range. In the latter case transposition would be required, depending on the relationship between the note number received and the original note number given to the sample (see above). A couple of examples highlight the difference in approach, as shown in Figure 14.9. In the first example, illustrating a possible approach to note assignment for a collection of drum kit sounds, most samples are assigned to only one note number, although it is possible for tuned drum sounds such as tom-toms to be assigned over a range in order to give the impression of ‘tuned toms’. Each MIDI note message received would replay the particular percussion sound assigned to that note number in this example.

In the second example, illustrating a suggested approach to note assignment for an organ, notes were originally sampled every musical fifth across the organ’s note range. The replay program has been designed so that each of these samples is assigned to a note range of a fifth, centered on the original pitch of each sample, resulting in a maximum transposition of a third up or down. Ideally, of course, every note would have been sampled and assigned to an individual note number on replay, but this requires very large amounts of memory and painstaking sample acquisition in the first place.
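Replaying a sample at a different note number amounts to changing the replay rate by an equal-tempered semitone ratio, as this sketch shows:

```python
def playback_rate(received_note, original_note):
    """Ratio by which the sample replay rate must be changed to shift
    its pitch from the original note to the received note (equal
    temperament: one semitone is a factor of 2 ** (1/12))."""
    return 2.0 ** ((received_note - original_note) / 12.0)

# A fifth up (7 semitones) is roughly a 1.498 rate increase, and
# transposing down by the same interval gives the reciprocal.
assert abs(playback_rate(67, 60) - 1.498) < 0.001
assert abs(playback_rate(60, 67) * playback_rate(67, 60) - 1.0) < 1e-9
```

The limited transposition range in the organ example (a third either way) keeps this ratio close to 1, limiting the timbral side effects of rate changing.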

image

FIGURE 14.9 (a) Percussion samples are often assigned to one note per sample, except for tuned percussion which sometimes covers a range of notes. (b) Organ samples could be transposed over a range of notes, centered on the original pitch of the sample.

In further pursuit of sonic accuracy, some devices provide the facility for introducing a crossfade between note ranges. This is used where an abrupt change in the sound at the boundary between two note ranges might be undesirable, allowing the takeover from one sample to another to be more gradual. For example, in the organ scenario introduced above, the timbre could change noticeably when playing musical passages that crossed between two note ranges because replay would switch from the upper limit of transposition of one sample to the lower limit of the next (or vice versa). In this case the ranges for the different samples are made to overlap (as illustrated in Figure 14.10). In the overlap range the system mixes a proportion of the two samples together to form the output. The exact proportion depends on the range of overlap and the note’s position within this range. Very accurate tuning of the original samples is needed in order to avoid beats when using positional crossfades. Clearly this approach would be of less value when each note was assigned to a completely different sound, as in the drum kit example.
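The positional crossfade can be sketched as a pair of gains derived from the note’s position within the overlap region; a linear law is used here for simplicity, though real devices may prefer an equal-power curve:

```python
def crossfade_gains(note, overlap_low, overlap_high):
    """Gains for the lower-range and upper-range samples for a note
    in the overlap region [overlap_low, overlap_high]. Outside the
    region only one sample sounds."""
    if note <= overlap_low:
        return 1.0, 0.0
    if note >= overlap_high:
        return 0.0, 1.0
    upper = (note - overlap_low) / (overlap_high - overlap_low)
    return 1.0 - upper, upper

# Halfway through a 4-note overlap, both samples contribute equally:
assert crossfade_gains(62, 60, 64) == (0.5, 0.5)
assert crossfade_gains(59, 60, 64) == (1.0, 0.0)
```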

image

FIGURE 14.10 Overlapped sample ranges can be crossfaded in order that a gradual shift in timbre takes place over the region of takeover between one range and the next.

Crossfades based on note velocity allow two or more samples to be assigned to one note or range of notes. This requires at least a ‘loud sample’ and a ‘soft sample’ to be stored for each original sound, and some systems may allow four or more to be assigned over the velocity range. The terminology may vary, but the principle is that a velocity value is set at which the replay switches from one stored sample to another, since many instruments sound quite different when played loudly compared with when played softly (it is more than just the volume that changes: it is the timbre also). If a simple switching point is set, then the change from one sample to the other will be abrupt as the velocity crosses either side of the relevant value. This can be illustrated by storing two completely different sounds as the loud and soft samples, in which case the output changes from one to the other at the switching point. A more subtle effect is achieved by using velocity crossfading, in which the proportion of loud and soft samples varies depending on the received note velocity value. At low velocity values the proportion of the soft sample in the output would be greatest and at high values the output content would be almost entirely made up of the loud sample (see Figure 14.11).
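The difference between a hard velocity switch and a velocity crossfade comes down to how the mix is derived from the velocity value, as this sketch shows (the switch point and linear fade law are illustrative choices):

```python
def velocity_mix(velocity, switch_point=64, crossfade=True):
    """Return (soft_gain, loud_gain) for a note velocity (1-127).

    With crossfade=False this is a hard velocity switch at the switch
    point; with crossfade=True the proportion of the loud sample
    rises linearly with velocity instead."""
    if not crossfade:
        return (1.0, 0.0) if velocity < switch_point else (0.0, 1.0)
    loud = (velocity - 1) / 126.0       # 0.0 at velocity 1, 1.0 at 127
    return 1.0 - loud, loud

# The switch is abrupt either side of the switch point...
assert velocity_mix(63, crossfade=False) == (1.0, 0.0)
assert velocity_mix(64, crossfade=False) == (0.0, 1.0)
# ...whereas the crossfade only reaches all-loud at full velocity.
assert velocity_mix(127) == (0.0, 1.0)
```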

image

FIGURE 14.11 Illustration of velocity switch and velocity crossfade between two stored samples (‘soft’ and ‘loud’) over the range of MIDI note velocity values.

MIDI functions of sound generators

The MIDI implementation for a particular sound generator should be described in the manual that accompanies it. A MIDI implementation chart indicates which message types are received and transmitted, together with any comments relating to limitations or unusual features. Functions such as note off velocity and polyphonic aftertouch, for example, are quite rare. It is quite common for a device to be able to accept certain data and act upon it, even if it cannot generate such data from its own controllers. The note range available under MIDI control compared with that available from a device’s keyboard is a good example of this, since many devices will respond to note data over a full ten octave range yet still have only a limited (or no) keyboard. This approach can be used by a manufacturer who wishes to make a cheaper synthesizer that omits the expensive physical sensors for such things as velocity and aftertouch, whilst retaining these functions in software for use under MIDI control. Devices conforming to the General MIDI specification described below must conform to certain basic guidelines concerning their MIDI implementation and the structure of their sound generators.

MIDI data buffers and latency

All MIDI-controlled equipment uses some form of data buffering for received MIDI messages. Such buffering acts as a temporary store for messages that have arrived but have not yet been processed and allows for a certain prioritization in the handling of received messages. Cheaper devices tend to have relatively small MIDI input buffers and these can overflow easily unless care is taken in the filtering and distribution of MIDI data around a large system (usually accomplished by a MIDI router or multiport interface). When a buffer overflows it will normally result in an error message displayed on the front panel of the device, indicating that some MIDI data is likely to have been lost. More advanced equipment can store more MIDI data in its input buffer, although this is not necessarily desirable because many messages that are transmitted over MIDI are intended for ‘real-time’ execution and one would not wish them to be delayed in a temporary buffer. Such buffer delay is one potential cause of latency in MIDI systems. A more useful solution would be to speed up the rate at which incoming messages are processed.

Handling of velocity and aftertouch data

It is common for the user to be able to program a device such that the velocity value affects certain parameters to a greater or lesser extent. For example, it might be decided that the ‘brightness’ of the sound should increase with greater key velocity, in which case it would be necessary to program the device so that the envelope generator that affected the brightness was subject to control by the velocity value. The exact law of this relationship is up to the manufacturer and may be used to simulate different types of ‘keyboard touch’. A device may offer a number of laws or curves relating changes in velocity to changes in the control value, or the received velocity value may be used to scale the preset parameter rather than replace it.

Another common application of velocity value is to control the amplitude envelope of a particular sound, such that the output volume depends on how hard the key is hit. In many synthesizer systems that use multiple interacting digital oscillators, these velocity-sensitive effects can all be achieved by applying velocity control to the envelope generator of one or more of the oscillators, as indicated earlier in this chapter.

Note off velocity is not implemented in many keyboards, and most musicians are not used to thinking about what they do as they release a key, but this parameter can be used to control such factors as the release time of the note or the duration of a reverberation effect. Aftertouch is often used in synthesizers to control the application of low-frequency modulation (tremolo or vibrato) to a note.

Handling of controller messages

The controller messages that begin with a status of &Bn turn up in various forms in sound generator implementations. It should be noted that although there are standard definitions for many of these controller numbers it is often possible to remap them either within sequencer software or within sound modules themselves.

Controllers &07 (Volume) and &0A (Pan) are particularly useful with sound modules as a means of controlling the internal mixing of voices. These controllers work on a per channel basis, and are independent of any velocity control that may be related to note volume. There are two universal real-time system exclusive messages that handle similar functions to these, but for the device as a whole rather than for individual voices or channels. The ‘master volume’ and ‘master balance’ controls are accessed using:

&F0 &7F <device ID> &04 <sub-ID #2> [data LSB] [data MSB] &F7

where the sub-ID #1 of &04 represents a ‘device control’ message and sub-ID #2s of &01 or &02 select volume or balance respectively. The [data] values allow 14 bit resolution for the parameters concerned, transmitted LSB first. Balance is different from pan because pan sets the stereo positioning (the split in level between left and right) of a mono source, whereas balance sets the relative levels of the left and right channels of a stereo source (see Figure 14.12). Since a pan or balance control is used to shift the stereo image either left or right from a center detent position, the MIDI data values representing the setting are ranged either side of a mid-range value that corresponds to the center detent. The channel pan controller is thus normally centered at a data value of 64 (or over a small range of values around this if the pan has only a limited number of steps), assuming that only a single 7 bit controller value is sent. There may be fewer steps in these controls than there are values of the MIDI controller, depending on the device in question, resulting in a range of controller values that will give rise to the same setting.
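As an illustration, a routine along the following lines could assemble the master volume and master balance messages just described. The function name is invented for the example; the byte layout follows the description above, with the 14 bit value split into two 7 bit bytes, LSB first, and &7F used as the conventional ‘all devices’ ID:

```python
# Sketch: building the universal real-time SysEx 'master volume' and
# 'master balance' messages. The 14 bit value is split into two
# 7 bit bytes and sent LSB first.

def device_control(value_14bit, control="volume", device_id=0x7F):
    """Return the message as a list of bytes. device_id 0x7F = 'all devices'."""
    if not 0 <= value_14bit <= 0x3FFF:
        raise ValueError("value must fit in 14 bits")
    sub_id2 = {"volume": 0x01, "balance": 0x02}[control]
    lsb = value_14bit & 0x7F
    msb = (value_14bit >> 7) & 0x7F
    return [0xF0, 0x7F, device_id, 0x04, sub_id2, lsb, msb, 0xF7]

# Full volume for the whole device:
# device_control(0x3FFF) -> [0xF0, 0x7F, 0x7F, 0x04, 0x01, 0x7F, 0x7F, 0xF7]
```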

image

FIGURE 14.12 (a) A pan control takes a mono input and splits it two ways (left and right), the stereo position depending on the level difference between the two channels. The attenuation law of pan controls is designed to result in a smooth movement of the source across the stereo ‘picture’ between left and right, with no apparent rise or fall in overall level when the control is altered. A typical pan control gain law is shown below. (b) A balance control simply adjusts the relative level between the two channels of a stereo signal so as to shift the entire stereo image either left or right.
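The kind of constant-power pan law illustrated in Figure 14.12(a) can be expressed compactly. The following sketch is illustrative only (real devices use a variety of laws); it uses a quarter-cycle sine/cosine pair, which keeps the sum of the squared gains, and hence the apparent overall level, constant as the source moves across the image:

```python
import math

# Sketch: a constant-power pan law. position runs from -1.0 (hard left)
# to +1.0 (hard right); the -3 dB centre point keeps the apparent level
# constant as the control is altered.

def pan_gains(position):
    angle = (position + 1.0) * math.pi / 4.0   # 0 .. pi/2
    return math.cos(angle), math.sin(angle)    # (left gain, right gain)
```

At the center detent both gains are 1/√2 (about 3 dB down), so the summed power equals that of the source panned hard to one side.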

Some manufacturers have developed alternative means of expressive control for synthesizers, such as the ‘breath controller’, a device that responds to the blowing effort applied by the mouth of the player. It was intended to allow wind players more control over expression in performance. Plugged into the synthesizer, it can be applied to various envelope generator or modulator parameters to affect the sound. The breath controller also has its own MIDI controller number (&02). There is also a portamento controller (&54) that defines a note number from which the next note should slide. It is normally transmitted between two note on messages to create an automatic legato portamento effect between the two notes.

The ‘effects’ and ‘sound’ controllers have been set aside as a form of general purpose control over aspects of the built-in effects and sound quality of a device. How they are applied will depend considerably on the architecture of the sound module and the method of synthesis used, but they give some means by which a manufacturer can provide a more abstracted form of control over the sound without the user needing to know precisely which voice parameters to alter. In this way, a user who is not prepared to get into the increasingly complicated world of voice programming can modify sounds to some extent.

The effects controllers occupy five controller numbers from &5B to &5F and are defined as Effects Depths 1–5. The default names for the effects to be controlled by these messages are respectively ‘External Effects Depth’, ‘Tremolo Depth’, ‘Chorus Depth’, ‘Celeste (Detune) Depth’ and ‘Phaser Depth’, although these definitions are open to interpretation and change by manufacturers. There are also ten sound controllers that occupy controller numbers from &46 to &4F. Again these are user or manufacturer definable, but five defaults were originally specified (listed in Table 14.4). They are principally intended as real-time controllers to be used during performance, rather than as a means of editing internal voice patches (the RPNs and NRPNs can be used for this as described in Fact File 14.4).

The sound variation controller is interesting because it is designed to allow the selection of one of a number of variants on a basic sound, depending on the data value that follows the controller number. For example, a piano sound might have variants of ‘honky tonk’, ‘soft pedal’, ‘lid open’ and ‘lid closed’. The data value in the message is not intended to act as a continuous controller for certain voice parameters; rather, the different data values possible in the message are intended to select certain pre-programmed variations on the voice patch. If there are fewer than the 128 possible variants on the voice then the variants should be spread evenly over the number range, so that each variant corresponds to an equal span of data values.
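Assuming a simple even spread, the mapping from a received data value to one of n variants might be computed as follows (the function name is invented for the example):

```python
# Sketch: mapping a sound variation controller data value (0-127) onto
# one of n pre-programmed variants, spread evenly over the number range.

def variant_index(data_value, n_variants):
    if not 0 <= data_value <= 127:
        raise ValueError("data value must be 0-127")
    return (data_value * n_variants) // 128
```

With four variants, for example, data values 0–31 select the first variant, 32–63 the second, and so on.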

Table 14.4 Sound Controller Functions (byte 2 of status &Bn)

MIDI Controller Number Function (default)
&46 Sound variation
&47 Timbre/harmonic content
&48 Release time
&49 Attack time
&4A Brightness
&4B–4F No default

 

Fact File 14.4 Registered and Non-Registered Parameter Numbers

The MIDI standard was extended to allow for the control of individual internal parameters of sound generators by using a specific control change message. This meant, for example, that any aspect of a voice, such as the velocity sensitivity of an envelope generator, could be assigned a parameter number that could then be accessed over MIDI and its setting changed, making external editing of voices much easier. Parameter controllers are a subset of the control change message group, and they are divided into the registered and non-registered numbers (RPNs and NRPNs). RPNs are intended to apply universally and should be registered with the MMA, whilst NRPNs may be manufacturer specific. Only five parameter numbers were originally registered as RPNs, as shown in the table, but more may be added at any time.

Some examples of RPN definitions
RPN number (hex) Parameter
00 00 Pitch bend sensitivity
00 01 Fine tuning
00 02 Coarse tuning
00 03 Tuning program select
00 04 Tuning bank select
7F 7F Cancels RPN or NRPN (usually follows Message 3)

Parameter controllers operate by specifying the address of the parameter to be modified, followed by a control change message to increment or decrement the setting concerned. It is also possible to use the data entry slider controller to alter the setting of the parameter. The address of the parameter is set in two stages, with an MSbyte and then an LSbyte message, so as to allow for 16 384 possible parameter addresses. The controller numbers &62 and &63 are used to set the LS- and MSbytes respectively of an NRPN, whilst &64 and &65 are used to address RPNs. The sequence of messages required to modify a parameter is as follows:

Message 1

&Bn &65 <parameter MSbyte> (RPN) or &Bn &63 <parameter MSbyte> (NRPN)

Message 2

&Bn &64 <parameter LSbyte> (RPN) or &Bn &62 <parameter LSbyte> (NRPN)

Message 3

&Bn &60 <data> (increment) or &Bn &61 <data> (decrement), or &Bn &06 <data MSbyte> [&26 <data LSbyte>] (data entry)

Message 3 represents either data increment (&60) or decrement (&61), or a 14 bit data entry slider control change with MSbyte (&06) and LSbyte (&26) parts (assuming running status). If the control has not moved very far, it is possible that only the MSbyte message need be sent.
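The message sequence in Fact File 14.4 can be illustrated with a short sketch that emits the byte lists needed to set an RPN, here using the data entry controller rather than increment/decrement. The function name is invented for the example, and running status is not used, so each message carries its own &Bn status byte; the sequence finishes with the conventional 7F 7F cancel:

```python
# Sketch: the three-message RPN sequence from Fact File 14.4, followed
# by the 7F 7F cancel. Channels are numbered 0-15 here.

def set_rpn(channel, param_msb, param_lsb, data_msb, data_lsb=None):
    status = 0xB0 | (channel & 0x0F)
    msgs = [
        [status, 0x65, param_msb],   # Message 1: RPN MSbyte
        [status, 0x64, param_lsb],   # Message 2: RPN LSbyte
        [status, 0x06, data_msb],    # Message 3: data entry MSbyte
    ]
    if data_lsb is not None:
        msgs.append([status, 0x26, data_lsb])   # data entry LSbyte
    msgs.append([status, 0x65, 0x7F])           # cancel RPN (7F 7F)
    msgs.append([status, 0x64, 0x7F])
    return msgs

# e.g. pitch bend sensitivity of +/- 2 semitones on the first channel:
# set_rpn(0, 0x00, 0x00, 0x02)
```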

The timbre and brightness controllers can be used to alter the spectral content of the sound. The timbre controller is intended specifically for altering the harmonic content of a sound, whilst the brightness controller is designed to control its high-frequency content. The envelope controllers can be used to modify the attack and release times of certain envelope generators within a synthesizer. Data values less than &40 attached to these messages should result in progressively shorter times, whilst values greater than &40 should result in progressively longer times.

Voice selection

The program change message was adequate for a number of years as a means of selecting one of a number of stored voice patches on a sound generator. Program change on its own allows for up to 128 different voices to be selected and a synthesizer or sound module may allow a program change map to be set up in order that the user may decide which voice is selected on receipt of a particular message. This can be particularly useful when the module has more than 128 voices available, but no other means of selecting voice banks. A number of different program change maps could be stored, perhaps to be selected under system exclusive control.

Modern sound modules tend to have very large patch memories — often too large to be adequately addressed by 128 program change messages. Although some older synthesizers used various odd ways of providing access to further banks of voices, most modern modules have implemented the standard ‘bank select’ approach. In basic terms, ‘bank select’ is a means of extending the number of voices that may be addressed, by preceding a standard program change message with a message to define the bank from which that program is to be recalled. It uses a pair of control change messages, controller numbers &00 (MSbyte) and &20 (LSbyte), to form a 14 bit bank address, allowing 16 384 banks to be addressed. The bank number is followed directly by a program change message, thus creating the following general message:

&Bn &00 <bank MSbyte> &Bn &20 <bank LSbyte> &Cn <program number>
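By way of illustration, the bank select sequence might be assembled as follows (the function name is hypothetical; the controller numbers and ordering are as described above):

```python
# Sketch: a 14 bit bank number on controllers &00 (MSbyte) and &20
# (LSbyte), followed directly by a program change.

def select_voice(channel, bank, program):
    if not (0 <= bank <= 0x3FFF and 0 <= program <= 127):
        raise ValueError("bank must fit in 14 bits, program in 7")
    cc = 0xB0 | (channel & 0x0F)
    pc = 0xC0 | (channel & 0x0F)
    return [
        [cc, 0x00, (bank >> 7) & 0x7F],  # bank select MSbyte
        [cc, 0x20, bank & 0x7F],         # bank select LSbyte
        [pc, program],                   # program change
    ]
```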

GENERAL MIDI

One of the problems with MIDI sound generators is that although voice patches can be selected using MIDI program change commands, there is no guarantee that a particular program change number will recall a particular voice on more than one instrument. General MIDI is an approach to the standardization of a sound generator’s behavior, so that MIDI files (see Fact File 14.5) can be exchanged more easily between systems and device behavior can be predicted by controllers. It comes in three flavors: GM 1, GM Lite and GM 2.

Fact File 14.5 Standard MIDI Files (SMF)

Sequencers and notation packages typically store data on disk in their own unique file formats. The standard MIDI file was developed in an attempt to make interchange of information between packages more straightforward. MIDI files are most useful for the interchange of performance and control information. They are not so useful for music notation where it is necessary to communicate greater detail about the way music appears on the stave and other notational concepts. For the latter purpose a number of different file formats have been developed, including MusicXML which is among the most widely used of the universal interchange formats today. Further information about MusicXML resources and other notation formats may be found in the ‘Recommended further reading’ at the end of this chapter.

Three types of standard MIDI file exist to encourage the interchange of sequencer data between software packages. The MIDI file contains data representing events on individual sequencer tracks, as well as labels such as track names, instrument names and time signatures. File type 0 is the simplest and is used for single-track data, whilst file type 1 supports multiple tracks which are ‘vertically’ synchronous with each other (such as the parts of a song).

File type 2 contains multiple tracks that have no direct timing relationship and may therefore be asynchronous. Type 2 could be used for transferring song files made up of a number of discrete sequences, each with a multiple track structure. The basic file format consists of a number of 8 bit words formed into chunk-like parts, very similar to the RIFF and AIFF audio file formats described in Chapter 9. SMFs are not exactly RIFF files though, because they do not contain the highest level FORM chunk. (To encapsulate SMFs in a RIFF structure, use the RMID format.)

The header chunk, which always heads a MIDI file, contains global information relating to the whole file, whilst subsequent track chunks contain event data and labels relating to individual sequencer tracks. Track data should be distinguished from MIDI channel data, since a sequencer track may address more than one MIDI channel. Each chunk is preceded by a preamble of its own, which specifies the type of chunk (header or track) and the length of the chunk in terms of the number of data bytes that are contained in the chunk. There then follow the designated number of data bytes (see the figure below). The chunk preamble contains 4 bytes to identify the chunk type using ASCII representation and 4 bytes to indicate the number of data bytes in the chunk (the length). The number of bytes indicated in the length does not include the preamble (which is always 8 bytes).

image
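The chunk structure just described is simple enough to parse with a few lines of code. The following sketch (illustrative only) walks the 8 byte preambles of a MIDI file, yielding each chunk's type and data, and builds a minimal Type 0 header chunk for demonstration:

```python
import struct

# Sketch: reading the 8 byte chunk preambles of a standard MIDI file -
# a 4 byte ASCII type ('MThd' or 'MTrk') and a 4 byte big-endian length
# that excludes the preamble itself.

def iter_chunks(data):
    pos = 0
    while pos + 8 <= len(data):
        ctype = data[pos:pos + 4].decode("ascii")
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 8 + length

# A minimal Type 0 file: header (format 0, one track, 480 ppqn)
# followed by an empty track chunk.
example = (b"MThd" + struct.pack(">IHHH", 6, 0, 1, 480)
           + b"MTrk" + struct.pack(">I", 0))
```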

General MIDI Level 1 specifies a standard voice map and a minimum degree of polyphony, requiring that a sound generator should be able to receive MIDI data on all 16 channels simultaneously and polyphonically, with a different voice on each channel. There is also a requirement that the sound generator should support percussion sounds in the form of drum kits, so that a General MIDI sound module is capable of acting as a complete ‘band in a box’.

Dynamic voice allocation is the norm in GM sound modules, with a requirement either for at least 24 dynamically allocated voices in total, or 16 for melody and eight for percussion. Voices should all be velocity sensitive and should respond at least to the controller messages 1, 7, 10, 11, 64, 121 and 123 (decimal), RPNs 0, 1 and 2 (see above), pitch bend and channel aftertouch. In order to ensure compatibility between sequences that are replayed on GM modules, percussion sounds are always allocated to MIDI channel 10. Program change numbers are mapped to specific voice names, with ranges of numbers allocated to certain types of sounds, as shown in Table 14.5. Precise voice names may be found in the GM documentation. Channel 10, the percussion channel, has a defined set of note numbers on which particular sounds are to occur, so that the composer may know, for example, that key 39 will always be a ‘hand clap’.

Table 14.5 General MIDI Program Number Ranges (Except Channel 10)

Program Change (Decimal) Sound Type
0–7 Piano
8–15 Chromatic percussion
16–23 Organ
24–31 Guitar
32–39 Bass
40–47 Strings
48–55 Ensemble
56–63 Brass
64–71 Reed
72–79 Pipe
80–87 Synth lead
88–95 Synth pad
96–103 Synth effects
104–111 Ethnic
112–119 Percussive
120–127 Sound effects

General MIDI sound modules may operate in modes other than GM, where voice allocations may be different, and there are two universal non-real-time SysEx messages used to turn GM on or off. These are:

&F0 &7E <device ID> &09 &01 &F7

to turn GM on, and:

&F0 &7E <device ID> &09 &02 &F7

to turn it off.

There is some disagreement over the definition of ‘voice’, as in ‘24 dynamically allocated voices’ — the requirement that dictates the degree of polyphony supplied by a GM module. The spirit of the GM specification suggests that 24 notes should be capable of sounding simultaneously, but some modules combine sound generators to create composite voices, thereby reducing the degree of note polyphony.

General MIDI Lite (GML) is a cut-down GM 1 specification designed mainly for use on mobile devices with limited processing power. It can be used for things like ring tones on mobile phones and for basic music replay from PDAs. It specifies a fixed polyphony of 16 simultaneous notes, with 15 melodic instruments and one percussion kit on channel 10. The voice map is the same as GM Level 1. It also supports basic control change messages and the pitch-bend sensitivity RPN. As a rule, GM Level 1 songs will usually replay on GM Lite devices with acceptable quality, although some information may not be reproduced. An alternative to GM Lite is SPMIDI (see next section) which allows greater flexibility.

GM Level 2 is backwards compatible with Level 1 (GM 1 songs will replay correctly on GM 2 devices) but allows the selection of voice banks and extends polyphony to 32 voices. Percussion kits can run on channel 11 as well as the original channel 10. It adds MIDI tuning, RPN controllers and a range of universal system exclusive messages to the MIDI specification, enabling a wider range of control and greater versatility.

SCALABLE POLYPHONIC MIDI (SPMIDI)

SPMIDI, rather like GM Lite, is designed principally for mobile devices that have issues with battery life and processing power. It has been adopted by the 3GPP wireless standards body for structured audio control of synthetic sounds in ring tones and multimedia messaging. It was developed primarily by Nokia and Beatnik. The SPMIDI basic specification for a device is based on GM Level 2, but a number of selectable profiles are possible, with different levels of sophistication.

The idea is that rather than fixing the polyphony at 16 voices the polyphony should be scalable according to the device profile (a description of the current capabilities of the device). SPMIDI also allows the content creator to decide what should happen when polyphony is limited — for example, what should happen when only four voices are available instead of 16. Conventional ‘note stealing’ approaches work by stealing notes from sounding voices to supply newly arrived notes, and the outcome of this can be somewhat arbitrary. In SPMIDI this is made more controllable. A process known as channel masking is used, whereby certain channels have a higher priority than others, enabling the content creator to put high priority material on particular channels. The channel priority order and maximum instantaneous polyphony are signaled to the device in a setup message at the initialization stage.
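The channel masking idea can be illustrated with a simple sketch. This is not the algorithm as defined in the SP-MIDI specification, merely an indication of the principle: channels are considered in priority order, and once the cumulative note requirement exceeds the device's maximum instantaneous polyphony, that channel and all lower-priority channels are masked:

```python
# Illustrative sketch of channel masking (not the SP-MIDI specification
# text): channels are listed in priority order; the first channel whose
# cumulative note requirement exceeds the polyphony limit is masked,
# along with all lower-priority channels.

def masked_channels(priority_order, notes_per_channel, max_polyphony):
    used = 0
    for i, ch in enumerate(priority_order):
        used += notes_per_channel.get(ch, 0)
        if used > max_polyphony:
            return priority_order[i:]
    return []
```

A content creator would therefore place essential material (the melody, say) on the highest-priority channels, knowing it will survive on a four-voice device where accompaniment channels are masked.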

RMID AND XMF FILES

RMID is a version of the RIFF file structure that can be used to combine a standard MIDI file and a downloadable sound file (see Fact File 14.6) within a single structure. In this way all of the data required to replay a song using synthetic sounds can be contained within one file. RMID seems to have been superseded by another file format known as XMF (eXtensible Music Format) that is designed to contain all of the assets required to replay a music file. It is based on Beatnik’s RMF (Rich Music Format) which was designed to incorporate standard MIDI files and audio files such as MP3 and WAVE so that a degree of interactivity could be added to audio replay. RMF can also address a Special Bank of MIDI sounds (an extension of GM) in the Beatnik Audio Engine. XMF is now the MMA’s recommended way of combining such elements. It is more extensible than RMID and can contain WAVE files and other media elements for streamed or interactive presentations. XMF introduces concepts such as looping and branching into standard MIDI files. RMF included looping but did not incorporate DLS into the file format. In addition to the features just described, XMF can incorporate 40 bit encryption for advanced data security as well as being able to compress standard MIDI files by up to 5:1 and incorporate metadata such as rights information. So far, XMF Type 0 and Type 1 have been defined, both of which contain SMF and DLS data, and which are identical except that Type 0 MIDI data may be streamed.

Fact File 14.6 Downloadable Sounds and Soundfonts

Downloadable Sounds (DLS) is an MMA specification for synthetic voice description that enables synthesizers to be programmed using voice data downloaded from a variety of sources. In this way a content creator could not only define the musical structure of his or her content in a universally usable way, using standard MIDI files, but could also define the nature of the sounds to be used with downloadable sounds. In these ways content creators can specify more precisely how synthetic audio should be replayed, so that the end result can be more easily predicted across multiple rendering platforms.

The success of these approaches depends on ‘wavetable synthesis’. Here basic sound waveforms are stored in wavetables (simply tables of sample values) in RAM, to be read out at different rates and with different sample skip values, for replay at different pitches. Subsequent signal processing and envelope shaping can be used to alter the timbre and temporal characteristics. Such synthesis capabilities exist on the majority of computer sound cards, making it a realistic possibility to implement the standard widely.
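The principle of wavetable replay described above can be demonstrated in a few lines. In this sketch a single-cycle sine wave is stored in a 1024-entry table and read out with a phase increment proportional to the desired frequency; real engines would interpolate between table entries rather than truncating, and would apply envelopes and filtering downstream:

```python
import math

# Sketch of wavetable replay: a single-cycle waveform is read out with a
# phase increment (the 'sample skip' value) proportional to the pitch.

TABLE_SIZE = 1024
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def render(freq_hz, sample_rate, n_samples, table=SINE_TABLE):
    out, phase = [], 0.0
    inc = freq_hz * len(table) / sample_rate   # table steps per output sample
    for _ in range(n_samples):
        out.append(table[int(phase)])          # truncation; real engines interpolate
        phase = (phase + inc) % len(table)
    return out
```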

DLS Level 1, version 1.1a, was published in 1999 and contains a specification for devices that can deal with DLS as well as a file format for containing the sound descriptions. The basic idea is that a minimal synthesis engine should be able to replay a looped sample from a wavetable, apply two basic envelopes for pitch and volume, use low-frequency oscillator control for tremolo and vibrato, and respond to basic MIDI controls such as pitch bend and modulation wheel. There is no option to implement velocity crossfading or layering of sounds in DLS Level 1, but keyboard splitting into 16 ranges is possible.

DLS Level 2 is somewhat more advanced, requiring two six-segment envelope generators, two LFOs, a low-pass filter with resonance and dynamic cut-off frequency controls. It requires more memory for wavetable storage (2MB), 256 instruments and 1024 regions, amongst other things. DLS Level 2 has been adopted as the MPEG-4 Structured Audio Sample Bank format.

Emu developed so-called SoundFonts for Creative Labs and these have many similar characteristics to downloadable sounds. They have been used widely to define synthetic voices for Sound Blaster and other computer sound cards. In fact the formats have just about been harmonized with the issue of DLS Level 2 which apparently contains many of the advanced features of SoundFonts. SoundFont 2 descriptions are normally stored in RIFF files with the extension ‘.sf2’.

MIDI OVER USB

The USB Implementers Forum has published a ‘USB Device Class Definition for MIDI Devices’, version 1.0, which describes how MIDI data may be transported over USB connections. It preserves the protocol of MIDI messages but packages them in such a way as to enable them to be transferred over USB. It also ‘virtualizes’ the concept of MIDI IN and OUT jacks, enabling USB to MIDI conversion, and vice versa, to take place in software within a synthesizer or other device. Physical MIDI ports can also be created for external connections to conventional MIDI equipment (see Figure 14.13). A so-called ‘USB MIDI function’ (a device that receives USB MIDI events and data transfers) may contain one or more ‘elements’. These elements can be synthesizers, synchronizers, effects processors or other MIDI-controlled objects.

image

FIGURE 14.13 A USB MIDI function contains a USB-to-MIDI convertor that can communicate with both embedded (internal) and external MIDI jacks via MIDI IN and OUT endpoints. Embedded jacks connect to internal elements that may be synthesizers or other MIDI data processors. XFER in and out endpoints are used for bulk dumps such as DLS and can be dynamically connected with elements as required for transfers.

A USB to MIDI convertor within a device will typically have MIDI in and out endpoints as well as what are called ‘transfer’ (XFER) endpoints. The former are used for streaming MIDI events whereas the latter are used for bulk dumps of data such as those needed for downloadable sounds (DLS). MIDI messages are packaged into 32 bit USB MIDI events, which involve an additional byte at the head of a typical MIDI message. This additional byte contains a cable number address and a code index number (CIN), as shown in Figure 14.14. The cable number enables the MIDI message to be targeted at one of 16 possible ‘cables’, thereby overcoming the 16 channel limit of conventional MIDI messages, in a similar way to that used in the addressing of multiport MIDI interfaces. The CIN allows the type of MIDI message to be identified (e.g. System Exclusive; Note On), which to some extent duplicates the MIDI status byte. MIDI messages with fewer than 3 bytes should be padded with zeros.

image

FIGURE 14.14 USB MIDI packets have a 1 byte header that contains a cable number to identify the MIDI jack destination and a code index number to identify the contents of the packet and the number of active bytes.
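Packing a MIDI message into a 32 bit USB MIDI event of the kind shown in Figure 14.14 can be sketched as follows. For ordinary channel messages the code index number happens to equal the high nibble of the status byte; system messages need special-casing that is omitted here for brevity, and the function name is invented for the example:

```python
# Sketch: packing a channel message into a 32 bit USB MIDI event - a
# header byte carrying the cable number (high nibble) and code index
# number (low nibble), then the MIDI bytes zero-padded to three.

def usb_midi_event(cable, midi_bytes):
    status = midi_bytes[0]
    cin = status >> 4                  # CIN mirrors the status high nibble
    header = ((cable & 0x0F) << 4) | cin
    padded = list(midi_bytes) + [0] * (3 - len(midi_bytes))
    return bytes([header] + padded)

# Note on, first channel, middle C, velocity 100, cable 0:
# usb_midi_event(0, [0x90, 60, 100])
```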

MIDI OVER IEEE 1394

The MMA and AMEI have published a recommended practice, RP-27, ‘MIDI Media Adaptation Layer for IEEE 1394’, which describes how MIDI data is to be transferred over 1394 (Firewire). This is also referred to in 1394 TA (Trade Association) documents describing the ‘Audio and Music Data Transmission Protocol’ and IEC standard 61883–6 which deals with the audio part of 1394 interfaces.

The approach is similar to that used with USB, described in the previous section, but has somewhat greater complexity. MIDI 1.0 data streams can be multiplexed into a 1394 ‘MIDI conformant data channel’ which contains eight independent MIDI streams called ‘MPX-MIDI data channels’. This way each MIDI conformant data channel can handle 8 × 16 = 128 MIDI channels (in the original sense of MIDI channels). The first version of the standard limits the transmission of packets to the MIDI 1.0 data rate of 31.25kbit/s for compatibility with other MIDI devices; however, provision is made for transmission at substantially faster rates for use in equipment that is capable of it. This includes options for 2X and 3X MIDI 1.0 speed. 1394 cluster events can be defined that contain both audio and MIDI data. This enables the two types of information to be kept together and synchronized.

MIDI OVER ETHERNET

Although not yet standardized by the MMA, the Internet Engineering Task Force (IETF) came up with a method of transporting MIDI over Ethernet networks using the Real Time Protocol (RTP). This was also the basis for Apple’s MIDI over Ethernet. The same format allows for MPEG-4-generic audio object types including General MIDI, Downloadable Sounds and Structured Audio. Each packet of data transmitted over the network includes an RTP header and a MIDI ‘payload’ that encodes the standard MIDI commands in a suitable form. Each packet header has a baseline timestamp to assist in synchronizing the event to a clock that is defined in the RTP session setup at a particular sampling rate. Each MIDI payload defines the timing of the MIDI event relative to the baseline timestamp.

Because this format uses standard Internet protocols for transporting the RTP packets, such as TCP and UDP, it offers a relatively straightforward way of streaming sound information at very low bit rates over the Internet. Naturally this is limited to the basic music and sound control information provided by MIDI, and the sounds produced by any rendering device will depend on the synthesis engine available.

It is likely that the upcoming HD MIDI standard will also include a method for transporting MIDI over Ethernet.

OPEN SOUND CONTROL

Open Sound Control is an alternative to MIDI that is gradually seeing greater adoption in the computer music and musical instrument control world. Developed by Matt Wright at CNMAT (Center for New Music and Audio Technologies) in Berkeley, California, it aims to offer a transport-independent, message-based protocol for communication between computers, musical instruments and multimedia devices. It does not specify a particular hardware interface or network for the transport layer, but initial implementations have tended to use UDP (user datagram protocol) over Ethernet or other fast networks as a transport means. It is not proposed to describe the protocol in depth here; further details can be found at the website indicated at the end of this chapter, but a short summary follows.

OSC uses a form of device addressing that is very similar to an Internet URL (uniform resource locator). In other words a text address with sub-addresses that relate to lower levels in the device hierarchy. For example, ‘/synthesizer2/voice1/oscillator3/frequency’ (not a real address) might refer to a particular device called ‘synthesizer2’, within which is contained voice 1, within which is oscillator 3, whose frequency value is being addressed. The minimum ‘atomic unit’ of OSC data is 4 bytes (32 bits) long, so all values are 32 bit aligned, and transmitted packets are made up of multiples of 32 bit information. Packets of OSC data contain either individual messages or so-called ‘bundles’. Bundles contain elements that are either messages or further bundles, each having a size designation that precedes it, indicating the length of the element. Bundles have time tags associated with them, indicating that the actions described in the bundle are to take place at a specified time. Individual messages are supposed to be executed immediately. Devices are expected to have access to a representation of the correct current time so that bundle timing can be related to a clock.
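A minimal OSC message can be constructed as in the following sketch, which shows the 4 byte alignment in action. The address in the usage comment is the hypothetical one from the text; only 32 bit float arguments are handled, and bundles and time tags are omitted:

```python
import struct

# Sketch: building a minimal OSC message - an address pattern, a type
# tag string and 32 bit big-endian arguments, each string padded with
# null bytes to a 4 byte boundary.

def osc_pad(b):
    return b + b"\x00" * (4 - len(b) % 4)   # at least one null, total a multiple of 4

def osc_message(address, *floats):
    tags = "," + "f" * len(floats)
    msg = osc_pad(address.encode("ascii")) + osc_pad(tags.encode("ascii"))
    for f in floats:
        msg += struct.pack(">f", f)          # big-endian 32 bit float
    return msg

# e.g. osc_message("/synthesizer2/voice1/oscillator3/frequency", 440.0)
```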

SEQUENCING SOFTWARE

Introduction

Sequencers are probably the most ubiquitous of audio and MIDI software applications. Although they used to be available as dedicated devices they are now widely available as sophisticated software packages to run on a desktop computer. A sequencer is capable of storing a number of ‘tracks’ of MIDI and audio information, editing it and otherwise manipulating it for musical composition purposes. It is also capable of storing MIDI events for non-musical purposes such as studio automation. Some of the more advanced packages are available in cut-down or ‘Lite’ versions for the new user. Popular packages such as Pro Tools and Logic now combine audio and MIDI manipulation in an almost seamless fashion, and have been developed to the point where they can no longer really be considered as simply sequencers. In fact they are full-blown audio production systems with digital mixers, synchronization, automation, effects and optional video.

The dividing line between sequencer and music notation software is blurred, since there are features common to both. Music notation software such as Sibelius is designed to allow the user control over the detailed appearance of the printed musical page, rather as page layout packages work for typesetters, and such software often provides facilities for MIDI input and output. MIDI input is used for entering note pitches during setting, whilst output is used for playing the finished score in an audible form. Most major packages will read and write standard MIDI files, and can therefore exchange data with sequencers, allowing sequenced music to be exported to a notation package for fine tuning of its printed appearance. It is also common for sequencer packages to offer varying degrees of music notation capability, although the scores that result may not be as professional in appearance as those produced by dedicated notation software.

image

FIGURE 14.15 Example of a sequencer’s primary display, showing tracks and transport controls. (Logic Platinum 5 ‘arrange’ window.).

Tracks, channels, instruments and environments

A sequencer can be presented to the user so that it emulates a multitrack tape recorder to some extent. The example shown in Figure 14.15 illustrates this point, showing the familiar transport controls as well as a multi-track ‘tape-like’ display.

A track can be either a MIDI track or an audio track, or it may be a virtual instrument of some sort, perhaps running on the same computer. A project is built up by successively overlaying more and more tracks, all of which may be replayed together. Tracks are not fixed in their time relationship and can be slipped against each other, as they simply consist of data stored in the memory. On older or less advanced sequencers, the replay of each MIDI track was assigned to a particular MIDI channel, but more recent packages offer an almost unlimited number of virtual tracks that can contain data for more than one channel (in order to drive a multi-timbral instrument, for example). Using a multiport MIDI interface it is possible to address a much larger number of instruments than the basic 16 MIDI channels allowed in the past.

In a typical sequencer, instruments are often defined in a separate ‘environment’ that defines the instruments, the ports to which they are connected, any additional MIDI processing to be applied, and so forth. An example is shown in Figure 14.16. When a track is recorded, therefore, the user simply selects the instrument to be used and the environment takes care of managing what that instrument actually means in terms of processing and routing. Now that soft synthesizers are used increasingly, sequencers can often address those directly via plug-in architectures such as DirectX or VST (see Chapter 13), without recourse to MIDI. These are often selected on pull-down menus for individual tracks, with voices selected in a similar way, often using named voice tables.

image

FIGURE 14.16 Example of environment window from Logic, showing ways in which various MIDI processes can be inserted between physical input and recording operation.

Input and output filters

After MIDI information is received from the hardware interface it is stored in memory, but it may sometimes be helpful to filter out some information before it can be stored, using an input filter. This will be a subsection of the program that watches out for the presence of certain MIDI status bytes and their associated data as they arrive, so that they can be discarded before storage. The user may be able to select input filters for such data as after-touch, pitch bend, control changes and velocity information, amongst others. Clearly it is only advisable to use input filters if it is envisaged that this data will never be needed, since although filtering saves memory space the information is lost forever. Output filters are often implemented for similar groups of MIDI messages as for the input filters, acting on the replayed rather than recorded information. Filtering may help to reduce MIDI delays, owing to the reduced data flow.
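The input-filtering idea can be sketched in a few lines of Python. The function name and message representation here are illustrative, not part of any particular sequencer’s API; channel messages are assumed to be tuples of raw bytes, with polyphonic aftertouch (0xA0), channel aftertouch (0xD0) and pitch bend (0xE0) chosen as the types to discard:

```python
# Status-byte upper nibbles of the channel messages to be filtered out:
# 0xA0 polyphonic aftertouch, 0xD0 channel aftertouch, 0xE0 pitch bend.
FILTERED = {0xA0, 0xD0, 0xE0}

def input_filter(messages, filtered=FILTERED):
    """Return only the messages whose status type is not filtered."""
    kept = []
    for msg in messages:                # each msg is a tuple of raw MIDI bytes
        status_type = msg[0] & 0xF0     # mask off the channel nibble
        if status_type not in filtered:
            kept.append(msg)
    return kept
```

An output filter would apply the same test to replayed rather than recorded messages.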

Timing resolution

The timing resolution to which a sequencer can store MIDI events varies between systems. This ‘record resolution’ differs between packages, with recent systems offering resolution to many thousandths of a note. Audio events are normally stored to sample accuracy. A sequencer with a MIDI resolution of 480 ppqn (pulses per quarter note) can resolve events to steps of roughly 1 millisecond at a tempo of 120 bpm, for example (the absolute step size depends on the tempo). The quoted resolution of sequencers, though, tends to be somewhat academic, since there are many other factors influencing the time at which MIDI messages arrive and are stored, including buffer delays and traffic jams. Modern sequencers have sophisticated routines to minimize the latency with which events are routed to MIDI outputs.

The record resolution of a sequencer really has nothing to do with the timing resolution available from MIDI clocks or timecode (see Chapter 15). The sequencer’s timing resolution refers to the accuracy with which it time-stamps events and to which it can resolve events internally. Most sequencers attempt to interpolate or ‘flywheel’ between external timing bytes during replay, in an attempt to maintain a resolution in excess of the 24 ppqn implied by MIDI clocks.
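The relationship between ppqn resolution and absolute time is a small calculation (the function name here is ours): one quarter note lasts 60000/bpm milliseconds, and each tick is that duration divided by the ppqn value.

```python
def tick_duration_ms(bpm, ppqn):
    """Duration of one sequencer tick in milliseconds.

    A quarter note lasts 60000/bpm ms and is divided into ppqn ticks.
    """
    return 60000.0 / (bpm * ppqn)

# 480 ppqn at 120 bpm gives ticks of just over a millisecond;
# halving the tempo doubles the tick duration.
```

So a quoted resolution in ppqn only translates into absolute timing accuracy once a tempo is assumed.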

Displaying, manipulating and editing information

A sequencer is the ideal tool for manipulating MIDI and audio information and this may be performed in a number of ways depending on the type of interface provided to the user. The most flexible is the graphical interface employed on many desktop computers which may provide for visual editing of the stored MIDI information either as a musical score, a table or event list of MIDI data, or in the form of a grid of some kind. Figure 14.17 shows a number of examples of different approaches to the display of stored MIDI information. Audio information is manipulated using an audio sample editor display that shows the waveform and allows various changes to be made to the signal, often including sophisticated signal processing, as discussed further below.

Although it might be imagined that a musical score would be the best way of visualizing MIDI data, it is often not the most appropriate. This is partly because unless the input is successfully quantized (see below) the score will represent precisely what was played when the music was recorded and this is rarely good-looking on a score! The appearance is often messy because some notes were just slightly out of time. Score representation is useful after careful editing and quantization, and can be used to produce a visually satisfactory printed output. Alternatively, the score can be saved as a MIDI file and exported to a music notation package for layout purposes.

image

FIGURE 14.17 Examples of a selection of different editor displays from Logic, showing display of MIDI data as a score, a graphical matrix of events and a list of events. Audio can be shown as an audio waveform display.

In the grid editing (called ‘Matrix’ in the example shown) display, MIDI notes may be dragged around using a mouse or trackball, and audible feedback is often available as the note is dragged up and down, allowing the user to hear the pitch or sound as the position changes. Note lengths can be changed and the timing position may be altered by dragging the note left or right. In the event list form, each MIDI event is listed next to a time value. The information in the list may then be changed by typing in new times or new data values, and events may be inserted and deleted. In all of these modes the familiar cut and paste techniques used in word processors and other software can be applied, allowing events to be used more than once in different places, repeated a given number of times, and so on.

A whole range of semi-automatic editing functions are also possible, such as transposition of music, using the computer to operate on the data so as to modify it in a predetermined fashion before sending it out again. Echo effects can be created by duplicating a track and offsetting it by a certain amount, for example. Transposition of MIDI performances is simply a matter of raising or lowering the MIDI note numbers of every stored note by the relevant degree. A number of algorithms have also been developed for converting audio melody lines to MIDI data, or using MIDI data to control the pitch of audio, further blurring the boundary between the two types of information. Silence can also be stripped from audio files, so that individual drum notes or vocal phrases can be turned into events in their own right, allowing them to be manipulated, transposed or time-quantized independently.
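Transposition really is as simple as the text suggests. A minimal sketch in Python (function name ours), clamping each result to the legal note number range:

```python
def transpose(notes, semitones):
    """Shift a list of MIDI note numbers by a number of semitones,
    clamping each result to the valid range 0-127."""
    return [min(127, max(0, n + semitones)) for n in notes]
```

For example, transpose([60, 64, 67], 12) raises a C major triad by an octave.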

A sequencer’s ability to search the stored data (both music and control) based on specific criteria, and to perform modifications or transformations to just the data which matches the search criteria, is one of the most powerful features of a modern system. For example, it may be possible to search for the highest-pitched notes of a polyphonic track so that they can be separated off to another track as a melody line. Alternatively it may be possible to apply the rhythm values of one track to the pitch values of another so as to create a new track, or to apply certain algorithmic manipulations to stored durations or pitches for compositional experimentation. The possibilities for searching, altering and transforming stored data are almost endless once musical and control events are stored in the form of unique values, and for those who specialize in advanced composing or experimental music these features will be of particular importance. It is in this field that many of the high-end sequencer packages will continue to develop.

Quantization of rhythm

Rhythmic quantization is a feature of almost all sequencers. In its simplest form it involves the ‘pulling-in’ of events to the nearest musical time interval at the resolution specified by the user, so that events that were ‘out of time’ can be played back ‘in time’. It is normal to be able to program the quantizing resolution to at least as fine as a 32nd note, and the choice depends on the audible effect desired. Events can be quantized either permanently or just for replay. Some systems allow ‘record quantization’, which alters the timing of events as they arrive at the input to the sequencer; this is a form of permanent quantization. It may also be possible to ‘quantize’ cursor movement so that it can only drag events to predefined rhythmic divisions.

More complex rhythmic quantization is also possible, in order to maintain the ‘natural’ feel of rhythm, for example. Simple quantization can result in music that sounds ‘mechanical’ and electronically produced, whereas the ‘human feel’ algorithms available in many packages attempt to quantize the rhythm strictly and then reapply some controlled randomness. The parameters of this process may be open to adjustment until the desired effect is achieved.
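Both the strict and the ‘human feel’ forms of quantization can be sketched as follows (the parameter names are assumptions, not any package’s actual controls). Event times are in ticks; the grid spacing would be 120 ticks for a 16th note at 480 ppqn:

```python
import random

def quantize(ticks, grid, strength=1.0, humanize=0):
    """Pull an event time toward the nearest grid line.

    strength: 1.0 snaps fully to the grid, lower values move only part way.
    humanize: maximum random offset (in ticks) reapplied after quantizing.
    """
    nearest = round(ticks / grid) * grid         # nearest grid line
    moved = ticks + (nearest - ticks) * strength
    if humanize:
        moved += random.randint(-humanize, humanize)
    return int(round(moved))
```

A ‘human feel’ algorithm amounts to quantizing strictly and then reapplying controlled randomness via the humanize term.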

Automation and non-note MIDI events

In addition to note and audio events, one may either have recorded or may wish to add events for other MIDI control purposes such as program change messages, controller messages or system exclusive messages. Audio automation can also be added to control fades, panning, effects and other mixing features. Such data may be displayed in a number of ways, but again the graphical plot is arguably the most useful. It is common to allow automation data to be plotted as an overlay, such as shown in Figure 14.18.

Some automation data is often stored in a so-called ‘segment-based’ form. Because automation usually relates to some form of audio processing or control, it usually applies to particular segments on the timeline of the current project. If the segment is moved, often one needs to carry the relevant automated audio processing with it. Segment-based processing or automation allows the changes in parameters that take place during a segment to be ‘anchored’ to that segment so that they can be made to move around with it if required.

image

FIGURE 14.18 Example from Logic of automation data graphically overlaid on sequencer tracks.

It is possible to edit automation or control events in a similar way to note events, by dragging, drawing, adding and deleting points, but there are a number of other possibilities here. For example, a scaling factor may be applied to controller data in order to change the overall effect by a given percentage, or a graphical contour may be drawn over the controller information to scale it according to the magnitude of the contour at any point. Such a contour could be used to introduce a gradual increase in MIDI note velocities over a section, or to introduce any other time-varying effect. Program changes can be inserted at any point in a sequence, usually either by inserting the message in the event list or by drawing it at the appropriate point in the controller chart. This has the effect of switching the receiving device to a new voice or stored program at the point where the message is inserted. It can be used to ensure that all tracks in a sequence use the desired voices from the outset, without having to set them up manually each time. Either the name of the program to be selected at that point or its number can be displayed, depending on whether the sequencer subscribes to a known set of voice names such as General MIDI.
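Scaling controller data is simple arithmetic on the stored values. A sketch (names ours), clamping results to the 7-bit controller range:

```python
def scale_controllers(values, percent):
    """Scale a list of 7-bit controller values by a percentage,
    clamping each result to the legal MIDI range 0-127."""
    factor = percent / 100.0
    return [min(127, max(0, int(round(v * factor)))) for v in values]
```

A graphical contour amounts to the same operation with a different factor applied at each point in time.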

System exclusive data may also be recorded or inserted into sequences in a similar way to the message types described above. Any such data received during recording will normally be stored and may be displayed in a list form. It is also possible to insert SysEx voice dumps into sequences in order that a device may be loaded with new parameters whilst a song is executing if required.

MIDI mixing and external control

Sequencers often combine a facility for mixing audio with one for controlling the volume and panning of MIDI sound generators. Using MIDI volume and pan controller numbers (decimal 7 and 10), a series of graphical faders can be used to control the audio output level of voices on each MIDI channel, and may be able to control the pan position of the source between the left and right outputs of the sound generator if it is a stereo source. On-screen faders may also be available to be assigned to other functions of the software, as a means of continuous graphical control over parameters such as tempo, or to vary certain MIDI continuous controllers in real time.
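The volume and pan messages behind such graphical faders are ordinary control change messages. A sketch of how the three raw bytes are formed (the helper name is ours; the controller numbers and status byte layout follow the MIDI specification):

```python
VOLUME_CC = 7   # MIDI channel volume controller number
PAN_CC = 10     # MIDI pan controller number

def control_change(channel, controller, value):
    """Build the three bytes of a MIDI control change message.

    The status byte is 0xB0 with the channel (0-15) in the low nibble;
    the controller number and value are both 7-bit quantities.
    """
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F])
```

Moving a channel fader would then emit, for example, control_change(0, VOLUME_CC, 100).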

It is also possible with some packages to control many of the functions of the sequencer using external MIDI controllers. An external MIDI controller with a number of physical faders and buttons could be used as a basic means of mixing, for example, with each fader assigned to a different channel on the sequencer’s mixer.

Synchronization

A sequencer’s synchronization features are important when locking replay to external timing information such as MIDI clock or timecode. Most sequencers are able to operate in either beat clock or timecode sync modes, and some can detect which type of clock data is being received and switch over automatically. To lock the sequencer to another sequencer or to a drum machine, beat clock synchronization may be adequate. If you will be using the sequencer for applications involving the timing of events in real rather than musical time, such as the dubbing of sounds to a film, then it is important that the sequencer allows events to be tied to timecode locations, as timecode locations will remain in the same place even if the musical tempo is changed.

Sequencers incorporating audio tracks also need to be able to lock to sources of external audio or video sync information (e.g. word clock or composite video sync), in order that the sampling frequency of the system can be synchronized to that of other equipment in the studio.

Synchronized digital video

Digital video capability is now commonplace in desktop workstations. It is possible to store and replay full motion video on a desktop computer, either using a separate monitor or within a window on an existing monitor, using widely available technology such as QuickTime or Windows Multimedia Extensions. The replay of video from disk can be synchronized to the replay of audio and MIDI, using timecode, and this is particularly useful as an alternative to using video on a separate video tape recorder (which is mechanically much slower, especially in locating distant cues). In some sequencing or editing packages the video can simply be presented as another ‘track’ alongside audio and MIDI information.

In the applications considered here, compressed digital video is intended principally as a cue picture that can be used for writing music or dubbing sound to picture in post-production environments. In such cases the picture quality must be adequate to be able to see cue points and lip sync, but it does not need to be of professional broadcast quality. What is important is reasonably good slow motion and freeze-frame quality. Good quality digital video (DV), though, can now be transferred to and from workstations using a Firewire interface enabling video editing and audio post-production to be carried out in an integrated fashion, all on the one platform.

AUDIO REMOTE CONTROL USING COMPUTER NETWORKS

MIDI has been around for some decades, and it has already been mentioned how remarkably well it has continued to serve in a variety of applications and environments during that time despite the huge developments and changes the industry has seen. It will continue to perform a useful function, particularly in its original home where electronic musical instruments are required to trigger and control each other, but there has been a growing need for system control standards for computer networks. Options exist for carrying MIDI data over Ethernet, as mentioned above, and the forthcoming HD MIDI standard will most likely be network-oriented. The situation with audio system remote control today is similar to that which existed in about 1980, when several manufacturers were using proprietary signaling systems for the control of musical instruments before MIDI became the industry standard. Several bodies are now working to achieve common technical standards both for the implementation of proprietary formats and with a view to achieving industry consensus, and some formats are in use in the field.

Open control architecture

A number of professional audio manufacturers have formed a group, the OCA Alliance, with the aim of securing standardization of the Open Control Architecture (OCA), a media networking system control standard for professional applications. The Alliance was formed to complete the technical definition of OCA and then to transfer its development to an accredited public standards organization, which will render the OCA specification into an open public standard for the control of professional media network systems. Developed by Bosch Communications Systems, OCA is descended from AES-24, a system control protocol developed by the Audio Engineering Society in the 1990s. OCA defines a flexible and robust control standard that covers the entire range of pro media networking applications, from the smallest to the largest. OCA Release 1.1 is available on the OCA Alliance website, and is only one of a number of control systems proposed or already in the field. Others will be covered below, but a closer look at the OCA proposals gives an insight into the level of control offered generally by these systems as the industry moves from control via DIN-to-DIN leads towards full computer integration.

The OCA Framework, sometimes abbreviated as OCF, includes the following:

    1. Discover devices. This recognizes OCA-compliant devices that are connected to the network.

    2. Manage streaming connections. Define and undefine media stream paths between and among devices, interfacing with features of the media and transport system in order to set up and take down media connections.

    3. Control and monitor operating and configuration parameters of OCA-compliant devices.

    4. Define and manage devices that have reconfigurable signal processing and/or control capabilities.

    5. Upgrade software/firmware of controlled devices. This includes features for fail-safe upgrades.

OCA operates over industry-standard data network equipment in both secure and non-secure modes, coexisting with non-OCA devices. Incorporation of new device types and device upgrades, as well as of non-standard devices, is allowed, and multiple protocol versions are offered for different kinds of interconnection. The current protocol is OCP.1 for TCP/IP Ethernet networks. Future specifications, labelled OCP.2, OCP.3 and so on, will be for other kinds of connection such as USB.

The actions of OCA are contained within separate ‘classes’, each class defining the type of action allowable between devices. This is known as ‘object-oriented protocol’ and is defined by the combination of four sets:

    1. Class definitions: the types of objects existing within devices.

    2. Naming and addressing rules: how the objects and their attributes are identified.

    3. Protocol Data Unit formats (PDU): specify formats of transmitted and received data.

    4. PDU Exchange Rules: defining communication sequences used to effect information exchange.

‘Objects’ in this context are what actually run in the computer, and are units of code belonging to generic ‘classes’. Each OCA device is given a unique object number, ONo, which can be fixed at the time of manufacture or chosen subsequently for configurable devices. The ONos are 32 bits long, so duplication or later re-use of an ONo would be rare, 2³² being a huge number. Allocating blocks of numbers to the various manufacturers, however, would remove the possibility of duplications entirely. (AES64, described below, addresses this, and EuCon uses a system of ‘zones’.) The ONo is sent and received with every command and response, and controllers can allow users to identify devices hierarchically, for instance by using channel numbers.

Actuators, sensors and blocks

Actuators control a device’s signal processing and housekeeping. In any device, any actuator class may be instantiated, that is, a specific command pertaining to that class is sent, as many times as required to control that function of a processor. There are 36 actuator classes, and examples that can be specified in the system include control of gain, signal mute, multiposition selection, parametric EQ, delay, compressor/limiting, and temperature setting. The latter can be used to monitor the temperature of power amplifiers, for example.

Sensors detect the value of a parameter and transmit it back to controllers. A sensor’s reading may be transmitted either periodically or when it exceeds a defined threshold. There can be up to 20 sensor classes, including sensing of signal level in absolute terms, sensing of level according to VU or PPM ballistics, and the sensing of temperature.

A block is a special type of worker that can contain other objects (units of code). It can contain workers, agents or certain other blocks, and it has a signal flow topology. An object inside a block is a member of that block, or ‘container’. Each block is described by a class, and the base class for all block classes is named OcaBlock. A block class represents a group of workers, agents and nested blocks, together with the signal flow within that group; it does not represent specific audio processing.

AES64-2012

The AES64-2012 ‘standard for audio applications of networks: command, control, and connection management for integrated media’ specifies a system of control which shares common ground with OCA. The latter is now a project of the AES Standards Committee, designated X210, and the AES and OCA Alliance are working together to achieve common standards.

The AES64 message structure is as follows:

    ■ IP Header: includes the source and destination Internet Protocol (IP) address (unicast, broadcast or multicast).

    ■ UDP (User Datagram Protocol) Header: includes the source and destination port (IANA reserved port number 7107).

    ■ AES64 Message.

Table 14.6 AES64-2012 Message Structure

image

The AES64 message structure has a number of components, as shown in Table 14.6. Each source and destination device has a unique 128-bit ID, comprising:

    1. An 8-byte Extended Unique Identifier (EUI64) as registered with IEEE-RA.

    2. A 3-byte Organizationally Unique Identifier (OUI) for the manufacturer of the network interface as registered with IEEE-RA.

    3. A 2-byte device ID that is unique amongst a manufacturer’s products.

    4. A 3-byte reserve field.

Every parameter within a device has its own parameter ID, a 32-bit value giving unambiguous control of gain, EQ adjustments, routing and other such actions. The User Level informs a parameter of the user’s authority to make a change to that parameter. Some Message Types will be requests, some will be responses; some will contain hierarchical information describing a parameter action, others will contain parameter IDs. The Sequence ID relates each response message to its request message, important when a large number of commands are being transmitted. Command Executive and Command Qualifier involve both the sending of commands to a receiver and the qualifying of the type of command relevant to that action.
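The four fields of the device ID pack into exactly 16 bytes (128 bits). A sketch of that layout in Python (the function name and sample values are illustrative, not drawn from the standard’s text):

```python
import struct

def pack_device_id(eui64, oui, device_id):
    """Pack an AES64-style 128-bit device ID:
    8-byte EUI64 + 3-byte OUI + 2-byte device ID + 3-byte reserved field."""
    assert len(eui64) == 8 and len(oui) == 3
    return eui64 + oui + struct.pack(">H", device_id) + b"\x00\x00\x00"
```

The total is 8 + 3 + 2 + 3 = 16 bytes, matching the 128-bit figure quoted above.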

AES X170, the area in Table 14.6 enclosed within the heavy lines, is an IP-based protocol for device control, monitoring and configuration over a network, and it forms the address block. It has a seven-level hierarchy:

    1. Section block: the highest functional group. A device has a number of these, for example input, output and group sections.

    2. Section type: a subgroup within the section block, identifying components within it, such as which input, and which output.

    3. Section number: indicates an interface or channel number as appropriate.

    4. Parameter block: identifies parameter groups, such as a group of equalizers or a routing section that controls a channel.

    5. Parameter block index: differentiates similar components within a parameter block, for example different sections of an eq.

    6. Parameter type: indicates the type of parameter, for example eq, gain, or routing.

    7. Parameter type index: gives accurate information regarding which of a choice of similar parameters is to be adjusted where there may be ambiguity, such as ‘coarse gain’ or ‘fine gain’.

Where messages are sent to simple devices not requiring or not able to respond to a full set of commands, dummy values are used. Each device has a unique mapping table containing a local index, the device ID and its IP address.
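The seven-level hierarchy can be pictured as a simple record type. The following is only an illustrative sketch; the field values and their encodings are hypothetical, not the numeric codes defined by AES-X170:

```python
from collections import namedtuple

# Seven-level AES-X170 parameter address, one field per level.
ParamAddress = namedtuple("ParamAddress", [
    "section_block",         # 1: highest functional group (input, output, ...)
    "section_type",          # 2: subgroup within the section block
    "section_number",        # 3: interface or channel number
    "parameter_block",       # 4: parameter group, e.g. an equalizer bank
    "parameter_block_index", # 5: which similar component within the block
    "parameter_type",        # 6: kind of parameter (eq, gain, routing)
    "parameter_type_index",  # 7: e.g. 'coarse gain' vs 'fine gain'
])

# Hypothetical address: fine gain of channel 3 in an input section.
addr = ParamAddress(1, 1, 3, 2, 1, 4, 2)
```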

Types of message

Every AES64 message is either a command message sent from one device to another or a response message returned once a command has been received; not all commands require responses. After the Message Type, which can be a full Address Block, an Indexed Message or a Response Message, two other fields indicate the type of command: Command Executive and Command Qualifier. The Executive indicates the basic action to be taken, for example ‘GET’ or ‘SET’ data values, perform an action (‘ACT’), or create a structure such as a list (‘CREATE’). Command Qualifiers define precisely what action is to be taken: for example the exact number of decibels up or down; the routing; whether the command pertains to single channels or groups of channels; and the appropriate labeling and indication of an action, be it a dB scale, an on/off state, a delay value, or another type of indication.
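The executive/qualifier split can be sketched as follows. This is an illustrative model only; the actual standard defines binary field encodings, not Python dictionaries:

```python
EXECUTIVES = {"GET", "SET", "ACT", "CREATE"}  # basic actions named in the text

def make_command(executive, target, qualifier=None):
    """Assemble a command: what to do (executive), to which parameter
    (target), and precisely how (qualifier)."""
    if executive not in EXECUTIVES:
        raise ValueError("unknown command executive: " + executive)
    return {"executive": executive, "target": target,
            "qualifier": qualifier or {}}

# e.g. set a fader to -6 dB:
cmd = make_command("SET", "input/channel/3/fader", {"value_db": -6.0})
```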

Table 14.7 gives a brief description of other commands that AES64 is capable of.

EuCon

Avid EuCon is an example of a computer control system that has been operating in the field for several years. Applications that support it include Pro Tools (version 9.0 and higher), Media Composer, Cubase, Logic Pro and Nuendo, among others. Traditional-looking digital mixing desks are still very much in use, and the need to access a number of faders and knobs rapidly with the fingers, rather than via a mouse and keypad on a computer screen, is an important ergonomic consideration in the live sound industry, and also in studios for certain types of recording and mixing sessions.

The EuCon system can be used to create control messages from a computer, but an early design aim was also to generate control signals from a digital mixing desk, or a control surface designed to operate like one, the operation of faders, EQ and aux knobs, routing and all other functions creating the appropriate control signals to be sent via a computer link. The C++ programming language is used to communicate with other devices using TCP/IP (Transmission Control Protocol/Internet Protocol) via Ethernet, IEEE 1394 or USB connection. The conventional-looking mixer therefore controls what is going on in the computer (the ‘box’), the actual audio signals and processing being contained and implemented in the latter. In-the-box/out-of-the-box mixing issues are covered further in Chapter 9.

With the EuCon system a number of Control surfaces can be used to control a common Application (the latter, in the context of EuCon, referring to the audio processing project running in the computer or workstation), which enables several operators to concentrate on their particular areas of interest, something that has become important in complex audio/visual projects. Equally, a single Control surface can be used to control several Applications. Original design aims included high control resolution across thousands of operations with low latency. The following gives a brief overview of some of the details of the EuCon format.

Table 14.7 Some AES64 Features

Snapshot: A set of commands to be saved by a receiving device, the controller sending a ‘SAVE SNP’ message to instruct it to do so, complete with its unique ID. A subsequent ‘SET VAL’ instruction with the unique ID attached will recall those commands in the receiver.

Parameter flags: Each parameter has a number of flags indicating the states the parameter can occupy. A 0 or 1 within a 32-bit flag register indicates ‘not set’ or ‘set’. These flags include such things as ‘select’, which controls a certain parameter within a selection group; ‘read’, which determines whether a parameter can be read; ‘write’, which determines whether a parameter can be written to; and various others such as value protect, isolate and lock.

‘Pushing’: There are situations where the density of information to be handled is unnecessarily high for a requirement, taking up a lot of bandwidth. For example, a large number of frequently changing meter readings relayed back to the controlling device necessitates the sending of much information. The ‘push’ facility rationalizes the data, allowing a restricted number of meter readings to be used along with a set interval of time between each update of the readings.

Grouping: An application of this is a VCA-type grouping parameter where all of the faders controlling the level of instruments in a band need to be moved in relation to each other for overall level control. A master-slave group operates in this way. A peer-to-peer group, however, means that altering the fader which controls a particular instrument also alters the faders controlling the other instruments by the same amount.

Modifiers: There will be circumstances where commands sent from a controller need to be modified to fit the receiving device, or to control it in a certain desired way. For instance, a limited number of fader movement commands can be modified to control a large number of movements on the receiving device. The Modifier can be used to assign the original channel numbers to appropriate destinations, and/or control them to a scale and at a rate that is different from the original command. An Event Modifier allows an initial trigger event command to set in motion a series of other events over a period of time, and also to modify such parameters as time intervals between events or whether a particular event is to be triggered. Automated mixing is an obvious application of the Event Modifier.

Desk items: As the name implies, these are simply representations on the computer screen, or physical controls on a traditional-style console, of traditional mixer facilities: faders, knobs, switches, buttons. They enable read and write access to the controls to generate the AES64 data to send and receive commands which will be understood by all devices. A large amount of data is needed to describe all aspects of a large console’s controls, and these are stored as an Extensible Markup Language (XML) file.

Node objects

Control surfaces and Applications are represented by individual node objects, and all processing actions that a particular Control surface or Application is capable of are registered with its associated node. There is one node per Control surface and one node for each Application; several of the latter can reside in a computer or DAW. All nodes are registered uniquely with a EuCon Discovery distributed database so that any of them on the network can be requested and located to facilitate connection between Control surface and Application, the database being kept current via the TCP/IP network connection. When a Control surface has located an Application, it proceeds to map its controls to those of the Application in order to coordinate all possible actions, a process called assignment. If a new plug-in or other device is added to or removed from the Application at a later stage, an appropriate control knob and label will appear on, or be deleted from, the Control surface.

Protocol

EuCon’s protocol is object-orientated, an object in this context being a specific control function such as fader, aux send, routing path or meter; this simplifies programming, and also facilitates object grouping such as all faders, or all routing. These objects have a containment hierarchy as follows.

    1. Primitives. Twelve basic controls: knob, fader, switch, meter, LED, bitmap, joystick, graph, wheel, trackball, text display, automation status.

1.1. Primitive values: 32-bit integer or 32-bit floating-point number.

    2. Controls. A control is an object that contains one or more primitives. For instance, it is desirable to group both a switch and its associated LED into a single control command. A ‘knob’ control might contain Primitives such as the knob itself, text display, touch switch, automation switch, and function LED.

2.1. Control Array: a control that contains other controls, of two types. One is a Switch Array, which can contain any number of switches across channels. The other is a Knob Cell Array, which contains knobs and any associated switches, such as pre/post or aux send on/off, throughout the control surface.
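The containment hierarchy above can be sketched with three small classes (an illustrative model only, not Avid’s actual implementation):

```python
class Primitive:
    """One of the twelve basic controls: knob, fader, switch, LED, ..."""
    def __init__(self, kind, value=0):
        self.kind = kind
        self.value = value   # 32-bit integer or floating-point value

class Control:
    """A control contains one or more primitives,
    e.g. a switch grouped with its associated LED."""
    def __init__(self, name, primitives):
        self.name = name
        self.primitives = list(primitives)

class ControlArray(Control):
    """A control that contains other controls,
    such as a switch array or a knob cell array."""
    def __init__(self, name, controls):
        super().__init__(name, [])
        self.controls = list(controls)

# A mute switch with its LED, gathered into a switch array:
mute = Control("mute", [Primitive("switch"), Primitive("LED")])
mutes = ControlArray("mute_array", [mute])
```

Grouping a switch and its LED into one Control mirrors the example given in the text, and a ControlArray is itself a Control, matching the containment rule.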

Processor types

Individual processor objects are grouped together into processor types according to a hierarchy rule, a logical procedure that eases programming. Processor types include:

    ■ Channel Strip — groups together traditional input channel objects in a logical order and arranges channels appropriately on the control surface, with number 1 to the left, and so on. If there are more channels in the Application than there are strips on the control surface, the channel numbers can be scrolled through, or individual strips on the control surface can be assigned to specific channel numbers in the Application.

    ■ Command — Application-defined set of switches and menu choices which appear on the control surface as soft keys on a screen. These include such things as ‘file’, ‘edit’, ‘undo’, ‘re-do’, ‘cut’, and ‘copy’.

    ■ Transport — contains traditional transport control buttons along with text displays for such things as edit locate points and time code. As well as physical button assignment, a series of soft keys can be created on the screen for inclusion of additional Application-specific requirements.

    ■ Edit controller — contains a jog wheel and jog/shuttle mode switches. The latter can assign the jog wheel function to any of the Application’s functions, examples including horizontal and vertical zooming, waveform zooming, trimming of clip head and tail, cross fade length, and adjustment of gain for edited and inserted sections.

    ■ Monitor — collects together the traditional control room requirements such as volume, dim, speaker select, and monitor source selection. It will also support any specific Application requirement.

    ■ Project — a slightly confusing term which refers to an array of marked points along an Application which can be located using a row of switches. Operators can thereby locate their specific ‘projects’ within the Application instantly using one of these buttons.

    ■ System — a collection of global functions such as mute all, clear mute, clear solo or solo in place (SIP), automation mode, and user preferences.
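As a concrete illustration of the Channel Strip scrolling behaviour described above, the following sketch shows how an Application with more channels than the surface has strips might map channels to strips via a scroll offset. The function name and logic are assumptions for illustration, not EuCon code.

```python
def visible_channels(total_channels, num_strips, scroll_offset=0):
    """Return the Application channel numbers shown on the surface's
    strips, channel 1 leftmost, shifted by scroll_offset.
    (Hypothetical helper, not part of EuCon.)"""
    first = 1 + scroll_offset
    last = min(first + num_strips - 1, total_channels)
    return list(range(first, last + 1))

# A 32-channel Application on an 8-strip control surface:
print(visible_channels(32, 8))      # strips show channels 1-8
print(visible_channels(32, 8, 8))   # after scrolling: channels 9-16
```

Direct assignment of a strip to an arbitrary channel number, the alternative mentioned in the text, would simply bypass this windowed mapping for that strip.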

Summary

When hard disk drive editing first appeared in the 1980s, manipulating a project with a mouse and keyboard on a computer screen was a quite adequate way of working, given the basic nature of early systems and the small number of recording tracks available. As computer-based systems grew in complexity and the number of recording tracks began to outstrip anything available from analog multitrack recorders, ergonomic considerations became more pressing.

The digital mixing console is essentially a conventional control surface combined with a computer that carries out the audio processing, the two occupying the same box. The use of multi-function knobs and screens facilitates efficient organization and operation whilst maintaining a tactile relationship between machine and operator. Some digital mixers also contain disk drive-based audio recorders for the recording of projects. As computer editing systems have developed, it has been recognized that this tactile relationship continues to be important, not just in live sound and mixing but also in the recording and audio-visual fields.

A logical development is therefore a control surface that provides a conventional and familiar working tool without itself carrying audio signals, recording or processing. It merely generates and receives control signals, which can be transmitted via a computer network to and from a computer, workstation or another control surface. The physical separation of control and processing has provided a very flexible and versatile framework within which the industry can develop, and the next few years should hopefully see the achievement of agreed standards for network control across the industry.

RECOMMENDED FURTHER READING

Hewlett, W., Selfridge-Field, E. (Eds.), 2001. The Virtual Score: Representation, Retrieval, Restoration. MIT Press.

Huber, D., 2007. The MIDI Manual, third edition. Focal Press.

MMA, 1999. Downloadable Sounds Level 1, v1.1a, January. MIDI Manufacturers Association.

MMA, 2000. RP-027: MIDI Media Adaptation Layer for IEEE 1394. MIDI Manufacturers Association.

MMA, 2000. RP-029: Bundling SMF and DLS Data in an RMID file. MIDI Manufacturers Association.

MMA, 2001. XMF Specification Version 1.0. MIDI Manufacturers Association.

MMA, 2002. The Complete MIDI 1.0 Detailed Specification. MIDI Manufacturers Association.

MMA, 2002. Scalable Polyphony MIDI Specification and Device Profiles. MIDI Manufacturers Association.

Rumsey, F., 2004. Desktop Audio Technology. Focal Press.

Scheirer, E.D., Vercoe, B.L., 1999. SAOL: the MPEG-4 structured audio orchestra language. Computer Music Journal 23(2), 31–51.

Selfridge-Field, E., Byrd, D., Bainbridge, D., 1997. Beyond MIDI: The Handbook of Musical Codes. MIT Press.

USB Implementers Forum, 1996. USB Device Class Definition for MIDI Devices, version 1.0. Available from http://www.usb.org.

USEFUL WEBSITES

Audio Engineering Society standards: http://www.aes.org/standards.

MIDI Manufacturers Association: http://www.midi.org.

Music XML: http://www.musicxml.org.

Open Control Architecture: http://www.oca-alliance.com.

Open Sound Control: http://opensoundcontrol.org.
