Chapter 14

MIDI and Synthetic Audio Control

CHAPTER CONTENTS

Background

What is MIDI?

MIDI and Digital Audio Contrasted

Basic Principles

The interface

Simple interconnection

Interfacing a Computer to a MIDI System

Adding MIDI ports

Drivers and audio I/O software

How MIDI Control Works

MIDI channels

Channel and system messages contrasted

Note on and note off messages

Velocity information

Running status

Polyphonic key pressure (aftertouch)

Control change

Channel modes

Program change

Channel aftertouch

Pitch bend wheel

System exclusive

Universal system exclusive messages

Tune request

Active sensing

Reset

MIDI Control of Sound Generators

MIDI note assignment in synthesizers and samplers

Polyphony, voice and note assignment

MIDI functions of sound generators

MIDI data buffers and latency

Handling of velocity and aftertouch data

Handling of controller messages

Voice selection

General MIDI

Scalable Polyphony MIDI (SP-MIDI)

RMID and XMF files

SAOL and SASL in MPEG 4 structured audio

MIDI over USB

MIDI over IEEE 1394

After MIDI?

Sequencing Software

Introduction

Tracks, channels, instruments and environments

Input and output filters

Timing resolution

Displaying, manipulating and editing information

Quantization of rhythm

Automation and non-note MIDI events

MIDI mixing and external control

Synchronization

Synchronized digital video

MIDI is the Musical Instrument Digital Interface, a control protocol and interface standard for electronic musical instruments which has also been used widely in other music and audio products. Although it is relatively dated by modern standards it is still used extensively, which is a testament to its simplicity and success. Even if the MIDI hardware interface is used less these days, either because more synthesis, sampling and processing takes place using software within the workstation, or because other data interfaces such as USB, Firewire and Ethernet are becoming popular, the original protocol for communicating events and other control information is still widely encountered. A lot of software that runs on computers uses MIDI as a basis for controlling the generation of sounds and external devices.

Synthetic audio is used increasingly in audio workstations and mobile devices as a very efficient means of audio representation, because it only requires control information and sound object descriptions to be transmitted. Standards such as MPEG-4 Structured Audio enable synthetic audio to be used as an alternative or an addition to natural audio coding and this can be seen as a natural evolution of the MIDI concept in interactive multimedia applications.

BACKGROUND

Electronic musical instruments existed widely before MIDI was developed in the early 1980s, but no universal means existed of controlling them remotely. Many older musical instruments used analog voltage control, rather than being controlled by a microprocessor, and thus used a variety of analog remote interfaces (if indeed any facility of this kind was provided at all). Such interfaces commonly took the form of one port for timing information, such as might be required by a sequencer or drum machine, and another for pitch and key triggering information, as shown in Figure 14.1. The latter, commonly referred to as ‘CV and gate’, consisted of a DC (direct current) control line carrying a variable control voltage (CV) which was proportional to the pitch of the note, and a separate line to carry a trigger pulse. A common increment for the CV was 1 volt per octave (although this was by no means the only approach) and notes on a synthesizer could be triggered remotely by setting the CV to the correct pitch and sending a ‘note on’ trigger pulse which would initiate a new cycle of the synthesizer’s envelope generator. Such an interface would deal with only one note at a time, but many older synths were only monophonic in any case (that is, they were only capable of generating a single voice).

Instruments with onboard sequencers would need a timing reference in order that they could be run in synchronization with other such devices, and this commonly took the form of a square pulse train at a rate related to the current musical tempo, often connected to the device using a DIN-type connector, along with trigger lines for starting and stopping a sequence’s execution. There was no universal agreement over the rate of this external clock, and frequencies measured in pulses per musical quarter note (ppqn), such as 24 ppqn and 48 ppqn, were used by different manufacturers. A number of conversion boxes were available which divided or multiplied clock signals in order that devices from different manufacturers could be made to work together.

image

FIGURE 14.1 Prior to MIDI control, electronic musical instruments tended to use a DC remote interface for pitch and note triggering. A second interface handled a clock signal to control tempo and trigger pulses to control the execution of a stored sequence.

As microprocessor control began to be more widely used in musical instruments a number of incompatible digital control interfaces sprang up, promoted by the large synthesizer manufacturers, some serial and some parallel. Needless to say the plethora of non-standardized approaches to remote control made it difficult to construct an integrated system, especially when integrating equipment from different manufacturers. Owing to collaboration between the major parties in America and Japan, the way became clear for agreement over a common hardware interface and command protocol, resulting in the specification of the MIDI standard in late 1982/early 1983. This interface grew out of an amalgamation of a proposed universal interface called USI (the Universal Synthesizer Interface) which was intended mainly for note on and off commands, and a Japanese specification which was rather more complex and which proposed an extensive protocol to cover other operations as well. Since MIDI’s introduction, the use of older remote interfaces has died away very quickly, but there remain available a number of specialized interfaces which may be used to interconnect non-MIDI equipment to MIDI systems by converting the digital MIDI commands into the type of analog information described above.

The standard has been subject to a number of addenda, extending the functionality of MIDI far beyond the original. The original specification was called the MIDI 1.0 specification, to which have been added such addenda as the MIDI Sample Dump protocol, MIDI Files, General MIDI (1 and 2), MIDI TimeCode, MIDI Show Control, MIDI Machine Control and Downloadable Sounds. A new ‘HD’ (High Definition) version of the standard is planned for release in 2009, which is expected to include support for more channels and controllers, as well as greater controller resolution using single messages. It is aimed to make this compatible with existing hardware and software. The MIDI Manufacturers Association (MMA) is now the primary association governing formal extensions to the standard, liaising closely with a Japanese association called AMEI (Association of Musical Electronics Industry).

WHAT IS MIDI?

MIDI is a digital remote control interface for music systems, but has come to relate to a wide range of standards and specifications to ensure interoperability between electronic music systems. MIDI-controlled equipment is normally based on microprocessor control, with the MIDI interface forming an I/O port. It is a measure of the popularity of MIDI as a means of control that it has now been adopted in many other audio and visual systems, including the automation of mixing consoles, the control of studio outboard equipment, lighting equipment and other machinery. Although many of its standard commands are music related, it is possible either to adapt music commands to non-musical purposes or to use command sequences designed especially for alternative methods of control.

The adoption of a serial standard for MIDI was dictated largely by economic and practical considerations, as it was intended that it should be possible for the interface to be installed on relatively cheap items of equipment and that it should be available to as wide a range of users as possible. A parallel system might have been more professionally satisfactory, but would have involved a considerable manufacturing cost overhead per MIDI device, as well as parallel cabling between devices, which would have been more expensive and bulky than serial interconnection. The simplicity and ease of installation of MIDI systems was largely responsible for its rapid proliferation as an international standard.

Unlike its analog predecessors, MIDI integrates timing and system control commands with pitch and note triggering commands, such that everything may be carried in the same format over the same piece of wire. MIDI makes it possible to control musical instruments polyphonically in pseudo real time: that is, the speed of transmission is such that delays in the transfer of performance commands are not audible in the majority of cases. It is also possible to address a number of separate receiving devices within a single MIDI data stream, and this allows a controlling device to determine the destination of a command.

MIDI AND DIGITAL AUDIO CONTRASTED

For many the distinction between MIDI and digital audio may be a clear one, but those new to the subject often confuse the two. Any confusion is often due to both MIDI and digital audio equipment appearing to perform the same task – that is the recording of multiple channels of music using digital equipment – and is not helped by the way in which some manufacturers refer to MIDI sequencing as digital recording.

Digital audio involves a process whereby an audio waveform (such as the line output of a musical instrument) is sampled regularly and then converted into a series of binary words that represent the sound waveform, as described in Chapter 8. A digital audio recorder stores this sequence of data and can replay it by passing the original data through a digital-to-analog convertor that turns the data back into a sound waveform, as shown in Figure 14.2. A multitrack recorder has a number of independent channels that work in the same way, allowing a sound recording to be built up in layers. MIDI, on the other hand, handles digital information that controls the generation of sound. MIDI data does not represent the sound waveform itself. When a multitrack music recording is made using a MIDI sequencer (described later) this control data is stored, and can be replayed by transmitting the original data to a collection of MIDI-controlled musical instruments. It is the instruments that actually reproduce the recording.

image

FIGURE 14.2 (a) Digital audio recording and (b) MIDI recording contrasted. In (a) the sound waveform itself is converted into digital data and stored, whereas in (b) only control information is stored, and a MIDI-controlled sound generator is required during replay

A digital audio recording, then, allows any sound to be stored and replayed without the need for additional hardware. It is useful for recording acoustic sounds such as voices, where MIDI is of little help. A MIDI recording is almost useless without a collection of sound generators. An interesting advantage of the MIDI recording is that, since the stored data represents event information describing a piece of music, it is possible to change the music by changing the event data. MIDI recordings also consume a lot less memory space than digital audio recordings. It is also possible to transmit a MIDI recording to a different collection of instruments from those used during the original recording, thus resulting in a different sound. It is now common for MIDI and digital audio recording to be integrated in one software package, allowing the two to be edited and manipulated in parallel.

BASIC PRINCIPLES

The interface

The MIDI standard specifies a unidirectional serial interface (see Fact File 8.6, Chapter 8) running at 31.25 kbit/s ±1%. The rate was defined at a time when the clock speeds of microprocessors were typically much slower than they are today, this rate being a convenient division of the typical 1 or 2 MHz master clock rate. The rate had to be slow enough to be carried without excessive losses over simple cables and interface hardware, but fast enough to allow musical information to be transferred from one instrument to another without noticeable delays. Control messages are sent as groups of bytes. Each byte is preceded by one start bit and followed by one stop bit, in order to synchronize reception of the data, which is transmitted asynchronously, as shown in Figure 14.3. The addition of start and stop bits means that each 8 bit word actually takes ten bit periods to transmit (lasting a total of 320μs). Standard MIDI messages typically consist of one, two or three bytes, although there are longer messages for some purposes.
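These figures can be checked with a little arithmetic. The short Python sketch below (purely illustrative, using the constants from the paragraph above) works out the byte period and the raw message throughput of a MIDI-DIN link:

# Timing of the standard MIDI-DIN interface, from the figures quoted above.
BIT_RATE = 31_250        # bits per second (31.25 kbit/s, +/- 1%)
BITS_PER_BYTE = 10       # 1 start bit + 8 data bits + 1 stop bit

byte_period = BITS_PER_BYTE / BIT_RATE
print(f"One MIDI byte occupies the line for {byte_period * 1e6:.0f} microseconds")   # 320

# A complete three-byte channel message (e.g. a note on) therefore takes:
print(f"A three-byte message takes {3 * byte_period * 1e3:.2f} ms")                  # 0.96 ms

# So the interface can carry at most roughly this many three-byte messages per second:
print(f"Approximately {BIT_RATE // (3 * BITS_PER_BYTE)} messages per second")        # 1041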

image

FIGURE 14.3 A MIDI message consists of a number of bytes, each transmitted serially and asynchronously by a UART in this format, with a start and stop bit to synchronize the receiving UART. The total period of a MIDI data byte, including start and stop bits, is 320μs.

The hardware interface is shown in Fact File 14.1. In the MIDI specification, the opto-isolator is defined as having a rise time of no more than 2μs. The rise time affects the speed with which the device reacts to a change in its input and if slow will tend to distort the leading edge of data bit cells. The same also applies in practice to fall times. Rise-time distortion results in timing instability of the data, since it alters the time at which a data edge crosses the decision point between one and zero. If the rise time is excessively slow the data value may be corrupted since the output of the device will not have risen to its full value before the next data bit arrives. If a large number of MIDI devices are wired in series (that is from THRU to IN a number of times) the data will be forced to pass through a number of opto-isolators and thus will suffer the combined effects of a number of stages of rise-time distortion. Whether or not this will be sufficient to result in data detection errors at the final receiver will depend to some extent on the quality of the opto-isolators concerned, as well as on other losses that the signal may have suffered on its travels. It follows that the better the specification of the opto-isolator, the more stages of device cascading will be possible before unacceptable distortion is introduced. The delay in data passed between IN and THRU is only a matter of microseconds, so this contributes little to any audible delays perceived in the musical outputs of some instruments in a large system. The bulk of any perceived delay will be due to other factors like processing delay, buffer delays and traffic.

FACT FILE 14.1 MIDI HARDWARE INTERFACE

Most equipment using MIDI has three interface connectors: IN, OUT, and THRU. The OUT connector carries data that the device itself has generated. The IN connector receives data from other devices and the THRU connector is a direct throughput of the data that is present at the IN. As can be seen from the hardware interface diagram, it is simply a buffered feed of the input data, and it has not been processed in any way. A few cheaper devices do not have THRU connectors, but it is possible to obtain ‘MIDI THRU boxes’ which provide a number of ‘THRUs’ from one input. Occasionally, devices without a THRU socket allow the OUT socket to be switched between OUT and THRU functions.

The interface incorporates an opto-isolator between the MIDI IN (that is the receiving socket) and the device’s microprocessor system. This is to ensure that there is no direct electrical link between devices and helps to reduce the effects of any problems which might occur if one instrument in a system were to develop an electrical fault. An opto-isolator is an encapsulated device in which a light-emitting diode (LED) can be turned on or off depending on the voltage applied across its terminals, illuminating a photo-transistor which consequently conducts or not, depending on the state of the LED. Thus the data is transferred optically, rather than electrically.

image

FACT FILE 14.2 MIDI CONNECTORS AND CABLES

The connectors used for MIDI interfaces are 5-pin DIN types. The specification also allows for the use of XLR-type connectors (such as those used for balanced audio signals in professional equipment), but these are rarely encountered in practice. Only three of the pins of a 5-pin DIN plug are actually used in most equipment (the three innermost pins). In the cable, pin 5 at one end should be connected to pin 5 at the other, and likewise pin 4 to pin 4, and pin 2 to pin 2. Hi-fi DIN cables will only work for MIDI if they follow this convention. Professional microphone cable terminated in DIN connectors may be used as a higher-quality solution, because domestic cables will not always be a shielded twisted pair and thus are more susceptible to external interference, as well as radiating more themselves, which could interfere with adjacent audio signals. A 5 mA current loop is created between a MIDI OUT or THRU and a MIDI IN, when connected with the appropriate cable, and data bits are signaled by the turning on and off of this current by the sending device. This principle is shown in the diagram.

The cable should be a shielded twisted pair with the shield connected to pin 2 of the connector at both ends, although within the receiver itself, as can be seen from the diagram above, the MIDI IN does not have pin 2 connected to earth. This is to avoid earth loops and makes it possible to use a cable either way round. If two devices are connected together whose earths are at slightly different potentials, a current is caused to flow down any earth wire connecting them. This can induce interference into the data wires, possibly corrupting the data, and can also result in interference such as hum on audio circuits. It is recommended that no more than 15 m of cable is used for a single cable run in a simple MIDI system, and investigation of typical cables indicates that corruption of data does indeed ensue over longer distances, although this is gradual and depends on the electromagnetic interference conditions, the quality of cable and the equipment in use. Longer distances may be accommodated with the use of buffer or ‘booster’ boxes that compensate for some of the cable losses and retransmit the data. It is also possible to extend a MIDI system by using a data network with an appropriate interface.

image

The specification of cables and connectors is described in Fact File 14.2. This form of hardware interface is increasingly referred to as ‘MIDI-DIN’ to distinguish it from other means of transferring MIDI data.

Implementations of MIDI that work over other hardware interfaces such as Ethernet (using Internet Protocol/UDP), USB and Firewire (IEEE 1394) have also been introduced, sometimes in proprietary form. The latter two are described briefly later in the chapter.

Simple interconnection

In the simplest MIDI system, one instrument could be connected to another as shown in Figure 14.4. Here, instrument 1 sends information relating to actions performed on its own controls (notes pressed, pedals pressed, etc.) to instrument 2, which imitates these actions as far as it is able. This type of arrangement can be used for ‘doubling-up’ sounds, ‘layering’ or ‘stacking’, such that a composite sound can be made up from two synthesizers’ outputs. (The audio outputs of the two instruments would have to be mixed together for this effect to be heard.) Larger MIDI systems could be built up by further ‘daisy-chaining’ of instruments, such that instruments further down the chain all received information generated by the first (see Figure 14.5), although this is not a very satisfactory way of building a large MIDI system. In large systems some form of central routing helps to avoid MIDI ‘traffic jams’ and simplifies interconnection.

INTERFACING A COMPUTER TO A MIDI SYSTEM

Adding MIDI ports

In order to use a workstation as a central controller for a MIDI system it must have at least one MIDI interface, consisting of at least an IN and an OUT port. (THRU is not strictly necessary in most cases.) Unless the computer has a built-in interface, as found on the old Atari machines, some form of third-party hardware interface must be added and there are many ranging from simple single ports to complex multiple port products.

image

FIGURE 14.4 The simplest form of MIDI interconnection involves connecting two instruments together as shown.

image

FIGURE 14.5 Further instruments can be added using THRU ports as shown, in order that messages from instrument 1 may be transmitted to all the other instruments.

A typical single port MIDI interface can be connected either to one of the spare I/O ports of the computer (a serial or USB port, for example), or can be installed as an expansion slot card (perhaps as part of an integrated sound card). Depending on which port it is connected to, some processing may be required within the MIDI interface to convert the MIDI data stream to and from the relevant interface protocol. Older PCs had serial interfaces that would operate at a high enough data rate for MIDI, but were not normally able to operate at precisely the 31.25 kbaud required. External interfaces were able to transpose the data stream from a higher serial data rate (often 38.4 kbaud) down to the MIDI rate using intermediate buffering and flow control. Some PCs and soundcards also had the so-called ‘MIDI/Joystick port’ that conformed to the old Roland MPU-401 interface standard. Adaptor cables were available that provided MIDI IN and OUT connectors from this port. Some older PC interfaces also attach to the parallel port. The majority of recent MIDI interfaces are connected either to USB or Firewire ports of host workstations.

Multiport interfaces have become widely used in MIDI systems where more than 16 MIDI channels are required, and they are also useful as a means of limiting the amount of data sent or received through any one MIDI port. (A single port can become ‘overloaded’ with MIDI data if serving a large number of devices, resulting in data delays.) Multiport interfaces are normally more than just a parallel distribution of a single MIDI data stream, typically handling a number of independent MIDI data streams that can be separately addressed by the operating system drivers or sequencer software. USB and Firewire MIDI protocols allow a particular stream or ‘cable’ to be identified so that each stream controlling 16 MIDI channels can be routed to a particular physical port or instrument.

EMagic’s Unitor8 interface is pictured in Figure 14.6. It has RS-232 and -422 serial ports as well as a USB port to link with the host workstation. There are eight MIDI ports with two on the front panel for easy connection of ‘guest’ devices or controllers that are not installed at the back. This device also has VITC and LTC timecode ports in order that synchronization information can be relayed to and from the computer. A multi-device MIDI system is pictured in Figure 14.7, showing a number of multi-timbral sound generators connected to separate MIDI ports and a timecode connection to an external video tape recorder for use in synchronized post-production. As more of these functions are now being provided within the workstation (e.g. synthesis, video, mixing) the number of devices connected in this way will reduce.

image

FIGURE 14.6 (a) Front and (b) back panels of the Emagic Unitor 8 interface, showing USB port, RS-422 port, RS-232 port, LTC and VITC ports and multiple MIDI ports.

image

FIGURE 14.7 A typical multi-machine MIDI system interfaced to a computer via a multiport interface connected by a high-speed link (e.g. USB).

Drivers and audio I/O software

Most audio and MIDI hardware requires ‘driver’ software of some sort to enable the operating system (OS) to ‘see’ the hardware and use it correctly. There are also sound manager or multimedia extensions that form part of the operating system of the workstation in question, designed to route audio to and from hardware in the absence of dedicated solutions. In older systems, basic audio software communicated with sound cards via these standard multimedia extensions of the OS, which could result in high latency and might also limit operation to two channels and a 48 kHz sampling frequency. Dedicated low-latency approaches were therefore developed as an alternative, allowing higher sampling frequencies, full audio resolution, sample-accurate synchronization and multiple channels. Examples of these are Steinberg’s ASIO (Audio Stream Input Output) and Emagic’s EASI. These are software extensions behaving as ‘hardware abstraction layers’ (HALs) that replace the OS standard sound manager and enable applications to communicate more effectively with I/O hardware. ASIO, for example, handles a range of sampling frequencies and bit depths, as well as multiple channel I/O, and many sound cards and applications are ASIO-compatible.

As high-quality audio begins to feature more prominently in general purpose desktop computers, audio architectures and OS audio provision improve to keep step. OS native audio provision may now take the place of what third-party extensions have provided in the past. For example, Apple’s OS X Core Audio standard is designed to provide a low latency HAL between applications and audio hardware, enabling multichannel audio data to be communicated to and from sound cards and external interfaces such as USB and Firewire. Core Audio handles audio in 32 bit floating-point form for high-resolution signal processing, as well as enabling sample accurate timing information to be communicated alongside audio data. Microsoft has also done something similar for Windows systems, with the Windows Driver Model (WDM) audio drivers that also include options for multichannel audio, high resolutions and sampling frequencies. DirectSound is the Microsoft equivalent of Apple’s OS X Core Audio.

Core MIDI and DirectMusic do a similar thing for MIDI data. Whereas previously it would have been necessary to install a third-party MIDI HAL such as OMS (Opcode’s Open Music System) or MIDI Manager to route MIDI data to and from multiport interfaces and applications, these features are now included within the operating system’s multimedia extensions.

HOW MIDI CONTROL WORKS

MIDI channels

MIDI messages are made up of a number of bytes as explained in Fact File 14.3. Each part of the message has a specific purpose, and one of these is to define the receiving channel to which the message refers. In this way, a controlling device can make data device specific – in other words it can define which receiving instrument will act on the data sent. This is most important in large systems that use a computer sequencer as a master controller, when a large amount of information will be present on the MIDI data bus, not all of which is intended for every instrument. If a device is set in software to receive on a specific channel or on a number of channels it will act only on information which is ‘tagged’ with its own channel numbers. Everything else it will usually ignore. There are 16 basic MIDI channels and instruments can usually be set to receive on any specific channel or channels (omni off mode), or to receive on all channels (omni on mode). The latter mode is useful as a means of determining whether anything at all is being received by the device.

Later it will be seen that the limit of 16 MIDI channels can be overcome easily by using multiport MIDI interfaces connected to a computer. In such cases it is important not to confuse the MIDI data channel with the physical port to which a device may be connected, since each physical port will be capable of transmitting on all 16 data channels.

Channel and system messages contrasted

Two primary classes of message exist: those that relate to specific MIDI channels and those that relate to the system as a whole. One should bear in mind that it is possible for an instrument to be receiving in ‘omni on’ mode, in which case it will ignore the channel label and attempt to respond to anything that it receives.

Channel messages start with status bytes in the range &8n to &En (they start at hexadecimal eight because the MSB must be a one for a status byte). System messages all begin with &F, and do not contain a channel number. Instead the least significant nibble of the system status byte is used for further identification of the system message, such that there is room for 16 possible system messages running from &F0 to &FF. System messages are themselves split into three groups: system common, system exclusive and system real time. The common messages may apply to any device on the MIDI bus, depending only on the device’s ability to handle the message. The exclusive messages apply to whichever manufacturer’s devices are specified later in the message (see below) and the real-time messages are intended for devices which are to be synchronized to the prevailing musical tempo. (Some of the so-called real-time messages do not really seem to deserve this appellation, as discussed below.) The status byte &F1 is used for MIDI TimeCode.

FACT FILE 14.3 MIDI MESSAGE FORMAT

There are two basic types of MIDI message byte: the status byte and the data byte. The first byte in a MIDI message is normally a status byte. Standard MIDI messages can be up to three bytes long, but not all messages require three bytes, and there are some fairly common exceptions to the rule which are described below. The standard has been extended and refined over the years and the following is only an introduction to the basic messages. The prefix ‘&’ will be used to indicate hexadecimal values (see Table 8.1); individual MIDI message bytes will be delineated using square brackets, e.g. [&45], and channel numbers will be denoted using ‘n’ to indicate that the value may be anything from &0 to &F (channels 1 to 16). The table shows the format and content of MIDI messages under each of the statuses.

Status bytes always begin with a binary one to distinguish them from data bytes, which always begin with a zero. Because the most significant bit (MSB) of each byte is reserved to denote the type (status or data) there are only seven active bits per byte, which allows 2⁷ (that is, 128) possible values. As shown in the figure below, the first half of the status byte denotes the message type and the second half denotes the channel number.

Because four bits of the status byte are set aside to indicate the channel number, this allows for 2⁴ (or 16) possible channels. There are only three bits to denote the message type, because the first bit must always be a one. This theoretically allows for eight message types, but there are some special cases in the form of system messages (see below).

image

The MMA has defined Approved Protocols (APs) and Recommended Practices (RPs). An AP is a part of the standard MIDI specification and is used when the standard is further defined or when a previously undefined command is defined, whereas an RP is used to describe an optional new MIDI application that is not a mandatory or binding part of the standard. Not all MIDI devices will have all the following commands implemented, since it is not mandatory for a device conforming to the MIDI standard to implement every possibility.

image

MIDI channel numbers are usually referred to as ‘channels one to 16’, but the binary numbers representing these run from zero to 15 (&0 to &F), as 15 is the largest decimal number which can be represented with four bits. Thus the note on message for channel 5 is actually &94 (nine for note on, and four for channel 5).
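To make the layout concrete, the following Python sketch (illustrative only, not part of the specification) splits a status byte into its message type and channel number, reproducing the ‘&94 is note on, channel 5’ example above:

def parse_status(status: int):
    # Split a MIDI status byte into (message type, channel 1-16).
    if not 0x80 <= status <= 0xFF:
        raise ValueError("not a status byte: the MSB must be a one")
    if status >= 0xF0:
        return ("system", None)       # system messages carry no channel number
    types = {0x8: "note off", 0x9: "note on", 0xA: "polyphonic key pressure",
             0xB: "control change", 0xC: "program change",
             0xD: "channel aftertouch", 0xE: "pitch bend"}
    return (types[status >> 4], (status & 0x0F) + 1)

print(parse_status(0x94))   # ('note on', 5)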

Note on and note off messages

Much of the musical information sent over a typical MIDI interface will consist of these two message types. As indicated by the titles, the note on message turns on a musical note, and the note off message turns it off. Note on takes the general format:

[&9n] [Note number] [Velocity]

and note off takes the form:

[&8n] [Note number] [Velocity]

A MIDI instrument will generate note on messages at its MIDI OUT corresponding to whatever notes are pressed on the keyboard, on whatever channel the instrument is set to transmit. Also, any note which has been turned on must subsequently be turned off in order for it to stop sounding, thus if one instrument receives a note on message from another and then loses the MIDI connection for any reason, the note will continue sounding ad infinitum. This situation can occur if a MIDI cable is pulled out during transmission.
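A minimal sketch of how these messages are assembled is shown below (the helper names are invented for illustration); note numbers and velocities are masked to seven bits because data bytes must have their MSB clear:

def note_on(channel: int, note: int, velocity: int) -> bytes:
    # Status &9n, where n is the channel number minus one.
    return bytes([0x90 | (channel - 1), note & 0x7F, velocity & 0x7F])

def note_off(channel: int, note: int, velocity: int = 64) -> bytes:
    # Status &8n; release velocity defaults to the mid value of 64.
    return bytes([0x80 | (channel - 1), note & 0x7F, velocity & 0x7F])

# Middle C (note number 60) played on channel 1 and then released:
print(note_on(1, 60, 100).hex())    # '903c64'
print(note_off(1, 60).hex())        # '803c40'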

Table 14.1 MIDI note numbers related to the musical scale

Musical note MIDI note number
C-2 0
C-1 12
C0 24
C1 36
C2 48
C3 (middle C) 60 (Yamaha convention)
C4 72
C5 84
C6 96
C7 108
C8 120
G8 127

MIDI note numbers relate directly to the western musical chromatic scale and the format of the message allows for 128 note numbers which cover a range of a little over ten octaves – adequate for the full range of most musical material. This quantization of the pitch scale is geared very much towards keyboard instruments, being less suitable for other instruments and cultures where the definition of pitches is not so black and white. Nonetheless, means have been developed of adapting control to situations where unconventional tunings are required. Note numbers normally relate to the musical scale as shown in Table 14.1, although there is a certain degree of confusion here. Yamaha established the use of C3 for middle C, whereas others have used C4. Some software allows the user to decide which convention will be used for display purposes.

Velocity information

Note messages are associated with a velocity byte that is used to represent the speed at which a key was pressed or released. The former will correspond to the force exerted on the key as it is depressed: in other words, ‘how hard you hit it’ (called ‘note on velocity’). It is used to control parameters such as the volume or timbre of the note at the audio output of an instrument and can be applied internally to scale the effect of one or more of the envelope generators in a synthesizer. This velocity value has 128 possible states, but not all MIDI instruments are able to generate or interpret the velocity byte, in which case they will set it to a value half way between the limits, i.e. decimal 64. Some instruments may act on velocity information even if they are unable to generate it themselves. It is recommended that a logarithmic rather than linear relationship should be established between the velocity value and the parameter which it controls, since this corresponds more closely to the way in which musicians expect an instrument to respond, although some instruments allow customized mapping of velocity values to parameters. The note on, velocity zero value is reserved for the special purpose of turning a note off, for reasons that will become clear under ‘Running status’, below. If an instrument sees a note number with a velocity of zero, its software should interpret this as a note off message.

Note off velocity (or ‘release velocity’) is not widely used, as it relates to the speed at which a note is released, which is not a parameter that affects the sound of many normal keyboard instruments. Nonetheless it is available for special effects if a manufacturer decides to implement it.

Running status

Running status is an accepted method of reducing the amount of data transmitted. It involves the assumption that once a status byte has been asserted by a controller there is no need to reiterate this status for each subsequent message of that status, so long as the status has not changed in between. Thus a string of notes on messages could be sent with the note on status only sent at the start of the series of note data, for example:

[&9n] [Data] [Velocity] [Data] [Velocity] [Data] [Velocity]

For a long string of notes this could reduce the amount of data sent by nearly one-third. But in most music each note on is almost always followed quickly by a note off for the same note number, so this method would clearly break down as the status would be changing from note on to note off very regularly, thus eliminating most of the advantage gained by running status. This is the reason for the adoption of note on, velocity zero as equivalent to a note off message, because it allows a string of what appears to be note on messages, but which is, in fact, both note on and note off.

Running status is not used at all times for a string of same-status messages and will often only be called upon by an instrument’s software when the rate of data exceeds a certain point. Indeed, an examination of the data from a typical synthesizer indicates that running status is not used during a large amount of ordinary playing.
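The saving can be illustrated with a short sketch (illustrative only): four note events encoded under running status, using velocity zero in place of note off, occupy nine bytes instead of the twelve that four complete three-byte messages would require.

def encode_running_status(events, channel=1):
    # Encode (note, velocity) pairs as note on messages, sending the
    # status byte (&9n) only once; velocity 0 stands in for note off.
    out = [0x90 | (channel - 1)]
    for note, velocity in events:
        out.extend([note & 0x7F, velocity & 0x7F])
    return bytes(out)

stream = encode_running_status([(60, 100), (60, 0), (62, 100), (62, 0)])
print(stream.hex(), len(stream))    # '903c643c003e643e00' 9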

Polyphonic key pressure (aftertouch)

The key pressure messages are sometimes called ‘aftertouch’ by keyboard manufacturers. Aftertouch is perhaps a slightly misleading term as it does not make clear what aspect of touch is referred to, and many people have confused it with note off velocity. This message refers to the amount of pressure placed on a key at the bottom of its travel, and it is used to instigate effects based on how much the player leans onto the key after depressing it. It is often applied to performance parameters such as vibrato.

The polyphonic key pressure message is not widely used, as it transmits a separate value for every key on the keyboard and thus requires a separate sensor for every key. This can be expensive to implement and is beyond the scope of many keyboards, so most manufacturers have resorted to the use of the channel pressure message (see below). The message takes the general format:

[&An] [Note number] [Pressure]

Implementing polyphonic key pressure messages involves the transmission of a considerable amount of data that might be unnecessary, as the message will be sent for every note in a chord every time the pressure changes. As most people do not maintain a constant pressure on the bottom of a key whilst playing, many redundant messages might be sent per note. A technique known as ‘controller thinning’ may be used by a device to limit the rate at which such messages are transmitted and this may be implemented either before transmission or at a later stage using a computer. Alternatively this data may be filtered out altogether if it is not required.
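A rough sketch of one possible thinning strategy is shown below; the thresholds are invented for illustration and real devices use their own schemes. A pressure (or controller) message is only passed on if its value has changed appreciably, or if enough time has passed since the last message for that key:

import time

class ControllerThinner:
    def __init__(self, min_change=2, min_interval=0.02):
        self.min_change = min_change        # minimum change in value to pass on
        self.min_interval = min_interval    # minimum time between messages (s)
        self.last = {}                      # key/controller number -> (value, time)

    def accept(self, key: int, value: int) -> bool:
        now = time.monotonic()
        previous = self.last.get(key)
        if previous is not None:
            prev_value, prev_time = previous
            if (abs(value - prev_value) < self.min_change
                    and now - prev_time < self.min_interval):
                return False                # drop the redundant message
        self.last[key] = (value, now)
        return True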

Control change

As well as note information, a MIDI device may be capable of transmitting control information that corresponds to the various switches, control wheels and pedals associated with it. These come under the control change message group and should be distinguished from program change messages. The controller messages have proliferated enormously since the early days of MIDI and not all devices will implement all of them. The control change message takes the general form:

[&Bn] [Controller number] [Data]

so a number of controllers may be addressed using the same type of status byte by changing the controller number.

Although the original MIDI standard did not lay down any hard and fast rules for the assignment of physical control devices to logical controller numbers, there is now common agreement amongst manufacturers that certain controller numbers will be used for certain purposes. These are assigned by the MMA. There are two distinct kinds of controller: the switch type and the analog type. The analog controller is any continuously variable wheel, lever, slider or pedal that might have any one of a number of positions and these are often known as continuous controllers. There are 128 controller numbers available and these are grouped as shown in Table 14.2. Table 14.3 shows a more detailed breakdown of some of these, as found in the majority of MIDI-controlled musical instruments, although the full list is regularly updated by the MMA. The control change messages have become fairly complex and interested users are referred to the relevant standards.

The first 64 controller numbers (that is up to &3F) relate to only 32 physical controllers (the continuous controllers). This is to allow for greater resolution in the quantization of position than would be feasible with the seven bits that are offered by a single data byte. Seven bits would only allow 128 possible positions of an analog controller to be represented and this might not be adequate in some cases. For this reason the first 32 controllers handle the most significant byte (MSbyte) of the controller data, whilst the second 32 handle the least significant byte (LSbyte). In this way, controller numbers &06 and &26 both represent the data entry slider, for example. Together, the data values can make up a 14 bit number (because the first bit of each data word has to be a zero), which allows the quantization of a control’s position to be one part in 2¹⁴ (16 384). Clearly, not all controllers will require this resolution, but it is available if needed. Only the LSbyte would be needed for small movements of a control. If a system opts not to use the extra resolution offered by the second byte, it should send only the MSbyte for coarse control. In practice this is all that is transmitted on many devices.
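A minimal sketch of how the two seven-bit data values combine into a 14 bit position (function names invented for illustration):

def combine_14bit(msb_value: int, lsb_value: int = 0) -> int:
    # MSbyte and LSbyte controller data values combined into one 14 bit number.
    return ((msb_value & 0x7F) << 7) | (lsb_value & 0x7F)

def split_14bit(value: int):
    # The two seven-bit data values to be sent for a given 14 bit position.
    return (value >> 7) & 0x7F, value & 0x7F

print(combine_14bit(0x7F, 0x7F))   # 16383, the maximum position
print(split_14bit(8192))           # (64, 0), the halfway position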

Table 14.2 MIDI controller classifications

Controller number (hex) Function
&00–1F 14 bit controllers, MSbyte
&20–3F 14 bit controllers, LSbyte
&40–65 7 bit controllers or switches
&66–77 Originally undefined
&78–7F Channel mode control

Table 14.3 MIDI controller functions

Controller number (hex) Function
00 Bank select
01 Modulation wheel
02 Breath controller
03 Undefined
04 Foot controller
05 Portamento time
06 Data entry slider
07 Main volume
08 Balance
09 Undefined
0A Pan
0B Expression controller
0C Effect control 1
0D Effect control 2
0E–0F Undefined
10–13 General purpose controllers 1–4
14–1F Undefined
20–3F LSbyte for 14 bit controllers (same function order as 00–1F)
40 Sustain pedal
41 Portamento on/off
42 Sostenuto pedal
43 Soft pedal
44 Legato footswitch
45 Hold 2
46–4F Sound controllers
50–53 General purpose controllers 5–8
54 Portamento control
55–5A Undefined
5B–5F Effects depth 1–5
60 Data increment
61 Data decrement
62 NRPC LSbyte (non-registered parameter controller)
63 NRPC MSbyte
64 RPC LSbyte (registered parameter controller)
65 RPC MSbyte
66–77 Undefined
78 All sounds off
79 Reset all controllers
7A Local on/off
7B All notes off
7C Omni receive mode off
7D Omni receive mode on
7E Mono receive mode
7F Poly receive mode

On/off switches can be represented easily in binary form (0 for OFF, 1 for ON), and it would be possible to use just a single bit for this purpose, but, in order to conform to the standard format of the message, switch states are normally represented by data values between &00 and &3F for OFF and &40 and &7F for ON. In other words switches are now considered as seven bit continuous controllers. In older systems it may be found that only &00 = OFF and &7F = ON.

The data increment and decrement buttons that are present on many devices are assigned to two specific controller numbers (&60 and &61) and an extension to the standard defines four controllers (&62 to &65) that effectively expand the scope of the control change messages. These are the registered and non-registered parameter controllers (RPCs and NRPCs).

The ‘all notes off’ command (frequently abbreviated to ‘ANO’) was designed to be transmitted to devices as a means of silencing them, but it does not necessarily have this effect in practice. What actually happens varies between instruments, especially if the sustain pedal is held down or notes are still being pressed manually by a player. All notes off is supposed to put all note generators into the release phase of their envelopes, and clearly the result of this will depend on what a sound is programmed to do at this point. The exception should be notes which are being played whilst the sustain pedal is held down, which should only be released when that pedal is released. ‘All sounds off’ was designed to overcome the problems with ‘all notes off’, by turning sounds off as quickly as possible. ‘Reset all controllers’ is designed to reset all controllers to their default state, in order to return a device to its ‘standard’ setting.

Channel modes

Although grouped with the controllers, under the same status, the channel mode messages differ somewhat in that they set the mode of operation of the instrument receiving on that particular channel.

‘Local on/off’ is used to make or break the link between an instrument’s keyboard and its own sound generators. Effectively there is a switch between the output of the keyboard and the control input to the sound generators which allows the instrument to play its own sound generators in normal operation when the switch is closed (see Figure 14.8). If the switch is opened, the link is broken and the output from the keyboard feeds the MIDI OUT whilst the sound generators are controlled from the MIDI IN. In this mode the instrument acts as two separate devices: a keyboard without any sound, and a sound generator without a keyboard. This configuration can be useful when the instrument in use is the master keyboard for a large sequencer system, where it may not always be desired that everything played on the master keyboard results in sound from the instrument itself.

image

FIGURE 14.8 The local off switch disconnects a keyboard from its associated sound generators in order that the two parts may be treated independently in a MIDI system.

‘Omni off’ ensures that the instrument will only act on data tagged with its own channel number(s), as set by the instrument’s controls. ‘Omni on’ sets the instrument to receive on all of the MIDI channels. In other words, the instrument will ignore the channel number in the status byte and will attempt to act on any data that may arrive, whatever its channel. Devices should power up in this mode according to the original specification, but more recent devices will tend to power up in the mode in which they were left. Mono mode sets the instrument such that it will only reproduce one note at a time, as opposed to ‘Poly’ (phonic) in which a number of notes may be sounded together.

In older devices the mono mode came into its own as a means of operating an instrument in a ‘multi-timbral’ fashion, whereby MIDI information on each channel controlled a separate monophonic musical voice. This used to be one of the only ways of getting a device to generate more than one type of voice at a time. The data byte that accompanies the mono mode message specifies how many voices are to be assigned to adjacent MIDI channels, starting with the basic receive channel. For example, if the data byte is set to 4, then four voices will be assigned to adjacent MIDI channels, starting from the basic channel which is the one on which the instrument has been set to receive in normal operation. Exceptionally, if the data byte is set to zero, all 16 voices (if they exist) are assigned each to one of the 16 MIDI channels. In this way, a single multi-timbral instrument can act as 16 monophonic instruments, although on cheaper systems all of these voices may be combined to one audio output.

Mono mode tends to be used mostly on MIDI guitar synthesizers because each string can then have its own channel and each can control its own set of pitch bend and other parameters. The mode also has the advantage that it is possible to play in a truly legato fashion – that is with a smooth takeover between the notes of a melody – because the arrival of a second note message acts simply to change the pitch if the first one is still being held down, rather than retriggering the start of a note envelope.

The legato switch controller allows a similar type of playing in polyphonic modes by allowing new note messages only to change the pitch.

In poly mode the instrument will sound as many notes as it is able to at the same time. Instruments differ as to the action taken when the maximum number of simultaneous notes is exceeded: some will release the first note played in favor of the new note, whereas others will refuse to play the new note. Some may be able to route excess note messages to their MIDI OUT ports so that they can be played by a chained device. The more intelligent of them may look to see if the same note already exists in the notes currently sounding and only accept a new note if it is not already sounding. Even more intelligently, some devices may release the quietest note (that with the lowest velocity value), or the note furthest through its velocity envelope, to make way for a later arrival. It is also common to run a device in poly mode on more than one receive channel, provided that the software can handle the reception of multiple polyphonic channels. A multi-timbral sound generator may well have this facility, commonly referred to as ‘multi’ mode, making it act as if it were a number of separate instruments each receiving on a separate channel. In multi mode a device may be able to dynamically assign its polyphony between the channels and voices in order that the user does not need to assign a fixed polyphony to each voice.

Program change

The program change message is used most commonly to change the ‘patch’ of an instrument or other device. A patch is a stored configuration of the device, describing the setup of the tone generators in a synthesizer and the way in which they are interconnected. Program change is channel specific and there is only a single data byte associated with it, specifying to which of 128 possible stored programs the receiving device should switch. On non-musical devices such as effects units, the program change message is often used to switch between different effects and the different effects programs may be mapped to specific program change numbers. The message takes the general form:

[&Cn] [Program number]

If a program change message is sent to a musical device it will usually result in a change of voice, as long as this facility is enabled. Exactly which voice corresponds to which program change number depends on the manufacturer. It is quite common for some manufacturers to implement this function in such a way that a data value of zero gives voice number one. This results in a permanent offset between the program change number and the voice number, which should be taken into account in any software. On some instruments, voices may be split into a number of ‘banks’ of 8, 16 or 32, and higher banks can be selected over MIDI by setting the program change number to a value which is 8, 16 or 32 higher than the lowest bank number. For example, bank 1, voice 2, might be selected by program change &01, whereas bank 2, voice 2, would probably be selected in this case by program change &11, where there were 16 voices per bank.

There are also a number of other approaches used in commercial sound modules. Where more than 128 voices need to be addressed remotely, the more recent ‘bank select’ command may be implemented.
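As a sketch of the sequence involved (illustrative only, and assuming the common zero-offset convention mentioned above), a bank select on controller &00 can be sent immediately before the program change; many devices also expect the bank select LSbyte on controller &20, which is omitted here:

def select_voice(channel: int, program: int, bank=None) -> bytes:
    msg = b""
    if bank is not None:
        # Bank select MSbyte (controller &00) on the same channel.
        msg += bytes([0xB0 | (channel - 1), 0x00, bank & 0x7F])
    # Program change (status &Cn) with a single data byte.
    msg += bytes([0xC0 | (channel - 1), program & 0x7F])
    return msg

# Program 2 in bank 3 on channel 1, counting programs and banks from zero:
print(select_voice(1, 1, bank=2).hex())   # 'b00002c001'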

Channel aftertouch

Most instruments use a single sensor, often in the form of a pressure-sensitive conductive plastic bar running the length of the keyboard, to detect the pressure applied to keys at the bottom of their travel. In the case of channel aftertouch, one message is sent for the entire instrument and this will correspond to an approximate total of the pressure over the range of the keyboard, the strongest influence being from the key pressed the hardest. (Some manufacturers have split the pressure detector into upper and lower keyboard regions, and some use ‘intelligent’ zoning.) The message takes the general form:

[&Dn] [Pressure value]

There is only one data byte, so there are 128 possible values and, as with the polyphonic version, many messages may be sent as the pressure is varied at the bottom of a key’s travel. Controller ‘thinning’ may be used to reduce the quantity of these messages, as described above.

Pitch bend wheel

The pitch wheel message has a status byte of its own, and carries information about the movement of the sprung-return control wheel on many keyboards which modifies the pitch of any note(s) played. It uses two data bytes in order to give 14 bits of resolution, in much the same way as the continuous controllers, except that the pitch wheel message carries both bytes together. Fourteen data bits are required so that the pitch appears to change smoothly, rather than in steps (as it might with only seven bits). The pitch bend message is channel specific so ought to be sent separately for each individual channel. This becomes important when using a single multi-timbral device in mono mode (see above), as one must ensure that a pitch bend message only affects the notes on the intended channel. The message takes the general form:

[&En] [LSbyte] [MSbyte]

The value of the pitch bend controller should be halfway between the lower and upper range limits when it is at rest in its sprung central position, thus allowing bending both down and up. This corresponds to a hex value of &2000, transmitted as [&En] [&00] [&40]. The range of pitch controlled by the bend message is set on the receiving device itself, or using the RPC designated for this purpose (see ‘Control change’, above).
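A small sketch of the encoding (illustrative only): the 14 bit value is split into LSbyte and MSbyte, and the at-rest value of &2000 produces exactly the message quoted above.

def pitch_bend(channel: int, value: int) -> bytes:
    # value is 0-16383; 8192 (&2000) is the central, at-rest position.
    lsb, msb = value & 0x7F, (value >> 7) & 0x7F
    return bytes([0xE0 | (channel - 1), lsb, msb])

print(pitch_bend(1, 0x2000).hex())   # 'e00040'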

System exclusive

A system exclusive message is one that is unique to a particular manufacturer and often a particular instrument. The only thing that is defined about such messages is how they are to start and finish, with the exception of the use of system exclusive messages for universal information, as discussed elsewhere. System exclusive messages generated by a device will naturally be produced at the MIDI OUT, not at the THRU, so a deliberate connection must be made between the transmitting device and the receiving device before data transfer may take place.

Occasionally it is necessary to make a return link from the OUT of the receiver to the IN of the transmitter so that two-way communication is possible and so that the receiver can control the flow of data to some extent by telling the transmitter when it is ready to receive and when it has received correctly (a form of handshaking).

The message takes the general form:

[&F0] [ident.] [data] [data] … [&F7]

where [ident.] identifies the relevant manufacturer ID, a number defining which manufacturer’s message is to follow. Originally, manufacturer IDs were a single byte but the number of IDs has been extended by setting aside the [00] value of the ID to indicate that two further bytes of ID follow. Manufacturer IDs are therefore either one or three bytes long. A full list of manufacturer IDs is available from the MMA.

Data of virtually any sort can follow the ID. It can be used for a variety of miscellaneous purposes that have not been defined in the MIDI standard and the message can have virtually any length that the manufacturer requires. It is often split into packets of a manageable size in order not to cause receiver memory buffers to overflow. Exceptions are data bytes that look like other MIDI status bytes (except real-time messages), as they will naturally be interpreted as such by any receiver, which might terminate reception of the system exclusive message. The message should be terminated with &F7, although this is not always observed, in which case the receiving device should ‘time-out’ after a given period, or terminate the system exclusive message on receipt of the next status byte. It is recommended that some form of error checking (typically a checksum) is employed for long system exclusive data dumps, and many systems employ means of detecting whether the data has been received accurately, asking for retries of sections of the message in the event of failure, via a return link to the transmitter.
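The framing can be sketched as follows (illustrative only; any checksum scheme is manufacturer specific and is not added here). The example uses the non-commercial ID &7D described in the next section, and checks that all ID and data bytes have their MSB clear:

def system_exclusive(manufacturer_id: bytes, payload: bytes) -> bytes:
    # Wrap a payload in a system exclusive frame: &F0 [ID] [data...] &F7.
    # The ID may be one byte, or three bytes beginning with &00.
    if any(b & 0x80 for b in manufacturer_id + payload):
        raise ValueError("ID and data bytes must lie in the range &00-&7F")
    return bytes([0xF0]) + bytes(manufacturer_id) + bytes(payload) + bytes([0xF7])

# A made-up message under the non-commercial ID (&7D):
print(system_exclusive(b"\x7d", bytes([0x01, 0x02, 0x03])).hex())   # 'f07d010203f7'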

Examples of applications for such messages can be seen in the form of sample data dumps (from a sampler to a computer and back for editing purposes), although this is painfully slow, and voice data dumps (from a synthesizer to a computer for storage and editing of user-programmed voices). There are now an enormous number of uses of system exclusive messages, both in the universal categories and in the manufacturer categories.

Universal system exclusive messages

The three highest numbered IDs within the system exclusive message have been set aside to denote special modes. These are the ‘universal non-commercial’ messages (ID: &7D), the ‘universal non-real-time’ messages (ID: &7E) and the ‘universal real-time’ messages (ID: &7F). Universal sysex messages are often used for controlling device parameters that were not originally specified in the MIDI standard and that now need addressing in most devices. Examples are things like ‘chorus modulation depth’, ‘reverb type’ and ‘master fine tuning’.

Universal non-commercial messages are set aside for educational and research purposes and should not be used in commercial products. Universal non-real-time messages are used for universal system exclusive events which are not time critical and universal real-time messages deal with time critical events (thus being given a higher priority). The two latter types of message normally take the general form of:

&[F0] [ID] [dev. ID] [sub-ID 1] [sub-ID 2] [data]…[F7]

Device ID used to be referred to as ‘channel number’, but this did not really make sense since a whole byte allows for the addressing of 128 channels and this does not correspond to the normal 16 channels of MIDI. The term ‘device ID’ is now used widely by software as a means of defining one of a number of physical devices in a large MIDI system, rather than defining a MIDI channel number. It should be noted, though, that it is allowable for a device to have more than one ID if this seems appropriate. Modern MIDI devices will normally allow their device ID to be set either over MIDI or from the front panel. The use of &7F in this position signifies that the message applies to all devices as opposed to just one.

The sub-IDs are used to identify first the category or application of the message (sub-ID #1) and secondly the type of message within that category (sub-ID #2). For some reason, the original MIDI sample dump messages do not use the sub-ID #2, although some recent additions to the sample dump do.

Tune request

Older analog synthesizers tended to drift somewhat in pitch over the time that they were turned on. The tune request is a request for these synthesizers to retune themselves to a fixed reference. (It is advisable not to transmit pitch bend or note on messages to instruments during a tune-up because of the unpredictable behavior of some products under these conditions.)

Active sensing

Active sensing messages are single status bytes sent roughly three times per second by a controlling device when there is no other activity on the bus. They act as a means of reassuring the receiving devices that the controller has not disappeared. Not all devices transmit active sensing information, and a receiver’s software should be able to detect the presence or lack of it. If a receiver has come to expect active sensing bytes then it will generally act by turning off all notes if these bytes disappear for any reason. This can be a useful function when a MIDI cable has been pulled out during a transmission, as it ensures that notes will not be left sounding for very long. If a receiver has not seen active sensing bytes since last turned on, it should assume that they are not being used.

Reset

This message resets all devices on the bus to their power-on state. The process may take some time and some devices mute their audio outputs, which can result in clicks, so the message should be used with care.

MIDI CONTROL OF SOUND GENERATORS

MIDI note assignment in synthesizers and samplers

Many of the replay and signal processing aspects of synthesis and sampling now overlap so that it is more difficult to distinguish between the two. In basic terms a sampler is a device that stores short clips of sound data in RAM, enabling them to be replayed subsequently at different pitches, possibly looped and processed. A synthesizer is a device that enables signals to be artificially generated and modified to create novel sounds. Wavetable synthesis is based on a similar principle to sampling, though, and stored samples can form the basis for synthesis. A sound generator can often generate a number of different sounds at the same time. It is possible that these sounds could be entirely unrelated (perhaps a single drum, an animal noise and a piano note), or that they might have some relationship to each other (perhaps a number of drums in a kit, or a selection of notes from a grand piano). The method by which sounds or samples are assigned to MIDI notes and channels is defined by the replay program.

The most common approach when assigning note numbers to samples is to program the sampler with the range of MIDI note numbers over which a certain sample should be sounded. Akai, one of the most popular sampler manufacturers, called these ‘keygroups’. It may be that this ‘range’ is only one note, in which case the sample in question would be triggered only on receipt of that note number, but in the case of a range of notes the sample would be played on receipt of any note in the range. In the latter case transposition would be required, depending on the relationship between the note number received and the original note number given to the sample (see above). A couple of examples highlight the difference in approach, as shown in Figure 14.9. In the first example, illustrating a possible approach to note assignment for a collection of drum kit sounds, most samples are assigned to only one note number, although it is possible for tuned drum sounds such as tom-toms to be assigned over a range in order to give the impression of ‘tuned toms’. Each MIDI note message received would replay the particular percussion sound assigned to that note number in this example.

In the second example, illustrating a suggested approach to note assignment for an organ, notes were originally sampled every musical fifth across the organ’s note range. The replay program has been designed so that each of these samples is assigned to a note range of a fifth, centered on the original pitch of each sample, resulting in a maximum transposition of a third up or down. Ideally, of course, every note would have been sampled and assigned to an individual note number on replay, but this requires very large amounts of memory and painstaking sample acquisition in the first place.
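The principle can be sketched in Python as follows (the note ranges, root notes and sample names are invented for illustration): each keygroup stores the range of note numbers it covers and the original note of its sample, and the replay pitch ratio is derived from the difference between the received note number and the original note, assuming equal temperament.

# (low note, high note, original note, sample name) - illustrative values only
keygroups = [
    (36, 42, 39, "organ_C2"),
    (43, 49, 46, "organ_G2"),
    (50, 56, 53, "organ_D3"),
]

def replay(note):
    for low, high, original, name in keygroups:
        if low <= note <= high:
            # Ratio by which the stored sample is sped up or slowed down (2^(1/12) per semitone)
            return name, 2 ** ((note - original) / 12)
    return None   # note number falls outside all keygroups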

In further pursuit of sonic accuracy, some devices provide the facility for introducing a crossfade between note ranges. This is used where an abrupt change in the sound at the boundary between two note ranges might be undesirable, allowing the takeover from one sample to another to be more gradual. For example, in the organ scenario introduced above, the timbre could change noticeably when playing musical passages that crossed between two note ranges because replay would switch from the upper limit of transposition of one sample to the lower limit of the next (or vice versa). In this case the ranges for the different samples are made to overlap (as illustrated in Figure 14.10). In the overlap range the system mixes a proportion of the two samples together to form the output. The exact proportion depends on the range of overlap and the note’s position within this range. Very accurate tuning of the original samples is needed in order to avoid beats when using positional crossfades. Clearly this approach would be of less value when each note was assigned to a completely different sound, as in the drum kit example.

image

FIGURE 14.9 (a) Percussion samples are often assigned to one note per sample, except for tuned percussion which sometimes covers a range of notes. (b) Organ samples could be transposed over a range of notes, centered on the original pitch of the sample.

Crossfades based on note velocity allow two or more samples to be assigned to one note or range of notes. This requires at least a ‘loud sample’ and a ‘soft sample’ to be stored for each original sound and some systems may accommodate four or more to be assigned over the velocity range. The terminology may vary, but the principle is that a velocity value is set at which the replay switches from one stored sample to another, as many instruments sound quite different when they are loud to when they are soft (it is more than just the volume that changes: it is the timbre also). If a simple switching point is set, then the change from one sample to the other will be abrupt as the velocity crosses either side of the relevant value. This can be illustrated by storing two completely different sounds as the loud and soft samples, in which case the output changes from one to the other at the switching point. A more subtle effect is achieved by using velocity crossfading, in which the proportion of loud and soft samples varies depending on the received note velocity value. At low velocity values the proportion of the soft sample in the output would be greatest and at high values the output content would be almost entirely made up of the loud sample (see Figure 14.11).
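The difference between a hard velocity switch and a linear velocity crossfade between two stored samples can be sketched as follows (the switch point and the linear crossfade law are arbitrary illustrative choices, not values mandated by any standard):

SWITCH_POINT = 64   # arbitrary illustrative value

def velocity_switch(velocity):
    # Abrupt change from the soft to the loud sample either side of the switch point
    return {"soft": 1.0, "loud": 0.0} if velocity < SWITCH_POINT else {"soft": 0.0, "loud": 1.0}

def velocity_crossfade(velocity):
    # The higher the received velocity, the greater the proportion of the loud sample
    loud = velocity / 127.0
    return {"soft": 1.0 - loud, "loud": loud}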

image

FIGURE 14.10 Overlapped sample ranges can be crossfaded in order that a gradual shift in timbre takes place over the region of takeover between one range and the next.

image

FIGURE 14.11 Illustration of velocity switch and velocity crossfade between two stored samples (‘soft’ and ‘loud’) over the range of MIDI note velocity values.

Polyphony, voice and note assignment

Modern sound modules (synthesizers and samplers) tend to be multi-note polyphonic. When the polyphony of a device is exceeded the device should follow a predefined set of rules to determine what to do with the extra notes. Typically a sound module will either release the ‘oldest’ notes first, or possibly release the quietest. Alternatively, new notes that exceed the polyphony will simply not be sounded until others are released. Rules for this are defined in some of the recent General MIDI specifications (see below), and composers may now even be able to exercise some control over what happens in devices with limited polyphony.

It is important to distinguish between the degree of polyphony offered by a device and the number of simultaneous voices it can generate. Sometimes these may be traded off against each other in multi-timbral devices, by allocating a certain number of notes to each voice, with the total adding up to the total polyphony. Either 16 notes could be allocated to one voice or four notes to each of four voices, for example. Dynamic allocation is often used to distribute the polyphony around the voices depending on demand and this is a particular feature of General MIDI sound modules.

A multi-timbral sound generator is one that is capable of generating more than one voice at a time, independent of polyphony considerations. A voice is a particular sound type, such as ‘grand piano’ or ‘accordion’. This capability is now the norm for modern sound modules. Older synthesizers used to be able to generate only one or two voices at a time, possibly allowing a keyboard split, and could sometimes make use of MIDI channel mode 4 (monophonic, omni off) to allow multiple monophonic voices to be generated under MIDI control. They tended only to receive polyphonically on one MIDI channel at a time. More recent systems are capable of receiving on all 16 MIDI channels simultaneously, with each channel controlling an entirely independent polyphonic voice.

MIDI functions of sound generators

The MIDI implementation for a particular sound generator should be described in the manual that accompanies it. A MIDI implementation chart indicates which message types are received and transmitted, together with any comments relating to limitations or unusual features. Functions such as note off velocity and polyphonic aftertouch, for example, are quite rare. It is quite common for a device to be able to accept certain data and act upon it, even if it cannot generate such data from its own controllers. The note range available under MIDI control compared with that available from a device’s keyboard is a good example of this, since many devices will respond to note data over a full ten octave range yet still have only a limited (or no) keyboard. This approach can be used by a manufacturer who wishes to make a cheaper synthesizer that omits the expensive physical sensors for such things as velocity and aftertouch, whilst retaining these functions in software for use under MIDI control. Devices conforming to the General MIDI specification described below must conform to certain basic guidelines concerning their MIDI implementation and the structure of their sound generators.

MIDI data buffers and latency

All MIDI-controlled equipment uses some form of data buffering for received MIDI messages. Such buffering acts as a temporary store for messages that have arrived but have not yet been processed and allows for a certain prioritization in the handling of received messages. Cheaper devices tend to have relatively small MIDI input buffers and these can overflow easily unless care is taken in the filtering and distribution of MIDI data around a large system (usually accomplished by a MIDI router or multiport interface). When a buffer overflows it will normally result in an error message being displayed on the front panel of the device, indicating that some MIDI data is likely to have been lost. More advanced equipment can store more MIDI data in its input buffer, although this is not necessarily desirable because many messages that are transmitted over MIDI are intended for ‘real-time’ execution and one would not wish them to be delayed in a temporary buffer. Such buffer delay is one potential cause of latency in MIDI systems. A more useful solution would be to speed up the rate at which incoming messages are processed.

Handling of velocity and aftertouch data

Sound generators able to respond to note on velocity will use the value of this byte to control assigned functions within the sound generators. It is common for the user to be able to program the device such that the velocity value affects certain parameters to a greater or lesser extent. For example, it might be decided that the ‘brightness’ of the sound should increase with greater key velocity, in which case it would be necessary to program the device so that the envelope generator that affected the brightness was subject to control by the velocity value. This would usually mean that the maximum effect of the envelope generator would be limited by the velocity value, such that it could only reach its full programmed effect (that which it would give if not subject to velocity control) if the velocity was also maximum. The exact law of this relationship is up to the manufacturer and may be used to simulate different types of ‘keyboard touch’. A device may offer a number of laws or curves relating changes in velocity to changes in the control value, or the received velocity value may be used to scale the preset parameter rather than replace it.
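As a simple example of such a scaling law, the following sketch scales a programmed envelope depth by the received velocity, with a selectable exponent standing in for the different ‘touch’ curves a device might offer (the curve shapes are purely illustrative; real devices use manufacturer-specific laws):

def scaled_envelope_depth(programmed_depth, velocity, curve=1.0):
    # programmed_depth: the full effect the envelope would have without velocity control
    # velocity: received MIDI velocity (1-127); curve: 1.0 = linear, other values bend the law
    normalised = max(0.0, min(velocity / 127.0, 1.0))
    return programmed_depth * (normalised ** curve)

# Only at maximum velocity does the envelope reach its full programmed effect
assert scaled_envelope_depth(100, 127) == 100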

Another common application of velocity value is to control the amplitude envelope of a particular sound, such that the output volume depends on how hard the key is hit. In many synthesizer systems that use multiple interacting digital oscillators, these velocity-sensitive effects can all be achieved by applying velocity control to the envelope generator of one or more of the oscillators, as indicated earlier in this chapter.

Note off velocity is not implemented in many keyboards, and most musicians are not used to thinking about what they do as they release a key, but this parameter can be used to control such factors as the release time of the note or the duration of a reverberation effect. Aftertouch (either polyphonic or channel, as described on page 429) is often used in synthesizers to control the application of low-frequency modulation (tremolo or vibrato) to a note. Sometimes aftertouch may be applied to other parameters, but this is less common.

Handling of controller messages

The controller messages that begin with a status of &Bn turn up in various forms in sound generator implementations. It should be noted that although there are standard definitions for many of these controller numbers it is often possible to remap them either within sequencer software or within sound modules themselves. Fourteen bit continuous controllers are rarely encountered for any parameter and often only the MSbyte of the controller value (which uses the first 32 controller numbers) is sent and used. For most parameters the 128 increments that result are adequate.

Controllers &07 (Volume) and &0A (Pan) are particularly useful with sound modules as a means of controlling the internal mixing of voices. These controllers work on a per channel basis, and are independent of any velocity control which may be related to note volume. There are two universal real-time system exclusive messages that handle similar functions to these, but for the device as a whole rather than for individual voices or channels. The ‘master volume’ and ‘master balance’ controls are accessed using:

&[F0] [7F] [dev. ID] [04] [01 or 02] [data] [data] [F7]

where the sub-ID #1 of &04 represents a ‘device control’ message and sub-ID #2 values of &01 or &02 select volume or balance respectively. The [data] values allow 14 bit resolution for the parameters concerned, transmitted LSB first. Balance is different to pan because pan sets the stereo positioning (the split in level between left and right) of a mono source, whereas balance sets the relative levels of the left and right channels of a stereo source (see Figure 14.12). Since a pan or balance control is used to shift the stereo image either left or right from a center detent position, the MIDI data values representing the setting are ranged either side of a mid-range value that corresponds to the center detent. The channel pan controller is thus normally centered at a data value of 63 (and sometimes over a range of values just below this if the pan has only a limited number of steps), assuming that only a single 7 bit controller value is sent. There may be fewer steps in these controls than there are values of the MIDI controller, depending on the device in question, resulting in a range of controller values that will give rise to the same setting.
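A sketch of how such a message might be assembled (device ID &7F is used here so that the message applies to all devices, and the 14 bit value is split LSB first as described above):

def master_control(value, control="volume", device_id=0x7F):
    # value: 0-16383; control: 'volume' or 'balance'
    sub_id2 = {"volume": 0x01, "balance": 0x02}[control]
    lsb, msb = value & 0x7F, (value >> 7) & 0x7F
    return bytes([0xF0, 0x7F, device_id, 0x04, sub_id2, lsb, msb, 0xF7])

full_volume = master_control(0x3FFF, "volume")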

Some manufacturers have developed alternative means of expressive control for synthesizers, such as the ‘breath controller’, a device that responds to the blowing effort applied by the player’s mouth. It was intended to allow wind players to have more control over expression in performance. Plugged into the synthesizer, it can be applied to various envelope generator or modulator parameters to affect the sound. The breath controller also has its own MIDI controller number. There is also a portamento controller (&54) that defines a note number from which the next note should slide. It is normally transmitted between two note on messages to create an automatic legato portamento effect between two notes.

image

FIGURE 14.12 (a) A pan control takes a mono input and splits it two ways (left and right), the stereo position depending on the level difference between the two channels. The attenuation law of pan controls is designed to result in a smooth movement of the source across the stereo ‘picture’ between left and right, with no apparent rise or fall in overall level when the control is altered. A typical pan control gain law is shown below. (b) A balance control simply adjusts the relative level between the two channels of a stereo signal so as to shift the entire stereo image either left or right.

The ‘effects’ and ‘sound’ controllers have been set aside as a form of general purpose control over aspects of the built-in effects and sound quality of a device. How they are applied will depend considerably on the architecture of the sound module and the method of synthesis used, but they give some means by which a manufacturer can provide a more abstracted form of control over the sound without the user needing to know precisely which voice parameters to alter. In this way, a user who is not prepared to get into the increasingly complicated world of voice programming can modify sounds to some extent.

The effects controllers occupy five controller numbers from &5B to &5F and are defined as Effects Depths 1–5. The default names for the effects to be controlled by these messages are respectively ‘External Effects Depth’, ‘Tremolo Depth’, ‘Chorus Depth’, ‘Celeste (Detune) Depth’ and ‘Phaser Depth’, although these definitions are open to interpretation and change by manufacturers. There are also ten sound controllers that occupy controller numbers from &46 to &4F. Again these are user or manufacturer definable, but five defaults were originally specified (listed in Table 14.4). They are principally intended as real-time controllers to be used during performance, rather than as a means of editing internal voice patches (the RPCs and NRPCs can be used for this as described in Fact File 14.4).

Table 14.4 Sound controller functions (byte 2 of status &Bn)

MIDI controller number Function (default)
&46 Sound variation
&47 Timbre/harmonic content
&48 Release time
&49 Attack time
&4A Brightness
&4B–4F No default

The sound variation controller is interesting because it is designed to allow the selection of one of a number of variants on a basic sound, depending on the data value that follows the controller number. For example, a piano sound might have variants of ‘honky tonk’, ‘soft pedal’, ‘lid open’ and ‘lid closed’. The data value in the message is not intended to act as a continuous controller for certain voice parameters, rather the different data values possible in the message are intended to be used to select certain preprogrammed variations on the voice patch. If there are fewer than the 128 possible variants on the voice then the variants should be spread evenly over the number range, so that an equal range of data values corresponds to each one.

The timbre and brightness controllers can be used to alter the spectral content of the sound. The timbre controller is intended to be used specifically for altering the harmonic content of a sound, whilst the brightness controller is designed to control its high-frequency content. The envelope controllers can be used to modify the attack and release times of certain envelope generators within a synthesizer. Data values less than &40 attached to these messages should result in progressively shorter times, whilst values greater than &40 should result in progressively longer times.

Voice selection

The program change message was adequate for a number of years as a means of selecting one of a number of stored voice patches on a sound generator. Program change on its own allows for up to 128 different voices to be selected and a synthesizer or sound module may allow a program change map to be set up in order that the user may decide which voice is selected on receipt of a particular message. This can be particularly useful when the module has more than 128 voices available, but no other means of selecting voice banks. A number of different program change maps could be stored, perhaps to be selected under system exclusive control.

FACT FILE 14.4 REGISTERED AND NON-REGISTERED PARAMETER NUMBERS

The MIDI standard was extended a few years ago to allow for the control of individual internal parameters of sound generators by using a specific control change message. This meant, for example, that any aspect of a voice, such as the velocity sensitivity of an envelope generator, could be assigned a parameter number that could then be accessed over MIDI and its setting changed, making external editing of voices much easier. Parameter controllers are a subset of the control change message group, and they are divided into the registered and non-registered numbers (RPNs and NRPNs). RPNs are intended to apply universally and should be registered with the MMA, whilst NRPNs may be manufacturer specific. Only five parameter numbers were originally registered as RPNs, as shown in the table, but more may be added at any time and readers are advised to check the most recent revisions of the MIDI standard.

Some examples of RPC definitions

RPC number (hex) Parameter
00 00 Pitch bend sensitivity
00 01 Fine tuning
00 02 Coarse tuning
00 03 Tuning program select
00 04 Tuning bank select
7F 7F Cancels RPN or NRPN (usually follows Message 3)

Parameter controllers operate by specifying the address of the parameter to be modified, followed by a control change message to increment or decrement the setting concerned. It is also possible to use the data entry slider controller to alter the setting of the parameter. The address of the parameter is set in two stages, with an MSbyte and then an LSbyte message, so as to allow for 16 384 possible parameter addresses. The controller numbers &62 and &63 are used to set the LS- and MSbytes respectively of an NRPN, whilst &64 and &65 are used to address RPNs. The sequence of messages required to modify a parameter is as follows:

Message 1

&[Bn] [62 or 64] [LSB]

Message 2

&[Bn] [63 or 65] [MSB]

Message 3

&[Bn] [60 or 61] [7F] or &[Bn] [06] [DATA] [26] [DATA]

Message 3 represents either data increment (&60) or decrement (&61), or a 14 bit data entry slider control change with MSbyte (&06) and LSbyte (&26) parts (assuming running status). If the control has not moved very far, it is possible that only the MSbyte message need be sent.
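As a worked example, the following sketch emits the sequence just described to set the pitch bend sensitivity RPN (00 00) to two semitones, using full status bytes rather than running status and finishing with the null (7F 7F) parameter number to close data entry:

def set_rpn(channel, rpn_msb, rpn_lsb, data_msb, data_lsb=0x00):
    ch = 0xB0 | (channel & 0x0F)
    return bytes([
        ch, 0x64, rpn_lsb,    # Message 1: RPN LSbyte
        ch, 0x65, rpn_msb,    # Message 2: RPN MSbyte
        ch, 0x06, data_msb,   # Message 3: data entry MSbyte
        ch, 0x26, data_lsb,   #            data entry LSbyte
        ch, 0x65, 0x7F,       # null RPN (7F 7F) cancels further data entry
        ch, 0x64, 0x7F,
    ])

pitch_bend_range = set_rpn(0, 0x00, 0x00, 0x02)   # two semitones on channel 1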

Modern sound modules tend to have very large patch memories – often too large to be adequately addressed by 128 program change messages. Although some older synthesizers used various odd ways of providing access to further banks of voices, most modern modules have implemented the standard ‘bank select’ approach. In basic terms, ‘bank select’ is a means of extending the number of voices that may be addressed by preceding a standard program change message with a message to define the bank from which that program is to be recalled. It uses a 14 bit control change message, with controller numbers &00 and &20, to form a 14 bit bank address, allowing 16384 banks to be addressed. The bank number is followed directly by a program change message, thus creating the following general message:

&[Bn] [00] [MSbyte (of bank)]

&[Bn] [20] [LSbyte]

&[Cn] [Program number]
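A sketch of the complete sequence (the bank and program numbers passed in are arbitrary):

def select_voice(channel, bank, program):
    ch = channel & 0x0F
    return bytes([
        0xB0 | ch, 0x00, (bank >> 7) & 0x7F,   # bank select MSbyte
        0xB0 | ch, 0x20, bank & 0x7F,          # bank select LSbyte
        0xC0 | ch, program & 0x7F,             # program change within that bank
    ])

message = select_voice(0, 2, 40)   # bank 2, program 40 on channel 1 (illustrative values)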

GENERAL MIDI

One of the problems with MIDI sound generators is that although voice patches can be selected using MIDI program change commands, there is no guarantee that a particular program change number will recall a particular voice on more than one instrument. In other words, program change 3 may correspond to ‘alto sax’ on one instrument and ‘grand piano’ on another. This makes it difficult to exchange songs between systems with any hope of the replay sounding the same as intended by the composer. General MIDI is an approach to the standardization of a sound generator’s behavior, so that MIDI files (see Fact File 14.5) can be exchanged more easily between systems and device behavior can be predicted by controllers. It comes in three flavors: GM 1, GM Lite and GM 2.

General MIDI Level 1 specifies a standard voice map and a minimum degree of polyphony, requiring that a sound generator should be able to receive MIDI data on all 16 channels simultaneously and polyphonically, with a different voice on each channel. There is also a requirement that the sound generator should support percussion sounds in the form of drum kits, so that a General MIDI sound module is capable of acting as a complete ‘band in a box’.

Dynamic voice allocation is the norm in GM sound modules, with a requirement either for at least 24 dynamically allocated voices in total, or 16 for melody and eight for percussion. Voices should all be velocity sensitive and should respond at least to the controller messages 1, 7, 10, 11, 64, 121 and 123 (decimal), RPNs 0, 1 and 2 (see above), pitch bend and channel aftertouch. In order to ensure compatibility between sequences that are replayed on GM modules, percussion sounds are always allocated to MIDI channel 10. Program change numbers are mapped to specific voice names, with ranges of numbers allocated to certain types of sounds, as shown in Table 14.5. Precise voice names may be found in the GM documentation. Channel 10, the percussion channel, has a defined set of note numbers on which particular sounds are to occur, so that the composer may know, for example, that key 39 will always be a ‘hand clap’.

FACT FILE 14.5 STANDARD MIDI FILES (SMF)

Sequencers and notation packages typically store data on disk in their own unique file formats. The standard MIDI file was developed in an attempt to make interchange of information between packages more straightforward and it is now used widely in the industry in addition to manufacturers’ own file formats. It is rare now not to find a sequencer or notation package capable of importing and exporting standard MIDI files. MIDI files are most useful for the interchange of performance and control information. They are not so useful for music notation where it is necessary to communicate greater detail about the way music appears on the stave and other notational concepts. For the latter purpose a number of different file formats have been developed, including Music XML which is among the most widely used of the universal interchange formats today. Further information about Music XML resources and other notation formats may be found in the ‘Recommended further reading’ at the end of this chapter.

Three types of standard MIDI file exist to encourage the interchange of sequencer data between software packages. The MIDI file contains data representing events on individual sequencer tracks, as well as labels such as track names, instrument names and time signatures. File type 0 is the simplest and is used for single-track data, whilst file type 1 supports multiple tracks which are ‘vertically’ synchronous with each other (such as the parts of a song).

File type 2 contains multiple tracks that have no direct timing relationship and may therefore be asynchronous. Type 2 could be used for transferring song files made up of a number of discrete sequences, each with a multiple track structure. The basic file format consists of a number of 8 bit words formed into chunk-like parts, very similar to the RIFF and AIFF audio file formats described in Chapter 9. SMFs are not exactly RIFF files though, because they do not contain the highest level FORM chunk. (To encapsulate SMFs in a RIFF structure, use the RMID format.)

The header chunk, which always heads a MIDI file, contains global information relating to the whole file, whilst subsequent track chunks contain event data and labels relating to individual sequencer tracks. Track data should be distinguished from MIDI channel data, since a sequencer track may address more than one MIDI channel. Each chunk is preceded by a preamble of its own, which specifies the type of chunk (header or track) and the length of the chunk in terms of the number of data bytes that are contained in the chunk. There then follow the designated number of data bytes (see the figure below). The chunk preamble contains 4 bytes to identify the chunk type using ASCII representation and 4 bytes to indicate the number of data bytes in the chunk (the length). The number of bytes indicated in the length does not include the preamble (which is always 8 bytes).

image
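A minimal Python sketch of the chunk structure just described, which reads the preamble of each chunk in a standard MIDI file without interpreting the event data itself:

import struct

def read_chunks(path):
    chunks = []
    with open(path, "rb") as f:
        while True:
            preamble = f.read(8)              # 4 byte ASCII type plus 4 byte length
            if len(preamble) < 8:
                break
            chunk_type, length = struct.unpack(">4sI", preamble)
            chunks.append((chunk_type.decode("ascii"), f.read(length)))
    return chunks

# The first chunk should be the 'MThd' header, followed by one or more 'MTrk' track chunks.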

General MIDI sound modules may operate in modes other than GM, where voice allocations may be different, and there are two universal non-real-time SysEx messages used to turn GM on or off. These are:

&[F0] [7E] [dev. ID] [09] [01] [F7]

Table 14.5 General MIDI program number ranges (except channel 10)

Program change (decimal) Sound type
0–7 Piano
8–15 Chromatic percussion
16–23 Organ
24–31 Guitar
32–39 Bass
40–47 Strings
48–55 Ensemble
56–63 Brass
64–71 Reed
72–79 Pipe
80–87 Synth lead
88–95 Synth pad
96–103 Synth effects
104–111 Ethnic
112–119 Percussive
120–127 Sound effects

to turn GM on, and:

&[F0] [7E] [dev. ID] [09] [02] [F7]

to turn it off.

There is some disagreement over the definition of ‘voice’, as in ‘24 dynamically allocated voices’ – the requirement that dictates the degree of polyphony supplied by a GM module. The spirit of the GM specification suggests that 24 notes should be capable of sounding simultaneously, but some modules combine sound generators to create composite voices, thereby reducing the degree of note polyphony.

General MIDI Lite (GML) is a cut-down GM 1 specification designed mainly for use on mobile devices with limited processing power. It can be used for things like ring tones on mobile phones and for basic music replay from PDAs. It specifies a fixed polyphony of 16 simultaneous notes, with 15 melodic instruments and one percussion kit on channel 10. The voice map is the same as GM Level 1. It also supports basic control change messages and the pitch-bend sensitivity RPN. As a rule, GM Level 1 songs will usually replay on GM Lite devices with acceptable quality, although some information may not be reproduced. An alternative to GM Lite is SPMIDI (see next section) which allows greater flexibility.

GM Level 2 is backwards compatible with Level 1 (GM 1 songs will replay correctly on GM 2 devices) but allows the selection of voice banks and extends polyphony to 32 voices. Percussion kits can run on channel 11 as well as the original channel 10. It adds MIDI tuning, RPN controllers and a range of universal system exclusive messages to the MIDI specification, enabling a wider range of control and greater versatility.

SCALABLE POLYPHONIC MIDI (SPMIDI)

SPMIDI, rather like GM Lite, is designed principally for mobile devices that have issues with battery life and processing power. It has been adopted by the 3GPP wireless standards body for structured audio control of synthetic sounds in ring tones and multimedia messaging. It was developed primarily by Nokia and Beatnik. The SPMIDI basic specification for a device is based on GM Level 2, but a number of selectable profiles are possible, with different levels of sophistication.

The idea is that rather than fixing the polyphony at 16 voices the polyphony should be scalable according to the device profile (a description of the current capabilities of the device). SPMIDI also allows the content creator to decide what should happen when polyphony is limited – for example, what should happen when only four voices are available instead of 16. Conventional ‘note stealing’ approaches work by stealing notes from sounding voices to supply newly arrived notes, and the outcome of this can be somewhat arbitrary. In SPMIDI this is made more controllable. A process known as channel masking is used, whereby certain channels have a higher priority than others, enabling the content creator to put high priority material on particular channels. The channel priority order and maximum instantaneous polyphony are signaled to the device in a setup message at the initialization stage.
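The effect of channel masking can be sketched very roughly as follows (a simplified illustration rather than the actual SPMIDI algorithm): given a maximum instantaneous polyphony and a channel priority order from the setup message, an arriving note displaces a sounding note on the lowest-priority channel.

def allocate_note(sounding, new_note, channel_priority, max_polyphony):
    # sounding: list of (channel, note) pairs currently playing
    # channel_priority: channels listed highest priority first (assumed to cover every channel used)
    sounding = list(sounding) + [new_note]
    if len(sounding) <= max_polyphony:
        return sounding
    # Steal the note whose channel appears latest in the priority order
    victim = max(sounding, key=lambda cn: channel_priority.index(cn[0]))
    sounding.remove(victim)
    return sounding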

RMID AND XMF FILES

RMID is a version of the RIFF file structure that can be used to combine a standard MIDI file and a downloadable sound file (see Fact File 14.6) within a single structure. In this way all of the data required to replay a song using synthetic sounds can be contained within one file. RMID seems to have been superseded by another file format known as XMF (eXtensible Music Format) that is designed to contain all of the assets required to replay a music file. It is based on Beatnik’s RMF (Rich Music Format) which was designed to incorporate standard MIDI files and audio files such as MP3 and WAVE so that a degree of interactivity could be added to audio replay. RMF can also address a Special Bank of MIDI sounds (an extension of GM) in the Beatnik Audio Engine. XMF is now the MMA’s recommended way of combining such elements. It is more extensible than RMID and can contain WAVE files and other media elements for streamed or interactive presentations. XMF introduces concepts such as looping and branching into standard MIDI files. RMF included looping but did not incorporate DLS into the file format. In addition to the features just described, XMF can incorporate 40 bit encryption for advanced data security as well as being able to compress standard MIDI files by up to 5:1 and incorporate metadata such as rights information. So far, XMF Type 0 and Type 1 have been defined, both of which contain SMF and DLS data, and which are identical except that Type 0 MIDI data may be streamed.

FACT FILE 14.6 DOWNLOADABLE SOUNDS AND SOUNDFONTS

A gradual convergence may be observed in the industry between the different methods by which synthetic sounds can be described. These have been variously termed ‘Downloadable Sounds’, ‘SoundFonts’ and more recently ‘MPEG-4 Structured Audio Sample Bank Format’. Downloadable Sounds (DLS) is an MMA specification for synthetic voice description that enables synthesizers to be programmed using voice data downloaded from a variety of sources. In this way a content creator could not only define the musical structure of his or her content in a universally usable way, using standard MIDI files, but could also define the nature of the sounds to be used with downloadable sounds. In these ways content creators can specify more precisely how synthetic audio should be replayed, so that the end result can be more easily predicted across multiple rendering platforms.

The success of these approaches depends on ‘wavetable synthesis’. Here basic sound waveforms are stored in wavetables (simply tables of sample values) in RAM, to be read out at different rates and with different sample skip values, for replay at different pitches. Subsequent signal processing and envelope shaping can be used to alter the timbre and temporal characteristics. Such synthesis capabilities exist on the majority of computer sound cards, making it a realistic possibility to implement the standard widely.
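The read-out mechanism can be sketched very simply (no interpolation, looping or envelope shaping is shown; the table size and sample rate are arbitrary):

import math

TABLE_SIZE = 1024
wavetable = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]   # one stored cycle

def render(frequency, sample_rate=44100, num_samples=100):
    # Higher frequencies skip further through the table on each output sample
    increment = frequency * TABLE_SIZE / sample_rate
    phase, output = 0.0, []
    for _ in range(num_samples):
        output.append(wavetable[int(phase) % TABLE_SIZE])
        phase += increment
    return output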

DLS Level 1, version 1.1a, was published in 1999 and contains a specification for devices that can deal with DLS as well as a file format for containing the sound descriptions. The basic idea is that a minimal synthesis engine should be able to replay a looped sample from a wave table, apply two basic envelopes for pitch and volume, use low-frequency oscillator control for tremolo and vibrato, and respond to basic MIDI controls such as pitch bend and modulation wheel. There is no option to implement velocity crossfading or layering of sounds in DLS Level 1, but keyboard splitting into 16 ranges is possible.

DLS Level 2 is somewhat more advanced, requiring two six-segment envelope generators, two LFOs, a low-pass filter with resonance and dynamic cut-off frequency controls. It requires more memory for wavetable storage (2 MB), 256 instruments and 1024 regions, amongst other things. DLS Level 2 has been adopted as the MPEG-4 Structured Audio Sample Bank format.

Emu developed so-called SoundFonts for Creative Labs and these have many similar characteristics to downloadable sounds. They have been used widely to define synthetic voices for Sound Blaster and other computer sound cards. In fact the formats have just about been harmonized with the issue of DLS Level 2 which apparently contains many of the advanced features of SoundFonts. SoundFont 2 descriptions are normally stored in RIFF files with the extension ‘.sf2’.

SAOL AND SASL IN MPEG 4 STRUCTURED AUDIO

SAOL is the Structured Audio Orchestra Language of MPEG 4 Structured Audio (a standard for low bit rate representation of digital audio). SASL is the Structured Audio Score Language. An SASL ‘score’ controls SAOL ‘instruments’. SAOL is an extension of CSound, a synthesis language developed over many years, primarily at MIT, and is more advanced than MIDI DLS (which is based only on simple wavetable synthesis). Although there is a restricted profile of Structured Audio that uses only wavetable synthesis (essentially DLS Level 2 for use in devices with limited processing power), a full implementation allows for a variety of other synthesis types such as FM, and is extensible to include new ‘unit generators’ (the CSound name for the elements of a synthesis patch).

SASL is more versatile than standard MIDI files in its control of SAOL instruments. There is a set of so-called ‘MIDI semantics’ that enables the translation of MIDI commands and controllers into SAOL events, so that MIDI commands can either be used instead of an SASL score, or in addition to it. If MPEG 4 Structured Audio (SA) gains greater ground and authoring tools become more widely available, the use of MIDI control and DLS may decline as they are inherently less versatile. MIDI, however, is inherently simpler than SA and could well continue to be used widely when the advanced features of SA are not required.

MIDI OVER USB

USB (Universal Serial Bus) is a computer peripheral interface that carries data at a much faster rate than MIDI (up to 12 Mbit/s or up to 480 Mbit/s, depending on the version). It is very widely used on workstations and peripherals these days and it is logical to consider using it to transfer MIDI data between devices as well. The USB Implementers Forum has published a ‘USB Device Class Definition for MIDI Devices’, version 1.0, which describes how MIDI data may be handled in a USB context. It preserves the protocol of MIDI messages but packages them in such a way as to enable them to be transferred over USB. It also ‘virtualizes’ the concept of MIDI IN and OUT jacks, enabling USB to MIDI conversion, and vice versa, to take place in software within a synthesizer or other device. Physical MIDI ports can also be created for external connections to conventional MIDI equipment (see Figure 14.13). A so-called ‘USB MIDI function’ (a device that receives USB MIDI events and transfers) may contain one or more ‘elements’. These elements can be synthesizers, synchronizers, effects processors or other MIDI-controlled objects.

image

FIGURE 14.13 A USB MIDI function contains a USB-to-MIDI convertor that can communicate with both embedded (internal) and external MIDI jacks via MIDI IN and OUT endpoints. Embedded jacks connect to internal elements that may be synthesizers or other MIDI data processors. XFER in and out endpoints are used for bulk dumps such as DLS and can be dynamically connected with elements as required for transfers.

image

FIGURE 14.14 USB MIDI packets have a 1 byte header that contains a cable number to identify the MIDI jack destination and a code index number to identify the contents of the packet and the number of active bytes.

A USB to MIDI convertor within a device will typically have MIDI in and out endpoints as well as what are called ‘transfer’ (XFER) endpoints. The former are used for streaming MIDI events whereas the latter are used for bulk dumps of data such as those needed for downloadable sounds (DLS). MIDI messages are packaged into 32 bit USB MIDI events, which involve an additional byte at the head of a typical MIDI message. This additional byte contains a cable number address and a code index number (CIN), as shown in Figure 14.14. The cable number enables the MIDI message to be targeted at one of 16 possible ‘cables’, thereby overcoming the 16 channel limit of conventional MIDI messages, in a similar way to that used in the addressing of multiport MIDI interfaces. The CIN allows the type of MIDI message to be identified (e.g. System Exclusive; Note On), which to some extent duplicates the MIDI status byte. MIDI messages with fewer than 3 bytes should be padded with zeros.
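The packing can be sketched as follows; the code index numbers shown correspond to a few common channel messages in the class definition, and the sketch handles only those (a real implementation covers the full CIN table, including system exclusive continuation packets):

# Code index numbers for a few message types: note off/on, control change, program change, pitch bend
CIN_FOR_STATUS = {0x8: 0x8, 0x9: 0x9, 0xB: 0xB, 0xC: 0xC, 0xE: 0xE}

def usb_midi_event(cable, midi_message):
    status_nibble = midi_message[0] >> 4
    header = ((cable & 0x0F) << 4) | CIN_FOR_STATUS[status_nibble]
    padded = list(midi_message) + [0] * (3 - len(midi_message))   # pad short messages with zeros
    return bytes([header] + padded)

packet = usb_midi_event(0, [0x90, 60, 100])   # a note on for channel 1, sent to cable 0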

The USB message transport protocol and interfacing requirements are not the topic of this book, so users are referred to the relevant USB standards for further information about implementation issues.

MIDI OVER IEEE 1394

The MMA and AMEI have published a ‘MIDI Media Adaptation Layer for IEEE 1394’ which describes how MIDI data may be transferred over 1394. This is also referred to in 1394 TA (Trade Association) documents describing the ‘Audio and Music Data Transmission Protocol’ and IEC standard 61883-6 which deals with the audio part of 1394 interfaces.

The approach is similar to that used with USB, described in the previous section, but has somewhat greater complexity. MIDI 1.0 data streams can be multiplexed into a 1394 ‘MIDI conformant data channel’ which contains eight independent MIDI streams called ‘MPX-MIDI data channels’. This way each MIDI conformant data channel can handle 8 x 16 = 128 MIDI channels (in the original sense of MIDI channels). The first version of the standard limits the transmission of packets to the MIDI 1.0 data rate of 31.25 kbit/s for compatibility with other MIDI devices; however, provision is made for transmission at substantially faster rates for use in equipment that is capable of it. This includes options for 2X and 3X MIDI 1.0 speed. 1394 cluster events can be defined that contain both audio and MIDI data. This enables the two types of information to be kept together and synchronized.

AFTER MIDI?

Various alternatives have been proposed over the years, aiming to improve upon MIDI’s relatively limited specification and flexibility when compared with modern music control requirements and computer systems. That said, MIDI has shown surprising robustness to such ‘challenges’ and has been extended over the years so as to ameliorate some of its basic problems. Perhaps the simplicity and ubiquity of MIDI has made it attractive for developers to find ways of working with old technology that they know rather than experimenting with untried but more sophisticated alternatives. As mentioned at the start of this chapter, a new ‘HD’ (High Definition) version of the MIDI standard is planned for release in 2009, which is likely to include support for more channels and controllers, as well as greater controller resolution using single messages. It is aimed to make this compatible with existing hardware and software.

ZIPI was a networked control approach proposed back in the early 1990s that aimed to break free from MIDI’s limitations and take advantage of faster computer network technology, but it never really gained widespread favor in commercial equipment. It has now been overtaken by more recent developments and communication buses such as USB and 1394.

Open Sound Control is a promising alternative to MIDI that is gradually seeing greater adoption in the computer music and musical instrument control world. Developed by Matt Wright at CNMAT (Center for New Music and Audio Technologies) in Berkeley, California, it aims to offer a transport-independent message-based protocol for communication between computers, musical instruments and multimedia devices. It does not specify a particular hardware interface or network for the transport layer, but initial implementations have tended to use UDP (user datagram protocol) over Ethernet or other fast networks as a transport means. It is not proposed to describe this protocol in detail and further details can be found at the website indicated at the end of this chapter. A short summary will be given, however.

OSC uses a form of device addressing that is very similar to an Internet URL (uniform resource locator). In other words a text address with subaddresses that relate to lower levels in the device hierarchy. For example, ‘/synthesizer2/voice1/oscillator3/frequency’ (not a real address) might refer to a particular device called ‘synthesizer2’, within which is contained voice 1, within which is oscillator 3, whose frequency value is being addressed. The minimum ‘atomic unit’ of OSC data is 4 bytes (32 bits) long, so all values are 32 bit aligned, and transmitted packets are made up of multiples of 32 bit information. Packets of OSC data contain either individual messages or so-called ‘bundles’. Bundles contain elements that are either messages or further bundles, each having a size designation that precedes it, indicating the length of the element. Bundles have time tags associated with them, indicating that the actions described in the bundle are to take place at a specified time. Individual messages are supposed to be executed immediately. Devices are expected to have access to a representation of the correct current time so that bundle timing can be related to a clock.
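The 32 bit alignment can be illustrated with a short sketch that builds a simple OSC message carrying one floating point argument, using the invented address from the text:

import struct

def osc_string(s):
    # OSC strings are null-terminated and padded with nulls to a multiple of 4 bytes
    data = s.encode("ascii")
    return data + b"\x00" * (4 - len(data) % 4)

def osc_message(address, value):
    # Address pattern, then a type tag string (',f' = one 32 bit float), then the argument itself
    return osc_string(address) + osc_string(",f") + struct.pack(">f", value)

msg = osc_message("/synthesizer2/voice1/oscillator3/frequency", 440.0)
assert len(msg) % 4 == 0   # everything remains 32 bit aligned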

SEQUENCING SOFTWARE

Introduction

Sequencers are probably the most ubiquitous of audio and MIDI software applications. Although they used to be available as dedicated devices they are now widely available as sophisticated software packages to run on a desktop computer. A sequencer is capable of storing a number of ‘tracks’ of MIDI and audio information, editing it and otherwise manipulating it for musical composition purposes. It is also capable of storing MIDI events for nonmusical purposes such as studio automation. Some of the more advanced packages are available in modular form (allowing the user to buy only the functional blocks required) and in cut-down or ‘entry-level’ versions for the new user. Popular packages such as ProTools and Logic now combine audio and MIDI manipulation in an almost seamless fashion, and have been developed to the point where they can no longer really be considered as simply sequencers. In fact they are full-blown audio production systems with digital mixers, synchronization, automation, effects and optional video.

The dividing line between sequencer and music notation software is a gray one, since there are features common to both. Music notation software is designed to allow the user control over the detailed appearance of the printed musical page, rather as page layout packages work for typesetters, and such software often provides facilities for MIDI input and output. MIDI input is used for entering note pitches during setting, whilst output is used for playing the finished score in an audible form. Most major packages will read and write standard MIDI files, and can therefore exchange data with sequencers, allowing sequenced music to be exported to a notation package for fine tuning of printed appearance. It is also common for sequencer packages to offer varying degrees of music notation capability, although the scores that result may not be as professional in appearance as those produced by dedicated notation software.

Tracks, channels, instruments and environments

A sequencer can be presented to the user so that it emulates a multitrack tape recorder to some extent. The example shown in Figure 14.15 illustrates this point, showing the familiar transport controls as well as a multitrack ‘tape-like’ display.

A track can be either a MIDI track or an audio track, or it may be a virtual instrument of some sort, perhaps running on the same computer. A project is built up by successively overlaying more and more tracks, all of which may be replayed together. Tracks are not fixed in their time relationship and can be slipped against each other, as they simply consist of data stored in the memory. On older or less advanced sequencers, the replay of each MIDI track was assigned to a particular MIDI channel, but more recent packages offer an almost unlimited number of virtual tracks that can contain data for more than one channel (in order to drive a multi-timbral instrument, for example). Using a multiport MIDI interface it is possible to address a much larger number of instruments than the basic 16 MIDI channels allowed in the past.

image

FIGURE 14.15 Example of a sequencer’s primary display showing tracks and transport controls. (Logic Platinum 5 ‘arrange’ window.)

In a typical sequencer, instruments are often defined in a separate ‘environment’ that defines the instruments, the ports to which they are connected, any additional MIDI processing to be applied, and so forth. An example is shown in Figure 14.16. When a track is recorded, therefore, the user simply selects the instrument to be used and the environment takes care of managing what that instrument actually means in terms of processing and routing. Now that soft synthesizers are used increasingly, sequencers can often address those directly via plug-in architectures such as DirectX or VST (see Chapter 13), without recourse to MIDI. These are often selected on pull-down menus for individual tracks, with voices selected in a similar way, often using named voice tables.

image

FIGURE 14.16 Example of environment window from Logic, showing ways in which various MIDI processes can be inserted between physical input and recording operation.

Input and output filters

After MIDI information is received from the hardware interface it is stored in memory, but it may sometimes be helpful to filter out some information before it can be stored, using an input filter. This will be a subsection of the program that watches out for the presence of certain MIDI status bytes and their associated data as they arrive, so that they can be discarded before storage. The user may be able to select input filters for such data as aftertouch, pitch bend, control changes and velocity information, amongst others. Clearly it is only advisable to use input filters if it is envisaged that this data will never be needed, since although filtering saves memory space the information is lost for ever. Output filters are often implemented for similar groups of MIDI messages as for the input filters, acting on the replayed rather than recorded information. Filtering may help to reduce MIDI delays, owing to the reduced data flow.
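A very simple input filter of this kind might be sketched as follows (the set of message types to discard is purely illustrative and would be chosen by the user):

# Upper status nibbles of some channel messages a user might choose to discard:
# polyphonic aftertouch, channel aftertouch and pitch bend (illustrative selection)
FILTERED = {0xA, 0xD, 0xE}

def input_filter(messages):
    # messages: an iterable of complete MIDI channel messages (status byte plus data bytes)
    for msg in messages:
        if (msg[0] >> 4) not in FILTERED:
            yield msg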

Timing resolution

The timing resolution to which a sequencer can store MIDI events varies between systems. This ‘record resolution’ may vary, with recent systems offering resolution to many thousandths of a note. Audio events are normally stored to sample accuracy. A sequencer with a MIDI resolution of 480 ppqn (pulses per quarter note), for example, can resolve events to steps of roughly one millisecond at a tempo of 120 bpm (60 ÷ (120 × 480) seconds per tick), the step size lengthening proportionally at slower tempos. The quoted resolution of sequencers, though, tends to be somewhat academic, depending on the operational circumstances, since there are many other factors influencing the time at which MIDI messages arrive and are stored. These include buffer delays and traffic jams. Modern sequencers have sophisticated routines to minimize the latency with which events are routed to MIDI outputs.

The record resolution of a sequencer is really nothing to do with the timing resolution available from MIDI clocks or timecode (see Chapter 15). The sequencer’s timing resolution refers to the accuracy with which it time-stamps events and to which it can resolve events internally. Most sequencers attempt to interpolate or ‘flywheel’ between external timing bytes during replay, in an attempt to maintain a resolution in excess of the 24 ppqn implied by MIDI clocks.

Displaying, manipulating and editing information

A sequencer is the ideal tool for manipulating MIDI and audio information and this may be performed in a number of ways depending on the type of interface provided to the user. The most flexible is the graphical interface employed on many desktop computers which may provide for visual editing of the stored MIDI information either as a musical score, a table or event list of MIDI data, or in the form of a grid of some kind. Figure 14.17 shows a number of examples of different approaches to the display of stored MIDI information. Audio information is manipulated using an audio sample editor display that shows the waveform and allows various changes to be made to the signal, often including sophisticated signal processing, as discussed further below.

Although it might be imagined that a musical score would be the best way of visualizing MIDI data, it is often not the most appropriate. This is partly because unless the input is successfully quantized (see below) the score will represent precisely what was played when the music was recorded and this is rarely good-looking on a score! The appearance is often messy because some notes were just slightly out of time. Score representation is useful after careful editing and quantization, and can be used to produce a visually satisfactory printed output. Alternatively, the score can be saved as a MIDI file and exported to a music notation package for layout purposes.

In the grid editing (called ‘Matrix’ in the example shown) display, MIDI notes may be dragged around using a mouse or trackball and audible feedback is often available as the note is dragged up and down, allowing the user to hear the pitch or sound as the position changes. Note lengths can be changed and the timing position may be altered by dragging the note left or right. In the event list form, each MIDI event is listed next to a time value. The information in the list may then be changed by typing in new times or new data values. Events may also be inserted and deleted. In all of these modes the familiar cut and paste techniques used in word processors and other software can be applied, allowing events to be used more than once in different places, repeated a given number of times, and other such operations.

image

FIGURE 14.17 Examples of a selection of different editor displays from Logic, showing display of MIDI data as a score, a graphical matrix of events and a list of events. Audio can be shown as an audio waveform display.

A whole range of semi-automatic editing functions are also possible, such as transposition of music, using the computer to operate on the data so as to modify it in a predetermined fashion before sending it out again. Echo effects can be created by duplicating a track and offsetting it by a certain amount, for example. Transposition of MIDI performances is simply a matter of raising or lowering the MIDI note numbers of every stored note by the relevant degree. A number of algorithms have also been developed for converting audio melody lines to MIDI data, or using MIDI data to control the pitch of audio, further blurring the boundary between the two types of information. Silence can also be stripped from audio files, so that individual drum notes or vocal phrases can be turned into events in their own right, allowing them to be manipulated, transposed or time-quantized independently.

image

FIGURE 14.18 Example from Logic of automation data graphically overlaid on sequencer tracks.

A sequencer’s ability to search the stored data (both music and control) based on specific criteria, and to perform modifications or transformations on just the data that matches the search criteria, is one of the most powerful features of a modern system. For example, it may be possible to search for the highest-pitched notes of a polyphonic track so that they can be separated onto another track as a melody line. Alternatively, it may be possible to apply the rhythm values of one track to the pitch values of another so as to create a new track, or to apply certain algorithmic manipulations to stored durations or pitches for compositional experimentation. The possibilities for searching, altering and transforming stored data are almost endless once musical and control events are stored in the form of unique values, and for those who specialize in advanced composing or experimental music these features will be of particular importance. It is in this field that many of the high-end sequencer packages will continue to develop.
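
As a simple example of this kind of criteria-based transformation, the sketch below separates the highest-sounding note at each time position into a melody line. The (time, note, velocity) tuple representation, and the simplification that chord notes share exactly the same time value, are assumptions for the example.

```python
# A minimal sketch: keep only the highest note at each time position,
# producing a melody line from a polyphonic track.

from itertools import groupby

def extract_melody(events):
    """Return the highest-sounding (time, note, velocity) event at each time position.
    Notes of a chord are assumed to share the same time value."""
    melody = []
    for time, chord in groupby(sorted(events), key=lambda e: e[0]):
        melody.append(max(chord, key=lambda e: e[1]))   # pick the highest note number
    return melody
```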

Quantization of rhythm

Rhythmic quantization is a feature of almost all sequencers. In its simplest form it involves ‘pulling in’ events to the nearest musical time interval at the resolution specified by the user, so that events that were played ‘out of time’ can be played back ‘in time’. It is normally possible to set the quantizing resolution at least as fine as a 32nd note, and the choice depends on the audible effect desired. Events can be quantized either permanently or just for replay. Some systems allow ‘record quantization’, which alters the timing of events as they arrive at the input to the sequencer; this is a form of permanent quantization. It may also be possible to ‘quantize’ cursor movement so that events can only be dragged to predefined rhythmic divisions.
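
A minimal sketch of this simplest form of quantization is shown below, assuming an internal resolution of 480 ticks per quarter note and a (time, note, velocity) event layout; both are assumptions for illustration only.

```python
# A minimal sketch of simple rhythmic quantization: pull events to the nearest grid line.

PPQN = 480   # assumed internal ticks per quarter note

def quantize(events, division=16):
    """Pull each event to the nearest 1/division-note grid line
    (division=16 quantizes to sixteenth notes, 32 to thirty-second notes)."""
    grid = (PPQN * 4) // division   # ticks per grid step (a whole note is 4 * PPQN)
    return [(round(time / grid) * grid, note, vel) for time, note, vel in events]
```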

More complex rhythmic quantization is also possible, for example in order to maintain a more ‘natural’ rhythmic feel. Simple quantization can result in music that sounds ‘mechanical’ and electronically produced, whereas the ‘human feel’ algorithms available in many packages typically quantize the rhythm strictly and then reapply some controlled randomness. The parameters of this process may be open to adjustment until the desired effect is achieved.
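
A ‘human feel’ pass might then look something like the following sketch, which reuses the hypothetical quantize() function above and adds a small controlled random offset; the offset range is an arbitrary assumption.

```python
# A minimal sketch of a 'human feel' pass: strict quantization followed by a
# small, bounded random timing offset (the +/- 10 tick default is arbitrary).

import random

def humanize(events, max_offset=10):
    """Add a small random timing offset to each (already quantized) event."""
    return [(max(0, time + random.randint(-max_offset, max_offset)), note, vel)
            for time, note, vel in events]

loose = [(5, 60, 100), (245, 62, 90), (481, 64, 95)]
tight = quantize(loose, division=16)   # strict quantization first
feel = humanize(tight)                 # then reapply controlled randomness
```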

Automation and non-note MIDI events

In addition to note and audio events, one may either have recorded or may wish to add events for other MIDI control purposes, such as program change messages, controller messages or system exclusive messages. Audio automation can also be added to control fades, panning, effects and other mixing features. Such data may be displayed in a number of ways, but again the graphical plot is arguably the most useful. It is common to allow automation data to be plotted as an overlay, as shown in Figure 14.18.

Automation data is often stored in a so-called ‘segment-based’ form. Because automation usually relates to some form of audio processing or control, it usually applies to particular segments on the timeline of the current project, and if a segment is moved one generally needs the relevant automated processing to move with it. Segment-based processing or automation allows the parameter changes that take place during a segment to be ‘anchored’ to that segment, so that they move around with it if required.
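
A minimal sketch of what segment-anchored automation might look like as a data structure is given below; the class and field names are assumptions rather than any particular sequencer’s implementation.

```python
# A minimal sketch: automation points stored relative to the start of a segment,
# so that moving the segment carries its automation with it.

from dataclasses import dataclass, field

@dataclass
class Segment:
    start: int                                        # segment start on the timeline (ticks)
    automation: list = field(default_factory=list)    # (offset_ticks, value) pairs

    def absolute_automation(self):
        """Resolve segment-relative automation points to timeline positions."""
        return [(self.start + offset, value) for offset, value in self.automation]

seg = Segment(start=1920, automation=[(0, 0.2), (480, 0.8)])
seg.start = 3840                          # move the segment; its automation follows
print(seg.absolute_automation())          # [(3840, 0.2), (4320, 0.8)]
```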

It is possible to edit automation or control events in a similar way to note events, by dragging, drawing, adding and deleting points, but there are a number of other possibilities here. For example, a scaling factor may be applied to controller data in order to change the overall effect by a given percentage, or a graphical contour may be drawn over the controller information to scale it according to the magnitude of the contour at each point. Such a contour could be used to introduce a gradual increase in MIDI note velocities over a section, or to introduce any other time-varying effect. Program changes can be inserted at any point in a sequence, usually either by inserting the message in the event list or by drawing it at the appropriate point in the controller chart. This has the effect of switching the receiving device to a new voice or stored program at the point where the message is inserted, and it can be used to ensure that all tracks in a sequence use the desired voices from the outset, without having to set them up manually each time. Either the name or the number of the program to be selected at that point can be displayed, depending on whether the sequencer subscribes to a known set of voice names such as General MIDI.
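
As an illustration of the contour idea mentioned above, the following is a minimal sketch (not taken from any particular sequencer) of scaling note velocities with a linear ramp over a section, producing a gradual crescendo. The (time, note, velocity) layout, the assumption that events are sorted by time, and the function name are all assumptions for the example.

```python
# A minimal sketch: scale note velocities along a linear contour over a section.
# Events are assumed to be (time, note, velocity) tuples sorted by time.

def apply_velocity_ramp(events, start_scale=0.5, end_scale=1.0):
    """Scale velocities from start_scale to end_scale across the section."""
    if not events:
        return []
    t0, t1 = events[0][0], events[-1][0]
    span = max(t1 - t0, 1)
    scaled = []
    for time, note, vel in events:
        factor = start_scale + (end_scale - start_scale) * (time - t0) / span
        scaled.append((time, note, max(1, min(127, round(vel * factor)))))
    return scaled
```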

System exclusive data may also be recorded or inserted into sequences in a similar way to the message types described above. Any such data received during recording will normally be stored and may be displayed in list form. It is also possible to insert SysEx voice dumps into sequences so that a device may be loaded with new parameters while a song is playing, if required.
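
For illustration, the sketch below shows how a system exclusive message is framed with the start (0xF0) and end (0xF7) bytes defined by the MIDI specification; the manufacturer ID and data payload used are placeholders rather than a real voice dump.

```python
# A minimal sketch of SysEx framing: start byte 0xF0, manufacturer ID,
# 7-bit data payload, end byte 0xF7.

def sysex(manufacturer_id, data):
    """Wrap a 7-bit data payload in SysEx start and end bytes."""
    assert all(b < 0x80 for b in data), "SysEx data bytes must be 7-bit"
    return bytes([0xF0, manufacturer_id]) + bytes(data) + bytes([0xF7])

# 0x7D is the non-commercial/educational manufacturer ID; the payload is a placeholder.
dump = sysex(0x7D, [0x01, 0x02, 0x03])
```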

MIDI mixing and external control

Sequencers often combine a facility for mixing audio with one for controlling the volume and panning of MIDI sound generators. Using the MIDI volume and pan controller numbers (decimal 7 and 10), a series of graphical faders can be used to control the audio output level of the voices on each MIDI channel, and may also control the pan position of the source between the left and right outputs of the sound generator if it is a stereo source. On-screen faders may also be assignable to other functions of the software, as a means of continuous graphical control over parameters such as tempo, or to vary certain MIDI continuous controllers in real time.
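
As a concrete illustration, the sketch below builds the raw Control Change messages that such a fader movement would generate. The status byte layout (0xBn, where n is the channel) follows the MIDI 1.0 specification; the function name and example values are assumptions.

```python
# A minimal sketch: construct 3-byte MIDI Control Change messages for volume and pan.

def control_change(channel, controller, value):
    """Return a MIDI Control Change message (channel 0-15, controller and value 0-127)."""
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F])

volume_msg = control_change(channel=0, controller=7, value=100)   # channel volume
pan_msg = control_change(channel=0, controller=10, value=64)      # pan to centre
```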

It is also possible with some packages to control many of the functions of the sequencer using external MIDI controllers. An external MIDI controller with a number of physical faders and buttons could be used as a basic means of mixing, for example, with each fader assigned to a different channel on the sequencer’s mixer.

Synchronization

A sequencer’s synchronization features are important when locking replay to external timing information such as MIDI clock or timecode. Most sequencers are able to operate in either beat clock or timecode sync modes, and some can detect which type of clock data is being received and switch over automatically. To lock the sequencer to another sequencer or to a drum machine, beat clock synchronization may be adequate. If the sequencer will be used for applications involving the timing of events in real rather than musical time, such as the dubbing of sounds to a film, then it is important that it allows events to be tied to timecode locations, since timecode locations remain in the same place even if the musical tempo is changed.
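
The distinction between musical and real time can be illustrated with a small sketch: an event stored at a bar/beat position moves in real time when the tempo changes, whereas an event anchored to a timecode location does not. The function name and the 4/4 metre are assumptions for the example.

```python
# A minimal sketch: converting a musical position (in beats) to real time at a given tempo.

def beats_to_seconds(beats, tempo_bpm):
    """Convert a position in quarter-note beats to seconds at a fixed tempo."""
    return beats * 60.0 / tempo_bpm

musical_event_beats = 8                              # bar 3, beat 1 in 4/4
print(beats_to_seconds(musical_event_beats, 120))    # 4.0 s
print(beats_to_seconds(musical_event_beats, 100))    # 4.8 s - moves when the tempo changes

timecode_event_seconds = 4.0                         # stays at 4.0 s regardless of tempo
```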

Sequencers incorporating audio tracks also need to be able to lock to sources of external audio or video sync information (e.g. word clock or composite video sync), in order that the sampling frequency of the system can be synchronized to that of other equipment in the studio.

Synchronized digital video

Digital video capability is now commonplace in desktop workstations. It is possible to store and replay full motion video on a desktop computer, either using a separate monitor or within a window on an existing monitor, using widely available technology such as QuickTime or Windows Multimedia Extensions. The replay of video from disk can be synchronized to the replay of audio and MIDI, using timecode, and this is particularly useful as an alternative to using video on a separate video tape recorder (which is mechanically much slower, especially in locating distant cues). In some sequencing or editing packages the video can simply be presented as another ‘track’ alongside audio and MIDI information.

In the applications considered here, compressed digital video is intended principally as a cue picture that can be used for writing music or dubbing sound to picture in post-production environments. In such cases the picture quality must be adequate for seeing cue points and lip sync, but it does not need to be of professional broadcast quality; what matters is reasonably good slow-motion and freeze-frame quality. Good-quality digital video (DV), though, can now be transferred to and from workstations using a Firewire interface, enabling video editing and audio post-production to be carried out in an integrated fashion on a single platform.

RECOMMENDED FURTHER READING

Hewlett, W., Selfridge-Field, E. (Eds.), 2001. The Virtual Score: Representation, Retrieval, Restoration. MIT Press.

Huber, D., 1998. The MIDI Manual, second ed. Focal Press.

MMA, 1999. Downloadable Sounds Level 1. V1.1a, January. MIDI Manufacturers Association.

MMA, 2000. RP-027: MIDI Media Adaptation Layer for IEEE 1394. MIDI Manufacturers Association.

MMA, 2000. RP-029: Bundling SMF and DLS Data in an RMID file. MIDI Manufacturers Association.

MMA, 2001. XMF Specification Version 1.0. MIDI Manufacturers Association.

MMA, 2002. The Complete MIDI 1.0 Detailed Specification. MIDI Manufacturers Association.

MMA, 2002. Scalable Polyphony MIDI Specification and Device Profiles. MIDI Manufacturers Association.

Rumsey, F., 2004. Desktop Audio Technology. Focal Press.

Scheirer, D., Vercoe, B., 1999. SAOL: the MPEG 4 structured audio orchestra language. Computer Music Journal 23 (2), 31–51.

Selfridge-Field, E., Byrd, D., Bainbridge, D., 1997. Beyond MIDI: The Handbook of Musical Codes. MIT Press.

USB Implementers Forum, 1996. USB Device Class Definition for MIDI Devices, version 1.0. Available from http://www.usb.org.

WEBSITES

MIDI Manufacturers Association: http://www.midi.org/

Music XML: http://www.musicxml.org

Open Sound Control: http://opensoundcontrol.org
