5 Hardware and systems issues

This chapter is concerned with explaining the audio storage media, sound cards and interfaces commonly encountered on computers. It also considers signal processing options and hardware abstraction layers for communication with sound cards.

5.1 Storage media

The purpose of this section is to describe the principles, limitations and applications of storage media used for audio in computer workstations. The media described are not exclusive to the field of audio and are widely encountered in general-purpose storage applications. In most cases the same media that are used for general purpose applications in computers can be used for storing audio and video without modification, although certain specifications must be adequate if operation is to be satisfactory. There will continue to be a decline in the use of dedicated audio recording formats in favour of general purpose mass storage media, if only because of the simple economics of the matter.

Improvements in the design of storage media will continue and prices will continue to fall. The devices described here are likely to remain popular for some years to come and in any case the fundamental principles involved are unlikely to change radically. Examples of specifications should only be taken as representative of today’s equipment.

5.1.1 Storage requirements of digital audio and video

There are two main roles for storage media in audio workstations. One is the primary role of real-time recording and replay and the other is the secondary role of backup storage. The requirements differ somewhat, although it is possible to use similar media for both purposes. Real-time recording and replay needs storage devices capable of sustaining data transfer for a number of audio channels, so that the channels can record or replay for long periods without breaks and be edited and post-processed, with quick access to stored files. This was discussed in greater detail in Chapter 3. Backup can take place in non-real time, does not need such fast access to files and does not need to support editing and other post-processing operations. Backup may also need a large capacity, and it would be advantageous if the backup medium were cheaper than primary storage and based on removable media. It follows that certain devices are suitable for backup that may not be suitable for primary storage.
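As a rough illustration of what sustaining data transfer for a number of audio channels implies, the following sketch computes the raw data rate of linear PCM audio. The function name and the example session (24 channels at 48 kHz, 24-bit) are assumptions chosen for illustration and do not come from the text.

```python
def pcm_data_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the raw data rate in bytes per second for linear PCM audio."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Hypothetical session: 24 channels of 48 kHz, 24-bit audio.
rate = pcm_data_rate(48_000, 24, 24)
print(f"{rate / 1e6:.3f} Mbyte/s must be sustained without breaks")
```

A primary storage device has to deliver at least this rate continuously, whereas a backup device can take as long as it needs to move the same data.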

Storage systems may use removable media but many have fixed media. It is advantageous to have removable media for audio and video purposes because it allows different jobs to be kept on different media and exchanged at will, but unfortunately the highest performance is normally only obtainable from storage systems with fixed media. Systems involving a small number of audio channels or using data reduction may be able to take advantage of removable media as primary storage, but in most current systems removable media are normally used as secondary storage.

It perhaps goes without saying that any storage system used for audio and video should be as reliable and robust as possible. It is also likely to need to be a fairly ‘heavy duty’ system because the demands of audio and video recording are quite heavy and will require the storage device to be in an almost constant state of activity. This differs from the more gentle task of, say, word processing, where the storage device is idling for long periods.

5.1.2 Disk drives in general

Disk drives are probably the most common form of mass storage. They have the advantage of being random-access systems – in other words any data can be accessed at random and with only a short delay. This may be contrasted with tape drives that only allow linear access – by winding through the tape until the desired data is reached, resulting in a considerable delay. Disk drives come in all shapes and sizes from the commonly encountered floppy disk at the bottom end to high performance hard drives at the top end. The means by which data are stored is usually either magnetic or optical, but some use a combination of the two, as described below. There exist both removable and fixed media disk drives, but in almost all cases the fixed media drives have a higher performance than removable media drives. This is because the design tolerances can be made much finer when the drive does not have to cope with removable media, allowing higher data storage densities to be achieved. Although removable disk media can appear to be expensive compared with tape media, the cost must be weighed against the benefits of random access and the possibility that some removable disks can be used for primary storage whereas a tape cannot. Removable media should be distinguished from removable drives, the latter requiring that the complete drive is removed from the system as opposed to the storage surface(s) only.

The general structure of a disk drive is shown in Figure 5.1. It consists of a motor connected to a drive mechanism that causes one or more disk surfaces to rotate at anything from a few hundred to many thousands of revolutions per minute. This rotation may either remain constant or may stop and start, and it may either be at a constant rate or a variable rate, depending on the drive. One or more heads are mounted on a positioning mechanism that can move the head across the surface of the disk to access particular points, under the control of hardware and software called a disk controller. The heads read data from and write data to the disk surface by whatever means the drive employs. Certain disk types are read-only, some are write-once-read-many (WORM) and some are fully erasable and rewritable.


Figure 5.1 The general mechanical structure of a disk drive


Figure 5.2 Disk formatting divides the storage area into tracks and sectors

The disk surface is normally divided up into tracks and sectors, not physically but by means of ‘soft’ formatting (see Figure 5.2). Formatting writes logical markers to indicate block boundaries, amongst other processes. On most hard disks the tracks are arranged as a series of concentric rings, but with some optical disks there is a continuous spiral track.

5.1.3 Disk drive specifications

Disk drive performance is characterised by specifications that are often quoted in promotional literature. These are the subject of a certain amount of misunderstanding and manufacturers often play games with these figures to make their drives seem better than they are. As with all specifications it is important to compare like with like, and to know how a certain parameter has been measured. The most important parameters are:

•  access time;

•  instantaneous transfer rate;

•  sustained transfer rate; and

•  storage capacity (formatted).

These are not the only factors that affect the performance or desirability of a drive, but they are a ready means of comparing two apparently similar drives.

Access time, normally quoted in milliseconds, is the time taken for a block of data to be accessed. It may be specified in a number of ways, since clearly the actual access time depends on where the head is when a block is requested. Figure 5.3 shows that true access time is made up of seek latency and rotational latency. The seek latency is dependent on the speed of the positioner and the rotational latency is dependent on how fast the disk rotates. Access time may often be just seek latency and may be quoted as ‘track-to-track’, which is the fastest, ‘average’, which is a reliable guide to general performance, or ‘one-third full sweep’, which is the time taken for the head to traverse one third of the active disk radius.
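The two components of access time can be combined in a short calculation. The sketch below assumes the usual convention that the average rotational latency is half a revolution; the function names and the example drive (8.5 ms average seek, 7200 rpm) are hypothetical.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the wanted sector: half of one revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def average_access_time_ms(average_seek_ms: float, rpm: float) -> float:
    """Seek latency plus average rotational latency, in milliseconds."""
    return average_seek_ms + average_rotational_latency_ms(rpm)


# Hypothetical drive: 8.5 ms average seek, 7200 rpm spindle.
print(round(average_access_time_ms(8.5, 7200), 2), "ms")
```

A specification that quotes seek latency alone therefore understates the true access time by the rotational term, which is why it matters how the figure was measured.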

Instantaneous transfer rate is the fastest speed at which data can be read from the disk surface once the head has arrived at its correct location. Normally quoted in megabits per second, it gives a guide to the peak performance of the drive.


Figure 5.3 The delays involved in accessing a block of data stored on a disk

Sustained transfer rate is a more useful guide to real performance, though, because it gives a guide to the long-term data rate that might be expected from the disk, sustained over many blocks. This parameter, though, is affected considerably in real multimedia systems by the fragmentation of the drive and by the number of channels it has to service.
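The dependence of real performance on fragmentation and channel count can be made concrete with a rough estimate. In the sketch below the `efficiency` parameter is a hypothetical derating factor standing in for time lost to seeking between fragmented files; it is not a quantity defined in the text, and the 20 Mbyte/s drive is an assumed example.

```python
def max_channels(sustained_bytes_per_s: float, sample_rate_hz: int,
                 bits_per_sample: int, efficiency: float = 0.5) -> int:
    """Estimate how many PCM channels a drive can service.

    `efficiency` is a hypothetical derating factor standing in for the
    losses caused by fragmentation and seeking between channels' files.
    """
    per_channel = sample_rate_hz * bits_per_sample / 8
    return int(sustained_bytes_per_s * efficiency // per_channel)


# Hypothetical drive sustaining 20 Mbyte/s, 48 kHz 24-bit audio.
print(max_channels(20e6, 48_000, 24))        # derated by half
print(max_channels(20e6, 48_000, 24, 1.0))   # ideal, unfragmented case
```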

Formatted storage capacity is the number of megabytes of capacity available for user data after the disk has been formatted. It is often considerably smaller than the unformatted capacity of the disk (which is not a very useful figure to know). The formatted capacity is available for the storage of audio data if necessary, with no necessity to add an overhead for error correction, as described in Chapter 2.
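Because the whole of the formatted capacity is available for audio data, recording time follows directly from it. The sketch below is illustrative only: the function name and the 80 Gbyte example capacity are assumptions.

```python
def recording_time_hours(formatted_bytes: float, sample_rate_hz: int,
                         bits_per_sample: int, channels: int = 1) -> float:
    """Hours of linear PCM audio that fit in the given formatted capacity."""
    bytes_per_second = sample_rate_hz * bits_per_sample * channels / 8
    return formatted_bytes / bytes_per_second / 3600


# Hypothetical 80 Gbyte formatted capacity, mono 44.1 kHz 16-bit audio.
print(f"{recording_time_hours(80e9, 44_100, 16):.0f} channel-hours")
```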

5.1.4 Magnetic hard disk drives

Magnetic hard disks store a large amount of data in a relatively small space, and are reliable, fast and reasonably economical. Performance and capacity are normally in excess of typical multichannel audio requirements these days. One can store many hours of monophonic audio on a hard disk, and such disks are capable of handling a large number of simultaneous channels of recording and replay. A quiet drive is important for audio operations, especially if the drive is to be installed in the same room as the operator.

A typical drive is a sealed unit and the physical disks inside it cannot be removed to make way for others. The recording process is magnetic, whereby data is stored in the form of flux reversals in the surface layer of the disks. The drive is a combination of physical disk surfaces on which data is stored, electromagnetic heads that read and write data, a positioner to move the heads to the right place, a motor that rotates the surfaces, a servo mechanism that controls the moving parts, and a controller that looks after the data flow to and from the surfaces and interfaces to the rest of the computer system. A cut-away diagram of an older drive is shown in Figure 5.4.

The drive is sealed (except sometimes for a small pressure-relief vent) in order to prevent the surfaces of the disks from becoming contaminated. The lack of contamination and the fact that the disks will never be removed means that fine tolerances can be used in manufacture, allowing a larger amount of data to be stored in a smaller space than is possible with removable magnetic disks. It also results in a very low error rate. One or more disks normally reside inside a drive and it is common for both sides of each disk to be used. These disks are rigid, not floppy, and all rotate on a common spindle. Each surface has its own read/write heads, which can be moved across the disk surface to access data stored in different places. The head positioner moves all the heads at the same time, rather than independently. The heads do not touch the surface of the disks during operation; they fly just a small distance above the surface, lifted by the aerodynamic effect of the air which is dragged around above the disk surface due to friction. A small area of the disk surface is set aside for the heads to land on when the power is turned off and this area does not contain data.

Data are stored in tracks divided up into sectors. Each sector is separated by a small gap and preceded by an address mark that uniquely identifies the sector’s location and a preamble to synchronise the reading of data. The term cylinder relates to all the tracks that reside physically in line with each other in the vertical plane through the different surfaces (see Figure 5.5).
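The relationship between cylinders, surfaces and sectors can be sketched as a simple addressing scheme. The example below assumes a fixed geometry with the same number of sectors on every track and the 512-byte sector that is typical of such drives; the class, its field names and the example figures are hypothetical, and real drives generally hide their physical layout behind the controller.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # typical sector size


@dataclass(frozen=True)
class Geometry:
    """Hypothetical fixed drive geometry (same sector count on every track)."""
    cylinders: int
    heads: int              # one head per disk surface
    sectors_per_track: int

    def capacity_bytes(self) -> int:
        return self.cylinders * self.heads * self.sectors_per_track * SECTOR_BYTES

    def to_linear(self, cylinder: int, head: int, sector: int) -> int:
        """Map (cylinder, head, sector) to a linear block number.

        Sectors are numbered from 1 within a track, following the usual
        convention; cylinders and heads are numbered from 0.
        """
        if not (0 <= cylinder < self.cylinders and 0 <= head < self.heads
                and 1 <= sector <= self.sectors_per_track):
            raise ValueError("address outside drive geometry")
        return (cylinder * self.heads + head) * self.sectors_per_track + (sector - 1)


geometry = Geometry(cylinders=1024, heads=16, sectors_per_track=63)
print(geometry.capacity_bytes(), "bytes")
print(geometry.to_linear(0, 0, 1), geometry.to_linear(1, 0, 1))
```

Numbering blocks so that a whole cylinder is filled before the heads move suits a positioner that moves all the heads together, since consecutive blocks can then be reached by switching heads rather than seeking.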


Figure 5.4 Cut-away drawing of a typical Winchester drive. (Courtesy of MacUser)


Figure 5.5 Winchester drive tracks on different surfaces form concentric cylinders

A sector typically contains 512 bytes. The disk is of the ‘write-many-times’ format which means that old data may be overwritten many times in order to reuse the storage space. Although the disk surfaces of such a drive are not removable, drives exist that may be interchanged in their entirety. Such drives are known as removable drives (not removable disks) and they are usually mounted in a cartridge with a handle so that they can be ‘unplugged’ from a docking frame of some sort. Figure 5.6 shows a photograph of such a system. This is a useful feature, but it is relatively expensive to interchange complete drives in this way. It may be considered worth the advantage of being able to take a complete session’s primary storage from one system and insert it into another.


Figure 5.6 A typical removable disk drive system allowing multiple drives to be inserted or removed from the chassis at will (Courtesy of Glyph Technologies, Inc.)

5.1.5 RAID arrays

Hard disk drives can be combined in various ways to improve either data integrity or data throughput. RAID stands for redundant array of inexpensive disks, and is a means of linking ordinary disk drives under one controller so that they form an array of data storage space, as shown in Figure 5.7. A RAID array can be treated as a single volume by a host computer. There are a number of levels of RAID array, each of which is designed for a slightly different purpose, as summarised in Table 5.1.

One of the main reasons for using a RAID array would be to improve the reliability of data storage. At certain RAID levels the data is spread across all of the drives involved, with a final drive used to store error protection information (the check drive). The aim is to prevent you losing your data if one of the drives fails, because it can be reconstructed from the remaining data. ‘Mirroring’ is also an option that allows the data on one disk to be perfectly duplicated on another, again for improving data security. By spreading data across drives it is also possible to speed up read and write operations.
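The idea of reconstructing a failed drive from the remaining data and a check drive can be sketched with exclusive-OR parity, which is a common choice for the error protection information, although the text does not name a particular method. The helper name and the block contents below are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives holding one block each, plus a check drive.
data_drives = [b"AUDIO_01", b"AUDIO_02", b"AUDIO_03"]
check_drive = xor_blocks(data_drives)

# Simulate losing drive 1 and rebuilding it from the survivors.
survivors = [data_drives[0], data_drives[2], check_drive]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_drives[1])
```

The same operation both generates the check data and rebuilds a missing block, because XOR-ing a value into a total twice cancels it out.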

5.1.6 Removable magnetic media

Floppy disks are unsuitable for AV applications because of just about every aspect of their specification. They are too small and too slow. Higher capacity removable magnetic media have existed for some time, though, with speeds approaching that of slower hard disks. These include things like Iomega’s Zip disks that are constructed rather like large floppy disks, in a rigid cartridge. These have tended to offer capacities up to 250 Mbytes, which makes them only marginally useful for AV applications requiring short storage times.

Advances in the magnetic recording field have resulted in removable media offering much higher capacities and transfer speeds. Because removable media are not permanently sealed the reliability and performance may be less satisfactory than sealed hard disks, but there is the advantage of removability. One example, the Iomega Jaz drive, has performance suitable for primary storage in audio systems. This is a form of removable cartridge that houses hard disk platters inside a dustproof case. Syquest has also manufactured a range of high capacity removable storage systems (but they were bought out by Iomega), and Castlewood Systems has introduced a range of so-called ‘ORB’ drives based on magneto-resistive head technology that allows greater capacity per area of the disk surface.



Figure 5.7 Some examples of RAID array configurations. (a) Level 0. (b) Level 1. (c) Level 3

Table 5.1 RAID levels

RAID level | Features
0 | Data blocks split alternately between a pair of disks, but no redundancy so actually less reliable than a single disk. Transfer rate is higher than a single disk. Can improve access times by intelligent controller positioning of heads so that next block is ready more quickly
1 | Offers disk mirroring. Data from one disk is automatically duplicated on another. A form of real-time backup
2 | Uses bit interleaving to spread the bits of each data word across the disks, so that, say, eight disks each hold one bit of each word, with additional disks carrying error protection data. Non-synchronous head positioning. Slow to read data, and designed for mainframe computers
3 | Similar to level 2, but synchronises heads on all drives, and ensures that only one drive is used for error protection data. Allows high speed data transfer, because of multiple disks in parallel. Cannot perform simultaneous read and write operations
4 | Writes whole blocks sequentially to each drive in turn, using one dedicated error protection drive. Allows multiple read operations but only single write operations
5 | As level 4 but splits error protection between drives, avoiding the need for a dedicated check drive. Allows multiple simultaneous reads and writes
6 | As level 5 but incorporates RAM caches for higher performance
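The difference between splitting blocks across a pair of disks (level 0) and distributing the error protection between drives (level 5) can be sketched as a mapping from block number to drive. The rotation pattern used for level 5 below is a hypothetical layout chosen for simplicity; actual controllers use a variety of layouts, and the function names are assumptions.

```python
def raid0_location(block: int, disks: int) -> tuple[int, int]:
    """Level 0: return (disk index, block offset on that disk)."""
    return block % disks, block // disks


def raid5_location(block: int, disks: int) -> tuple[int, int, int]:
    """Level 5: return (data disk, stripe number, check disk for that stripe).

    Hypothetical layout: the check block moves back one disk per stripe,
    and data blocks fill the remaining disks in ascending order.
    """
    data_per_stripe = disks - 1
    stripe, position = divmod(block, data_per_stripe)
    check_disk = (disks - 1 - stripe) % disks
    data_disks = [d for d in range(disks) if d != check_disk]
    return data_disks[position], stripe, check_disk


for block in range(6):
    print(block, raid0_location(block, 2), raid5_location(block, 4))
```

Because the check block sits on a different drive in each stripe, no single drive is involved in every write, which is what allows multiple simultaneous writes at level 5.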

5.1.7 Optical disks in general

There are a number of families of optical disk drive that have differing operational and technical characteristics, although they share the universal benefit of removable media. They are all written and read using a laser, which is a highly focused beam of coherent light, although the method by which the data is actually stored varies from type to type. Optical disks are sometimes enclosed in a plastic cartridge that protects the disk from damage, dust and fingerprints, and they have the advantage that the pickup never touches the disk surface making them immune from the ‘head crashes’ that can affect magnetic hard disks.

Compatibility between different optical disks and drives is something of a minefield because the method of formatting and the read/write mechanism may differ. The most obvious differences lie in the erasable or non-erasable nature of the disks and the method by which data is written to and read from the disk, but there are also physical sizes and the presence or lack of a cartridge to consider. Drives tend to split into two distinct families from a compatibility point of view: those that handle CD/DVD formats and those that handle magneto-optical (M-O) and other cartridge-type ISO standard disk formats. The latter may be considered more suitable for ‘professional purposes’ whereas the former are often encountered in consumer equipment.

WORM disks (for example the cartridges that were used quite widely for archiving in the late 1980s and 90s) may only be written once by the user, after which the recording is permanent (a CD-R is therefore a type of WORM disk). Other types of optical disks can be written numerous times, either requiring pre-erasure or using direct overwrite methods (where new data is simply written on top of old, erasing it in the process). The read/write process of most current rewritable disks is typically ‘phase change’ or ‘magneto-optical’. The CD-RW is an example of a rewritable disk that now uses direct overwrite principles.

The speed of some optical drives approaches that of a slow hard disk, which makes it possible to use them as an alternative form of primary storage, capable of servicing a number of audio channels. One of the major hurdles which had to be overcome in the design of such optical drives was that of making the access time suitably fast, since an optical pickup head was much more massive than the head positioner in a magnetic drive (it weighed around 100 g as opposed to less than 10 g). Techniques are being developed to rectify this situation, since it is the primary limiting factor in the onward advance of optical storage.

5.1.8 CAV and CLV modes in optical storage

CAV (constant angular velocity) and CLV (constant linear velocity) recording are two modes of rotation used in optical disk drives. In CLV recording the rotational speed of the disk changes depending on the position of the pickup, in order to keep a constant length of track passing under the head per second. In CAV recording the rotational speed of the disk remains constant. CAV disks normally have sectors of a fixed angle of arc, holding a fixed amount of data, so the data is more densely packed in sectors towards the centre of the disk (see Figure 5.8). CLV recording allows more data sectors to be stored towards the edges of the disk than at the centre, so may allow more efficient use to be made of the space available, but CLV requires servo operation to change the disk speed when the pickup head is moved, making CLV drives slower to access data.
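The speed change demanded of a CLV drive follows from the geometry: the track length passing the pickup per revolution is the circumference at that radius. The sketch below uses the 1.2 m/s linear velocity quoted later in this chapter for 74 minute CDs; the inner and outer radii of 25 mm and 58 mm are assumed round figures for illustration.

```python
import math


def clv_rpm(linear_velocity_m_s: float, radius_m: float) -> float:
    """Rotational speed needed to hold a given track speed at a given radius."""
    return linear_velocity_m_s / (2 * math.pi * radius_m) * 60


# 1.2 m/s scanning velocity; 25 mm and 58 mm are assumed inner and outer radii.
for radius_mm in (25, 58):
    print(radius_mm, "mm:", round(clv_rpm(1.2, radius_mm / 1000)), "rpm")
```

The disk must therefore more than halve its speed between the inside and the outside of the recorded area, and the servo needs time to settle after each large head movement.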

Some drives use a mode known as zoned-CAV (Z-CAV) to pack more data into the outer tracks of a disk. The disk rotates at one of a number of fixed speeds depending on which ‘zone’ the pickup is in. This is really a halfway house between CAV and CLV recording and does not compromise access time so much. Compact discs use CLV recording, for example, but most optical disk cartridge drives (e.g. M-O) use a form of CAV or Z-CAV recording. Recent drives may use CAV replay, even for CLV disks, in order to enable constant spin speeds and faster access time. Z-CLV is a variant of CLV recording used on DVD-RAM disks.
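The capacity gained by zoning can be estimated by counting sectors. The sketch below is a simplified model under stated assumptions: a sector occupies a fixed minimum length of track, a plain CAV disk is limited by what fits on its innermost track, and each zone of a Z-CAV disk is limited by the innermost track of that zone. The disk dimensions, sector length and zone count are hypothetical.

```python
import math


def sectors_on_track(radius_mm: float, sector_length_mm: float) -> int:
    """Whole sectors that fit around one track at the given radius."""
    return int(2 * math.pi * radius_mm / sector_length_mm)


def cav_sectors(track_radii: list[float], sector_length_mm: float) -> int:
    """CAV: every track holds what the innermost track can hold."""
    per_track = sectors_on_track(min(track_radii), sector_length_mm)
    return per_track * len(track_radii)


def zcav_sectors(track_radii: list[float], sector_length_mm: float,
                 zones: int) -> int:
    """Z-CAV: each zone uses the sector count of its own innermost track."""
    radii = sorted(track_radii)
    zone_size = math.ceil(len(radii) / zones)
    total = 0
    for start in range(0, len(radii), zone_size):
        zone = radii[start:start + zone_size]
        total += sectors_on_track(zone[0], sector_length_mm) * len(zone)
    return total


# Hypothetical disk: 1000 tracks from 30 mm to 60 mm radius, 5 mm sectors.
radii = [30 + 30 * i / 999 for i in range(1000)]
print(cav_sectors(radii, 5.0), zcav_sectors(radii, 5.0, zones=8))
```

With a single zone the model reduces to plain CAV, and as the number of zones grows the total approaches what CLV recording would achieve.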

5.1.9 The magneto-optical (M-O) drive

M-O drives use optical disks that can be erased and re-recorded. In order to write data, the laser is used at a higher power than that used in the reading process, to heat spots in the recording layer that is made up of rare earth elements (typically gadolinium and terbium). In older drives a biasing magnet is used to create a weak magnetic field in the vicinity of the heated spot on the disk, whose recording layer only takes on this prevailing magnetic polarisation when it is hot. Under normal conditions the recording layer cannot be magnetised (see Figure 5.9). When the spot cools it retains this magnetisation. So-called LIMDOW (light intensity modulated direct overwrite) drives have enabled better recording performance from M-O technology by doing away with the external biasing magnet. Instead the disk contains two magnetic layers with opposite polarity, close to the recording layer. The magnetic polarity taken on by the recording layer then depends on the laser intensity during recording.


Figure 5.8 (a) Sectors on a CAV disk are of equal angle of arc. (b) On a Z-CAV disk the sector angle is not constant and more sectors are recorded at the outer edges of the disk than at the centre


Figure 5.9 The magneto-optical disk is recorded by exposing small areas of the recording layer to high-power laser light, whereupon they take on the magnetic polarity provided by the polarising magnet. On replay the magnetic polarisation affects the polarisation of reflected laser light

Although the data is recorded by a combination of optical heating and magnetisation, it is read by an entirely optical means which relies upon the fact that laser light reflected from the disk will be polarised depending on the magnetic state of the recording layer. This is known as the Kerr effect and the change in optical polarisation angle may be as small as a few degrees depending on the material concerned. The reflected light passes through a polarisation analyser, resulting in changes in intensity of the light falling on a photodetector. The M-O disk is normally pre-grooved and sectored to enable the drive to track the medium during recording.
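The way a small rotation of polarisation becomes a change of intensity at the photodetector can be illustrated with Malus's law, under which the intensity passed by an analyser varies as the cosine squared of the angle between the light's polarisation and the analyser axis. The 45 degree analyser setting and the rotation of plus or minus 1 degree in the sketch below are assumptions for illustration, not figures from the text.

```python
import math


def detected_intensity(kerr_rotation_deg: float, analyser_deg: float = 45.0) -> float:
    """Relative intensity after the analyser, by Malus's law (cos squared)."""
    angle = math.radians(analyser_deg - kerr_rotation_deg)
    return math.cos(angle) ** 2


# Hypothetical rotation of +/-1 degree for the two magnetic states.
up, down = detected_intensity(+1.0), detected_intensity(-1.0)
print(f"{up:.4f} {down:.4f} difference {up - down:.4f}")
```

Even with the analyser set for maximum sensitivity, the two magnetic states differ by only a few per cent of the reflected intensity in this model, which suggests why the replay signal from an M-O disk is comparatively weak.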

An ISO standard was established for M-O disks, to which most of the major manufacturers adhere. This allows for two different sector sizes (512 bytes and 1024 bytes), giving 297 and 325 Mbytes per side of storage capacity respectively on a 5.25 inch disk (594 or 650 Mbytes in total) using CAV recording. There are also higher density versions offering up to around 9 Gbytes capacity, in approximate multiples of two times the basic capacity stated above.

5.1.10 Phase-change optical recording

In phase-change recording data is written by a high-powered laser, changing recorded spots from a non-crystalline (amorphous) state to a crystalline state. In the crystalline state the reflectivity is increased considerably over that of the amorphous state. Data are read by a lower-powered laser that detects changes in reflectivity. By careful selection of the recording material and laser beam control the process may be made reversible (so data may be overwritten). The only apparent drawback is the number of re-write cycles allowed (cycles of erasure and re-recording), which may be in the order of ten times lower than that of the M-O disk. The CD-RW is based on phase-change principles.

5.1.11 Compact discs and drives

The CD is not immediately suitable for real-time audio editing and production, partly because of its relatively slow access time compared with hard disks, but can be seen to have considerable value for the storage and transfer of sound material that does not require real-time editing. Broadcasters use them for sound effects libraries and studios and mastering facilities use them for providing customers and record companies with ‘acetates’ or test pressings of a new recording. They have also become quite popular as a means of transferring finished masters to a CD pressing plant in the form of the PMCD (pre-master CD). They are ideal as a means of ‘proofing’ CD-ROMs and other CD formats, and can be used as low-cost backup storage for computer data.

Compact discs (CDs) are familiar to most people as a consumer read-only optical disk for audio (CD-DA) or data (CD-ROM) storage. Standard audio CDs (CD-DA) conform to the Red Book standard published by Philips and Sony. The CD-ROM standard (Yellow Book) divides the CD into a structure with 2048 byte sectors, adds an extra layer of error protection, and makes it useful for general purpose data storage including the distribution of sound and video in the form of computer data files. It is possible to find disks with mixed modes, containing sections in CD-ROM format and sections in CD-Audio format. The CD Plus is one such example.

CD-R is the recordable CD, and may be used for recording CD-Audio format or other CD formats using a suitable drive and software. The Orange Book, Part 2, contains information on the additional features of CD-R, such as the area in the centre of the disk where data specific to CD-R recordings is stored. Audio CDs recorded to the Orange Book standard can be ‘fixed’ to give them a standard Red Book table of contents (TOC), allowing them to be replayed on any conventional CD player. Once fixed into this form, the CD-R may not subsequently be added to or changed, but prior to this there is a certain amount of flexibility, as discussed below. CD-RW disks, described in the Orange Book, Part 3, are erasable and work on phase-change principles, requiring a drive compatible with this technology.

The degree of reflectivity of CD-RW disks is much lower than that of typical CD-R and CD-ROM. This means that some early drives and players may have difficulties reading them. However the ‘multi-read’ specification developed by the OSTA (Optical Storage Technology Association) describes a drive that should read all types of CD, so recent drives should have no difficulties here.

Figure 5.10 shows the cross-section through a typical blank CD-R disk. The disk consists of a pre-formed ‘groove’ in the so-called recording layer. The recording layer consists of a green semi-transparent material, behind which is a gold reflective layer. During recording, the laser heats the recording layer to around 250 °C, a process which causes it to melt, forming a pit similar to that found on a conventional CD. On replay, the laser pickup, operated at a lower power than for recording, experiences a lower level of reflected light in the presence of a pit than it does in the absence of a pit, in exactly the same manner as for a prerecorded CD.

An Orange Book CD does not have to be recorded all at once. It can be removed from the machine and added to at a later date, appending the new material to the end of the last recording. In order to make this possible the disc contains an additional recording area inside the starting point of a conventional CD (normal CDs begin with a TOC in the centre of the disk and play from the inside out), divided into two parts (see Figure 5.11). The Program Calibration Area (PCA) is used for optimising laser power by making a number of short test recordings when a new disk is first inserted. On subsequent occasions this calibration is not required since a message is stored on the disk to indicate the appropriate laser power. The Program Memory Area (PMA) is used to store a temporary TOC while the disk is still ‘unfixed’ and this TOC is updated every time a new track is recorded. ‘Skip’ information is also stored here, which allows certain tracks to be skipped on replay if they have been spoiled (although this will only work when the disk is replayed on a CD player that recognises skip IDs).
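The behaviour of an unfixed disc can be summarised as a small data structure: a temporary TOC that grows with each recorded track, optional skip flags, and a fixing step after which nothing more can be added. The sketch below is a hypothetical model of that logic only; the class and method names are invented for illustration and it does not represent the actual on-disc data layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Track:
    number: int
    skip: bool = False  # skip ID: ask the player to pass over this track


@dataclass
class RecordableDisc:
    """Hypothetical model of an unfixed disc and its temporary TOC."""
    temporary_toc: list[Track] = field(default_factory=list)  # held in the PMA
    final_toc: Optional[list[int]] = None                     # written on fixing

    def record_track(self) -> Track:
        if self.final_toc is not None:
            raise RuntimeError("disc is fixed; no further recording allowed")
        track = Track(number=len(self.temporary_toc) + 1)
        self.temporary_toc.append(track)  # temporary TOC updated per track
        return track

    def fix(self) -> None:
        """Write the final TOC from the temporary one, closing the disc."""
        self.final_toc = [t.number for t in self.temporary_toc]

    def play_order(self, player_recognises_skip_ids: bool) -> list[int]:
        """Tracks replayed, depending on whether the player honours skip IDs."""
        return [t.number for t in self.temporary_toc
                if not (t.skip and player_recognises_skip_ids)]


disc = RecordableDisc()
disc.record_track()
spoiled = disc.record_track()
spoiled.skip = True
disc.record_track()
disc.fix()
print(disc.play_order(True), disc.play_order(False))
```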


Figure 5.10 Cross section through a CD-R WORM disk


Figure 5.11 Division of recording area on the CD-R, showing space for program calibration area (PCA) and temporary program memory area (PMA)

The lead-in area of an Orange Book CD, where a normal CD would start to read its TOC, is left blank until such time as the user decides that the disc is completed. On ‘fixing’ the disk the machine records a Red Book TOC, after which no further recording is allowed. The early blanks for these machines ran to 63 minutes, but 74 minute disks became available, running at the slightly slower linear velocity of 1.2 m s–1. The standard capacity for a CD-R is 650 Mbyte (74 minutes), although 700 Mbyte (80 minute) disks are now available. ‘Audio-only’ disks have a royalty attached to them that offsets the supposed losses of the record industry owing to consumer piracy. Some consumer CD recorders may refuse to record audio on disks other than these.
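The link between playing time and data capacity can be checked with a little arithmetic. The sketch below relies on two standard CD figures that are not stated in the text: the disc delivers 75 sectors per second at normal speed, and an audio sector carries 2352 bytes against the 2048 bytes of user data in a CD-ROM sector. Capacities are expressed in units of 2^20 bytes, which is the convention under which 74 minutes corresponds to 650 Mbyte.

```python
SECTORS_PER_SECOND = 75       # CD sectors read per second at normal speed
AUDIO_BYTES_PER_SECTOR = 2352
DATA_BYTES_PER_SECTOR = 2048  # user data per CD-ROM sector


def cd_capacity_bytes(minutes: int, bytes_per_sector: int) -> int:
    """User capacity of a disc of the given playing time."""
    return minutes * 60 * SECTORS_PER_SECOND * bytes_per_sector


for minutes in (74, 80):
    data = cd_capacity_bytes(minutes, DATA_BYTES_PER_SECTOR)
    print(minutes, "min:", round(data / 2**20), "Mbyte of CD-ROM data")

# Cross-check: 44.1 kHz x 2 bytes x 2 channels fills 75 audio sectors a second.
assert 44_100 * 2 * 2 == SECTORS_PER_SECOND * AUDIO_BYTES_PER_SECTOR
```

The same disc holds rather more when recorded as audio, because the bytes given over to the extra layer of error protection in CD-ROM sectors are then available for samples.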

A number of recording modes are possible on most Orange Book drives. ‘Disk-at-once’ is the most basic, in which all of the information is written at one time together with a Red Book TOC; ‘Track-at-once’ allows partial recording of the disk, with the option to record more at a later time, but without the option to read any of the data back until the disk TOC is fixed; ‘Multisession’ allows partial recording of the disk to a total of 99 sessions, with the option to read back the recorded data before the disk has been filled (provided that the reading drive is multisession capable and can read the temporary Orange Book TOC). ‘Packet writing’ or ‘incremental writing’ allows very small chunks of data to be recorded, even within a track. Only OSTA (Optical Storage Technology Association)-endorsed ‘Multi-Read’ CD drives can replay packet-written disks.

5.1.12 DVD

DVD is the natural successor to CD, being a higher-density optical disc format aimed at the consumer market, having the same diameter as CD and many similar physical features. It uses a different laser wavelength to CD (635–650 nm as opposed to 780 nm) so multi-standard drives need to be able to accommodate both. Data storage capacity depends on the number of sides and layers to the disk, but ranges from 4.7 Gbytes (single-layer, single-sided) up to about 18 Gbytes (double-layer, double-sided). The data transfer rate at ‘one times’ speed is just over 11 Mbit s–1.
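These figures give a feel for DVD as a workstation storage medium. The sketch below takes 11.08 Mbit/s as an assumed value for the ‘just over 11 Mbit s–1’ quoted above, and estimates the time to read a full single-layer disc and an upper bound on the number of linear PCM channels the raw rate could carry, ignoring access time and file system overheads. The function names are hypothetical.

```python
DVD_1X_BITS_PER_S = 11.08e6  # assumed value for "just over 11 Mbit/s"


def full_read_minutes(capacity_bytes: float, speed_multiple: float = 1.0) -> float:
    """Minutes needed to read a whole disc at a multiple of 1x speed."""
    return capacity_bytes * 8 / (DVD_1X_BITS_PER_S * speed_multiple) / 60


def pcm_channels_at_1x(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Upper bound on PCM channels the raw 1x rate could carry."""
    return int(DVD_1X_BITS_PER_S // (sample_rate_hz * bits_per_sample))


print(round(full_read_minutes(4.7e9), 1), "minutes to read 4.7 Gbytes at 1x")
print(pcm_channels_at_1x(48_000, 24), "channels of 48 kHz 24-bit audio at most")
```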

DVD-Video is the format originally defined for consumer distribution of movies with surround sound, typically incorporating MPEG-2 video encoding and Dolby Digital surround sound encoding. It also allows for up to eight channels of 96 kHz linear PCM audio, at up to 24-bit resolution. DVD-Audio is intended for very high quality multichannel audio reproduction and allows for linear PCM sampling rates up to 192 kHz, with numerous configurations of audio channels for different surround modes, and optional lossless data reduction (MLP). These formats will not be described in detail here as the intention is primarily to consider DVD as a mass storage medium for workstations, rather than as a consumer release format.

Table 5.2 Recordable DVD formats

Recordable DVD type | Description
DVD-R (A and G) | DVD equivalent of CD-R. One-time recordable in sequential manner, replayable on virtually any DVD-ROM drive. Supports ‘incremental writing’ or ‘disk at once’ recording. Capacity either 3.95 (early disks) or 4.7 Gbytes per side. ‘Authoring’ (A) version (recording laser wavelength = 635 nm) can be used for pre-mastering DVDs for pressing, including DDP data for disk mastering (see Chapter 6). ‘General’ (G) version (recording laser wavelength = 650 nm) intended for consumer use, having various ‘content protection’ features that prevent encrypted commercial releases from being cloned
DVD-RAM | Sectored format, rather more like a hard disk in data structure when compared with DVD-R. Uses phase-change (PD-type) principles allowing direct over-write. Version 2 disks allow 4.7 Gbyte per side (reduced to about 4.2 Gbytes after formatting). Type 1 cartridges are sealed and Type 2 allow the disc to be removed. Double-sided discs only come in sealed cartridges. Can be re-written about 100 000 times. The recent Type 3 is a bare disc that can be placed in an open cartridge for recording
DVD-RW | Pioneer development, similar to CD-RW in structure, involving sequential writing. Does not involve a cartridge. Can be re-written about 1000 times. 4.7 Gbytes per side
DVD+RW | Non-DVD-Forum alternative to DVD-RAM (and not compatible), allowing direct overwrite. No cartridge. Data can be written in either CLV (for video recording) or CAV (for random access storage) modes. There is also a write-once version known as DVD+R

DVD can be used as a general-purpose data storage medium. As with CD, there are numerous variants of the recordable DVD, partly owing to competition between the different ‘factions’ in the DVD consortium. These include DVD-R, DVD-RAM, DVD-RW and DVD+RW, all of which are based on similar principles but have slightly different features, leading to a compatibility minefield that is only gradually being addressed. It is not proposed to go into this topic in great detail here, but a brief overview is given and a summary of common formats is shown in Table 5.2.

The ‘DVD Multi’ guidelines produced by the DVD Forum are an attempt to foster greater compatibility between DVD drives and disks, but this does not really solve the problem of the formats that are currently outside the DVD Forum.

Writeable DVDs are a useful option for backup of large projects, particularly DVD-RAM because of its many-times overwriting capacity and its hard disk-like behaviour. It is possible that a format like DVD-RAM could be used as primary storage in a multitrack recording/editing system, as it has sufficient performance for a limited number of channels and it has the great advantage of being removable. However it is likely that hard disks will retain the performance edge for the foreseeable future.

5.1.13 Optical disc filing structures

There is a standard filing structure for CD-ROM known as ISO 9660 or High Sierra, which was (and still is) used when wanting to ensure that disks can be read across a wide range of platforms, although CD-ROMs can also be formatted in non-ISO modes for use on proprietary platforms. ISO 9660 format handles basic eight-character filenames and three-character extensions, but there are extensions such as ‘Joliet’ to allow for longer filenames.

The universal disc format (UDF) was developed as a means of simplifying the compatibility problems between optical discs such as CD and DVD, especially when used in ‘packet-writing’ modes. It is based on the ISO/IEC 13346 standard. ISO 9660 compatibility is included in UDF. A form of UDF (version 1.02) was originally devised for DVD formats, version 1.5 being introduced later to encompass CD formats. It maintains ‘virtual allocation tables’ (VATs) on the disc that map physical data locations to relevant file packets, and these are updated to include all previous VAT data each time new packets are written.

5.1.14 Tape storage media

There are a number of types of storage media in common use for tape backup storage with AV workstations. All are cassette or cartridge formats. These include DDS, Exabyte, Mammoth, AIT and DLT. Tapes are not usually formatted in the same way as disks. Tapes are often used as basic ‘data streamers’ where data is stored in a very simple sequential fashion, possibly even with the block size varying in different parts of the tape. It may be that no directory is stored on the tape itself, this being kept in a disk file on the host computer. An ANSI standard exists which defines basic rules for information interchange on magnetic tapes and this is often used on media such as Exabyte to determine the method of labelling tapes and filing information. Because tapes are not usually ‘mountable volumes’ in the same way as disks, it is rare to be able to ‘see’ them on the desktops of GUI-based computers, requiring special software with appropriate drivers for the tape system in question to read and write information.

DDS is the DAT Data Storage format, and rather like the CD-ROM is the extension of a format originally intended purely for audio to general purpose data storage applications. The DAT format uses 4 mm tape and the tape is read and written using heads mounted in a drum which scans the tape in a helical fashion. On top of the audio DAT formatting is added formatting and error correction information so that the tape is then useful as a block-structured medium with low enough error rates for data purposes, and a directory area at the start of the tape.

DDS drives normally have four heads on the drum so that the data can be verified immediately after it is written – important for checking data reliability. It is recommended that one uses special DDS tapes for data purposes, which are said to be manufactured to the high specifications needed to ensure reliability, but some users have been known to use audio DAT tapes with varying degrees of success. It is sometimes necessary to alter a switch inside the drive for this purpose, so that it accepts ordinary tapes. DDS-1 drives store up to 2 Gbytes of data on a tape and some drives incorporate built-in data compression which can boost the storage capacity of such drives up to a maximum of 8 Gbytes. This is lossless compression allowing the data to be recovered in precisely its original form. The transfer rate to and from a DDS-1 drive is moderate (of the order of 180 kbyte s–1), and the access time is quite slow compared with a disk drive (of the order of seconds). DDS-2 drives offer higher storage capacity and higher transfer rates. Using a longer tape, the DDS-2 drive can store up to 4 Gbytes of data in uncompressed form and up to 16 Gbytes compressed. The transfer rate is approximately 500 kbyte s–1.
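To give a feel for what these transfer rates mean in practice, the following minimal sketch estimates how long it would take to fill a tape at its quoted rate. The function name is hypothetical, and the calculation assumes decimal units (1 Gbyte = 1 000 000 kbyte) and a drive that streams continuously at the quoted rate, which real drives will not always manage.

```python
def backup_hours(capacity_gbyte, rate_kbyte_per_s):
    """Hours needed to transfer capacity_gbyte at a sustained rate.

    Assumes decimal units (1 Gbyte = 1 000 000 kbyte) and continuous
    streaming at the quoted rate (an idealisation).
    """
    if rate_kbyte_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    seconds = capacity_gbyte * 1_000_000 / rate_kbyte_per_s
    return seconds / 3600.0


# Uncompressed figures quoted in the text
dds1 = backup_hours(2, 180)   # DDS-1: 2 Gbytes at about 180 kbyte/s
dds2 = backup_hours(4, 500)   # DDS-2: 4 Gbytes at about 500 kbyte/s
print(f"DDS-1: {dds1:.1f} h, DDS-2: {dds2:.1f} h")
```

On these figures a full DDS-1 tape takes roughly three hours to write, which is acceptable for overnight backup but underlines why such drives are not candidates for primary storage.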

Exabyte tapes are based on the original consumer Video-8 format, adapted for data storage. The tapes are 8 mm wide, as opposed to the 4 mm of DDS, and the cartridges are slightly larger. Drives are typically more expensive than DDS drives. Storage capacities and transfer rates available from Exabyte drives are considerably greater than those available from DAT. One current example holds up to 5 Gbytes per tape and transfers data at a rate of around 500 kbyte s–1. Maximum available capacity is currently 7 Gbytes uncompressed. Mammoth is a relatively recent tape storage technology based also on 8 mm tape. It allows considerably greater capacity than Exabyte (around 60 Gbytes) and increased data transfer rates with a simpler mechanism that is said to reduce tape wear.

The QIC (quarter-inch cartridge) is quite a well-established tape backup medium, used widely in professional computing and mainframe systems. It uses quarter-inch tape housed in a largish cartridge, and has very low error rates and high longevity. Recording is via stationary heads with multiple narrow tracks. Capacities and transfer rates are quite high, with drives storing over 10 Gbytes planned.

Digital Linear Tape (DLT) drives use a large number of linear tracks (128) across the width of a half-inch tape. DLT is often used for DVD masters, offering an uncompressed capacity of up to 35 Gbytes. Using a SCSI-2 interface, these drives offer transfer rates of up to 20 Mbyte s–1 with very low error rate, which makes them ideal for workstation backup purposes. Super DLT is a more recent alternative to DLT, offering yet higher capacity and transfer rate.

An alternative to these for high-capacity storage is AIT (Advanced Intelligent Tape), which also offers capacities into the hundreds of gigabytes and high transfer rates, as well as data compression. An interesting feature of AIT is the incorporation of a memory chip into the cassette, to store data such as a search map that enables information to be located without rewinding the tape to the directory at the start. The LTO Ultrium series of drives and cartridges, developed by HP, IBM and Seagate, has similarly high capacity and uses a 4 kbyte cartridge memory chip that can communicate with the drive by radio frequency transmission even when the tape is not inserted in the drive. This cartridge memory contains a file log and other user information.

5.2 Peripheral interfaces

A variety of different physical interfaces can be used for interconnecting storage devices and host workstations. Some are internal buses only designed to operate over limited lengths of cable and some are external interfaces that can be connected over several metres. The interfaces can be broadly divided into serial and parallel types, the serial types tending to be used for external connections owing to their size and ease of use. The disk interface can be slower than the drive attached to it in some cases, making it into a bottleneck in some applications. There is no point having a super fast disk drive if the interface cannot handle data at that rate.
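The bottleneck argument can be made concrete with a rough calculation. The sketch below takes the slower of the drive and interface rates as the effective rate and divides it by the data rate of one audio channel. The function name, the default of 48 kHz/24-bit audio and the example rates are illustrative assumptions; seek time, filing system overhead and safety margins are ignored, so real systems will achieve fewer channels.

```python
def audio_channels(drive_mbyte_s, interface_mbyte_s,
                   sample_rate_hz=48_000, bits_per_sample=24):
    """Rough upper bound on simultaneous audio channels.

    The sustained rate is limited by the slower of drive and interface.
    """
    effective = min(drive_mbyte_s, interface_mbyte_s) * 1_000_000  # byte/s
    bytes_per_channel = sample_rate_hz * bits_per_sample // 8      # byte/s
    return int(effective // bytes_per_channel)


# Hypothetical 30 Mbyte/s drive on a slow and on a fast interface
print(audio_channels(30, 5))     # interface is the bottleneck
print(audio_channels(30, 160))   # drive is the bottleneck
```

In the first case a faster drive would make no difference at all: only a faster interface raises the channel count.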

5.2.1 SCSI

For many years the most commonly used interface for connecting mass storage media to host computers was SCSI (the Small Computer Systems Interface), pronounced ‘scuzzy’. It is still used quite widely for very high performance applications but EIDE interfaces and drives are now capable of very good performance that can be adequate for many purposes.

SCSI is a high-speed parallel interface found on many computer systems, originally allowing up to seven peripheral devices to be connected to a host on a single bus. Such peripheral devices include all forms of mass storage media, CD drives, scanners, printers and network ports. It is specified in ANSI X3.131 (1986). SCSI-2 can be both faster and wider than SCSI-1, allowing for higher speed data transfer (SCSI-1 interfaces were limited to speeds of around 4–5 Mbyte s–1, and were only 8 bits wide, whereas SCSI-2 can run at over 10 Mbyte s–1 and may be 16 or even 32 bits wide). SCSI has grown through a number of improvements and revisions, the latest being Ultra160 SCSI, capable of addressing 16 devices at a maximum data rate of 160 Mbyte s–1.

SCSI devices are connected in a ‘daisy-chain’ fashion, as shown in Figure 5.12. SCSI-1 devices have two 50-pin connectors for this purpose, although some computers like the Macintosh have a non-standard 25-pin D-type connector. SCSI-2 usually uses a higher density connector. SCSI devices all have a means of setting their address, using a DIP switch or a rotary or push-button switch, and this determines the address to which the device will respond. The highest numbered address has the highest priority on the bus and will be dealt with first, which helps when two devices conflict in attempting to access the bus. Normally the host computer has the highest address (ID7), leaving ID0 through ID6 for peripherals. A computer’s internal hard disk often uses ID0. It is important to ensure that all devices on the bus have different addresses, otherwise problems arise, although it is not necessary to assign SCSI IDs in sequence.
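The addressing rules for the eight-address bus described here can be sketched as a simple check. The function names and the example device names are hypothetical; the sketch models only the two rules stated above (unique addresses, highest address wins) and not the electrical arbitration process itself.

```python
def check_scsi_ids(devices, num_ids=8):
    """Return a list of problems found in a {device_name: scsi_id} mapping."""
    problems = []
    seen = {}
    for name, scsi_id in devices.items():
        if not 0 <= scsi_id < num_ids:
            problems.append(f"{name}: ID {scsi_id} out of range")
        if scsi_id in seen:
            problems.append(f"{name} and {seen[scsi_id]} share ID {scsi_id}")
        else:
            seen[scsi_id] = name
    return problems


def arbitration_winner(devices, contenders):
    """Of the devices contending for the bus, the highest ID wins."""
    return max(contenders, key=lambda name: devices[name])


bus = {"host": 7, "internal disk": 0, "CD-R": 3, "audio disk": 5}
print(check_scsi_ids(bus))                              # no problems expected
print(arbitration_winner(bus, ["CD-R", "audio disk"]))  # higher ID gains the bus
```

Note that the IDs need not be consecutive: only uniqueness matters.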


Figure 5.12 Interconnection of SCSI devices


Figure 5.13 Termination of a SCSI chain, showing use of an external terminator on the last device in the chain

The SCSI bus requires termination at both ends (one end is normally in the host computer or card and is not modifiable). This termination is a collection of resistors connected to each of the parallel lines that ensure the termination impedance of the bus is correct, in order that the data is not distorted by reflections or attenuated. Unterminated SCSI buses occasionally work, but it is not recommended. Termination can be either internal or external to the peripheral and it may be switchable or automatically sensed and controlled. Internal unswitchable termination is not advisable because it forces one to use the terminated device at the end of the SCSI chain (see Figure 5.13). It is particularly inconvenient if more than one SCSI device is to be connected, because the termination has to be physically removed from those devices in the middle of the chain (not always easy). External termination normally involves plugging a termination block into the daisy-chain connector of the last device in the chain. These can be easily purchased from computer stores. Automatic termination is useful because it means that the user does not need to think about which devices are in which positions on the bus – the device senses the impedance of the bus and terminates or not accordingly. Only the devices at each end of the bus should be terminated, not any of those in between.
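The rule that only the two ends of the bus should be terminated lends itself to a simple check. In the sketch below the chain is modelled, purely for illustration, as a list of (name, terminated) pairs in physical order along the cable; the function name and this data layout are assumptions.

```python
def termination_faults(chain):
    """chain: list of (device_name, is_terminated) in physical bus order."""
    faults = []
    last = len(chain) - 1
    for position, (name, terminated) in enumerate(chain):
        at_end = position in (0, last)
        if at_end and not terminated:
            faults.append(f"{name} is at an end of the bus but is not terminated")
        elif not at_end and terminated:
            faults.append(f"{name} is mid-chain and should not be terminated")
    return faults


# A disk with fixed internal termination placed in the middle of the chain
chain = [("host", True), ("disk", True), ("CD-R", False)]
for fault in termination_faults(chain):
    print(fault)
```

Swapping the physical positions of the disk and the CD-R clears both faults, which is exactly the constraint that unswitchable internal termination imposes.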

‘The shorter the better’ is the motto when it comes to choosing cables. Data rates are very high on the SCSI bus and it is important to limit cable lengths to less than a metre where possible, otherwise errors will arise. Poor quality cables are the root of many problems encountered with SCSI buses and trouble-free operation depends on using high quality cables that are double-screened.

The most common problems to arise involve (a) computers failing to ‘see’ certain peripherals; (b) systems failing to boot up properly; (c) data errors resulting in erroneous file transfers; (d) system crashes and ‘glitches’. The following hints form a first-level troubleshooting guide:

•  Never connect or disconnect SCSI devices with power turned on.

•  Check that all devices have different addresses.

•  Check all cables and connectors for soundness.

•  Try swapping cables around or changing cables.

•  Try shorter cables.

•  Check termination and change if necessary.

•  Try putting devices in different physical positions in the chain.

•  Try changing the order of SCSI addresses.

•  Try powering up SCSI devices in a different order.

•  Try moving devices apart physically.

•  Ensure that the correct device drivers are installed on the host computer.

•  Run a SCSI diagnostic software tool which may point to the fault.

5.2.2 ATA/IDE interface

The ATA and IDE family of interfaces has evolved through the years as the primary internal interface for connecting disk drives to PC system buses. It is cheap and ubiquitous. Although drives with such interfaces were not considered adequate for audio purposes in the past, many people are now using them with the on-board audio processing of modern computers as they are cheap and the performance is adequate for many needs. IDE is a development of IBM’s Advanced Technology Attachment (ATA) interface and has gone through improvements such as EIDE (Enhanced IDE), which improved performance and allowed four devices to be attached instead of the two of basic IDE. IDE started off using direct cylinder, head and sector addressing of disk drives but recent versions use logical block addressing because of the disk size limitations imposed by physical addressing. Recent flavours of this interface family include Ultra ATA/66 and Ultra ATA/100, which use a 40-pin connector with an 80-conductor cable and deliver data rates up to either 66 or 100 Mbyte s–1. ATAPI (ATA Packet Interface) is a variant used for storage media such as CD drives.
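The physical addressing mentioned above is conventionally expressed as cylinder, head and sector (CHS) numbers, and the usual mapping to a logical block address is a simple linear count. The sketch below shows that mapping and the capacity ceiling it implies. The geometry used in the example (1024 cylinders, 16 heads, 63 sectors per track, 512-byte sectors) is the commonly quoted early PC limit and is an assumption not taken from the text, as are the function names.

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_to_lba(cylinder, head, sector, heads_per_cyl, sectors_per_track):
    """Convert a CHS address to a logical block address.

    Cylinders and heads count from 0; sectors conventionally count from 1.
    """
    if not (0 <= head < heads_per_cyl and 1 <= sector <= sectors_per_track):
        raise ValueError("head or sector outside the drive geometry")
    return (cylinder * heads_per_cyl + head) * sectors_per_track + (sector - 1)


def chs_capacity_bytes(cylinders, heads_per_cyl, sectors_per_track):
    """Largest capacity addressable with the given CHS geometry."""
    return cylinders * heads_per_cyl * sectors_per_track * SECTOR_BYTES


print(chs_to_lba(0, 0, 1, 16, 63))        # first block on the disk
print(chs_capacity_bytes(1024, 16, 63))   # ceiling for this geometry, in bytes
```

With logical addressing the host simply asks for block numbers and the drive is free to map them onto whatever physical layout it uses, so the ceiling set by the geometry fields disappears.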

Serial ATA is a relatively recent development designed to enable disk drives to be interfaced serially, thereby reducing the physical complexity of the interface. High data transfer rates are planned, eventually up to 600 Mbyte s–1. It is intended primarily for internal connection of disks within host workstations, rather than as an external interface like USB or Firewire.

5.2.3 PCMCIA

PCMCIA is a standard expansion port for notebook computers and other small-size computer products. A number of storage media and other peripherals are available in PCMCIA format, and these include flash memory cards, modem interfaces and super-small hard disk drives. The standard is of greatest use in portable and mobile applications where limited space is available for peripheral storage.

5.2.4 IEEE 1394 (Firewire) and USB

Firewire and USB are both serial interfaces for connecting external peripherals. They are covered in Chapter 4 and Chapter 6 in relation to their use for MIDI and audio information respectively, so they will not be described in detail here. It is sufficient to explain that they both enable disk drives to be connected in a very simple manner, with high transfer rates (many hundreds of megabits per second), although USB 1.0 devices are limited to 12 Mbit s–1. A key feature of these interfaces is that they can be ‘hot plugged’ (in other words devices can be connected and disconnected with the power on). The interfaces also supply basic power that enables some simple devices to be powered from the host device. Interconnection cables can usually be run up to between 5 and 10 metres, depending on the cable and the data rate.

5.3 Filing systems and volume partitions

So far only the physical structure and basic format of mass storage have been described. The way in which this raw storage space is used is another issue. There are a number of ways of organising the storage capacity of a disk drive which involve formatting it at a high level for a particular filing system, depending on the computer platform or other host device and its operating system. It is this that determines whether the files stored on a disk or tape will be accessible by the host computer, once interfaced correctly. If physical media are to be exchanged between systems, for example, then the filing system must be able to be handled by the host computer’s operating system. Driver or extension software can often be obtained to enable computers to read filing systems other than their own.

When a disk is formatted at a low level the sector headers are written and the bad blocks mapped out. A map is kept of the locations of bad blocks so that they may be avoided in subsequent storage operations. Low level formatting can take quite a long time as every block has to be addressed. During a high level format the disk may be subdivided into a number of ‘partitions’. Each of these partitions can behave as an entirely independent ‘volume’ of information, as if it were a separate disk drive (see Figure 5.14). It may even be possible to format each partition in a different way, such that a different filing system may be used for each partition. Each volume then has a directory created, which is an area of storage set aside to contain information about the contents of the disk. The directory indicates the locations of the files, their sizes, and various other vital statistics.

A number of audio workstation manufacturers developed their own filing systems that were optimised for speed and efficiency in real-time applications. In many cases this was the key to their success because it allowed them to obtain more simultaneous audio channels from a given disk than would otherwise have been possible. Now that disk drives have become cheap and fast, the need for special filing systems has become less important while compatibility has become more important because of the need to interchange data between systems. The most common general purpose filing systems in audio workstations are HFS (Hierarchical File System) or HFS+ (for Mac OS), FAT32 (for Windows PCs) and NTFS (for Windows NT and 2000). The Unix operating system is used on some multi-user systems and high-powered workstations and also has its own filing system. These were not designed principally with real-time requirements such as audio and video replay in mind but they have the advantage that disks formatted for a widely used filing system will be more easily interchangeable than those using proprietary systems.


Figure 5.14 A disk may be divided up into a number of different partitions, each acting as an independent volume of information

5.4 Formatting, fragmentation and optimisation of media

The process of formatting a disk or tape erases all of the information in the volume. (It may not actually do this, but it rewrites the directory and volume map information to make it seem as if the disk is empty again.) Effectively the volume then becomes virgin territory again and data can be written anywhere.

When an erasable volume like a hard disk has been used for some time there will be a lot of files on the disk, and probably a lot of small spaces where old files have been erased. New files must be stored in the available space and this may involve splitting them up over the remaining smaller areas. This is known as disk fragmentation, and it seriously affects the overall performance of the drive. The reason is clear to see from Figure 5.15. More head seeks are required to access the blocks of a file than if they had been stored contiguously, and this slows down the average transfer rate considerably. It may come to a point where the drive is unable to supply data fast enough for the purpose.
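The effect of fragmentation on read time can be illustrated by counting the head movements needed to read a file whose blocks are listed in file order. This is a deliberately crude model: the function names are hypothetical, and the per-block read time and per-seek time are placeholder values chosen only to show the trend, not measurements of any drive.

```python
def count_seeks(block_addresses):
    """Number of head movements needed to read the blocks in file order.

    A seek is counted whenever the next block does not immediately follow
    the previous one. The initial seek to the first block is not counted.
    """
    return sum(1 for prev, nxt in zip(block_addresses, block_addresses[1:])
               if nxt != prev + 1)


def read_time_ms(block_addresses, block_ms=1.0, seek_ms=10.0):
    """Crude read time: fixed time per block plus fixed time per seek."""
    return (len(block_addresses) * block_ms
            + count_seeks(block_addresses) * seek_ms)


contiguous = [10, 11, 12]     # three adjacent blocks
fragmented = [10, 57, 203]    # the same file scattered over the disk
print(read_time_ms(contiguous), read_time_ms(fragmented))
```

Even with only three blocks, the fragmented file takes several times longer to read under these assumptions, because seek time dominates the time spent actually transferring data.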

There are only two solutions to this problem: one is to reformat the disk completely (which may be difficult, if one is in the middle of a project), the other is to optimise or consolidate the storage space. Various software utilities exist for this purpose, whose job is to consolidate all the little areas of free space into fewer larger areas. They do this by juggling the blocks of files between disk areas and temporary RAM – a process that often takes a number of hours. Power failure during such an optimisation process can result in total corruption of the drive, because the job is not completed and files may be only half moved, so it is advisable to back up the drive before doing this. It has been known for some such utilities to make the files unusable by some audio editing packages, because the software may have relied on certain files being in certain physical places, so it is wise to check first with the manufacturer.
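What a consolidation utility aims for can be shown with a toy model in which the disk is a list of blocks, each either free (None) or holding a (file, index) pair. This sketch only computes the target layout: it packs each file's blocks together at the start of the disk and leaves one large free area. A real utility must move blocks in place through limited temporary memory and cope with interruption, none of which is modelled here, and the function names are hypothetical.

```python
def free_extents(disk):
    """Return the length of each run of free (None) blocks."""
    extents, run = [], 0
    for block in disk:
        if block is None:
            run += 1
        elif run:
            extents.append(run)
            run = 0
    if run:
        extents.append(run)
    return extents


def consolidate(disk):
    """Return a layout with used blocks packed contiguously, file by file."""
    used = sorted(block for block in disk if block is not None)
    return used + [None] * (len(disk) - len(used))


disk = [("A", 0), None, ("B", 0), ("A", 1), None, None, ("B", 1), None]
print(free_extents(disk))               # several small free areas
print(free_extents(consolidate(disk)))  # one larger free area
```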

5.5 Audio processing and synthesis hardware

5.5.1 Introduction

A lot of audio processing now takes place within the workstation, usually relying either on the host computer’s processing power (using the CPU to perform signal processing operations) or on one or more DSP (digital signal processing) cards attached to the workstation’s expansion bus. Professional systems usually use external A/D and D/A convertors, connected to a ‘core’ card attached to the computer’s expansion bus. This is because it is often difficult to obtain the highest technical performance from convertors mounted on internal sound cards, owing to the relatively ‘noisy’ electrical environment inside most computers. Furthermore, the number of channels required may not fit onto an internal card. As more and more audio work takes place entirely in the digital domain, though, the need for analog convertors decreases. Digital interfaces (see Chapter 6) are also often provided on external ‘breakout boxes’, partly for convenience and partly because of the physical size of the connectors. Compact connectors such as the optical connector used for the ADAT 8-channel interface or the 2-channel SPDIF phono connector are accommodated on some cards, but multiple AES/EBU connectors cannot be.


Figure 5.15 At (a) a file is stored in three contiguous blocks and these can be read sequentially without moving the head. At (b) the file is fragmented and is distributed over three remote blocks, involving movement of the head to read it. The latter read operation will take more time

It is also becoming increasingly common for substantial audio processing power to exist on integrated sound cards that contain digital interfaces and possibly A/D and D/A convertors. These cards are typically used for consumer or semi-professional applications on desktop computers, although many now have very impressive features and can be used for advanced operations. Such cards are now available in ‘full duplex’ configurations that enable audio to be received by the card from the outside world, processed and/or stored, then routed back to an external device. Full duplex operation usually allows recording and replay simultaneously.

Sound cards and DSP cards are commonly connected to the workstation using the PCI (Peripheral Component Interconnect) expansion bus. Older ISA (PC) buses or NuBus (Mac) slots did not have the same data throughput capabilities and performance was therefore somewhat limited. PCI can be extended to an external expansion chassis that enables a larger number of cards to be connected than allowed for within the host computer.

Sufficient processing power can now be installed for the workstation to become the audio processing ‘heart’ of a larger studio system, as opposed to using an external mixing console and effects units. The higher the sampling frequency, the more DSP operations will be required per second, so it is worth bearing in mind that going up to, say, 96 kHz sampling frequency for a project will require double the processing power and twice the storage space of 48 kHz. The same is true of increasing the number of channels to which processing is applied.
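The scaling described here is simple proportionality, as the following sketch shows. It assumes uncompressed linear PCM and, as the text does, that processing load rises in direct proportion to sampling frequency and channel count; the function names and the 24-bit resolution in the example are illustrative choices.

```python
def storage_mbyte_per_min(sample_rate_hz, bits, channels):
    """Storage needed per minute of linear PCM audio, in Mbytes (decimal)."""
    return sample_rate_hz * bits / 8 * channels * 60 / 1_000_000


def relative_load(sample_rate_hz, channels,
                  ref_rate_hz=48_000, ref_channels=1):
    """Processing load relative to a reference, assuming linear scaling."""
    return (sample_rate_hz * channels) / (ref_rate_hz * ref_channels)


print(storage_mbyte_per_min(48_000, 24, 1))   # one 48 kHz channel
print(storage_mbyte_per_min(96_000, 24, 1))   # the same channel at 96 kHz
print(relative_load(96_000, 2))               # two 96 kHz channels vs one at 48 kHz
```

Doubling both the sampling frequency and the channel count therefore quadruples the demand on processing and storage under these assumptions.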

5.5.2 Audio processing latency

Latency is the delay incurred in executing audio operations between input and output of a system. The lower the better is the rule, particularly when operating systems in ‘full duplex’ mode, because processed sound may be routed back to musicians (for foldback purposes) or may be combined with undelayed sound at some point. The management of latency is a software issue and some systems have sophisticated approaches to ensuring that all supposedly synchronous audio reaches the output at the same time no matter what processing it has encountered on the way.

Minimum latency achievable is both a hardware and a software issue. The poorest systems can give rise to tens or even hundreds of milliseconds between input and output whereas the best reduce this to a few milliseconds. Audio I/O that connects directly to an audio processing card can help to reduce latency, otherwise the communication required between host and various cards can add to the delay. Some real-time audio processing software also implements special routines to minimise and manage critical delays and this is often what distinguishes professional systems from cheaper ones. The audio driver software or ‘middleware’ that communicates between applications and sound cards influences latency considerably. One example of such middleware intended for low latency audio signal routing in computers is Steinberg’s ASIO (Audio Stream Input Output), discussed further in Section 5.9.
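One contribution to latency that is easy to quantify is buffering. The sketch below assumes, as is common but not stated in the text, that audio passes through the system in fixed-size buffers and that each buffering stage delays the signal by one buffer length. The function name, the default of two stages (one on input, one on output) and the buffer sizes are all illustrative assumptions; convertor delays and processing time are ignored.

```python
def buffer_latency_ms(buffer_frames, sample_rate_hz, stages=2):
    """Delay contributed by buffering, in milliseconds.

    Each stage is assumed to add one full buffer of delay.
    """
    return stages * buffer_frames / sample_rate_hz * 1000.0


print(round(buffer_latency_ms(64, 48_000), 2))     # small buffers
print(round(buffer_latency_ms(4096, 44_100), 2))   # large buffers
```

Under these assumptions small buffers give a delay of a few milliseconds while large ones approach two hundred, which is consistent with the range quoted above. The price of small buffers is that the host must service them more often, which is why driver design matters so much.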

5.5.3 DSP cards

DSP cards can be added to widely used workstation packages such as Digidesign’s ProTools. These so-called ‘DSP Farms’ or ‘Mix Farms’ are expansion cards that connect to the PCI bus of the workstation and take on much of the ‘number crunching’ work involved in effects processing and mixing. ‘Plug-in’ processing software is becoming an extremely popular and cost-effective way of implementing effects processing within the workstation, and this is discussed further in Chapter 7. ProTools plug-ins usually rely either on DSP Farms or on host-based processing (see Section 5.5.4) to handle this load.

Digidesign’s TDM (Time Division Multiplex) architecture is a useful example of the way in which audio processing can be handled within the workstation. Here the processing tasks are shared between DSP cards, each card being able to handle a certain number of operations per second. If the user runs out of ‘horse power’ it is possible to add further DSP cards to share the load. Audio is routed and mixed at 24-bit resolution, and the cards are linked by a common audio bus carried on a separate multiway ribbon cable.

5.5.4 Host-based audio processing

An alternative to using dedicated DSP cards is to use the now substantial processing capacity of a typical desktop workstation. The success of such ‘host-based processing’ obviously depends on the number of tasks that the workstation is required to undertake and this capacity may vary with time and context. It is however quite possible to use the host’s own CPU to run DSP ‘plug-ins’ for implementing equalisation, mixing and limited effects, provided it is fast enough.

The software architecture required to run plug-in operations on the host CPU is naturally slightly different to that used on dedicated DSP cards, so it is usually necessary to specify whether the plug-in is to run on the host or on a dedicated resource such as Digidesign’s TDM cards. A number of applications are now appearing, however, that enable the integration of host-based (or ‘native’) plug-ins and dedicated DSP such as TDM-bus cards. Audio processing that runs on the host may be subject to greater latency (input to output delay) than when using dedicated signal processing, and it obviously takes up processing power that could be used for running the user interface or other software. It is nonetheless a cost-effective option for many users who do not have high expectations of a system and it may be possible to expand the system to include dedicated DSP in the future.

5.5.5 Integrated sound cards

Integrated sound cards typically contain all the components necessary to handle audio for basic purposes within a desktop computer and may be able to operate in full duplex mode (in and out at the same time). They typically incorporate convertors, DSP, a digital interface, FM and/or wavetable synthesis engines. Optionally, they may also include some sort of I/O daughter board that can be connected to a break-out audio interface, increasing the number of possible connectors and the options for external analog conversion. Such cards also tend to sport MIDI/joystick interfaces. A typical example of this type of card is the ‘SoundBlaster’ series from Creative Labs.

Any analog audio connections are normally unbalanced and the convertors may be of only limited quality compared with the best external devices. For professional purposes it is advisable to use high quality external convertors and balanced analog audio connections.

5.5.6 Synthesis engines on sound cards

The two main approaches to synthetic sound generation on PC sound cards are FM and wavetable synthesis. In FM synthesis, as pioneered by John Chowning and developed by Yamaha, the frequency of one oscillator (or ‘operator’) is modulated by another oscillator or chain of oscillators, as shown in Figure 5.16. The result of this frequency modulation (FM) is the creation of a complex set of sidebands or spectral components around the fundamental or ‘carrier’ frequency of the last oscillator in the chain, as exemplified in Figure 5.17. Quite rich timbres can be created with only a few oscillators/operators, although advanced FM synthesisers often use up to six operators per voice. The operators can be arranged in different ways, either in a chain with each modulating the next, or partly in parallel with each sub-chain contributing a particular component of the voice. These configurations are called ‘algorithms’, and some examples are shown in Figure 5.18. Each operator can be affected by an envelope generator which controls the way in which the amplitude of the output changes with time, and a simple envelope has four stages, as shown in Figure 5.19.


Figure 5.16 In FM synthesis one operator (the equivalent of an oscillator) frequency modulates another so as to alter its output spectrum. Each operator has its own envelope generator which affects how the output level of the operator changes with time


Figure 5.17 The result of FM is the creation of a sideband pattern around the ‘carrier’ oscillator frequency (fc). The amplitudes and frequencies of these sidebands depend on the amplitude and frequency of the modulating signal (fm). Sidebands are spaced apart by the frequency of the modulating signal, and a higher modulator amplitude generally creates more sidebands (a richer timbre). Sidebands which fall into the negative frequency range (below 0 Hz) are folded back into the positive range, some with phase reversal
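The behaviour summarised in Figures 5.16 and 5.17 can be sketched for the simplest case of one modulator and one carrier. The first function uses the familiar expression sin(2πfc·t + I·sin(2πfm·t)), where the modulation index I stands in for modulator amplitude; the second lists the sideband frequencies fc ± n·fm with negative values folded back into the positive range. Function names and parameter values are illustrative, and sideband amplitudes (which follow Bessel functions of the index) and the phase reversal of folded components are not computed.

```python
import math


def fm_sample(t, fc, fm, index, amplitude=1.0):
    """One output sample of a two-operator FM voice at time t (seconds)."""
    modulator = index * math.sin(2 * math.pi * fm * t)
    return amplitude * math.sin(2 * math.pi * fc * t + modulator)


def sideband_frequencies(fc, fm, orders):
    """Frequencies fc +/- n*fm for n = 0..orders, negatives folded back."""
    freqs = set()
    for n in range(orders + 1):
        for f in (fc + n * fm, fc - n * fm):
            freqs.add(abs(f))  # components below 0 Hz reflect to positive
    return sorted(freqs)


# Carrier 100 Hz, modulator 300 Hz: lower sidebands fold back
print(sideband_frequencies(100, 300, 2))

# A short burst of samples at 48 kHz
samples = [fm_sample(n / 48_000, 440.0, 110.0, index=2.0) for n in range(480)]
```

When fc and fm are in a simple integer ratio the sidebands fall on a harmonic series, which is one reason such ratios are favoured for pitched sounds.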

Although FM is a very flexible way of producing new synthesised sounds, it is not always easy to predict its results or to program it to produce a particular desired output. Wavetable synthesis is more predictable in this respect, since it involves the storage of short portions of sampled sound waves in memory (the wave table is basically the series of memory addresses containing the discrete sample values of the stored wave segment). During replay, the wave samples are read out of memory in various ways, very similar to the replay of ordinary digital audio recordings, except that the pitch of the stored sound is varied by skipping samples in order to change the period of the replayed sound. Using variable rate replay and digital filtering a simple stored wave segment can be transformed in both pitch and timbre. A technique known as looping is used to allow quite short stored wave segments to be lengthened or sustained on replay by repeating one section of the stored wave over and over. There is a clear trade-off here between the shortness of the looped segment (which conserves memory) and the quality of the instrumental sound produced. As with other forms of synthesis, envelope generators are used to alter the characteristics of the output over the duration of each note, and often separate wavetables are used for attack and sustain portions of a note.
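Variable-rate replay and looping can be sketched as follows. This is an illustration only: the names `make_sine_table` and `play_wavetable` are hypothetical, the stored wave is a single sine cycle rather than a sampled instrument, and linear interpolation between neighbouring samples is an added assumption to allow non-integer step sizes.

```python
import math

def make_sine_table(length=256):
    """A stand-in for a stored wave segment: one cycle of a sine wave."""
    return [math.sin(2 * math.pi * i / length) for i in range(length)]

def play_wavetable(table, step, num_samples, loop_start=0, loop_end=None):
    """Read 'table' at 'step' entries per output sample, looping a section of it."""
    if loop_end is None:
        loop_end = len(table)
    loop_len = loop_end - loop_start
    if loop_len <= 0:
        raise ValueError("loop_end must be greater than loop_start")
    pos = 0.0
    out = []
    for _ in range(num_samples):
        i = int(pos)
        frac = pos - i
        # Neighbouring sample for interpolation, wrapping at the loop end.
        nxt = i + 1 if i + 1 < loop_end else loop_start
        out.append(table[i] + frac * (table[nxt] - table[i]))
        pos += step
        if pos >= loop_end:
            # Jump back into the looped section to sustain the sound.
            pos = loop_start + (pos - loop_end) % loop_len
    return out

table = make_sine_table(256)
octave_up = play_wavetable(table, step=2.0, num_samples=256)
```

A step of 2.0 skips every other stored sample, halving the period of the replayed wave and so raising its pitch by an octave. A step below 1.0 lowers the pitch instead.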


Figure 5.18 Some examples of FM synthesis algorithms. (a) Operators in parallel give the equivalent of additive synthesis (not really FM). (b) Operators in series produce very complex and unpredictable timbres. (c) A combination of series and parallel operators can be used so that different components of the sound can be handled by different parts of the algorithm


Figure 5.19 A typical envelope generator has four stages. Attack (A), Decay (D), Sustain (S) and Release (R). The rates and maximum values of each stage of the envelope can be set independently
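The four-stage envelope of Figure 5.19 might be sketched as below. The function name `adsr`, the use of straight-line segments and the expression of stage lengths in samples are all assumptions made for illustration; real envelope generators often use curved segments and rate-based parameters.

```python
def adsr(num_samples, attack, decay, sustain_level, release, peak=1.0):
    """Piecewise-linear ADSR envelope; attack, decay and release are in samples."""
    sustain_len = num_samples - attack - decay - release
    if sustain_len < 0:
        raise ValueError("attack + decay + release exceed the note length")
    env = []
    for i in range(attack):       # Attack: rise from zero towards the peak.
        env.append(peak * i / attack)
    for i in range(decay):        # Decay: fall from the peak to the sustain level.
        env.append(peak + (sustain_level - peak) * i / decay)
    env.extend([sustain_level] * sustain_len)   # Sustain: hold a constant level.
    for i in range(release):      # Release: fall from the sustain level towards zero.
        env.append(sustain_level * (1 - i / release))
    return env

env = adsr(100, attack=10, decay=20, sustain_level=0.5, release=30)
```

Multiplying an operator's output sample by sample with such an envelope shapes its level over the duration of the note.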

Wavetables stored in ROM are inflexible because only a standard sound set is available to the multimedia sound designer. It is increasingly common for sound cards to contain an area of working RAM, in addition to standard sounds in ROM, which can be used for the storage of temporary or custom wavetables. This is made possible by the use of downloadable sounds (DLS), as described in Chapter 4. In this way sound designers can ensure that the sounds they want to use are available when the product is played, by arranging for the appropriate audio samples to be loaded into the sound card RAM before the action begins. New sounds could also be loaded during the course of a game or other multimedia production.

5.6 External synchronisation interfaces

It is quite rare to find a means of externally synchronising the audio sampling rate of a basic sound card or computer audio system. Silicon Graphics included this as one of the digital audio options on its Unix workstations, but the average PC sound card can only be internally clocked. More sophisticated sound cards or additional hardware is usually needed to allow workstations to be externally synchronised. For example Digidesign’s ‘Sync I/O’ interface enables ProTools to be connected to a range of sync sources and destinations. Creamware also has a ‘Sync Plate’ card that accompanies its SCOPE Fusion hardware and software, carrying word clock and an ADAT sync interface.

The minimum requirement for external synchronisation is usually a word clock input and output, being a BNC connector carrying a square wave signal at the sampling frequency. More advanced systems will also provide composite video sync interfaces and possibly a SMPTE/EBU timecode interface for locking systems to sources of external timecode (such as from an audio tape machine or video tape recorder).

5.7 User interfaces

Screen-based graphical user interfaces (GUIs) have been the norm in most low-cost audio workstation packages for many years because they are cheap and can display a lot of information. A user would typically interact with such an interface using a mouse and a QWERTY keyboard, which tends to limit the flexibility and controllability for complicated mixes. Dedicated audio workstations, however, have typically used a combination of screen display and physical controls. Dedicated systems are typically more expensive than PC- or Mac-based systems and have lost ground commercially as the power of desktop workstations has increased and the cost fallen. The versatility of plug-in architectures and standard computing platforms has proven very popular. However, as the power of desktop systems has increased the difficulty of controlling all the functions has also increased, as has the number of channels, leading to a need for dedicated control interfaces again. Furthermore, now that the audio workstation has the processing power to act as the mixing console and effects processor in a studio, some users are doing away with an external mixer altogether and attaching a sophisticated control interface to the workstation.

Examples of external control interfaces from Digidesign are shown in Figure 5.20. Such interfaces range from low-cost units to substantial control surfaces with transport control, editing control, metering and multiple moving faders.

5.8 Serial control interfaces

A variety of external serial interfaces can be provided to enable interconnection with control devices and data networks. Networking is discussed in Chapter 6. RS-232, RS-422 and MIDI are discussed in this section.

5.8.1 RS-232 and RS-422

RS-232 and RS-422 should be mentioned here as they are serial data interfaces still widely used for controlling external equipment. RS-232 normally terminates on a 25-way D-type connector and carries receive and transmit data lines as well as a number of control and housekeeping lines for managing the flow of data between devices such as computers and modems. Data transmission is unbalanced. RS-422 normally terminates in a 9-pin D-type connector and can transfer data at a number of rates using a standard asynchronous serial communication protocol. The electrical interface is normally balanced and the collection of control and housekeeping lines that accompany RS-232 is drastically reduced to ‘request to send (RTS)’ and ‘clear to send (CTS)’ handshaking. A widely used protocol on this interface is ‘Sony 9-pin’, which is commonly used for controlling video equipment at a data rate of 38.4 kbit s–1. The ES Bus was also developed as a universal remote control standard over RS-422, but it did not catch on as expected and does not seem to be particularly widely used.

5.8.2 The basic MIDI interface

The MIDI standard specifies a unidirectional serial interface running at 31.25 kbit s–1 ±1%. The rate was defined at a time when the clock speeds of microprocessors were typically much slower than they are today, this rate being a convenient division of the typical 1 or 2 MHz master clock rate. The rate had to be slow enough to be carried without excessive losses over simple cables and interface hardware, but fast enough to allow musical information to be transferred from one instrument to another without noticeable delays.
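The arithmetic behind these figures can be checked with a short sketch. The framing of ten bits per byte (one start bit, eight data bits and one stop bit) is not stated in the paragraph above and is assumed here from standard MIDI practice. The variable names are illustrative only.

```python
MIDI_BAUD = 31_250  # bit/s, as given in the text

# Integer divisors that derive the MIDI rate from typical master clocks.
divisors = {clock_hz: clock_hz // MIDI_BAUD for clock_hz in (1_000_000, 2_000_000)}

# Rate limits implied by the +/-1% tolerance.
min_baud = MIDI_BAUD - MIDI_BAUD / 100
max_baud = MIDI_BAUD + MIDI_BAUD / 100

# Assumption: each byte is framed as 1 start bit + 8 data bits + 1 stop bit.
BITS_PER_BYTE = 10
bit_time_us = 1_000_000 / MIDI_BAUD          # duration of one bit in microseconds
byte_time_us = BITS_PER_BYTE * bit_time_us   # duration of one framed byte
three_byte_message_us = 3 * byte_time_us     # e.g. a three-byte note message

print(divisors)
print(min_baud, max_baud)
print(byte_time_us, three_byte_message_us)
```

The divisors of 32 and 64 show why the rate was convenient for 1 and 2 MHz clocks. Under the framing assumption, a three-byte message occupies just under a millisecond on the cable.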

The hardware interface is shown in Figure 5.21. Most equipment using MIDI has three interface connectors: IN, OUT, and THRU. The OUT connector carries data that the device itself has generated. The IN connector receives data from other devices and the THRU connector is a direct throughput of the data that is present at the IN. As can be seen from the hardware interface diagram, it is simply a buffered feed of the input data, and it has not been processed in any way. A few cheaper devices do not have THRU connectors, but it is possible to obtain ‘MIDI THRU boxes’ which provide a number of ‘THRUs’ from one input. Occasionally, devices without a THRU socket allow the OUT socket to be switched between OUT and THRU functions. A 5 mA current loop is created between a MIDI OUT or THRU and a MIDI IN, when connected with the appropriate cable, and data bits are signalled by the turning on and off of this current by the sending device. This principle is shown in Figure 5.22.


Figure 5.20 Examples of two external control interfaces from Digidesign. (a) ProControl. (b) Control 24


Figure 5.21 MIDI electrical interface showing IN, OUT and THRU ports


Figure 5.22 A current loop is formed between the OUT of the transmitter and the IN of the receiver when a MIDI cable is connected. The LED in the receiver’s opto-isolator is turned on and off according to current flow

The interface incorporates an opto-isolator between the MIDI IN (that is the receiving socket) and the device’s microprocessor system. This is to ensure that there is no direct electrical link between devices and helps to reduce the effects of any problems that might occur if one instrument in a system were to develop an electrical fault. An opto-isolator is an encapsulated device in which a light-emitting diode (LED) can be turned on or off depending on the voltage applied across its terminals, illuminating a photo-transistor which consequently conducts or not, depending on the state of the LED. Thus the data is transferred optically, rather than electrically. In the MIDI specification, the opto-isolator is defined as having a rise time of no more than 2 µs. The rise time affects the speed with which the device reacts to a change in its input and if slow will tend to distort the leading edge of data bit cells, as shown in Figure 5.23. The same also applies in practice to fall times.


Figure 5.23 The edges of a square pulse subjected to rise-time distortion

Rise-time distortion results in timing instability of the data, since it alters the time at which a data edge crosses the decision point between one and zero. If the rise time is excessively slow the data value may be corrupted since the output of the device will not have risen to its full value before the next data bit arrives. If a large number of MIDI devices are wired in series (that is from THRU to IN a number of times) the data will be forced to pass through a number of opto-isolators and thus will suffer the combined effects of a number of stages of rise-time distortion. Whether or not this will be sufficient to result in data detection errors at the final receiver will depend to some extent on the quality of the opto-isolators concerned, as well as on other losses that the signal may have suffered on its travels. It follows that the better the specification of the opto-isolator, the more stages of device cascading will be possible before unacceptable distortion is introduced.
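The size of the timing shift can be estimated with a rough model. The sketch below assumes a first-order (exponential) edge, a rise time measured between 10% and 90% of full amplitude, and a decision threshold at 50%. None of these assumptions comes from the MIDI specification, and the function name `threshold_delay_us` is hypothetical.

```python
import math

def threshold_delay_us(rise_time_us, threshold=0.5):
    """Time for an exponential edge to reach 'threshold' (a fraction of full level)."""
    # For a first-order response the 10-90% rise time equals tau * ln(9).
    tau = rise_time_us / math.log(9)
    # Solve 1 - exp(-t / tau) = threshold for t.
    return -tau * math.log(1 - threshold)

BIT_PERIOD_US = 1_000_000 / 31_250        # 32 microseconds per MIDI bit

delay_us = threshold_delay_us(2.0)        # 2 microsecond rise time, as in the text
fraction_of_bit = delay_us / BIT_PERIOD_US
print(round(delay_us, 3), round(fraction_of_bit, 4))
```

Under these assumptions a single stage at the 2 µs limit shifts the threshold crossing by a small fraction of a bit period. This is consistent with the observation that errors become likely only with slow devices or many cascaded stages.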

The delay in data passed between IN and THRU is only a matter of microseconds, so this contributes little to any audible delays perceived in the musical outputs of some instruments in a large system. The bulk of any perceived delay will be due to other factors like processing delay, buffer delays and traffic.

5.8.3 MIDI connectors and cables

The connectors used for MIDI interfaces are five-pin DIN types. The specification also allows for the use of XLR-type connectors (such as those used for balanced audio signals in professional equipment), but these are rarely encountered in practice. Only three of the pins of a five-pin DIN plug are actually used in most equipment (the three innermost pins). In the cable, pin 5 at one end should be connected to pin 5 at the other, and likewise pin 4 to pin 4, and pin 2 to pin 2. Hi-fi DIN cables will not work unless they follow this convention. Professional microphone cable terminated in DIN connectors may be used as a higher-quality solution, because domestic cables are not always shielded twisted pairs and are thus more susceptible to external interference, as well as radiating more themselves, which could interfere with adjacent audio signals.

The cable should be a shielded twisted pair with the shield connected to pin 2 of the connector at both ends, although within the receiver itself, as can be seen from Figure 5.21, the MIDI IN does not have pin 2 connected to earth. This is to avoid earth loops and makes it possible to use a cable either way round. (If two devices are connected together whose earths are at slightly different potentials, a current is caused to flow down any earth wire connecting them. This can induce interference into the data wires, possibly corrupting the data, and can also result in interference such as hum on audio circuits.) It is recommended that no more than 15 m of cable is used for a single cable run in a simple MIDI system, and investigation of typical cables indicates that corruption of data does indeed ensue after longer distances, although this is gradual and depends on the electromagnetic interference conditions, the quality of cable and the equipment in use. Longer distances may be accommodated with the use of buffer or ‘booster’ boxes that compensate for some of the cable losses and retransmit the data. It is also possible to extend a MIDI system by using a data network with an appropriate interface.

5.8.4 Interfacing a computer to a MIDI system

In order to use a workstation as a central controller for a MIDI system it must have at least one MIDI interface, consisting of at least an IN and an OUT port. (THRU is not strictly necessary in most cases.) Unless the computer has a built-in interface, as found on the old Atari machines, some form of third-party hardware interface must be added and there are many ranging from simple single ports to complex multiple port products.

A typical single port MIDI interface can be connected either to one of the spare I/O ports of the computer (a serial or USB port, for example), or can be installed as an expansion slot card (perhaps as part of an integrated sound card). Depending on which port it is connected to, some processing may be required within the MIDI interface to convert the MIDI data stream to and from the relevant interface protocol. PCs have serial interfaces that will operate at a high enough data rate for MIDI, but are not normally able to operate at precisely the 31.25 kbaud required. Nonetheless, there are a few external interfaces available which connect to the PC’s serial port and transpose a higher serial data rate (often 38.4 kbaud) down to the MIDI rate using intermediate buffering and flow control. Some PCs and sound cards also have the so-called ‘MIDI/joystick port’ that conforms to the old Roland MPU-401 interface standard. Adaptor cables are available that provide MIDI IN and OUT connectors from this port. Some older PC interfaces also attach to the parallel port. The majority of recent MIDI interfaces are connected either to USB or Firewire ports of host workstations.

Multiport interfaces have become widely used in MIDI systems where more than 16 MIDI channels are required, and they are also useful as a means of limiting the amount of data sent or received through any one MIDI port. (A single port can become ‘overloaded’ with MIDI data if serving a large number of devices, resulting in data delays.) Multiport interfaces are normally more than just a parallel distribution of a single MIDI data stream, typically handling a number of independent MIDI data streams that can be separately addressed by the operating system drivers or sequencer software.

Recent interfaces are typically connected to the host workstations using USB or Firewire. On older Mac systems interconnection was handled over one or two RS-422 ports while an expansion card, RS-232 connection or parallel I/O, was normally used on the PC. The principle of such approaches is that data is transferred between the computer and the multiport interface at a higher speed than the normal MIDI rate, requiring the interface’s CPU to distribute the MIDI data between the output ports as appropriate, and transmit it at the normal MIDI rate. As described in Chapter 4, USB and Firewire MIDI protocols allow a particular stream or ‘cable’ to be identified so that each stream controlling 16 MIDI channels can be routed to a particular physical port or instrument.
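The idea of tagging each stream with a ‘cable’ number can be illustrated using the four-byte event packet of the USB MIDI device class, in which the first byte carries a cable number in its upper four bits and a code index in its lower four bits. The sketch covers only channel voice messages, for which the code index equals the upper four bits of the status byte. The function names are hypothetical and no real USB transfer takes place.

```python
def usb_midi_packet(cable, status, data1, data2=0):
    """Pack a channel voice message into a 4-byte USB-MIDI style event packet."""
    if not 0 <= cable <= 15:
        raise ValueError("cable number must be 0-15")
    if not 0x80 <= status <= 0xEF:
        raise ValueError("only channel voice messages are handled in this sketch")
    code_index = status >> 4          # e.g. 0x9 for a note-on message
    return bytes([(cable << 4) | code_index, status, data1, data2])

def cable_of(packet):
    """Recover the cable number so the message can be routed to a physical port."""
    return packet[0] >> 4

# Note-on, MIDI channel 1, note 60, velocity 100, addressed to cable 2.
packet = usb_midi_packet(cable=2, status=0x90, data1=60, data2=100)
port = cable_of(packet)
```

With sixteen possible cable numbers, each carrying sixteen MIDI channels, an interface's processor can use a field of this kind to decide which physical output port should retransmit each message at the normal MIDI rate.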


Figure 5.24 Front and back panels of the Emagic Unitor 8 interface, showing USB port, RS-422 port, RS-232 port, LTC and VITC ports and multiple MIDI ports

Emagic’s Unitor 8 interface is pictured in Figure 5.24. It has RS-232 and RS-422 serial ports as well as a USB port to link with the host workstation. There are eight MIDI ports with two on the front panel for easy connection of ‘guest’ devices or controllers that are not installed at the back. This device also has VITC and LTC timecode ports in order that synchronisation information can be relayed to and from the computer. A multi-device MIDI system is pictured in Figure 5.25, showing a number of multi-timbral sound generators connected to separate MIDI ports and a timecode connection to an external video tape recorder for use in synchronised post-production. As more of these functions are now being provided within the workstation (e.g. synthesis, video, mixing) the number of devices connected in this way will reduce.

5.9 Drivers and audio I/O software

Most audio and MIDI hardware requires ‘driver’ software of some sort to enable the operating system (OS) to ‘see’ the hardware and use it correctly. There are also sound manager or multimedia extensions that form part of the operating system of the workstation in question, designed to route audio to and from hardware in the absence of dedicated solutions. This topic crosses the boundary into software and is discussed further in Chapter 7, but basic audio and MIDI I/O extensions will be described here. (It is different from the topic of plug-in architecture which is also discussed in Chapter 7.)

The standard multimedia extensions of the OS, which basic audio software in older systems used to communicate with sound cards, could result in high latency and might also be limited to only two channels and a 48 kHz sampling frequency. Dedicated low-latency approaches were therefore developed as an alternative, allowing higher sampling frequencies, full audio resolution, sample-accurate synchronisation and multiple channels. Examples of these are Steinberg’s ASIO (Audio Stream Input Output) and Emagic’s EASI. These are software extensions behaving as ‘hardware abstraction layers’ (HALs) that replace the OS standard sound manager and enable applications to communicate more effectively with I/O hardware. ASIO, for example, handles a range of sampling frequencies and bit depths, as well as multiple channel I/O, and many sound cards and applications are ASIO-compatible.


Figure 5.25 A typical multi-machine MIDI system interfaced to a computer via a multiport interface connected by a high-speed link (e.g. USB)

As high-quality audio begins to feature more prominently in general purpose desktop computers, audio architectures and OS audio provision improve to keep step. OS native audio provision may now take the place of what third-party extensions have provided in the past. For example, Apple’s OS X Core Audio standard is designed to provide a low-latency HAL between applications and audio hardware, enabling multichannel audio data to be communicated to and from sound cards and external interfaces such as USB and Firewire. Core Audio handles audio in 32-bit floating-point form for high-resolution signal processing, as well as enabling sample accurate timing information to be communicated alongside audio data. Microsoft has also done something similar for Windows systems, with the Windows Driver Model (WDM) audio drivers that also include options for multichannel audio, high resolutions and sampling frequencies. DirectSound is the Microsoft equivalent of Apple’s OS X Core Audio.

Core MIDI and DirectMusic do a similar thing for MIDI data in recent systems. Whereas previously it would have been necessary to install a third-party MIDI HAL such as OMS (Opcode’s Open Music System) or MIDI Manager to route MIDI data to and from multiport interfaces and applications, these features are now included within the operating system’s multimedia extensions.

Useful websites

The Optical Storage Technology Association: www.osta.org

DVD Forum: www.dvdforum.com

DVD+RW Alliance: www.dvdrw.com

WDM Audio: www.microsoft.com/hwdev/tech/audio/wdmaudio.asp

OS X Core Audio: http://developer.apple.com/audio/coreaudio.html

