Chapter 6

Surround Sound

Francis Rumsey

Surround sound is a ‘catch-all’ term often used to describe any form of loudspeaker sound reproduction that involves more than two loudspeakers and attempts to surround the listener with sounds from multiple directions. It started as a marketing term that aimed to imply something more spatially interesting than 2-channel stereo. These days the term may be rather dated, as it has to some extent been superseded or replaced by more generic terms such as spatial audio, or trumped by systems termed 3D or immersive. It is still useful, nonetheless, to describe a set of entertainment audio systems and techniques that involve more than two loudspeakers arranged around the listener, but only in the horizontal plane. In this chapter, therefore, the discussion will be restricted to approaches based on conventional stereophonic principles that range from three channels upward, but don’t involve height information.

Surround sound loudspeaker layouts and production techniques grew out of the methods used in 2-channel stereophony, based on ideas arising from Bell Labs, Blumlein and others in the 1930s, as described in Chapter 3. Spatialization of sound images in basic stereophony was achieved by the introduction of simple time and/or level differences between channels, created out of the relationships between microphone outputs or by simple panpots. The aim was one of creating an adequate spatial illusion. As more loudspeakers were added to create ‘surround sound’ these techniques were extended in an attempt to make them work for more loudspeakers, but often still only considering the relationships between pairs of channels, or at most three at a time. As the number of loudspeakers grew from four to more than ten, the challenge of deciding how to divide the signal between the loudspeakers to create a successful spatial illusion grew too, leading to an increasing dichotomy between those systems that were based on some underlying mathematical model of an acoustical sound field and those that simply kept adding loudspeakers and attempting to apply basic stereophonic principles. The former could be considered the scientist’s conception of the way to do surround sound, and is characterized by methods of sound field reconstruction such as sound field synthesis and wave field synthesis. These approaches are covered in Chapters 9 and 10. This chapter is therefore mainly dedicated to the latter group of systems (which could be thought of as the recording engineer’s idea of the way to do surround sound) and deals with systems that mainly, but not exclusively, grew out of cinema sound formats. A modern nomenclature was developed that describes formats as ‘n-m stereo (or surround)’, where n represents the number of front channels and m the number of surround channels, about which more is said later in the chapter.

The Evolution of Surround Sound

Cinema Systems

Early work on directional reproduction at Bell Labs in the 1930s involved attempts to approximate the sound wavefront that would result from an infinite number of microphone/loudspeaker channels by using a smaller number of channels, as shown in Figure 6.1. Spaced pressure (omnidirectional) microphones were used, each connected by a single amplifier to the appropriate loudspeaker in a listening room. Steinberg and Snow (1934) found that three channels gave quite convincing results, and that when reducing the number of channels from three to two, central sources appeared to recede towards the rear of the stage and the width of the reproduced sound stage appeared to increase. As Snow later explained, however, the situations shown in the two diagrams are rather different, because the small number of channels does not really recreate the original source wavefront, but depends upon the precedence effect for success. This could be seen as an early break between the concept of sound field synthesis and basic stereophony.

Steinberg and Snow’s work was principally intended for large auditorium sound reproduction with wide screen pictures, rather than small rooms or consumer equipment, and in fact most of the early development of surround sound was driven by the need to attract audiences to the cinema. Three front channels were the norm in cinema sound reproduction for many years, partly because of the wide range of seating positions and the size of the image. The center channel had the effect of stabilizing the important central image (where the dialog was located) for off-center listeners, and was used increasingly from the Disney film Fantasia (1940) onwards.

The Fantasound system used multiple operators to do the mixing, and eventually a pilot-tone-based control track alongside the film to manipulate the panning of three sound tracks to a number of loudspeakers around the auditorium. It was an expensive historical experiment into surround sound, in many ways well before its time, and one version even had a ‘voice of God’ loudspeaker mounted on the ceiling above the listeners, possibly the first example of immersive audio in the business. The development and implications were well described in an article in the SMPTE Motion Imaging Journal (Garity & Hawkins, 1941, p. 127), where some critically important observations were made that still have application for entertainment audio today. “Therefore,” they said

we must take large steps forward, rather than small ones, if we are to inveigle the public away from softball games, bowling alleys, nightspots, or rapidly improving radio reproduction. The public has to hear the difference and then be thrilled by it, if our efforts toward the improvement of sound-picture quality are to be reflected at the box-office. Improvements perceptible only through direct A-B comparisons have little box-office value. While dialog is intelligible and music is satisfactory, no one can claim that we have even approached perfect simulation of concert hall or live entertainment. It might be emphasized that perfect simulation of live entertainment is not our objective. Motion picture entertainment can evolve far beyond the inherent limitations of live entertainment.

In saying this they made the crucial point that the purpose of much entertainment audio is just that—entertainment—and does not have to emulate or recreate natural environments. It can be hyper-real. Fantasound, however, did not last beyond the early war years as it was too time-consuming and costly to set up.

Apart from the unusual stereo effects used in Fantasia, cinema sound did not incorporate stereo reproduction until the 1950s. Stereo film sound tracks often employed dialog panned to match the visual scene elements, which was a laborious and time-consuming process. This technique gradually died out in favor of central dialog, accompanied by stereo music and sound effects. During the 1950s Warner Brothers introduced a large screen format with three front channels and a single surround channel, and the 20th Century Fox Cinemascope format also used a similar arrangement. Cinerama used seven discrete channels of surround sound to accompany wide screen pictures using three projectors, prefiguring the 7.1-channel systems of later years with its five front screen channels and stereo surrounds.

Multichannel stereo formats for the cinema became increasingly popular in the late 1950s and 1960s, culminating in the so-called baby boomer 70 mm format involving multiple front channels, a surround channel and a subwoofer channel to accompany high-quality, wide screen cinema productions. In the mid-1970s, Dolby’s introduction of Dolby Stereo enabled a 4-channel surround sound signal to be matrix encoded into two optical sound tracks recorded on the same 35 mm film as the picture. It was later released in a consumer form called Dolby Surround for home cinema applications. The main problem with analog matrix formats was the difficulty of maintaining adequate channel separation, requiring sophisticated ‘steering’ circuits in the decoder to direct dominant signal components to the appropriate loudspeakers.

In the 1990s cinema surround sound moved to all-digital sound tracks that typically incorporated either five or seven discrete channels of surround sound plus a sub-bass effects channel. A variety of commercial digital low-bit-rate coding schemes were used to deliver surround sound signals with movie films, such as Dolby Digital, Sony SDDS and Digital Theatre Systems (DTS).

Ambiophony and Similar Techniques

Although surround sound did not appear to be commercially feasible for consumer music reproduction applications during the late 1950s and early 1960s, a number of researchers were experimenting at the time with methods for augmenting conventional reproduction by radiating reverberation signals from separate loudspeakers. This is an interesting precursor of the modern approach that tends to recommend the use of surround channels for the augmentation of conventional frontal stereo with ambience or effects signals. One of the most developed examples in this respect was the ‘Ambiophonic’ concept developed by Keibs and colleagues in 1960, nicely summarized in a paper by Steinke (1996) and later developed by others (e.g., Glasgal, 1995).

Quadraphonic Sound

Quadraphonic sound represents an attempt to introduce surround sound to the consumer in the 1970s that ultimately failed from a commercial point of view. It can be thought of as a 2–2 surround system in terms of modern nomenclature introduced below. A variety of competing encoding methods, having different degrees of compatibility with each other and with 2-channel stereo, were used to convey four channels of surround sound on 2-channel analogue media such as vinyl LPs (so-called 4–2–4 matrix systems). Unlike Dolby Stereo, quadraphonic sound used no center channel, but was normally configured for a square arrangement of loudspeakers, two at the front and two behind the listener. The 90° angle of the front loudspeakers proved problematic because of lack of compatibility with ideal 2-channel reproduction, and gave poor front images, often with a ‘hole in the middle’. A review of some of the issues can be found in Scheiber (1971).

While a number of LP records were issued in various ‘quad’ formats, the approach failed to capture a sufficiently large part of the consumer imagination to succeed. It seemed that people were unwilling to install the additional loudspeakers required, and there were too many alternative forms of quad encoding for a clear standard to emerge. Also, many people felt that quad encoding compromised the integrity of 2-channel stereo listening (the matrix encoding of the rear channels was supposed to be 2-channel compatible but unwanted side effects could often be heard). Although some efforts were made to release 4-channel recordings on magnetic tape, and some 4-track tape recorders were made for the consumer market, the popularity of magnetic tape as a consumer medium was not sufficient to carry this forward.

The Home Cinema and 5.1-Channel Surround Sound

During the 1990s the development of new consumer audio formats such as DVD, the ‘home cinema’, and digital sound formats for cinema and broadcasting, gave a new commercial impetus to surround sound. The ITU 5.1-channel configuration became widely adopted for broadcasting and recording applications, as described below, and the discrete channel delivery possibilities of digital transmission and storage formats helped to avoid the former problems of matrix encoding. As will be explained later, this ITU standard did not define anything about the way that sound signals should be represented or coded for surround sound; it simply stated the layout of the loudspeakers. Most other things were open, and there was no ‘correct’ method of sound field representation or spatial encoding for this standard.

Surround Sound Formats

The principal loudspeaker formats for surround sound that have featured in both professional and consumer environments since the 1990s will be described in this section. Surround sound standards of the type dealt with in this chapter often specify little more than the channel configuration and the way the loudspeakers should be arranged. This leaves the business of how to create or represent a spatial sound field entirely up to the user. Then there is the separate question of how to matrix or encode the channels for delivery to the end user, which is often the domain of commercial technology, and that is introduced in a subsequent section.

In international standards describing stereo loudspeaker configurations, such as ITU-R BS.775-3 (2012), the nomenclature for the configuration is often in the form ‘n-m stereo’, where n is the number of front channels and m is the number of rear or side channels (the latter only being encountered in surround systems). This distinction can be helpful as it reinforces the slightly different role of the surround channels. It was one of the underlying principles of those who designed these layouts that the front channels would fulfill a different role to the rear ones, rather than all being equal. The front left and right channels are often in positions that are compatible with 2-channel stereo, which makes them quite narrowly spaced; there can then be a large gap at the sides, where imaging is difficult, followed by rear loudspeakers that are primarily intended for delivering effects and ambience. There is no explicit aim to deliver full 360° imaging with equal accuracy in all directions, although many have ignored this fact and attempted to do just that, suffering from the inevitable problems. Some non-standard approaches have then been adopted for consumer music applications that widen the spacing of the front loudspeakers and fill in the gaps at the sides with additional loudspeakers, and then use sophisticated decoding techniques to render content more effectively.

Another common nomenclature is ‘something point something’ surround, for example, 5.1 surround, which makes no distinction between front and rear channels but highlights the number of main channels and the number of low frequency effects (LFE) channels that go with them. The ‘point one’ channel in this case is an LFE channel, and relates mainly to cinema systems where these are commonly used for ground-shaking effects, explosions and the like.

Three-Channel (3–0) Stereo

3–0 stereo forms the basis of the front layout of a lot of surround sound systems. It requires the use of a left (L), center (C) and right (R) channel, the loudspeakers arranged equidistantly across the front sound stage, as shown in Figure 6.2. It has some precedents in historical development, in that the stereophonic system developed by Steinberg and Snow in the 1930s used three channels, as mentioned earlier. Three front channels have also been commonplace in cinema stereo systems, mainly because of the need to cover a wide listening area and because wide screens tend to result in a large distance between left and right loudspeakers. Two channels only became the norm in consumer systems for reasons of economy and convenience, and particularly because it was much more straightforward to cut two channels onto an analog disk than three.

There are various advantages of 3–0 stereo. First, it allows for a somewhat wider front sound stage than 2-channel stereo, if desired, because the center channel acts to ‘anchor’ the central image and the left and right loudspeakers can be placed further out to the sides (say ± 45°). (Note, though, that in the ITU-R standard the L and R loudspeakers are in fact placed at ± 30°, for compatibility with 2-channel stereo material.) Second, the center loudspeaker enables a wider range of listening positions in many cases, as the image does not collapse quite as readily into the nearest loudspeaker. It also anchors dialog more clearly in the middle of the screen in sound-for-picture applications. Third, the center image does not suffer the same timbral modification as the center image in 2-channel stereo, because it emanates from a real source.

A practical problem with it is that the center loudspeaker position is often very inconvenient. Although in cinema reproduction it can be behind an acoustically transparent screen, in consumer environments, studios and television environments it is almost always just where one wants a television monitor or a window. Consequently the center channel has to be mounted above or below the object in question, and possibly made smaller than the other loudspeakers.

Four-Channel Surround (3–1 Stereo)

The form of stereo called ‘3–1 stereo’ in some international standards, or ‘LCRS surround’ in some other circles, will briefly be described. Proprietary encoding and decoding technology relating to this format is described later. Although it uses four channels, it is different from quadraphonics because the loudspeakers are not at 90° intervals, but are configured as three front channels plus one surround.

In the 3–1 approach, an additional ‘effects’ channel or ‘surround’ channel is added to the three front channels, routed to a loudspeaker or loudspeakers located behind, and possibly to the sides of, the listeners. It was developed first for cinema applications, enabling a greater degree of audience involvement in the viewing/listening experience by providing a channel for ‘wrap-around’ effects. The development is attributed to 20th Century Fox in the 1950s, along with wide screen Cinemascope viewing, and was intended to offer effective competition to the new television entertainment.

There is no specific intention in 3–1 stereo to use the effects channel as a means of enabling 360° image localization. In any case, this would be virtually impossible with most configurations as there is only a single audio channel feeding a larger number of surround loudspeakers, effectively in mono.

Figure 6.3 shows the typical loudspeaker configuration for this format. In the cinema there are usually a large number of surround loudspeakers fed from the single surround channel, in order to cover a wide audience area. This gives rise to a relatively diffuse or distributed reproduction of the effects signal. The surround speakers are sometimes electronically decorrelated to increase the degree of spaciousness or diffuseness of surround effects, in order that they are not specifically localized to the nearest loudspeaker or perceived inside the head.

In consumer systems reproducing 3–1 stereo, the mono surround channel is normally fed to two surround loudspeakers located in similar positions to the 3–2 format described below. The gain of the channel is usually reduced by 3 dB so that the summation of signals from the two speakers does not lead to a level mismatch between front and rear. The mono surround channel is the main limitation in this format. Despite the use of multiple loudspeakers to reproduce the surround channel, it is still not possible to create a strong sense of envelopment or spaciousness without using surround signals that are different on both sides of the listener.
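The 3 dB trim mentioned above can be illustrated with a short sketch (Python is used here purely for illustration): attenuating each of the two identical surround feeds by 3 dB in amplitude keeps the total radiated power roughly equal to that of a single full-level loudspeaker, so front and rear levels stay matched.

```python
def split_mono_surround(sample: float) -> tuple[float, float]:
    """Feed one mono surround sample to two loudspeakers, trimming each
    feed by 3 dB so the summed acoustic power roughly matches that of a
    single full-level loudspeaker."""
    g = 10 ** (-3 / 20)  # -3 dB expressed as an amplitude gain, about 0.708
    return g * sample, g * sample

ls, rs = split_mono_surround(1.0)
# Power check: g**2 + g**2 is approximately 1, i.e. unity power overall
total_power = ls ** 2 + rs ** 2
```

The same reasoning appears throughout surround practice wherever one signal is shared between two loudspeakers.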

The 3–1 audio format was also used in MUSE/Hi-Vision, high-definition laser discs released in Japan in the 1990s. It had 4 channels of digital audio encoded with 32 kHz sampling frequency and 12-bit quantization and multiplexed into MUSE video. MUSE decoders would play out the 4 analog signals into 5 outputs: left, right, center, surround left, surround right, with the surround channels sharing the same signal.

5.1-Channel Surround (3–2 Stereo)

Four-channel systems have the disadvantage of a mono surround channel, and this limitation is removed in the 5.1-channel system, enabling the provision of stereo effects or room ambience to accompany a primarily front-orientated sound stage. The distinction between front channels providing the stereo imaging of sources and rear channels providing ambience or effects explains the insistence in some standards on the use of the term ‘3–2 stereo’ rather than ‘five-channel surround’. As introduced above, the ‘.1’ component is a dedicated low-frequency effects (LFE) channel or sub-bass channel, so termed because of its limited bandwidth. Strictly, the international standard nomenclature for 5.1 surround is ‘3–2–1’, the last digit indicating the number of LFE channels.

The loudspeaker layout and channel configuration is shown in Figure 6.4. There can be little doubt that this format’s widespread success was the result of a political compromise that aimed to come up with speaker locations that would fulfill a range of functions across cinema, broadcast and music recording applications, not being entirely ideal for any of them, but adequate for most. A display screen is also shown in the figure for sound with picture applications, and there are recommendations concerning the relative size of the screen and the loudspeaker base width. The left and right loudspeakers are located at ± 30° for compatibility with 2-channel stereo reproduction, a necessary compromise that limited the options at the time. The surround loudspeaker locations, at approximately ± 110°, are placed so as to provide a compromise between the need for effects panning behind the listener and the lateral energy important for good envelopment. In this respect they are more like ‘side’ loudspeakers than rear loudspeakers, and in many installations this is an inconvenient location causing people to mount them nearer the rear than the standard suggests.

In this standard there are normally no loudspeakers directly behind the listener, which can make for creative difficulties, and some commercial variants have attempted to remedy this. The ITU standard allows for additional surround loudspeakers to cover the region around listeners, similar to the 3–1 arrangement described earlier. If these are used then they are expected to be distributed evenly in the angle between ± 60° and ± 150°. Surround loudspeakers should be the same as front loudspeakers where possible, in order that uniform sound quality can be obtained all around. That said, there are arguments for use of dipole loudspeakers in these positions. Dipoles radiate sound in more of a figure-eight pattern and one way of obtaining a diffuse surround impression is to orient these with the nulls of the figure-eight towards the listening position. In this way the listener experiences more reflected than direct sound and this can give the impression of a more spacious ambient sound field that may better emulate the cinema listening experience in small rooms. Dipoles make it correspondingly more difficult to create defined sound images in rear and side positions, though.

The low-frequency effects channel is a separate sub-bass channel with an upper limit extending to a maximum of 120 Hz. It is intended for conveying special low-frequency content that requires greater sound pressure levels and headroom than can be handled by the main channels. It is not intended for conveying the low-frequency component of the main channel signals, and its application is likely to be primarily in sound-for-picture applications where explosions and other high-level rumbling noises are commonplace. In consumer audio systems, reproduction of the LFE channel is considered optional. Because of this, recordings for general purpose applications should normally be made so that they sound satisfactory even if the LFE channel is not reproduced.
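As an illustration of the band-limiting involved, the sketch below low-pass filters an LFE feed at 120 Hz. The 48 kHz sample rate and the fourth-order Butterworth response are assumptions made for the example only; the standards specify the bandwidth, not a particular filter design.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48000  # assumed sample rate for the example

# Fourth-order Butterworth low-pass with a 120 Hz corner frequency
sos = butter(4, 120, btype='lowpass', fs=FS, output='sos')

def limit_lfe(signal: np.ndarray) -> np.ndarray:
    """Band-limit an LFE channel feed to roughly 120 Hz."""
    return sosfilt(sos, signal)

# A 30 Hz rumble passes almost unchanged; a 1 kHz tone is heavily attenuated
t = np.arange(FS) / FS
rumble = limit_lfe(np.sin(2 * np.pi * 30 * t))
tone = limit_lfe(np.sin(2 * np.pi * 1000 * t))
```

In a real system the corner frequency and slope would be chosen to suit the bass management scheme of the monitoring setup.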

Further discussion of LFE channel handling and subwoofer configuration is contained in the section on surround sound monitoring, below.

The main limitations of the 5.1 surround format are, first, that it was not designed for accurate 360° phantom imaging, as explained above. Second, the front sound stage is narrower than it could be if compatibility with 2–0 reproduction were not a requirement. Third, the center channel can prove problematic for music balancing, as conventional panning laws and coincident microphone techniques are not usually optimized for three loudspeakers. Fourth, the LS and RS loudspeakers are located in a compromise position, leading to a large hole in the potential image behind the listener and making it difficult to find physical locations for the loudspeakers in practical rooms.

Such limitations of the format led to various non-standard uses of the five or six channels available. For example, some used the sixth channel to create a height channel. Others made a pair out of the LFE channel and the center channel so as to feed a pair of front-side loudspeakers, enabling the rear loudspeakers to be farther back.

7.1-Channel Surround

Deriving from wide screen cinema formats, the 7.1-channel configuration normally adds two further loudspeakers to the 5.1-channel configuration, located at center-left (CL) and center-right (CR), as shown in Figure 6.5. This was not a format primarily intended for consumer applications, but for large cinema auditoria where the screen width is such that the additional channels are needed to cover the angles between the loudspeakers satisfactorily for all the seats in the auditorium. Some consumer equipment manufacturers have also implemented a 7-channel mode in their consumer surround decoders, but the recommended locations for the loudspeakers are not quite the same as in the cinema application. The additional channels are used to provide a wider side-front component and allow the rear speakers to be moved round more to the rear than in the 5.1 arrangement.

In some recent 7.1 cinema installations, instead of the additional channels being used across the front screen, they are used for Back Surround Left, and Back Surround Right, leaving the conventional surround channels connected to the side loudspeakers. This is said to offer greater flexibility for audio placement with 3D visual content.

10.2-Channel Surround

Tomlinson Holman developed a 10.2-channel surround sound system, which began the process of bridging the gap to later immersive audio systems that include height. To the basic 5-channel array he added wider side-front loudspeakers and a center-rear channel to ‘fill in the holes’ in the standard layout. He also added two height channels and a second LFE channel. The second LFE channel was intended to provide lateral separation of decorrelated low bass content to either side of the listening area, to enhance low-frequency spaciousness. (Surround sound with height channels will be discussed in detail in Chapter 7.)

Surround Sound Delivery and Coding

This part of the chapter concerns ways in which surround sound signals can be encoded so as to be carried over analog or digital media. It does not specifically deal with computer file formats, though. Initially, analog surround sound signals often had to be matrixed to get them to the end user, because most delivery media were 2-channel only, although some wide screen cinema formats had individual sound tracks for each channel. These days, surround sound is almost entirely carried in the digital domain, using some form of data reduction coding, although for some high-end applications there may be sufficient bit rate to carry it as linear PCM data.

Matrixed Surround Sound Systems

By matrixing surround signals they can be represented using fewer channels than the source material contains. This can give rise to some side effects and the signals need careful dematrixing, but the approach was used widely for many years. The Dolby Stereo approach is described as an example, but there are alternatives, and a number of other companies made enhanced decoders to improve the decoding of Dolby-matrixed sound tracks.

The original Dolby Stereo system involved a number of different formats for film sound with three to six channels, particularly a 70 mm film format with six discrete tracks of magnetically recorded audio, and a 35 mm format with two optically recorded audio tracks onto which were matrixed four audio channels in the 3–1 configuration. Dolby Surround was introduced in 1982 as a means of emulating the effects of Dolby Stereo in a consumer environment. Essentially the same method of matrix decoding was used, so movies transferred to television formats could be decoded in the home in a similar way to the cinema. Dolby Stereo optical sound tracks for the cinema were Dolby A noise-reduction encoded and decoded, in order to improve the signal-to-noise ratio, but this is not a feature of consumer Dolby Surround.

The Dolby Stereo matrix (see Figure 6.6) is a form of ‘4–2–4’ matrix that encodes the mono surround channel so that it is added out of phase into the left and right channels (+90° in one channel and −90° in the other). The center channel signal is added to left and right in phase. The resulting sum is called Lt/Rt (left total and right total). By doing this the surround signal can be separated from the front signals upon decoding by summing the Lt/Rt signals out of phase (extracting the stereo difference signal), and the center channel can be extracted by summing Lt/Rt in phase. A decoder block diagram for the consumer version (Dolby Surround) is shown in Figure 6.7. In addition to the sum-and-difference-style decoding, the surround channel is subject to an additional delay, band-limiting between 100 Hz and 7 kHz and a modified form of Dolby B noise reduction. The low-pass filtering and the delay are both designed to reduce matrix side effects that could otherwise result in front signals appearing to come from behind. The delay (of the order of 20–30 ms in consumer systems, depending on the distance of the rear speakers) relies on the precedence effect to cause the listener to localize signals according to the first arriving wavefront which will now be from the front rather than the rear of the sound stage. The rear signal then becomes psychoacoustically better separated from the front and localization of primary signals is biased more towards the front. The modified B-type NR reduces surround channel noise and also helps to reduce the effects of decoding errors and interchannel crosstalk, as some distortions introduced between encoding and decoding will be reduced by B-type decoding.
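The encode/decode arithmetic of such a passive 4–2–4 matrix can be sketched as follows. The sketch assumes the commonly quoted textbook coefficients (center and surround each attenuated by 3 dB, the surround split ±90° between Lt and Rt via a Hilbert transform); exact gains and sign conventions varied between implementations, and the decoder delay, band-limiting and noise reduction described above are omitted.

```python
import numpy as np
from scipy.signal import hilbert

G = 1 / np.sqrt(2)  # -3 dB, the commonly quoted matrix coefficient

def encode_424(L, C, R, S):
    """Fold L, C, R, S into a 2-channel Lt/Rt pair (passive matrix)."""
    s90 = np.imag(hilbert(S))   # roughly 90-degree phase-shifted surround
    Lt = L + G * C + G * s90    # surround at +90 degrees in Lt...
    Rt = R + G * C - G * s90    # ...and -90 degrees in Rt
    return Lt, Rt

def passive_decode(Lt, Rt):
    """Recover four feeds by in-phase and out-of-phase summation."""
    return Lt, G * (Lt + Rt), Rt, G * (Lt - Rt)  # L', C', R', S'

# A fully left-panned source leaks into the decoded center and surround
# at -3 dB, illustrating the modest adjacent-channel separation of
# passive decoding.
t = np.arange(48000) / 48000
left_only = np.sin(2 * np.pi * 440 * t)
zeros = np.zeros_like(left_only)
Lt, Rt = encode_424(left_only, zeros, zeros, zeros)
Ld, Cd, Rd, Sd = passive_decode(Lt, Rt)
```

Note that the left/right and center/surround pairs remain well separated: a center-only input produces no surround output, and vice versa.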

A problem with passive matrix decoding is that the separation between adjacent channels is relatively modest, although the separation of left/right and center/surround remains high. When a signal is panned fully left, for example, it will tend to appear only 3 dB down in the center, and also in the surround. Dolby’s Pro Logic system, based on principles employed in the professional decoder, attempted to resolve this problem by incorporating sophisticated ‘steering’ mechanisms into the decoder circuit to improve the perceived separation between the channels, enabling a real center loudspeaker to be employed. A basic block diagram is shown in Figure 6.8. Put crudely, Pro Logic works by sensing the location of ‘dominant’ signal components and selectively attenuating channels away from the dominant component. Pro Logic II added support for full-bandwidth stereo rear channels, with various options that made it more suitable for music programs. It was also claimed to be effective in the up-conversion of unencoded 2-channel material to 5-channel surround.

In 1998 Dolby and Lucasfilm THX joined forces to promote an enhanced surround system that added a center rear channel to the standard 5.1-channel setup. They introduced it, apparently, because of the frustration felt by movie sound designers at not being able to pan sounds properly to the rear of the listener, the surround effect typically being rather diffuse. This system was christened ‘Dolby Digital Surround EX’, and used matrix-style center channel encoding and decoding between the left and right surround channels of a 5.1-channel mix. The loudspeakers at the rear of the auditorium were then driven separately from those on the left and right sides, using the feed from this ‘rear-center’ channel, as shown in Figure 6.9.

Digital Surround Coding

Dolby Digital or AC-3 encoding (Todd et al., 1994) was developed as a means of delivering 5.1-channel surround to cinemas or the home without the need for analog matrix encoding. It has been used widely for the distribution of digital sound tracks on 35 mm movie film, broadcast and consumer media; on film, the data is stored optically in the spaces between the sprocket holes. The process involves a number of techniques by which the data representing audio from the source channels is transformed into the frequency domain and requantized to a lower resolution, relying on the masking characteristics of the human hearing process to hide the increased quantizing noise that results. A common bit pool is used so that channels requiring higher data rates than others can trade their bit rate requirements, provided that the overall total does not exceed the constant rate specified.
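The common bit pool idea can be illustrated with a simple proportional allocator. This is not the AC-3 bit allocation algorithm (which works from a psychoacoustic masking model); it only shows how a fixed total budget might be traded between channels of differing demand.

```python
def allocate_from_pool(demands, pool_bits):
    # Illustrative common-bit-pool allocation: channels with greater
    # perceptual demand receive a proportionally larger share of a
    # fixed total bit budget.
    total = sum(demands)
    alloc = [pool_bits * d // total for d in demands]
    # Hand out any remainder left by integer division, one bit at a
    # time, most demanding channels first, so the pool is used exactly.
    order = sorted(range(len(demands)), key=lambda i: -demands[i])
    for i in range(pool_bits - sum(alloc)):
        alloc[order[i % len(order)]] += 1
    return alloc
```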

Aside from the representation of surround sound in a compact digital form, Dolby Digital includes a variety of operational features that enhance system flexibility and help adapt replay to a variety of consumer situations. These include dialog normalization (‘dialnorm’) and the option to include dynamic range control information alongside the audio data for use in environments where background noise prevents the full dynamic range of the source material being heard. Downmix control information can also be carried alongside the audio data in order that a 2-channel version of the surround sound material can be reconstructed in the decoder. As a rule, Dolby Digital data is stored or transmitted with the highest number of channels needed for the end product to be represented and any compatible downmixes are created in the decoder. This differs from some other systems where a 2-channel downmix is carried alongside the surround information.

Dolby Digital Plus is an extension to AC-3 (Dolby Digital), with higher data-rate options and shorter frames if required. It is designed to offer enhanced quality compared with Dolby Digital, running at data rates up to 6 Mbit/s, although the typical data rate on HD optical disks is said to be between 768 kbit/s and 1.5 Mbit/s. The data stream can be decoded by legacy receivers, which will only decode the Dolby Digital core at up to 640 kbit/s.

Dolby’s TrueHD, based on Meridian Lossless Packing (MLP), is a lossless codec whose decoded output is identical to the studio master. It enables 7.1-channel playback on Blu-ray Disc (BD), although it has the capacity to support more than 16 channels of audio. Operating at data rates of up to 18 Mbit/s, it supports the BD standard’s requirement for eight full-range channels at 96 kHz/24 bits and up to 5.1 channels at 192 kHz/24 bits. An entirely separate artistic stereo mix can be carried if desired.

The original DTS (Digital Theater Systems) ‘Coherent Acoustics’ system (Smyth et al., 1996) is another digital signal coding format that can be used to deliver surround sound in consumer or professional applications, using low bit rate coding techniques to reduce the data rate of the audio information. The DTS system can accommodate a wide range of bit rates from 32 kbit/s up to 4.096 Mbit/s (somewhat higher than Dolby Digital), with up to eight source channels and with sampling rates up to 192 kHz. Variable bit rate and lossless coding are also optional. Downmixing and dynamic range control options are provided in the system.

DTS currently offers two codecs that can be used for higher-resolution audio on optical disks. Both are backwards compatible with the original DTS Digital Surround decoder because they are based on a lossy core plus extension model. Some other lossless formats take a similar form for backwards compatibility, whereas others are lossless from the bottom up. DTS-HD High Resolution Audio offers data rates from 2 to 6 Mbit/s, with quality that is not identical to the studio master but claimed to be close (it is still a lossy coding format). This version allows for a maximum of 7.1 channels at 96 kHz in a CBR (constant bit-rate) stream. DTS-HD Master Audio operates at data rates up to 24.5 Mbit/s in a variable bit-rate (VBR) stream, offering 7.1 channels at 96 kHz, or 5.1 at 192 kHz. This version is lossless, and therefore bit-for-bit compatible with the original master. The core coding, which works at up to 1509 kbit/s with 6.1 channels, is at a higher bit rate than typical DVD audio data rates, so non-HD players still get a quality increase; this data stream can be routed to legacy AV receivers using an S/PDIF connection. A tool (Neural Upmix) enables creative upmixing from 5.1 to surround formats with higher numbers of channels, and the encoder allows the downmix coefficients from surround to stereo to be set. There is also a QC tool that allows the effect of converting 5.1 material to different loudspeaker layouts to be auditioned, such as non-standard 7.1 speaker positions with both side and rear pairs.

Of the MPEG multichannel coding formats (e.g., Bosi et al., 1997), the MPEG-2 BC (backwards compatible) version worked by encoding a matrixed downmix of the surround channels and the center channel into the left and right channels of an MPEG-1 compatible frame structure. Although MPEG-2 BC was originally intended for use with DVD releases in Region 2 countries (primarily Europe), this requirement was dropped in favor of Dolby Digital. MPEG-2 AAC, on the other hand, is a more sophisticated algorithm that codes multichannel audio to create a single bit stream that represents all the channels, in a form that cannot be decoded by an MPEG-1 decoder. Having dropped the requirement for backwards compatibility, the bit rate could be optimized by coding the channels as a group and taking advantage of interchannel redundancy if required. The MPEG-2 AAC system contained contributions from a wide range of different manufacturers, and evolved into MPEG-4. High Definition AAC (HD AAC) has a lossy core accompanied by a lossless extension that enables decoding to provide bit-for-bit compatibility with the original master recording. The AAC core part is compatible with existing decoders in mobile devices such as the iPod and iTunes. It can operate at sampling rates up to 192 kHz and at 24-bit resolution.

The most recent standard related to surround audio coding is MPEG-H (Herre et al., 2014), which is capable of handling a wide range of surround and fully immersive content, as well as ambisonic material and audio objects, rendering to a number of possible loudspeaker layouts. Because of the increasing number of format options, the trend here is away from fixed loudspeaker layouts and towards the idea that material may have to be rendered to whatever is available at the reproduction end of the chain.

Auro 3D and its variants are not discussed here because they are principally aimed at fully immersive content with height information. The same applies to other coding formats aimed primarily at content with height channels.

Spatial Audio Object Coding

The MPEG Spatial Audio Object Coding (SAOC) standard (ISO, 2010) describes a user-controllable rendering method for multiple audio objects based on transmission of a mono or stereo downmix of the object signals. Audio objects are individual signals that can be manipulated independently of other objects under the control of metadata and user interaction. Rendering can be controlled so as to place audio objects in desired positions and at different levels. Raising the level of a speech dialog object and/or repositioning it, for example, can improve intelligibility with certain speaker layouts and in certain environments. SAOC encodes Object Level Differences (OLD), Inter-Object Cross Coherences (IOC) and Downmix Channel Level Differences (DCLD) into a parameter bitstream, and so does not discretely encode the input audio signals. The SAOC bitstream is independent of loudspeaker configuration, and a default downmix option ensures backwards compatibility.
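The flavor of one of these parameters can be conveyed with a short sketch. The following computes per-band object level differences relative to the strongest object; it illustrates the idea of the OLD parameter only, as the SAOC standard defines its own reference conventions and quantization.

```python
import math

def object_level_differences(band_powers, eps=1e-12):
    # Each object's power in one band, expressed in dB relative to the
    # most powerful object in that band (illustrative sketch of the
    # OLD idea, not the standard's exact definition).
    ref = max(band_powers)
    return [10 * math.log10((p + eps) / (ref + eps)) for p in band_powers]
```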

Object-based audio representation is also a key feature of the more recent MPEG-H standard (Herre et al., 2014), as well as an increasing number of other immersive audio coding systems.

Parametric Audio Coding

One variant on lossy low bit rate coding involves the encoding of audio signals in the form of a ‘core’ signal alongside a sparse stream of ‘parameters’ that describe one or more features needed to reconstruct an approximation of the original signals. The idea is to code a basic version of the original audio signal, which can be decoded by compatible legacy decoders, and to transmit spatial or spectral enhancements in the form of much lower bit rate ‘side’ information. MPEG-Surround (Breebaart et al., 2007) transmits a mono or stereo downmix of the original surround, plus side information that enables the surround spatial impression to be approximated upon decoding (see Figure 6.10). The downmix is encoded using a ‘legacy’ or conventional stereo coder such as MP3. The additional bit rate required for the side information is usually only a few kilobits per second, as opposed to the few hundred that might be needed to transmit the surround information as conventionally coded audio. This enables convincing surround to be transmitted at bit rates as low as 64 kbit/s.
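The core idea, a downmix plus low-rate parameters, can be sketched as follows. This toy encoder transmits a mono downmix and one relative-energy parameter per channel; real systems such as MPEG-Surround work per frequency band and also transmit inter-channel coherence parameters and synthesize decorrelation, all omitted here.

```python
import math

def encode_parametric(channels):
    # Parametric encode (sketch): transmit one mono downmix plus one
    # level parameter per channel, instead of all channel signals.
    n = len(channels)
    downmix = [sum(vals) / n for vals in zip(*channels)]
    powers = [sum(s * s for s in ch) / len(ch) for ch in channels]
    total = sum(powers) or 1.0
    params = [p / total for p in powers]  # relative channel energies
    return downmix, params

def decode_parametric(downmix, params):
    # Rebuild an approximation by redistributing the downmix according
    # to the transmitted energy ratios (real decoders also synthesize
    # decorrelation between the outputs, omitted here).
    n = len(params)
    gains = [math.sqrt(p * n) for p in params]
    return [[g * s for s in downmix] for g in gains]
```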

Time-Frequency Representation

Related to parametric coding of surround sound is time–frequency representation. A technique known as Directional Audio Coding (DirAC) embodies some of the same principles but is not a conventional parametric multichannel encoder like MPEG-Surround (Pulkki & Faller, 2006). Rather, it is a method for spatial sound representation that can be applied to arbitrary reproduction scenarios, and which can be linked to parametric multichannel audio coding. The authors explain that existing methods for the capture and reproduction of spatial sound fields, such as coincident and spaced microphone arrays, suffer from compromises that limit their ability either to create accurate directional cues or to reproduce a sufficiently diffuse sound field. DirAC, on the other hand, is designed to represent and render these two components separately.

Figure 6.11 Conceptual diagram of the DirAC spatial audio encoding system.

(Reproduced courtesy of Ville Pulkki and Christof Faller, with permission of the Audio Engineering Society)

The DirAC approach is based on a number of assumptions about the relationship between perceptual parameters and physical cues; namely, that directional arrival of sound will transform into interaural time and level differences (ITD, ILD), that diffuseness will transform into interaural coherence cues, and that timbre depends on the monaural spectrum together with ITD, ILD and coherence information. A final assumption is that these factors together determine the auditory spatial image that the listener perceives. In order to ensure a match between the system’s representation and the characteristics of the human auditory process, the captured signals are split into filter bands similar to those of the auditory system, and the temporal resolution of the analysis is similarly defined. In the example presented in their paper, the authors show a flow diagram of the directional audio coding process based on the use of B format ambisonic signals as inputs (see Figure 6.11), which are further described in Chapter 9.

From the microphone signals, instantaneous direction vectors and diffuseness values are derived. The diffuseness values can be averaged over a period of some tens of milliseconds to reduce the rate at which they are transmitted. Upon reproduction the direction vectors in each frequency band are processed so as to render point-like virtual sources, using a panning technique such as VBAP (vector-based amplitude panning) (Pulkki, 1997). Diffuseness is resynthesized by one of two methods, the simplest involving the decorrelation of the transmitted omnidirectional component by convolving it with exponentially decaying white noise bursts having a time constant of 20 ms. By using a different noise signal for each loudspeaker signal, multiple decorrelated versions of the omni component can be generated.
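A single-band version of this analysis can be sketched as follows, assuming FuMa-normalized horizontal B-format inputs (so that a plane wave s(t) from azimuth az gives w = s/√2, x = s·cos az, y = s·sin az). Real DirAC performs this per auditory-motivated frequency band, with temporal averaging of the diffuseness estimate as described above.

```python
import math

def dirac_analysis(w, x, y):
    # Single-band DirAC-style analysis (sketch), assuming
    # FuMa-normalized B-format input sample lists.
    n = len(w)
    ix = sum(wi * xi for wi, xi in zip(w, x)) / n  # active intensity, x
    iy = sum(wi * yi for wi, yi in zip(w, y)) / n  # active intensity, y
    energy = sum(wi * wi + 0.5 * (xi * xi + yi * yi)
                 for wi, xi, yi in zip(w, x, y)) / n
    azimuth = math.atan2(iy, ix)                   # direction estimate
    diffuseness = 1.0 - math.sqrt(2.0) * math.hypot(ix, iy) / max(energy, 1e-12)
    return azimuth, diffuseness
```

For an ideal plane wave the diffuseness estimate comes out near zero; for mutually uncorrelated signals in all channels it approaches one.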

Surround Sound Monitoring

This section is mainly about monitoring environments and configurations for 5.1-channel surround sound, although many of the principles apply in other configurations. The Audio Engineering Society published an information document on this topic (AES, 2001).

Main Loudspeakers

Rooms for surround monitoring should have an even distribution of absorbing and diffusing material, so that the rear loudspeakers operate in a similar acoustic environment to the front loudspeakers. This is contrary to a number of popular 2-channel control room designs that have one highly absorptive end and a more reflective other end.

In larger rooms designed for film sound mixing, a distributed array of surround loudspeakers is often used, in some cases with decorrelation between them to avoid strong comb filtering effects. In smaller control rooms used for music and broadcasting mixing the space may not exist for such arrays. The ITU standard allows for more than one surround loudspeaker on either side and recommends that they are spaced equally on an arc from 60° to 150° from the front.

It can be difficult to install loudspeaker layouts according to the ITU standard, with equal spacing from the listening position and the surrounds at 110° ± 10°, because of the required width of the space. This often makes it necessary for the room to be laid out ‘wide’ rather than ‘long’, and if the room is one that was previously designed for 2-channel stereo the rotation of the axis of symmetry may result in the acoustic treatment being inappropriately distributed. Also the location of doors and windows may make the modification of existing rooms difficult. If building a new room for surround monitoring then it is obviously possible to start from scratch and make the room wide enough to accommodate the surround loudspeakers and absorption in more suitable places.

As a rule, front loudspeakers can be similar to those used for 2-channel stereo, although noting the particular problems with the center loudspeaker described in the next section. Low-directivity front loudspeakers may be desirable when trying to emulate the effect of a film mixing situation in a smaller surround control room. This is because in the large rooms typical of cinema listening the sound balancer is often well beyond the critical distance where direct and reflected sound are equal in level, and using speakers with low directivity helps to emulate this scenario in smaller rooms. Film mixers generally want to hear what the large auditorium audience member would hear, and this means being farther from the loudspeakers than for small room domestic listening or conventional music mixing.

Ideally the center speaker should be of the same type or quality as the rest. It may be possible to use somewhat smaller monitors for the main channels than would be used for 2-channel stereo, handling the low bass by means of a subwoofer or two. This makes it more practical to mount a center loudspeaker behind the mixing console, but its height will often be dictated by a control room window or video monitor. The center loudspeaker should be on the same arc as that bounding the other loudspeaker positions, otherwise the time delay of its direct sound at the listening position will be different from that of the other channels. If the center speaker is closer than the left or right channels, then it should be delayed slightly to put it in the correct place acoustically.
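The required delay follows directly from the path-length difference and the speed of sound. A minimal helper (illustrative only; the speed-of-sound constant assumes roughly 20 °C air):

```python
SPEED_OF_SOUND = 343.0  # m/s, approximately, at ~20 C

def centre_delay_ms(centre_dist_m, arc_dist_m):
    # Delay to apply to a centre speaker that sits closer to the
    # listener than the L/R arc, so that its direct sound arrives
    # at the same time as that of the other channels.
    return max(0.0, (arc_dist_m - centre_dist_m) / SPEED_OF_SOUND * 1000.0)
```

A centre speaker 17 cm closer than a 2 m arc, for example, needs about half a millisecond of delay.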

A lot of surround mixing work is carried out in conjunction with pictures, and this presents challenges for the center speaker location. In cinemas the screen is normally acoustically ‘transparent’ and uses front projection, although this transparency is never complete and usually requires some equalization. In smaller mixing rooms the display is often a flat-screen plasma monitor or a CRT display and these do not allow the same arrangement. With modestly sized solid displays for television purposes it can be possible to put the center loudspeaker underneath the display, with the display raised slightly, or above the display angled down slightly. The presence of a mixing console may dictate which of these is possible, and care should be taken to avoid strong reflections from the center loudspeaker off the console surface. Dolby suggests that if the center loudspeaker has to be offset height-wise it could be turned upside down compared with the left and right channels to make the tweeters line up, as shown in Figure 6.12.

Recommendations for professional setups suggest that the surround loudspeakers should be of the same quality as the front ones. In consumer environments this can be difficult to achieve, and the systems sold at the lower end of the market often incorporate much smaller surround loudspeakers than front. The use of a subwoofer to handle the low bass makes the required volume of the main speakers quite a lot smaller.

The directivity requirements of the surround loudspeakers have been the basis of some disagreement (see for example the exchange between Holman and Zacharov, 2000). The debate centers around the use of the surround loudspeakers to create a diffuse, enveloping sound field—a criterion that tends to favor either decorrelated arrays of direct radiators or dipole surrounds (bidirectional speakers that are typically arranged so that their main axis does not point towards the listener). If the creation of a diffuse, enveloping sound field is the only role for surround loudspeakers, then dipoles can be quite suitable if only two loudspeaker positions are available, particularly in small rooms and for the translation of large auditorium film mixes into smaller spaces. If, on the other hand, attempts are to be made at all-round source localization, direct radiators are probably more suitable.

Subwoofers

Low-frequency interaction between loudspeakers and rooms affects the placement and equalization of subwoofers. In choosing the optimum locations for subwoofers one must remember the basic principle that loudspeakers placed in corners tend to give rise to a noticeable bass boost, and couple well to most room modes. Some subwoofers are designed specifically for placement in particular locations whereas others need to be moved around until the most subjectively satisfactory result is obtained. Some equalization may be needed to obtain a reasonably flat overall frequency response at the listening position. Phase shifts or time-delay controls are sometimes provided to enable some correction of the time relationship of the subwoofer to other loudspeakers, but this will be a compromise with a single unit.

Multiple low-frequency drivers generating decorrelated signals can create a more natural spatial reproduction than monaural low-frequency reproduction from a single driver. Griesinger (1997) proposes that if mono LF content is reproduced it is better done through two units placed to the sides of the listener, driven 90° out of phase, to excite the asymmetrical lateral modes more successfully and improve LF spaciousness.

The LFE channel of a 5.1 surround system should be aligned so that its in-band gain on reproduction is 10 dB higher than that of the other channels. This does not mean that the overall subwoofer output should have its level raised by 10 dB compared with the other channels, as this would incorrectly boost any LF information routed to the subwoofer as a result of bass management (filtering off the LF content of the main channels and sending it to the sub).

It is a common misconception that any sub-bass or subwoofer loudspeaker(s) used on reproduction must be fed directly from the LFE channel in all circumstances. While this may be the case in the cinema, bass management in consumer systems is not specified in the standard and is entirely system-dependent. It is not mandatory to feed low-frequency information to the LFE channel during the recording process, nor is it mandatory to use a subwoofer; indeed, it has been suggested that restricting extreme low-frequency information to a monophonic channel may limit the potential for low-frequency spaciousness in balances. In music mixing it is likely to be common to send the majority of full-range LF information to the main channels, in order to retain the stereo separation between them.

In practical systems it may be desirable to use one or more subwoofers to handle the low-frequency content of a mix on reproduction. The benefit of this is that it enables the size of the main loudspeakers to be correspondingly reduced. In such cases crossover systems split the signals between main loudspeakers and subwoofer(s) somewhere between 80 Hz and 160 Hz. In order to allow for reproduction of the LFE channel and/or the low-frequency content from the main channels through subwoofer loudspeakers, a form of bass management akin to that shown in Figure 6.13 is typically employed.
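A bass management chain of this general shape can be sketched as follows. The one-pole filter and the crossover frequency are placeholders for the steeper filters a real bass manager would use, and the +10 dB term implements the LFE in-band gain described earlier.

```python
import math

def one_pole_lowpass(x, fc, fs):
    # First-order IIR low-pass (a stand-in for a proper crossover
    # filter) used to extract sub-bass from each main channel.
    a = math.exp(-2 * math.pi * fc / fs)
    y, state = [], 0.0
    for s in x:
        state = (1 - a) * s + a * state
        y.append(state)
    return y

def subwoofer_feed(mains, lfe, fc=100.0, fs=48000.0, lfe_gain_db=10.0):
    # mains: list of main-channel sample lists; lfe: LFE samples.
    # Sub feed = sum of low-passed main channels, plus the LFE channel
    # with its +10 dB in-band gain.
    g = 10 ** (lfe_gain_db / 20)
    lows = [one_pole_lowpass(ch, fc, fs) for ch in mains]
    return [sum(vals) + g * l for *vals, l in zip(*lows, lfe)]
```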

Sound Bars

A brief mention should be made here of ‘sound bars’ in surround sound monitoring. Although not recommended for professional monitoring (except perhaps to discover what a mix might sound like when replayed using such a system), these are increasingly widely used in consumer reproduction as a compact and convenient alternative to multiple separate loudspeakers. Rear loudspeakers are particularly difficult to locate and wire in homes and other such environments, so the workaround adopted with sound bars is to radiate sound directionally from an array of loudspeakers, usually arranged in the form of a narrow ‘bar’ that can be mounted above or below a television screen. In this case rear channel content is radiated indirectly so as to bounce off the side and rear walls of the room. The same concept has also been extended to fully immersive reproduction.

Surround Sound Recording Techniques

Many of the concepts used in surround sound recording have at least some basis in conventional 2-channel stereo techniques. However, the challenges are greater, particularly when dealing with the region to the sides of the listener, where there can be a large gap between the loudspeakers and it is hard to deliver convincing phantom images (see Figure 6.14). A similar challenge can exist in the rear sector, where there is again a wide angle between the loudspeakers, at least in the 3–2 configuration. Some of the more recent extensions to the surround channel layout used for the cinema, such as 7.1, have helped to make these ‘dead spots’ less problematic.

Most ‘production’ recording techniques rely primarily on panned monophonic content and artificial effects, whereas a great deal of research and experimentation has gone into the design of surround microphone arrays that attempt the authentic directional pickup of entire acoustic environments. This is an example of the disconnection between purist/academic research and the mainstream of the recording industry. Broadcasting techniques occupy a crossover domain between these two extremes, with some use of single-point surround microphones or arrays to capture live events.

Microphone Arrays

Surround microphone array techniques split into two main groups: those that are based on a single array of microphones in reasonably close proximity to each other, and those that treat the front and rear channels separately. The former are usually based on an attempt to generate phantom images with different degrees of accuracy around the full 360° in the horizontal plane. The latter usually have a front array providing reasonably accurate phantom images in the front, coupled with a separate means of capturing the ambient sound of the recording space.

Of the first type there are variants on a common theme involving fairly closely spaced microphones (often cardioids) configured in a five-point array. A book by Michael Williams deals with this idea in some detail (Williams, 2004). The basis of most of these arrays is pairwise time–intensity trading, usually treating adjacent microphones as pairs covering a particular sector of the recording angle around the array. The generic layout of such arrays is shown in Figure 6.15. Cardioids or even supercardioids tend to be favored because they offer an increased direct-to-reverberant ratio of sound capture when aimed at the source. The center microphone is typically spaced slightly forward of the L and R microphones, thereby introducing a useful time advance in the center channel for center-front sources. The spacings and angles between the capsules are typically based on the so-called Williams curves, which describe the time and amplitude differences required between single pairs of microphones to create phantom sources in particular locations. Some success has also been had by the author’s colleagues using omni microphones instead of cardioids, with appropriate adjustments to the spacings. These tend to give better overall sound quality but poorer front imaging.

The second group of techniques treats the stereo imaging of front signals separately from the capture of a natural-sounding spatial reverberation and reflection component. Most do this by adopting a 3-channel variant on a conventional 2-channel technique for the front channels, coupled with a more or less decorrelated combination of microphones in a different location for capturing spatial ambience (sometimes fed just to the surrounds, other times to both front and surrounds). Sometimes the front microphones also contribute to the capture of spatial ambience, depending on the proportion of direct to reflected sound picked up, but the essential point here is that the front and rear microphones are not intentionally configured as an attempt at a 360° imaging array.

Hamasaki of NHK (the Japanese broadcasting company) has proposed an arrangement based on near-coincident cardioids spaced 30 cm apart and separated by a baffle, as shown in Figure 6.16 (Hamasaki, 2003). Here the center cardioid is placed slightly forward of left and right, and omni outriggers are spaced by about 3 meters. These omnis are low-pass filtered at 250 Hz and mixed with the left and right front signals to improve the LF sound quality. Left and right surround cardioids are spaced about 2–3 meters behind the front cardioids and 3 meters apart. An ambience array is used farther back, consisting of four figure-eight mics facing sideways, spaced by about 1 meter, to capture lateral reflections; these are fed to the four outer channels. This array is placed high in the recording space.

Theile (2000) proposes a front microphone arrangement shown in Figure 6.17. While superficially similar to the front arrays described in the previous section, his arrangement reduces crosstalk between the channels by the use of supercardioid microphones at ± 90° for the left and right channels and a cardioid for the center. (Supercardioids are more directional than cardioids and have the highest direct/reverberant pickup ratio of any first-order directional microphone. They have a smaller rear lobe than hypercardioids.) Theile’s rationale behind this proposal is the avoidance of crosstalk between the front segments. He proposes to enhance the LF response of the array by using a hybrid microphone for left and right, which crosses over to omni below 100 Hz, thereby restoring the otherwise poor LF response of supercardioids. The center channel is high-pass filtered above 100 Hz. Furthermore, the response of the supercardioids should be equalized to have a flat response to signals at about 30° to the front of the array (they would normally sound quite colored at this angle). Schoeps has developed a prototype of this array, and it has been christened ‘OCT’ for ‘Optimum Cardioid Triangle’.

For the ambient sound signal, Theile proposes the use of a crossed configuration of microphones, which has been christened the ‘IRT cross’ or ‘atmo-cross’. This is shown in Figure 6.18. The microphones are either cardioids or omnis, and the spacing is chosen according to the degree of correlation desired between the channels. Theile suggests 25 cm for cardioids and about 40 cm for omnis, but says that this is open to experimentation. Small spacings are appropriate for more accurate imaging of reflection sources at the hot spot, whereas larger spacings are appropriate for providing diffuse reverberation over a large listening area. The signals are mixed in to L, R, LS and RS channels, but not the center.

A ‘double MS’ technique has been proposed by Curt Wittig and others, shown in Figure 6.19. Two mid-side pairs are used, one for the front channels and one for the rear. The center channel can be fed from the front M microphone. (As with any MS technique, the signals from the two microphones have to be added and subtracted using a simple transformer arrangement, mixer configuration, or signal processor, to derive left and right channels.) The rear pair is placed at or just beyond the room’s critical distance. S channel gain can be varied to alter the image width in either sector, and the M mic’s polar pattern can be chosen for the desired directional response (it would typically be a cardioid). Others have suggested using a fifth microphone (a cardioid) in front of the forward MS pair, to feed the center channel, delayed to time align it with the pair. If the front and rear MS pairs are co-located it may be necessary to delay the rear channels somewhat (10–30 ms) so as to reduce perceived spill from front sources into rear channels. In a co-located situation the same figure-eight microphone could be used as the S channel for both front and back pairs.
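The sum-and-difference decode, with a variable S gain for width, can be sketched in a few lines. This is a simplified illustration of the MS principle described above, with the hypothetical `width` parameter standing in for the S channel gain control.

```python
def ms_decode(m, s, width=1.0):
    # MS decode: left = M + S, right = M - S. The width factor scales
    # the S (side) gain, altering the reproduced image width; width=0
    # collapses the pair to mono.
    left = [mi + width * si for mi, si in zip(m, s)]
    right = [mi - width * si for mi, si in zip(m, s)]
    return left, right
```

The same function serves both the front and rear MS pairs, each with its own width setting.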

Spaced omni approaches to surround pickup include those proposed by Erdo Groot of Polyhymnia International and Richard King of McGill University. Groot developed a largely undocumented array for Polyhymnia’s classical recordings that used omnis instead of cardioids, to take advantage of their better sound quality. Using an array of omnis separated by about 3 meters left–right and front–back, he achieves a spacious result in which the rear channels are well integrated with the front. The center mic is placed slightly forward of left and right. It is claimed that placing the rear omnis too far away from the front tree makes the rear sound detached from the front image, so that one gets a distinct echo or repeat of the front sound from the rear. In Richard King’s design the spacings are slightly different, but the principle is essentially the same, with 1.3–2.6 m between the front left and right, 2–3 m between the rear left and right, and 3.6–5 m between the front and back. The rear microphones in this case are fitted with 50 mm spherical diffractive attachments (acoustic pressure equalizers or APEs) to modify the high-frequency directivity. The elevation of such arrays above the floor is a matter for experimentation, but is typically between 2.5 and 6 m.

In general, the signals from separate ambience microphones fed to the rear loudspeakers may often be made less obtrusive, and front–back ‘spill’ reduced, by rolling off the high-frequency content of the rear channels. Some additional delay applied to the front channels or the rear channels may also assist in the process of integrating the rear channel ambience. The precise values of delay and equalization can only really be arrived at by experimentation in each situation.

Multi-Microphone Techniques and Panning

Most recording involves the use of spot ‘accent’ or ‘support’ microphones in addition to a main microphone technique of some sort. Indeed in many situations the spot microphones may end up at higher levels than the main microphone or there may be no main microphone. Alternatively sources such as recorded effects and synthesized material will be mixed and panned, using panoramic potentiometers, or panpots. Artificial reverberation of some sort is almost always helpful when trying to add spatial enhancement to panned mono sources, and some engineers prefer to use amplitude-panned signals to create a good balance in the front image, plus artificial reflections and reverberation to create a sense of spaciousness and depth.

The panning of signals between more than two loudspeakers presents a number of psychoacoustic problems, particularly with regard to appropriate energy distribution of signals, accuracy of phantom source localization, off-center listening and sound timbre. A number of different solutions have been proposed, some rather sophisticated, in addition to the relatively crude pairwise approach used in much film sound, but the simplicity and relative success of amplitude panning still seems to make it the most popular solution in practical applications.

English inventor Michael Gerzon came up with some criteria for a good panning law for surround sound (Gerzon, 1992c, p. 2):

The aim of a good panpot law is to take monophonic sounds, and to give each one amplitude gains, one for each loudspeaker, dependent on the intended illusory directional localization of that sound, such that the resulting reproduced sound provides a convincing and sharp phantom illusory image. Such a good panpot law should provide a smoothly continuous range of image directions for any direction between those of the two outermost loudspeakers, with no “bunching” of images close to any one direction or “holes” in which the illusory imaging is very poor.

Pairwise amplitude panning involves adjusting the relative amplitudes between a pair of adjacent loudspeakers so as to create a phantom image at some point between them. This has been extended to three front channels and is also sometimes used for panning between side loudspeakers (e.g., L and LS) and rear loudspeakers. The typical sine/cosine panning law devised by Blumlein for 2-channel stereo is often simply extended to more loudspeakers. Most such panners are constructed so as to ensure constant power as sources are panned to different combinations of loudspeakers, so that the approximate loudness of signals remains constant.
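The constant-power property of the sine/cosine law can be illustrated with a short sketch (in Python, for illustration only; the function name and the 0-to-1 pan parameter are this sketch’s own conventions):

```python
import math

def pairwise_pan(theta):
    """Constant-power gains for a source panned between two adjacent
    loudspeakers. theta runs from 0.0 (entirely in the first speaker
    of the pair) to 1.0 (entirely in the second)."""
    g_first = math.cos(theta * math.pi / 2)
    g_second = math.sin(theta * math.pi / 2)
    return g_first, g_second

# The squared gains always sum to unity, so the approximate loudness
# of the panned signal remains constant across all pan positions.
g1, g2 = pairwise_pan(0.5)                 # source midway between the pair
assert abs(g1 * g1 + g2 * g2 - 1.0) < 1e-9
```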

Panning using amplitude or time differences between widely spaced side loudspeakers is not particularly successful at creating accurate phantom images. Side images tend not to move linearly as they are panned and tend to jump quickly from front to back. Data from Theile and Plenge (1977) illustrating this is shown in Figure 6.20. Spectral differences resulting from differing HRTFs of front and rear sound tend to result in sources appearing to be spectrally split or ‘smeared’ when panned to the sides.

In some mixing consoles designed for surround work, particularly in the film domain, separate panners are provided for L–C–R, LS–RS, and front-to-rear surround. Combinations of positions of these amplitude panners enable sounds to be moved to various locations, but some more successfully than others. For example, sounds panned so that some energy is emanating from all loudspeakers (say, panned centrally on all three pots) tend to sound diffuse for center listeners, and in the nearest loudspeaker for those sitting off-center. Joystick panners combine these amplitude relationships under the control of a single lever that enables a sound to be ‘placed’ dynamically anywhere in the surround sound field. Moving effects made possible by these joysticks are often unconvincing and need to be used with experience and care.

Research undertaken by Jim West (1999) at the University of Miami showed that, despite the limitations of constant power ‘pairwise’ panning, it proved to offer reasonably stable images for center and off-center listening positions, for moving and stationary sources, compared with some other more esoteric algorithms. Front–back confusion was noticed in some cases, for sources panned behind the listener. In Martin et al.’s (1999) subjective tests of image focus using different panning laws it was found that conventional pairwise constant-power panning provided the most focused images, followed by a relatively simple polarity-restricted cosine law and a second-order ambisonic law. These tests were conducted at the hot spot only, and the authors subsequently concluded that the polarity-restricted cosine law appeared to create fewer unwanted side effects than the constant power law (such as changes in perceived distance to the source).

The amplitude panning concept was extended to a general model that can be used with combinations of loudspeakers in arbitrary locations, known as vector-based amplitude panning or VBAP (Pulkki, 1997). This approach enables amplitude differences between two or three loudspeakers to be used for the panning of sources. Borß (2014) describes a novel alternative to VBAP as a way of rendering phantom sources to immersive loudspeaker arrays. VBAP is basically an extension to the tangent panning law, based on amplitude differences between loudspeakers in a triad, and is widely used because it is simple and effective. However, there are some limitations under certain circumstances and Borß proposes a system that uses symmetric panning gains for symmetric loudspeaker setups, using N-wise panning defined using polygons. This scheme uses a larger number of loudspeakers and seems to stabilize the position and trajectory of phantom sources. However, it also introduces a slightly greater bass boost and slightly more spread images. The author christens the approach ‘Edge Fading Amplitude Panning’ (EFAP). The method is based on many of the same principles as VBAP, in that it uses minimal computing resources, is still based only on amplitude panning, and has power normalized gains. It aims to offer smooth transition of gains between the speakers, and tries to avoid panning where summing localization principles don’t work, where there are large angles between speakers.
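In two dimensions the VBAP gain calculation reduces to solving a small linear system for the loudspeaker gains and normalizing for constant power. The following is an illustrative sketch of that principle (the angle convention and function name are this sketch’s own, not Pulkki’s notation):

```python
import math

def vbap_2d(source_az, spk1_az, spk2_az):
    """2-D vector base amplitude panning between one loudspeaker pair.
    Angles in degrees; returns power-normalized gains for the pair."""
    def unit(az):
        r = math.radians(az)
        return (math.cos(r), math.sin(r))

    p = unit(source_az)                      # source direction vector
    l1, l2 = unit(spk1_az), unit(spk2_az)    # loudspeaker unit vectors
    # Solve p = g1*l1 + g2*l2 by inverting the 2x2 loudspeaker matrix.
    det = l1[0] * l2[1] - l1[1] * l2[0]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    norm = math.sqrt(g1 * g1 + g2 * g2)      # constant-power normalization
    return g1 / norm, g2 / norm

# A source midway between loudspeakers at +/-30 degrees gets equal gains.
g1, g2 = vbap_2d(0.0, 30.0, -30.0)
```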

Mixing Aesthetics

How to use the center channel in mixes has aroused controversy. Some engineers strongly object to using it, claiming that the center channel is a distraction and a nuisance and that they can manage very well without it, while others are equally strongly convinced of its merits. The psychoacoustical advantages of using a center channel have been mentioned earlier, but the existence of this channel complicates panning laws and microphone techniques, and makes conversion between formats more difficult.

Some classical engineers find that simultaneous surround and 2-channel recordings of the same session are made easier by adopting 4-channel rather than 5-channel recording techniques, but this may be more a matter of familiarity than anything else. For many situations a separate mix and different microphones will be required for 2-channel and 5-channel versions of a recording.

In multitrack recording using panned mono sources, the panning law chosen to derive the feed to the center channel will have an important effect on the psychoacoustic result. Numerous studies have highlighted the timbral differences between real and phantom center images, which leads to the conclusion that the equalization of a source sent to a hard center would ideally be different from that used on a source mixed to a phantom center. Vocals, for example, panned so as only to emanate from the center loudspeaker may sound constricted spatially compared with a phantom image created between left and right loudspeakers, as the center loudspeaker is a true source with a fixed location. Some ‘bleed’ into the left and right channels is sometimes considered desirable, in order to ‘defocus’ the image, or alternatively stereo reverberation can be used on the signal.

The technique of spreading mono panned sources into other channels is often referred to as a ‘divergence’ or ‘focus’ control, and can be extended to the surround channels as well, using a variety of different laws to split the energy between the channels. Holman (1999) advises against the indiscriminate use of divergence controls as they can cause sounds to be increasingly localized into the nearest loudspeaker for off-center listeners.
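One possible divergence law can be sketched as follows. This is purely an illustration of the constant-power spreading concept, not a reconstruction of any particular console’s law:

```python
import math

def divergence_gains(div):
    """Illustrative constant-power divergence law for a centre-panned
    source. div = 0 keeps the sound entirely in the centre loudspeaker;
    div = 1 spreads it equally across left, centre and right."""
    norm = math.sqrt(1.0 + 2.0 * div * div)
    g_centre = 1.0 / norm
    g_side = div / norm            # applied to both left and right
    return g_side, g_centre, g_side

# Total power (sum of squared gains) is 1.0 at every divergence setting.
gl, gc, gr = divergence_gains(1.0)   # equal three-way spread
```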

Surround channels are in most cases best reserved for mix components that are not to be clearly or accurately localized, unless very close to loudspeaker positions. In film sound the concept of a surround ‘loudspeaker position’ is somewhat alien in any case, as there are usually numerous surround loudspeakers connected together. In mixing music for consumer applications it may be possible to treat the surround loudspeakers as point sources, although they may not be accurately localized by listeners.

Upmixing and Downmixing

Content can be converted from one spatial format to another, using a matrix or algorithm of some kind, but this can come with compromises in both spatial and timbral quality. In upmixing an attempt is made to generate surround sound with more channels than exist in the source material, whereas in downmixing the aim is to create fewer channels.

Many upmixing algorithms, using 2-channel stereo as a source, extract some of the ambience contained in the difference information between the L and R channels and use it to drive the rear channels, often with quite sophisticated directional steering to enhance the stereo separation of the rear channels. Sometimes a proportion of the front sound is placed in the rear channels to increase envelopment, with suitable delay and filtering to prevent front sounds being pulled towards the rear. Experiments by the author found that the level of signal extracted by such algorithms to the center and rear channels was strongly related to the sum and difference components of the 2-channel signal (Rumsey, 1998).
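The sum/difference principle underlying many such algorithms can be sketched in a few lines. This is a deliberately minimal illustration of the general idea, not a reconstruction of any commercial algorithm; the centre coefficient and rear_gain default are arbitrary illustration values:

```python
def simple_ambience_upmix(left, right, rear_gain=0.5):
    """Illustrative 2-to-5 upmix based on sum/difference extraction.
    Works sample by sample (or element-wise on arrays). The sum (M)
    component feeds the centre; the difference (S) component, which
    carries much of the recorded ambience, feeds the surrounds."""
    mid = 0.5 * (left + right)       # correlated (sum) component
    side = 0.5 * (left - right)      # uncorrelated (difference) component
    centre = 0.7071 * mid            # -3 dB centre feed from the sum
    ls = rear_gain * side            # the difference signal, in antiphase,
    rs = -rear_gain * side           # drives the two surround channels
    return left, right, centre, ls, rs
```

Note that for a dual-mono input (left equal to right) the difference component vanishes, so nothing is sent to the surrounds, consistent with the observation that extracted levels depend strongly on the sum and difference content of the 2-channel signal.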

Surround matrix decoding algorithms, such as those described earlier, may be used for this purpose, although some are better optimized than others for dealing with 2-channel material that has not previously been matrix-encoded. Often a separate collection of settings is needed for upmixing unencoded 2-channel stereo to surround than is used for decoding matrix-encoded surround.

There are also a number of algorithms used in home cinema and surround systems that add ‘effects’ to conventional stereo in order to create a surround impression. Rather than extract existing components from the stereo sound to feed the rear channels they add reverberation on top of any already present, using effects called ‘Hall’ or ‘Jazz Club’ or some other such description. These alter the acoustic characteristics of the original recording quite strongly.

Subjective experiments carried out by the author on a range of such upmixing algorithms found that the majority of 2-channel material suffered from a degradation of the front image quality when converted to 5-channel reproduction (Rumsey, 1999). This either took the form of a narrower image, a change in perceived depth or a loss of focus. On the other hand, the overall spatial impression was often improved, although listeners differed quite strongly in their liking of the spatial impression created (some claiming it sounded artificial or phasy).

Faller et al. (2013) propose that upmixing is often based on an unrealistically simple model that separates direct and ambient sound. For example, if a model assumes that there is the same ambient signal power in all channels, some discrete 5.1 mixes can’t be represented very well because there are different ambient signals (with different power) in the front and rear channels. They go on to show how a cascade of 2-channel upmixes to surround, called a ring upmix, can be used to generate channels for more loudspeakers with full support for 360° panning and high channel separation. The ‘ring’ referred to in this case is the ring of loudspeakers that defines the input format, and the aim is to extend this to more channels by adding loudspeakers between the original channels, as shown in Figure 6.21. In each case the aim is to take a 2-channel original pair and reproduce it over N loudspeakers, ideally with the same sound and sound stage. As shown in Figure 6.22, the 5-to-13 upmix example in Figure 6.21 can be implemented with a cascade of 2-channel upmixes that include a number of delays (D) to compensate for the fact that some upmixes will have run through fewer stages than others. The total delay of each channel through the system should then be the same. The authors also show an alternative that uses frequency-domain processing to avoid the build-up of such delays. One application of this idea described in the paper is in the IOSONO 3D sound system, where a multichannel ring upmix is employed to render standard content such as 2.0, 5.1, 7.1 and so forth over the large number of loudspeakers used in the WFS (wave field synthesis) layouts concerned.

Making separate mixes for every format can be extremely time-consuming, and this has led to the need for semi-automatic or automatic downmixing of multichannel mixes. The total amount of reverberant sound in multichannel mixes can be different to that in 2-channel mixes, though. This is partly because the spatial separation of the loudspeakers enables one to concentrate on the front image separately from the all-round reverberation, whereas in 2-channel stereo all the reverberation comes from the front. Consequently some control is required over the downmix coefficients and possibly the phase relationships between the channels, for optimal control over a 2-channel downmix.

Downmix equations are given in ITU-R BS.775, intended principally for broadcasting applications where a ‘compatible’ 2-channel version of a 5-channel program needs to be created. These are relatively basic approaches to mixing the LS and RS channels into L and R, respectively, and the center equally into front left and right, all at –3 dB with respect to the gain of the front channels. Recognizing that this may not be appropriate for all program material the recommendation allows for alternative coefficients of 0 dB and –6 dB to be used. Formulae for other format conversions are also given. Experiments conducted at the BBC Research Department suggested that there was little consistency among listeners concerning the most suitable coefficients for different types of surround program material, with listeners preferring widely differing amounts of surround channel mixed into the front. It is possible that this was due to listeners having control over the downmix themselves, and that in cases where there was little energy in the surround channels a wide range of settings might have been considered acceptable. Averaged across all program types a setting of between –3 and –6 dB appeared to be preferred, but with a wide variance.
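The pattern described above can be expressed compactly in code. The following sketch implements the basic mixing scheme with the default −3 dB coefficients and the alternative values as parameters (the function name and parameterization are this sketch’s own, not taken from the recommendation):

```python
def itu_downmix(L, C, R, LS, RS, c_gain_db=-3.0, s_gain_db=-3.0):
    """2-channel downmix of a 5-channel programme following the basic
    pattern in ITU-R BS.775: the centre is split equally into left and
    right, LS mixes into left and RS into right. The default gains are
    -3 dB; the recommendation also allows 0 dB and -6 dB alternatives."""
    c = 10.0 ** (c_gain_db / 20.0)   # linear gain for the centre feed
    s = 10.0 ** (s_gain_db / 20.0)   # linear gain for the surround feeds
    L0 = L + c * C + s * LS
    R0 = R + c * C + s * RS
    return L0, R0
```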

In Dolby Digital decoders the downmix coefficients can be varied by the originator of the program at the post-production or mastering stage and included as side information in the Dolby Digital data stream. In this way the downmix can be optimized for the current program conditions and does not have to stay the same throughout the program. Listeners can ignore the producer’s downmix settings if they wish, creating a custom version that they prefer.

Gerzon (1992a) proposed that, in order to preserve stereo width and make the downmix hierarchically compatible with other formats, an alternative downmix formula from 5–2 channels should be used:

L0 = 0.8536L + 0.5C - 0.1464R + 0.3536k(LS + RS) + 0.3536k2(LS - RS)

R0 = -0.1464L + 0.5C + 0.8536R + 0.3536k(LS + RS) - 0.3536k2(LS - RS)

where k is between 0.5 and 0.7071 (–6 and –3 dB) and k2 is between 1.4142k and 1.4142 (–3 to +3 dB).

The result of this matrix is that the front stereo image in the 2-channel version is given increased width compared with the ITU downmix proposal, and that the rear difference gain component k2 has the effect of making rear sounds reproduce somewhat wider than front sounds.

He suggests that this would be generally desirable because rear sounds are generally ‘atmosphere’ and the increased width would improve the ‘spatial’ quality of such atmosphere and help separate it from front stage sounds. Based on the above equation he proposes that values of k = 0.5 and k2 = 1.1314 work quite well, making the folded-down rear stage wider than the front stage and the rear channels between 3.5 and 6 dB lower in level than the front.
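Gerzon’s equations translate directly into code. The following sketch implements the matrix above, with his suggested values of k and k2 as defaults (the function name is this sketch’s own):

```python
def gerzon_downmix(L, C, R, LS, RS, k=0.5, k2=1.1314):
    """Gerzon's (1992a) width-preserving 5-to-2 downmix. k sets the
    rear level (0.5 to 0.7071) and k2 the rear difference gain; the
    defaults are the values he suggests work well in practice."""
    s_sum = 0.3536 * k * (LS + RS)     # common rear component
    s_diff = 0.3536 * k2 * (LS - RS)   # rear width (difference) component
    L0 = 0.8536 * L + 0.5 * C - 0.1464 * R + s_sum + s_diff
    R0 = -0.1464 * L + 0.5 * C + 0.8536 * R + s_sum - s_diff
    return L0, R0
```

Note that a front-left-only source folds down with a small antiphase component in the right channel (−0.1464), which is what widens the front image relative to the plain ITU coefficients.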

Perceptual Evaluation

There are many perceptual dimensions or attributes making up human judgments about sound quality. These may be arranged in a hierarchy, with an integrative judgment of quality at the top, and judgments of individual descriptive attributes at the bottom (see Figure 6.23). According to Letowski’s model (1989) this ‘tree’ may be divided broadly into spatial and timbral attributes, the spatial attributes referring to the three-dimensional features of sounds such as their location, width and distance, and the timbral attributes referring to aspects of sound color. Effects of non-linear distortion and noise are also sometimes put in the timbral group. The higher one goes up the tree, the more one is usually talking about the acceptability or suitability of the sound for some purpose and in relation to some frame of reference, whereas at the lower levels one may be able to evaluate the attributes concerned in value-free terms. In other words a high-level judgment of quality is an integrative evaluation that takes into account all of the lower-level attributes and weighs up their contribution. The nature of the reference, the context and the definition of the task govern the way in which the listener decides which aspects of the sound should be taken into consideration.

Although researchers have tended to concentrate on analyzing the ability of surround sound systems to create optimally localized phantom images and to reconstruct original wavefronts accurately, other subjective factors such as image depth, width and envelopment relate strongly to subjective preference in entertainment audio applications. These factors are much harder to define and measure, but they appear nonetheless to be quite important determinants of overall quality. Mason (1999) proposed a hierarchy of spatial attributes for use in perceptual evaluation (see Figure 6.24) and the author has published an extensive review of terminology and schema for evaluation in Rumsey (2002).

If true correspondence of all source locations were possible (or indeed desirable) between recording environment and reproducing environment, in all three dimensions and for all listening positions, then it might be reasonable to suppose that ability of a surround sound system to create accurate phantom images of all sources (including reflections) would be the only requirement for fidelity. Since true identity is rarely possible or desirable, some means of creating and controlling adequate illusions of the most important subjective cues for consumer enjoyment could be held as the primary aim of recording and reproducing techniques. This attitude is particularly relevant for entertainment audio applications, but might not be the right one to take for flight simulators, for example.

An interesting conclusion of work by the author and his colleagues on the factors affecting overall quality judgments of surround sound was that timbral fidelity is considerably more important than spatial fidelity (Rumsey et al., 2005b). In other words, listeners care more about the tonal or sound color features of reproduction than they do about the spatial ones. Naïve listeners (ordinary consumers) hardly notice aspects of stereophonic source location, and are more affected by the immersive effect of surround channels; it is only trained listeners that seem to appreciate accurate phantom imaging. It was also found that the overlap between spatial and timbral domains cannot be ignored, as each affects the perception of the other (Conetta et al., 2014a).

One of the few examples of spatial subjective quality tests carried out during a previous intense period of interest in surround sound reproduction is the work of Nakayama et al. (1971). The subjective factors they identified as important in explaining listener quality ratings were interpreted as (a) ‘depth of image sources’, (b) ‘fullness’, (c) ‘clearness’. An examination of their results suggests that ‘fullness’ is very similar to what others have called ‘envelopment’, as it is heavily loaded for reproductions involving more loudspeakers to the sides and rear of the listener, and weak for 2-channel frontal stereo. ‘Fullness’ was most important, followed by ‘depth of sources’, followed by ‘clearness’. The authors’ concluding remarks are still relevant today with regard to the problem of assessing recorded material that does not conform to ‘natural’ acoustic layouts of sources and reverberation.

Needless to say, the present study is concerned with the multichannel reproduction of music played only in front of the listeners, and proves to be mainly concerned with extending the ambience effect… . In other types of four-channel reproduction the localizations of image sources are not limited to the front. With regard to the subjective effects of these other types of reproduction, many further problems, those mainly belonging to the realm of art, are to be expected. The optimization of these might require considerably more time to be spent in trial, analysis and study.

(Nakayama et al., 1971, p. 750)

No one has really solved this problem yet, as the mixing of surround sound for entertainment purposes is really an art and not a science.

In studies of the perceived effects of spatial sound reproduction, it is sometimes useful to distinguish between judgments and sentiments (Nunnally and Bernstein, 1994). Judgments are human responses or perceptions essentially free of personal opinion or emotional response and can be externally verified (such as the response to questions like ‘how long is this piece of string?’ or indeed ‘what is the location of this sound source?’). Sentiments are preference-related or linked to some sort of emotional response, and cannot be externally verified. Obvious examples are ‘like/dislike’ and ‘good/bad’ responses.

In experiments designed to determine how subjects described spatial phenomena in reproduced sound systems, including surround sound, Berg and Rumsey (2006) separated descriptive attributes or constructs from emotional and evaluative ones. Descriptive features could then be analyzed separately from emotional responses, and relationships established between them in an attempt to determine what spatial features were most closely related to positive emotional responses. In this experiment it seemed that high levels of envelopment and room impression created by surround sound, rather than accurate imaging of sources, were the descriptive features most closely related to positive emotional responses.

Predictive Models of Surround Sound Quality

It may be possible to arrive at an overall prediction of quality or listener preference by some weighted combination of ratings of individual low-level attributes. However, such weightings are strongly context- and task-dependent. Listening tests are time-consuming and resource-intensive, so there is a strong motivation to develop perceptual models that aim to predict the human response to different aspects of sound quality. Relationships are established between metrics of the signals or sound field and the results of listening tests. A typical perceptual model for sound quality is calibrated in a similar way to that shown in Figure 6.25. Audio signals, usually consisting of a set of reference and impaired versions of chosen program items, are scaled in standard listening tests to generate a database of ‘subjective’ grades. In parallel with this a set of audio features is defined and measured, leading to a set of metrics representing perceptually relevant aspects of the audio signals concerned. These are sometimes termed ‘objective metrics’. The statistical model or neural network is then calibrated or trained based on these data so as to make a more or less accurate prediction of the quality ratings given by the listeners.
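As a highly simplified illustration of the calibration step, a linear model can be fitted to subjective grades by least squares. The metrics and grades below are entirely invented for illustration and bear no relation to any actual model; real models use many more items, carefully chosen perceptual features and cross-validation:

```python
import numpy as np

# One row per impaired audio item; columns are hypothetical objective
# metrics (e.g. a spectral distortion measure and an interchannel
# correlation error). Values here are purely illustrative.
metrics = np.array([
    [0.1, 0.2],
    [0.4, 0.1],
    [0.8, 0.6],
    [0.2, 0.9],
])
# Mean quality grades from listening tests for the same items (invented).
subjective_grades = np.array([85.0, 70.0, 30.0, 55.0])

# Add an intercept column and solve for the regression weights.
X = np.column_stack([np.ones(len(metrics)), metrics])
weights, *_ = np.linalg.lstsq(X, subjective_grades, rcond=None)

def predict_quality(feature_vector):
    """Predict a quality grade for a new item from its objective metrics."""
    return float(weights[0] + np.dot(weights[1:], feature_vector))
```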

The author and his colleagues developed one such model for surround sound quality evaluation, which was able to predict the quality ratings of trained listeners with reasonably high accuracy, based on measurements made with probe signals (Conetta et al., 2014b). This was reasonably generalizable to a range of entertainment audio content types, but evidence suggested the need for adaptation to different mixing styles.

Note

1 Parts of this chapter are drawn from material appearing in Spatial Audio (Rumsey, 2001) and Sound and Recording, 7th ed. (Rumsey & McCormick, 2014), and are used by permission of Focal Press.

References

AES. (2001). Multichannel Surround Sound Systems and Operations: Technical Document AESTD1001.1.01–10. Audio Engineering Society, New York.

Berg, J., & Rumsey, F. (2006). Identification of quality attributes of spatial audio by repertory grid technique. Journal of Audio Engineering Society, 54(5), 365–379.

Borß, C. (2014). A polygon-based panning method for 3D loudspeaker setups. Presented at the AES 137th Convention, Los Angeles, USA, 9–12 October. Paper 9106. Audio Engineering Society.

Bosi, M., Brandenburg, K., Quackenbush, S., Fielder, L., Akagiri, K., Fuchs, H., & Dietz, M. (1997). ISO/IEC MPEG-2 advanced audio coding. Journal of Audio Engineering Society, 45(10), 789–812.

Breebaart, J., Hotho, G., Koppens, J., Schuijers, E., Oomen, W., & Van de Par, S. (2007). Background, concept, and architecture for the recent MPEG surround standard on multichannel audio compression. Journal of Audio Engineering Society, 55(5), 331–351.

Conetta, R., Brookes, T., Rumsey, F., Zielinski, S., Dewhirst, M., Jackson, P., Bech, S., Meares, D., & George, S. (2014a). Spatial audio quality perception (Part 1): Impact of commonly encountered processes. Journal of Audio Engineering Society, 62(12), 831–846.

Conetta, R., Brookes, T., Rumsey, F., Zielinski, S., Dewhirst, M., Jackson, P., Bech, S., Meares, D., & George, S. (2014b). Spatial audio quality perception (Part 2): A linear regression model. Journal of Audio Engineering Society, 62(12), 847–860.

Faller, C., Altmann, L., Levison, J., & Schmidt, M. (2013). A multi-channel ring upmix. Presented at the 134th AES Convention, Rome, May 4–7. Paper 8908. Audio Engineering Society.

Garity, W., & Hawkins, J. (1941). Fantasound. SMPTE Motion Imaging Journal, 37(8), 127–146.

Gerzon, M. (1992a). Compatibility of and conversion between multispeaker systems. Presented at 93rd AES Convention, San Francisco, 1–4 October. Preprint 3405. Audio Engineering Society.

Gerzon, M. (1992b). Optimum reproduction matrices for multispeaker stereo. Journal of Audio Engineering Society, 40(7/8), 571–589.

Gerzon, M. (1992c). Panpot laws for multispeaker stereo. Presented at 92nd AES Convention, Vienna. Preprint 3309. Audio Engineering Society.

Glasgal, R. (1995). Ambiophonics: The synthesis of concert hall sound fields in the home. Presented at the 99th AES Convention, New York, October 6–9. Preprint 4113. Audio Engineering Society.

Griesinger, D. (1997). Spatial impression and envelopment in small rooms. Presented at AES 103rd Convention, New York, September 26–29. Preprint 4638. Audio Engineering Society.

Hamasaki, K. (2003). Multichannel recording techniques for reproducing adequate spatial impression. Proceedings of the AES 24th International Conference: Multichannel Audio, The New Reality. Paper 27. Audio Engineering Society.

Herre, J., Hilpert, J., Kuntz, A., & Plogsties, J. (2014). MPEG-H Audio: The new standard for universal spatial/3D audio coding. Journal of Audio Engineering Society, 62(12), 821–830.

Hertz, B. (1981). 100 years with stereo: The beginning. Journal of Audio Engineering Society, 29(5), 368–372.

Holman, T. (1999). 5.1 Surround Sound: Up and Running. Oxford and Boston: Focal Press.

Holman, T., & Zacharov, N. (2000). Comments on “subjective appraisal of loudspeaker directivity for multichannel reproduction” (in Letters to the Editor). Journal of Audio Engineering Society, 48(4), 314–321.

ISO. (2010). ISO/IEC 23003-2—Information technology—MPEG audio technologies—Part 2: Spatial Audio Object Coding (SAOC). International Standards Organization.

ITU-R. (2012). BS.775-3: Multichannel Stereophonic Sound System With and Without Accompanying Picture. International Telecommunications Union.

Letowski, T. (1989). Sound quality assessment: Cardinal concepts. Presented at the 87th Audio Engineering Society Convention, New York. Preprint 2825.

Martin, G., Woszczyk, W., Corey, J., & Quesnel, R. (1999). Controlling phantom image focus in a multichannel reproduction system. Presented at 107th AES Convention, New York, 24–27 September. Preprint 4996. Audio Engineering Society.

Mason, R. (1999). Personal communication.

Nakayama, T., Miura, T., Kosaka, O., Okamoto, M., & Shiga, T. (1971). Subjective assessment of multichannel reproduction. Journal of Audio Engineering Society, 19(9), 744–751.

Nunnally, J., & Bernstein, I. (1994). Psychometric Theory (3rd ed.). New York and London: McGraw-Hill.

Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude panning. Journal of Audio Engineering Society, 45(6), 456–466.

Pulkki, V., & Faller, C. (2006). Directional audio coding: Filterbank and STFT-based design. Presented at the AES 120th Convention, Paris, May 20–23. Paper 6658. Audio Engineering Society.

Rumsey, F. (1998). Synthesized multichannel signal levels versus the M-S ratios of 2-channel programme items. Presented at 104th AES Convention, Amsterdam, 16–19 May. Preprint 4653. Audio Engineering Society.

Rumsey, F. (1999). Controlled subjective assessments of 2-to-5 channel surround sound processing algorithms. Journal of Audio Engineering Society, 47(7/8), 563–582.

Rumsey, F. (2001). Spatial Audio. Oxford and Boston: Focal Press.

Rumsey, F. (2002). Spatial quality evaluation for reproduced sound: Terminology, meaning and a scene-based paradigm. Journal of Audio Engineering Society, 50(9), 651–666.

Rumsey, F., & McCormick, T. (2014). Sound and Recording: Applications and Theory (7th ed.). Oxford and Boston: Focal Press.

Rumsey, F., Zielinski, S., Kassier, R. & Bech, S. (2005a) Relationships between experienced listener ratings of multichannel audio quality and naïve listener preferences. Journal of Acoustical Society of America, 117(6), 3832–3840.

Rumsey, F., Zielinski, S., Kassier, R., & Bech, S. (2005b). On the relative importance of spatial and timbral fidelities in judgments of degraded multichannel audio quality. Journal of Acoustical Society of America, 118(2), 968–977.

Scheiber, P. (1971). Suggested performance requirements for compatible four-channel recording. Journal of Audio Engineering Society, 19(8), 647–650.

Smyth, S., Smith, W. P., Smyth, M. H. C., Yan, M., & Jung, T. (1996). DTS coherent acoustics: Delivering high quality multichannel sound to the consumer. Presented at 100th AES Convention, Copenhagen, 11–14 May. Workshop 4a-3.

Steinberg, J., & Snow, W. (1934). Auditory perspective: Physical factors. Stereophonic Techniques, 3–7. Audio Engineering Society.

Steinke, G. (1996). Surround sound—the new phase. An overview. Presented at the 100th AES Convention, Copenhagen, May 11–14. Preprint 4286. Audio Engineering Society.

Theile, G. (2000). Multichannel Natural Recording Based on Psychoacoustic Principles. Presented at the AES 108th Convention, Paris, France, 19–22 February. Paper 5156. Audio Engineering Society.

Theile, G., & Plenge, G. (1977). Localization of lateral phantom images. Journal of Audio Engineering Society, 25(4), 196–200.

Todd, C., Davidson, G. A., Davis, M. F., Fielder, L. D., Link, B. D., & Vernon, S. (1994). Flexible perceptual coding for audio transmission and storage. Presented at 96th AES Convention. Preprint 3796.

West, J. (1999). Five-channel panning laws: An analytical and experimental comparison. Master’s thesis, University of Miami, Florida.

Williams, M. (2004). Microphone Arrays for Stereo and Multichannel Sound Recordings. Milano: Editrice Il Rostro.
