CHAPTER 17

Surround Sound

 

CHAPTER CONTENTS

Three-Channel (3-0) Stereo

Four-Channel Surround (3-1 Stereo)

Purpose of four-channel systems

Loudspeaker configuration

Limitations of four-channel reproduction

5.1-Channel Surround (3-2 Stereo)

Purpose of 5.1-channel systems

International standards and configurations

The LFE channel and use of subwoofers

Limitations of 5.1-channel reproduction

Signal levels in 5.1 surround

Other Multichannel Configurations

7.1-channel surround

10.2-channel surround

Surround Sound Systems

Matrixed Surround Sound Systems

Dolby stereo, surround and prologic

Circle surround

Lexicon logic 7

Dolby EX

Digital Surround Sound Formats

Ambisonics

Principles

Signal formats

Further Developments

Auro-3D

Dolby Atmos

NHK 22.2

Wave field synthesis

Surround Sound Monitoring

Differences between two-channel and surround mixing rooms

Front loudspeakers in general

What to do with the center loudspeaker

Surround loudspeakers

Subwoofers

Surround Sound Recording Techniques

Principles of surround sound microphone technique

Five-channel ‘main microphone’ arrays

Separate treatment of front imaging and ambience

Pseudo-binaural techniques

Multimicrophone techniques

Ambisonic or ‘SoundField’ microphone principles

Multichannel Panning Techniques

Pairwise amplitude panning

‘Ambisonic’ panning laws

Head-related panning

 

This chapter is concerned with the most commonly encountered multichannel (i.e. more than two channels) stereo reproduction configurations, most of which are often referred to as surround sound. Standards or conventions that specify basic channel or loudspeaker configurations are distinguished from proprietary systems such as Dolby Digital and DTS whose primary function is the coding and delivery of multichannel audio signals. The latter are discussed in the second part of the chapter, which also contains an explanation of the Ambisonic system for stereo signal representation. Surround sound standards often specify little more than the channel configuration and the way the loudspeakers should be arranged, leaving the business of how to create or represent a spatial sound field entirely up to the user.

THREE-CHANNEL (3-0) STEREO

It is not proposed to say a great deal about the subject of three-channel stereo here, as it is rarely used on its own. Nonetheless it does form the basis of a lot of surround sound systems. It requires the use of a left (L), center (C) and right (R) channel, the loudspeakers arranged equidistantly across the front sound stage, as shown in Figure 17.1. It has some precedents in historical development, in that the stereophonic system developed by Steinberg and Snow in the 1930s used three channels (see Chapter 16). Three front channels have also been commonplace in cinema stereo systems, mainly because of the need to cover a wide listening area and because wide screens tend to result in a large distance between left and right loudspeakers. Two channels only became the norm in consumer systems for reasons of economy and convenience, and particularly because it was much more straightforward to cut two channels onto an analog disk than three.

There are various advantages of three-channel stereo. First, it allows for a somewhat wider front sound stage than two-channel stereo, if desired, because the center channel acts to ‘anchor’ the central image and the left and right loudspeakers can be placed further out to the sides (say ± 45°). (Note, though, that in the current five-channel surround sound standard the L and R loudspeakers are in fact placed at ± 30°, for compatibility with two-channel stereo material.) Second, the center loudspeaker enables a wider range of listening positions in many cases, as the image does not collapse quite as readily into the nearest loudspeaker. It also anchors dialog more clearly in the middle of the screen in sound-for-picture applications. Third, the center image does not suffer the same timbral modification as the center image in two-channel stereo, because it emanates from a real source.

A practical problem with three-channel stereo is that the center loudspeaker position is often very inconvenient. Although in cinema reproduction it can be behind an acoustically transparent screen, in consumer environments, studios and television environments it is almost always just where one wants a television monitor or a window. Consequently the center channel has to be mounted above or below the object in question, and possibly made smaller than the other loudspeakers.

image

FIGURE 17.1 Three-channel stereo reproduction usually involves three equally spaced loudspeakers in front of the listener. The angle between the outer loudspeakers is 60° in the ITU standard configuration, for compatibility with two-channel reproduction, but the existence of a center loudspeaker makes wider spacings feasible if compatibility is sacrificed

FOUR-CHANNEL SURROUND (3-1 STEREO)

In this section the form of stereo called ‘3-1 stereo’ in some international standards, or ‘LCRS surround’ in some other circles, is briefly described. Proprietary encoding and decoding technology from Dolby relating to this format is described later.

‘Quadraphonic’ reproduction using four loudspeakers in a square arrangement is not covered further here (it was mentioned in the Introduction), as it has little relevance to current practice.

Purpose of four-channel systems

The merits of three front channels have already been introduced in the previous section. In the 3-1 approach, an additional ‘effects’ channel or ‘surround’ channel is added to the three front channels, routed to a loudspeaker or loudspeakers located behind (and possibly to the sides of) listeners. It was developed first for cinema applications, enabling a greater degree of audience involvement in the viewing/listening experience by providing a channel for ‘wrap-around’ effects. This development is attributed to 20th Century Fox in the 1950s, along with wide-screen Cinemascope viewing, and was intended to offer effective competition to the new medium of television entertainment.

There is no specific intention in 3-1 stereo to use the effects channel as a means of enabling 360° image localization. In any case, this would be virtually impossible with most configurations as there is only a single audio channel feeding a larger number of surround loudspeakers, effectively in mono.

Loudspeaker configuration

Figure 17.2 shows the typical loudspeaker configuration for this format. In the cinema there are usually a large number of surround loudspeakers fed from the single S channel (‘surround channel’, not to be confused with the ‘S’ channel in sum-and-difference stereo), in order to cover a wide audience area. This has the tendency to create a relatively diffuse or distributed reproduction of the effects signal. The surround speakers are sometimes electronically decorrelated to increase the degree of spaciousness or diffuseness of surround effects, in order that they are not specifically localized to the nearest loudspeaker or perceived inside the head.

In consumer systems reproducing 3-1 stereo, the mono surround channel is normally fed to two surround loudspeakers located in similar positions to the 3-2 format described below. The gain of the channel is usually reduced by 3 dB so that the summation of signals from the two speakers does not lead to a level mismatch between front and rear.
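A quick numerical check illustrates the power-sum reasoning behind this 3 dB reduction (a minimal Python sketch; treating the two loudspeaker contributions as adding in power at the listener is the assumption):

# Mono surround fed to two loudspeakers, each attenuated by 3 dB:
g = 10 ** (-3 / 20)            # -3 dB as a linear amplitude gain (about 0.707)
total_power = 2 * g ** 2       # power sum of the two loudspeaker contributions
print(round(total_power, 2))   # ~1.0, i.e. the same power as one loudspeaker at full gain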

Limitations of four-channel reproduction

The mono surround channel is the main limitation in this format. Despite the use of multiple loudspeakers to reproduce the surround channel, it is still not possible to create a good sense of envelopment or spaciousness without using surround signals that differ between the two sides of the listener. Most of the psychoacoustic research suggests that the ears need to be provided with decorrelated signals to create the best sense of envelopment, and that effects can be better spatialized using stereo surround channels.

image

FIGURE 17.2
3-1 format reproduction uses a single surround channel usually routed (in cinema environments) to an array of loudspeakers to the sides and rear of the listening area. In consumer reproduction the mono surround channel may be reproduced through only two surround loudspeakers, possibly using artificial decorrelation and/or dipole loudspeakers to emulate the more diffused cinema experience

5.1-CHANNEL SURROUND (3-2 STEREO)

This section deals with the 3-2 configuration that has been standardized for numerous surround sound applications, including cinema, television and consumer applications. Because of its wide use in general parlance, the term ‘5.1 surround’ will be used below. Whilst without doubt a compromise, it has become widely adopted in professional and consumer circles and is likely to form the basis for consumer surround sound for the foreseeable future.

Various international groups have worked on developing recommendations for common practice and standards in this area, and some of the information below is based on the effort of the AES Technical Committee on Multichannel and Binaural Audio Technology to bring together a number of proposals.

Purpose of 5.1-channel systems

Four-channel systems have the disadvantage of a mono surround channel, and this limitation is removed in the 5.1-channel system, enabling the provision of stereo effects or room ambience to accompany a primarily front-orientated sound stage. This front-orientated paradigm is a most important one as it emphasizes the intentions of those that finalized this configuration, and explains the insistence in some standards on the use of the term ‘3-2 stereo’ rather than ‘five-channel surround’. Essentially the front three channels are intended to be used for a conventional three-channel stereo sound image, whilst the rear/side channels are only intended for generating supporting ambience, effects or ‘room impression’. In this sense, the standard does not directly support the concept of 360° image localization, although it may be possible to arrive at recording techniques or signal processing methods that achieve this to a degree.

The front-rear distinction is a conceptual point often not appreciated by those that use the format. Two-channel stereo can be relatively easily modeled and theoretically approached in terms of localization vectors, etc. for sounds at any angle between the loudspeakers. It is more difficult, though, to come up with such a model for the five-channel layout described below, as it has unequal angles between the loudspeakers and a particularly large angle between the two rear loudspeakers. It is possible to arrive at gain and phase relationships between these five loudspeakers that are similar to those used in Ambisonics for representing different source angles, but the varied loudspeaker angles make the imaging stability less reliable in some sectors than others. For those who do not have access to the sophisticated panning laws or psychoacoustic matrices required to feed five channels accurately for all-round localization it may be better to treat the format in ‘cinema style’ — in other words with a three-channel front image and two surround effect channels. With such an approach it is still possible to create very convincing spatial illusions, with good envelopment and localization qualities.
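By way of illustration, a ‘cinema style’ front image can be fed using a simple constant-power pairwise pan across the three front loudspeakers. The sketch below is a hypothetical helper in Python, not any particular console’s panning law; the speaker angles and the sine/cosine gain law are assumptions:

import numpy as np

def front_pan_gains(source_angle, speakers=(-30.0, 0.0, 30.0)):
    """Constant-power pan of a mono source between the two nearest front
    loudspeakers (angles in degrees); returns one gain per loudspeaker."""
    gains = np.zeros(len(speakers))
    for i in range(len(speakers) - 1):
        a, b = speakers[i], speakers[i + 1]
        if a <= source_angle <= b:
            p = (source_angle - a) / (b - a)   # 0..1 position within the pair
            gains[i] = np.cos(p * np.pi / 2)   # sine/cosine law keeps summed power constant
            gains[i + 1] = np.sin(p * np.pi / 2)
            break
    return gains

# A source panned to -15 degrees shares its energy between left and center:
# front_pan_gains(-15.0) -> approximately [0.707, 0.707, 0.0]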

One cannot introduce the 5.1 surround system without explaining the meaning of the ‘.1’ component. This is a dedicated low-frequency effects (LFE) channel or sub-bass channel. It is called ‘.1’ because of its limited bandwidth. Strictly, the international standard nomenclature for 5.1 surround should be ‘3-2-1’, the last digit indicating the number of LFE channels.

International standards and configurations

The loudspeaker layout and channel configuration is specified in the ITU-R BS.775 standard. This is shown in Figure 17.3 and Fact File 17.1. A display screen is also shown in the figure for sound with picture applications, and there are recommendations concerning the relative size of the screen and the loudspeaker base width shown in the accompanying table. The left and right loudspeakers are located at ± 30° for compatibility with two-channel stereo reproduction. In many ways this need for compatibility with 2/0 is a pity, because the center channel unavoidably narrows the front sound stage in many applications, and the front stage could otherwise take advantage of the wider spacing facilitated by three-channel reproduction. It was nonetheless considered crucial for the same loudspeaker configuration to be usable for all standard forms of stereo reproduction, for reasons most people will appreciate.

image

FIGURE 17.3
3-2 format reproduction according to the ITU-R BS.775 standard uses two independent surround channels routed to one or more loudspeakers per channel

FACT FILE 17.1 TRACK ALLOCATIONS IN 5.1

Standards recommend the track allocations to be used for 5.1 surround on eight-track recording formats, as shown in the table below. Although other configurations are known to exist there is a strong move to standardize on this arrangement (see also the notes below the table).

Track1 | Signal | Comments | Color2
1 | L | Left | Yellow
2 | R | Right | Red
3 | C | Center | Orange
4 | LFE | Low-frequency enhancement: additional sub-bass and effects signal for subwoofer, optional3 | Gray
5 | LS | Left surround: −3 dB in the case of mono surround | Blue
6 | RS | Right surround: −3 dB in the case of mono surround | Green
7 | Free use in program exchange4 | Preferably left signal of a 2/0 stereo mix | Violet
8 | Free use in program exchange4 | Preferably right signal of a 2/0 stereo mix | Brown

1 The term ‘track’ is used to mean either tracks on magnetic tape or virtual tracks on other storage media where no real tracks exist.

2 This color coding is only a proposal of the German Surround Sound Forum at present, and not internationally standardized.

3 Preferably used in film sound, but is optional for home reproduction. If no LFE signal is being used, track 4 can be used freely, e.g. for commentary. In some regions a mono surround signal MS = LS + RS is applied, where the levels of LS and RS are decreased by 3 dB before summing.

4 Tracks 7 and 8 can be used alternatively, for example for commentary, for additional surround signals, for half-left/half-right front signals (e.g. for special film formats), or for the matrixed two-channel signals Lt/Rt.

The surround loudspeakers, located at approximately ±110°, are positioned to provide a compromise between the need for effects panning behind the listener and the lateral energy important for good envelopment. In this respect they are more like ‘side’ loudspeakers than rear loudspeakers, and in many installations this is an inconvenient location, causing people to mount them nearer the rear than the standard suggests. (Some have said that a 150° angle for the rear loudspeakers provides a more exciting surround effect.) In the 5.1 standard there are normally no loudspeakers directly behind the listener, which can make for creative difficulties. This has led to a Dolby proposal called EX (described below) that places an additional speaker at the center-rear location. (This is not part of the current standard, though.) The ITU standard allows for additional surround loudspeakers to cover the region around listeners, similar to the 3-1 arrangement described earlier. If these are used then they are expected to be distributed evenly in the angle between ±60° and ±150°.

Surround loudspeakers should be the same as front loudspeakers where possible, in order that uniform sound quality can be obtained all around. That said, there are arguments for use of dipole loudspeakers in these positions. Dipoles radiate sound in more of a figure-eight pattern and one way of obtaining a diffuse surround impression is to orient these with the nulls of the figure-eight towards the listening position. In this way the listener experiences more reflected than direct sound and this can give the impression of a more spacious ambient soundfield that may better emulate the cinema listening experience in small rooms. Dipoles make it correspondingly more difficult to create defined sound images in rear and side positions, though.

The LFE channel and use of subwoofers

The low-frequency effects channel is a separate sub-bass channel with an upper frequency limit of 120 Hz (see Fact File 17.2). It is intended for conveying special low-frequency content that requires greater sound pressure levels and headroom than can be handled by the main channels. It is not intended for conveying the low-frequency component of the main channel signals, and its application is likely to be primarily in sound-for-picture applications where explosions and other high-level rumbling noises are commonplace, although it may be used in other circumstances.

In consumer audio systems, reproduction of the LFE channel is considered optional. Because of this, recordings should normally be made so that they sound satisfactory even if the LFE channel is not reproduced. The EBU (European Broadcasting Union) comments on the use of the LFE channel as follows.

When an audio program originally produced as a feature film for theatrical release is transferred to consumer media, the LFE channel is often derived from the dedicated theatrical subwoofer channel. In the cinema, the dedicated subwoofer channel is always reproduced, and thus film mixes may use the subwoofer channel to convey important low frequency program content. When transferring programs originally produced for the cinema over television media (e.g. DVD), it may be necessary to re-mix some of the content of the subwoofer channel into the main full bandwidth channels. It is important that any low frequency audio which is very significant to the integrity of the program content is not placed into the LFE channel. The LFE channel should be reserved for extreme low-frequency, very high-level program content (below 120 Hz) which, if not reproduced, will not compromise the artistic integrity of the program.

FACT FILE 17.2 BASS MANAGEMENT IN 5.1

It is a common misconception that any sub-bass or sub-woofer loudspeaker(s) that may be used on reproduction must be fed directly from the LFE channel in all circumstances. Whilst this may be the case in the cinema, bass management in the consumer reproducing system is not specified in the standard and is entirely system dependent. It is not mandatory to feed low-frequency information to the LFE channel during the recording process, nor is it mandatory to use a subwoofer; indeed, it has been suggested that restricting extreme low-frequency information to a monophonic channel may limit the potential for low-frequency spaciousness in balances. In music mixing it is likely to be common to send the majority of full-range LF information to the main channels, in order to retain the stereo separation between them.

In practical systems it may be desirable to use one or more subwoofers to handle the low-frequency content of a mix on reproduction. The benefit of this is that it enables the size of the main loudspeakers to be correspondingly reduced, which may be useful practically when it comes to finding places to put them in living rooms or sound control rooms. In such cases crossover systems split the signals between main loudspeakers and subwoofer(s) somewhere between 80 Hz and 160 Hz. In order to allow for reproduction of the LFE channel and/or the low-frequency content from the main channels through subwoofer loudspeakers, a form of bass management akin to that shown below is typically employed.

image

With cinema reproduction the in-band gain of this channel is usually 10 dB higher than that of the other individual channels. This is achieved by a level increase of the reproduction channel, not by an increased recording level. (This does not mean that the broadband or weighted SPL of the LFE loudspeaker should measure 10 dB higher than any of the other channels — in fact it will be considerably less than this as its bandwidth is narrower.)
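A minimal code sketch of this kind of bass management is shown below (Python; the crossover frequency, the fourth-order filters and the placement of the +10 dB LFE gain are illustrative assumptions rather than any standardized design):

import numpy as np
from scipy.signal import butter, sosfilt

def bass_manage(mains, lfe, fs, xover_hz=80.0):
    """mains: dict of full-bandwidth main channel signals (e.g. L, R, C, LS, RS).
    Returns high-passed main feeds and a combined subwoofer feed."""
    hp = butter(4, xover_hz, btype='highpass', fs=fs, output='sos')
    lp = butter(4, xover_hz, btype='lowpass', fs=fs, output='sos')
    main_feeds = {name: sosfilt(hp, sig) for name, sig in mains.items()}
    # Subwoofer feed: low-passed content from the main channels plus the LFE
    # channel, the latter raised by 10 dB in-band as in cinema reproduction.
    sub_feed = sum(sosfilt(lp, sig) for sig in mains.values())
    sub_feed = sub_feed + lfe * 10 ** (10 / 20)
    return main_feeds, sub_feed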

Limitations of 5.1-channel reproduction

The main limitations of the 5.1 surround format are first, that it was not intended for accurate 360° phantom imaging capability, as explained above. Whilst it may be possible to achieve a degree of success in this respect, the loudspeaker layout is not ideally suited to it. Second, the front sound stage is narrower than it could be if compatibility with 2/0 reproduction was not a requirement. Third, the center channel can prove problematic for music balancing, as conventional panning laws and coincident microphone techniques are not currently optimized for three loudspeakers, having been designed for two-speaker stereo. Simple bridging of the center loudspeaker between left and right signals has the effect of narrowing the front image compared with a two-channel stereo reproduction of the same material. This may be resolved over time as techniques suited better to three-channel stereo are resurrected or developed. Fourth, the LS and RS loudspeakers are located in a compromise position, leading to a large hole in the potential image behind the listener and making it difficult to find physical locations for the loudspeakers in practical rooms.

These various limitations of the format, particularly in some people’s view for music purposes, have led to various non-standard uses of the five or six channels available on new consumer disc formats such as DVD-A (Digital Versatile Disc — Audio) and SACD (Super Audio Compact Disc). For example, some are using the sixth channel (which would otherwise be LFE) in its full bandwidth form on these media to create a height channel. Others are making a pair out of the ‘LFE’ channel and the center channel so as to feed a pair of front-side loudspeakers, enabling the rear loudspeakers to be further back. These are non-standard uses and should be clearly indicated on any recordings.

Signal levels in 5.1 surround

In film sound environments it is the norm to increase the relative recording level of the surround channels by 3 dB compared with that of the front channels. This is in order to compensate for the −3 dB acoustic alignment of each surround channel’s SPL with respect to the front that takes place in dubbing stages and movie theaters. It is important to be aware of this discrepancy between practices, as it is the norm in music mixing and broadcasting to align all channels for equal level both on recording media and for acoustical monitoring. Transfers from film masters to consumer or broadcast media may require 3 dB alteration in the gain of the surround channels.

OTHER MULTICHANNEL CONFIGURATIONS

Although the 5.1 surround standard is becoming widely adopted as the norm for the majority of installations, other proposals and systems exist, typically involving more channels to cover a large listening area more accurately. It is reasonable to assume that the more real loudspeakers exist in different locations around the listener, the less one has to rely on the formation of phantom images to position sources accurately, and the more freedom one has in listener position. The added complication of mixing for such larger numbers of channels must be considered as a balancing factor.

The reader is also referred to the discussion of Ambisonics, as this system can be used with a wide range of different loudspeaker configurations depending on the decoding arrangements used.

7.1-channel surround

Deriving from widescreen cinema formats, the 7.1-channel configuration normally adds two further loudspeakers to the 5.1-channel configuration, located at center-left (CL) and center-right (CR), as shown in Figure 17.4. This is not a format primarily intended for consumer applications, but for large cinema auditoria where the screen width is such that the additional channels are needed to cover the angles between the loudspeakers satisfactorily for all the seats in the auditorium. Sony’s SDDS cinema system is a common proprietary implementation of this format, as was the original 70 mm Dolby Stereo format (see below), although that analog format used only one surround channel.

Lexicon and Meridian have also implemented a seven-channel mode in their consumer surround decoders, but the recommended locations for the loudspeakers are not quite the same as in the cinema application. The additional channels are used to provide a wider side-front component and allow the rear speakers to be moved round more to the rear than in the 5.1 arrangement.

10.2-channel surround

Tomlinson Holman has spent considerable effort promoting a 10.2-channel surround sound system as ‘the next step’ in spatial reproduction, but this has not yet been adopted as standard. To the basic five-channel array he adds wider side-front loudspeakers and a center-rear channel to ‘fill in the holes’ in the standard layout. He also adds two height channels and a second LFE channel. The second LFE channel is intended to provide lateral separation of decorrelated low bass content to either side of the listening area, as suggested by Griesinger, to enhance low-frequency spaciousness.

image

FIGURE 17.4
Some cinema sound formats for large auditorium reproduction enhance the front imaging accuracy by the addition of two further loudspeakers, center-left and center-right

SURROUND SOUND SYSTEMS

This part of the chapter concerns what will be called surround sound ‘systems’, which includes proprietary formats for the coding and transfer of surround sound. These are distinguished from the generic configurations and international standards discussed already. Most of the systems covered here are the subject of patents and intellectual property rights. In some proprietary systems the methods of signal coding or matrixing for storage and delivery are defined (e.g. Dolby Stereo), whilst others define a full source-receiver signal representation system (e.g. Ambisonics).

MATRIXED SURROUND SOUND SYSTEMS

Whilst ideally one would like to be able to transfer or store all the channels of a surround sound mix independently and discretely, it may be necessary to make use of existing two-channel media for compatibility with other systems. The systems described in the following sections all deal with multichannel surround sound in a matrixed form (in other words, using an algorithm that combines the channels in such a way that they can be subsequently extracted using a suitable decoder). By matrixing the signals they can be represented using fewer channels than the source material contains. This gives rise to some side-effects and the signals require careful dematrixing, but the approach has been used widely for many years, mainly because of the unavailability of multichannel delivery media in many environments.

Dolby stereo, surround and prologic

Dolby Labs was closely involved with the development of cinema surround sound systems, and gradually moved into the area of surround sound for consumer applications.

The original Dolby Stereo system involved a number of different formats for film sound with three to six channels, particularly a 70 mm film format with six discrete tracks of magnetically recorded audio, and a 35 mm format with two optically recorded audio tracks onto which were matrixed four audio channels in the 3-1 configuration (described above). The 70 mm format involved L, LC, C, RC, R and S channels, whereas the 35 mm format involved only L, C, R and S. Both clearly only involved mono surround information. The four-channel system is the one most commonly known today as Dolby Stereo, having found widespread acceptance in the cinema world and used on numerous movies. Dolby Surround was introduced in 1982 as a means of emulating the effects of Dolby Stereo in a consumer environment. Essentially the same method of matrix decoding was used, so movies transferred to television formats could be decoded in the home in a similar way to the cinema. Dolby Stereo optical sound tracks for the cinema were Dolby A noise-reduction encoded and decoded, in order to improve the signal-to-noise ratio, but this is not a feature of consumer Dolby Surround (more recent cinema formats have used Dolby SR-type noise reduction, alongside a digital soundtrack).

image

FIGURE 17.5 Basic components of the Dolby Stereo matrix encoding process

The Dolby Stereo matrix (see Figure 17.5) is a form of ‘4-2-4’ matrix that encodes the mono surround channel so that it is added out of phase into the left and right channels (+90° in one channel and −90° in the other). The center channel signal is added to left and right in phase. The resulting sum is called Lt/Rt (left total and right total). By doing this the surround signal can be separated from the front signals upon decoding by summing the Lt/Rt signals out of phase (extracting the stereo difference signal), and the center channel can be extracted by summing Lt/Rt in phase. In consumer systems using passive decoding the center channel is not always fed to a separate loudspeaker but can be heard as a phantom image between left and right. A decoder block diagram for the consumer version (Dolby Surround) is shown in Figure 17.6. Here it can be seen that in addition to the sum-and-difference-style decoding, the surround channel is subject to an additional delay, band-limiting between 100 Hz and 7 kHz and a modified form of Dolby B noise reduction. The low-pass filtering and the delay are both designed to reduce matrix side-effects that could otherwise result in front signals appearing to come from behind. Crosstalk between channels and effects of any misalignment in the system can cause front signals to ‘bleed’ into the surround channel, and this can be worse at high frequencies than low. The delay (of the order of 20–30ms in consumer systems, depending on the distance of the rear speakers) relies on the precedence effect (see Chapter 2) to cause the listener to localize signals according to the first arriving wavefront which will now be from the front rather than the rear of the sound stage. The rear signal then becomes psychoacoustically better separated from the front and localization of primary signals is biased more towards the front. The modified B-type NR reduces surround channel noise and also helps to reduce the effects of decoding errors and interchannel crosstalk, as some distortions introduced between encoding and decoding will be reduced by B-type decoding.
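The encoding side of such a 4-2-4 matrix can be sketched as follows. This is a simplified illustration of the principle only, not Dolby’s licensed implementation; the use of a Hilbert transform to obtain the 90° phase shifts and the −3 dB contribution gains are assumptions:

import numpy as np
from scipy.signal import hilbert

def matrix_encode(L, C, R, S):
    """4-2-4 style encode: C added to both channels in phase at -3 dB, S added
    at -3 dB with +90 degrees into Lt and -90 degrees into Rt."""
    g = 10 ** (-3 / 20)
    s_shift = np.imag(hilbert(S))     # S phase-shifted by -90 degrees
    Lt = L + g * C - g * s_shift      # negating gives the +90 degree version
    Rt = R + g * C + g * s_shift
    return Lt, Rt

def passive_decode(Lt, Rt):
    """Passive decode: the in-phase sum recovers the center, the difference
    recovers the surround (both still containing crosstalk from L and R)."""
    return Lt + Rt, Lt - Rt           # C', S'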

A problem with passive Dolby Surround decoding is that the separation between adjacent channels is relatively modest, although the separation of left/right and center/surround remains high. When a signal is panned fully left it will tend to appear only 3 dB down in the center, and also in the surround, for example. The effects of this can be ameliorated in passive consumer systems by the techniques described above (phantom center and surround delay/filtering). Dolby’s ProLogic system, based on principles employed in the professional decoder, attempts to resolve this problem by including sophisticated ‘steering’ mechanisms into the decoder circuit to improve the perceived separation between the channels. A basic block diagram is shown in Figure 17.7. This enables a real center loudspeaker to be employed. Put crudely, ProLogic works by sensing the location of ‘dominant’ signal components and selectively attenuating channels away from the dominant component. (A variety of other processes are involved as well as this.) So, for example, if a dialog signal is predominantly located in the center, the control circuit will reduce the output of the other channels (L, R, S) in order that the signal comes mainly from the center loudspeaker. (Without this it would also have appeared at quite high level in left and right as well.) A variety of algorithms are used to determine how quickly the system should react to changes in dominant signal position, and what to do when no signal appears dominant.
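The steering idea can be illustrated with a deliberately crude dominance detector. This is not Dolby’s algorithm, merely a toy sketch of the principle; the axis comparisons, the attenuation limit and the block-based analysis are all assumptions:

import numpy as np

def crude_steering_gains(Lt, Rt):
    """Compare energy on the left/right and center/surround axes of a short
    block and return attenuation gains (gL, gC, gR, gS) that duck the
    channels opposite the dominant direction."""
    e_l, e_r = np.mean(Lt ** 2), np.mean(Rt ** 2)
    e_c, e_s = np.mean((Lt + Rt) ** 2), np.mean((Lt - Rt) ** 2)
    lr = 10 * np.log10((e_l + 1e-12) / (e_r + 1e-12))     # positive = left-dominant
    cs = 10 * np.log10((e_c + 1e-12) / (e_s + 1e-12))     # positive = center-dominant
    cut = lambda x: 10 ** (-min(max(x, 0.0), 12.0) / 20)  # up to 12 dB of attenuation
    return cut(-lr), cut(-cs), cut(lr), cut(cs)           # gL, gC, gR, gS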

image

FIGURE 17.6 Basic components of the passive Dolby surround decoder

Dolby has recently introduced an enhancement to ProLogic, entitled ProLogic 2, that adds support for full-bandwidth stereo rear channels, with various options that make it more suitable for music programs. It is also claimed to be effective in the up-conversion of unencoded two-channel material to five-channel surround.

Mixes that are to be matrix encoded using the Dolby system should be monitored via the encode-decode chain in order that the side-effects of the process can be taken into account by the balance engineer. Dolby normally licenses the system for use on a project, and will assist in the configuration and alignment of their equipment during the project.

Dolby Stereo/Surround can be complemented by the THX system, as described in Fact File 17.3.

image

FIGURE 17.7 Basic components of the active Dolby ProLogic decoder

FACT FILE 17.3 WHAT IS THX?

The THX system was developed by Tomlinson Holman at Lucasfilm (THX is derived from ‘Tomlinson Holman Experiment’). The primary aim of the system was to improve the sound quality in movie theaters and make it closer to the sound experienced by sound mixers during post-production. It was designed to complement the Dolby Stereo system, and does not itself deal with the encoding or representation of surround sound. In fact THX is more concerned with the acoustics of cinemas and the design of loudspeaker systems, optimizing the acoustic characteristics and noise levels of the theater, as well as licensing a particular form of loudspeaker system and crossover network. THX licenses the system to theaters and requires that the installation is periodically tested to ensure that it continues to meet the specification.

Home THX was developed, rather like Dolby Surround, in an attempt to convey the cinema experience to the home. Through the use of a specific controller, amplifiers and speakers, the THX system enhances the decoding of Dolby Surround and can also be used with digital surround sound signals. The mono surround signal of Dolby Surround is subject to decorrelation of the signals sent to the two surround loudspeakers in order that the surround signal is made more diffuse and less ‘mono’. It is claimed that this has the effect of preventing surround signals from collapsing into the nearest loudspeaker. Signals are re-equalized to compensate for the excessive high-frequency content that can arise when cinema balances are replayed in small rooms, and the channels are ‘timbre matched’ to compensate for the spectral changes that arise when sounds are panned to different positions around the head.

In terms of hardware requirements, the Home THX system also specifies certain aspects of amplifier performance, as well as controlling the vertical and horizontal directivity of the front loudspeakers. Vertical directivity is tightly controlled to increase the direct sound component arriving at listeners, whilst horizontal directivity is designed to cover a reasonably wide listening area. Front speakers should have a frequency response from 80 Hz to 20 kHz and all speakers must be capable of radiating an SPL of 105 dB without deterioration in their response or physical characteristics. The surround speakers are unusual in having a dipole radiation pattern, arranged so that the listener hears reflected sound rather than direct sound from these units. These have a more relaxed frequency response requirement of 125 Hz to 8 kHz. A subwoofer feed is usually also provided.

Circle Surround

Circle Surround was developed by the Rocktron Corporation (RSP Technologies) as a matrix surround system capable of encoding stereo surround channels in addition to the conventional front channels. They proposed the system as more appropriate than Dolby Surround for music applications, and claimed that it should be suitable for use on material that had not been encoded as well as that which had.

The Circle Surround encoder is essentially a sum and difference Lt/Rt process (similar to Dolby but without the band limiting and NR encoding of the surround channel). One incarnation of this involves 5-2 encoding, intended for decoding back to five channels (the original white paper on the system described a 4-2 encoder). Amongst other methods, the Circle decoder steers the rear channels separately according to a split-band technique that steers low- and high-frequency components independently from each other. In this way they claim to avoid the broad-band ‘pumping’ effects associated with some other systems. They also decode the rear channels slightly differently, using L-R for the left rear channel and R-L for the right rear channel, which it is claimed allows side-images to be created on either side. They avoid the use of a delay in the rear channels for the ‘Music’ mode of the system and do not band-limit the rear channels as Dolby Surround does.

Lexicon logic 7

Logic 7 is another surround matrix decoding process that can be used as an alternative for Dolby Surround decoding. Variants on this algorithm (such as the so-called Music Logic and Music Surround modes) can also be used for generating a good surround effect from ordinary two-channel material. Lexicon developed the algorithm for its high-end consumer equipment, and it is one of a family of steered decoding processes that distributes sound energy appropriately between a number of loudspeakers depending on the gain and phase relationships in the source material. In this case seven loudspeaker feeds are provided rather than five, adding two ‘side’ loudspeakers to the array, as shown in Figure 17.8. The rear speakers can then be further to the rear than would otherwise be desirable. The side loudspeakers can be used for creating an enhanced envelopment effect in music modes and more accurate side panning of effects in movie sound decoding.

In Logic 7 decoding of Dolby matrix material the front channel decoding is almost identical to Dolby ProLogic, with the addition of a variable center channel delay to compensate for non-ideal locations of the center speaker. The rear channels operate differently depending on whether the front channel content is primarily steered dialog/effects or music/ambience. In the former case the front signals are canceled from the rear channels and panned effects behave as they would with ProLogic, with surround effects panned ‘full rear’ appearing in mono on both rear channels. In the latter case the rear channels work in stereo, but reproducing the front left and right channels with special equalization and delay to create an enveloping spatial effect. The side channels carry steered information that attempts to ensure that effects which pan from left to rear pass through the left side on the way, and similarly for the right side with right-to-rear pans.

image

FIGURE 17.8
Approximate loudspeaker layout suitable for Lexicon’s Logic 7 reproduction. Notice the additional side loudspeakers that enable a more enveloping image and may enable rear loudspeakers to be placed further to the rear

It is claimed that by using these techniques the effect of decoding a 3-1 matrix surround version of a 3-2 format movie can be brought close to that of the original 3-2 version. Matrix encoding of five channels to Lt/Rt is also possible with a separate algorithm, suitable for decoding to five or more loudspeakers using Logic 7.

Dolby EX

In 1998 Dolby and Lucasfilm THX joined forces to promote an enhanced surround system that added a center rear channel to the standard 5.1-channel setup. They introduced it, apparently, because of frustrations felt by sound designers for movies in not being able to pan sounds properly to the rear of the listener — the surround effect typically being rather diffuse. This system was christened ‘Dolby Digital — Surround EX’, and apparently uses matrix-style center channel encoding and decoding between the left and right surround channels of a 5.1-channel mix. The loudspeakers at the rear of the auditorium are then driven separately from those on the left and right sides, using the feed from this ‘rear-center’ channel, as shown in Figure 17.9.

image

FIGURE 17.9
Dolby EX adds a center-rear channel fed from a matrix-decoded signal that was originally encoded between left and right surround channels in a manner similar to the conventional Dolby Stereo matrix process

DIGITAL SURROUND SOUND FORMATS

Data-reduced digital encoding has largely replaced analog matrix encoding for surround sound and is now covered in Chapter 8.

AMBISONICS

Principles

The Ambisonic system of directional sound pickup and reproduction is discussed here because of its relative thoroughness as a unified system, being based on some key principles of psychoacoustics. It has its theoretical basis in work by Gerzon, Barton and Fellgett in the 1970s, as well as work undertaken earlier by Cooper and Shiga.

Ambisonics aims to offer a complete hierarchical approach to directional sound pickup, storage or transmission and reproduction, which is equally applicable to mono, stereo, horizontal surround sound, or full ‘periphonic’ reproduction including height information. Depending on the number of channels employed it is possible to represent a lesser or greater number of dimensions in the reproduced sound. A number of formats exist for signals in the Ambisonic system, as detailed in the next section. A format known as UHJ (‘Universal HJ’, ‘HJ’ simply being the letters denoting two earlier surround sound systems) is also used for encoding multichannel Ambisonic information into two or three channels whilst retaining good mono and stereo compatibility for ‘non-surround’ listeners. Thus, Ambisonically-encoded material can be released as a conventional two-channel stereo recording and if required a UHJ decoder can be used to convert it into surround sound.

Ambisonic sound should be distinguished from quadraphonic sound, since quadraphonics explicitly requires the use of four loudspeaker channels, and cannot be adapted to the wide variety of pickup and listening situations that may be encountered. Quadraphonics generally works by creating conventional stereo phantom images between each pair of speakers and, as Gerzon states, conventional stereo does not perform well when the listener is off-center or when the loudspeakers subtend an angle larger than 60°. Since in quadraphonic reproduction the loudspeakers are angled at roughly 90° there is a tendency towards a hole-in-the-middle, as well as there being the problem that conventional stereo theories do not apply correctly for speaker pairs to the side of the listener. Ambisonics, however, encodes sounds from all directions in terms of pressure and velocity components, and decodes these signals to a number of loudspeakers, with psychoacoustically optimized shelf filtering above 700 Hz to correct for the shadowing effects of the head. It also incorporates an amplitude matrix that determines the correct levels for each speaker for the layout chosen, and can therefore be decoded correctly for 5.1 speaker layouts, for instance.

Ambisonics might thus be considered as the theoretical successor to coincident stereo on two loudspeakers, since it is the logical extension of Blumlein’s principles to surround sound.

The source of an Ambisonic signal may be an Ambisonic microphone such as the Calrec Soundfield, or it may be an artificially panned mono signal, split into the correct B-format components (see below) and placed in a position around the listener by adjusting the ratios between the signals.

Signal formats

As indicated above, there are four basic signal formats for Ambisonic sound: A, B, C and D. The A-format consists of the four signals from a microphone with four sub-cardioid capsules orientated as shown in Figure 17.10 (or the pan-pot equivalent of such signals). These are capsules mounted on the four faces of a tetrahedron, and correspond to left-front (LF), right-front (RF), left-back (LB) and right-back (RB), although two of the capsules point upwards and two point downwards. Such signals should be equalized so as to represent the soundfield at the center of the tetrahedron, since the capsules will not be perfectly coincident.

image

FIGURE 17.10 A-format capsule directions in an Ambisonic microphone

The B-format consists of four signals that between them represent the pressure and velocity components of the sound field in any direction, as shown in Figure 17.11. It can be seen that there is a similarity with the sum and difference format of two channel stereo, described in the previous chapter, since the B-format is made up of three orthogonal figure-eight components (X, Y and Z), and an omni component (W). All directions in the horizontal plane may be represented by scalar and vector combinations of W, X and Y, whilst Z is required for height information. X is equivalent to a forward-facing figure-eight (equivalent to M in MS stereo), Y is equivalent to a sideways-facing figure-eight (equivalent to S in MS stereo). The X, Y and Z components have a frontal, sideways or upwards gain of 3 dB or √2 with relation to the W signal (0dB) in order to achieve roughly similar energy responses for sources in different positions. B-format signals may also be created directly by arranging capsules or individual microphones in the B-format mode (two or three figure-eights at 90° plus an omni). The Z component is not necessary for horizontal information. If B-format signals are recorded instead of speaker feeds (D-format), subsequent manipulation of the soundfield is possible, and the signal will be somewhat more robust to interchannel errors.

image

FIGURE 17.11 B-format components W, X, Y and Z in Ambisonics represent an omnidirectional pressure component and three orthogonal velocity (figure-eight) components of the sound field respectively
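Following the gain relationships just described, a mono source can be ‘panned’ Ambisonically by generating its B-format components directly. The sketch below uses the usual convention of azimuth measured anticlockwise from the front, which is an assumption here:

import numpy as np

def bformat_pan(signal, azimuth_deg, elevation_deg=0.0):
    """Encode a mono signal into B-format (W, X, Y, Z) for a given direction.
    W carries the pressure component at 0 dB; X, Y and Z are the figure-eight
    components with a peak gain of sqrt(2) relative to W, as described above."""
    az = np.radians(azimuth_deg)   # 0 = straight ahead, positive to the left
    el = np.radians(elevation_deg)
    W = signal
    X = np.sqrt(2) * signal * np.cos(az) * np.cos(el)
    Y = np.sqrt(2) * signal * np.sin(az) * np.cos(el)
    Z = np.sqrt(2) * signal * np.sin(el)
    return W, X, Y, Z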

The C-format consists of four signals L, R, T and Q, which conform to the UHJ hierarchy, and are the signals used for mono- or stereo-compatible transmission or recording. The C-format is, in effect, a useful consumer matrix format. L is a two-channel-compatible left channel, R is the corresponding right channel, T is a third channel which allows more accurate horizontal decoding, and Q is a fourth channel containing height information. The proportions of B-format signals which are combined to make up a C-format signal have been carefully optimized for the best compatibility with conventional stereo and mono reproduction.

Two, three, or four channels of the C-format signal may be used depending on the degree of directional resolution required, with a two-and-a-half channel option available where the third channel (T) is of limited bandwidth. For stereo compatibility only L and R are used. The UHJ or C-format hierarchy is depicted graphically in Figure 17.12.

D-format signals are those distributed to loudspeakers for reproduction, and are adjusted depending on the selected loudspeaker layout. They may be derived from either B- or C-format signals using an appropriate decoder, and the number of speakers is not limited in theory, nor is the layout constrained to a square. Four speakers give adequate surround sound, whilst six provide better immunity against the drawing of transient and sibilant signals towards a particular speaker, and eight may be used for full periphony with height. The decoding of B- and C-format components into loudspeaker signals is too complicated and lengthy a matter to go into here, and is the subject of several patents that were granted to the NRDC (the UK National Research Development Corporation, as was). It is sufficient to say that the principle of decoding involves the passing of two or more UHJ signals via a phase-amplitude matrix, resulting in B-format signals that are subjected to shelf filters (in order to correct the levels for head-related transfer functions such as shadowing and diffraction). These are passed through an amplitude matrix which feeds the loudspeakers (see Figure 17.13). A layout control is used to vary the level sent to each speaker depending on the physical arrangement of speakers. See also Fact File 17.4. AES Convention Paper 5788, available on-line, provides further information about this surround format, including higher order Ambisonics.
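Setting aside the shelf filtering and layout optimization, the underlying idea of deriving loudspeaker feeds from B-format can be sketched as a set of virtual cardioid microphones, one pointing at each loudspeaker. This is a naive horizontal decode for illustration only, not the patented Ambisonic decoder:

import numpy as np

def naive_decode(W, X, Y, speaker_azimuths_deg):
    """Project the horizontal B-format components onto each loudspeaker
    direction, giving one (cardioid-like) feed per loudspeaker."""
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = np.radians(az_deg)
        # W at 0 dB plus the velocity components scaled back by 1/sqrt(2)
        feeds.append((W + (X * np.cos(az) + Y * np.sin(az)) / np.sqrt(2)) / n)
    return feeds

# Example: a square of loudspeakers at 45, 135, -135 and -45 degrees:
# feeds = naive_decode(W, X, Y, [45, 135, -135, -45])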

image

FIGURE 17.12
The C-format or UHJ hierarchy enables a variety of matrix encoding forms for stereo signals, depending on the amount of spatial information to be conveyed and the number of channels available

image

FIGURE 17.13
C-format signals are decoded to provide D-format signals for loudspeaker reproduction

FACT FILE 17.4 LOUDSPEAKER MOUNTING

In many studios it is traditional to mount the monitor loudspeakers flush with the front wall. This has the particular advantage of avoiding the reflection that occurs with free-standing loudspeakers from the wall behind the loudspeaker, causing a degree of cancelation at a frequency where the spacing is equal to one-quarter of the radiated wavelength. It also improves the low-frequency radiation conditions if the front walls are hard. Nonetheless, it is hard to find places to mount five large loudspeakers in a flush-mounted configuration, and such mounting methods can be expensive. Furthermore the problems noted above, of detrimental reflections from rear loudspeakers off a hard front wall or speaker enclosure, can arise, depending on the angle of the rear loudspeakers. For such reasons, some sources recommend making the surfaces around the loudspeakers reflective at low frequencies and absorbent at mid and high frequencies.

The problem of low-frequency cancelation notches with free-standing loudspeakers can be alleviated but not completely removed. The perceived depth of the notch depends on the absorption of the surface and the directivity of the loudspeaker. By adjusting the spacing between the speaker and the wall, the frequency of the notch can be moved (downwards by making the distance greater), but the distance needed is often too great to be practical. If the speaker is moved close to the wall the notch position rises in frequency. This can be satisfactory for large loudspeakers whose directivity is high enough at middle frequencies to avoid too much rear radiation, but is a problem for smaller loudspeakers.

The use of a 5.1-channel monitoring arrangement (rather than five full-bandwidth loudspeakers), with proper bass management and crossovers, can in fact ameliorate the problems of free-standing loudspeakers considerably. This is because a subwoofer can be used to handle frequencies below 80–120Hz and it can be placed in the corner or near a wall where the cancelation problem is minimized. Furthermore, the low-frequency range of the main loudspeakers can then be limited so that the cancelation notch mentioned above occurs below their cut-off frequency.

FURTHER DEVELOPMENTS

Other surround formats will be described presently, and with the number of replay channels increasing as this area of activity develops, it is pertinent here to consider how many channels would be necessary for convincing, comprehensive coverage. Research indicates that the lateral directional resolution of the hearing mechanism can achieve an accuracy of about four degrees under ideal conditions, an exact figure being impossible to quote because it depends upon the frequency and transient content of a sound. It also depends upon different combinations of subject and environment. Under more general conditions, lateral resolution is probably little better than 10 degrees. This can be compared with the positioning resolution of the eyes, which is a fraction of a degree. Few bow and arrow shots and spear throws would reach their moving targets if they depended upon the ears alone, resolution being a function of the wavelengths of sound to which the ears can respond. Bats, of course, use ultrasonic frequencies with their very short wavelengths to achieve sufficient resolution to enable them to forage and avoid dangers in darkness. A format such as Dolby Atmos, with its 64 replay speaker capability, may therefore be capable of delivering directional information with a degree of resolution comparable with the hearing’s ability to resolve it. What one sees with one’s eyes on the projection screen, as in real life, is what supplies the pin-point homing in to fix the direction of an object precisely. The fact that the ear’s ability to resolve height information is rather less acute than in the lateral plane suggests that a comparatively small number of loudspeakers could give adequate representation of height.

A brief description of recent surround formats follows.

Auro-3D

The Auro-3D system has been developed for with-height surround sound in 9.1 channels, and for compatibility reasons it is embedded in the standard 5.1 format. Height information is created 30 degrees above the four main front-left, front-right, rear-left and rear-right channels with respect to a central listening position. A full Auro-3D system is specified as 13.1, the 11.1 format being regarded as the ideal format for cinema replay. Extra channels are added by applying the Auro-3D Octopus codec: it first reduces the information of existing channels by forming data subsets and equating adjacent samples. This makes room for corresponding samples of additional channels to be added, and seed samples of the original data are embedded such that the two composite channels can be separated on replay by the application of a mathematical algorithm. The system designers point out that the least significant bits of a 24-bit system, a routine production word length, cannot in practice be used for replay because the dynamic range of 24 bits, theoretically 144 dB, is far too large to reproduce. The codec therefore continuously monitors the signal during encoding and reduces its dynamic range so that it can be accommodated within 18 bits or fewer, freeing the lower bits which can now be used to carry decoding information.
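The general idea of carrying side information in the least significant bits of a 24-bit word can be shown generically. This is not the Octopus codec itself, merely a sketch of LSB embedding under the assumption that the audible signal has already been confined to the upper 18 bits:

def embed_lsb(sample_24bit, payload_6bit):
    """Replace the bottom 6 bits of a 24-bit sample word with payload data."""
    return (sample_24bit & ~0x3F) | (payload_6bit & 0x3F)

def extract_lsb(sample_24bit):
    """Recover the 6-bit payload and the remaining 18-bit audio portion."""
    return sample_24bit & 0x3F, sample_24bit & ~0x3F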

Dolby Atmos

Dolby Atmos is a cinema surround sound system, providing up to 128 discrete audio input tracks feeding up to 64 separate loudspeaker feeds including overhead channels. A specific channel format can be supplied to a particular cinema which is optimized for its setup and replay capability. Figure 17.14 shows the basic processing chain. Dolby Atmos supports ‘beds’ (channel-based sub-mixes or stems which contain a variety of background atmospheric sounds) and combines these with ‘objects’ (specific foreground sounds such as dialog and story-telling effects) to produce an Atmos object and bed combination. A ‘print master’ is created during mastering which contains bed and object audio data together with metadata; this contains the Dolby Atmos mix along with Dolby Surround 7.1 and 5.1 mixes as needed. Material Exchange Format (MXF) wrapping techniques are used to deliver it to the digital cinema packaging facility via the standard Digital Cinema Package (DCP) format. The Dolby Atmos-equipped cinema server recognizes the format and processes it for rendering; an Ethernet connection between the server and cinema processor allows the audio to be identified and synchronized. Cinema servers without the appropriate decoding simply ignore it and reproduce the standard 5.1 or 7.1 information which exists alongside.

image

FIGURE 17.14 Basic Dolby Atmos processing chain

A complex installation is required for full rendering, and Dolby Labs supply a setup service which includes comprehensive room analysis with equalization and level matching of loudspeakers. Dolby also recognizes that the system must be compatible with existing cinema layouts and future updated systems which still fall some way short of the ultimate 64-speaker implementation, and their setup is configured accordingly.

NHK 22.2

The NHK 22.2 or Hamasaki 22.2 format, named after its Japanese inventor, is a surround sound system designed to partner Super Hi-Vision, a television format having sixteen times the resolution of HDTV. It is worth mentioning here that Super Hi-Vision sound mixing recognizes that distance perception depends not only upon the volume of a sound but also upon its tonal quality and the direct-to-reverberant ratio as perceived from the listening position, and so mixer channels are equipped with two faders: one controls level in the conventional manner, the other controls the direct/reflected sound ratio of the object, thus controlling perceived distance.

22.2 supports 24 channels, comprising 22 full-bandwidth loudspeaker channels and two subwoofer channels, arranged essentially in three layers, or heights:

CHANNELS 1–6: equivalent to the familiar 5.1 layout.

CHANNELS 7–12: a further five listener-level channels (FL center, FR center, back center, side L, side R) plus a second subwoofer.

CHANNELS 13–21: nine upper-level channels, eight arranged around the periphery of the listening room and one directly overhead in the center.

CHANNELS 22–24: three lower front channels.

The two subwoofer channels carry low-frequency effects (LFE) content. It is unusual for loudspeakers to be positioned at a height lower than the listening position in surround sound formats; such placement is more commonly encountered in live theater work, where speakers can be positioned under rows of seats and beneath raised stages and stage traps, although these positions tend to be show specific. Cinema seating tends to be tiered, and one anticipates that future surround formats may use the voids underneath to house loudspeakers firing upwards through an acoustically transparent floor, giving complete envelopment.

For live transmission, NHK 22.2 experiments have used the MPEG-2 coding scheme, the video data being compressed to less than 600 Mbit/sec from an original data rate of around 24 Gbit/sec. A 32-channel digital audio signal, at 48 kHz, 24-bit resolution, was embedded in the MPEG-2 stream.
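As a rough check on the proportions involved (raw PCM payload only, ignoring framing and channel-status overhead), the embedded audio amounts to

$$32 \times 48\,000 \times 24 \approx 36.9\ \mathrm{Mbit/s},$$

a small fraction of the compressed video rate.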

Wave field synthesis

Wave Field Synthesis (WFS) is based on the Huygens-Fresnel principle which was originally developed for the analysis of light wave propagation. Christiaan Huygens (1629–95) argued that light consisted of waves, but Isaac Newton’s particle theory was the one generally accepted because of the latter’s prestige in the scientific community. Augustin-Jean Fresnel (1788–1827) however established by both theory and experiment that light was a wave phenomenon. The principle states that a light (or acoustical) wavefront can be regarded as the result of a superposition of a multitude of elementary spherical waves: each point of the wavefront can be regarded as the starting point of an elementary wave. Figure 16.1 illustrates early spatial recording and reproduction ideas for film which involved the use of a large number of microphones arranged in a line across the front of a stage feeding the same number of loudspeakers in a line in front of the listeners in the listening room. The resulting wavefront, created by the array of essentially hemispherical point sources of sound, created the information necessary for the ears to perceive directional information regarding the original sound field. Because the wavefront is created from many sound sources rather than just the two of conventional stereo, perceived positioning of sound sources is rather more consistent regardless of the lateral position of the listener. Conventional stereo is prone to image shift with comparatively small movements of the head because of the precedence effect, and also because of the various amplitude differences between the two speakers as heard from different listening positions.

The Kirchhoff-Helmholtz integral, too complex for brief description here (see AES Convention Paper 5788 on-line, which also provides information on higher order Ambisonics) indicates that the wave front consists of a continuous distribution of secondary sources, each emanating from both velocity (pressure gradient) and pressure primary source components. Thus, the ideal microphone array should consist of both pressure (omni) and pressure gradient (figure of eight) types, and ideally the reproducing loudspeakers should also consist of both dipole and essentially conventional hemispherical types.

A practical appreciation of how the system can work may be gleaned by first imagining a stone being dropped into a pond of water. The waves that are produced by the disturbance radiate uniformly in concentric circles towards the edge of the pond. As a wave approaches, one can draw an imaginary tangent and a perpendicular line where the tangent just touches the wave. The perpendicular line will point to the place where the stone entered the water, giving an accurate indication of the direction of the original disturbance without the observer necessarily having witnessed it. Figure 17.15a shows how this is accomplished with sound. Initially, the sound waves radiate from the source and are picked up by a large number of microphones in a row which ‘sample’ the wave front. These signals are conveyed to an equally large number of loudspeakers which are shown occupying effectively the same positions in space as the microphones, the latter used during the recording process, the former during listening. The loudspeakers can be seen to recreate the original wave front in the listening space as if it has passed through the dividing wall, supplying the ears with the necessary information for the perception of the direction of the original sound. Returning to the pond analogy, if one now moves to a different place at the edge of the pond, one can again use the approaching wave to deduce the direction of the original disturbance just as before, and so it is with the sound wave-front as illustrated in Figure 17.15b. Whatever the listening position, the sound source will appear to come from the same direction, and this phenomenon gives WFS a distinct advantage over stereo and other surround sound formats which rely upon amplitude and to a lesser extent time of arrival differences between loudspeakers to give directional clues. In these systems, changes in listening positions result in different amplitude relationships between loudspeakers which result in image shifting, something which WFS avoids.

image

FIGURE 17.15
a) In Wave Field Synthesis, a multitude of loudspeaker outputs in the listening space reproduce the wave front created by the original sound source in the recording space. b) The sound source appears to remain in the same position for a variety of listening positions

An additional factor is present in the practical implementation of the technique. As well as hearing the output from the loudspeaker which lies on a direct line (the shortest path) between sound source and listening position, an ideal which would fix the sound source firmly in place, the listener will also hear the output from other speakers in its near vicinity. This would tend to produce blurred or over-wide images were it not for the fact that the speaker lying on the direct line delivers the sound to the ears first, the path length from sound source to listener via any other combination of microphone and loudspeaker being longer. The precedence effect therefore ensures that a firm image is perceived, since each listening position has its own particular loudspeaker lying on the direct line between listener and sound source.

The Wave Field Synthesis technique can also produce images in front of the speakers in the listening space by creating a concave wave front. For instance speakers to the left and right can reproduce the sound of a central source ahead of the sound from the central speakers (in practice, the signals sent to the central speakers are delayed), creating in-the-room or even in-the-head images. It is the equivalent of the sound from an object in the listening space being reflected back to the listener by a concave surface.

A complete implementation of Wave Field Synthesis would involve the use of a large number of microphones positioned both in a line in front of the sound stage, and also at different heights and around the walls of the room to capture height and reverberation information accurately. But there is more than one possible approach to the placing of microphones. For instance, one can place them in positions so that they envelop the entire listening area, the performers being outside of this, so that reproduction again places the listeners in that same area. An alternative is to envelop the performers in an array of microphones, reproduction then placing the listeners among the musicians. In the listening room a similar number of loudspeakers would need to be placed in positions corresponding to the microphone positions. An approximation to the ideal was achieved at a concert in 2008 in Cologne cathedral, reproduced in a lecture hall by the Technical University of Berlin, where 2700 loudspeakers fed by 832 independent channels comprised the replay system. This emphasizes the necessity to develop rationalized systems for practical implementation of the idea capable of general adoption. The use of cardioid microphones with their tendency towards omni at low frequencies and a narrower pickup at high frequencies, rather than a combination of omnis and figure-of-eights, together with conventional loudspeakers with their tendency towards an omni polar pattern at low frequencies and a somewhat narrower dispersion at HF, has been found to complement the system.

Alternatively, rather fewer microphones can be used close up to the sound sources, their signals processed to give appropriate amplitude and time delays for allocation to the replay channels so that the required positions are perceived during replay. Imagine in Figure 17.15a that the object is very closely miked, the signal from it being conveyed to the loudspeakers with the appropriate delays and amplitude levels, as if the large array of microphones were picking up the sound. This gives complete control over both the sound balance and the natural acoustics, and would be appropriate for non-classical music recording and close-miked live events. It also introduces flexibility and creativity during the recording and post-production stages. Speakers can be conventional types with essentially hemispherical or fan-like dispersion, arranged around the periphery of the listening room to give good horizontal directional information rather than full surround sound with height. A problem with a simplified replay system is that too few speakers can cause spatial aliasing. At certain frequencies speakers will be spaced at half or whole wavelengths or multiples thereof, and a particular speaker can therefore give information that is apparently the same as that from an adjacent speaker. This, together with the missing information that would otherwise be present from a continuous array of speakers, can result in ambiguous directional information being presented.
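The kind of processing just described can be sketched very simply. The fragment below (geometry and values are assumed for illustration, and a plain 1/r amplitude law stands in for a full WFS driving function, with no filtering or array tapering) computes the relative delay and gain applied to a closely miked source signal for each loudspeaker in a line array, so that the array radiates a wavefront apparently originating at the chosen virtual source position:

```python
# Simplified WFS-style delay/gain allocation for a virtual point source.
# Not a complete Wave Field Synthesis driving function.
import math

SPEED_OF_SOUND = 343.0  # m/s

def wfs_feeds(source_xy, speaker_positions):
    """Return (delay_seconds, gain) for each loudspeaker so that the array
    mimics a point source located at source_xy behind the array."""
    distances = [math.dist(source_xy, spk) for spk in speaker_positions]
    d_min = min(distances)
    feeds = []
    for d in distances:
        delay = (d - d_min) / SPEED_OF_SOUND   # relative arrival time
        gain = d_min / d                        # simple 1/r spreading loss
        feeds.append((delay, gain))
    return feeds

# A virtual source 3 m behind the centre of a 16-speaker line array (0.2 m pitch)
speakers = [(i * 0.2 - 1.5, 0.0) for i in range(16)]
source = (0.0, -3.0)
for (delay, gain), spk in zip(wfs_feeds(source, speakers), speakers):
    print(f"speaker at x = {spk[0]:+.1f} m: delay = {delay*1000:.2f} ms, gain = {gain:.2f}")
```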

SURROUND SOUND MONITORING

This section is principally concerned with monitoring environments and configurations for 5.1-channel surround sound, although many of the principles may be found to apply in other configurations. The Audio Engineering Society has published an information document on this topic, containing more detailed guidelines (see ‘Recommended further reading’ at the end of this chapter).

Differences between two-channel and surround mixing rooms

There is a gradual consensus building around the view that rooms for surround monitoring should have an even distribution of absorbing and diffusing material, so that the rear loudspeakers operate in a similar acoustic environment to the front loudspeakers. This is contrary to a number of popular two-channel control room designs that have one highly absorptive end and the other end more reflective.

The effects of non-ideal control room acoustics on the surround channels may be ameliorated if a distributed array of surround loudspeakers is used, preferably with some form of decorrelation between them to avoid strong comb filtering effects. (Appropriate gain/EQ modification should also be applied to compensate for the acoustic summing of their outputs.) This is more akin to the film sound situation, though, and may only be possible in larger dubbing stages. In smaller control rooms used for music and broadcasting mixing the space may not exist for such arrays. The ITU standard allows for more than one surround loudspeaker on either side and recommends that they are spaced equally on an arc from 60° to 150° from the front.

One of the difficulties of installing loudspeaker layouts according to the ITU standard, with equal spacing from the listening position and the surrounds at 110° ± 10°, is the required width of the space. This arrangement often makes it appropriate for the room to be laid out ‘wide’ rather than ‘long’ (as it might be for two-channel setups). If the room is one that was previously designed for two-channel stereo the rotation of the axis of symmetry may result in the acoustic treatment being inappropriately distributed. Also the location of doors and windows may make the modification of existing rooms difficult. If building a new room for surround monitoring then it is obviously possible to start from scratch and make the room wide enough to accommodate the surround loudspeakers and absorption in more suitable places. See also Fact File 17.4.

Front loudspeakers in general

As a rule, front loudspeakers can be similar to those used for two-channel stereo, although noting the particular problems with the center loudspeaker described in the next section. It has been suggested that low-directivity front loudspeakers may be desirable when trying to emulate the effect of a film mixing situation in a smaller surround control room. This is because in the large rooms typical of cinema listening the sound balancer is often well beyond the critical distance where direct and reflected sound are equal in level, and using speakers with low directivity helps to emulate this scenario in smaller rooms. Film mixers generally want to hear what the large auditorium audience member would hear, and this means being further from the loudspeakers than for small room domestic listening or conventional music mixing.

What to do with the center loudspeaker

One of the main problems encountered with surround monitoring is that of where to put the center loudspeaker in a mixing room. Ideally it should be of the same type or quality as the rest of the channels and this can make such speakers quite large. In 5.1 surround setups there is an increasing tendency to use somewhat smaller monitors for the five main channels than would be used for two-channel setups, handling the low bass by means of a subwoofer or two. This makes it more practical to mount a center loudspeaker behind the mixing console, but its height will often be dictated by a control room window or video monitor (see below). The center loudspeaker should be on the same arc as that bounding the other loudspeaker positions, as shown in the ITU layout above, otherwise the time delay of its direct sound at the listening position will be different from that of the other channels. If the center speaker is closer than the left or right channels then it should be delayed slightly to put it back in the correct place acoustically.
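The delay in question is simply the path-length difference divided by the speed of sound. A one-line check, assuming for the sake of example a centre loudspeaker 0.3 m closer to the listening position than the left and right loudspeakers:

```python
# Delay for a centre loudspeaker 0.3 m closer than L/R (example figures)
d_lr, d_c, c = 2.5, 2.2, 343.0            # metres, metres, m/s
delay_ms = (d_lr - d_c) / c * 1000.0
print(f"{delay_ms:.2f} ms")               # roughly 0.9 ms
```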

The biggest problem with the center loudspeaker arises when there is a video display present. A lot of 5.1 surround work is carried out in conjunction with pictures and clearly the display is likely to be in exactly the same place as one wants to put the center speaker. In cinemas this is normally solved by making the screen acoustically ‘transparent’ and using front projection, although this transparency is never complete and usually requires some equalization. In smaller mixing rooms the display is often a flat-screen plasma monitor or a CRT display and these do not allow the same arrangement.

With modestly sized solid displays for television purposes it can be possible to put the center loudspeaker underneath the display, with the display raised slightly, or above the display angled down slightly. The presence of a mixing console may dictate which of these is possible, and care should be taken to avoid strong reflections from the center loudspeaker off the console surface. Neither position is ideal and the problem may not be solved easily. Dolby suggests that if the center loudspeaker has to be offset height-wise it could be turned upside down compared with the left and right channels to make the tweeters line up, as shown in Figure 17.16.

Interestingly, the flat-panel loudspeaker company, NXT, has shown large flat-panel loudspeakers that can double as projection display screens, which may be one way forward if the sound quality of the flat panel speakers can be made high enough.

Surround loudspeakers

The standard recommendations for professional setups suggest that the surround loudspeakers should be of the same quality as the front ones. This is partly to ensure a degree of inter-system compatibility. In consumer environments this can be difficult to achieve, and the systems sold at the lower end of the market often incorporate much smaller surround loudspeakers than front. As mentioned above, the use of a separate loudspeaker to handle the low bass (a so-called ‘subwoofer’) may help to ameliorate this situation, as it makes the required volume of all the main speakers quite a lot smaller. Indeed Bose has had considerable success with a consumer system involving extremely small satellite speakers for the mid-high-frequency content of the replay system, mountable virtually anywhere in the room, coupled with a low-frequency driver that can be situated somewhere unobtrusive.

image

FIGURE 17.16
Possible arrangement of the centre loudspeaker in the presence of a TV screen, aligning HF units more closely

The directivity requirements of the surround loudspeakers have been the basis of some considerable disagreement in recent years. The debate centers on the use of the surround loudspeakers to create a diffuse, enveloping soundfield — a criterion that tends to favor either decorrelated arrays of direct radiators (speakers that produce their maximum output in the direction of the listener) or dipole surrounds (bi-directional speakers that are typically oriented so that their main axes do not point towards the listener). If the creation of a diffuse, enveloping rear and side soundfield is the only role for surround loudspeakers then dipoles can be quite suitable if only two loudspeaker positions are available. If, on the other hand, attempts are to be made at all-round source localization (which, despite the evidence in some literature, is not entirely out of the question), direct radiators are considered more suitable. Given the physical restrictions in the majority of control rooms, conventional loudspeakers are likely to be more practical to install than dipoles, because dipoles by their nature need to be free-standing, away from the walls, whereas conventional speakers can be mounted flush with surfaces.

A lot depends on the application, since film sound mixing has somewhat different requirements from some other forms of mixing, and is intended for large auditoria. Much music and television sound is intended for small-room listening and is mixed in small rooms. This was also the primary motivation behind the use of dipoles in consumer environments — that is, the translation of the large-room listening experience into the small room. In large rooms the listener is typically further into the diffuse field than in small rooms, so film mixes made in large dubbing stages tend not to sound right in smaller rooms with highly directional loudspeakers. Dipoles or arrays can help to translate the listening experience of large-room mixes into smaller rooms.

Subwoofers

Low-frequency interaction between loudspeakers and rooms has a substantial bearing on the placement of subwoofers or low-frequency loudspeakers. There appears to be little agreement about the optimum location for a single subwoofer in a listening room, although it has been suggested that a corner location for a single subwoofer provides the most extended, smoothest, low-frequency response. In choosing the optimum locations for subwoofers one must remember the basic principle that loudspeakers placed in corners tend to give rise to a noticeable bass boost, and couple well to most room modes (because they have antinodes in the corners). Some subwoofers are designed specifically for placement in particular locations whereas others need to be moved around until the most subjectively satisfactory result is obtained. Some artificial equalization may be required to obtain a reasonably flat overall frequency response at the listening position. Phase shifts or time-delay controls are sometimes provided to enable some correction of the time relationship of the subwoofer to other loudspeakers, but this will necessarily be a compromise with a single unit. A subwoofer phase shift can be used to optimize the sum of the subwoofer and main loudspeakers in the crossover region for a flat response.

There is some evidence to suggest that multiple low-frequency drivers generating decorrelated signals from the original recording create a more natural spatial reproduction than monaural low-frequency reproduction from a single driver. Griesinger proposes that if monaural LF content is reproduced it is better done through two units placed to the sides of the listener, driven 90° out of phase, to excite the asymmetrical lateral modes more successfully and improve LF spaciousness.

Others warn of the dangers of multiple low-frequency drivers, particularly the problem of mutual coupling between loudspeakers that takes place when the driver spacing is less than about half a wavelength. In such situations the outputs of the drivers couple to produce a level greater than would be predicted from simple summation of the powers. This is due to the way in which the drivers couple to the impedance of the air and the effect that one unit has on the radiation impedance of the other. The effect of this coupling will depend on the positions to which sources are panned between drivers, affecting the compatibility between the equalization of mixes made for different numbers of loudspeakers.

SURROUND SOUND RECORDING TECHNIQUES

This section deals with the extension of conventional two-channel recording technique to multiple channels for surround sound applications, concentrating on standard 5(.1)-channel reproduction. Many of the concepts described here have at least some basis in conventional two-channel stereo, although the psychoacoustics of 5.1 surround has not yet been investigated anywhere near as exhaustively. Consequently a number of the techniques described below are at a relatively early stage of development and are still being evaluated.

The section begins with a review of microphone techniques that have been proposed for the pickup of natural acoustic sources in surround, followed by a discussion of multichannel panning and mixing techniques, mixing aesthetics and artificial reverberation, for use with more artificial forms of production such as pop music. Film sound approaches are not covered in any detail as they are well established and not the main theme of this book.

Principles of surround sound microphone technique

Surround sound microphone technique, as discussed here, is unashamedly biased towards the pickup of sound for 5.1 surround, although Ambisonic techniques are also covered because they are well documented and can be reproduced over five-channel loudspeaker systems if required, using suitable decoders. The techniques described in this section are most appropriate for use when the spatial acoustics of the environment are as important as those of the sources within, such as in classical music and other ‘natural’ recording. These microphone techniques tend to split into two main groups: those that are based on a single array of microphones in reasonably close proximity to each other, and those that treat the front and rear channels separately. The former are usually based on some theory that attempts to generate phantom images with different degrees of accuracy around the full 360° in the horizontal plane. (The problems of this are outlined in Fact File 17.5.) The latter usually have a front array providing reasonably accurate phantom images in the front, coupled with a separate means of capturing the ambient sound of the recording space (often for feeding to all channels in varying degrees). It is rare for such microphone techniques to provide a separate feed for the LFE channel, so they are really five-channel techniques not 5.1-channel techniques.

The concept of a ‘main array’ or ‘main microphone configuration’ for stereo sound recording is unfamiliar to some recording engineers, possibly being a more European than American concept. The traditional European approach has tended to involve starting with a main microphone technique of some sort that provides a basic stereo image and captures the spatial effect of the recording environment in an aesthetically satisfactory way, and then supporting this subtly to varying degrees with spot mics as necessary. It has been suggested by some that many balances in fact end up with more sound coming from the spot mics than from the main array in practice, and that in this case it is the spatial treatment of the spot mics and any artificial reverberation that will have most effect on the perceived result. This is covered in the next section and the issue is open to users for further experimentation.

One must accept also that the majority of consumer systems will have great variability in the location and nature of the surround loudspeakers, making it unwise to set too much store by the ability of such systems to enable accurate soundfield reconstruction in the home. Better, it seems, would be to acknowledge the limitations of such systems and to create recordings that work best on a properly configured reproduction arrangement but do not rely on 100% adherence to a particular reproduction alignment and layout, or on a limited ‘hot spot’ listening position. Surround sound provides an opportunity to create something that works over a much wider range of listening positions than two-channel stereo, does not collapse rapidly into the nearest loudspeaker when one moves, and enhances the spatial listening experience.

Five-channel ‘main microphone’ arrays

Recent interest in five-channel recording has led to a number of variants on a common theme involving fairly closely spaced microphones (often cardioids) configured in a five-point array. The basis of most of these arrays is pairwise time-intensity trading, usually treating adjacent microphones as pairs covering a particular sector of the recording angle around the array. The generic layout of such arrays is shown in Figure 17.17. Cardioids or even supercardioids tend to be favored because of the increased direct-to-reverberant pickup they offer, and the interchannel level differences created for relatively modest spacings and angles, enabling the array to be mounted on a single piece of metalwork. The center microphone is typically spaced slightly forward of the L and R microphones thereby introducing a useful time advance in the center channel for center-front sources.

FACT FILE 17.5 SURROUND IMAGING

It is difficult to create stable phantom images to the sides of a listener in a standard 5.1 surround configuration, using simple pairwise amplitude or time differences. If the listener turns to face the speaker pair then the situation may be improved somewhat, but the subtended angle of about 80° still results in something of a hole in the middle and the same problem as before then applies to the front and rear pairs. Phantom sources can be created between the rear speakers but the angle is again quite great (about 140°), leading to a potential hole in the middle for many techniques, with the sound pulling towards the loudspeakers. This suggests that those techniques attempting to provide 360° phantom imaging may meet with only limited success over a limited range of listening positions, and might imply that one would be better off working with two- or three-channel stereo in the front and decorrelated ambient signals in the rear.

There is no escaping the fact that it is easiest to create images where there are loudspeakers, and that phantom images between loudspeakers subtending wide angles tend to be unstable or ‘hole-in-the-middle’. Given this unavoidable aspect of surround sound psychoacoustics, one should always expect imaging in standard five-channel replay systems to be best between the front loudspeakers, only moderate to the rear, and highly variable to the sides, as shown below. Since the majority of material one listens to tends to conform to this paradigm in any case (primary sources in front, secondary content to the sides and rear), the problem is possibly not as serious as it might seem.

image
image

FIGURE 17.17 Generic layout of five-channel microphone arrays based on time-amplitude trading

The spacing and angles between the capsules are typically derived from the so-called ‘Williams curves’, which specify the time and amplitude differences required between single pairs of microphones to create phantom sources in particular locations. (In fact the Williams curves were based on two-channel pairs and loudspeaker reproduction in front of the listener. It is not necessarily the case that the same technique can be applied to create images between pairs at the sides of the listener, or that the same level and time differences will be suitable. There is some evidence that different delays are needed between side and rear pairs than those used between front pairs, and that inter-microphone crosstalk can affect the accuracy of stereo imaging to varying degrees depending on the array configuration and microphone type.) One possible configuration of many that satisfy Williams’ psychoacoustic criteria is pictured in Figure 17.18. To satisfy the requirements for this particular array the front triplet is attenuated by 2.4 dB in relation to the back pair.

Some success has also been had by the author’s colleagues using omni microphones instead of cardioids, with appropriate adjustments to the spacings according to ‘Williams-style’ time-amplitude trading curves (also with modifications to correct for different inter-loudspeaker angles and spacings to the sides and rear). These tend to give better overall sound quality but (possibly unsurprisingly) poorer front imaging. Side imaging has proved to be better than expected with omni arrays.

The closeness between the microphones in these arrays is likely to result in only modest low-frequency decorrelation between the channels. Good LF decorrelation is believed to be important for creating a sense of spaciousness, so these ‘near-coincident’ or ‘semi-correlated’ techniques will be less spacious than more widely spaced microphone arrays. Furthermore, the strong dependence of these arrays on precedence effect cues for localization makes their performance quite dependent on listener position and front-rear balance.

image

FIGURE 17.18
Five-channel microphone array using cardioids, one of a family of arrays designed by Williams and Le Dû. In this example the front triplet should be attenuated 2.4dB with respect to the rear pair

The INA (Ideale Nieren Anordnung) or ‘Ideal Cardioid Array’ (devised by Hermann and Henkels) is a three-channel front array of cardioids (INA-3) coupled with two surround microphones of the same polar pattern (making it into an INA-5 array). One configuration of this is shown in Figure 17.19, and a commercial implementation by Brauner is pictured in Figure 17.20. Table 17.1 shows some possible combinations of microphone spacing and recording angle for the front three microphones of this proposed array. In the commercial implementation the capsules can be moved and rotated and their polar patterns can be varied. The configuration shown in Figure 17.20 is termed an ‘Atmokreuz’ (atmosphere cross) by the authors. Its large front recording angle of 180° means that to use it as a main microphone it would have to be placed very close to the source unless all the sources were to appear to come from near the center. This might make it less well placed for picking up the surrounding ambience. Such a configuration may be more suitable for general pickup slightly further back in the hall.

Separate treatment of front imaging and ambience

Many alternative approaches to basic microphone coverage for 5.1 surround treat the stereo imaging of front signals separately from the capture of a natural-sounding spatial reverberation and reflection component, and some are hybrid approaches without a clear theoretical basis. Most do this by adopting a three-channel variant on a conventional two-channel technique for the front channels, as introduced in the previous chapter (sometimes optimized for more direct sound than in a two-channel array), coupled with a more or less decorrelated combination of microphones in a different location for capturing spatial ambience (sometimes fed just to the surrounds, other times to both front and surrounds). Sometimes the front microphones also contribute to the capture of spatial ambience, depending on the proportion of direct to reflected sound picked up, but the essential point here is that the front and rear microphones are not intentionally configured as an attempt at a 360° imaging array.

image

FIGURE 17.19 INA-5 cardioid array configuration. (see Table 17.1)

image

FIGURE 17.20 SPL Atmos 5.1 Surround Recording System. (Courtesy of Sound Performance Lab.)

Table 17.1 Dimensions and angles for the front three cardioid microphones of the INA array (see Figure 17.19). Note that the angle between the outer microphones should be the same as the recording angle

Recording angle θ (°)   Microphone spacing a (cm)   Microphone spacing b (cm)   Array depth c (cm)
100                     69                          126                         29
120                     53                          92                          27
140                     41                          68                          24
160                     32                          49                          21
180                     25                          35                          17.5

The so-called ‘Fukada Tree’, shown in Figure 17.21, is based on a Decca Tree, but instead of using omni mics it mainly uses cardioids. The reason for this is to reduce the amount of reverberant sound pickup by the front mics. Omni outriggers are sometimes added as shown, typically panned between L-LS and R-RS, in an attempt to increase the breadth of orchestral pickup and to integrate front and rear elements. The rear mics are also cardioids and are typically located at approximately the critical distance of the space concerned (where the direct and reverberant components are equal). They are sometimes spaced further back than the front mics by nearly 2 meters, although the dimensions of the tree can be varied according to the situation, distance, etc. (Variants are known that have the rear mics quite close to the front ones, for example.) The spacing between the mics more closely fulfills requirements for the decorrelated microphone signals needed to create spaciousness, depending on the critical distance of the space in which they are used. (Mics should be separated by at least the room’s critical distance for adequate decorrelation.) The front imaging of such an array would be similar to that of an ordinary Decca Tree (not bad, but not as precise as some other techniques).
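Since the rear microphones of the Fukada Tree are typically placed near the critical distance, it can be useful to estimate that distance from the room volume and reverberation time. A minimal sketch using the common approximation r_c ≈ 0.057 √(V/RT60) for an omnidirectional source (the hall figures below are assumed for illustration):

```python
# Rough critical-distance estimate for placing the rear (ambience) mics.
# Common approximation r_c ~= 0.057 * sqrt(V / RT60), omnidirectional source.
import math

def critical_distance(volume_m3, rt60_s):
    return 0.057 * math.sqrt(volume_m3 / rt60_s)

# Example: a 12 000 m^3 hall with a 2 s reverberation time
print(f"{critical_distance(12000.0, 2.0):.1f} m")   # about 4.4 m
```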

image

FIGURE 17.21
The so-called ‘Fukada Tree’ of five spaced microphones for surround recording

The Dutch recording company Polyhymnia International has developed a variant on this approach that uses omnis instead of cardioids, to take advantage of their better sound quality. Using an array of omnis separated by about 3 meters between left-right and front-back they achieve a spacious result where the rear channels are well integrated with the front. The center mic is placed slightly forward of left and right. It is claimed that placing the rear omnis too far away from the front tree makes the rear sound detached from the front image, so one gets a distinct echo or repeat of the front sound from the rear.

Hamasaki of NHK (the Japanese broadcasting company) has proposed an arrangement based on near-coincident cardioids (30cm) separated by a baffle, as shown in Figure 17.22. Here the center cardioid is placed slightly forward of left and right, and omni outriggers are spaced by about 3 meters. These omnis are low-pass filtered at 250 Hz and mixed with the left and right front signals to improve the LF sound quality. Left and right surround cardioids are spaced about 2–3 meters behind the front cardioids and 3 meters apart. An ambience array is used further back, consisting of four figure-eight mics facing sideways, spaced by about 1 meter, to capture lateral reflections, fed to the four outer channels. This is placed high in the recording space.

image

FIGURE 17.22
A surround technique proposed by Hamasaki (NHK) consisting of a cardioid array, omni outriggers and separate ambience matrix

Theile proposes a front microphone arrangement shown in Figure 17.23. Whilst superficially similar to the front arrays described in the previous section, he reduces crosstalk between the channels by the use of supercardioid microphones at ±90° for the left and right channels and a cardioid for the center. (Supercardioids are more directional than cardioids and have the highest direct/reverberant pickup ratio of any first-order directional microphone. They have a smaller rear lobe than hypercardioids.) Theile’s rationale behind this proposal is the avoidance of crosstalk between the front segments. He proposes to enhance the LF response of the array by using a hybrid microphone for left and right, which crosses over to omni below 100 Hz, thereby restoring the otherwise poor LF response. The center channel is high-pass filtered at 100 Hz. Furthermore, the supercardioids should be equalized to have a flat response for signals arriving at about 30° to the front of the array (they would normally sound quite colored at this angle). Schoeps has developed a prototype of this array, and it has been christened ‘OCT’ for ‘Optimum Cardioid Triangle’.

For the ambient sound signal, Theile proposes the use of a crossed configuration of microphones, which has been christened the ‘IRT cross’ or ‘atmo-cross’. This is shown in Figure 17.24. The microphones are either cardioids or omnis, and the spacing is chosen according to the degree of correlation desired between the channels. Theile suggests 25 cm for cardioids and about 40 cm for omnis, but says that this is open to experimentation. Small spacings are appropriate for more accurate imaging of reflection sources at the hot spot, whereas larger spacings are appropriate for providing diffuse reverberation over a large listening area. The signals are mixed in to L, R, LS and RS channels, but not the center.

image

FIGURE 17.23
Theile’s proposed three-channel array for front pickup using supercardioids for the outer mics, crossed over to omni at LF. The spacing depends on the recording angle (C — R = 40cm for 90° and 30cm for 110°)

A ‘double MS’ technique has been proposed by Curt Wittig and others, shown in Figure 17.25. Two MS pairs (see previous chapter) are used, one for the front channels and one for the rear. The center channel can be fed from the front M microphone. The rear pair is placed at or just beyond the room’s critical distance. S gain can be varied to alter the image width in either sector, and the M mic’s polar pattern can be chosen for the desired directional response (it would typically be a cardioid). Others have suggested using a fifth microphone (a cardioid) in front of the forward MS pair, to feed the center channel, delayed to time align it with the pair. If the front and rear MS pairs are co-located it may be necessary to delay the rear channels somewhat (10–30ms) so as to reduce perceived spill from front sources into rear channels. In a co-located situation the same figure-eight microphone could be used as the S channel for both front and back pairs.
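A minimal sketch of the sum-and-difference matrixing involved, with an adjustable S gain controlling the image width of each sector (any delay applied to a co-located rear pair would precede this stage; the signal values here are dummies just to show the routing):

```python
# Double-MS decode sketch: each MS pair is matrixed to a left/right pair,
# with the S gain setting the width of that sector.
def ms_decode(m, s, s_gain=1.0):
    """Return (left, right) sample lists from mid and side sample lists."""
    left = [mi + s_gain * si for mi, si in zip(m, s)]
    right = [mi - s_gain * si for mi, si in zip(m, s)]
    return left, right

# Dummy single-sample 'signals' standing in for real audio buffers
front_m, front_s = [1.0], [0.2]
rear_m, rear_s = [0.5], [0.3]

L, R = ms_decode(front_m, front_s, s_gain=0.8)   # front M could also feed centre
LS, RS = ms_decode(rear_m, rear_s, s_gain=1.2)   # wider, more diffuse rear image
```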

In general, the signals from separate ambience microphones fed to the rear loudspeakers may often be made less obtrusive and front-back ‘spill’ may be reduced by rolling off the high-frequency content of the rear channels. Some additional delay may also assist in the process of integrating the rear channel ambience. The precise values of delay and equalization can only really be arrived at by experimentation in each situation.

image

FIGURE 17.24 The IRT ‘atmocross’ designed for picking up ambient sound for routing to four loudspeaker channels (omitting the centre). Mics can be cardioids or omnis (wider spacing for omnis)

image

FIGURE 17.25 Double MS pair arrangement with small spacing between front and rear pair

Pseudo-binaural techniques

As with two-channel stereo, some engineers have experimented with pseudo-binaural recording techniques intended for loudspeaker reproduction. Jerry Bruck adapted the Schoeps ‘Sphere’ microphone (described earlier) for surround sound purposes by adding bi-directional (figure-eight) microphones near to the ‘ears’ (omni mics) of the sphere, with their main axes front-back, as pictured in Figure 17.26. This microphone is now manufactured by Schoeps as the KFM360. The figure-eights are mounted just below the sphere transducers so as to affect their frequency response in as benign a way as possible for horizontal sources. The outputs from the figure-eight and the omni at each side of the sphere are MS matrixed to create a pair of roughly back-to-back (forward- and rearward-facing) cardioids on each side. The size of the sphere creates an approximately ORTF spacing between the two sides of the array. The matrixed output of this microphone can be used to feed four of the channels in a five-channel reproduction format (L, R, LS and RS). A Schoeps processing unit can be used to derive an equalized center channel from the front two, and enables the patterns of front and rear coverage to be modified.

image

FIGURE 17.26 (a) Schoeps KFM360 sphere microphone with additional figure-eights near the surface-mounted omnis. (b) KFM360 control box. (Courtesy of Schalltechnik Dr.-Ing. Schoeps GmbH.)

image

FIGURE 17.27 Double MS pairs facing sideways used to feed the side pairs of channels combined with a dummy head facing forwards to feed the front image

Michael Bishop of Telarc has reportedly adapted the ‘double MS’ technique described in the previous section by using MS pairs facing sideways, and a dummy head some 1–2.5 m in front, as shown in Figure 17.27. The MS pairs are used between side pairs of channels (L and LS, R and RS) and line-up is apparently tricky. The dummy head is a model equalized for a natural response on loudspeakers (Neumann KU100) and is used for the front image.

Multimicrophone techniques

Most real recording involves the use of spot microphones in addition to a main microphone technique of some sort, indeed in many situations the spot microphones may end up at higher levels than the main microphone or there may be no main microphone. The principles outlined in the previous chapter still apply in surround mixing, but now one has the issue of surround panning to contend with. The principles of this are covered in more detail in ‘Multichannel panning techniques’, below.

Some engineers report success with the use of multiple sphere microphones for surround balances, which is probably the result of the additional spatial cues generated by using a ‘stereo’ spot mic rather than a mono one, avoiding the flatness and lack of depth often associated with panned mono sources. Artificial reverberation of some sort is almost always helpful when trying to add spatial enhancement to panned mono sources, and some engineers prefer to use amplitude-panned signals to create a good balance in the front image, plus artificial reflections and reverberation to create a sense of spaciousness and depth.

Ambisonic or ‘SoundField’ microphone principles

The so-called ‘SoundField’ microphone, pictured in Figure 17.28, is designed for picking up full periphonic sound in the Ambisonic A-format (see ‘Signal formats’, above), and is coupled with a control box designed for converting the microphone output into both the B-format and the D-format. Decoders can be created for using the output of the SoundField microphone with a 5.1-channel loudspeaker array, including that recently introduced by SoundField Research. The full periphonic effect can only be obtained by reproduction through a suitable periphonic decoder and the use of a tetrahedral loudspeaker array with a height component, but the effect is quite stunning and worth the effort.

image

FIGURE 17.28 (a) The SoundField microphone, (b) accompanying control box, (c) capsule arrangement. (Courtesy of SoundField Ltd.)

The SoundField microphone is capable of being steered electrically by using the control box, in terms of azimuth, elevation, tilt or dominance, and as such it is also a particularly useful stereo microphone for two-channel work. The microphone encodes directional information in all planes, including the pressure and velocity components of indirect and reverberant sounds.

Figure 17.28c shows the physical capsule arrangement of the microphone, which was shown diagrammatically in Figure 17.10. Four capsules with sub-cardioid polar patterns (between cardioid and omni, with a response equal to 2 + cosθ) are mounted so as to face in the A-format directions, with electronic equalization to compensate for the inter-capsule spacing, such that the output of the microphone truly represents the soundfield at a point (true coincidence is maintained up to about 10kHz). The capsules are matched very closely and each contributes an equal amount to the B-format signal, thus resulting in cancelation between variations in inherent capsule responses. The A-format signal from the microphone can be converted to B-format according to the equations given in ‘Signal formats’, above.

The combination of B-format signals in various proportions can be used to derive virtually any polar pattern in a coincident configuration, using a simple circuit as shown in Figure 17.29 (two-channel example). Crossed figure-eights are the most obvious and simple stereo pair to synthesize, since this requires the sum-and-difference of X and Y, whilst a pattern such as crossed cardioids requires that the omni component be used also, such that:

image
image

FIGURE 17.29 Circuit used for controlling stereo angle and polar pattern in Sound Field microphone. (Courtesy of Ken Farrar.)

From the circuit it will be seen that a control also exists for adjusting the effective angle between the synthesized pair of microphones, and that this works by varying the ratio between X and Y in a sine/cosine relationship.
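The same derivation is easy to express in code. The sketch below assumes the common convention in which W is recorded 3 dB down (scaled by 1/√2) and synthesizes a first-order virtual microphone of pattern p (p = 0 omni, 0.5 cardioid, 1 figure-eight) pointing at a chosen azimuth; scaling conventions differ between implementations, so the constants should be treated as illustrative rather than as the exact equations used by the SoundField control box:

```python
# Virtual coincident microphones derived from horizontal B-format (W, X, Y).
# Assumes W recorded 3 dB down (scaled by 1/sqrt(2)); conventions vary.
import math

def virtual_mic(w, x, y, azimuth_deg, pattern=0.5):
    """pattern: 0 = omni, 0.5 = cardioid, 1 = figure-eight."""
    az = math.radians(azimuth_deg)
    return ((1.0 - pattern) * math.sqrt(2.0) * w
            + pattern * (x * math.cos(az) + y * math.sin(az)))

def crossed_cardioids(w, x, y, mutual_angle_deg=90.0):
    """Synthesize a crossed-cardioid stereo pair at +/- half the mutual angle."""
    half = mutual_angle_deg / 2.0
    return (virtual_mic(w, x, y, +half, 0.5),
            virtual_mic(w, x, y, -half, 0.5))

# Single-sample example: a plane wave arriving from straight ahead
L, R = crossed_cardioids(w=0.707, x=1.0, y=0.0)
print(round(L, 3), round(R, 3))   # equal outputs for a centre-front source
```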

image

FIGURE 17.30 Azimuth, elevation and dominance in SoundField microphone

The microphone may be controlled, without physical reorientation, so as to ‘point’ in virtually any direction (see Figure 17.30). It may also be electrically inverted, so that it may be used upside-down. Inversion of the microphone is made possible by providing a switch which reverses the phase of Y and Z components. W and X may remain unchanged since their directions do not change if the microphone is used upside-down.

MULTICHANNEL PANNING TECHNIQUES

The panning of signals between more than two loudspeakers presents a number of psychoacoustic problems, particularly with regard to appropriate energy distribution of signals, accuracy of phantom source localization, off-center listening and sound timbre. A number of different solutions have been proposed, in addition to the relatively crude pairwise approach used in much film sound, and some of these are outlined below. The issue of source distance simulation is also discussed.

Here are Michael Gerzon’s criteria for a good panning law for surround sound:

The aim of a good panpot law is to take monophonic sounds, and to give each one amplitude gains, one for each loudspeaker, dependent on the intended illusory directional localisation of that sound, such that the resulting reproduced sound provides a convincing and sharp phantom illusory image. Such a good panpot law should provide a smoothly continuous range of image directions for any direction between those of the two outermost loudspeakers, with no ‘bunching’ of images close to any one direction or ‘holes’ in which the illusory imaging is very poor.

Pairwise amplitude panning

Pairwise amplitude panning is the type of pan control most recording engineers are familiar with, as it is the approach used on most two-channel mixers. As described in the previous chapter, it involves adjusting the relative amplitudes between a pair of adjacent loudspeakers so as to create a phantom image at some point between them. This has been extended to three front channels and is also sometimes used for panning between side loudspeakers (e.g. L and LS) and rear loudspeakers. The typical sine/cosine panning law devised by Blumlein for two-channel stereo is often simply extended to more loudspeakers. Most such panners are constructed so as to ensure constant power as sources are panned to different combinations of loudspeakers, so that the approximate loudness of signals remains constant.
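A minimal sketch of this familiar constant-power law, extended pairwise around the five ITU loudspeaker positions (a simplified, assumed implementation: the code just finds the adjacent pair straddling the target azimuth and splits the signal between them with sine/cosine gains):

```python
# Pairwise constant-power (sin/cos) panning over the ITU five-channel layout.
import math

SPEAKERS = {"C": 0.0, "R": 30.0, "RS": 110.0, "LS": 250.0, "L": 330.0}  # degrees

def pan(azimuth_deg):
    """Return {speaker: gain}; the two active gains satisfy g1^2 + g2^2 = 1."""
    az = azimuth_deg % 360.0
    ordered = sorted(SPEAKERS.items(), key=lambda kv: kv[1])
    gains = {name: 0.0 for name in SPEAKERS}
    for (n1, a1), (n2, a2) in zip(ordered, ordered[1:] + ordered[:1]):
        span = (a2 - a1) % 360.0                 # angular width of this sector
        offset = (az - a1) % 360.0
        if offset <= span:
            frac = offset / span                 # 0 at n1, 1 at n2
            gains[n1] = math.cos(frac * math.pi / 2)
            gains[n2] = math.sin(frac * math.pi / 2)
            return gains
    return gains

print(pan(15.0))   # a source panned half-way between centre and right
```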

Panning using amplitude or time differences between widely spaced side loudspeakers is not particularly successful at creating accurate phantom images. Side images tend not to move linearly as they are panned and tend to jump quickly from front to back. Spectral differences resulting from differing HRTFs of front and rear sound tend to result in sources appearing to be spectrally split or ‘smeared’ when panned to the sides.

In some mixers designed for five-channel surround work, particularly in the film domain, separate panners are provided for L-C-R, LS-RS, and front-surround. Combinations of positions of these amplitude panners enable sounds to be moved to various locations, but some more successfully than others. For example, sounds panned so that some energy is emanating from all loudspeakers (say, panned centrally on all three pots) tend to sound diffuse for center listeners, and in the nearest loudspeaker for those sitting off-center. Joystick panners combine these amplitude relationships under the control of a single lever that enables a sound to be ‘placed’ dynamically anywhere in the surround soundfield. Moving effects made possible by these joysticks are often unconvincing and need to be used with experience and care.

Research undertaken by Jim West at the University of Miami showed that, despite the limitations of constant power ‘pairwise’ panning, it proved to offer reasonably stable images for center and off-center listening positions, for moving and stationary sources, compared with some other more esoteric algorithms. Front-back confusion was noticed in some cases, for sources panned behind the listener.

‘Ambisonic’ panning laws

A number of variations of panning laws loosely based on Ambisonic principles have been attempted. These are primarily based on the need to optimize psychoacoustic localization parameters according to low- and high-frequency models of human hearing. Gerzon proposed a variety of psychoacoustically optimal panning laws for multiple speakers that can theoretically be extended to any number of speakers. Some important features of these panning laws are:

    ■ There is often output from multiple speakers in the array, rather than just two.

    ■ They tend to exhibit negative gain components (out-of-phase signals) in some channels for some panning positions.

    ■ The channel separation is quite poor.

A number of authors have shown how this type of panning could be extended to five-channel layouts according to the standards of interest in this book. McKinnie proposed a five-channel panning law based on similar principles, suitable for the standard loudspeaker angles. It is shown in Figure 17.31. Moorer also proposed some four- and five-channel panning laws, pictured in Figure 17.32 (only half the circle is shown because the other side is symmetrical). They differ because Moorer has chosen to constrain the solution to first order spatial harmonics (a topic beyond the scope of this book). He proposes that the standard ±30° angle for the front loudspeakers is too narrow for music, and that it gives rise to levels in the center channel that are too high in many cases to obtain adequate L-R decorrelation, as well as giving rise to strong out-of-phase components. He suggests at least ±45° to avoid this problem. Furthermore, he states that the four-channel law is better behaved with these particular constraints and might be more appropriate for surround panning.
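The velocity and energy vector criteria underlying these laws (referred to in the caption to Figure 17.32) can be evaluated for any set of loudspeaker gains. The sketch below uses the standard Gerzon definitions, with the ITU loudspeaker angles assumed for the example; the vector lengths rV and rE are commonly used as low- and high-frequency localization quality measures, values close to 1 being best:

```python
# Gerzon velocity (rV) and energy (rE) localization vector lengths for a
# set of loudspeaker gains at given azimuths (ITU angles assumed below).
import math

def gerzon_vectors(gains, angles_deg):
    rads = [math.radians(a) for a in angles_deg]
    gx = sum(g * math.cos(a) for g, a in zip(gains, rads))
    gy = sum(g * math.sin(a) for g, a in zip(gains, rads))
    ex = sum(g * g * math.cos(a) for g, a in zip(gains, rads))
    ey = sum(g * g * math.sin(a) for g, a in zip(gains, rads))
    rV = math.hypot(gx, gy) / sum(gains)                  # low-frequency measure
    rE = math.hypot(ex, ey) / sum(g * g for g in gains)   # high-frequency measure
    return rV, rE

# Example: a phantom centre from equal gains on the +/-30 degree pair only
angles = [0, 30, -30, 110, -110]                          # C, R, L, RS, LS
gains = [0.0, 0.707, 0.707, 0.0, 0.0]
print(gerzon_vectors(gains, angles))                      # both about 0.87
```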

image

FIGURE 17.31 Five channel panning law based on Gerzon’s psychoacoustic principles. (Courtesy of Douglas McKinnie.)

image

FIGURE 17.32 Two panning laws proposed by Moorer designed for optimum velocity and energy vector localization with 2nd spatial harmonics constrained to zero. (a) Four-channel soundfield panning. The front speakers are placed at 30° angles left and right, and the rear speakers are at 110° left and right. (b) This shows an attempt to perform soundfield panning across five speakers where the front left and right are at 30° angles and the rear left and right are at 110° angles. Note that at zero degrees, the center speaker is driven strongly out of phase. At 180°, the center speaker is driven quite strongly, and the front left and right speakers are driven strongly out of phase. At low frequencies, the wavelengths are quite large and the adjacent positive and negative sound pressures will cancel out. At higher frequencies, their energies can be expected to sum in an RMS sense. (Courtesy of James A. Moorer.)

Head-related panning

Horbach of Studer has proposed alternative panning techniques based on Theile’s ‘association model’ of stereo perception. This uses assumptions similar to those used for the Schoeps ‘sphere’ microphone, based on the idea that ‘head-related’ or pseudo-binaural signal differences should be created between the loudspeaker signals to create natural spatial images. It is proposed that this can work without crosstalk canceling, but that crosstalk canceling can be added to improve the full 3D effect for a limited range of listening positions.

In creating his panning laws, Horbach chooses to emulate the response of a simple spherical head model that does not give rise to the high-frequency peaks and troughs in response typical of heads with pinnae. This is claimed to create a natural frequency response for loudspeaker listening, very similar to that which would arise from a sphere microphone used to pick up the same source. Sources can be panned outside the normal loudspeaker angle at the front by introducing a basic crosstalk canceling signal into the opposite front loudspeaker (e.g. into the right when a signal is panned left). Front-back and center channel panning are incorporated by conventional amplitude control means. He also proposes using a digital mixer to generate artificial echoes or reflections of the individual sources, routed to appropriate output channels, to simulate the natural acoustics of sources in real spaces, and to provide distance cues.

RECOMMENDED FURTHER READING

AES, 2001. Proceedings of the 19th International Conference: Surround Sound — Techniques, Technology and Perception. Audio Engineering Society.

AES, 2001. Technical document AESTD1001.0.01-05: multichannel surround sound systems and operations. Available from website: http://www.aes.org.

Gerzon, M., 1992. Panpot laws for multispeaker stereo. Presented at 92nd AES Convention, Vienna. Preprint 3309. Audio Engineering Society.

Holman, T., 1999. 5.1 Surround Sound: Up and Running. Focal Press.

ITU-R, 1993. Recommendation BS.775: Multichannel stereophonic sound system with and without accompanying picture. International Telecommunication Union.

Rayburn, R., 2011. Eargle’s Microphone Book: From Mono to Stereo to Surround — A Guide to Microphone Design and Application, third edition. Focal Press.

Rumsey, F., 2001. Spatial Audio. Focal Press.

USEFUL WEBSITES

www.dolby.com: Contains an informative history of cinema surround sound systems, and information about current Dolby formats.
