Chapter 17

Surround sound

This chapter is concerned with the most commonly encountered multichannel (i.e. more than two channels) stereo reproduction configurations, most of which are often referred to as surround sound. Standards or conventions that specify basic channel or loudspeaker configurations are distinguished from proprietary systems such as Dolby Digital and DTS, whose primary function is the coding and delivery of multichannel audio signals. The latter are discussed in the second part of the chapter, which also contains an explanation of the Ambisonic system of stereo signal representation. Surround sound standards often specify little more than the channel configuration and the way the loudspeakers should be arranged, leaving the business of how to create or represent a spatial sound field entirely up to the user.

Three-channel (3-0) stereo

It is not proposed to say a great deal about the subject of three-channel stereo here, as it is rarely used on its own. Nonetheless it does form the basis of a lot of surround sound systems. It requires the use of a left (L), centre (C) and right (R) channel, the loudspeakers arranged equidistantly across the front sound stage, as shown in Figure 17.1. It has some precedents in historical development, in that the stereophonic system developed by Steinberg and Snow in the 1930s used three channels (see Chapter 16). Three front channels have also been commonplace in cinema stereo systems, mainly because of the need to cover a wide listening area and because wide screens tend to result in a large distance between left and right loudspeakers. Two channels became the norm in consumer systems mainly for reasons of economy and convenience, and particularly because it was much more straightforward to cut two channels onto an analogue disc than three.

There are various advantages of three-channel stereo. Firstly, it allows for a somewhat wider front sound stage than two-channel stereo, if desired, because the centre channel acts to ‘anchor’ the central image and the left and right loudspeakers can be placed further out to the sides (say ±45°). (Note, though, that in the current five-channel surround sound standard the L and R loudspeakers are in fact placed at ±30°, for compatibility with two-channel stereo material.) Secondly, the centre loudspeaker enables a wider range of listening positions in many cases, as the image does not collapse quite as readily into the nearest loudspeaker. It also anchors dialogue more clearly in the middle of the screen in sound-for-picture applications. Thirdly, the centre image does not suffer the same timbral modification as the centre image in two-channel stereo, because it emanates from a real source.

Figure 17.1   Three-channel stereo reproduction usually involves three equally spaced loudspeakers in front of the listener. The angle between the outer loudspeakers is 60° in the ITU standard configuration, for compatibility with two-channel reproduction, but the existence of a centre loudspeaker makes wider spacings feasible if compatibility is sacrificed

A practical problem with three-channel stereo is that the centre loudspeaker position is often very inconvenient. Although in cinema reproduction it can be placed behind an acoustically transparent screen, in homes, studios and television environments it is almost always just where one wants a television monitor or a window. Consequently the centre loudspeaker has to be mounted above or below the object in question, and possibly made smaller than the other loudspeakers.

Four-channel surround (3-1 stereo)

In this section the form of stereo called ‘3-1 stereo’ in some international standards, or ‘LCRS surround’ in some other circles, is briefly described.

Proprietary encoding and decoding technology from Dolby relating to this format is described later.

‘Quadraphonic’ reproduction using four loudspeakers in a square arrangement is not covered further here (it was mentioned in the Introduction), as it has little relevance to current practice.

Purpose of four-channel systems

The merits of three front channels were introduced in the previous section. In the 3-1 approach, an additional ‘effects’ or ‘surround’ channel is added to the three front channels, routed to a loudspeaker or loudspeakers located behind (and possibly to the sides of) listeners. The approach was developed first for cinema applications, enabling a greater degree of audience involvement in the viewing/listening experience by providing a channel for ‘wrap-around’ effects. The development is attributed to 20th Century Fox in the 1950s, along with widescreen CinemaScope viewing, and was intended to offer effective competition to the new medium of television entertainment.

There is no specific intention in 3-1 stereo to use the effects channel as a means of enabling 360° image localisation. In any case, this would be virtually impossible with most configurations, as a single audio channel feeds a larger number of surround loudspeakers, effectively in mono.

Loudspeaker configuration

Figure 17.2 shows the typical loudspeaker configuration for this format. In the cinema there are usually a large number of surround loudspeakers fed from the single S channel (‘surround channel’, not to be confused with the ‘S’ channel in sum-and-difference stereo), in order to cover a wide audience area. This has the tendency to create a relatively diffuse or distributed reproduction of the effects signal. The surround speakers are sometimes electronically decorrelated to increase the degree of spaciousness or diffuseness of surround effects, in order that they are not specifically localised to the nearest loudspeaker or perceived inside the head.

In consumer systems reproducing 3-1 stereo, the mono surround channel is normally fed to two surround loudspeakers located in similar positions to the 3-2 format described below. The gain of the channel is usually reduced by 3 dB so that the summation of signals from the two speakers does not lead to a level mismatch between front and rear.
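As a minimal numerical sketch of this (illustrative Python, not any particular decoder’s implementation), the 3 dB reduction corresponds to multiplying each loudspeaker feed by roughly 0.707:

import numpy as np

def split_mono_surround(s: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Feed a mono surround signal to two surround loudspeakers, each
    # attenuated by 3 dB (roughly 1/sqrt(2)) so that the combined output
    # does not exceed the level intended for the single surround channel.
    gain = 10 ** (-3.0 / 20.0)
    return gain * s, gain * s

# Example: a 1 kHz tone at a 48 kHz sampling rate
fs = 48000
t = np.arange(fs) / fs
ls_feed, rs_feed = split_mono_surround(np.sin(2 * np.pi * 1000 * t))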

Limitations of four-channel reproduction

The mono surround channel is the main limitation of this format. Despite the use of multiple loudspeakers to reproduce the surround channel, it is still not possible to create a good sense of envelopment or spaciousness without surround signals that differ on the two sides of the listener. Most psychoacoustic research suggests that the ears need to be provided with decorrelated signals to create the best sense of envelopment, and that effects can be better spatialised using stereo surround channels.

Figure 17.2   3-1 format reproduction uses a single surround channel usually routed (in cinema environments) to an array of loudspeakers to the sides and rear of the listening area. In consumer reproduction the mono surround channel may be reproduced through only two surround loudspeakers, possibly using artificial decorrelation and/or dipole loudspeakers to emulate the more diffuse cinema experience

5.1 channel surround (3-2 stereo)

This section deals with the 3-2 configuration that has been standardised for numerous surround sound applications, including cinema, television and consumer applications. Because of its wide use in general parlance, the term ‘5.1 surround’ will be used below. While without doubt a compromise, it has become widely adopted in professional and consumer circles and is likely to form the basis for consumer surround sound for the foreseeable future.

Various international groups have worked on developing recommendations for common practice and standards in this area, and some of the information below is based on the effort of the AES Technical Committee on Multichannel and Binaural Audio Technology to bring together a number of proposals.

Purpose of 5.1-channel systems

Four-channel systems have the disadvantage of a mono surround channel, and this limitation is removed in the 5.1-channel system, enabling the provision of stereo effects or room ambience to accompany a primarily front-orientated sound stage. This front-orientated paradigm is an important one, as it reflects the intentions of those who finalised the configuration and explains the insistence in some standards on the term ‘3-2 stereo’ rather than ‘five-channel surround’. Essentially the front three channels are intended to be used for a conventional three-channel stereo sound image, while the rear/side channels are intended only for generating supporting ambience, effects or ‘room impression’. In this sense the standard does not directly support the concept of 360° image localisation, although it may be possible to arrive at recording techniques or signal processing methods that achieve this to a degree.

The front–rear distinction is a conceptual point often not appreciated by those who use the format. Two-channel stereo can be modelled and approached theoretically with relative ease, in terms of localisation vectors and so on, for sounds at any angle between the loudspeakers. It is more difficult, though, to come up with such a model for the five-channel layout described below, as it has unequal angles between the loudspeakers and a particularly large angle between the two rear loudspeakers. It is possible to arrive at gain and phase relationships between these five loudspeakers that are similar to those used in Ambisonics for representing different source angles, but the varied loudspeaker angles make the imaging stability less reliable in some sectors than others. For those who do not have access to the sophisticated panning laws or psychoacoustic matrices required to feed five channels accurately for all-round localisation, it may be better to treat the format in ‘cinema style’ – in other words with a three-channel front image and two surround effect channels. With such an approach it is still possible to create very convincing spatial illusions, with good envelopment and localisation qualities.
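To make the ‘cinema style’ approach concrete, the following sketch implements a simple constant-power pairwise pan across the three front loudspeakers only (Python; the ±30° loudspeaker angles are taken from the ITU layout, and the sine/cosine law is just one common choice, not a standardised one):

import numpy as np

# Front loudspeaker azimuths in degrees (ITU layout assumed); positive = left
FRONT_ANGLES = np.array([30.0, 0.0, -30.0])   # L, C, R

def front_pan_gains(source_angle_deg: float) -> np.ndarray:
    # Constant-power pairwise panning across L, C and R.  The source is
    # panned between the two loudspeakers that bracket its azimuth, using
    # sine/cosine gains so that the total power stays constant.  Angles
    # outside +/-30 degrees are clamped to the nearest loudspeaker.
    a = float(np.clip(source_angle_deg, FRONT_ANGLES.min(), FRONT_ANGLES.max()))
    gains = np.zeros(3)
    if a >= 0.0:
        i, j = 0, 1              # between L (+30) and C (0)
        lo, hi = 0.0, 30.0
    else:
        i, j = 1, 2              # between C (0) and R (-30)
        lo, hi = -30.0, 0.0
    frac = (a - lo) / (hi - lo)  # 0..1 towards loudspeaker i
    gains[i] = np.sin(frac * np.pi / 2)
    gains[j] = np.cos(frac * np.pi / 2)
    return gains

print(front_pan_gains(15.0))     # roughly equal energy in L and C
print(front_pan_gains(0.0))      # all energy in the centre loudspeaker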

One cannot introduce the 5.1 surround system without explaining the meaning of the ‘.1’ component. This is a dedicated low-frequency effects (LFE) channel or sub-bass channel. It is called ‘.1’ because of its limited bandwidth. Strictly, the international standard nomenclature for 5.1 surround should be ‘3-2-1’, the last digit indicating the number of LFE channels.

International standards and configurations

The loudspeaker layout and channel configuration are specified in the ITU-R BS.775 standard, shown in Figure 17.3 and Fact File 17.1. A display screen is also shown in the figure for sound-with-picture applications, and there are recommendations concerning the relative size of the screen and the loudspeaker base width, shown in the accompanying table. The left and right loudspeakers are located at ±30° for compatibility with two-channel stereo reproduction. In many ways this need for compatibility with 2/0 is a pity, because the centre channel unavoidably narrows the front sound stage in many applications, and the front stage could otherwise take advantage of the wider spacing facilitated by three-channel reproduction. It was nonetheless considered crucial for the same loudspeaker configuration to be usable for all standard forms of stereo reproduction, for practical reasons most people will appreciate.

Figure 17.3   3-2 format reproduction according to the ITU-R BS.775 standard uses two independent surround channels routed to one or more loudspeakers per channel

Fact file 17.1 Track allocations in 5.1

Standards recommend the track allocations to be used for 5.1 surround on eight-track recording formats, as shown in the table below.

Although other configurations are known to exist there is a strong move to standardise on this arrangement (see also the notes below the table).

[Table: recommended eight-track allocation for 5.1 surround – track 1: L; track 2: R; track 3: C; track 4: LFE; track 5: LS; track 6: RS; tracks 7 and 8: free use (e.g. Lt/Rt). The original table also shows a proposed colour code for each track – see the notes below.]

1 The term ‘track’ is used to mean either tracks on magnetic tape or virtual tracks on other storage media where no real tracks exist.

2 This colour coding is at present only a proposal of the German Surround Sound Forum, and not internationally standardised.

3 Preferably used in film sound, but optional for home reproduction. If no LFE signal is being used, track 4 can be used freely, e.g. for commentary. In some regions a mono surround signal MS = LS + RS is applied, where the levels of LS and RS are decreased by 3 dB before summing.

4 Tracks 7 and 8 can alternatively be used, for example, for commentary, for additional surround signals, for half-left/half-right front signals (e.g. for special film formats), or for the matrix-format sum signals Lt/Rt.

The surround loudspeakers, located at approximately ±110°, are placed so as to provide a compromise between the need for effects panning behind the listener and the lateral energy that is important for good envelopment. In this respect they are more like ‘side’ loudspeakers than rear loudspeakers, and in many installations this is an inconvenient location, causing people to mount them nearer the rear than the standard suggests. (Some have said that a 150° angle for the rear loudspeakers provides a more exciting surround effect.) In the 5.1 standard there are normally no loudspeakers directly behind the listener, which can make for creative difficulties. This has led to a Dolby proposal called EX (described below) that places an additional speaker at the centre-rear location. (This is not part of the current standard, though.) The ITU standard allows for additional surround loudspeakers to cover the region around listeners, similar to the 3-1 arrangement described earlier. If these are used they are expected to be distributed evenly over the arc between ±60° and ±150°.

Surround loudspeakers should be the same as front loudspeakers where possible, in order that uniform sound quality can be obtained all around. That said, there are arguments for use of dipole loudspeakers in these positions. Dipoles radiate sound in more of a figure-eight pattern and one way of obtaining a diffuse surround impression is to orient these with the nulls of the figure-eight towards the listening position. In this way the listener experiences more reflected than direct sound and this can give the impression of a more spacious ambient soundfield that may better emulate the cinema listening experience in small rooms. Dipoles make it correspondingly more difficult to create defined sound images in rear and side positions, though.

The LFE channel and use of subwoofers

The low-frequency effects channel is a separate sub-bass channel with an upper frequency limit of 120 Hz (see Fact File 17.2). It is intended for conveying special low-frequency content that requires greater sound pressure levels and headroom than can be handled by the main channels. It is not intended for conveying the low-frequency component of the main channel signals, and it is likely to be used primarily in sound-for-picture applications, where explosions and other high-level rumbling noises are commonplace, although it may be used in other circumstances.

In consumer audio systems, reproduction of the LFE channel is considered optional. Because of this, recordings should normally be made so that they sound satisfactory even if the LFE channel is not reproduced. The EBU (European Broadcasting Union) comments on the use of the LFE channel as follows:

When an audio programme originally produced as a feature film for theatrical release is transferred to consumer media, the LFE channel is often derived from the dedicated theatrical subwoofer channel. In the cinema, the dedicated subwoofer channel is always reproduced, and thus film mixes may use the subwoofer channel to convey important low frequency programme content. When transferring programmes originally produced for the cinema over television media (e.g. DVD), it may be necessary to re-mix some of the content of the subwoofer channel into the main full bandwidth channels. It is important that any low frequency audio which is very significant to the integrity of the programme content is not placed into the LFE channel. The LFE channel should be reserved for extreme low frequency, and for very high level <120 Hz programme content which, if not reproduced, will not compromise the artistic integrity of the programme.

With cinema reproduction the in-band gain of this channel is usually 10 dB higher than that of the other individual channels. This is achieved by a level increase of the reproduction channel, not by an increased recording level. (This does not mean that the broadband or weighted SPL of the LFE loudspeaker should measure 10 dB higher than any of the other channels – in fact it will be considerably less than this as its bandwidth is narrower.)

Fact file 17.2 Bass management in 5.1

It is a common misconception that any sub-bass or subwoofer loudspeaker(s) used on reproduction must be fed directly from the LFE channel in all circumstances. While this may be the case in the cinema, bass management in the consumer reproducing system is not specified in the standard and is entirely system dependent. It is not mandatory to feed low-frequency information to the LFE channel during the recording process, nor is it mandatory to use a subwoofer. Indeed, it has been suggested that restricting extreme low-frequency information to a single mono channel may limit the potential for low-frequency spaciousness in balances. In music mixing it is likely to be common to send the majority of full-range LF information to the main channels, in order to retain the stereo separation between them.

In practical systems it may be desirable to use one or more subwoofers to handle the low-frequency content of a mix on reproduction. The benefit of this is that it enables the size of the main loudspeakers to be correspondingly reduced, which may be useful practically when it comes to finding places to put them in living rooms or sound control rooms. In such cases crossover systems split the signals between main loudspeakers and subwoofer(s) somewhere between 80 Hz and 160 Hz. In order to allow for reproduction of the LFE channel and/or the low-frequency content from the main channels through subwoofer loudspeakers, a form of bass management akin to that shown below is typically employed.

[Diagram: typical bass management arrangement, in which low-frequency content filtered from the main channels is combined with the LFE signal and routed to one or more subwoofers.]
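A very simplified sketch of such a bass management arrangement is given below (Python with SciPy). The 80 Hz crossover frequency, the filter order and the +10 dB LFE replay gain are illustrative assumptions rather than values mandated by any standard:

import numpy as np
from scipy.signal import butter, sosfilt

FS = 48000          # sampling rate (assumed)
FC = 80.0           # crossover frequency in Hz (system dependent)

_hp = butter(2, FC, btype='highpass', fs=FS, output='sos')
_lp = butter(2, FC, btype='lowpass', fs=FS, output='sos')

def bass_manage(mains: dict[str, np.ndarray], lfe: np.ndarray):
    # High-pass the five main channels before sending them to their (small)
    # loudspeakers; the low-frequency content removed from them is summed
    # with the LFE signal and routed to the subwoofer.  The +10 dB in-band
    # gain applied to the LFE channel follows cinema replay practice.
    speaker_feeds = {name: sosfilt(_hp, x) for name, x in mains.items()}
    low_sum = sum(sosfilt(_lp, x) for x in mains.values())
    sub_feed = low_sum + 10 ** (10.0 / 20.0) * sosfilt(_lp, lfe)
    return speaker_feeds, sub_feed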

Limitations of 5.1-channel reproduction

The main limitations of the 5.1 surround format are, firstly, that it was not intended to provide accurate 360° phantom imaging, as explained above. While it may be possible to achieve a degree of success in this respect, the loudspeaker layout is not ideally suited to it. Secondly, the front sound stage is narrower than it could be if compatibility with 2/0 reproduction were not a requirement. Thirdly, the centre channel can prove problematic for music balancing, as conventional panning laws and coincident microphone techniques are not currently optimised for three loudspeakers, having been designed for two-speaker stereo. Simple bridging of the centre loudspeaker between the left and right signals has the effect of narrowing the front image compared with a two-channel stereo reproduction of the same material. This may be resolved over time as techniques better suited to three-channel stereo are resurrected or developed. Fourthly, the LS and RS loudspeakers are located in a compromise position, leading to a large hole in the potential image behind the listener and making it difficult to find physical locations for the loudspeakers in practical rooms.

These limitations of the format, particularly, in some people’s view, for music purposes, have led to various non-standard uses of the five or six channels available on newer consumer disc formats such as DVD-A (Digital Versatile Disc – Audio) and SACD (Super Audio Compact Disc). For example, some are using the sixth channel (which would otherwise be LFE) in its full-bandwidth form on these media to create a height channel. Others are making a pair out of the ‘LFE’ channel and the centre channel so as to feed a pair of front-side loudspeakers, enabling the rear loudspeakers to be placed further back. Such non-standard uses should be clearly indicated on any recordings.

Signal levels in 5.1 surround

In film sound environments it is the norm to increase the relative recording level of the surround channels by 3 dB compared with that of the front channels. This is in order to compensate for the −3 dB acoustic alignment of each surround channel’s SPL with respect to the front that takes place in dubbing stages and movie theatres. It is important to be aware of this discrepancy between practices, as it is the norm in music mixing and broadcasting to align all channels for equal level both on recording media and for acoustical monitoring. Transfers from film masters to consumer or broadcast media may require 3 dB alteration in the gain of the surround channels.

Other multichannel configurations

Although the 5.1 surround standard is becoming widely adopted as the norm for the majority of installations, other proposals and systems exist, typically involving more channels to cover a large listening area more accurately. It is reasonable to assume that the more real loudspeakers exist in different locations around the listener, the less one has to rely on the formation of phantom images to position sources accurately, and the more freedom one has in listener position. The added complication of mixing for such larger numbers of channels must be considered as a balancing factor.

The reader is also referred to the discussion of Ambisonics, as this system can be used with a wide range of different loudspeaker configurations depending on the decoding arrangements used.

7.1 channel surround

Derived from widescreen cinema formats, the 7.1-channel configuration normally adds two further loudspeakers to the 5.1-channel configuration, located at centre-left (CL) and centre-right (CR), as shown in Figure 17.4. This is not a format primarily intended for consumer applications, but for large cinema auditoria where the screen width is such that the additional channels are needed to cover the angles between the loudspeakers satisfactorily for all the seats in the auditorium. Sony’s SDDS cinema system is a common proprietary implementation of this format, as was the original 70 mm Dolby Stereo format (see below), although that 70 mm analogue format used only one surround channel.

Figure 17.4   Some cinema sound formats for large auditorium reproduction enhance the front imaging accuracy by the addition of two further loudspeakers, centre-left and centre-right

Lexicon and Meridian have also implemented a seven-channel mode in their consumer surround decoders, but the recommended locations for the loudspeakers are not quite the same as in the cinema application. The additional channels are used to provide a wider side-front component and allow the rear speakers to be moved round more to the rear than in the 5.1 arrangement.

10.2 channel surround

Tomlinson Holman has spent considerable effort promoting a 10.2-channel surround sound system as ‘the next step’ in spatial reproduction, but this has not yet been adopted as standard. To the basic five-channel array he adds wider side-front loudspeakers and a centre-rear channel to ‘fill in the holes’ in the standard layout. He also adds two height channels and a second LFE channel. The second LFE channel is intended to provide lateral separation of decorrelated low bass content to either side of the listening area, as suggested by Griesinger to enhance low-frequency spaciousness.

Surround sound systems

This part of the chapter concerns what will be called surround sound ‘systems’, which include proprietary formats for the coding and transfer of surround sound. These are distinguished from the generic configurations and international standards discussed already. Most of the systems covered here are the subject of patents and intellectual property rights. Some proprietary systems define the methods of signal coding or matrixing for storage and delivery (e.g. Dolby Stereo), whilst others define a full source-to-receiver signal representation system (e.g. Ambisonics).

Matrixed surround sound systems

While ideally one would like to be able to transfer or store all the channels of a surround sound mix independently and discretely, it may be necessary to make use of existing two-channel media for compatibility with other systems. The systems described in the following sections all deal with multichannel surround sound in a matrixed form (in other words, using an algorithm that combines the channels in such a way that they can be subsequently extracted using a suitable decoder). By matrixing the signals they can be represented using fewer channels than the source material contains. This gives rise to some side-effects and the signals require careful dematrixing, but the approach has been used widely for many years, mainly because of the unavailability of multichannel delivery media in many environments.

Dolby Stereo, Surround and Prologic

Dolby Labs was closely involved with the development of cinema surround sound systems, and gradually moved into the area of surround sound for consumer applications.

The original Dolby Stereo system involved a number of different formats for film sound with three to six channels, particularly a 70 mm film format with six discrete tracks of magnetically recorded audio, and a 35 mm format with two optically recorded audio tracks onto which were matrixed four audio channels in the 3-1 configuration (described above). The 70 mm format involved L, LC, C, RC, R and S channels, whereas the 35 mm format involved only L, C, R and S. Both clearly only involved mono surround information. The four-channel system is the one most commonly known today as Dolby Stereo, having found widespread acceptance in the cinema world and used on numerous movies. Dolby Surround was introduced in 1982 as a means of emulating the effects of Dolby Stereo in a consumer environment. Essentially the same method of matrix decoding was used, so movies transferred to television formats could be decoded in the home in a similar way to the cinema. Dolby Stereo optical sound tracks for the cinema were Dolby A noise-reduction encoded and decoded, in order to improve the signal-to-noise ratio, but this is not a feature of consumer Dolby Surround (more recent cinema formats have used Dolby SR-type noise reduction, alongside a digital soundtrack).

The Dolby Stereo matrix (see Figure 17.5) is a form of ‘4-2-4’ matrix that encodes the mono surround channel so that it is added out of phase into the left and right channels (+90° in one channel and −90° in the other). The centre channel signal is added to left and right in phase. The resulting sum is called Lt/Rt (left total and right total). By doing this the surround signal can be separated from the front signals upon decoding by summing the Lt/Rt signals out of phase (extracting the stereo difference signal), and the centre channel can be extracted by summing Lt/Rt in phase. In consumer systems using passive decoding the centre channel is not always fed to a separate loudspeaker but can be heard as a phantom image between left and right. A decoder block diagram for the consumer version (Dolby Surround) is shown in Figure 17.6. Here it can be seen that in addition to the sum-and-difference-style decoding, the surround channel is subject to an additional delay, band-limiting between 100 Hz and 7 kHz and a modified form of Dolby B noise reduction. The low-pass filtering and the delay are both designed to reduce matrix side-effects that could otherwise result in front signals appearing to come from behind. Crosstalk between channels and effects of any misalignment in the system can cause front signals to ‘bleed’ into the surround channel, and this can be worse at high frequencies than low. The delay (of the order of 20–30 ms in consumer systems, depending on the distance of the rear speakers) relies on the precedence effect (see Chapter 2) to cause the listener to localise signals according to the first arriving wavefront which will now be from the front rather than the rear of the sound stage. The rear signal then becomes psychoacoustically better separated from the front and localisation of primary signals is biased more towards the front. The modified B-type NR reduces surround channel noise and also helps to reduce the effects of decoding errors and interchannel crosstalk, as some distortions introduced between encoding and decoding will be reduced by B-type decoding.

Figure 17.5   Basic components of the Dolby Stereo matrix encoding process

Figure 17.6   Basic components of the passive Dolby surround decoder
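The general principle of the 4-2-4 matrix and its passive decode can be sketched as follows (Python). This is not Dolby’s implementation: the 3 dB attenuation of the centre and surround contributions is assumed as typical matrix practice, a block Hilbert transform stands in for the encoder’s 90° phase-shift network, and the surround band-limiting, delay and noise reduction are omitted:

import numpy as np
from scipy.signal import hilbert

def matrix_encode(L, C, R, S):
    # C is added to both channels in phase and S out of phase, each at -3 dB,
    # with S phase-shifted by +90 degrees in Lt and -90 degrees in Rt.
    g = 1.0 / np.sqrt(2.0)
    s_quad = np.imag(hilbert(S))       # S shifted by -90 degrees
    Lt = L + g * C - g * s_quad        # i.e. S shifted by +90 degrees
    Rt = R + g * C + g * s_quad        # i.e. S shifted by -90 degrees
    return Lt, Rt

def passive_decode(Lt, Rt):
    # Centre from the in-phase sum, surround from the difference.
    return (Lt + Rt) / 2.0, (Lt - Rt) / 2.0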

A problem with passive Dolby Surround decoding is that the separation between adjacent channels is relatively modest, although the separation of left/right and centre/surround remains high. When a signal is panned fully left it will tend to appear only 3 dB down in the centre, and also in the surround, for example. The effects of this can be ameliorated in passive consumer systems by the techniques described above (phantom centre and surround delay/filtering). Dolby’s ProLogic system, based on principles employed in the professional decoder, attempts to resolve this problem by incorporating sophisticated ‘steering’ mechanisms into the decoder circuit to improve the perceived separation between the channels. A basic block diagram is shown in Figure 17.7. This enables a real centre loudspeaker to be employed. Put crudely, ProLogic works by sensing the location of ‘dominant’ signal components and selectively attenuating the channels away from the dominant component. (A variety of other processes are involved as well.) So, for example, if a dialogue signal is predominantly located in the centre, the control circuit will reduce the output of the other channels (L, R, S) so that the signal comes mainly from the centre loudspeaker. (Without this it would also have appeared at quite a high level in left and right.) A variety of algorithms are used to determine how quickly the system should react to changes in dominant signal position, and what to do when no signal appears dominant.

Figure 17.7   Basic components of the active Dolby ProLogic decoder
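A crude illustration of the dominance-steering idea is given below (Python). It is not the ProLogic algorithm: a real decoder derives smoothed, carefully time-constant-controlled steering signals, whereas this fragment simply compares block energies on the left/right and centre/surround axes and attenuates the channels opposite to whichever direction dominates:

import numpy as np

def steering_gains(Lt_block, Rt_block, max_att_db=12.0):
    # Estimate 'dominance' on the left/right and centre/surround axes for
    # one block of samples and derive channel attenuations.  The maximum
    # attenuation of 12 dB is an arbitrary illustrative figure.
    eps = 1e-12
    e_l = np.mean(Lt_block ** 2) + eps
    e_r = np.mean(Rt_block ** 2) + eps
    e_c = np.mean(((Lt_block + Rt_block) / 2) ** 2) + eps
    e_s = np.mean(((Lt_block - Rt_block) / 2) ** 2) + eps

    lr_dom = 10 * np.log10(e_l / e_r)   # positive: left dominant
    cs_dom = 10 * np.log10(e_c / e_s)   # positive: centre dominant

    def att(db):
        return 10 ** (-min(abs(db), max_att_db) / 20.0)

    gains = {'L': 1.0, 'R': 1.0, 'C': 1.0, 'S': 1.0}
    if lr_dom > 0:
        gains['R'] = att(lr_dom)        # left dominant: pull down right
    else:
        gains['L'] = att(lr_dom)
    if cs_dom > 0:
        gains['S'] = att(cs_dom)        # centre dominant: pull down surround
    else:
        gains['C'] = att(cs_dom)
    return gains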

Dolby has recently introduced an enhancement to ProLogic, called ProLogic 2, which adds support for full-bandwidth stereo rear channels, with various options that make it more suitable for music programmes. It is also claimed to be effective in the up-conversion of unencoded two-channel material to five-channel surround.

Mixes that are to be matrix encoded using the Dolby system should be monitored via the encode–decode chain in order that the side-effects of the process can be taken into account by the balance engineer. Dolby normally licenses the system for use on a project, and will assist in the configuration and alignment of their equipment during the project.

Dolby Stereo/Surround can be complemented by the THX system, as described in Fact File 17.3.

Circle Surround

Circle Surround was developed by the Rocktron Corporation (RSP Technologies) as a matrix surround system capable of encoding stereo surround channels in addition to the conventional front channels. The company proposed the system as more appropriate than Dolby Surround for music applications, and claimed that it is suitable both for material that has been encoded and for material that has not.

The Circle Surround encoder is essentially a sum-and-difference Lt/Rt process (similar to Dolby’s but without the band limiting and NR encoding of the surround channel). One incarnation of this involves 5-2 encoding, intended for decoding back to five channels (the original white paper on the system described a 4-2 encoder). Among other methods, the Circle decoder steers the rear channels separately using a split-band technique that treats low- and high-frequency components independently of each other. In this way they claim to avoid the broadband ‘pumping’ effects associated with some other systems. They also decode the rear channels slightly differently, using L−R for the left rear channel and R−L for the right rear channel, which it is claimed allows side images to be created on either side. They avoid the use of a delay in the rear channels in the ‘Music’ mode of the system and do not band-limit the rear channels as Dolby Surround does.

Fact file 17.3 What is THX?

The THX system was developed by Tomlinson Holman at Lucasfilm (THX is derived from ‘Tomlinson Holman Experiment’). The primary aim of the system was to improve the sound quality in movie theatres and make it closer to the sound experienced by sound mixers during post-production. It was designed to complement the Dolby Stereo system, and does not itself deal with the encoding or representation of surround sound. In fact THX is more concerned with the acoustics of cinemas and the design of loudspeaker systems, optimising the acoustic characteristics and noise levels of the theatre, as well as licensing a particular form of loudspeaker system and crossover network. THX licenses the system to theatres and requires that the installation is periodically tested to ensure that it continues to meet the specification.

Home THX was developed, rather like Dolby Surround, in an attempt to convey the cinema experience to the home. Through the use of a specific controller, amplifiers and speakers, the THX system enhances the decoding of Dolby Surround and can also be used with digital surround sound signals. The mono surround signal of Dolby Surround is decorrelated before being sent to the two surround loudspeakers, so that the surround signal is made more diffuse and less obviously ‘mono’.

It is claimed that this has the effect of preventing surround signals from collapsing into the nearest loudspeaker. Signals are re-equalised to compensate for the excessive high-frequency content that can arise when cinema balances are replayed in small rooms, and the channels are ‘timbre matched’ to compensate for the spectral changes that arise when sounds are panned to different positions around the head.

In terms of hardware requirements, the Home THX system also specifies certain aspects of amplifier performance, as well as controlling the vertical and horizontal directivity of the front loudspeakers. Vertical directivity is tightly controlled to increase the direct sound component arriving at listeners, while horizontal directivity is designed to cover a reasonably wide listening area. Front speakers should have a frequency response from 80 Hz to 20 kHz and all speakers must be capable of radiating an SPL of 105 dB without deterioration in their response or physical characteristics. The surround speakers are unusual in having a dipole radiation pattern, arranged so that the listener hears reflected sound rather than direct sound from these units. These have a more relaxed frequency response requirement of 125 Hz to 8 kHz. A subwoofer feed is usually also provided.

Lexicon Logic 7

Logic 7 is another surround matrix decoding process that can be used as an alternative to Dolby Surround decoding. Variants on this algorithm (such as the so-called Music Logic and Music Surround modes) can also be used for generating a good surround effect from ordinary two-channel material. Lexicon developed the algorithm for its high-end consumer equipment, and it is one of a family of steered decoding processes that distribute sound energy appropriately between a number of loudspeakers depending on the gain and phase relationships in the source material. In this case seven loudspeaker feeds are provided rather than five, adding two ‘side’ loudspeakers to the array, as shown in Figure 17.8. The rear speakers can then be placed further to the rear than would otherwise be desirable. The side loudspeakers can be used for creating an enhanced envelopment effect in music modes and more accurate side panning of effects in movie sound decoding.

Figure 17.8   Approximate loudspeaker layout suitable for Lexicon’s Logic 7 reproduction. Notice the additional side loudspeakers that enable a more enveloping image and may enable rear loudspeakers to be placed further to the rear

In Logic 7 decoding of Dolby matrix material the front channel decoding is almost identical to Dolby ProLogic, with the addition of a variable centre channel delay to compensate for non-ideal locations of the centre speaker. The rear channels operate differently depending on whether the front channel content is primarily steered dialogue/effects or music/ambience. In the former case the front signals are cancelled from the rear channels and panned effects behave as they would with ProLogic, with surround effects panned ‘full rear’ appearing in mono on both rear channels. In the latter case the rear channels work in stereo, but reproducing the front left and right channels with special equalisation and delay to create an enveloping spatial effect. The side channels carry steered information that attempts to ensure that effects which pan from left to rear pass through the left-side on the way, and similarly for the right side with right-to-rear pans.

It is claimed that by using these techniques the effect of decoding a 3-1 matrix surround version of a 3-2 format movie can be brought close to that of the original 3-2 version. Matrix encoding of five channels to Lt/Rt is also possible with a separate algorithm, suitable for decoding to five or more loudspeakers using Logic 7.

Dolby EX

In 1998 Dolby and Lucasfilm THX joined forces to promote an enhanced surround system that added a centre rear channel to the standard 5.1-channel setup. They introduced it, apparently, because of frustrations felt by sound designers for movies in not being able to pan sounds properly to the rear of the listener – the surround effect typically being rather diffuse. This system was christened ‘Dolby Digital – Surround EX’, and apparently uses matrix-style centre channel encoding and decoding between the left and right surround channels of a 5.1-channel mix. The loudspeakers at the rear of the auditorium are then driven separately from those on the left and right sides, using the feed from this ‘rear centre’ channel, as shown in Figure 17.9.

Digital surround sound formats

Matrix surround processes are gradually giving way to digital formats that enable multiple channels to be delivered discretely, bypassing the two-channel restriction of most previous delivery formats. While it is desirable to be able to store or transfer multichannel surround sound signals in a discrete, full-resolution digital PCM format, this can occupy considerable amounts of storage space or transmission bandwidth (somewhere between about 0.75 and 2 Mbit/s per channel, depending on the resolution). This is too high for practical purposes in broadcasting, film sound and consumer systems, using current technology. Consequently a number of approaches have been developed whereby the information can be digitally encoded at a much lower bit rate than the source material, with minimal loss of sound quality (see Chapter 8). The sections below briefly describe some of these systems, used in cinema sound, digital consumer formats and broadcasting systems.

Dolby Digital

Dolby Digital or AC-3 encoding was developed as a means of delivering 5.1-channel surround to cinemas or the home without the need for analogue matrix encoding. It is likely to replace Dolby Stereo/Surround gradually as digital systems replace analogue ones. It relies on a digital low-bit-rate encoding and decoding process that enables the multiple channels of the surround mix to be conveyed without the separation and steering problems inherent in matrixed surround. Dolby Digital can code signals based on the ITU-standard 3-2-1 surround format of loudspeakers and it should be distinguished from such international standards since it is primarily a signal coding and representation method. In fact, the AC-3 coding algorithm can be used for a wide range of different audio signal configurations and bit rates from 32 kbit/s for a single mono channel up to 640 kbit/s for surround signals. It is used widely for the distribution of digital sound tracks on 35 mm movie films, the data being stored optically in the space between the sprocket holes on the film, as shown in Figure 17.10. In this way, the analogue optical soundtracks can be retained in their normal place alongside the picture for compatibility purposes. In this format it is combined with a Dolby-SR encoded analogue Dolby Stereo mix, and the combined format is called Dolby SR-D. Dolby Digital is also used for surround sound on DVD video releases, and for certain digital broadcasting applications.

Figure 17.9   Dolby EX adds a centre-rear channel fed from a matrix-decoded signal that was originally encoded between left and right surround channels in a manner similar to the conventional Dolby Stereo matrix process

The Dolby Digital encoding process can be controlled by a software application that enables various parameters of the encoding process to be varied, as shown in Figure 17.11. Dolby Digital can operate at sampling rates of 32, 44.1 or 48 kHz, and the LFE channel is sampled at 240 Hz (because of its limited bandwidth). A 90° phase shift is normally introduced into each of the surround channels during encoding, which apparently improves the smoothness of front–back panning and reduces crosstalk between centre and surround channels when decoded to Dolby Surround. For this reason it is important to monitor recordings via the encode–decode process to ensure that this phase shift does not affect the spatial intention of the producer.

Figure 17.10   Dolby Digital data is stored optically in an area between the sprocket holes of 35 mm film. (Courtesy of Dolby Laboratories, Inc.)

Aside from the representation of surround sound in a compact digital form, Dolby Digital includes a variety of operational features that enhance system flexibility and help adapt replay to a variety of consumer situations. These include dialogue normalisation (‘dialnorm’) and the option to include dynamic range control information alongside the audio data, for use in environments where background noise prevents the full dynamic range of the source material from being heard. Downmix control information can also be carried alongside the audio data in order that a two-channel version of the surround sound material can be reconstructed in the decoder. As a rule, Dolby Digital data is stored or transmitted with the highest number of channels needed for the end product to be represented, and any compatible downmixes are created in the decoder. This differs from some other systems where a two-channel downmix is carried alongside the surround information.

Figure 17.11   Screen display of Dolby Digital encoding software options. (Courtesy of Dolby Laboratories, Inc.)

Dialnorm indication can be used on broadcast and other material to ensure that the dialogue level remains roughly constant from programme to programme. It is assumed that dialogue level is the main factor governing the listening level used in people’s homes, and that listeners do not want to keep changing this as different programmes come on the air (e.g. from advertising to news programmes). The dialnorm value is the average dialogue level over the duration of the programme compared with the maximum level that would be possible, measured using an A-weighted Leq reading (which averages the level linearly over time). So, for example, if the dialogue level averaged 70 dBA over the programme, and the SPL corresponding to peak recording level was 100 dBA, the dialnorm setting would be −30 dB.
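The arithmetic of this worked example can be expressed as a one-line helper (Python; the function name is purely illustrative):

def dialnorm_setting(dialogue_leq_dba: float, peak_reference_dba: float) -> int:
    # Average dialogue level relative to the level corresponding to peak
    # recording level, rounded to a whole number of dB.
    return round(dialogue_leq_dba - peak_reference_dba)

print(dialnorm_setting(70.0, 100.0))   # prints -30, as in the example above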

DTS

The DTS (Digital Theater Systems) ‘Coherent Acoustics’ system is another digital signal coding format that can be used to deliver surround sound in consumer or professional applications, using low bit rate coding techniques to reduce the data rate of the audio information. The DTS system can accommodate a wide range of bit rates from 32 kbit/s up to 4.096 Mbit/s (somewhat higher than Dolby Digital), with up to eight source channels and sampling rates up to 192 kHz. Variable bit rate and lossless coding are also optional. Downmixing and dynamic range control options are provided in the system.

DTS data is found on some film releases and occupies a different area of the film from Dolby Digital and SDDS data (see below). In fact it is possible to have film release prints in a multi-format version with all three digital sound formats plus the analogue Dolby Stereo tracks on one piece of film, making it almost universally compatible. DTS is also used on a number of surround CD releases and is optional on DVD, requiring a special decoder to replay the data signal from the disc. Because the maximum data rate is typically somewhat higher than that of Dolby Digital or MPEG, a greater margin can be engineered between the signal and any artefacts of low bit rate coding, leading to potentially higher sound quality. Such judgements, though, are obviously up to the individual and it is impossible to make blanket statements about comparative sound quality between systems.

SDDS

SDDS stands for Sony Dynamic Digital Sound, and is the third of the main competing formats for digital film sound. Using Sony’s ATRAC data reduction system, it too encodes audio data with a substantial saving in bit rate compared with the original PCM (about 5:1 compression). The SDDS system employs 7.1 channels rather than 5.1, as described earlier in this chapter, providing detailed positional coverage of the front sound stage. It is not common to find SDDS data on anything but film release prints, although it could be included on DVD as a proprietary format if required. Consumer decoders are not currently available, to the author’s knowledge.

MPEG surround modes

The MPEG (Moving Picture Experts Group) standards are widely used for low bit rate representation of audio and video signals in multimedia and other applications, and were introduced in Chapter 8. While MPEG-1 described a two-channel format, MPEG-2 extended this to multichannel information. There are two versions of MPEG-2, one of which was developed to be backwards compatible with MPEG-1 decoders, and the other of which is known as MPEG-2 AAC (advanced audio coding) and is not backwards compatible. The MPEG-4 standards also include scalable options for multichannel coding. These standards are described in ISO/IEC 11172-3, 13818-3, 13818-7 and 14496, for those who want to understand how they work in detail.

The MPEG-2 BC (backwards compatible) version worked by encoding a matrixed downmix of the surround channels and the centre channel into the left and right channels of an MPEG-1 compatible frame structure. This could be decoded by conventional MPEG-1 decoders. A multichannel extension part was then added to the end of the frame, containing only the C, LS and RS signal channels, as shown in Figure 17.12. Upon decoding in an MPEG-2 surround decoder, the three additional surround components could be subtracted again from the L0/R0 signals to leave the original five channels. The main problems with MPEG-2 BC are that (a) the downmix is performed in the encoder so it cannot be changed at the decoder end, and (b) the data rate required to transfer the signal is considerably higher than it would be if backward compatibility were not an issue. Consequently the bit rate required for MPEG-2 BC to transfer 5.1-channel surround with reasonable quality is in the region of 600–900 kbit/s.
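The compatibility matrixing can be sketched as follows (Python). The −3 dB downmix coefficients are illustrative only, since the standard allows the encoder to choose from several sets of coefficients, and the low bit rate coding of the individual channels is not shown:

import numpy as np

A = B = 1.0 / np.sqrt(2.0)   # illustrative downmix coefficients (-3 dB)

def mpeg2_bc_encode(L, C, R, LS, RS):
    # The two-channel downmix (L0/R0) goes into the MPEG-1-compatible part
    # of the frame; C, LS and RS go into the multichannel extension.
    L0 = L + A * C + B * LS
    R0 = R + A * C + B * RS
    return (L0, R0), (C, LS, RS)

def mpeg2_bc_decode(L0, R0, C, LS, RS):
    # A surround decoder subtracts the extension channels from L0/R0
    # to recover the original left and right signals.
    L = L0 - A * C - B * LS
    R = R0 - A * C - B * RS
    return L, C, R, LS, RS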

MPEG-2 AAC, on the other hand, is a more sophisticated algorithm that codes multichannel audio to create a single bit stream that represents all the channels, in a form that cannot be decoded by an MPEG-1 decoder. Having dropped the requirement for backward compatibility, the bit rate can now be optimised by coding the channels as a group and taking advantage of interchannel redundancy if required. The situation is now more akin to that with Dolby Digital, and the bit rates required for acceptable sound quality are also similar. The MPEG-2 AAC system contained contributions from a wide range of different manufacturers.

The MPEG surround algorithms have not been widely implemented to date in broadcasting, film and consumer applications. Although MPEG-2 BC was originally intended for use with DVD releases in Region 2 countries (primarily Europe), this requirement appears to have been dropped in favour of Dolby Digital. MPEG two-channel standards, such as MPEG-1, Layer 3 (the well-known .MP3 format), have been widely adopted for consumer purposes, on the other hand.

Figure 17.12   (a) MPEG-2-BC multichannel extension data appended to the MPEG-1-compatible two-channel frame. (b) Compatibility matrixing of surround information for MPEG-2-BC

MLP

Meridian Lossless Packing (MLP) is a lossless data reduction technique for multichannel audio, licensed by Meridian Audio through Dolby Labs. It has been specified for the DVD-Audio format as a way of reducing the data rate required for high-quality recordings without any effect on sound quality (in other words, you get back exactly the same bits you put in, which is not the case with lossy processes like Dolby Digital and MPEG). Using this technique, a sufficient playing time can be obtained from the disc whilst still enabling high audio resolution (sampling rates up to 192 kHz and word lengths between 16 and 24 bits) and up to six channels of surround sound.

MLP enables the mastering engineer to create a sophisticated downmix (for two-channel replay) of the multichannel material that occupies very little extra space on the disc, owing to the exploitation of similarities between this material and the multichannel version during lossless encoding. This downmix can have characteristics that vary during the programme and is entirely under the artistic control of the engineer.

There are also modes of MLP that have not really seen the light of day yet. For example, the system is extensible to considerable numbers of channels, and has an option to incorporate hierarchical encoding processes such as Ambisonics where sound field components rather than loudspeaker feeds are represented. This could be used in future as a means of overcoming the limitations of a loudspeaker-feed-based format for delivering surround sound to consumers.

Ambisonics

Principles

The Ambisonic system of directional sound pickup and reproduction is discussed here because of its relative thoroughness as a unified system, being based on some key principles of psychoacoustics. It has its theoretical basis in work by Gerzon, Barton and Fellgett in the 1970s, as well as work undertaken earlier by Cooper and Shiga.

Ambisonics aims to offer a complete hierarchical approach to directional sound pickup, storage or transmission and reproduction, which is equally applicable to mono, stereo, horizontal surround-sound, or full ‘periphonic’ reproduction including height information. Depending on the number of channels employed it is possible to represent a lesser or greater number of dimensions in the reproduced sound. A number of formats exist for signals in the Ambisonic system, and these are as follows: the A-format for microphone pickup, the B-format for studio equipment and processing, the C-format for transmission, and the D-format for decoding and reproduction. A format known as UHJ (‘Universal HJ’, ‘HJ’ simply being the letters denoting two earlier surround sound systems) is also used for encoding multichannel Ambisonic information into two or three channels whilst retaining good mono and stereo compatibility for ‘non-surround’ listeners.

Ambisonic sound should be distinguished from quadraphonic sound, since quadraphonics explicitly requires the use of four loudspeaker channels, and cannot be adapted to the wide variety of pickup and listening situations that may be encountered. Quadraphonics generally works by creating conventional stereo phantom images between each pair of speakers and, as Gerzon states, conventional stereo does not perform well when the listener is off-centre or when the loudspeakers subtend an angle larger than 60°. Since in quadraphonic reproduction the loudspeakers are angled at roughly 90° there is a tendency towards a hole-in-the-middle, as well as the problem that conventional stereo theories do not apply correctly for speaker pairs to the side of the listener. Ambisonics, however, encodes sounds from all directions in terms of pressure and velocity components, and decodes these signals to a number of loudspeakers, with psychoacoustically optimised shelf filtering above 700 Hz to correct for the shadowing effects of the head. It also incorporates an amplitude matrix that determines the correct levels for each speaker for the layout chosen. Ambisonics might thus be considered as the theoretical successor to coincident stereo on two loudspeakers, since it is the logical extension of Blumlein’s principles to surround sound.

The source of an Ambisonic signal may be an Ambisonic microphone such as the Calrec Soundfield, or it may be an artificially panned mono signal, split into the correct B-format components (see below) and placed in a position around the listener by adjusting the ratios between the signals.

Signal formats

As indicated above, there are four basic signal formats for Ambisonic sound: A, B, C and D. The A-format consists of the four signals from a microphone with four sub-cardioid capsules orientated as shown in Figure 17.13 (or the pan-pot equivalent of such signals). These are capsules mounted on the four faces of a tetrahedron, and correspond to left-front (LF), right-front (RF), left-back (LB) and right-back (RB), although two of the capsules point upwards and two point downwards. Such signals should be equalised so as to represent the soundfield at the centre of the tetrahedron, since the capsules will not be perfectly coincident.

The B-format consists of four signals that between them represent the pressure and velocity components of the sound field in any direction, as shown in Figure 17.14. There is a clear similarity with the sum and difference format of two-channel stereo, described in the previous chapter, since the B-format is made up of three orthogonal figure-eight components (X, Y and Z) and an omni component (W). All directions in the horizontal plane may be represented by scalar and vector combinations of W, X and Y, whilst Z is required for height information. X is equivalent to a forward-facing figure-eight (analogous to M in MS stereo), while Y is equivalent to a sideways-facing figure-eight (analogous to S in MS stereo). The X, Y and Z components have a frontal, sideways or upwards gain of +3 dB (a factor of √2) relative to the W signal (0 dB), in order to achieve roughly similar energy responses for sources in different positions.

A B-format signal may be derived from an A-format microphone by simple sum and difference processing of the capsule signals. Thus:

X = 0.5((LF−LB) + (RF−RB))

Y = 0.5((LF−RB) − (RF−LB))

Figure 17.13   A-format capsule directions in an Ambisonic microphone

Figure 17.14   The B-format components W, X, Y and Z in Ambisonics represent an omnidirectional pressure component and three orthogonal velocity (figure-eight) components of the sound field respectively

Z = 0.5((LF−LB) − (RF−RB))

W, being an omni pressure component, is simply derived by adding the outputs of the four capsules in phase, thus:

W = 0.5(LF + LB + RF + RB)

In a microphone W, X, Y and Z are corrected electrically for the differences in level between them, so as to compensate for the differences between pressure and velocity components. For example, W is boosted at very low frequencies since it is derived from velocity capsules that do not have the traditionally extended bass response of omnis.

B-format signals may also be created directly by arranging capsules or individual microphones in the B-format mode (two or three figure-eights at 90° plus an omni). The Z component is not necessary for horizontal information. If B-format signals are recorded instead of speaker feeds (D-format), subsequent manipulation of the soundfield is possible, and the signal will be somewhat more robust to interchannel errors.
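A minimal sketch of this A-format to B-format conversion is given below (Python), following the equations above; the frequency-dependent equalisation applied in a real Soundfield-type microphone is omitted:

import numpy as np

def a_to_b_format(LF: np.ndarray, RF: np.ndarray,
                  LB: np.ndarray, RB: np.ndarray):
    # Sum-and-difference conversion of the four A-format capsule signals
    # into the B-format components W, X, Y and Z.
    W = 0.5 * (LF + LB + RF + RB)
    X = 0.5 * ((LF - LB) + (RF - RB))
    Y = 0.5 * ((LF - RB) - (RF - LB))
    Z = 0.5 * ((LF - LB) - (RF - RB))
    return W, X, Y, Z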

Taking ϑ as the angle of incidence in the horizontal plane (the azimuth) and η as the angle of elevation above the horizontal, the polar patterns of the B-format signals can be represented as follows:

W = 1

X = √2cosϑ cosη

Y = √2sinϑ cosη

Z = √2sinη
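These polar-pattern expressions also serve as the encoding gains for an artificially panned mono signal, as mentioned at the start of this section. A minimal sketch, assuming angles in degrees and the √2 gain convention given above (function and variable names are illustrative):

```python
import numpy as np

def b_format_pan(mono, azimuth_deg, elevation_deg=0.0):
    """Encode a mono signal into B-format (W, X, Y, Z) using the polar-pattern
    gains quoted above: W = 1, X = sqrt(2)cos(azimuth)cos(elevation), and so on."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    w = mono * 1.0
    x = mono * np.sqrt(2) * np.cos(az) * np.cos(el)
    y = mono * np.sqrt(2) * np.sin(az) * np.cos(el)
    z = mono * np.sqrt(2) * np.sin(el)
    return w, x, y, z
```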

The C-format consists of four signals L, R, T and Q, which conform to the UHJ hierarchy, and are the signals used for mono or stereo-compatible transmission or recording. The C-format is, in effect, a useful consumer matrix format. L is a two-channel-compatible left channel, R is the corresponding right channel, T is a third channel which allows more accurate horizontal decoding, and Q is a fourth channel containing height information. The proportions of B-format signals which are combined to make up a C-format signal have been carefully optimised for the best compatibility with conventional stereo and mono reproduction. If L + R is defined as Σ (similar to M in MS stereo) and L − R is defined as Δ (similar to S in MS stereo), then:

Σ = 0.9397W + 0.1856X

Δ = j(−0.3420W + 0.5099X) + 0.6555Y

T = j(−0.1432W + 0.6512X) − 0.7071Y

Q = 0.9772Z

where j (or √−1) represents a phase advance of 90°.

Two, three, or four channels of the C-format signal may be used depending on the degree of directional resolution required, with a two-and-a-half channel option available where the third channel (T) is of limited bandwidth. For stereo compatibility only L and R are used (L and R being respectively 0.5(Σ + Δ) and 0.5(Σ − Δ)). The UHJ or C-format hierarchy is depicted graphically in Figure 17.15.
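For illustration, two-channel UHJ encoding from horizontal B-format can be sketched as follows. The 90° phase advance (the ‘j’ operator) is approximated here with a simple wideband FFT phase rotation; the function names and the sign convention of the shift are assumptions, and a practical encoder would use carefully designed all-pass networks instead:

```python
import numpy as np

def phase_advance_90(signal):
    """Approximate a frequency-independent +90 degree phase shift (the 'j'
    operator) by rotating every non-DC bin of the real FFT. The sign convention
    is an assumption; a real encoder would use matched all-pass filters."""
    spectrum = np.fft.rfft(signal)
    spectrum[1:] *= 1j
    return np.fft.irfft(spectrum, n=len(signal))

def uhj_encode_2ch(w, x, y):
    """Two-channel (L/R) UHJ from horizontal B-format, using the published
    coefficients quoted above."""
    sigma = 0.9397 * w + 0.1856 * x
    delta = phase_advance_90(-0.3420 * w + 0.5099 * x) + 0.6555 * y
    left = 0.5 * (sigma + delta)
    right = 0.5 * (sigma - delta)
    return left, right
```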

D-format signals are those distributed to loudspeakers for reproduction, and are adjusted depending on the selected loudspeaker layout. They may be derived from either B- or C-format signals using an appropriate decoder, and the number of speakers is not limited in theory, nor is the layout constrained to a square. Four speakers give adequate surround sound, whilst six provide better immunity against the drawing of transient and sibilant signals towards a particular speaker, and eight may be used for full periphony with height. The decoding of B- and C-format components into loudspeaker signals is too complicated and lengthy a matter to go into here, and is the subject of several patents that were granted to the NRDC (the UK National Research Development Corporation, as was). It is sufficient to say that the principle of decoding involves passing two or more UHJ signals through a phase-amplitude matrix, resulting in B-format signals that are subjected to shelf filters (in order to correct the levels for head-related effects such as shadowing and diffraction). These are passed through an amplitude matrix which feeds the loudspeakers (see Figure 17.16). A layout control is used to vary the level sent to each speaker depending on the physical arrangement of speakers. See also Fact File 17.4.

Images

Figure 17.15   The C-format or UHJ hierarchy enables a variety of matrix encoding forms for stereo signals, depending on the amount of spatial information to be conveyed and the number of channels available

Images

Figure 17.16   C-format signals are decoded to provide D-format signals for loudspeaker reproduction

B-format-to-5.1 decoding

Although the original Ambisonic specifications assumed symmetrical rectangular or square loudspeaker layouts, Gerzon showed in 1992 how Ambisonic signals could be decoded with reasonable success to layouts such as the five-channel configuration described earlier. Such decoders are often referred to as ‘Vienna decoders’ after the location of the AES Convention at which they were first described. The sound image is in this case ‘front biased’, with better localisation characteristics in the frontal region than the rear, owing to the loudspeaker layout. This is an unavoidable feature of such a configuration in any case.

Fact file 17.4 Higher-order Ambisonics

The incorporation of additional directional components into the Ambisonic signal structure can give rise to improved directional encoding that covers a larger listening area than first-order Ambisonics. These second-order and higher components are part of a family of so-called ‘spherical harmonics’. Horizontal Ambisonics can be enhanced by the addition of two further components, U and V, which have polar patterns described by:

U = 2cos(2ϑ)
V = 2sin(2ϑ)

provided that an appropriate decoder is implemented that can deal with the second-order components. Even higher-order components can be generated with the general form:

cn (forwards) = 2cos(nϑ)
cn (sideways) = 2sin(nϑ)

The problem with higher-order Ambisonics is that it is much more difficult to design microphones that produce the required polar patterns, although the signals can be synthesised artificially for sound modelling and rendering applications.
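For illustration, the horizontal encoding gains can be generated for any order from the expressions above. The sketch below assumes the gain convention quoted in this fact file (1 for W, √2 at first order, 2 at second order and above); other normalisation schemes exist:

```python
import numpy as np

def horizontal_ambisonic_gains(azimuth_deg, order):
    """Return horizontal Ambisonic encoding gains [W, X, Y, U, V, ...] for a
    source at the given azimuth, up to the requested order, following the gain
    convention quoted in the fact file."""
    az = np.radians(azimuth_deg)
    gains = [1.0]                                 # W (zeroth order)
    for n in range(1, order + 1):
        scale = np.sqrt(2) if n == 1 else 2.0     # sqrt(2) at first order, 2 above
        gains.append(scale * np.cos(n * az))      # cosine term (X, U, ...)
        gains.append(scale * np.sin(n * az))      # sine term (Y, V, ...)
    return np.array(gains)

# Second-order horizontal encoding of a source at 30 degrees azimuth:
print(horizontal_ambisonic_gains(30.0, 2))
```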

Surround sound monitoring

This section is principally concerned with monitoring environments and configurations for 5.1-channel surround sound, although many of the principles may be found to apply in other configurations. The Audio Engineering Society has published an information document on this topic, containing more detailed guidelines (see Recommended further reading at the end of this chapter).

Differences between two-channel and surround mixing rooms

There is a gradual consensus building around the view that rooms for surround monitoring should have an even distribution of absorbing and diffusing material. This is so that the rear loudspeakers function in a similar acoustic environment to the front loudspeakers. This is contrary to a number of popular two-channel control room designs that have one highly absorptive end and the other end more reflective.

The effects of non-ideal control room acoustics on the surround channels may be ameliorated if a distributed array of surround loudspeakers is used, preferably with some form of decorrelation between them to avoid strong comb filtering effects. (Appropriate gain/EQ modification should also be applied to compensate for the acoustic summing of their outputs.) This is more akin to the film sound situation, though, and may only be possible in larger dubbing stages. In smaller control rooms used for music and broadcasting mixing the space may not exist for such arrays. The ITU standard allows for more than one surround loudspeaker on either side and recommends that they are spaced equally on an arc from 60–150° from the front.

One of the difficulties of installing loudspeaker layouts according to the ITU standard, with equal spacing from the listening position and the surrounds at 110° ± 10°, is the required width of the space. This arrangement often makes it appropriate for the room to be laid out ‘wide’ rather than ‘long’ (as it might be for two-channel setups). If the room is one that was previously designed for two-channel stereo the rotation of the axis of symmetry may result in the acoustic treatment being inappropriately distributed. Also the location of doors and windows may make the modification of existing rooms difficult. If building a new room for surround monitoring then it is obviously possible to start from scratch and make the room wide enough to accommodate the surround loudspeakers and absorption in more suitable places. See also Fact File 17.5.

Fact file 17.5 Loudspeaker mounting

In many studios it is traditional to mount the monitor loudspeakers flush with the front wall. This has the particular advantage of avoiding the reflection that occurs with free-standing loudspeakers from the wall behind the loudspeaker, causing a degree of cancellation at a frequency where the spacing is equal to one quarter of the radiated wavelength. It also improves the low-frequency radiation conditions if the front walls are hard. Nonetheless, it is hard to find places to mount five large loudspeakers in a flush-mounted configuration, and such mounting methods can be expensive. Furthermore the problems noted above, of detrimental reflections from rear loudspeakers off a hard front wall or speaker enclosure, can arise, depending on the angle of the rear loudspeakers. For such reasons, some sources recommend making the surfaces around the loudspeakers reflective at low frequencies and absorbent at mid and high frequencies.

The problem of low-frequency cancellation notches with free-standing loudspeakers can be alleviated but not completely removed. The perceived depth of the notch depends on the absorption of the surface and the directivity of the loudspeaker. By adjusting the spacing between the speaker and the wall, the frequency of the notch can be moved (downwards by making the distance greater), but the distance needed is often too great to be practical. If the speaker is moved close to the wall the notch position rises in frequency. This can be satisfactory for large loudspeakers whose directivity is high enough at middle frequencies to avoid too much rear radiation, but is a problem for smaller loudspeakers.
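As a rough guide, the notch frequency for a given spacing between the driver and the wall behind it follows directly from the quarter-wavelength relationship described above; a minimal sketch:

```python
SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def wall_notch_frequency(distance_m):
    """Approximate centre frequency of the rear-wall cancellation notch for a
    free-standing loudspeaker: the spacing equals a quarter of the wavelength."""
    return SPEED_OF_SOUND / (4.0 * distance_m)

# Moving the speaker closer to the wall raises the notch frequency:
for d in (2.0, 1.0, 0.5, 0.25):
    print(f"{d:4.2f} m from the wall -> notch near {wall_notch_frequency(d):5.0f} Hz")
```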

The use of a 5.1-channel monitoring arrangement (rather than five full-bandwidth loudspeakers), with proper bass management and crossovers, can in fact ameliorate the problems of free-standing loudspeakers considerably. This is because a subwoofer can be used to handle frequencies below 80–120 Hz and it can be placed in the corner or near a wall where the cancellation problem is minimised. Furthermore, the low-frequency range of the main loudspeakers can then be limited so that the cancellation notch mentioned above occurs below their cut-off frequency.

Front loudspeakers in general

As a rule, front loudspeakers can be similar to those used for two-channel stereo, although noting the particular problems with the centre loudspeaker described in the next section. It has been suggested that low-directivity front loudspeakers may be desirable when trying to emulate the effect of a film mixing situation in a smaller surround control room. This is because in the large rooms typical of cinema listening the sound balancer is often well beyond the critical distance where direct and reflected sound are equal in level, and using speakers with low directivity helps to emulate this scenario in smaller rooms. Film mixers generally want to hear what the large auditorium audience member would hear, and this means being further from the loudspeakers than for small room domestic listening or conventional music mixing.

What to do with the centre loudspeaker

One of the main problems encountered with surround monitoring is that of where to put the centre loudspeaker in a mixing room. Ideally it should be of the same type or quality as the rest of the channels and this can make such speakers quite large. In 5.1 surround setups there is an increasing tendency to use somewhat smaller monitors for the five main channels than would be used for two-channel setups, handling the low bass by means of a subwoofer or two. This makes it more practical to mount a centre loudspeaker behind the mixing console, but its height will often be dictated by a control room window or video monitor (see below). The centre loudspeaker should be on the same arc as that bounding the other loudspeaker positions, as shown in the ITU layout above, otherwise the time delay of its direct sound at the listening position will be different from that of the other channels. If the centre speaker is closer than the left or right channels then it should be delayed slightly to put it back in the correct place acoustically.
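The compensating delay follows directly from the path-length difference between the centre loudspeaker and the outer pair; a minimal sketch (the distances used are illustrative):

```python
SPEED_OF_SOUND = 343.0  # m/s

def centre_delay_ms(distance_lr_m, distance_centre_m):
    """Delay (ms) to apply to a centre loudspeaker that is closer to the
    listening position than the left/right pair, so that its direct sound
    arrives at the same time as theirs."""
    path_difference = distance_lr_m - distance_centre_m
    if path_difference <= 0:
        return 0.0  # centre is not closer, so no delay is needed
    return 1000.0 * path_difference / SPEED_OF_SOUND

# A centre speaker 0.3 m closer than the L/R pair needs roughly 0.9 ms of delay:
print(centre_delay_ms(2.5, 2.2))
```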

The biggest problem with the centre loudspeaker arises when there is a video display present. A lot of 5.1 surround work is carried out in conjunction with pictures and clearly the display is likely to be in exactly the same place as one wants to put the centre speaker. In cinemas this is normally solved by making the screen acoustically ‘transparent’ and using front projection, although this transparency is never complete and usually requires some equalisation. In smaller mixing rooms the display is often a flat-screen plasma monitor or a CRT display and these do not allow the same arrangement.

With modestly sized solid displays for television purposes it can be possible to put the centre loudspeaker underneath the display, with the display raised slightly, or above the display angled down slightly. The presence of a mixing console may dictate which of these is possible, and care should be taken to avoid strong reflections from the centre loudspeaker off the console surface. Neither position is ideal and the problem may not be solved easily. Dolby suggests that if the centre loudspeaker has to be offset height-wise it could be turned upside down compared with the left and right channels to make the tweeters line up, as shown in Figure 17.17.

Interestingly, the flat-panel loudspeaker company, NXT, has shown large flat-panel loudspeakers that can double as projection display screens, which may be one way forward if the sound quality of the flat panel speakers can be made high enough.

Surround loudspeakers

The standard recommendations for professional setups suggest that the surround loudspeakers should be of the same quality as the front ones. This is partly to ensure a degree of inter-system compatibility. In consumer environments this can be difficult to achieve, and the systems sold at the lower end of the market often incorporate much smaller surround loudspeakers than front. As mentioned above, the use of a separate loudspeaker to handle the low bass (a so-called ‘subwoofer’) may help to ameliorate this situation, as it makes the required volume of all the main speakers quite a lot smaller. Indeed Bose has had considerable success with a consumer system involving extremely small satellite speakers for the mid–high frequency content of the replay system, mountable virtually anywhere in the room, coupled with a low-frequency driver that can be situated somewhere unobtrusive.

Images

Figure 17.17   Possible arrangement of the centre loudspeaker in the presence of a TV screen, aligning HF units more closely

The directivity requirements of the surround loudspeakers have been the basis of some considerable disagreement in recent years. The debate centres on the use of the surround loudspeakers to create a diffuse, enveloping soundfield – a criterion that tends to favour either decorrelated arrays of direct radiators (speakers that produce their maximum output in the direction of the listener) or dipole surrounds (bi-directional speakers that are typically oriented so that their main axis does not point towards the listener). If the creation of a diffuse, enveloping rear and side soundfield is the only role for surround loudspeakers then dipoles can be quite suitable if only two loudspeaker positions are available. If, on the other hand, attempts are to be made at all-round source localisation (which, despite the evidence in some literature, is not entirely out of the question), direct radiators are considered more suitable. Given the physical restrictions in the majority of control rooms, conventional loudspeakers are likely to be more practical to install than dipoles, since dipoles, by their nature, need to be free-standing away from the walls, whereas conventional speakers can be mounted flush with surfaces.

A lot depends on the application, since film sound mixing has somewhat different requirements from some other forms of mixing, and is intended for large auditoria. Much music and television sound is intended for small-room listening and is mixed in small rooms. This was also the primary motivation behind the use of dipoles in consumer environments – that is, the translation of the large-room listening experience into the small room. In large rooms the listener is typically further into the diffuse field than in small rooms, so film mixes made in large dubbing stages tend not to sound right in smaller rooms with highly directional loudspeakers. Dipoles or arrays can help to translate the listening experience of large-room mixes into smaller rooms.

Subwoofers

Low-frequency interaction between loudspeakers and rooms has a substantial bearing on the placement of subwoofers or low-frequency loudspeakers. There appears to be little agreement about the optimum location for a single subwoofer in a listening room, although it has been suggested that a corner location for a single subwoofer provides the most extended, smoothest low-frequency response. In choosing the optimum locations for subwoofers one must remember the basic principle that loudspeakers placed in corners tend to give rise to a noticeable bass boost, and couple well to most room modes (because they have antinodes in the corners). Some subwoofers are designed specifically for placement in particular locations whereas others need to be moved around until the most subjectively satisfactory result is obtained. Some artificial equalisation may be required to obtain a reasonably flat overall frequency response at the listening position. Phase shifts or time-delay controls are sometimes provided to enable some correction of the time relationship of the subwoofer to other loudspeakers, but this will necessarily be a compromise with a single unit. A subwoofer phase shift can be used to optimise the sum of the subwoofer and main loudspeakers in the crossover region for a flat response.
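The modal behaviour referred to above can be estimated for an idealised rectangular room using the standard mode-frequency formula; the sketch below (with hypothetical room dimensions) is only a first approximation, since real rooms have losses and non-rigid boundaries:

```python
from itertools import product
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def room_mode_frequencies(lx, ly, lz, max_index=2):
    """Modal frequencies of an ideal rectangular room (dimensions in metres).
    Every mode has a pressure antinode in the corners, which is why a
    corner-placed subwoofer couples to all of them."""
    modes = []
    for nx, ny, nz in product(range(max_index + 1), repeat=3):
        if nx == ny == nz == 0:
            continue
        f = 0.5 * SPEED_OF_SOUND * np.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        modes.append(((nx, ny, nz), round(f, 1)))
    return sorted(modes, key=lambda mode: mode[1])

# Lowest few modes of a hypothetical 6 x 5 x 3 m control room:
for indices, freq in room_mode_frequencies(6.0, 5.0, 3.0)[:6]:
    print(indices, freq, "Hz")
```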

There is some evidence to suggest that multiple low-frequency drivers generating decorrelated signals from the original recording create a more natural spatial reproduction than monaural low-frequency reproduction from a single driver. Griesinger proposes that if monaural LF content is reproduced it is better done through two units placed to the sides of the listener, driven 90° out of phase, to excite the asymmetrical lateral modes more successfully and improve LF spaciousness.

Others warn of the dangers of multiple low-frequency drivers, particularly the problem of mutual coupling between loudspeakers that takes place when the driver spacing is less than about half a wavelength. In such situations the outputs of the drivers couple to produce a level greater than would be predicted from simple summation of the powers. This is due to the way in which the drivers couple to the impedance of the air and the effect that one unit has on the radiation impedance of the other. The effect of this coupling will depend on the positions to which sources are panned between drivers, affecting the compatibility between the equalisation of mixes made for different numbers of loudspeakers.

Surround sound recording techniques

This section deals with the extension of conventional two-channel recording technique to multiple channels for surround sound applications, concentrating on standard 5(.1)-channel reproduction. Many of the concepts described here have at least some basis in conventional two-channel stereo, although the psychoacoustics of 5.1 surround has not yet been investigated anything like as exhaustively. Consequently a number of the techniques described below are at a relatively early stage of development and are still being evaluated.

The section begins with a review of microphone techniques that have been proposed for the pickup of natural acoustic sources in surround, followed by a discussion of multichannel panning and mixing techniques, mixing aesthetics and artificial reverberation, for use with more artificial forms of production such as pop music. Film sound approaches are not covered in any detail as they are well established and not the main theme of this book.

Principles of surround sound microphone technique

Surround sound microphone technique, as discussed here, is unashamedly biased towards the pickup of sound for 5.1 surround, although Ambisonic techniques are also covered because they are well documented and can be reproduced over five-channel loudspeaker systems if required, using suitable decoders. The techniques described in this section are most appropriate for use when the spatial acoustics of the environment are as important as those of the sources within, such as in classical music and other ‘natural’ recording. These microphone techniques tend to split into two main groups: those that are based on a single array of microphones in reasonably close proximity to each other, and those that treat the front and rear channels separately. The former are usually based on some theory that attempts to generate phantom images with different degrees of accuracy around the full 360° in the horizontal plane. (The problems of this are outlined in Fact File 17.6.) The latter usually have a front array providing reasonably accurate phantom images in the front, coupled with a separate means of capturing the ambient sound of the recording space (often for feeding to all channels in varying degrees). It is rare for such microphone techniques to provide a separate feed for the LFE channel, so they are really five-channel techniques not 5.1-channel techniques.

The concept of a ‘main array’ or ‘main microphone configuration’ for stereo sound recording is unfamiliar to some recording engineers, possibly being a more European than American concept. The traditional European approach has tended to involve starting with a main microphone technique of some sort that provides a basic stereo image and captures the spatial effect of the recording environment in an aesthetically satisfactory way, and then supporting this subtly to varying degrees with spot mics as necessary. It has been suggested by some that many balances in fact end up with more sound coming from the spot mics than from the main array, and that in this case it is the spatial treatment of the spot mics and any artificial reverberation that will have most effect on the perceived result. This is covered in the next section and the issue remains open to further experimentation.

Fact file 17.6 Surround imaging

It is difficult to create stable phantom images to the sides of a listener in a standard 5.1 surround configuration, using simple pairwise amplitude or time differences. If the listener turns to face the speaker pair then the situation may be improved somewhat, but the subtended angle of about 80° still results in something of a hole in the middle and the same problem as before then applies to the front and rear pairs. Phantom sources can be created between the rear speakers but the angle is again quite great (about 140°), leading to a potential hole in the middle for many techniques, with the sound pulling towards the loudspeakers. This suggests that those techniques attempting to provide 360° phantom imaging may meet with only limited success over a limited range of listening positions, and might imply that one would be better off working with two- or three-channel stereo in the front and decorrelated ambient signals in the rear.

There is no escaping the fact that it is easiest to create images where there are loudspeakers, and that phantom images between loudspeakers subtending wide angles tend to be unstable or ‘hole-in-the-middle’. Given this unavoidable aspect of surround sound psychoacoustics, one should always expect imaging in standard five-channel replay systems to be best between the front loudspeakers, only moderate to the rear, and highly variable to the sides, as shown below. Since the majority of material one listens to tends to conform to this paradigm in any case (primary sources in front, secondary content to the sides and rear), the problem is possibly not as serious as it might seem.

Images

One must accept also that the majority of consumer systems will have great variability in the location and nature of the surround loudspeakers, making it unwise to set too much store by the ability of such systems to enable accurate soundfield reconstruction in the home. Better, it seems, would be to acknowledge the limitations of such systems and to create recordings that work best on a properly configured reproduction arrangement but do not rely on 100 per cent adherence to a particular reproduction alignment and layout, or on a limited ‘hot spot’ listening position. Surround sound provides an opportunity to create something that works over a much wider range of listening positions than two-channel stereo, does not collapse rapidly into the nearest loudspeaker when one moves, and enhances the spatial listening experience.

Five-channel ‘main microphone’ arrays

Recent interest in five-channel recording has led to a number of variants on a common theme involving fairly closely spaced microphones (often cardioids) configured in a five-point array. The basis of most of these arrays is pair-wise time–intensity trading, usually treating adjacent microphones as pairs covering a particular sector of the recording angle around the array. The generic layout of such arrays is shown in Figure 17.18. Cardioids or even supercardioids tend to be favoured because of the increased direct-to-reverberant pickup they offer, and the interchannel level differences created for relatively modest spacings and angles, enabling the array to be mounted on a single piece of metalwork. The centre microphone is typically spaced slightly forward of the L and R microphones thereby introducing a useful time advance in the centre channel for centre-front sources.

The spacing and angles between the capsules are typically based on the so-called ‘Williams curves’, which specify the time and amplitude differences required between single pairs of microphones to create phantom sources in particular locations. (In fact the Williams curves were based on two-channel pairs and loudspeaker reproduction in front of the listener. It is not necessarily the case that the same technique can be applied to create images between pairs at the sides of the listener, or that the same level and time differences will be suitable. There is some evidence that different delays are needed between side and rear pairs than those used between front pairs, and that inter-microphone crosstalk can affect the accuracy of stereo imaging to varying degrees depending on the array configuration and microphone type.) One possible configuration of many that satisfy Williams’ psychoacoustic criteria is pictured in Figure 17.19. To satisfy the requirements for this particular array the front triplet is attenuated by 2.4 dB in relation to the back pair.

Images

Figure 17.18   Generic layout of five-channel microphone arrays based on time–amplitude trading

Some success has also been had by the author’s colleagues using omni microphones instead of cardioids, with appropriate adjustments to the spacings according to ‘Williams-style’ time–amplitude trading curves (also with modifications to correct for different inter-loudspeaker angles and spacings to the sides and rear). These tend to give better overall sound quality but (possibly unsurprisingly) poorer front imaging. Side imaging has proved to be better than expected with omni arrays.

The close spacing of the microphones in these arrays is likely to result in only modest low-frequency decorrelation between the channels. Good LF decorrelation is believed to be important for creating a sense of spaciousness, so these ‘near-coincident’ or ‘semi-correlated’ techniques will be less spacious than more widely spaced microphone arrays. Furthermore, the strong dependence of these arrays on precedence effect cues for localisation makes their performance quite dependent on listener position and front–rear balance.

The INA (Ideale Nieren Anordnung) or ‘Ideal Cardioid Array’ (devised by Hermann and Henkels) is a three-channel front array of cardioids (INA-3) coupled with two surround microphones of the same polar pattern (making it into an INA-5 array). One configuration of this is shown in Figure 17.20, and a commercial implementation by Brauner is pictured in Figure 17.21. Table 17.1 shows some possible combinations of microphone spacing and recording angle for the front three microphones of this proposed array. In the commercial implementation the capsules can be moved and rotated and their polar patterns can be varied. The configuration shown in Figure 17.20 is termed an ‘Atmokreuz’ (atmosphere cross) by the authors. Its large front recording angle of 180° means that to use it as a main microphone it would have to be placed very close to the source unless all the sources were to appear to come from near the centre. Such close placement might, however, leave it poorly positioned for capturing the surrounding ambience. Such a configuration may be more suitable for general pickup slightly further back in the hall.

Images

Figure 17.19   Five-channel microphone array using cardioids, one of a family of arrays designed by Williams and Le Dû. In this example the front triplet should be attenuated 2.4 dB with respect to the rear pair

Images

Figure 17.20   INA-5 cardioid array configuration (see Table 17.1)

Separate treatment of front imaging and ambience

Many alternative approaches to basic microphone coverage for 5.1 surround treat the stereo imaging of front signals separately from the capture of a natural-sounding spatial reverberation and reflection component, and some are hybrid approaches without a clear theoretical basis. Most do this by adopting a three-channel variant on a conventional two-channel technique for the front channels, as introduced in the previous chapter (sometimes optimised for more direct sound than in a two-channel array), coupled with a more or less decorrelated combination of microphones in a different location for capturing spatial ambience (sometimes fed just to the surrounds, other times to both front and surrounds). Sometimes the front microphones also contribute to the capture of spatial ambience, depending on the proportion of direct to reflected sound picked up, but the essential point here is that the front and rear microphones are not intentionally configured as an attempt at a 360° imaging array.

Images

Figure 17.21   SPL Atmos 5.1 Surround Recording System. (Courtesy of Sound Performance Lab)

Table 17.1 Dimensions and angles for the front three cardioid microphones of the INA array (see Figure 17.20). Note that the angle between the outer microphones should be the same as the recording angle

Images

Images

Figure 17.22   The so-called ‘Fukada Tree’ of five spaced microphones for surround recording

The so-called ‘Fukada Tree’, shown in Figure 17.22, is based on a Decca Tree, but instead of using omni mics it mainly uses cardioids. The reason for this is to reduce the amount of reverberant sound pickup by the front mics. Omni outriggers are sometimes added as shown, typically panned between L–LS and R–RS, in an attempt to increase the breadth of orchestral pickup and to integrate front and rear elements. The rear mics are also cardioids and are typically located at approximately the critical distance of the space concerned (where the direct and reverberant components are equal). They are sometimes spaced further back than the front mics by nearly 2 metres, although the dimensions of the tree can be varied according to the situation, distance, etc. (Variants are known that have the rear mics quite close to the front ones, for example.) The spacing between the mics more closely fulfils requirements for the decorrelated microphone signals needed to create spaciousness, depending on the critical distance of the space in which they are used. (Mics should be separated by at least the room’s critical distance for adequate decorrelation.) The front imaging of such an array would be similar to that of an ordinary Decca Tree (not bad, but not as precise as some other techniques).

The Dutch recording company, Polyhymnia International, has developed a variant on this approach that uses omnis instead of cardioids, to take advantage of their better sound quality. Using an array of omnis separated by about 3 metres between left–right and front–back they achieve a spacious result where the rear channels are well integrated with the front. The centre mic is placed slightly forward of left and right. It is claimed that placing the rear omnis too far away from the front tree makes the rear sound detached from the front image, so one gets a distinct echo or repeat of the front sound from the rear.

Images

Figure 17.23   A surround technique proposed by Hamasaki (NHK) consisting of a cardioid array, omni outriggers and separate ambience matrix

Hamasaki of NHK (the Japanese broadcasting company) has proposed an arrangement based on near-coincident cardioids (30 cm) separated by a baffle, as shown in Figure 17.23. Here the centre cardioid is placed slightly forward of left and right, and omni outriggers are spaced by about 3 metres. These omnis are low-pass filtered at 250 Hz and mixed with the left and right front signals to improve the LF sound quality. Left and right surround cardioids are spaced about 2–3 metres behind the front cardioids and 3 metres apart. An ambience array is used further back, consisting of four figure-eight mics facing sideways, spaced by about 1 metre, to capture lateral reflections, fed to the four outer channels. This is placed high in the recording space.

Theile proposes the front microphone arrangement shown in Figure 17.24. Although superficially similar to the front arrays described in the previous section, it reduces crosstalk between the channels by using supercardioid microphones at ±90° for the left and right channels and a cardioid for the centre. (Supercardioids are more directional than cardioids and have the highest direct/reverberant pickup ratio of any first-order directional microphone. They have a smaller rear lobe than hypercardioids.) Theile’s rationale behind this proposal is the avoidance of crosstalk between the front segments. He proposes to enhance the LF response of the array by using a hybrid microphone for left and right that crosses over to omni below 100 Hz, thereby restoring the otherwise poor LF response. The centre channel is high-pass filtered at 100 Hz. Furthermore, the response of the supercardioids should be equalised to be flat for signals arriving at about 30° to the front of the array (they would normally sound quite coloured at this angle). Schoeps has developed a prototype of this array, and it has been christened ‘OCT’ for ‘Optimum Cardioid Triangle’.

Images

Figure 17.24   Theile’s proposed three-channel array for front pickup using supercardioids for the outer mics, crossed over to omni at LF. The spacing depends on the recording angle (C − R = 40 cm for 90° and 30 cm for 110°)

For the ambient sound signal, Theile proposes the use of a crossed configuration of microphones that has been christened the ‘IRT cross’ or ‘atmo-cross’. This is shown in Figure 17.25. The microphones are either cardioids or omnis, and the spacing is chosen according to the degree of correlation desired between the channels. Theile suggests 25 cm for cardioids and about 40 cm for omnis, but says that this is open to experimentation. Small spacings are appropriate for more accurate imaging of reflection sources at the hot spot, whereas larger spacings are appropriate for providing diffuse reverberation over a large listening area. The signals are mixed into the L, R, LS and RS channels, but not the centre.

Images

Figure 17.25   The IRT ‘atmo-cross’ designed for picking up ambient sound for routing to four loudspeaker channels (omitting the centre). Mics can be cardioids or omnis (wider spacing for omnis)

Images

Figure 17.26   Double MS pair arrangement with small spacing between front and rear pair

A ‘double MS’ technique has been proposed by Curt Wittig and others, shown in Figure 17.26. Two MS pairs (see previous chapter) are used, one for the front channels and one for the rear. The centre channel can be fed from the front M microphone. The rear pair is placed at or just beyond the room’s critical distance. S gain can be varied to alter the image width in either sector, and the M mic’s polar pattern can be chosen for the desired directional response (it would typically be a cardioid). Others have suggested using a fifth microphone (a cardioid) in front of the forward MS pair, to feed the centre channel, delayed to time align it with the pair. If the front and rear MS pairs are co-located it may be necessary to delay the rear channels somewhat (10–30 ms) so as to reduce perceived spill from front sources into rear channels. In a co-located situation the same figure-eight microphone could be used as the S channel for both front and back pairs.
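A minimal sketch of such a double-MS decode is given below, with an adjustable S gain to control image width in each sector and an optional rear delay of the kind suggested above; the parameter values and function names are illustrative only:

```python
import numpy as np

def ms_decode(m, s, width=1.0):
    """Standard MS matrixing: left = M + g*S, right = M - g*S, where the gain
    g sets the image width of the sector."""
    return m + width * s, m - width * s

def double_ms_decode(m_front, s_front, m_rear, s_rear,
                     width_front=1.0, width_rear=1.0,
                     rear_delay_ms=15.0, sample_rate=48000):
    """Derive L, R, C and LS, RS from two MS pairs. The rear pair is delayed
    by a small amount (10-30 ms is suggested in the text) to reduce perceived
    spill of front sources into the rear channels; the centre channel is
    simply fed from the front M microphone."""
    left, right = ms_decode(m_front, s_front, width_front)
    ls, rs = ms_decode(m_rear, s_rear, width_rear)
    delay_samples = int(round(rear_delay_ms * sample_rate / 1000.0))
    ls = np.concatenate([np.zeros(delay_samples), ls])[:len(ls)]
    rs = np.concatenate([np.zeros(delay_samples), rs])[:len(rs)]
    centre = m_front
    return left, right, centre, ls, rs
```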

In general, the signals from separate ambience microphones fed to the rear loudspeakers may often be made less obtrusive and front–back ‘spill’ may be reduced by rolling off the high-frequency content of the rear channels. Some additional delay may also assist in the process of integrating the rear channel ambience. The precise values of delay and equalisation can only really be arrived at by experimentation in each situation.

Pseudo-binaural techniques

As with two-channel stereo, some engineers have experimented with pseudo-binaural recording techniques intended for loudspeaker reproduction. Jerry Bruck adapted the Schoeps ‘Sphere’ microphone (described earlier) for surround sound purposes by adding bi-directional (figure-eight) microphones near to the ‘ears’ (omni mics) of the sphere, with their main axis front–back, as pictured in Figure 17.27. This microphone is now manufactured by Schoeps as the KFM360. The figure-eights are mounted just below the sphere transducers so as to affect their frequency response in as benign a way as possible for horizontal sources. The outputs from the figure-eight and the omni at each side of the sphere are MS matrixed to create pairs of roughly back-to-back cardioids facing sideways. The size of the sphere creates an approximately ORTF spacing between the side-facing pairs. The matrixed output of this microphone can be used to feed four of the channels in a five-channel reproduction format (L, R, LS and RS). A Schoeps processing unit can be used to derive an equalised centre channel from the front two, and enables the patterns of front and rear coverage to be modified.

Images

Figure 17.27   (a) Schoeps KFM360 sphere microphone with additional figure-eights near the surface-mounted omnis. (b) KFM360 control box. (Courtesy of Schalltechnik Dr.-Ing. Schoeps GmbH)

Michael Bishop of Telarc has reportedly adapted the ‘double MS’ technique described in the previous section by using MS pairs facing sideways, and a dummy head some 1–2.5 m in front, as shown in Figure 17.28. The MS pairs are used between side pairs of channels (L and LS, R and RS) and line-up is apparently tricky. The dummy head is a model equalised for a natural response on loudspeakers (Neumann KU100) and is used for the front image.

Images

Figure 17.28   Double MS pairs facing sideways used to feed the side pairs of channels combined with a dummy head facing forwards to feed the front image

Multimicrophone techniques

Most real recording involves the use of spot microphones in addition to a main microphone technique of some sort; indeed, in many situations the spot microphones may end up at higher levels than the main microphone, or there may be no main microphone at all. The principles outlined in the previous chapter still apply in surround mixing, but now one has the issue of surround panning to contend with. The principles of this are covered in more detail in ‘Multichannel panning techniques’, below.

Some engineers report success with the use of multiple sphere microphones for surround balances, which is probably the result of the additional spatial cues generated by using a ‘stereo’ spot mic rather than a mono one, avoiding the flatness and lack of depth often associated with panned mono sources. Artificial reverberation of some sort is almost always helpful when trying to add spatial enhancement to panned mono sources, and some engineers prefer to use amplitude-panned signals to create a good balance in the front image, plus artificial reflections and reverberation to create a sense of spaciousness and depth.

Ambisonic or ‘Soundfield’ microphone principles

The so-called ‘Soundfield’ microphone, pictured in Figure 17.29, is designed for picking up full periphonic sound in the Ambisonic A-format (see ‘Signal formats’, above), and is coupled with a control box designed for converting the microphone output into both the B-format and the D-format. Decoders can be created for using the output of the Soundfield microphone with a 5.1-channel loudspeaker array, including that recently introduced by Soundfield Research. The full periphonic effect can only be obtained by reproduction through a suitable periphonic decoder and the use of a tetrahedral loudspeaker array with a height component, but the effect is quite stunning and worth the effort.

The Soundfield microphone is capable of being steered electrically using the control box, in terms of azimuth, elevation, tilt or dominance, and as such it is also a particularly useful stereo microphone for two-channel work. The microphone encodes directional information in all planes, including the pressure and velocity components of indirect and reverberant sounds.

Images

Figure 17.29   (a) The Soundfield microphone, (b) accompanying control box, (c) capsule arrangement. (Courtesy of SoundField Ltd)

Figure 17.29(c) shows the physical capsule arrangement of the microphone, which was shown diagrammatically in Figure 17.13. Four capsules with subcardioid polar patterns (between cardioid and omni, with a response equal to 2 + cos ϑ) are mounted so as to face in the A-format directions, with electronic equalisation to compensate for the inter-capsule spacing, such that the output of the microphone truly represents the soundfield at a point (true coincidence is maintained up to about 10 kHz). The capsules are matched very closely and each contributes an equal amount to the B-format signal, thus resulting in cancellation between variations in inherent capsule responses. The A-format signal from the microphone can be converted to B-format according to the equations given in ‘Signal formats’, above.

The combination of B-format signals in various proportions can be used to derive virtually any polar pattern in a coincident configuration, using a simple circuit as shown in Figure 17.30 (two-channel example). Crossed figure-eights are the most obvious and simple stereo pair to synthesise, since this requires the sum-and-difference of X and Y, whilst a pattern such as crossed cardioids requires that the omni component be used also, such that:

Left = W + (X/2) + (Y/2)

Right = W + (X/2) − (Y/2)

From the circuit it will be seen that a control also exists for adjusting the effective angle between the synthesised pair of microphones, and that this works by varying the ratio between X and Y in a sine/cosine relationship.
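The same combination can be expressed for an arbitrary steering angle and polar pattern. The sketch below uses an assumed pattern parameter (0 = omni, 0.5 = cardioid, 1 = figure-eight); with the √2 B-format convention used here, a setting of 0.5 at ±45° reproduces the crossed-cardioid equations above exactly:

```python
import numpy as np

def virtual_mic(w, x, y, azimuth_deg, pattern=0.5):
    """Synthesise a first-order coincident microphone from horizontal B-format.
    pattern: 0 = omni, 0.5 = cardioid, 1 = figure-eight (an assumed
    parameterisation; the overall gain varies slightly with the setting)."""
    az = np.radians(azimuth_deg)
    return (2.0 * (1.0 - pattern) * w
            + pattern * np.sqrt(2) * (x * np.cos(az) + y * np.sin(az)))

# A synthesised crossed-cardioid pair at +/-45 degrees:
# left  = virtual_mic(w, x, y, +45.0, 0.5)   # equals W + X/2 + Y/2
# right = virtual_mic(w, x, y, -45.0, 0.5)   # equals W + X/2 - Y/2
```

Varying the azimuths of the two virtual microphones corresponds to the sine/cosine angle control mentioned above.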

Images

Figure 17.30   Circuit used for controlling stereo angle and polar pattern in Soundfield microphone. (Courtesy of Ken Farrar)

Images

Figure 17.31   Azimuth, elevation and dominance in Soundfield microphone

The microphone may be controlled, without physical re-orientation, so as to ‘point’ in virtually any direction (see Figure 17.31). It may also be electrically inverted, so that it may be used upside-down. Inversion of the microphone is made possible by providing a switch which reverses the phase of Y and Z components. W and X may remain unchanged since their directions do not change if the microphone is used upside-down.

Multichannel panning techniques

The panning of signals between more than two loudspeakers presents a number of psychoacoustic problems, particularly with regard to appropriate energy distribution of signals, accuracy of phantom source localisation, off-centre listening and sound timbre. A number of different solutions have been proposed, in addition to the relatively crude pairwise approach used in much film sound, and some of these are outlined below. The issue of source distance simulation is also discussed.

Here are Michael Gerzon’s criteria for a good panning law for surround sound:

The aim of a good panpot law is to take monophonic sounds, and to give each one amplitude gains, one for each loudspeaker, dependent on the intended illusory directional localisation of that sound, such that the resulting reproduced sound provides a convincing and sharp phantom illusory image. Such a good panpot law should provide a smoothly continuous range of image directions for any direction between those of the two outermost loudspeakers, with no ‘bunching’ of images close to any one direction or ‘holes’ in which the illusory imaging is very poor.

Pairwise amplitude panning

Pairwise amplitude panning is the type of pan control most recording engineers are familiar with, as it is the approach used on most two-channel mixers. As described in the previous chapter, it involves adjusting the relative amplitudes between a pair of adjacent loudspeakers so as to create a phantom image at some point between them. This has been extended to three front channels and is also sometimes used for panning between side loudspeakers (e.g.: L and LS) and rear loudspeakers. The typical sine/cosine panning law devised by Blumlein for two-channel stereo is often simply extended to more loudspeakers. Most such panners are constructed so as to ensure constant power as sources are panned to different combinations of loudspeakers, so that the approximate loudness of signals remains constant.
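A minimal sketch of such a constant-power law for a single pair of adjacent loudspeakers is shown below; the position parameter (0 at one loudspeaker, 1 at the other) is an assumed convention:

```python
import numpy as np

def constant_power_pair_gains(position):
    """Sine/cosine constant-power panning between two adjacent loudspeakers.
    position: 0.0 = entirely in the first speaker, 1.0 = entirely in the second.
    The squared gains always sum to one, so perceived loudness stays roughly
    constant across the pan."""
    theta = 0.5 * np.pi * np.clip(position, 0.0, 1.0)
    return np.cos(theta), np.sin(theta)

# At the centre of the pair both speakers sit at about -3 dB (gain ~ 0.707):
print(constant_power_pair_gains(0.5))
```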

Panning using amplitude or time differences between widely spaced side loudspeakers is not particularly successful at creating accurate phantom images. Side images tend not to move linearly as they are panned and tend to jump quickly from front to back. Spectral differences resulting from differing HRTFs of front and rear sound tend to result in sources appearing to be spectrally split or ‘smeared’ when panned to the sides.

In some mixers designed for five-channel surround work, particularly in the film domain, separate panners are provided for L–C–R, LS–RS, and front-surround. Combinations of positions of these amplitude panners enable sounds to be moved to various locations, but some more successfully than others. For example, sounds panned so that some energy is emanating from all loudspeakers (say, panned centrally on all three pots) tend to sound diffuse for centre listeners, and in the nearest loudspeaker for those sitting off-centre. Joystick panners combine these amplitude relationships under the control of a single lever that enables a sound to be ‘placed’ dynamically anywhere in the surround soundfield. Moving effects made possible by these joysticks are often unconvincing and need to be used with experience and care.

Research undertaken by Jim West at the University of Miami showed that, despite its limitations, constant power ‘pairwise’ panning offers reasonably stable images for centre and off-centre listening positions, for moving and stationary sources, compared with some other more esoteric algorithms. Front–back confusion was noticed in some cases, for sources panned behind the listener.

‘Ambisonic’ panning laws

A number of variations of panning laws loosely based on Ambisonic principles have been attempted. These are primarily based on the need to optimise psychoacoustic localisation parameters according to low- and high-frequency models of human hearing. Gerzon proposed a variety of psychoacoustically optimal panning laws for multiple speakers that can theoretically be extended to any number of speakers. Some important features of these panning laws are:

•  There is often output from multiple speakers in the array, rather than just two.

Images

Figure 17.32   Five channel panning law based on Gerzon’s psychoacoustic principles. (Courtesy of Douglas McKinnie)

•  They tend to exhibit negative gain components (out-of-phase signals) in some channels for some panning positions.

•  The channel separation is quite poor.

A number of authors have shown how this type of panning could be extended to five-channel layouts according to the standards of interest in this book. McKinnie proposed a five-channel panning law based on similar principles, suitable for the standard loudspeaker angles. It is shown in Figure 17.32. Moorer also proposed some four- and five-channel panning laws, pictured in Figure 17.33 (only half the circle is shown because the other side is symmetrical). They differ because Moorer has chosen to constrain the solution to first-order spatial harmonics (a topic beyond the scope of this book). He proposes that the standard ±30° angle for the front loudspeakers is too narrow for music, and that it gives rise to levels in the centre channel that are too high in many cases to obtain adequate L–R decorrelation, as well as giving rise to strong out-of-phase components. He suggests at least ±45° to avoid this problem. Furthermore, he states that the four-channel law is better behaved with these particular constraints and might be more appropriate for surround panning.

Head-related panning

Horbach of Studer has proposed alternative panning techniques based on Theile’s ‘association model’ of stereo perception. This uses assumptions similar to those used for the Schoeps ‘sphere’ microphone, based on the idea that ‘head-related’ or pseudo-binaural signal differences should be created between the loudspeaker signals to create natural spatial images. It is proposed that this can work without crosstalk cancelling, but that crosstalk cancelling can be added to improve the full 3D effect for a limited range of listening positions.

In creating his panning laws, Horbach chooses to emulate the response of a simple spherical head model that does not give rise to the high-frequency peaks and troughs in response typical of heads with pinnae. This is claimed to create a natural frequency response for loudspeaker listening, very similar to that which would arise from a sphere microphone used to pick up the same source. Sources can be panned outside the normal loudspeaker angle at the front by introducing a basic crosstalk cancelling signal into the opposite front loudspeaker (e.g.: into the right when a signal is panned left). Front–back and centre channel panning are incorporated by conventional amplitude control means. He also proposes using a digital mixer to generate artificial echoes or reflections of the individual sources, routed to appropriate output channels, to simulate the natural acoustics of sources in real spaces, and to provide distance cues.

Images

Figure 17.33   Two panning laws proposed by Moorer designed for optimum velocity and energy vector localisation with 2nd spatial harmonics constrained to zero. (a) Four-channel soundfield panning. The front speakers are placed at 30° angles left and right, and the rear speakers are at 110° left and right. (b) This shows an attempt to perform soundfield panning across five speakers where the front left and right are at 30° angles and the rear left and right are at 110° angles. Note that at zero degrees, the centre speaker is driven strongly out of phase. At 180°, the centre speaker is driven quite strongly, and the front left and right speakers are driven strongly out of phase. At low frequencies, the wavelengths are quite large and the adjacent positive and negative sound pressures will cancel out. At higher frequencies, their energies can be expected to sum in an RMS sense. (Courtesy of James A. Moorer)

Recommended further reading

AES (2001) Proceedings of the 19th International Conference: Surround Sound – Techniques, Technology and Perception. Audio Engineering Society

AES (2001) Technical document AESTD1001.0.01-05: Multichannel surround sound systems and operations. Available from website: http://www.aes.org

Eargle, J. (2005) The Microphone Book. Focal Press

Holman, T. (1999) 5.1 Surround Sound: Up and Running. Focal Press

ITU-R (1993) Recommendation BS.775: Multichannel stereophonic sound system with and without accompanying picture. International Telecommunication Union

Rumsey, F. (2001) Spatial Audio. Focal Press
