Chapter 1

Perception of Spatial Sound

Elizabeth M. Wenzel, Durand R. Begault, and Martine Godfroy-Cooper

Acoustically, immersion refers to sound arriving from all directions around a listener, which is normally an inevitable consequence of natural human listening in an air medium. Audible sound sources are everywhere in real environments, where sound waves propagate and reflect from surfaces around a listener. Even in the quietest of environments, such as an anechoic chamber, the sounds of one’s own body will be audible. However, the common meaning of immersion in audio and acoustics refers to the psychological sensation of being surrounded by specific sound sources as well as ambient sound. Although a sound may acoustically reach a listener from multiple surrounding directions, its spatial characteristics may still be judged as unrealistic, static or constrained. For example, good concert hall acoustics have traditionally been correlated with a listener’s sensation of being immersed in the sound of the orchestra, as opposed to the sound seeming distant and removed. Spatial audio techniques, particularly 3D audio, can provide an immersive experience because virtual sound sources and sound reflections can be made to appear from anywhere in space around a listener. This chapter introduces the reader to the physiological, psychoacoustic and acoustic bases of these sensations.

Auditory Physiology

Auditory perception is a complex phenomenon determined by the physiology of the auditory system and affected by cognitive processes. The auditory system transforms the fundamental independent aspects of sound stimuli, such as their spectral content, temporal properties and location in space, into distinct patterns of neural activity. These patterns give rise to the qualitative experience of pitch, loudness, timbre and location. They are ultimately integrated with information from the other sensory systems to form a unified perceptual representation and to guide behavior, including orienting to acoustical stimuli and engaging in intra-species communication.

Auditory Function: Peripheral Processing

The functional auditory system extends from the ears to the brain’s frontal lobes with successively more complex functions occurring as one ascends the hierarchy of the nervous system (Figure 1.1). The different functions performed by the auditory system are classically categorized as peripheral auditory processing and central auditory processing.

The peripheral auditory system includes processing stages from the outer ear to the cochlear nerve. A crucial transformation is performed within these early stages, which is often compared to a Fourier analysis of the incoming sound waves that defines how the sounds are processed at the later stages of the auditory hierarchy. Sound enters the ear as pressure waves. At the periphery of the system, the external and the middle ear respectively collect sound waves and selectively amplify their pressure, so that they can be successfully transmitted to the fluid-filled cochlea in the inner ear.

The external ear, which consists of the pinna (plural, pinnae) and the auditory meatus (or ear canal), gathers the pressure waves and focuses them on the eardrum (tympanic membrane) at the end of the canal. One consequence of the configuration of the human auditory canal is that it selectively boosts the sound pressure 30 to 100 fold for frequencies around 3 kHz via a passive resonance effect due to the length of the ear canal. This amplification makes humans especially sensitive to frequencies in the range of 2–5 kHz, which appears to be directly related to speech perception. A second important function of the external ear is the selective filtering of sound frequencies by the pinnae, which provides cues about the elevation of a sound source: up/down and front/back angles (Shaw, 1974). The vertically asymmetrical convolutions of the pinna are shaped so that the external ear transmits more high frequency components from an elevated source than from the same source at ear level. Similarly, high frequencies tend to be more attenuated for sources in the rear than for sources in the front, as a consequence of the orientation and structure of the pinna (Blauert, 1997).

The middle ear is a small cavity that separates the outer and inner ear. The cavity contains the three smallest bones in the body (the hammer, anvil and stirrup), collectively called the ossicles, which are connected more or less flexibly to each other. Its major function is to match the relatively low impedance of airborne sound (impedance in this context refers to a medium’s resistance to movement) to the higher impedance of the fluid in the inner ear. Without this action, there would be a transmission loss of 1000:1, corresponding to a loss of sensitivity of 30 dB. The middle ear has two small muscles known as the tensor tympani and the stapedius, which have a protective function. When sound pressure reaches a threshold loudness level (at approximately 85 dB HL1 in humans with normal hearing), a sensory driven afferent signal is sent to the brainstem via the cochlear nerve, which initiates an efferent reflexive contraction of the stapedius muscle within the middle ear, referred to as the stapedius reflex or acoustic middle ear reflex. The excitation of the muscle results in a stimulus level-dependent attenuation of low-frequency (< 1 kHz) ossicular chain vibration reaching the cochlea (Wilson & Margolis, 1999).

Transduction Mechanisms in the Cochlea

The cochlea in the inner ear is the most critical structure in the peripheral auditory pathway. The cochlea is a small-coiled snail-like structure that responds to the sound-induced vibrations and converts them into electrical impulses, a process known as mechanoelectrical transduction. Cochlear signal transduction involves amplification and decomposition of complex acoustical waveforms into their component frequencies.

Two membranes, the basilar membrane (BM) and the vestibular membrane (VM), divide the cochlea into three fluid-filled chambers. The organ of Corti sits on the BM and contains an array of sensory hair cells that contact the tectorial membrane (TM), a structure that plays multiple, critical roles in hearing including coupling elements along the length of the cochlea, supporting a travelling wave and ensuring the gain and timing of cochlear feedback are optimal (Richardson, Lukashkin, & Russell, 2008). The sensory hair cells are responsible for the mechanoelectrical transduction, i.e., the transformation of the mechanical stimulus into electrochemical activity. The acoustical stimulus initiates a traveling wave along the BM, ultimately enabling the frequency, amplitude and phase of the original sound stimulus to be encoded in the electrical activity of the auditory nerve fibers. As the BM displaces, it deflects the hair bundles (stereocilia, tiny processes that protrude from the apical ends of the hair cells) of the location-matched inner hair cells, which results in a current flow and ultimately an action potential. Because the stiffness of the BM changes throughout the cochlea, the displacement of the membrane induced by an incoming sound depends on the frequency of that sound. Specifically, the BM is stiffer near its “base” than near the middle of the spiral (“the apex”). Consequently, high frequency sounds (e.g., 20 kHz) produce displacement near the base, while low frequency sounds (e.g., 20 Hz) disturb the membrane near the apex. As a result, each locus on the BM is identified by its characteristic frequency (CF) and the whole BM can be described as a bank of overlapping filters (Patterson et al., 1987; Meddis & Lopez-Poveda, 2010). Because the BM displaces in a frequency-dependent manner, the corresponding hair cells are “tuned” to sound frequency. The resulting spatial proximity of contiguous preferred sound frequencies (“place theory of hearing”, von Békésy, 1960; “place code” model, Jeffress, 1948) is referred to as tonotopy or, better, cochleotopy. This tonotopic organization is carried up through the auditory hierarchy to the cortex (Moerel et al., 2013; Saenz & Langers, 2014) and defines the functional topography in each of the intermediate relays.
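
The characterization of the BM as a bank of overlapping filters is often made concrete in auditory models as a filterbank whose bandwidths grow with center frequency. The following is a minimal sketch in Python, assuming the Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation, a standard auditory-modeling convention that is not itself cited in this chapter; it simply lists illustrative center frequencies and bandwidths for a set of overlapping cochlear channels.

```python
import numpy as np

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (ERB) of the auditory filter
    centered at f_hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_rate(f_hz):
    """Map frequency (Hz) to ERB-rate, i.e., the number of ERBs below f_hz."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb):
    """Inverse of erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37

# Center frequencies of 30 overlapping "cochlear" channels between 20 Hz and
# 20 kHz, spaced evenly on the ERB-rate scale: densely packed at low
# frequencies and progressively broader and sparser at high frequencies.
centers = erb_rate_to_hz(np.linspace(erb_rate(20.0), erb_rate(20000.0), 30))
for fc in centers:
    print(f"fc = {fc:8.1f} Hz, ERB = {erb_bandwidth(fc):7.1f} Hz")
```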

Auditory Function: Central Processing

The central auditory system is composed of a number of nuclei and complex pathways that ascend within the brainstem. The earliest stage of central processing occurs at the cochlear nuclei (dorsal, DCN, and ventral, VCN), where the tonotopic organization of the cochlea is maintained. The output of the cochlear nucleus (CN) is distributed to several targets.

One is the superior olivary complex (SOC), the first point at which information from the two ears interacts. The best understood function of the SOC is sound localization. Humans use at least two different strategies, and two different pathways, to localize the horizontal position of sound sources, depending on the frequencies of the stimulus. For frequencies below 3 kHz, which the auditory nerve can follow in a phase-locked manner, interaural time differences (ITDs) are used to localize the source; above these frequencies, interaural intensity differences (IIDs) are used as cues (King & Middlebrooks, 2011; Yin, 2002). ITDs are processed in the medial superior olive (MSO), while IIDs are processed in the lateral superior olive (LSO). These two pathways eventually merge in the midbrain auditory centers. The elevation of sound sources is determined by spectral filtering mediated by the external pinnae. Experimental evidence suggests that the spectral notches created by the shape of the pinnae are detected by neurons in the DCN. See the section on human sound localization for additional discussion of these cues.

The binaural pathways for sound localization are only part of the output of the CN. A second major set of pathways from the CN bypasses the SOC and terminates in the nuclei of the lateral lemniscus (LL) on the contralateral side of the brainstem. These particular pathways respond to sound arriving at one ear only and are thus referred to as monaural. Some cells in the nuclei of the LL signal the onset of sound, regardless of its intensity or frequency. Other cells process other temporal aspects of sound such as duration.

As with the outputs of the SOC, the pathways from the LL project to the midbrain auditory center, also known as the inferior colliculus (IC). This structure is a major integrative center where the convergence of binaural inputs produces a computed topographical representation of the auditory space. At this level, neurons are typically sensitive to multiple localization cues (Chase & Young, 2006) and respond best to sounds originating in a specific region of space, with a preferred elevation and a preferred azimuthal location. As a consequence, it is the first point at which auditory information can interact with the motor system. Another important property of the IC is its ability to process sounds with complex temporal patterns. Many neurons in the IC respond only to frequency-modulated sounds while others respond only to sounds of specific durations. Such sounds are typical components of biologically relevant sounds, such as those made by predators and in humans, speech.

The IC relays auditory information to the medial geniculate nucleus (MGN, also referred to as the medial geniculate complex, MGC) of the thalamus, which is an obligatory relay for all ascending information destined for the cortex. It is the first station in the auditory pathway where pronounced selectivity for combinations of frequencies is found. Cells in the MGN are also selective for specific time intervals between frequencies. The detection of harmonic and temporal combinations of sounds is an important feature of the processing of speech.

In addition to the cortical projection, a pathway to the superior colliculus (SC) gives rise to an organized representation of ITDs and IIDs in a point-to-point map of the auditory space (King & Palmer, 1983). Topographic representations of multiple sensory modalities (visual, auditory and somatosensory) are integrated to control the orientation of movements toward specific spatial locations (King, 2005).

Auditory Function: The Auditory Cortex

The auditory cortex (AC, Figure 1.2) is the major target of the ascending fibers from the MGC and plays an essential role in our conscious perception of sound, including the comprehension of speech, arguably the most significant social stimulus for humans.

Although the AC has a number of subdivisions, a broad distinction can be made between a primary area and secondary areas. The primary auditory cortex (BA41, the core area), located on the superior temporal gyrus of the temporal lobe, receives point-to-point input from the MGC and comprises three distinct tonotopic fields (Saenz & Langers, 2014). Neurons in BA41 have narrow tuning functions and respond best to tone stimuli, supporting basic auditory functions such as frequency discrimination and sound localization. BA41 also plays a role in the processing of within-species communication sounds.

The belt areas of the auditory cortex receive more diffuse inputs from the MGC as well as inputs from BA41, and are less precise in their tonotopic organization. Neurons in the belt areas (BA42b) have broader frequency tuning functions and respond better to complex sounds, such as those that mediate communication. Lateral to the belt is a region of cortex denoted as the parabelt (BA42p) where neurons prefer complex stimuli including band-passed noise, moving stimuli and vocalizations (Rauschecker, Tian & Hauser, 1995).

The projections from the parabelt out of the auditory cortex to higher order cortical structures define the auditory dorsal processing stream and the ventral processing stream (see Figure 1.2). According to the auditory dual-stream model, spatial information (“where”) is primarily processed within the dorsal stream and non-spatial information (object features, i.e., “what”) within the ventral stream (Rauschecker & Tian, 2000; Romanski & Goldman-Rakic, 2002). The auditory ventral stream supports the perception and recognition of auditory objects and is involved in the processing of pitch changes, auditory working memory for words and tones, as well as semantic processing. There is less agreement regarding the functional role of the auditory dorsal stream. The earliest models argued for a role in spatial hearing, but recent research suggests that the auditory dorsal stream supports an interface with the motor system. In fact, this segregation between dorsal/ventral streams appears to be more relative than absolute. It seems that the functions of sound identification and spatial analysis are rather co-localized in the dorsal and ventral auditory streams (Gifford & Cohen, 2005; Lewald et al., 2008) and that human spatial processing is strongly linked with functions of pitch perception (Douglas & Bilkey, 2007).

Connections from the auditory cortex to the frontal lobes mediate a number of functions including language, object recognition and spatial localization. The frontal cortex is a heterogeneous region with multiple functional subdivisions, including the prefrontal cortex (PFC), and is part of the association cortices. The frontal association cortex is importantly involved in guiding complex behavior by planning responses to ongoing stimulation or to remembered information. Collectively, the association cortices mediate the cognitive functions of the brain, including speech processing and executive functions that include attention, working memory, planning and decision-making (Fuster, 2009; Plakke et al., 2014).

Human Sound Localization

Primary Localization Cues

Auditory spatial perception refers to the ability to localize individual sound sources in 3D space even when multiple, simultaneous sources are present. Unlike in the visual and somatosensory systems, spatial information is not directly represented at the sensory receptors of the auditory system. Instead, spatial locations must be estimated by integrating interaural difference cues and frequency-dependent pinna filtering (binaural and monaural cues; Yost & Dye, 1997).

Sound source location is often specified in terms of azimuth, elevation and distance using a coordinate system in which a listener facing directly forward is defined as 0° azimuth and 0° elevation. Azimuth is defined by the angle (θ) between the source location and the median plane at 0° azimuth (projected onto the horizontal plane) and elevation is the angle (δ) between the source location and the horizontal plane at 0° elevation (projected onto the median plane).

Azimuths to the right of the listener are positive, to the left are negative, and the rear is defined as 180°.2 Elevations are positive for upper directions and negative for lower directions relative to the listener. Distance is defined as the radius (r) projected along the vector formed by the azimuth and elevation of the source. Another important terminological distinction relevant to interaural cues is between the ipsilateral and contralateral ears. The ipsilateral ear is the one closest to the sound source; sound thus arrives earlier and with greater intensity at the ipsilateral ear. The contralateral ear is the one farthest from the sound source; sound thus arrives later and with less intensity at the contralateral ear.
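
The azimuth/elevation/distance convention just described can be captured in a short conversion routine. The minimal sketch below assumes a particular Cartesian axis orientation (x to the listener’s right, y straight ahead, z up) purely for illustration; the chapter itself defines only the angles and the radius.

```python
import numpy as np

def polar_to_cartesian(azimuth_deg, elevation_deg, distance):
    """Convert a source position given as (azimuth, elevation, distance) in
    the listener-centered convention used here (0 deg azimuth/elevation is
    straight ahead, positive azimuth to the right, positive elevation up)
    into Cartesian coordinates.  Axis choice (an assumption for this
    example): x = right, y = front, z = up."""
    theta = np.radians(azimuth_deg)    # azimuth
    delta = np.radians(elevation_deg)  # elevation
    x = distance * np.cos(delta) * np.sin(theta)
    y = distance * np.cos(delta) * np.cos(theta)
    z = distance * np.sin(delta)
    return x, y, z

# A source 2 m away, 60 deg to the right and 30 deg above the horizontal plane.
print(polar_to_cartesian(60.0, 30.0, 2.0))
```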

The localization of a sound source in the horizontal dimension (azimuth) results from the detection of left-right interaural differences in time of arrival and interaural differences in intensity at the two ears (Middlebrooks & Green, 1991). These cues also facilitate speech intelligibility in background noise in human listeners (Culling, Hawley & Litovsky, 2004). To localize a sound in the vertical dimension (elevation) and to resolve front-back confusions, the auditory system relies on the detailed geometry of the pinnae, causing acoustic waves to diffract and undergo direction-dependent reflections (Blauert, 1997; Hofman & Van Opstal, 2003). The two different modes of indirect coding of the position of a sound source in space (as compared to the direct spatial coding of visual stimuli) result in differences in spatial resolution in these two directions.

Sound Localization in the Horizontal Dimension

Much of the research on human sound localization in azimuth has derived from Lord Rayleigh’s “duplex theory” (1907), which emphasizes the role of two primary cues (top two panels of Figure 1.3): interaural time differences (ITDs) and interaural intensity differences (IIDs, also referred to as Interaural Level Differences, ILDs, particularly when specified in dB SPL). Because the theory was based primarily on experiments with single-frequency (sine wave) sounds, the original proposal was that IIDs resulting from head-shadowing determine localization at high frequencies (above roughly 2000 Hz for larger human heads), while ITDs were thought to be important only for low frequencies because of the phase ambiguities occurring at frequencies greater than ~1000–1500 Hz. More recently, Brughera, Dunai and Hartmann (2013) demonstrated that humans are sensitive to ITDs in the temporal fine structure of sound (TFS; the sound pressure waveform itself) up to a limit of 1400 Hz. For broadband sounds, the situation is more complex, and ITDs contribute to sound localization at higher frequencies, up to 4000 Hz (Bernstein, 2001).

The interaural differences present in natural spatial hearing can be understood by considering a simplified rigid sphere model of a listener with a perfectly round head and no outer ears (spherical head model, Woodworth, 1938) placed at a fixed distance in an anechoic chamber from a broadband sound source at eye level (see Figure 1.4).

Modeling this situation involves calculating two paths representing the sound source wavefront from its center of origin to two points representing the entrance to the ear canals. An additional simplification is the placement of these points exactly at the midline crossing the sphere, at the ends of the interaural axis. With the source at position A at 0° azimuth, the path lengths are equal, causing the wavefront to arrive at the eardrums at the same time and with equal intensity. At position B, the sound source is at +60° azimuth to the right of the listener, and the paths are now unequal; this will cause the sound source wavefront to arrive later in time at the left ear relative to the right. This sound path length difference is the basis of the ITD cue, and it relates to the hearing system’s ability to detect interaural phase differences (IPDs) below approximately 1,000 Hz. If the sounds are pure tones, a simple frequency factor relates the ITDs to the IPDs, for which there are known iso-IPD boundaries (90°, 180°…) defining regions of spatial perception. Although dependent upon the nature of the stimulus and the measurement technique, a value of around 650 µsec (750 µsec for low-frequency sounds, Kuhn, 1977) is a good approximation of the observed maximum value for an average human head. Figure 1.5 illustrates how the ITD changes as a function of the azimuthal angle of incidence.
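
The spherical head geometry of Figure 1.4 leads to the classic Woodworth approximation, in which the extra path length to the far ear is the sum of a straight segment and an arc around the head. The minimal sketch below uses assumed values for head radius and the speed of sound; with these assumptions the ITD at 90° azimuth comes out near 656 µsec, consistent with the approximate maximum cited above.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed average adult head radius

def woodworth_itd(azimuth_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth (1938) spherical-head ITD approximation for a distant source
    at the given azimuth (-90 to +90 deg, 0 deg = straight ahead).  The path
    difference is a * (sin(theta) + theta); positive values mean the
    wavefront reaches the right ear first."""
    theta = np.radians(azimuth_deg)
    return (a / c) * (np.sin(theta) + theta)

for az in (0, 15, 30, 45, 60, 75, 90):
    print(f"{az:3d} deg  ITD = {woodworth_itd(az) * 1e6:6.1f} microseconds")
```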

The auditory system can phase-lock, or change the rate of neural firing corresponding to the peaks in a stimulus waveform, as long as the inter-peak interval is above about 1 msec. Such phase-locking can occur for the peaks in either sine waves or the envelopes of complex signals. One theory is that ITDs are estimated by the auditory system by comparing peaks in neural firing rates between the two ears in a manner known as coincidence detection (place code or topographic model, Jeffress, 1948), a neural mechanism that enables interaural cross-correlation. More recent data support the opponent channel coding of auditory space in humans (van Bergeijk, 1962; Magezi & Krumbholz, 2010; Salminen et al., 2010) where ITD is represented by a non-topographic population rate code, which involves only two opponent (left and right) channels, broadly tuned to ITDs from the two auditory hemifields. The data suggest that the majority of ITD-sensitive neurons in each hemisphere are tuned to ITDs from the contralateral hemifield.
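
Interaural cross-correlation can be illustrated directly in code. The minimal sketch below is not a physiological model: it simply delays a synthetic noise burst between two “ear” signals (the sample rate, delay and stimulus are made-up illustrative values) and recovers the ITD as the lag of the cross-correlation peak within a physiologically plausible range of internal delays.

```python
import numpy as np

fs = 48000                  # sample rate in Hz (illustrative)
true_itd = 24               # samples of delay; 24 / 48000 s = 500 microseconds
rng = np.random.default_rng(0)

# A smoothed noise burst serves as the "source"; the left-ear copy is
# delayed, i.e., the source lies to the listener's right.
source = np.convolve(rng.standard_normal(4800), np.ones(32) / 32, mode="same")
right = np.concatenate([source, np.zeros(true_itd)])
left = np.concatenate([np.zeros(true_itd), source])

# Interaural cross-correlation; the lag of the peak is the ITD estimate.
corr = np.correlate(left, right, mode="full")
lags = np.arange(-(len(right) - 1), len(left))

# Restrict the search to physiologically plausible delays (about +/- 1 ms).
plausible = np.abs(lags) <= int(0.001 * fs)
best_lag = lags[plausible][np.argmax(corr[plausible])]
print(f"estimated ITD = {best_lag / fs * 1e6:.0f} microseconds (true value: 500)")
```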

The sound source at position B in Figure 1.4 will also yield a significant interaural intensity difference cue, but only for those waveform components whose wavelengths are smaller than the diameter of the head, i.e., for frequencies greater than about 2,000 Hz. Higher frequencies will be attenuated at the left ear because the head acts as an obstacle, creating a “head shadow” effect at the opposite side. The relation of a wavefront to an obstacle of fixed size is such that the shadow effect increases with increasing frequency (i.e., decreasing size of the wavelength). However, below 2,000 Hz, the IID is no longer effective as a natural spatial hearing cue because longer wavelengths will diffract (“bend”) around the obstructing surface of the head, thereby minimizing the intensity differences. Figure 1.6 illustrates this head-shadow effect by plotting the IID as a function of azimuth location and stimulus frequency. Measured data from the literature for IIDs show that a 3,000 Hz sine wave at 90° azimuth will be attenuated by about 10 dB, a 6,000 Hz sine wave will be attenuated by about 20 dB, and a 10,000 Hz wave by about 35 dB (Feddersen et al., 1957; Middlebrooks & Green, 1991). Measured IIDs derived from an individual listener are also shown in Figure 1.6 (bottom). Note that the pattern of IIDs can be quite complex across both frequency and azimuth.

Independent of frequency content, variations in the overall difference between left and right intensity levels at the eardrum are interpreted as changes in the sound source position from the perspective of the listener. Consider the primary spatial auditory cueing device built into stereo recording consoles, the panpot (short for “panoramic potentiometer”). Over headphones, the panpot creates IIDs without regard to frequency, yet it works for separating sound sources in most applications. This is because the frequency content of typical sounds includes frequencies above and below the hypothetical “cut-off” points for IID and ITD, and listeners are sensitive to IID cues for localization across most of the audible frequency range, down to at least 200 Hz (Blauert, 1997).
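
A panpot simply splits a monophonic signal between two channels with complementary gains. The minimal sketch below assumes a constant-power (sine/cosine) pan law, a common studio convention rather than something specified in this chapter; the interaural level difference it produces is the same at every frequency, unlike the frequency-dependent head-shadow IIDs of natural hearing described above.

```python
import numpy as np

def constant_power_pan(signal, pan):
    """Pan a mono signal between left and right channels.
    pan = -1.0 is full left, 0.0 is center, +1.0 is full right.
    Constant-power law: the gains follow cos/sin of a quarter-circle angle,
    so left_gain**2 + right_gain**2 is constant at every pan position."""
    angle = (pan + 1.0) * np.pi / 4.0      # maps pan to 0..pi/2
    left_gain, right_gain = np.cos(angle), np.sin(angle)
    return left_gain * signal, right_gain * signal

mono = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)  # 1 s, 440 Hz tone
left, right = constant_power_pan(mono, pan=0.5)            # image right of center
iid_db = 20 * np.log10(np.max(np.abs(right)) / np.max(np.abs(left)))
print(f"frequency-independent IID of about {iid_db:.1f} dB")
```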

Binaural research over the last few decades, however, points to serious limitations of the duplex theory. For example, for nearby sources, the IID is available even at low frequencies (Shinn-Cunningham, 2000). Similarly, it is known that ITD cues based on the relative timing of the amplitude envelopes (“envelope ITD”) of high-frequency sounds can be used by a mechanism such as interaural coincidence detection (Henning, 1974, 1980; van de Par & Kohlrausch, 1997; Bernstein & Trahiotis, 2010). Finally, in theory, the azimuth of a sound source can also be determined monaurally because the high-frequency components of the sound are more attenuated compared to low-frequency components as the sound source moves contra-laterally (Shub, Durlach & Colburn, 2008).

Sound Localization in the Vertical Dimension

The duplex theory cannot account for the ability of subjects to localize sounds on the vertical median plane (directly in front of the listener), where interaural cues are minimal. Similarly, when subjects listen to stimuli over headphones, the sounds are perceived as being lateralized inside the head even though interaural temporal and intensity differences appropriate to an external source location are present.

The results of many studies now suggest that these deficiencies of the duplex theory reflect the important contribution to localization of the direction-dependent filtering that occurs when incoming sound waves interact with the outer ears or pinnae and other body structures such as the shoulders and torso.

The main cue the human auditory system uses to determine the elevation of a sound source is the monaural spectrum determined by the interaction of the sound with the pinnae (Wightman & Kistler, 1997). However, small head asymmetries may provide a weak binaural elevation cue. Specifically, there is a spectral notch that moves in frequency from approximately 5 kHz to 10 kHz as the source moves from 0° (directly ahead of the listener) to 90° (above the listener’s head) that is considered to be the main elevation cue (Musicant & Butler, 1985; Moore, Oldfield & Dooley, 1989). As sound propagates from a source to a listener’s ears, reflection and diffraction effects tend to alter the sound in subtle ways, and the effect depends on frequency. For example, for a particular location, a group of high-frequency components centered at 8 kHz may be attenuated more than a different band of components centered at 6 kHz. Such frequency-dependent effects, or filtering, also vary greatly with the direction of the sound source. Thus, for a different source location, the band at 6 kHz may be more attenuated than the higher frequency band at 8 kHz. It is clear that listeners use these kinds of frequency-dependent effects to discriminate one location from another. Experiments have shown that spectral shaping by the pinnae is highly direction-dependent, that the absence of pinna cues degrades localization accuracy, and that pinna cues are partially responsible for externalization or the “outside-the-head” sensation (Gardner & Gardner, 1973; Oldfield & Parker, 1984a, b; Plenge, 1974; Shaw, 1974).

Other monaural cues are provided by the ratio of direct-to-reverberant energy that expresses the amount of sound energy that reaches our ears directly from the source versus the amount that is reflected off the walls in enclosed spaces (Larsen et al., 2008). In general, monaural cues are more ambiguous spatial cues than binaural cues because the auditory system must make a priori assumptions about the acoustic features of the original sound in order to estimate the filtering effects corresponding to the monaural spatial cues. Environmental cues will be discussed in more detail in a later section.

Factors Affecting Localization Performance

Localization performance generally refers to the degree of accuracy with which listeners can identify and/or discriminate the location of a sound source. It may be measured using a variety of experimental paradigms with different types of response measures. For example, in a direct localization task a sound source (real, recorded or virtual) is presented to a listener who is asked to report its location, perhaps in terms of estimates of azimuth, elevation and distance.

Several kinds of error are usually observed in perceptual studies of localization when listeners are asked to judge the position of a static sound source in the free field. One, which Blauert (1997) refers to as localization blur, is a relatively small error in resolution on the order of about 5° to 20°. A related measure of localization accuracy is the minimum audible angle (MAA), the minimum detectable angular difference between two successive sound sources. MAAs increase from about 1° for a sound directly ahead, to 20° or more for a sound directly to the right or left (Mills, 1958; Perrott & Saberi, 1990). The minimum audible movement angle (MAMA) is the minimum detectable angular difference of a continuously moving sound source (Perrott & Tucker, 1988). The MAMA depends on the speed of the moving sound source and ranges from about 8° for a velocity of 90°/s to about 21° for a velocity of 360°/s.

Another class of error observed in nearly all localization studies is the occurrence of front-back “reversals” (Figure 1.7, right). These are judgments indicating that a source in the front hemisphere was perceived by the listener as if it were in the rear hemisphere. Occasionally, back-to-front confusions are also found (e.g., Oldfield & Parker, 1984a, b). Confusions in elevation, with up locations heard as down, and vice versa, have also been observed (Wenzel, 1991).

Although the reason for such reversals is not completely understood, they are probably due in large part to the static nature of the stimulus and the ambiguities resulting from the so-called cone of confusion (Woodworth, 1938; Woodworth & Schlosberg, 1954; Mills, 1972). Assuming a stationary, spherical model of the head and symmetrically located ear canals (without pinnae), a given interaural time or intensity difference will correlate ambiguously with the direction of a sound source, with a conical shell describing the locus of all possible sources (Figure 1.7, left). Intersection of these conical surfaces with the surface of a sphere results in circular projections corresponding to contours of constant ITD or IID (i.e., considering sources at an arbitrary fixed distance). While the rigid sphere model is not the whole story, the observed pattern of such iso-ITD and iso-IID contours indicates that the interaural characteristics of the stimulus are inherently ambiguous. In the absence of other cues, both front-back and up-down reversals would seem to be quite likely.

Several cues are thought to help in disambiguating the cones of confusion. One is the complex spectral shaping provided by the pinnae as a function of location that was described above. For example, because of the orientation and shell-like structure of the pinnae, high frequencies tend to be more attenuated for sources in the rear than for sources in the front [e.g., see Blauert’s (1997) discussion of “boosted bands”, pp. 111–116]. For the case of static sounds, such cues would essentially be the only clue to disambiguating source location. With dynamic stimuli, however, the situation improves greatly. A variety of studies have shown that allowing listeners to move their heads substantially improves localization ability and can almost completely eliminate reversals (e.g., Wallach, 1939, 1940; Thurlow & Runge, 1967; Fisher & Freedman, 1968; Wightman & Kistler, 1999; Begault, Wenzel & Anderson, 2001). With head-motion, the listener can apparently disambiguate front-back locations by tracking changes in the magnitude of the interaural cues over time; for a given lateral head movement, ITDs and IIDs for sources in the front will change in the opposite direction compared to sources in the rear (Wallach, 1939, 1940). Time-varying cues provided by moving sources may also aid in disambiguation, particularly if there is a priori knowledge about the direction of motion (Wightman & Kistler, 1999), although relatively little research has been done on the topic of source motion in this context.

In addition to the primary localization cues, the localizability of a sound also depends on other factors such as its spectral content: narrowband (pure) tones are generally difficult to localize while broadband, impulsive sounds are the easiest to locate. A closely related issue in the localizability of sound sources is their degree of familiarity. Logically, localization based on spatial cues other than the interaural cues, e.g., cues related to spectral shaping by the pinnae, is largely determined by a listener’s a priori knowledge of the spectrum of the sound source. The listener must “know” what the spectrum of a sound is to begin with to determine that the same sound at different positions has been differentially shaped by the effects of his or her ear structures. Thus both the perception of elevation and relative distance, which depend heavily on the detection of spectral differences, tend to be superior for familiar signals like speech (e.g., Plenge & Brunschen, 1971, in Blauert, 1997, p. 104; Coleman, 1963). Similarly, spectral familiarity can be established through training (Batteau, 1967).

In an acoustical environment multiple acoustic objects can be present and room reverberation can also distort spatial cues. When the listener is in a room or other reverberant environment the direct sound received at the ears is combined with multiple copies of the sound reflected off the walls before arriving at the ears. Reverberation alters the monaural spectrum of the sound as well as the IIDs and IPDs of the signals reaching the listener (Shinn-Cunningham, Kopco & Martin, 2005). These effects depend on the source position relative to the listener as well as on the listener position in the room. On the other hand, the ratio of direct to reverberant energy itself can provide a spatial cue.

Such phenomena suggest that the auditory system exhibits neural plasticity, adapting over time in response to perceptual experience and changes in the sensory environment. Neural plasticity will be discussed in a later section.

Head-Related Transfer Functions (HRTFs) and Virtual Acoustics

The data on the primary localization cues suggest that perceptually veridical localization over headphones is possible if the spectral shaping by the pinnae and other body structures as well as the interaural difference cues can be adequately reproduced in a 3D sound system or virtual acoustic display. There may be many cumulative effects on the sound as it makes its way to the eardrum, but it turns out that all of these effects can be expressed as a single filtering operation much like the effects of a graphic equalizer in a stereo system. The exact nature of this filter can be measured by a simple experiment in which an impulse (a single, very short sound pulse or click) or other broadband probe stimulus is produced by a loudspeaker at a particular location. The acoustic shaping by the two ears is then measured by recording the outputs of small probe microphones placed inside an individual’s ear canals (Figure 1.8). If the measurement of the two ears occurs simultaneously, the responses, when taken together as a pair of filters, include an estimate of the interaural differences as well. Thus, this technique allows one to measure all of the relevant spatial cues together for a given source location, a given listener and in a given room or environment.

The bottom panel of Figure 1.3 illustrates these effects for the transfer functions of the ears. The illustration on the left shows the frequency domain representation, derived from a mathematical operation known as the Fourier Transform, of an acoustic impulse in the time domain before interaction with the outer ear (and other body) structures. The illustration on the right shows what happens to the frequency response of an impulse delivered from a loudspeaker located directly to the right of a listener after interaction with the outer ear structures, as measured in the left (solid line) and right (dashed line) ear canals of a listener. The differences between the left and right intensity curves are the IIDs at each frequency. Spectral phase effects (frequency-dependent phase, or time delays) are also present in the measurements, but are not shown here for clarity. The filters constructed from these ear-dependent characteristics are examples of Finite Impulse Response (FIR) filters and are often referred to as Head-Related Impulse Responses (HRIRs) in the time domain, and Head-Related Transfer Functions (HRTFs) in the frequency domain. Filtering in the frequency domain is a point-by-point multiplication operation while filtering in the time domain occurs via a somewhat more complex operation known as convolution [see Brigham (1974) for a useful pictorial discussion of filtering and convolution]. By filtering an arbitrary sound with these HRTF-based filters, it is possible to impose spatial characteristics on the signal such that it apparently emanates from the originally measured location. If the filtering occurs in real-time, the effects of source motion and the listener’s head motion can also be simulated.
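
In code, imposing the measured spatial cues of one direction on an arbitrary source signal amounts to a pair of FIR convolutions. The minimal sketch below is illustrative only: the two HRIRs are stand-in arrays (a simple delay and attenuation), whereas in practice a measured pair for the desired direction, for example loaded from an HRTF database, would be substituted.

```python
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Impose the spatial cues of one measured direction on a mono signal by
    convolving it with the left- and right-ear head-related impulse responses
    (HRIRs).  Returns a two-column (left, right) array."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Illustrative stand-ins: a pure delay plus attenuation in each ear.  Real
# HRIRs, typically a few hundred taps measured in each ear canal, would be
# loaded from a measurement set instead.
fs = 48000
hrir_left = np.zeros(256);  hrir_left[30] = 0.5    # later and quieter: far ear
hrir_right = np.zeros(256); hrir_right[6] = 1.0    # earlier and louder: near ear

mono = np.random.default_rng(1).standard_normal(fs)   # 1 s of noise
stereo = binauralize(mono, hrir_left, hrir_right)
print(stereo.shape)   # (48255, 2): left and right ear signals
```

In a real-time system the same operation is typically performed block by block, with HRIR pairs interpolated or cross-faded as the source or the listener’s head moves.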

Figure 1.9 provides examples of the magnitude responses for the left and right ears derived from measured HRTFs. The top panels show the frequency responses for azimuths of −90°, 0° and +90° at 0° elevation. Note how the magnitude responses are similar at 0° azimuth and are larger in the ipsilateral ear for the −90° (left ear) and +90° (right ear) source locations. The difference between the left and right ear responses represents the IIDs as a function of frequency. The overall IIDs averaged across frequency, as well as the ITDs, tend to be similar between individual subjects. Consequently, accuracy in azimuth perception is generally observed to be reasonably comparable when listening to stimuli generated from either individualized (one’s own) or non-individualized HRTFs (Wightman & Kistler, 1989; Wenzel et al., 1993). The bottom panels show the frequency responses for elevations of −45°, 0° and +45° at +45° azimuth. Note how the center frequencies of the notches in the magnitude spectra shift as the elevation moves from −45° to +45° in the ipsilateral (right) ear. Such frequency notches are thought to be a primary cue for elevation (Hebrank & Wright, 1974; Middlebrooks, 1992). Since these notch locations are highly dependent on specific pinna structures, their particular frequency locations may vary greatly between individuals. Thus, elevation perception is generally observed to be more accurate when listening to stimuli generated from individualized HRTFs.
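
The IID-versus-frequency curves referred to here are simply the difference between the right- and left-ear HRTF magnitude spectra. The minimal sketch below computes that difference for the same stand-in HRIR pair used in the previous example; with measured HRTFs the result would show the complex frequency dependence visible in Figure 1.9.

```python
import numpy as np

def iid_spectrum(hrir_left, hrir_right, fs, n_fft=1024):
    """Interaural intensity difference in dB as a function of frequency,
    computed as the right-ear minus the left-ear HRTF magnitude (in dB)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    mag_left = 20 * np.log10(np.abs(np.fft.rfft(hrir_left, n_fft)) + 1e-12)
    mag_right = 20 * np.log10(np.abs(np.fft.rfft(hrir_right, n_fft)) + 1e-12)
    return freqs, mag_right - mag_left

fs = 48000
hrir_left = np.zeros(256);  hrir_left[30] = 0.5    # stand-in far-ear response
hrir_right = np.zeros(256); hrir_right[6] = 1.0    # stand-in near-ear response
freqs, iid_db = iid_spectrum(hrir_left, hrir_right, fs)
print(iid_db[:5])   # about +6 dB at every frequency for this toy pair;
                    # measured HRTFs give strongly frequency-dependent IIDs
```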

It should be noted that the spatial cues provided by HRTFs, especially those derived from simple anechoic (free-field or echoless) environments, are not the only cues likely to be necessary to achieve veridical localization in a virtual display. Anechoic simulation is merely a first step, allowing a systematic study of the technological requirements and perceptual consequences of synthesizing spatial cues by using a less complex, and therefore more tractable, stimulus. The section on Distance and Environmental Context Perception will discuss the impact of environmental cues on localization and the perception of distance and immersion in more detail.

Neural Plasticity in Sound Localization

In natural environments, neural systems must be continuously updated to reflect changes in sensory inputs and behavioral goals. Recent studies of sound localization have shown that adaptation and learning involve multiple mechanisms operating at different timescales and stages of processing, with other sensory and motor-related inputs playing a key role, and have provided evidence that neural processing is rapidly updated to reflect changes in sensory conditions.

Spatial Cue Remapping

A popular approach to studying the plasticity of auditory spatial processing has been to reversibly alter the relation between stimulus location and the binaural cues available. This can be easily achieved by occluding one ear so that the acoustical input is attenuated and delayed, thereby changing the IID and ITD values corresponding to each direction in space. In the barn owl, this procedure leads to adaptation to the abnormal binaural cues, with neurons shifting their sensitivity to these cues in a way that compensates for the effects of monaural occlusion (Gold & E. I. Knudsen, 2000; E. I. Knudsen, P. F. Knudsen & Esterly, 1984). Recent work has shown that mammals also possess the ability to developmentally remap spatial position onto abnormal IIDs, which can be observed both behaviorally and in the responses of neurons in the primary auditory cortex (Keating, Dahmen & King, 2015). This capacity to accommodate altered localization cues is not restricted to development; adult humans can learn to localize accurately using altered binaural (Bauer et al., 1966; Mendonça et al., 2013) or spectral localization cues (Hofman, Van Riswick & Van Opstal, 1998; Hofman & Van Opstal, 1998; Carlile & Blackman, 2014; Majdak, Walder & Laback, 2013). Interestingly, no aftereffect is seen in adult humans following adaptation to altered spatial cues, implying that different sets of spatial cues can be mapped onto the same location (Hofman et al., 1998; Hofman & Van Opstal, 1998; Carlile & Blackman, 2014; Majdak et al., 2013).

Spatial Cue Reweighting

In situations where some, but not all, of the spatial cues are altered, an alternative form of plasticity to cue remapping is to down-weight the spatial information provided by the altered cues and instead, to rely more on the cues that remain intact. In the specific case of monaural hearing loss, a number of studies have shown that sound localization behavior in mammals adapts both during development and adulthood by giving greater weight to the unchanged monaural spatial cues provided by the normal hearing ear (Agterberg et al., 2014; Kumpik, Kacelnik & King, 2010; Keating et al., 2013).

Importance of Behavioral Context

In addition to adapting to changes in the localization cues available, auditory spatial processing can be refined in situations where its behavioral importance is increased, even if the acoustical input remains the same. Studies in humans have shown that training-induced improvements in spatial processing are specific to individual binaural cues, though cue specificity may be asymmetric, with one study showing that IID training generalizes to an ITD task, but not vice versa (Sand & Nilsson, 2014). Training in adulthood can even reverse the negative impact of abnormal developmental experience on sound localization accuracy and on responses in the primary auditory cortex (Guo et al., 2012; Pan et al., 2011), and can improve sound localization performance in hearing-impaired populations (Firszt et al., 2015). Although training-dependent plasticity often takes place slowly, recent work indicates that feature-specific learning can occur rapidly in a task that involves spatial processing, which may reflect top-down biasing (Du et al., 2015).

Visual Influences on Auditory Spatial Plasticity

Sound sources are often visible as well as audible and the availability of visual information can improve the accuracy of sound localization estimates (Tabry, Zatorre & Voss, 2013) and even help to suppress echoes that are the consequence of listening in reverberant environments (Bishop, London & Miller, 2012). Binaural cue discrimination is also enhanced if subjects look toward the sound while keeping their head still (Maddox et al., 2014), adding evidence that eye position signals can modulate activity in the auditory system (Bulkin & Groh, 2012). Not surprisingly, auditory localization abilities can be altered if vision is impaired. The most commonly reported finding is that some blind individuals show superior auditory spatial perception relative to sighted control subjects (Hoover, Harris & Steeves, 2012; Lewald, 2013; Jiang, Stecker & Fine, 2014). Interestingly, as with adaptation to a unilateral hearing loss, more accurate sound localization in blind humans is associated with greater dependence on spectral cues (Voss et al., 2011). However, this superior use of spectral cues for localization in the horizontal plane appears to come at the cost of reduced ability to use these cues for localization in the vertical plane (Voss, Tabry & Zatorre, 2015).

Distance and Environmental Context Perception

Fundamental Cues to Sound Source Distance

The perceived distance of a sound source is mainly cued by the acoustic attributes of sound level and reverberation. Sound sources ordinarily reach a listener by not only an unobstructed wavefront (referred to as direct sound) but also by reflection off of and diffraction around nearby objects, such as the ground, walls or other surfaces (referred to as indirect sound, or reverberation). In fact, distance judgments involve a process of integrating multiple perceptual cues, including loudness (perceived level), timbre (primarily driven by spectral content), amplitude envelope (attack and decay), reverberation and cognitive familiarity.

The perceived distance of a sound source is tied to the perceived environment in which the sound source is heard; hence, the overall immersive experience of a sound event’s location relative to a listener is a multi-dimensional percept. We can model or identify the specific cause of reverberation in terms of the environmental context of a sound source: the surrounding physical surfaces that result in reverberation (either a room or an outdoor environment). The environmental context typically consists of multiple sound sources, including background sounds (“noise”) that combine with a specific sound source. The environmental context provides an important cognitive cue when it is familiar to the listener.

An important distinction is made between absolute and relative perception of the distance of a sound source. Absolute distance perception refers to a listener’s accuracy in estimating the distance of a sound source from the listener themselves, upon an initial exposure to a new or familiar sound. Relative distance perception refers to judgments of the distance of one virtual source relative to another, and benefits more strongly from listening to the source at different distances over time, perhaps within a particular environmental context. In most cases of spatial audio reproduction, one is typically more interested in relative distance judgments. Within a reasonable range, changing the overall volume level of a playback system still preserves the relative distance relationships between different virtual sound sources, an important advantage for audio production.

In the absence of other acoustic cues, the sound level of a sound source (and its interpretation as loudness) is the primary distance cue used by a listener. Coleman (1963, p. 302) stated in a review of cues for distance perception that, “It seems a truism to state that amplitude, or pressure, of the sound wave is a cue in auditory depth perception by virtue of attenuation of sound with distance.” From one perspective, auditory distance is learned from a lifetime of visual-aural observations, correlating the physical displacement of sound sources with corresponding increases or reductions in sound pressure level. This is likely the primary means we use for many everyday survival tasks, for instance, knowing when to step out of the way of an automobile coming from behind us. In isolation from other cues, which happens rarely in the real world, sound level probably plays a more important role as a cue to distance with unfamiliar sounds than with familiar sounds. Exposure to a particular sound source at different distances allows an integration of multiple cues over time for distance perception; but without this exposure, cues other than level (loudness) fall out of the equation.

The relationship between sound source distance and level changes at a listener can be predicted under anechoic conditions via the inverse square law for sound intensity reduction with increasing distance. In the absence of significant reflections, an omnidirectional point sound source’s level will fall almost exactly 6 dB for each successive doubling of distance from the source. If the sound source is not an omnidirectional point source but instead a line source, such as a freeway, then the level reduction is closer to 3 dB per doubling of distance. This illustrates the importance of characterizing the sound power profile of a modeled sound source dimensionally.

A theoretical problem with the inverse square law as an effective cue to distance is that perceptual scales of loudness are unaccounted for. Given that a perceptual scale is desired, when loudness is the only available distance cue a mapping in which the relative estimation of doubled distance follows “half-loudness” rather than “half-level” may be more effective. Studies have shown judgments of half-loudness to be roughly equivalent to judgments of doubled auditory distance (Stevens & Guirao, 1962; Begault, 1991). Based on the sone scale of loudness, a doubling of distance would then require a 10 dB rather than a 6 dB reduction in level.
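
The level-based rules discussed in the last two paragraphs can be summarized numerically: roughly 6 dB of attenuation per doubling of distance for a point source, about 3 dB per doubling for a line source, and about 10 dB per doubling when doubled distance is mapped to half-loudness on the sone scale. The minimal sketch below simply restates those rules; it is not a perceptual model.

```python
import numpy as np

def level_change_db(distance_ratio, mode="point"):
    """Level change in dB when moving from a reference distance to
    distance_ratio times that distance, under the three rules discussed in
    the text: inverse-square point source (-6 dB per doubling), line source
    (-3 dB per doubling), and a loudness-based mapping in which doubled
    distance corresponds to half-loudness (-10 dB per doubling)."""
    per_doubling = {"point": -6.02, "line": -3.01, "loudness": -10.0}[mode]
    return per_doubling * np.log2(distance_ratio)

for mode in ("point", "line", "loudness"):
    changes = [round(level_change_db(r, mode), 1) for r in (2, 4, 8)]
    print(f"{mode:8s} rule, 2x/4x/8x distance: {changes} dB")
```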

Level adjustments to create relative relationships between the loudness of different sounds are of course ubiquitous in the world of audio. A recording engineer, given the assignment to distribute a number of sound sources on different tracks of a multitrack recording to different apparent distances from a listener, would most probably accomplish the task intuitively by adjusting the volume of each track. Terminology taken from the visual world, such as “foreground” and “background”, is usually used in these contexts; most audio professionals don’t get more specific than this verbally, although in practice the distance-intensity relationships of a multitrack recording can be quite intricate.

In contrast to the study of physical acoustics or acoustical engineering, determining realistic relationships between the levels of different sound sources is frequently not desired in the art of sound design for film or music. A good sound designer must take into account both the emotional impact of a narrative and the constraints of a limited dynamic range, as opposed to representing realistically the sound levels of the real world. More often than not, the sound levels and corresponding auditory distance cues used in film sound express an artistic rather than realistic adjustment of speech, footsteps, gunshots, ambient sounds and the like. In music, recording engineers often use “close microphone” techniques in orchestral recordings to alter the balance of different instruments and vocalists, and in most popular music they create completely synthetic distance relationships between sound sources that would never exist in the real world. Nevertheless, to remain sensible to a listener, a specific artistic sound creation must still reference real-world conditions, which set the boundaries within which it operates.

Familiarity and Cognitive Cues to Distance

Distance cues can be modified as a function of expectation or familiarity with the sound source, especially with speech (Coleman, 1962; Gardner, 1969). Even with non-spatial cues, one can tell something about the distance of different sounds and their context; for instance, you can experience distance listening with one ear, and you can sometimes tell in a telephone conversation where a person is calling from, based on the type of background noise. But the inclusion of 3-D sound techniques and spatial reverberation cues can greatly increase the immersiveness of a simulation, as well as the overall quality and the available nuance in representing different environmental contexts.

A good example of the role of familiarity is a comparison of the experience of listening to sounds just before going to sleep, in an unfamiliar versus a familiar environmental context. In the familiar environment, say an apartment in a city, you know the distance of the sound of the local bus passing by and of the ticking clock in the kitchen. Although the bus is louder than the clock, familiarity allows distance estimations that would be reversed if sound level were the only cue. But when camping outdoors in an unfamiliar environment, for instance, the perceived distances of different unfamiliar animal noises are likely cued more by level.

Implementation of distance cues into a 3-D sound system requires an assessment of the cognitive associations for a given sound source. If the sound source is completely synthetic or unfamiliar, then a listener may need more time to familiarize themselves with the parametric changes in loudness and other cues that occur for different simulated distances. If the sound source is associated with a particular location from repeated listening experiences, the simulation of that distance will be easier than simulation of a distance that is unexpected or unfamiliar. Evidence exists that listeners are very good at distance estimates of talkers speaking at a particular level (Zahorik, 2002; Zahorik, Brungart & Bronkhorst, 2005). However, the manner of speaking can act as a cognitive cue to bias distance estimates. The average level of male speech at a distance of 1 meter ranges from 52 dB(A) for “casual” speech up to 89 dB(A) for a “shouted” voice (Pearsons, Bennett & Fidell, 1977), and there are spectral and phonetic differences resulting from different speaking levels that can act as cues. Whispering, caused by air pressure through the vocal tract without vibration of the vocal cords, can be as loud as speech, but is immediately associated with more intimate conversation at closer distances than normal speech.

Gardner (1969) conducted several studies for speech stimuli that illustrate the role of familiarity and expectation on estimated distance. In one experiment, categorical estimations were given by subjects of sound source positions at 0° azimuth, by choosing from numbered locations at 3, 10, 20 and 30 feet in an anechoic chamber. When loudspeaker playback of recorded normal speech was presented, the perceived distance was always a function of the sound pressure level at the listener instead of the actual location of the loudspeaker. But with a live person speaking inside the chamber, subjects based their estimates of distance on the manner of speaking rather than on the actual distance. Figure 1.10 shows an illustration of these results. Listeners overestimated the distance of shouting in reference to normal speech and underestimated the distance of whispering, although the opposite should have been true if intensity were the relevant cue.

Reverberation Cues

In reverberant environmental contexts, the ratio of direct to reverberant sound changes as a function of distance between a sound source and listener. Typically the inverse square law reduction in sound pressure only operates in the acoustic “near field” of a sound source where the level of a direct sound significantly exceeds that of the indirect sound. With increasing distance, the overall sound pressure of a sound source at a receiving point becomes increasingly made up of indirect as well as direct sounds. At a certain point, the sound source reaches a critical distance (also termed “reverberation distance” or “reverberation radius”), where the level of direct and reflected sound are the same. At locations at or beyond the critical distance, the overall level tends to be the same because the sound at the receiving point is made up principally of indirect sound. See Figure 1.11.

To effect a relatively crude (but effective) audio simulation of distance, it is possible to change the ratio of reverberant to direct sound directly by adjusting level controls on an audio mixer. This ratio is a measurement of the proportion of reflected-to-direct sound energy at a particular receiver location. As one moves away from a sound source in an enclosure, the level of the direct sound will decrease, while the reverberation level will remain constant. The interaction between reverberation and the direct sound level as a sound source varies in distance is too complex to allow a precise prediction of a distance percept as a function of the R/D ratio.
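
The interplay between the decaying direct sound and the roughly constant reverberant level can also be sketched numerically. The level values below are made up for illustration: the direct path follows the inverse square law, the reverberant level is held constant, and the distance at which the two are equal corresponds to the critical distance described above.

```python
import numpy as np

direct_level_at_1m = 80.0   # dB SPL at 1 m from the source (illustrative)
reverberant_level = 62.0    # dB SPL, roughly constant in the room (illustrative)

def direct_level(distance_m):
    """Direct-path level under the inverse square law (point source)."""
    return direct_level_at_1m - 20.0 * np.log10(distance_m)

for d in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
    r_over_d = reverberant_level - direct_level(d)   # reverberant-to-direct ratio
    print(f"{d:4.1f} m: direct {direct_level(d):5.1f} dB SPL, R/D = {r_over_d:+5.1f} dB")

# Critical distance: where direct and reverberant levels are equal (R/D = 0 dB).
critical_distance = 10 ** ((direct_level_at_1m - reverberant_level) / 20.0)
print(f"critical distance is about {critical_distance:.1f} m for these levels")
```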

The reverberant-to-direct sound (R/D) ratio has been cited in many studies as a cue to distance, with varying degrees of significance attributed to it (Coleman, 1963; von Békésy, 1960; Sheeline, 1983; Mershon & King, 1975; Mershon & Bowers, 1979). Sheeline (1983, p. 71) found that reverberation was an important adjunct to intensity in the formation of distance percepts, concluding that “reverberation provides the ‘spatiality’ that allows listeners to move from the domain of loudness inferences to the domain of distance inferences.”

Von Békésy (1960, p. 303) observed that when he changed the R/D ratio, the loudness of the sound remained constant, but a sensation of changing distance occurred. He noted that, “though this alteration in the ratio between direct and reverberant sound can indeed be used to produce the perception of a moving sound image, this ratio is not the basis of auditory distance. This is true because in [an anechoic room] the sensation of distance is present, and, in fact, is even more distinct and of much greater extensiveness than elsewhere.” He also observed that the sound image’s width increased with increasing reverberation: “Along with this increase in distance there was an apparent increase in the size or vibrating surface of the sound source … for the direct sound field the source seemed to be extremely small and have great density, whereas the reverberant field had a more diffuse character.” This demonstrates the multidimensional nature of sound source “locatedness” as a function of both level and context. The apparent source width of a sound source is defined as the perceived extent or size of its auditory image, and is related to the concept of auditory spaciousness. Blauert (1997, p. 348) describes auditory spaciousness as meaning that “… auditory events, in a characteristic way, are themselves perceived as being spread out in an extended region of space,” and cites low, time-varying interaural coherence (“temporal incoherence”) as a primary cause.

IHL

An often-reported sensation in headphone listening is that the sound image appears to exist entirely within or at the edge of the head, instead of being externalized outside the listener. Inside-the-head locatedness (IHL) can be considered a type of “externalization failure” of correct distance simulation, particularly for binaural and 3D sound simulations over headphones (the effect rarely occurs with loudspeaker reproduction). For example, when binaural cues are nearly identical in a 3D audio simulation, as with a source directly in front of the listener, the situation resembles diotic sound presentation, where the sound presented to the two ears is the same or nearly so. This is an artificial condition compared to the experience of sound sources outside a listener, and it leads to the cognitive conclusion that the sound originates within or near the body, like self-generated speech.

However, sound events can clearly be externalized even when listening with a single earphone, for example to a baseball game on an old-fashioned AM radio. The illusion of an “internalized” versus “externalized” sound image can in many cases be willfully switched, in a manner analogous to the Necker cube and similar illusions (von Békésy, 1960). One study of 3D audio simulation of speech stimuli showed that including reverberation or head tracking in the simulation helped mitigate internalized sound (Begault, Wenzel & Anderson, 2001).

Figure 1.12 shows how the distance of HRTF-processed speech, presented at a level corresponding to a distance of ~15 in., is underestimated, and how the estimates vary as a function of direction. This underestimation has also been observed with actual as opposed to virtual sound sources (Holt & Thurlow, 1969; Mershon & Bowers, 1979; Butler, Levy & Neff, 1980). One reason for the underestimation may be the absence of reverberation in the stimulus. Underestimation in general may also be related to the bounds of perceptual space, i.e., the auditory horizon. Note that the standard deviation bars of Figure 1.12 indicate a high degree of variability between subjects for what was essentially a single target distance; one might have expected more stable distance estimates among individuals, given the common familiarity of speech.

The goal of eliminating IHL effects arose in the 1970s with the desire to make improved binaural (dummy head) recordings. Many who heard these recordings were disturbed by the fact that the sound remained inside the head, as with lateralization. Ensuring that the sound was filtered by accurate replicas of the human pinnae and head was found to be an important consideration. Plenge (1974) had subjects compare recordings made with a single microphone to those made with a dummy head with artificial pinnae; the IHL that occurred with the single microphone disappeared with the dummy head miking arrangement. Laws (1973) and others determined that part of the reason for this had to do with non-linear distortions caused by various parts of the communication chain, and the use of free-field instead of diffuse-field equalized headphones. Durlach and Colburn (1978, p. 374) have mentioned that the externalization of a sound source is difficult to predict (or even describe) with precision, but “clearly, however, it increases as the stimulation approximates more closely stimulation that is natural.” The likely sources of these natural interaural attributes include the binaural HRTF, head movement and reverberation. Many researchers have discounted theories that IHL is a natural consequence of headphone listening (due to bone conduction or pressure on the head), simply because externalized sounds are heard through headphones in many instances.

Environmental Context Cues

The defining characteristics of an environmental context allow us to discriminate between the same source heard in different size enclosures or out-of-doors. Although the reverberation time and R/D ratio serve as cues, oftentimes the spatial distribution of early reflections over time from sounds heard within an environmental context can be just as important (Bronkhorst & Houtgast, 1999; Kendall & Martens, 1984). Such attributes can serve as cues for characterizing and identifying both the context of a sound source and its potential range of distances. The environment can also provide a characteristic background noise that can help define the range and context of sounds. Studies have also noted that listeners eventually adapt to reverberation in a room; compared to an initial exposure, both distance and azimuth estimates improve over time as the environmental context is “learned” (Shinn-Cunningham, 2000).

Different spatial-temporal patterns, particularly indoors, can affect distance perception and image broadening, as well as the sensation of being “immersed” or surrounded by sound, a percept sometimes referred to as “auditory spaciousness” (Beranek, 1992; Kendall, Martens & Wilde, 1990). By measuring the similarity of the reverberation at the two ears over a specific time window, a value for interaural cross-correlation can be obtained that is frequently cited in measures of immersion (Blauert & Cobben, 1978; Ando, 1985). Cross-correlation analysis indicates the degree of similarity of two time-domain waveforms over a given period of time. For example, the cross-correlation of a signal with a delayed version of itself will show a peak at the time lead or lag of the delay within an analysis window (Figure 1.13). In the case of interaural cross-correlation, the analysis window spans lead or lag times up to the maximum interaural time delay (typically about 0.7 milliseconds); a “running” interaural cross-correlation refers to a succession of such analysis windows. Perceptually, the magnitude of the overall differences between lead and lag within a running interaural cross-correlation corresponds to the percept of auditory spaciousness.
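A minimal sketch of such an interaural cross-correlation measurement, assuming the left- and right-ear signals are available as NumPy arrays and using a ±1 ms lag window (slightly wider than the roughly 0.7 ms maximum interaural delay noted above), might look like the following; the frame lengths and function names are assumptions for illustration.

```python
import numpy as np

def iacc(left: np.ndarray, right: np.ndarray, fs: float, max_lag_ms: float = 1.0) -> float:
    """Interaural cross-correlation coefficient: the maximum of the normalized
    cross-correlation of the two ear signals over lags within +/- max_lag_ms."""
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    full = np.correlate(left, right, mode="full")   # lags from -(N-1) to +(N-1)
    center = len(left) - 1                          # index of the zero-lag value
    window = full[center - max_lag: center + max_lag + 1]
    return float(np.max(np.abs(window)) / norm)

def running_iacc(left, right, fs, frame_ms=50.0, hop_ms=25.0):
    """Running IACC: a succession of short-time analysis windows, as in the text."""
    frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    return [iacc(left[i:i + frame], right[i:i + frame], fs)
            for i in range(0, len(left) - frame + 1, hop)]
```

Lower values of the running IACC (less similar ear signals) would correspond to greater auditory spaciousness, consistent with the low interaural coherence cited by Blauert.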

Relevant physical parameters that can cue a specific environmental context include the volume or size of an enclosure, the absorptiveness and diffusion of reflective surfaces, and the complexity of the shape of the enclosure. This effect occurs in “partially enclosed” environmental contexts as well, such as under an overhanging surface on the outside of a building. The size (volume) of an environmental context is usually cued by the reverberation time and level. The absorptiveness and diffusion of the reflective surfaces will be frequency dependent, allowing for cognitive categorization and comparison on the basis of timbral modification and possibly on the basis of speech intelligibility. Finally, the complexity of the shape of the enclosure will determine the spatial distribution of reflections reaching the listener, particularly the early reflections. A considerable literature exists in the domain of concert hall acoustics (see, e.g., Beranek, 1992) that relates physical measures to percepts, while there is less research for other typical environmental contexts.

Late Reverberation

Late reverberation is informative perceptually as to the volume of a particular space occupied by a sound source. This is most noticeable when a sound source is “turned off”, since one can hear the time it takes for the echoes to decay into relative silence, and the shape of the amplitude decay envelope over time. During continuous speech or music, only the first 10–20 dB of decay will be heard, since energy is constantly being “injected” back into the system. The long-term “room frequency response” (i.e., the reverberation time as a function of frequency) can also be significant in the formulation of a percept of the environmental context. The relative decay times for different frequency regions are affected by both the volume of the enclosure and the relative acoustic absorption of materials within the environment. For example, although a tiled bathroom has a smaller volume than a typical living room, the minimal acoustic absorption of the tiles causes the reverberation to be relatively brighter and the reverberation time to be longer compared to a living room with a carpet, couches and other absorptive surfaces.
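The bathroom versus living room comparison can be made concrete with a simple Sabine estimate, T60 ≈ 0.161·V/A, where V is the room volume in cubic meters and A is the total absorption in metric sabins; the surface areas and absorption coefficients below are assumed values chosen purely for illustration.

```python
def sabine_rt60(volume_m3: float, surfaces) -> float:
    """Sabine estimate T60 = 0.161 * V / A, where A (metric sabins) is the sum of
    each surface area times its absorption coefficient."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# Tiled bathroom vs. furnished living room (areas and coefficients are assumed):
bathroom = sabine_rt60(15.0, [(36.0, 0.04)])                            # hard tile everywhere
living = sabine_rt60(60.0, [(64.0, 0.10), (20.0, 0.30), (10.0, 0.50)])  # walls, carpet, couch
print(f"bathroom T60 ~ {bathroom:.2f} s, living room T60 ~ {living:.2f} s")
```

Despite its smaller volume, the hard-surfaced bathroom comes out with the longer estimated reverberation time, as described above.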

Concert hall studies since the beginning of the 20th century have emphasized reverberation time as a strong physical factor affecting subjective preference, with particular frequency ranges being important for “warmth” or “clarity”. Many enclosures in fact do not decay exponentially, although reverberation time is often used as a standard approximation. Many small enclosures, including automobiles, do not have a proper “reverberation time”; their “sound” is a function of a complex early reflection field. Another characteristic of reverberation in complex spaces is the irregular and random nature of the fine structure of its decay, resulting in a “ragged” response that can make determination of a single reverberation time difficult. A smoother plot of the response can be obtained through the technique of reverse integration of the decay (Schroeder, 1965). Figure 1.14, top, shows the “impulse response” of a room, obtained from a balloon pop (a starter pistol or an analytic signal such as a swept sine wave can also be used). Figure 1.14, bottom, shows the impulse response plotted on a decibel scale (10·log10 of the squared impulse response values) as the thin line; note the raggedness of the response. To determine a reverberation time, a straight line must be fit to 15, 20 or 30 decibels of decay (T15, T20 or T30). By reverse integration of the impulse response, a smoother decay is obtained, shown by the thick line, making the straight-line fit more apparent. The result of reverse integration is equivalent to averaging the decay curves of many impulse responses.
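A minimal sketch of Schroeder’s reverse-integration technique and a T20-style line fit, assuming the measured impulse response is available as a NumPy array ir sampled at rate fs, might look like the following; the -5 to -25 dB fitting range is one common choice rather than a fixed rule.

```python
import numpy as np

def schroeder_decay_db(ir: np.ndarray) -> np.ndarray:
    """Reverse-integrate the squared impulse response (Schroeder, 1965) and return
    the smoothed energy decay curve in dB, normalized to 0 dB at t = 0."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]      # backward (reverse) integration
    return 10.0 * np.log10(energy / energy[0])

def reverberation_time(ir: np.ndarray, fs: float, decay_db: float = 20.0) -> float:
    """Fit a straight line to the decay curve between -5 dB and -(5 + decay_db) dB
    (a T20 fit by default) and extrapolate the slope to a 60 dB decay."""
    edc = schroeder_decay_db(ir)
    t = np.arange(len(edc)) / fs
    fit = (edc <= -5.0) & (edc >= -(5.0 + decay_db))
    slope, _ = np.polyfit(t[fit], edc[fit], 1)   # decay rate in dB per second
    return -60.0 / slope
```

The backward integration corresponds to the thick, smoothed curve in Figure 1.14, over which a straight-line fit is far easier than over the ragged raw decay.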

An interesting phenomenon is the reverberation that occurs from the interaction of multiple enclosures that are connected to one another via hallways, atriums or other openings. This is referred to in acoustics as a “coupled space” phenomenon.

While reverberation time can be an adequate descriptor for a single enclosed space, the coupling of two volumes with different reverberation times can result in multiple decay slopes (Figure 1.15). This is most audible in large complex spaces that have adjoining volumes of different sizes. In some such spaces, such as a cathedral, the late reverberation is perceptibly modulated both in amplitude and spatial location (Woszczyk, Begault & Higbie, 2014).

For relatively distant sources, particularly out-of-doors, the influence of atmospheric conditions, molecular absorption of the air, and wavefront curvature can change the spectral content of a virtual sound. From a psychoacoustic standpoint, these cues are relatively weak, compared to loudness, familiarity and reverberation cues. Since the sound sources used in a spatial audio simulation are more than likely dynamically changing in location, their spectra are constantly changing as well, making it difficult to establish any type of “spectral reference” for perceived distance. “As one would expect on the basis of purely physical considerations, if one eliminates the cues arising from changes in loudness and changes in reverberant structure, the binaural system is exceedingly poor at determining the distance of a sound source” (Durlach & Colburn, 1978, p. 375).

Conclusion

A listener’s perception of immersion is influenced by a complex set of interactions between humans and acoustic waves, beginning with the peripheral auditory system and ending with the information processing aspects of cognition. Each of these interactions contributes to overall judgments of the spatial attributes of these sensations, including the sense of acoustic immersion and the location of specific sound sources. Virtual simulations of spatial audio are best realized when an understanding of the relevant psychoacoustic and acoustical cues is built into the signal processing design. This chapter has given an overview of the role of auditory processes and psychoacoustic data relevant to the execution of successful spatial audio techniques for providing listeners with many types of immersive experiences.

A recording engineer aims to convey spatial imagery, and will be challenged by an ever-increasing number of distribution formats and hardware: headphones or loudspeakers; traditional two-channel systems; 22.2, 9.1, 7.1 and other multichannel systems; wave field synthesis; streaming audio or high-quality archival compression; and so on. If the creative imagination of the engineer is the source of spatial imagery, and the listeners are the receivers of this imagery, then we can judge the quality and the challenges presented by a particular system by how successfully the intended creative imagery is conveyed. All of the perceptual factors discussed in this chapter will contribute to the resulting listener experience in a virtual auditory environment using any of the methods of sound reproduction that will be discussed in the rest of the book. Further, they will pose different challenges to different fields of expertise and practice, including the audio engineer, sound designer and digital signal processing effects designer.

Notes

1. For a specified signal, the Hearing Level (HL) refers to the amount in decibels by which the hearing threshold for a listener, for either one or two ears, exceeds a specified reference equivalent threshold level. The reference is based on a large number of otologically normal individuals of both genders ranging in age from 18 to 25 years. (ANSI/ASA S1.1–1994, S3.6–2010)

2. Alternatively, sometimes azimuth is represented in a 360° coordinate system with 0° in front, 90° to the right, 180° in back and 270° to the left.

References

ANSI/ASA S3.6–2010 “Specification for Audiometers”.

ANSI/ASA S1.1–1994 “Acoustical Terminology”.

Agterberg, M. J., Hol, M. K., Van Wanrooij, M. M., Van Opstal, A. J., & Snik, A. F. (2014). Single-sided deafness and directional hearing: Contribution of spectral cues and high frequency hearing loss in the hearing ear. Frontiers in Neuroscience, 8, 188.

Ando, Y. (1985). Concert Hall Acoustics. Berlin: Springer-Verlag.

Batteau, D. W. (1967). The role of the pinna in human localization. Proceedings of the Royal Society of London B: Biological Sciences, 168(1011), 158–180.

Bauer, R. W., Matuzsa, J. L., Blackmer, R. F., & Glucksberg, S. (1966). Noise localization after unilateral attenuation. Journal of the Acoustical Society of America, 40(2), 441–444.

Begault, D. R. (1991). Preferred sound intensity increase for sensation of half distance. Perceptual and Motor Skills, 72, 1019–1029.

Begault, D. R., & Wenzel, E. M. (1993). Headphone localization of speech. Human Factors, 35, 361–376.

Begault, D. R., Wenzel, E. M., & Anderson, M. R. (2001). Direct comparison of the impact of head tracking, reverberation and individualized head-related transfer functions on the spatial perception of a virtual speech source. Journal of the Audio Engineering Society, 49, 904–916.

Békésy, G. von. (1960). Experiments in Hearing. New York: McGraw-Hill.

Beranek, L. L. (1992). Concert hall acoustics. Journal of the Acoustical Society of America, 92, 1–39.

Bergeijk, W. A. van. (1962). Variation on a theme of Békésy: A model of binaural interaction. Journal of the Acoustical Society of America, 34(9B), 1431–1437.

Bernstein, L. R. (2001). Auditory processing of interaural timing information: New insights. Journal of Neuroscience Research, 66(6), 1035–1046.

Bernstein, L. R., & Trahiotis, C. (2010). Accounting quantitatively for sensitivity to envelope based interaural temporal disparities at high frequencies. Journal of the Acoustical Society of America, 128, 1224–1234.

Bishop, C. W., London, S., & Miller, L. M. (2012). Neural time course of visually enhanced echo suppression. Journal of Neurophysiology, 108(7), 1869–1883.

Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Sound Localization, Rev. ed. (J. Allen, Trans.). Cambridge, MA: MIT Press.

Blauert, J., & Cobben, W. (1978). Some consideration of binaural cross correlation analysis. Acustica, 39, 96–104.

Brigham, E. (1974). The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice-Hall.

Bronkhorst, A., & Houtgast, T. (1999). Auditory distance perception in rooms. Nature, 397, 517–520.

Brughera, A., Dunai, L., & Hartmann, W. M. (2013). Human interaural time difference thresholds for sine tones: The high-frequency limit. Journal of the Acoustical Society of America, 133, 2839–2855.

Bulkin, D. A., & Groh, J. M. (2012). Distribution of eye position information in the monkey inferior colliculus. Journal of Neurophysiology, 107(3), 785–795.

Butler, R. A., Levy, E. T., & Neff, W. D. (1980). Apparent distance of sounds recorded in echoic and anechoic chambers. Journal of Experimental Psychology: Human Perception and Performance, 6(4), 745.

Carlile, S., & Blackman, T. (2014). Relearning auditory spectral cues for locations inside and outside the visual field. Journal of the Association for Research in Otolaryngology, 15(2), 249–263.

Chase, S. M., & Young, E. D. (2006). Spike-timing codes enhance the representation of multiple simultaneous sound-localization cues in the inferior colliculus. Journal of Neuroscience, 26(15), 3889–3898.

Coleman, P. D. (1962). Failure to localize the source distance of an unfamiliar sound. Journal of the Acoustical Society of America, 34(3), 345–346.

Coleman, P. D. (1963). An analysis of cues to auditory depth perception in free space. Psychological Bulletin, 60(3), 302–315.

Culling, J. F., Hawley, M. L., & Litovsky, R. Y. (2004). The role of head-induced interaural time and level differences in the speech reception threshold for multiple interfering sound sources. Journal of the Acoustical Society of America, 116(2), 1057–1065.

Douglas, K. M., & Bilkey, D. K. (2007). Amusia is associated with deficits in spatial processing. Nature Neuroscience, 10(7), 915–921.

Du, Y., He, Y., Arnott, S. R., Ross, B., Wu, X., Li, L., & Alain, C. (2015). Rapid tuning of auditory “what” and “where” pathways by training. Cerebral Cortex, 25(2), 496–506.

Durlach, N. I., & Colburn, H. S. (1978). Binaural phenomena. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of Perception (pp. 365–466). New York: Academic Press.

Feddersen, W. E., Sandel, T. T., Teas, D. C., & Jeffress, L. A. (1957). Localization of high frequency tones. Journal of the Acoustical Society of America, 29(9), 988–991.

Firszt, J. B., Reeder, R. M., Dwyer, N. Y., Burton, H., & Holden, L. K. (2015). Localization training results in individuals with unilateral severe to profound hearing loss. Hearing Research, 319, 48–55.

Fisher, H. G., & Freedman, S. J. (1968). The role of the pinna in auditory localization. Journal of Auditory Research, 8(1), 15–26.

Foster, S. (1986). Impulse response measurement using Golay codes. IEEE 1986 Conference on Acoustics, Speech and Signal Processing, 2, 929–932. New York: IEEE.

Fuster, J. M. (2009). Cortex and memory: Emergence of a new paradigm. Journal of Cognitive Neuroscience, 21(11), 2047–2072.

Gardner, M. B. (1969). Distance estimation of 0 degree or apparent 0 degree‑oriented speech signals in anechoic space. Journal of the Acoustical Society of America, 45, 47–53.

Gardner, M. B., & Gardner, R. S. (1973). Problem of localization in the median plane: Effect of pinnae cavity occlusion. Journal of the Acoustical Society of America, 53, 400–408.

Gifford III, G. W., & Cohen, Y. E. (2005). Spatial and non-spatial auditory processing in the lateral intraparietal area. Experimental Brain Research, 162(4), 509–512.

Gold, J. I., & Knudsen, E. I. (2000). Abnormal auditory experience induces frequency-specific adjustments in unit tuning for binaural localization cues in the optic tectum of juvenile owls. Journal of Neuroscience, 20(2), 862–877.

Gulick, W. L., Gescheider, G. A., & Frisina, R. D. (1989). Hearing: Physiological Acoustics, Neural Coding, and Psychoacoustics. New York: Oxford University Press, Inc.

Guo, F., Zhang, J., Zhu, X., Cai, R., Zhou, X., & Sun, X. (2012). Auditory discrimination training rescues developmentally degraded directional selectivity and restores mature expression of GABA A and AMPA receptor subunits in rat auditory cortex. Behavioural Brain Research, 229(2), 301–307.

Hebrank, J., & Wright, D. (1974). Spectral cues used in the localization of sound sources on the median plane. Journal of the Acoustical Society of America, 56, 1829–1834.

Henning, G. B. (1974). Detectability of interaural delay in high-frequency complex waveforms. Journal of the Acoustical Society of America, 55(1), 84–90.

Henning, G. B. (1980). Some observations on the lateralization of complex waveforms. Journal of the Acoustical Society of America, 68, 446–454.

Hofman, P. M., & Van Opstal, A. J. (1998). Spectro-temporal factors in two-dimensional human sound localization. Journal of the Acoustical Society of America, 103(5), 2634–2648.

Hofman, P., & Van Opstal, A. (2003). Binaural weighting of pinna cues in human sound localization. Experimental Brain Research, 148(4), 458–470.

Hofman, P. M., Van Riswick, J. G. A., & Van Opstal, A. J. (1998). Relearning sound localization with new ears. Nature Neuroscience, 1(5), 417–421.

Holt, R. E., & Thurlow, W. R. (1969). Subject orientation and judgement of distance of a sound source. Journal of the Acoustical Society of America, 46, 1584–1585.

Hoover, A. E., Harris, L. R., & Steeves, J. K. (2012). Sensory compensation in sound localization in people with one eye. Experimental Brain Research, 216(4), 565–574.

Jeffress, L. A. (1948). A place theory of sound localization. Journal of Comparative and Physiological Psychology, 41, 35–39.

Jiang, F., Stecker, G. C., & Fine, I. (2014). Auditory motion processing after early blindness. Journal of Vision, 14(13), 4.

Keating, P., Dahmen, J. C., & King, A. J. (2015). Complementary adaptive processes contribute to the developmental plasticity of spatial hearing. Nature Neuroscience, 18(2), 185–187.

Keating, P., & King, A. J. (2013). Developmental plasticity of spatial hearing following asymmetric hearing loss: Context-dependent cue integration and its clinical implications. Frontiers in Systems Neuroscience, 7, doi: 10.3389/fnsys.2013.00123.

Kendall, G. S., & Martens, W. L. (1984). Simulating the cues of spatial hearing in natural environments. Proceedings of the 1984 International Computer Music Conference. San Francisco: International Computer Music Association.

Kendall, G., Martens, W. L., & Wilde, M. D. (1990). A spatial sound processor for loudspeaker and headphone reproduction. Proceedings of the AES 8th International Conference. New York: Audio Engineering Society.

King, A. J. (2005). Multisensory integration: Strategies for synchronization. Current Biology, 15(9), 339–341.

King, A. J., & Middlebrooks, J. C. (2011). Cortical representation of auditory space. In J. A. Winer & C. E. Schreiner (Eds.), The Auditory Cortex (pp. 329–341). New York: Springer.

King, A. J., & Palmer, A. R. (1983). Cells responsive to free-field auditory stimuli in guinea-pig superior colliculus: Distribution and response properties. Journal of Physiology, 342(1), 361–381.

Knudsen, E. I., Knudsen, P. F., & Esterly, S. D. (1984). A critical period for the recovery of sound localization accuracy following monaural occlusion in the barn owl. Journal of Neuroscience, 4(4), 1012–1020.

Kuhn, G. F. (1977). Model for the interaural time differences in the azimuthal plane. Journal of the Acoustical Society of America, 62(1), 157–167.

Kumpik, D. P., Kacelnik, O., & King, A. J. (2010). Adaptive reweighting of auditory localization cues in response to chronic unilateral earplugging in humans. Journal of Neuroscience, 30(14), 4883–4894.

Larsen, E., Iyer, N., Lansing, C. R., & Feng, A. S. (2008). On the minimum audible difference in direct-to-reverberant energy ratio. Journal of the Acoustical Society of America, 124(1), 450–461.

Laws, P. (1973). Auditory distance perception and the problem of “in-head localization” of sound images [Translation of “Entfernungshören und das Problem der Im-Kopf-Lokalisiertheit von Hörereignissen.” Acustica, 29, 243–259]. NASA Technical Translation TT—20833.

Lewald, J. (2013). Exceptional ability of blind humans to hear sound motion: Implications for the emergence of auditory space. Neuropsychologia, 51(1), 181–186.

Lewald, J., Riederer, K. A., Lentz, T., & Meister, I. G. (2008). Processing of sound location in human cortex. European Journal of Neuroscience, 27(5), 1261–1270.

Lopez-Poveda, E., Fay, R. R., & Popper, A. N. (2010). Computational Models of the Auditory System. New York: Springer Verlag.

Lord Rayleigh (Strutt, J. W.) (1907). XII: On our perception of sound direction. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 13(74), 214–232.

Maddox, R. K., Pospisil, D. A., Stecker, G. C., & Lee, A. K. (2014). Directing eye gaze enhances auditory spatial cue discrimination. Current Biology, 24(7), 748–752.

Magezi, D. A., & Krumbholz, K. (2010). Evidence for opponent-channel coding of interaural time differences in human auditory cortex. Journal of Neurophysiology, 104(4), 1997–2007.

Majdak, P., Walder, T., & Laback, B. (2013). Effect of long-term training on sound localization performance with spectrally warped and band-limited head-related transfer functions. Journal of the Acoustical Society of America, 134(3), 2148–2159.

Meddis, R., & Lopez-Poveda, E. A. (2010). Auditory periphery: From pinna to auditory nerve. In Meddis et al. (Eds.), Computational Models of the Auditory System (pp. 7–38). New York: Springer.

Mendonça, C., Campos, G., Dias, P., & Santos, J. A. (2013). Learning auditory space: Generalization and long-term effects. PloS one, 8(10), e77900.

Mershon, D. H., & Bowers, J. N. (1979). Absolute and relative cues for the auditory perception of egocentric distance. Perception, 8(3), 311–322.

Mershon, D. H., & King, L. E. (1975). Intensity and reverberation as factors in the auditory perception of egocentric distance. Perception & Psychophysics, 18(6), 409–415.

Middlebrooks, J. C. (1992). Narrow-band sound localization related to external ear acoustics. Journal of the Acoustical Society of America, 92, 2607–2624.

Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. Annual Review of Psychology, 42(1), 135–159.

Miller, J. D., Godfroy-Cooper, M., & Wenzel, E. M. (2014). Using published HRTFS with Slab3D: Metric-based database selection and phenomena observed. Proceedings of the International Conference on Auditory Display, New York, June 2014.

Mills, A. W. (1958). On the minimum audible angle. Journal of the Acoustical Society of America, 30, 237.

Mills, A. W. (1972). Auditory localization (Binaural acoustic field sampling, head movement and echo effect in auditory localization of sound sources position, distance and orientation). Foundations of Modern Auditory Theory, 2, 303–348.

Moerel, M., De Martino, F., Santoro, R., Ugurbil, K., Goebel, R., Yacoub, E., & Formisano, E. (2013). Processing of natural sounds: Characterization of multipeak spectral tuning in human auditory cortex. Journal of Neuroscience, 33(29), 11888–11898.

Moore, B. C. J., Oldfield, S. R., & Dooley, G. (1989). Detection and discrimination of spectral peaks and notches at 1 and 8 kHz. Journal of the Acoustical Society of America, 85, 820–836.

Musicant, A. D., & Butler, R. A. (1985). Influence of monaural spectral cues on binaural localization. Journal of the Acoustical Society of America, 77(1), 202–208.

Oldfield, S. R., & Parker, S. P. (1984a). Acuity of sound localization: A topography of auditory space: I: Normal hearing conditions. Perception, 13, 581–600.

Oldfield, S. R., & Parker, S. P. (1984b). Acuity of sound localization: A topography of auditory space: II: Pinna cues absent. Perception, 13, 601–617.

Pan, Y., Zhang, J., Cai, R., Zhou, X., & Sun, X. (2011). Developmentally degraded directional selectivity of the auditory cortex can be restored by auditory discrimination training in adults. Behavioural Brain Research, 225(2), 596–602.

Par, S. van de, & Kohlrausch, A. (1997). A new approach to comparing binaural masking level differences at low and high frequencies. Journal of the Acoustical Society of America, 101(3), 1671–1680.

Patterson, R. D., Nimmo-Smith, I., Holdsworth, J., & Rice, P. (1987). An efficient auditory filterbank based on the gammatone function. A Meeting of the IOC Speech Group on Auditory Modelling at RSRE, 2(7). Retrieved from http://www.pdn.cam.ac.uk/other-pages/cnbh/files/publications/SVOSAnnexB1988.pdf.

Pearsons, K. S., Bennett, R. L., & Fidell, S. (1977). Speech Levels in Various Noise Environments. Office of Health and Ecological Effects, Office of Research and Development, US EPA.

Perrott, D. R., & Saberi, K. (1990). Minimum audible angle thresholds for sources varying in both elevation and azimuth. The Journal of the Acoustical Society of America, 87(4), 1728–1731.

Perrott, D. R., & Tucker, J. (1988). Minimum audible movement angle as a function of signal frequency and the velocity of the source. Journal of the Acoustical Society of America, 83, 1522.

Plakke, B., & Romanski, L. M. (2014). Auditory connections and functions of prefrontal cortex. Frontiers in Neuroscience, 8, 199. Retrieved from http://doi.org/10.3389/fnins.2014.00199

Plenge, G. (1974). On the differences between localization and lateralization. Journal of the Acoustical Society of America, 56, 944–951.

Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences, 97(22), 11800–11806.

Rauschecker, J. P., Tian, B., & Hauser, M. (1995). Processing of complex sounds in the macaque nonprimary auditory cortex. Science, 268(5207), 111–114.

Richardson, G. P., Lukashkin, A. N., & Russell, I. J. (2008). The tectorial membrane: One slice of a complex cochlear sandwich. Current Opinion in Otolaryngology & Head and Neck Surgery, 16(5), 458.

Romanski, L. M., & Goldman-Rakic, P. S. (2002). An auditory domain in primate prefrontal cortex. Nature Neuroscience, 5(1), 15–16.

Saenz, M., & Langers, D. R. (2014). Tonotopic mapping of human auditory cortex. Hearing Research, 307, 42–52.

Salminen, N. H., Tiitinen, H., Yrttiaho, S., & May, P. J. (2010). The neural code for interaural time difference in human auditory cortex. Journal of the Acoustical Society of America, 127(2), 60–65.

Sand, A., & Nilsson, M. E. (2014). Asymmetric transfer of sound localization learning between indistinguishable interaural cues. Experimental Brain Research, 232(6), 1707–1716.

Schroeder, M. R. (1965). New method of measuring reverberation time. Journal of the Acoustical Society of America, 37, 409–412.

Shaw, E. A. G. (1974). The external ear. In W. D. Keidel & W. D. Neff (Eds.), Handbook of Sensory Physiology, Vol. 5/1, Auditory System (pp. 455–490). New York: Springer-Verlag.

Sheeline, C. W. (1983). An Investigation of the Effects of Direct and Reverberant Signal Interaction on Auditory Distance Perception, Doctoral dissertation, Stanford University.

Shinn-Cunningham, B. (2000). Learning reverberation: Considerations for spatial auditory displays. Proceedings of the International Conference on Auditory Display, Atlanta, Georgia USA, April 2000, 126–134.

Shinn-Cunningham, B. G., Kopco, N., & Martin, T. J. (2005). Localizing nearby sound sources in a classroom: Binaural room impulse responses. Journal of the Acoustical Society of America, 117(5), 3100–3115.

Shub, D. E., Durlach, N. I., & Colburn, H. S. (2008). Monaural level discrimination under dichotic conditions. Journal of the Acoustical Society of America, 123(6), 4421–4433.

Stevens, S. S., & Guirao, M. (1962). Loudness, reciprocality, and partition scales. Journal of the Acoustical Society of America, 34, 1466–1471.

Tabry, V., Zatorre, R. J., & Voss, P. (2013). The influence of vision on sound localization abilities in both the horizontal and vertical planes. Frontiers in Psychology, 4, 932. doi: 10.3389/fpsyg.2013.00932

Thurlow, W. R., & Runge, P. S. (1967). Effect of induced head movements on localization of direction of sounds. Journal of the Acoustical Society of America, 42, 480–488.

Voss, P., Lepore, F., Gougoux, F., & Zatorre, R. J. (2011). Relevance of spectral cues for auditory spatial processing in the occipital cortex of the blind. Frontiers in Psychology, 2(48). doi: 10.3389/fpsyg.2011.00048

Voss, P., Tabry, V., & Zatorre, R. J. (2015). Trade-off in the sound localization abilities of early blind individuals between the horizontal and vertical planes. Journal of Neuroscience, 35(15), 6051–6056.

Wallach, H. (1939). On sound localization. Journal of the Acoustical Society of America, 10(4), 270–274.

Wallach, H. (1940). The role of head movements and vestibular and visual cues in sound localization. Journal of Experimental Psychology, 27, 339–368.

Wenzel, E. M. (1991). Three-dimensional Virtual Acoustic Displays. NASA-Ames Research Center, NASA Technical Memorandum 103835.

Wenzel, E. M., Arruda, M., Kistler, D. J., & Wightman, F. L. (1993). Localization using nonindividualized head-related transfer functions. Journal of the Acoustical Society of America, 94, 111.

Wenzel, E. M., Miller, J. D., & Abel, J. S. (2000). Sound lab: A real-time, software-based system for the study of spatial hearing. Proceedings of the 108th Audio Engineering Society Convention. New York: Audio Engineering Society.

Wightman, F. L., & Kistler, D. J. (1989). Headphone simulation of free-field listening: II: Psychophysical validation. Journal of the Acoustical Society of America, 85, 868–878.

Wightman, F. L., & Kistler, D. J. (1997). Monaural sound localization revisited. Journal of the Acoustical Society of America, 101(2), 1050–1063.

Wightman, F. L., & Kistler, D. J. (1999). Resolution of front-back ambiguity in spatial hearing by listener and source movement. Journal of the Acoustical Society of America, 105, 2841–2853.

Wilson, R. H., & Margolis, R. H. (1999). Acoustic-reflex measurements. In F. E. Musiek & F. E. Rintelmann (Eds.), Contemporary Perspectives in Hearing Assessment, 1, 131. Boston, MA: Allyn and Bacon.

Woodworth, R. S. (1938). Experimental Psychology. New York: Holt.

Woodworth, R. S., & Schlosberg, H. (1954). Experimental Psychology. New York: Holt.

Woszczyk, W., Begault, D. R., & Higbie, A. G. (2014). Comparison and contrast of reverberation measurements in Grace Cathedral San Francisco. Audio Engineering Society 137th Convention, ebrief 178.

Yin, T. C. (2002). Neural mechanisms of encoding binaural localization cues in the auditory brainstem. In D. Oertel, R. R. Fay & A. N. Popper (Eds.), Integrative Functions in the Mammalian Auditory Pathway (pp. 99–159). New York: Springer.

Yost, W. A., & Dye, R. H. (1997). Fundamentals of directional hearing. Seminars in Hearing, 18(4), 321–344. New York: Thieme Medical Publishers.

Zahorik, P. (2002). Assessing auditory distance perception using virtual acoustics. Journal of the Acoustical Society of America, 111, 1832–1846.

Zahorik, P., Brungart, D. S., & Bronkhorst, A. W. (2005). Auditory distance perception in humans: A summary of past and present research. Acta Acustica United with Acustica, 91, 409–420.
