Chapter 16

Contribution of Noise Reduction Algorithms

Perception Versus Localization Simulation in the Case of Binaural Cochlear Implant (BCI) Coding

Arnaud Jeanvoine1,2,4; Dan Gnansia4; Eric Truy1,2,5; Christian Berger-Vachon1,2,3    1 INSERM U1028, Lyon Neuroscience Research Center, DYCOG Team/ PACS Group (Speech, Audiology, Communication Health), Lyon, France
2 CNRS UMR5292, Lyon Neuroscience Research Center, DYCOG Team/ PACS Group (Speech, Audiology, Communication Health), Lyon, France
3 University Lyon 1, Lyon, France
4 Neurelec, Vallauris, France
5 Audiology and ORL Department, Edouard Herriot Hospital, Lyon, France

Abstract

Communication and warning are two basic tasks of the auditory system, and it is worth considering them together when assistive techniques for hearing rehabilitation are evaluated. This chapter compares French phoneme recognition in noisy conditions with acoustic source localization, in the case of a cochlear implant (CI) coding simulation.

Three binaural noise reduction systems were considered: the Beamformer algorithm, and Doerbecker's processing combined either with the Ephraim and Malah noise estimator or with Scalart's noise reduction strategy. This study was conducted with twenty normal-hearing subjects.

Results show that the Beamformer algorithm and Doerbecker's processing improved the phoneme recognition scores, with the best recognition obtained using the Beamformer. On the other hand, both approaches degraded source localization. A small reinjection of the input signal (20%) was beneficial with the Beamformer algorithm; this improvement was not seen with Doerbecker's processing.

Keywords

Binaural signal processing

binaural cochlear implant (BCI)

vocoder simulation

speech recognition

localization

Acknowledgments

This work was supported by a French CIFRE contract with the Neurelec society and the CNRS. The Phonak company supplied the KEMAR head. We are grateful to the listeners who participated in this experiment. We thank Professors Lionel Collet, Hung Thai Van and Olivier Bertrand, as well as Dr. Evelyne Veuillet for providing the facilities to conduct this experiment. We also appreciated the useful discussions with the PACS group members. Finally, our gratitude goes to the anonymous referees; their comments greatly helped us improve the manuscript.

1 Introduction

Hearing in humans has two important purposes: communication (speech recognition) and warning (acoustic source localization). The influence of assistive technology (deafness rehabilitation) on these two basic functions is worth studying, and this applies to cochlear implants (CIs) as well. Nowadays, CIs are widely used for the rehabilitation of profound deafness, but speech perception in noisy environments still remains a challenge for hearing-impaired subjects and for persons using a CI (implantees) (Dunn et al., 2010); much work is currently being done in this field of hearing rehabilitation (Blauert, 2013; van Dijk et al., 2012; Yousefian and Loizou, 2013).

Binaural hearing uses two main acoustic cues: interaural time difference (ITD) and interaural level difference (ILD) (Doerbecker and Ernst, 1996; Francart et al., 2011):

• ITD is the delay between the signals reaching the two ears. It is an efficient cue at low frequencies (below 850 Hz) and is also carried by the envelope of the signal reaching the two ears. As a reminder, a sound coming from the side (azimuth 90°) produces an ITD of about 0.6 ms; when the source is situated in the front (azimuth 0°), the so-called front target, the ITD is 0 ms.

• ILD is the intensity difference between the two ears: the signal is more or less attenuated by the head shadow. This effect is mostly perceptible at high frequencies (above 3 kHz). The ILD is 0 for the front target.
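The ITD figure quoted above follows directly from the plane-wave geometry used later in section 2.1.2 (ITD = d·sin θ / c). The helper below is our own illustration, not part of the chapter's software, using the ear spacing d = 20 cm and sound velocity c = 340 m/s given in the text:

```python
import math

# Illustrative helper (not from the chapter's Matlab programs):
# plane-wave ITD = d*sin(theta)/c, with d = 20 cm and c = 340 m/s.
def itd_ms(theta_deg, d=0.2, c=340.0):
    """Interaural time difference, in milliseconds, at azimuth theta."""
    return 1000.0 * d * math.sin(math.radians(theta_deg)) / c

print(round(itd_ms(90), 2))  # lateral source: ~0.59 ms, the ~0.6 ms cited
print(itd_ms(0))             # front target: 0.0 ms
```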

In the case of binaural cochlear implants (BCIs), two CIs are used and a single processor drives them; the ITD can thus be exploited in a coordinated process (van Hoesel et al., 2009; Lawson et al., 1998).

Several noise reduction algorithms favor the signal coming from the front (Van den Bogaert et al., 2008, 2009). Their aim is to improve speech intelligibility, but localization should be considered as well: when an algorithm improves speech perception (for instance, by focusing on a source situated in the front), the localization of other sound sources may be affected. Furthermore, some reinjection of the input signal has been suggested by Van den Bogaert in the case of hearing aids (Van den Bogaert et al., 2008), in order to partially restore the localization of the sound source; this effect is examined in the current work. This strategy is worth considering in the case of CI coding (Loizou, 2006).

Several binaural algorithms are classically used for noise reduction. The Beamformer favors a given direction through delay processing (Bouchard et al., 2009; Kompis and Dillier, 2001). Spectral subtraction, based on an estimation of the noise followed by attenuation of the spectral bands containing it, together with the widely used Wiener filtering, may also improve the signal-to-noise ratio (SNR) (Van den Bogaert et al., 2009; Kallel et al., 2012; Yang and Fu, 2005).

In the case of CIs, additional processing is needed to adapt the acoustic signal to the electric stimulation. In the current study, a vocoder simulation was used to represent CI coding (Loizou, 2006). Cochlear implantees cannot use some localization cues, such as the ITD, which is lost (Dorman and Loizou, 1997; Lawson et al., 1998). Here, normal-hearing subjects were tested in order to estimate the influence of the noise reduction algorithms after CI coding; ultimately, the results will have to be confirmed with implantees, once enough BCI users are available to form a homogeneous population. Meanwhile, coding the signal according to CI processing and testing a homogeneous population of normal-hearing listeners is worth considering (Dorman and Loizou, 1997; Kerber and Seeber, 2012).

The goal of the present study is to evaluate the influence of the noise reduction systems suggested by Matthias Doerbecker on the localization of a sound source, in a BCI context, compared to the Beamformer algorithm. Doerbecker's processing is classically used in conventional hearing aids. A sine vocoder was used to simulate BCI speech processing.

The technical aspects are indicated in section 2. Results are then presented and discussed in sections 3 and 4. Finally, salient features are outlined in the conclusion of the chapter.

2 Materials and Methods

2.1 Signal processing

2.1.1 Overall organization

The synoptic representation of the signal acquisition is given in Figure 16.1. The speech signal was emitted by the front loudspeaker (0°), and the noise was presented from a loudspeaker (LS) array; the LSs were situated at angles θ ranging from −90° to 90°. The acoustic signal was captured by the two microphones (m1 and m2).

Figure 16.1 Synoptic of the experiment; the algorithm used is the Beamformer or either of Doerbecker’s strategies.

In addition, a percentage (α) of the input signal was reinjected after the noise reduction algorithms (Van den Bogaert et al., 2008). The values of α were fixed at 0.0, 0.2, and 0.4 (0%, 20%, and 40%).

The basic reinjection formula used in this work was

s′(t) = (1 − α)·s(t) + α·e(t),  (16.1)

where e(t) is the input signal, s(t) is the signal leaving the noise reduction processor, s′(t) is the output signal entering the vocoder, α is the reinjection fraction (0%, 20%, or 40%), and s″(t) is the output signal presented to the listeners.
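Eq. (16.1) is a simple linear mix of the processed and raw signals. As a minimal sketch (the array names are ours, not the chapter's code):

```python
import numpy as np

def reinject(s, e, alpha):
    """Eq. (16.1): s'(t) = (1 - alpha)*s(t) + alpha*e(t).
    alpha is the reinjection fraction (0.0, 0.2, or 0.4 in this study)."""
    return (1.0 - alpha) * np.asarray(s) + alpha * np.asarray(e)

e = np.array([1.0, -1.0, 1.0])   # toy raw input signal
s = np.zeros(3)                  # toy noise-reduced signal
print(reinject(s, e, 0.2))       # 20% of the raw input is restored
```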

2.1.2 Beamformer

The schematic representation of the Beamformer is given in Figure 16.2.

Figure 16.2 Synoptic representation of the beamformer algorithm

The signal emitted by the sound source is captured by two microphones, m1 and m2 (right and left), situated on the ears of a Knowles Electronics Manikin for Acoustical Research (KEMAR) head. The classical Beamformer (BF) response is a cardioid centered on the front loudspeaker, LS3 (Bouchard et al., 2009). The BF algorithm, presented next, favors the front target (0° azimuth); the experiment was conducted in the horizontal plane.

The input signal e(t) reaching an ear can be represented (in the complex plane), in the case of a sine wave, by

e(t) = A·e^(jω(t − τ)) = A·e^(jω(t − x/c)) = A·e^(j(ωt − 2πx/(cT))) = A·e^(j(ωt − 2πx/λ)) = A·e^(j(ωt − Φ)),  (16.2)

where A is the signal magnitude, x is the distance between the source and the microphone (x1 and x2 for the two sides), c is the sound velocity in air (340 m/s), T is the period, λ = c·T is the wavelength, τ is the propagation time between the source and the microphone, d is the distance between the two ears (20 cm), Φ is the phase delay between the source and the microphone, and θ is the orientation angle (azimuth) of the sound source.

The phase difference between the signals reaching the two ears is

ΔΦ = (2π/λ)·(x1 − x2) = (2π·d/λ)·sin θ.

The beamformer algorithm (based on the phase information) is mainly helpful when the phase difference is unambiguous, that is, when

ΔΦ = (2π·d/λ)·sin θ ≤ π,

leading to fc ≤ 850 Hz (850 Hz is obtained for θ = π/2). The maximum delay between the two ears represents about 10 samples at the sampling frequency fs = 16 kHz.

In the Beamformer strategy, the low-frequency part of the signal (below 850 Hz) is processed: the signals coming from the two microphones are added. The high-frequency components of the signal are then restored for further treatment.
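Under our reading of this description, the band-split strategy can be sketched in the frequency domain. The code below is an assumption about the implementation (not the authors' Matlab code): the two microphone signals are summed below 850 Hz, and the high-frequency part of line 1 is restored unchanged.

```python
import numpy as np

def beamformer(e1, e2, fs=16000, fc=850.0):
    """Sketch of the band-split Beamformer described in the text:
    sum the two microphone signals below fc (where the interaural phase
    is unambiguous) and keep the high band of the first line as-is."""
    n = len(e1)
    E1, E2 = np.fft.rfft(e1), np.fft.rfft(e2)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low = freqs <= fc
    S = np.where(low, 0.5 * (E1 + E2), E1)  # sum low band, keep high band
    return np.fft.irfft(S, n=n)

# A 500-Hz tone arriving in phase (front target) passes unchanged,
# while the same tone in opposite phase (lateral interferer) cancels.
t = np.arange(1024) / 16000.0
tone = np.sin(2 * np.pi * 500 * t)
print(np.allclose(beamformer(tone, tone), tone, atol=1e-9))   # True
print(np.max(np.abs(beamformer(tone, -tone))) < 1e-9)         # True
```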

2.1.3 Doerbecker’s processing

Doerbecker’s processing is a noise reduction procedure enhancing the sound coming from the front; it is based on spectral subtraction followed by adaptive Wiener filtering (Figure 16.3). It begins with an estimation of the noise and is mostly based on spectral considerations.

Figure 16.3 The main steps of Doerbecker’s algorithm.

In Doerbecker’s processing, the correction following the noise estimation uses one of two methods. The first is the Ephraim and Malah correction (Ephraim and Malah, 1984), a binaural process reducing the musical noise introduced by the spectral subtraction. The second is Scalart’s correction (Scalart and Filho, 1996), which selects, on each channel, the best SNR and is similar to Wiener’s algorithm (Van den Bogaert et al., 2009).

The steps of Doerbecker’s processing are as follows:

1. A short-term fast Fourier transform (FFT) is applied to the signals captured by the two microphones m1 and m2, leading to two processing lines, one on each side:

E1(f) = FFT[e1(t)],  E2(f) = FFT[e2(t)],

where E1(f) and E2(f) are the spectral components at frequency f (amplitude and phase) in the complex plane.

2. The estimation of the noise spectrum is obtained by subtracting the cross-power spectrum from the power spectra, as follows:
Power spectra:

Φx1x1(f) = |E1(f)|²,  Φx2x2(f) = |E2(f)|²

Cross-power spectrum (scalar product):

Φx1x2(f) = E1(f)·E2*(f).

3. Noise estimation (on line 1) is given by

Φnn1(f) = Φx1x1(f) − |Φx1x2(f)|.

If Φnn1(f) is negative, it is set to zero; this clipping is at the origin of the so-called musical noise. A similar formula is used on line 2. It can be seen that if E1(f) = E2(f), then Φnn1(f) = 0 (no noise) at the frequency f.
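Steps 1-3 can be condensed into a few lines. The sketch below is our own (the chapter's Matlab code may differ); the clipping at zero is the operation described above, and the magnitude of the cross-power spectrum is used so that the estimate is real-valued:

```python
import numpy as np

def noise_psd_estimate(e1, e2):
    """Steps 1-3 of Doerbecker's processing: power spectra on each line,
    magnitude of the cross-power spectrum, and the clipped noise estimate."""
    E1, E2 = np.fft.fft(e1), np.fft.fft(e2)
    phi_11 = (E1 * np.conj(E1)).real          # power spectrum, line 1
    phi_22 = (E2 * np.conj(E2)).real          # power spectrum, line 2
    phi_12 = np.abs(E1 * np.conj(E2))         # cross-power magnitude
    phi_nn1 = np.maximum(phi_11 - phi_12, 0)  # negative values set to zero
    phi_nn2 = np.maximum(phi_22 - phi_12, 0)
    return phi_nn1, phi_nn2

# Identical signals on the two lines -> zero estimated noise on every line
x = np.random.default_rng(0).standard_normal(256)
n1, n2 = noise_psd_estimate(x, x)
print(n1.max(), n2.max())  # both ~0
```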

4. Ephraim and Malah correction
The Ephraim and Malah formula (Ephraim and Malah, 1984) introduces a spectral amplitude correction for the speech signal in order to minimize the mean-square error of the log-spectra. It attenuates the musical noise and provides a smoothing between the time frames:

G = (√π/2) · √[(1/(1 + Rpost)) · (Rprio/(1 + Rprio))] · M[(1 + Rpost)·Rprio/(1 + Rprio)],  (16.3)

where G is the gain applied to each short-time spectrum; Rpost is the a posteriori SNR, which is the local estimate of the SNR computed from the data in the current short-time frame; Rprio is the a priori SNR, which corresponds to the information on the spectral magnitude gathered from previous frames (this value defines the attenuation of the musical noise and is widely discussed in Cappe, 1994); and M is a function built on the modified Bessel functions of order zero and one.
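Eq. (16.3) can be evaluated numerically. The sketch below is our own implementation (not the chapter's code), writing M with series expansions of the modified Bessel functions I0 and I1, adequate for the moderate SNR values used here:

```python
import math

def _i0(x, terms=40):
    # Series for the modified Bessel function of order 0 (moderate x)
    return sum((x * x / 4.0) ** k / math.factorial(k) ** 2
               for k in range(terms))

def _i1(x, terms=40):
    # Series for the modified Bessel function of order 1 (moderate x)
    return (x / 2.0) * sum((x * x / 4.0) ** k /
                           (math.factorial(k) * math.factorial(k + 1))
                           for k in range(terms))

def em_gain(r_prio, r_post):
    """Eq. (16.3): Ephraim-Malah short-time spectral amplitude gain.
    r_prio and r_post are the a priori and a posteriori SNRs (linear)."""
    v = (1.0 + r_post) * r_prio / (1.0 + r_prio)
    m = math.exp(-v / 2.0) * ((1.0 + v) * _i0(v / 2.0) + v * _i1(v / 2.0))
    return (math.sqrt(math.pi) / 2.0) * math.sqrt(
        (1.0 / (1.0 + r_post)) * (r_prio / (1.0 + r_prio))) * m

print(round(em_gain(1.0, 1.0), 3))  # moderate SNR -> gain near 0.64
```

The gain rises toward 1 as both SNR estimates grow, and shrinks at low SNR, which is the smoothing behavior described in the text.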

5. Scalart correction
The Scalart formula was developed to improve the SNR in the frequency domain (Scalart and Filho, 1996):

G(f) = (SNR(f) − 1)/SNR(f) = 1 − 1/SNR(f).  (16.4)

G(f) weights the frequencies of the signal and is set to zero if SNR(f) < 1. Thus, frequencies with a low SNR are attenuated.
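Eq. (16.4) amounts to a clipped Wiener-style weighting; a minimal sketch (our own helper):

```python
import numpy as np

def scalart_gain(snr):
    """Eq. (16.4): G(f) = 1 - 1/SNR(f), set to zero where SNR(f) < 1,
    so that low-SNR frequencies are attenuated outright."""
    snr = np.asarray(snr, dtype=float)
    return np.where(snr < 1.0, 0.0, 1.0 - 1.0 / snr)

print(scalart_gain([0.5, 1.0, 2.0, 4.0]))  # -> [0.   0.   0.5  0.75]
```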

6. Spectral subtraction
In the next step, a spectral subtraction is performed (Yang and Fu, 2005):

Right-side ear: |X̂1(f)|² = Φx1x1(f) − Φnn1(f),  (16.5a)

Left-side ear: |X̂2(f)|² = Φx2x2(f) − Φnn2(f).  (16.5b)

This strategy attenuates the frequencies where E1(f) differs from E2(f).

7. Wiener filtering
A general Wiener coefficient Ws(f), indicating for each frequency the proportion of the signal to keep, is calculated (Van den Bogaert et al., 2009; Doerbecker and Ernst, 1996) (Figure 16.3):

Ws(f) = 4·|Φx1x2(f)|² / [Φx1x1(f) + Φx2x2(f)]²,

with Ws(f) = 1 if E1(f) = E2(f).
Then, each frequency f of the input signal is weighted by Ws, on each line, as follows:

S1(f) = Φx1x1(f)·Ws(f),  S2(f) = Φx2x2(f)·Ws(f).  (16.6)

Finally, the processed signal is reconstructed using an inverse fast Fourier transform (IFFT).
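The Wiener coefficient of step 7 can be sketched with the spectra defined in step 2 (our own helper, not the authors' code):

```python
import numpy as np

def wiener_postfilter(E1, E2):
    """Step 7: Ws(f) = 4*|Phi_x1x2(f)|^2 / (Phi_x1x1(f) + Phi_x2x2(f))^2.
    Ws = 1 when the two spectra coincide (coherent front source) and
    drops toward 0 when they differ."""
    phi_11 = (E1 * np.conj(E1)).real
    phi_22 = (E2 * np.conj(E2)).real
    phi_12 = np.abs(E1 * np.conj(E2))
    return 4.0 * phi_12 ** 2 / (phi_11 + phi_22) ** 2

E1 = np.array([1 + 1j, 2 + 0j, 0.5 - 0.5j])
print(wiener_postfilter(E1, E1))        # identical spectra -> all ones
print(wiener_postfilter(E1, 0.5 * E1))  # attenuated copy -> ~0.64
```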

2.1.4 Vocoder

CI coding was represented by a classical vocoder performing the channel simulation (Figure 16.4; see also Loizou, 2006).

Figure 16.4 Synoptic of the vocoder.

For this application, the Neurelec BCI parameters have been taken:

1. Sampling frequency (fs): 16 kHz

2. Frame length: 8 ms (128 time samples), with a 75% overlap between the frames, leading to a signal refresh every 2 ms. Spectral bins (spectral lines) were calculated using an FFT; the 64 bins were then grouped according to the Bark scale in order to build up the channels (Table 16.1). The Bark scale follows the lin-log properties of the cochlea.

Table 16.1

Representation of the 12 Frequency Bands (Channels)

Channel    fm (Hz)    fM (Hz)
1            325        473
2            473        641
3            641        836
4            836       1067
5           1067       1344
6           1344       1675
7           1675       2087
8           2087       2585
9           2585       3195
10          3195       3945
11          3945       4866
12          4866       6000

Note: fm and fM represent the low and the high cutoff frequencies of the corresponding bandpass, respectively.

3. The spectrum was divided into 12 frequency bands (12 channels, corresponding to the 12 electrodes) on each ear. According to short-term FFT theory, the bin spacing was 125 Hz.

4. The eight highest-energy channels were selected (Dorman and Loizou, 1997; Friesen et al., 2001). For each selected channel, a sine wave at the center frequency of the band was multiplied by the energy of the frequency band. The four remaining (nonselected) channels were eliminated.
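The four steps above can be sketched for a single 8-ms frame. This is our own illustration of the principle, with band edges from Table 16.1; the authors' Matlab implementation may differ in windowing and energy normalization:

```python
import numpy as np

# Band edges (Hz) from Table 16.1: 12 channels on the Bark scale
BANDS = [(325, 473), (473, 641), (641, 836), (836, 1067), (1067, 1344),
         (1344, 1675), (1675, 2087), (2087, 2585), (2585, 3195),
         (3195, 3945), (3945, 4866), (4866, 6000)]

def sine_vocoder_frame(frame, fs=16000, n_selected=8):
    """One 128-sample (8-ms) frame of the sine vocoder of Figure 16.4:
    band energies on the 12 channels, 8-of-12 maxima selection, and a
    sine carrier at each selected band's center frequency."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)      # 125-Hz bins
    energies = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in BANDS])
    keep = np.argsort(energies)[-n_selected:]            # n-of-m selection
    t = np.arange(len(frame)) / fs
    out = np.zeros(len(frame))
    for ch in keep:
        fc = 0.5 * (BANDS[ch][0] + BANDS[ch][1])         # center frequency
        out += energies[ch] * np.sin(2.0 * np.pi * fc * t)
    return out

frame = np.sin(2 * np.pi * 500 * np.arange(128) / 16000)  # 500-Hz input tone
out = sine_vocoder_frame(frame)
print(out.shape)  # (128,)
```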

2.2 Phoneme recognition session

2.2.1 Phonetic material

The noise was obtained from a speech signal analyzed by an FFT and reconstructed with a random phase. The speech material was the phonetically balanced Lafon lists (www.extpdf.com/lafon-pdf/html), classically used in French clinical tests. Each list has 17 three-phoneme words, leading to 51 phonemes; a total of 20 lists are available.

The noise was presented at the input of the system, from one of the five loudspeakers (Figure 16.1). The input angle was θ. Lafon’s lists were always presented from the front LS (θ = 0°).

The signal was then recorded on the KEMAR head; consequently, the ITD and ILD effects were present. The head-LS distance was 1 m, and the azimuth ranged from −90° to 90° over five positions (−90°, −45°, 0°, 45°, 90°). The sampling frequency was 44.1 kHz; the recordings were then downsampled to fs = 16 kHz. The intensity was 70 dB SPL, allowing a good dynamic range for the recording. The recorded signal was then processed, and the final signal, represented by s1″(t) and s2″(t) in Figure 16.1, was burned on a CD. All subjects went through two successive stages: first a training session, and second an assessment (test) session.

2.2.2 Training session

Each subject listened to the vocoded speech (VOC) with no noise added and was asked to repeat each three-phoneme word. The percentage of correct recognition of phonemes (PCRP) was recorded for each of the 20 lists. The training lasted until the subject reached a performance of at least 80% correctly repeated phonemes; it took less than 15 min for all subjects.

2.2.3 Phoneme recognition test

Subjects were then instructed to listen to a total of 150 lists, each corresponding to a given experimental condition (described next). The lists were chosen randomly from the data bank (the 20 Lafon lists), and each list was equally represented across the experimental conditions, defined by:

1. Three algorithms: Beamformer (BF), Doerbecker + Ephraim and Malah (DOEM), and Doerbecker + Scalart (DOS)

2. Three reinjection percentages: α = 0% (denoted 00), 20% (20), and 40% (40)

3. Five angles for the noise: θ = −90°, −45°, 0°, +45°, and +90°

4. Three SNRs: − 6 dB, 0 dB, and 6 dB

This leads to 3 × 3 × 5 × 3 = 135 situations. Consequently, the parameters of the study were the algorithm, the reinjection, the noise angle, and the SNR. The corresponding condition codes were BEAM00, BEAM20, BEAM40, DOEM00, DOEM20, DOEM40, DOS00, DOS20, and DOS40, indicating the algorithm and the reinjection.

Furthermore, speech perception was tested in the VOC situation, yielding another 15 experimental conditions (3 SNRs × 5 noise angles). At the beginning of a session, a test was made without any processing to serve as a reference (UNPROC).

The whole listening experiment lasted 3 h, including at least one break per hour (and more upon the subjects’ request). Programs were developed with Matlab® and its graphical user interface builder (GUIDE).

2.3 Localization task

Subjects listened to the noise signal delivered by one of the loudspeakers and had to locate the source among the five directions (Figure 16.1). They listened to 165 stimulations (the 11 conditions indicated in subsection 2.2.3 × 5 angles × 3 stimulations per condition-angle situation); the average value over the three repetitions was taken.

The duration of this session was about 0.5 h. Each sound lasted 5 s; the subject then pressed a button indicating which LS he or she thought the signal was coming from. The percentage of correct localization (PCL) was recorded.

2.4 Listeners

A total of 26 subjects (14 males and 12 females), aged 20 to 26 years (average 24), participated in the experiment. None had any hearing problem; all were checked in the ORL Department of the Edouard Herriot Hospital prior to the experiment, and their pure-tone hearing thresholds were below 20 dB HL (hearing level) at the octave frequencies from 125 Hz to 8 kHz.

This study was performed according to the French laws that apply to biomedical research and was approved by the Ethics Committee (CPP, 0100630314037, Léon-Bérard Center, Lyon).

3 Results

3.1 Localization

PCLs indicating the source localization, with and without the reinjection, are represented in Figure 16.5; standard deviations are also indicated.

Figure 16.5 Source localization according to the strategy; BF, DOEM, DOS, UNPROC, VOC.

In Figure 16.5, percentages are also indicated for the vocoder-only situation (VOC) and for the unprocessed signal (UNPROC), for comparison purposes.

Mean values (over the reinjection) are given in Table 16.2. It can be clearly seen that UNPROC led to the best results.

Table 16.2

PCL According to the Strategy (Mean Values)

Strategy       Beam   DOEM   DOS   VOC   UNPROC
Average PCL     36     41     43    44     54

3.2 Phoneme recognition

3.2.1 Full recognition

The full phoneme recognition percentages (PCRPs) are presented in Table 16.3; these results will be analyzed and compared to the source localization.

Table 16.3

PCRPs

SNR (dB)     −6                        0                         6
θ (°)       −90  −45   0   45   90   −90  −45   0   45   90   −90  −45   0   45   90
Beam 00      16   32  25   33   27    38   45  48   52   52    52   56  58   61   60
Beam 20      18   34  24   37   30    38   49  44   53   55    57   61  58   66   66
Beam 40      20   36  29   43   33    52   56  51   59   60    67   66  70   68   69
DOEM 00      15   14  20   18   16    30   40  35   39   42    53   54  54   60   58
DOEM 20      15   20  25   24   22    50   49  62   44   52    64   61  67   67   67
DOEM 40      27   29  25   34   30    55   55  62   57   59    67   66  67   69   71
DOS 00        7    7   5   13   11    30   21  30   27   37    32   36  55   38   42
DOS 20       30   29  22   31   27    44   41  44   43   45    47   61  63   59   60
DOS 40       33   17  22   32   29    40   52  45   49   49    57   61  67   66   66
VOC          30   26  37   27   31    34   22  29   24   39    40   33  38   37   43

3.2.2 Specific recognition

As the angle of the noise did not strongly affect the recognition percentages, the results were averaged over θ (Table 16.4) for clarity. Table 16.5 shows the mean values (over the SNRs) according to the algorithm and the reinjection. The corresponding value for VOC only was 33%.

Table 16.4

PCRP According to Strategy and Reinjection, Related to the SNR

Strategy    SNR = −6 dB   SNR = 0 dB   SNR = 6 dB
Beam 00          27            47           57
Beam 20          29            48           62
Beam 40          32            56           68
DOEM 00          17            37           56
DOEM 20          21            51           65
DOEM 40          19            58           68
DOS 00            9            29           41
DOS 20           28            43           58
DOS 40           27            47           63
VOC              30            30           38

Table 16.5

Mean PCRPs According to Strategy (Algorithm and Reinjection)

Reinjection α     0%    20%    40%
Beam              44    46     52
DOEM              37    46     48
DOS               26    43     46

The overall representation of the reinjection is shown in Figure 16.6.

Figure 16.6 Percentage recognition according to the reinjection percentage (mean results).

In order to show the influence of the algorithms, PCRPs are reorganized as shown in Figure 16.7, after averaging on the SNR and on the noise angle.

Figure 16.7 Recognition scores seen with the different algorithms.

4 Discussion

Results can be discussed from different points of view; the following subsections give several of them.

4.1 Source localization (PCL)

Figure 16.5 shows the localization of the acoustic source. The Beamformer, mostly without reinjection, led to the worst results.

With α = 0 (no reinjection), DOEM presented the best percentage (50%). When a reinjection was applied (20% and 40%), the results obtained with the three methods were equivalent and reached those seen with the vocoder alone.

It is worth noting that the unprocessed signal (UNPROC) led to the best localization; the vocoder gave lower results than the unprocessed signal. The positive effect of reinjection is consistent with the findings of Van den Bogaert et al. (2008), who indicated that signal reinjection helps restore the localization ability.

4.2 Recognition (PCRP)

4.2.1 Influence of the algorithms

Beam and DOEM led to the best PCRPs (Figure 16.7); the results were clearly above those obtained with the vocoder only (VOC). The difference between Beam and DOEM was very small, suggesting that the time-domain strategy and the spectrum-based processing behave similarly.

Similarly, the adaptive directional microphone (ADM) strategy was clearly an advantage for sounds coming from in front of the listener (Van den Bogaert et al., 2008).

4.2.2 Influence of SNR

At SNR = −6 dB, when no reinjection was applied, the beamformer algorithm led to the best PCRPs among the algorithms (Table 16.4). When a reinjection was introduced, recognition improved with all the algorithms, but BEAM stayed ahead. Still, in this adverse condition (low SNR), the algorithms were not efficient compared to VOC.

In the case of SNR = 0, recognition led to better results (the noise and the signal were at the same level). Without reinjection, BEAM results were the best. With reinjection, DOEM PCRPs were greatly improved (Table 16.4).

Compared to the VOC (vocoder only) situation, PCRP recognition percentages were clearly better when a noise reduction strategy was used; reinjection leveled the recognition percentages among the algorithms.

In the case of SNR = 6 dB, the results obtained with the different strategies were rather equivalent (Table 16.4). Reinjection had a positive influence (from α = 20%). The VOC situation was far inferior. Similarly, Kokkinakis and Loizou (2010) indicated that when the SNR is high, the improvements brought by the strategies level out.

4.3 Other influences

4.3.1 Reinjection

The reinjection factor α improved phoneme recognition (Figure 16.6); the best PCRPs were always obtained with α = 40%.

Regarding localization, reinjection favored the Beam strategy but lowered the PCLs obtained with the DOEM algorithm (Figure 16.5). Consequently, α improved both recognition (PCRP) and localization (PCL) with the Beam processing, which is based on the temporal properties of the signal.

If we consider the spectrum-based processing (Doerbecker), α lowered the source localization ability with DOEM while improving phoneme recognition. With DOEM, the choice seems to be between recognizing and localizing: improving one reduced the other. For DOS, the localization results were more even across α, and the phoneme recognition improved with α, similarly to DOEM, compared to VOC.

4.3.2 Noise angle

The noise angle (Table 16.3) did not drastically change the behavior in any condition. Nevertheless, results were slightly better when the noise was stimulating the right ear rather than the left ear (θ > 0). Is this linked to a brain hemisphere preference? The noise reaching the right ear projects mostly to the left hemisphere, which is more specialized for language; the left hemisphere may be more efficient at separating noise and speech. This ear preference is an interesting issue (Poeppel, 2003; Saoud et al., 2012).

4.4 Simulation with normal hearing listeners

The question of perception and localization with normal-hearing listeners (NHLs) and with implantees has been addressed by several studies. Dorman and Loizou (1997) considered this subject; they stated that performances obtained with NHLs establish a benchmark for how well implanted patients could perform if electrode arrays were able to reproduce, by artificial electrical stimulation, the stimulation produced by acoustic stimulation of the cochlea.

Kerber and Seeber (2012) studied the localization performance of CI users as opposed to NHLs; they indicated that the use of both ears is an advantage for hearing-impaired patients as well as implantees, reaching the same conclusion as Dorman and Loizou.

Finally, using NHLs in simulation instead of CI users must be considered with care: results cannot be transposed carelessly from one population to the other, but they can serve as an indication.

5 Conclusions

The improvement in acoustic source localization and in phoneme recognition brought by noise reduction strategies (Doerbecker and Beamformer algorithms) in a simulated binaural CI coding environment has been discussed in this chapter.

Without any signal processing, listeners can localize the acoustic source, and with the binaural CI processing alone (VOC), a fairly acceptable localization was maintained. The influence of speech processing on source localization is noteworthy: with the beamformer, the localization (initially poor) was markedly improved when a signal reinjection was introduced; on the contrary, reinjection degraded the DOEM performance, and with DOS no clear influence of the reinjection was seen.

Thus, with respect to the VOC situation, it can be seen that signal processing did not improve the source localization ability.

For recognition, the Beamformer and Doerbecker’s strategies showed similar behavior. They increased the phoneme recognition percentages in favorable noise situations (SNR = 0 or 6 dB) compared to the VOC situation, with an average improvement around 20%. The improvement was poor at the worst SNR (−6 dB).

Without reinjection, the Beamformer algorithm led to the best recognition (PCRP) results, mostly when the SNR was low. Compared to the VOC situation, the increase brought by the Beamformer algorithm was about 30% when the SNR was 0 or 6 dB. Doerbecker’s processing also improved phoneme recognition; the Ephraim and Malah correction showed better percentages than the Scalart strategy. A reinjection (20% or 40%) of the input signal raised the phoneme recognition results in our experiment; when the reinjection grew from 20% to 40%, the performances were not deeply modified.

Finally, this work indicates that the Beamformer algorithm and Doerbecker’s processing improved phoneme recognition, but they lowered the source localization. Reinjection had a positive effect on recognition, and on the localization with the Beamformer; it was less helpful for the localization with the Doerbecker’s processing.

Now these indications should be revisited with cochlear implantees; investigating the noise reduction in a noisy environment still remains a challenge.

References

Blauert J. The Technology of Binaural Listening. Berlin: Springer; 2013.

Bouchard C, Havelock DI, Bouchard M. Beamforming with microphone arrays for directional sources. J. Acoust. Soc. Am. 2009;125(4):2098–2104.

Cappe O. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech Audio Process. 1994;2(2):345–349.

Doerbecker M, Ernst S. Combination of two-channel spectral subtraction and adaptive Wiener post-filtering for noise reduction and dereverberation. In: European Signal Processing Conference, Trieste, Italy; 1996:995–998.

Dorman MF, Loizou PC. Speech intelligibility as a function of the number of channels of stimulation for normal-hearing listeners and patients with cochlear implants. Am. J. Otol. 1997;18(6, Suppl.):S113–S114.

Dunn CC, Noble W, Tyler RS, Kordus M, Gantz BJ, Ji H. Bilateral and unilateral cochlear implant users compared on speech perception in noise. Ear Hear. 2010;31(2):296–298.

Ephraim Y, Malah D. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 1984;32(6):1109–1121.

Francart T, Lenssen A, Wouters J. Enhancement of interaural level differences improves sound localization in bimodal hearing. J. Acoust. Soc. Am. 2011;130(5):2817–2826.

Friesen LM, Shannon RV, Baskent D, Wang X. Speech recognition in noise as a function of the number of spectral channels: comparison of acoustic hearing and cochlear implants. J. Acoust. Soc. Am. 2001;110(2):1150–1163.

Kallel F, Frikha M, Ghorbel M, Hamida AB, Berger-Vachon C. Dual-channel spectral subtraction algorithms based speech enhancement dedicated to a bilateral cochlear implant. Appl. Acoust. 2012;73:12–20.

Kerber S, Seeber BU. Sound localization in noise by normal-hearing listeners and cochlear implant users. Ear Hear. 2012;33(4):445–457.

Kokkinakis K, Loizou PC. Multi-microphone adaptive noise reduction strategies for coordinated stimulation in bilateral cochlear implant devices. J. Acoust. Soc. Am. 2010;127(5):3136–3144.

Kompis M, Dillier N. Performance of an adaptive beamforming noise reduction scheme for hearing aid applications. I. Prediction of the signal-to-noise-ratio improvement. J. Acoust. Soc. Am. 2001;109(3):1123–1133.

Lawson DT, Wilson BS, Zerbi M, van den Honert C, Finley CC, Farmer JC, McElveen JT, Roush PA. Cochlear implants controlled by a single speech processor. Am. J. Otol. 1998;19(6):758–761.

Loizou PC. Speech processing in vocoder-centric cochlear implants. Adv Otorhinolaryngol. 2006;64:109–143.

Poeppel D. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Comm. 2003;41:245–255.

Saoud H, Josse G, Bertasi E, Truy E, Chait M, Giraud A-L. Brain–speech alignment enhances auditory cortical responses and speech perception. J. Neurosci. 2012;32:275–281.

Scalart P, Filho JV. Speech enhancement based on a priori signal to noise estimation. In: Proc. Conf. IEEE Int Acoustics, Speech, and Signal Processing ICASSP-96; 1996:629–632.

van den Bogaert T, Doclo S, Wouters J, Moonen M. The effect of multimicrophone noise reduction systems on sound source localization by users of binaural hearing aids. J. Acoust. Soc. Am. 2008;124(1):484–497.

van den Bogaert T, Doclo S, Wouters J, Moonen M. Speech enhancement with multichannel Wiener filter techniques in multimicrophone binaural hearing aids. J. Acoust. Soc. Am. 2009;125(1):360–371.

van Dijk B, Moonen M, Wouters J. Speech understanding performance of cochlear implant subjects using time-frequency masking-based noise reduction. IEEE Trans. Biomed. Eng. 2012;59(5):1363–1373.

van Hoesel RJM, Jones GL, Litovsky RY. Interaural time-delay sensitivity in bilateral cochlear implant users: effects of pulse rate, modulation rate, and place of stimulation. J. Assoc. Res. Otolaryngol. 2009;10(4):557–567.

Yang L-P, Fu Q-J. Spectral subtraction-based speech enhancement for cochlear implant patients in background noise. J. Acoust. Soc. Am. 2005;117(3 Pt 1):1001–1004.

Yousefian N, Loizou PC. A dual microphone algorithm that can cope with competing talkers scenarios. IEEE Trans. Audio Speech Lang. Process. 2013;21(1):145–155.
