Chapter 16

Contribution of Noise Reduction Algorithms

Perception Versus Localization Simulation in the Case of Binaural Cochlear Implant (BCI) Coding

Arnaud Jeanvoine1,2,4; Dan Gnansia4; Eric Truy1,2,5; Christian Berger-Vachon1,2,3    1 INSERM U1028, Lyon Neuroscience Research Center, DYCOG Team/ PACS Group (Speech, Audiology, Communication Health), Lyon, France
2 CNRS UMR5292, Lyon Neuroscience Research Center, DYCOG Team/ PACS Group (Speech, Audiology, Communication Health), Lyon, France
3 University Lyon 1, Lyon, France
4 Neurelec, Vallauris, France
5 Audiology and ORL Department, Edouard Herriot Hospital, Lyon, France

Abstract

Communication and warning are two basic tasks of the auditory system, and it is worth considering them together when assistive techniques for hearing rehabilitation are evaluated. This chapter compares French phoneme recognition in noisy conditions with acoustic source localization, in the case of a cochlear implant (CI) coding simulation.

Three binaural noise reduction systems were considered: the Beamformer algorithm, and Doerbecker's processing combined either with the Ephraim and Malah noise estimator or with Scalart's noise reduction strategy. This study was conducted with twenty normal-hearing subjects.

Results show that the Beamformer algorithm and Doerbecker's processing improved the phoneme recognition scores, with the best recognition obtained using the Beamformer. On the other hand, both approaches degraded source localization. A small reinjection of the input signal (20%) was beneficial with the Beamformer algorithm; this improvement was not seen with Doerbecker's processing.

Keywords

Binaural signal processing

binaural cochlear implant (BCI)

vocoder simulation

speech recognition

localization

Acknowledgments

This work was supported by a French CIFRE contract with the Neurelec society and the CNRS. The Phonak company supplied the KEMAR head. We are grateful to the listeners who participated in this experiment. We thank Professors Lionel Collet, Hung Thai Van and Olivier Bertrand, as well as Dr. Evelyne Veuillet for providing the facilities to conduct this experiment. We also appreciated the useful discussions with the PACS group members. Finally, our gratitude goes to the anonymous referees; their comments greatly helped us improve the manuscript.

1 Introduction

Hearing in humans has two important purposes: communication (speech recognition) and warning (acoustic source localization). The influence of assistive technology (deafness rehabilitation) on these two basic functions is worth studying, and this applies to cochlear implants (CIs) as well. Nowadays, CIs are widely used for the rehabilitation of profound deafness, but speech perception in noisy environments still remains a challenge for hearing-impaired subjects and for persons using a CI (implantees) (Dunn et al., 2010); much work is currently being done in this field of hearing rehabilitation (Blauert, 2013; van Dijk et al., 2012; Yousefian and Loizou, 2013).

Binaural hearing uses two main acoustic cues: interaural time difference (ITD) and interaural level difference (ILD) (Doerbecker and Ernst, 1996; Francart et al., 2011):

• ITD is the delay between the signals reaching the two ears. It is an efficient cue at low frequencies (below 850 Hz) and is also carried by the envelope of the signal reaching the two ears. As a reminder, a sound coming from the side (azimuth 90°) produces an ITD of about 0.6 ms; when the source is situated in the front (azimuth 0°), the so-called front target, the ITD is 0 ms.

• ILD is the intensity difference between the two ears: the signal is more or less attenuated by the head shadow. This effect is mostly perceptible at high frequencies (above 3 kHz). The ILD is 0 for the front target.
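The ITD figure quoted above follows directly from the plane-wave geometry used later in section 2.1.2 (ITD = d·sin θ / c). The helper below is our own illustration, not part of the chapter's software, using the ear spacing d = 20 cm and sound velocity c = 340 m/s given in the text:

```python
import math

# Illustrative helper (not from the chapter's Matlab programs):
# plane-wave ITD = d*sin(theta)/c, with d = 20 cm and c = 340 m/s.
def itd_ms(theta_deg, d=0.2, c=340.0):
    """Interaural time difference, in milliseconds, at azimuth theta."""
    return 1000.0 * d * math.sin(math.radians(theta_deg)) / c

print(round(itd_ms(90), 2))  # lateral source: ~0.59 ms, the ~0.6 ms cited
print(itd_ms(0))             # front target: 0.0 ms
```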

In the case of binaural cochlear implants (BCIs), two CIs are used and a single processor drives them; the ITD can thus be exploited in a coordinated process (van Hoesel et al., 2009; Lawson et al., 1998).

Several noise reduction algorithms favor the signal coming from the front (Van den Bogaert et al., 2008, 2009). Their aim is to improve speech intelligibility, but localization should be considered as well: when an algorithm improves speech perception (for instance, by focusing on a source situated in the front), the localization of other sound sources may be affected. Furthermore, some reinjection of the input signal has been suggested by Van den Bogaert in the case of hearing aids (Van den Bogaert et al., 2008), in order to partially restore the localization of the sound source; this effect is examined in the current work. This strategy is worth considering in the case of CI coding (Loizou, 2006).

Several binaural algorithms are classically used for noise reduction. The Beamformer favors a given direction through delay processing (Bouchard et al., 2009; Kompis and Dillier, 2001). Spectral subtraction, based on an estimation of the noise followed by attenuation of the spectral bands containing it, together with the widely used Wiener filtering, may also improve the signal-to-noise ratio (SNR) (Van den Bogaert et al., 2009; Kallel et al., 2012; Yang and Fu, 2005).

In the case of CIs, additional processing is needed to adapt the acoustic signal to the electric stimulation. In the current study, a vocoder simulation was used to represent CI coding (Loizou, 2006). Cochlear implantees cannot use some localization cues, such as the ITD, which is lost (Dorman and Loizou, 1997; Lawson et al., 1998). Here, normal-hearing subjects were tested in order to estimate the influence of the noise reduction algorithms after CI coding; ultimately, the results will have to be confirmed with implantees, once enough BCI users are available to form a homogeneous population. Meanwhile, coding the signal according to CI processing and testing a homogeneous population of normal-hearing listeners is worth considering (Dorman and Loizou, 1997; Kerber and Seeber, 2012).

The goal of the present study is to evaluate the influence of the noise reduction systems suggested by Matthias Doerbecker on the localization of a sound source, in a BCI context, compared to the Beamformer algorithm. Doerbecker's processing is classically used in conventional hearing aids. A sine vocoder was used to simulate BCI speech processing.

The technical aspects are indicated in section 2. Results are then presented and discussed in sections 3 and 4. Finally, salient features are outlined in the conclusion of the chapter.

2 Materials and Methods

2.1 Signal processing

2.1.1 Overall organization

The synoptic representation of the signal acquisition is given in Figure 16.1. The speech signal was emitted by the front loudspeaker (0°), and the noise was presented from a loudspeaker (LS) array; the LSs were situated at angles θ ranging from −90° to 90°. The acoustic signal was captured by the two microphones (m1 and m2).

Figure 16.1 Synoptic of the experiment; the algorithm used is the Beamformer or either of Doerbecker’s strategies.

In addition, a percentage (α) of the input signal was reinjected after the noise reduction algorithms (Van den Bogaert et al., 2008). The values of α were fixed at 0.0, 0.2, and 0.4 (0%, 20%, and 40%).

The basic reinjection formula used in this work was

s′(t) = (1 − α)·s(t) + α·e(t),  (16.1)

where e(t) is the input signal, s(t) is the signal leaving the noise reduction processor, s′(t) is the output signal entering the vocoder, α is the reinjection fraction (0%, 20%, or 40%), and s″(t) is the output signal presented to the listeners.
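Eq. (16.1) is a simple linear mix of the processed and raw signals. As a minimal sketch (the array names are ours, not the chapter's code):

```python
import numpy as np

def reinject(s, e, alpha):
    """Eq. (16.1): s'(t) = (1 - alpha)*s(t) + alpha*e(t).
    alpha is the reinjection fraction (0.0, 0.2, or 0.4 in this study)."""
    return (1.0 - alpha) * np.asarray(s) + alpha * np.asarray(e)

e = np.array([1.0, -1.0, 1.0])   # toy raw input signal
s = np.zeros(3)                  # toy noise-reduced signal
print(reinject(s, e, 0.2))       # 20% of the raw input is restored
```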

2.1.2 Beamformer

The schematic representation of the Beamformer is given in Figure 16.2.

Figure 16.2 Synoptic representation of the beamformer algorithm

The signal emitted by the sound source is captured by two microphones, m1 and m2 (right and left), situated on the ears of a Knowles Electronics Manikin for Acoustical Research (KEMAR) head. The classical Beamformer (BF) response is a cardioid centered on the front loudspeaker, LS3 (Bouchard et al., 2009). The BF algorithm, presented next, favors the front target (0° azimuth); the experiment was conducted in the horizontal plane.

The input signal e(t) reaching an ear can be represented (in the complex plane), in the case of a sine wave, by

e(t) = A·e^(jω(t − τ)) = A·e^(jω(t − x/c)) = A·e^(j(ωt − 2πx/(cT))) = A·e^(j(ωt − 2πx/λ)) = A·e^(j(ωt − Φ)),  (16.2)

where A is the signal magnitude, x is the distance between the source and the microphone (x1 and x2 for the two sides), c is the sound velocity in air (340 m/s), T is the period, λ = c·T is the wavelength, τ is the propagation time between the source and the microphone, d is the distance between the two ears (20 cm), Φ is the phase delay between the source and the microphone, and θ is the orientation angle (azimuth) of the sound source.

The phase difference between the signals reaching the two ears is

ΔΦ = (2π/λ)·(x1 − x2) = (2π·d/λ)·sin θ.

The beamformer algorithm (based on the phase information) is mainly helpful when the phase difference is unambiguous, that is, when

ΔΦ = (2π·d/λ)·sin θ ≤ π,

leading to fc ≤ 850 Hz (850 Hz is obtained for θ = π/2). The maximum delay between the two ears represents about 10 samples at the sampling frequency fs = 16 kHz.

In the Beamformer strategy, the low-frequency part of the signal (below 850 Hz) is processed: the signals coming from the two microphones are added. The high-frequency components of the signal are then restored for further treatment.
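Under our reading of this description, the band-split strategy can be sketched in the frequency domain. The code below is an assumption about the implementation (not the authors' Matlab code): the two microphone signals are summed below 850 Hz, and the high-frequency part of line 1 is restored unchanged.

```python
import numpy as np

def beamformer(e1, e2, fs=16000, fc=850.0):
    """Sketch of the band-split Beamformer described in the text:
    sum the two microphone signals below fc (where the interaural phase
    is unambiguous) and keep the high band of the first line as-is."""
    n = len(e1)
    E1, E2 = np.fft.rfft(e1), np.fft.rfft(e2)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low = freqs <= fc
    S = np.where(low, 0.5 * (E1 + E2), E1)  # sum low band, keep high band
    return np.fft.irfft(S, n=n)

# A 500-Hz tone arriving in phase (front target) passes unchanged,
# while the same tone in opposite phase (lateral interferer) cancels.
t = np.arange(1024) / 16000.0
tone = np.sin(2 * np.pi * 500 * t)
print(np.allclose(beamformer(tone, tone), tone, atol=1e-9))   # True
print(np.max(np.abs(beamformer(tone, -tone))) < 1e-9)         # True
```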

2.1.3 Doerbecker’s processing

Doerbecker’s processing is a noise reduction procedure enhancing the sound coming from the front; it is based on spectral subtraction followed by adaptive Wiener filtering (Figure 16.3). It begins with an estimation of the noise and is mostly based on spectral considerations.

Figure 16.3 The main steps of Doerbecker’s algorithm.

In Doerbecker’s processing, the correction following the noise estimation uses one of two methods. The first is the Ephraim and Malah correction (Ephraim and Malah, 1984), a binaural process reducing the musical noise introduced by the spectral subtraction. The second is Scalart’s correction (Scalart and Filho, 1996), which selects, on each channel, the best SNR and is similar to Wiener’s algorithm (Van den Bogaert et al., 2009).

The steps of Doerbecker’s processing are as follows:

1. A short-term fast Fourier transform (FFT) is applied to the signals captured by the two microphones m1 and m2, leading to two processing lines, one on each side:

E1(f) = FFT[e1(t)],  E2(f) = FFT[e2(t)],

where E1(f) and E2(f) are the spectral components at frequency f (amplitude and phase) in the complex plane.

2. The estimation of the noise spectrum is obtained by subtracting the cross-power spectrum from the power spectra, as follows:
Power spectra:

Φx1x1(f) = |E1(f)|²,  Φx2x2(f) = |E2(f)|²

Cross-power spectrum (scalar product):

Φx1x2(f) = E1(f)·E2*(f).

3. Noise estimation (on line 1) is given by

Φnn1(f) = Φx1x1(f) − |Φx1x2(f)|.

If Φnn1(f) is negative, it is set to zero; this clipping is at the origin of the so-called musical noise. A similar formula is used on line 2. It can be seen that if E1(f) = E2(f), then Φnn1(f) = 0 (no noise) at the frequency f.
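Steps 1-3 can be condensed into a few lines. The sketch below is our own (the chapter's Matlab code may differ); the clipping at zero is the operation described above, and the magnitude of the cross-power spectrum is used so that the estimate is real-valued:

```python
import numpy as np

def noise_psd_estimate(e1, e2):
    """Steps 1-3 of Doerbecker's processing: power spectra on each line,
    magnitude of the cross-power spectrum, and the clipped noise estimate."""
    E1, E2 = np.fft.fft(e1), np.fft.fft(e2)
    phi_11 = (E1 * np.conj(E1)).real          # power spectrum, line 1
    phi_22 = (E2 * np.conj(E2)).real          # power spectrum, line 2
    phi_12 = np.abs(E1 * np.conj(E2))         # cross-power magnitude
    phi_nn1 = np.maximum(phi_11 - phi_12, 0)  # negative values set to zero
    phi_nn2 = np.maximum(phi_22 - phi_12, 0)
    return phi_nn1, phi_nn2

# Identical signals on the two lines -> zero estimated noise on every line
x = np.random.default_rng(0).standard_normal(256)
n1, n2 = noise_psd_estimate(x, x)
print(n1.max(), n2.max())  # both ~0
```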

4. Ephraim and Malah correction
The Ephraim and Malah formula (Ephraim and Malah, 1984) introduces a spectral amplitude correction for the speech signal in order to minimize the mean-square error of the log-spectra. It attenuates the musical noise and provides a smoothing between the time frames:

G = (√π/2) · √[(1/(1 + Rpost)) · (Rprio/(1 + Rprio))] · M[(1 + Rpost)·Rprio/(1 + Rprio)],  (16.3)

where G is the gain applied to each short-time spectrum; Rpost is the a posteriori SNR, which is the local estimate of the SNR computed from the data in the current short-time frame; Rprio is the a priori SNR, which corresponds to the information on the spectral magnitude gathered from previous frames (this value defines the attenuation of the musical noise and is widely discussed in Cappe, 1994); and M is a function built on the modified Bessel functions of order zero and one.
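Eq. (16.3) can be evaluated numerically. The sketch below is our own implementation (not the chapter's code), writing M with series expansions of the modified Bessel functions I0 and I1, adequate for the moderate SNR values used here:

```python
import math

def _i0(x, terms=40):
    # Series for the modified Bessel function of order 0 (moderate x)
    return sum((x * x / 4.0) ** k / math.factorial(k) ** 2
               for k in range(terms))

def _i1(x, terms=40):
    # Series for the modified Bessel function of order 1 (moderate x)
    return (x / 2.0) * sum((x * x / 4.0) ** k /
                           (math.factorial(k) * math.factorial(k + 1))
                           for k in range(terms))

def em_gain(r_prio, r_post):
    """Eq. (16.3): Ephraim-Malah short-time spectral amplitude gain.
    r_prio and r_post are the a priori and a posteriori SNRs (linear)."""
    v = (1.0 + r_post) * r_prio / (1.0 + r_prio)
    m = math.exp(-v / 2.0) * ((1.0 + v) * _i0(v / 2.0) + v * _i1(v / 2.0))
    return (math.sqrt(math.pi) / 2.0) * math.sqrt(
        (1.0 / (1.0 + r_post)) * (r_prio / (1.0 + r_prio))) * m

print(round(em_gain(1.0, 1.0), 3))  # moderate SNR -> gain near 0.64
```

The gain rises toward 1 as both SNR estimates grow, and shrinks at low SNR, which is the smoothing behavior described in the text.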

5. Scalart correction
The Scalart formula was developed to improve the SNR in the frequency domain (Scalart and Filho, 1996):

G(f) = (SNR(f) − 1)/SNR(f) = 1 − 1/SNR(f).  (16.4)

G(f) weights the frequencies of the signal and is set to zero if SNR(f) < 1. Thus, frequencies with a low SNR are attenuated.
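Eq. (16.4) amounts to a clipped Wiener-style weighting; a minimal sketch (our own helper):

```python
import numpy as np

def scalart_gain(snr):
    """Eq. (16.4): G(f) = 1 - 1/SNR(f), set to zero where SNR(f) < 1,
    so that low-SNR frequencies are attenuated outright."""
    snr = np.asarray(snr, dtype=float)
    return np.where(snr < 1.0, 0.0, 1.0 - 1.0 / snr)

print(scalart_gain([0.5, 1.0, 2.0, 4.0]))  # -> [0.   0.   0.5  0.75]
```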

6. Spectral subtraction
In the next step, a spectral subtraction is performed (Yang and Fu, 2005):

Right-side ear: |X̂1(f)|² = Φx1x1(f) − Φnn1(f),  (16.5a)

Left-side ear: |X̂2(f)|² = Φx2x2(f) − Φnn2(f).  (16.5b)

This strategy attenuates the frequencies where E1(f) differs from E2(f).

7. Wiener filtering
A general Wiener coefficient Ws(f), indicating for each frequency the proportion of the signal to keep, is calculated (Van den Bogaert et al., 2009; Doerbecker and Ernst, 1996) (Figure 16.3):

Ws(f) = 4·|Φx1x2(f)|² / [Φx1x1(f) + Φx2x2(f)]²,

with Ws(f) = 1 if E1(f) = E2(f).
Then, each frequency f of the input signal is weighted by Ws, on each line, as follows:

S1(f) = Φx1x1(f)·Ws(f),  S2(f) = Φx2x2(f)·Ws(f).  (16.6)

Finally, the processed signal is reconstructed using an inverse fast Fourier transform (IFFT).
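The Wiener coefficient of step 7 can be sketched with the spectra defined in step 2 (our own helper, not the authors' code):

```python
import numpy as np

def wiener_postfilter(E1, E2):
    """Step 7: Ws(f) = 4*|Phi_x1x2(f)|^2 / (Phi_x1x1(f) + Phi_x2x2(f))^2.
    Ws = 1 when the two spectra coincide (coherent front source) and
    drops toward 0 when they differ."""
    phi_11 = (E1 * np.conj(E1)).real
    phi_22 = (E2 * np.conj(E2)).real
    phi_12 = np.abs(E1 * np.conj(E2))
    return 4.0 * phi_12 ** 2 / (phi_11 + phi_22) ** 2

E1 = np.array([1 + 1j, 2 + 0j, 0.5 - 0.5j])
print(wiener_postfilter(E1, E1))        # identical spectra -> all ones
print(wiener_postfilter(E1, 0.5 * E1))  # attenuated copy -> ~0.64
```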

2.1.4 Vocoder

CI coding was represented by a classical vocoder performing the channel simulation (Figure 16.4; see also Loizou, 2006).

Figure 16.4 Synoptic of the vocoder.

For this application, the Neurelec BCI parameters have been taken:

1. Sampling frequency (fs): 16 kHz

2. Frame length: 8 ms (128 time samples), with a 75% overlap between the frames, leading to a signal refresh every 2 ms. Spectral bins (spectral lines) were calculated using an FFT; the 64 bins were then grouped according to the Bark scale in order to build up the channels (Table 16.1). The Bark scale follows the lin-log properties of the cochlea.

Table 16.1

Representation of the 12 Frequency Bands (Channels)

Channel    fm (Hz)    fM (Hz)
1            325        473
2            473        641
3            641        836
4            836       1067
5           1067       1344
6           1344       1675
7           1675       2087
8           2087       2585
9           2585       3195
10          3195       3945
11          3945       4866
12          4866       6000

Note: fm and fM represent the low and the high cutoff frequencies of the corresponding bandpass, respectively.

3. The spectrum was divided into 12 frequency bands (12 channels, corresponding to the 12 electrodes) on each ear. According to short-term FFT theory, the bin spacing was 125 Hz.

4. The eight highest-energy channels were selected (Dorman and Loizou, 1997; Friesen et al., 2001). For each selected channel, a sine wave at the center frequency of the band was multiplied by the energy of the frequency band. The four remaining (nonselected) channels were eliminated.
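The four steps above can be sketched for a single 8-ms frame. This is our own illustration of the principle, with band edges from Table 16.1; the authors' Matlab implementation may differ in windowing and energy normalization:

```python
import numpy as np

# Band edges (Hz) from Table 16.1: 12 channels on the Bark scale
BANDS = [(325, 473), (473, 641), (641, 836), (836, 1067), (1067, 1344),
         (1344, 1675), (1675, 2087), (2087, 2585), (2585, 3195),
         (3195, 3945), (3945, 4866), (4866, 6000)]

def sine_vocoder_frame(frame, fs=16000, n_selected=8):
    """One 128-sample (8-ms) frame of the sine vocoder of Figure 16.4:
    band energies on the 12 channels, 8-of-12 maxima selection, and a
    sine carrier at each selected band's center frequency."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)      # 125-Hz bins
    energies = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in BANDS])
    keep = np.argsort(energies)[-n_selected:]            # n-of-m selection
    t = np.arange(len(frame)) / fs
    out = np.zeros(len(frame))
    for ch in keep:
        fc = 0.5 * (BANDS[ch][0] + BANDS[ch][1])         # center frequency
        out += energies[ch] * np.sin(2.0 * np.pi * fc * t)
    return out

frame = np.sin(2 * np.pi * 500 * np.arange(128) / 16000)  # 500-Hz input tone
out = sine_vocoder_frame(frame)
print(out.shape)  # (128,)
```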

2.2 Phoneme recognition session

2.2.1 Phonetic material

The noise was obtained from a speech signal analyzed by an FFT and reconstructed with a random phase. The speech material was the phonetically balanced Lafon lists (www.extpdf.com/lafon-pdf/html), classically used in French clinical tests. Each list has 17 three-phoneme words, leading to 51 phonemes; a total of 20 lists are available.

The noise was presented at the input of the system, from one of the five loudspeakers (Figure 16.1). The input angle was θ. Lafon’s lists were always presented from the front LS (θ = 0°).

The signal was then recorded on the KEMAR head; consequently, the ITD and ILD effects were present. The head-LS distance was 1 m, and the azimuth ranged from −90° to 90° over five positions (−90°, −45°, 0°, 45°, 90°). The sampling frequency was 44.1 kHz; the recordings were then downsampled to fs = 16 kHz. The intensity was 70 dB SPL, allowing a good dynamic range for the recording. The recorded signal was then processed, and the final signal, represented by s1″(t) and s2″(t) in Figure 16.1, was burned on a CD. All subjects went through two successive stages: first a training session, and second an assessment (test) session.

2.2.2 Training session

Each subject listened to the vocoded speech (VOC) with no noise added and was asked to repeat each three-phoneme word. The percentage of correct recognition of phonemes (PCRP) was recorded for each of the 20 lists. The training lasted until the subject reached a performance of at least 80% correctly repeated phonemes; it took less than 15 min for all subjects.

2.2.3 Phoneme recognition test

Subjects were then instructed to listen to a total of 150 lists, each corresponding to a given experimental condition (described next). The lists were chosen randomly from the data bank (the 20 Lafon lists), and each list was equally represented across the experimental conditions, defined by:

1. Three algorithms: Beamformer (BF), Doerbecker + Ephraim and Malah (DOEM), and Doerbecker + Scalart (DOS)

2. Three reinjection percentages: α = 0% (denoted 00), 20% (20), and 40% (40)

3. Five angles for the noise: θ = −90°, −45°, 0°, +45°, and +90°

4. Three SNRs: − 6 dB, 0 dB, and 6 dB

This leads to 3 × 3 × 5 × 3 = 135 situations. Consequently, the parameters of the study were the algorithm, the reinjection, the noise angle, and the SNR. The corresponding condition codes were BEAM00, BEAM20, BEAM40, DOEM00, DOEM20, DOEM40, DOS00, DOS20, and DOS40, indicating the algorithm and the reinjection.

Furthermore, speech perception was tested in the VOC situation, yielding another 15 experimental conditions (3 SNRs × 5 noise angles). At the beginning of a session, a test was made without any processing to serve as a reference (UNPROC).

The whole listening experiment lasted 3 h, including at least one break per hour (and more upon the subjects’ request). Programs were developed with Matlab® and its graphical user interface builder (GUIDE).

2.3 Localization task

Subjects listened to the noise signal delivered by one of the loudspeakers and had to locate the source among the five directions (Figure 16.1). They listened to 165 stimulations (the 11 conditions indicated in subsection 2.2.3 × 5 angles × 3 stimulations per condition-angle situation); the average value over the three repetitions was taken.

The duration of this session was about 0.5 h. Each sound lasted 5 s; the subject then pressed a button indicating which LS he or she thought the signal was coming from. The percentage of correct localization (PCL) was recorded.

2.4 Listeners

A total of 26 subjects (14 males and 12 females), aged 20 to 26 years (average 24), participated in the experiment. None had any hearing problem; all were checked in the ORL Department of the Edouard Herriot Hospital prior to the experiment, and their pure-tone hearing thresholds were below 20 dB HL (hearing level) at the octave frequencies from 125 Hz to 8 kHz.

This study was performed according to the French laws that apply to biomedical research and was approved by the Ethics Committee (CPP, 0100630314037, Léon-Bérard Center, Lyon).

3 Results

3.1 Localization

PCLs indicating the source localization, with and without the reinjection, are represented in Figure 16.5; standard deviations are also indicated.

Figure 16.5 Source localization according to the strategy; BF, DOEM, DOS, UNPROC, VOC.

In Figure 16.5, percentages are also indicated for the vocoder-only situation (VOC) and for the unprocessed signal (UNPROC), for comparison purposes.

Mean values (over the reinjection) are given in Table 16.2. It can be clearly seen that UNPROC led to the best results.

Table 16.2

PCL According to the Strategy (Mean Values)

Strategy       Beam   DOEM   DOS   VOC   UNPROC
Average PCL     36     41     43    44     54

3.2 Phoneme recognition

3.2.1 Full recognition

The full phoneme recognition percentages (PCRPs) are presented in Table 16.3; these results will be analyzed and compared to the source localization.

Table 16.3

PCRPs

SNR (dB)     −6                        0                         6
θ (°)       −90  −45   0   45   90   −90  −45   0   45   90   −90  −45   0   45   90
Beam 00      16   32  25   33   27    38   45  48   52   52    52   56  58   61   60
Beam 20      18   34  24   37   30    38   49  44   53   55    57   61  58   66   66
Beam 40      20   36  29   43   33    52   56  51   59   60    67   66  70   68   69
DOEM 00      15   14  20   18   16    30   40  35   39   42    53   54  54   60   58
DOEM 20      15   20  25   24   22    50   49  62   44   52    64   61  67   67   67
DOEM 40      27   29  25   34   30    55   55  62   57   59    67   66  67   69   71
DOS 00        7    7   5   13   11    30   21  30   27   37    32   36  55   38   42
DOS 20       30   29  22   31   27    44   41  44   43   45    47   61  63   59   60
DOS 40       33   17  22   32   29    40   52  45   49   49    57   61  67   66   66
VOC          30   26  37   27   31    34   22  29   24   39    40   33  38   37   43

3.2.2 Specific recognition

As the angle of the noise did not strongly affect the recognition percentages, the results were averaged over θ (Table 16.4) for clarity. Table 16.5 shows the mean values (over the SNRs) according to the algorithm and the reinjection. The corresponding value for VOC only was 33%.

Table 16.4

PCRP According to Strategy and Reinjection, Related to the SNR

Strategy    SNR = −6 dB   SNR = 0 dB   SNR = 6 dB
Beam 00          27            47           57
Beam 20          29            48           62
Beam 40          32            56           68
DOEM 00          17            37           56
DOEM 20          21            51           65
DOEM 40          19            58           68
DOS 00            9            29           41
DOS 20           28            43           58
DOS 40           27            47           63
VOC              30            30           38

Table 16.5

Mean PCRPs According to Strategy (Algorithm and Reinjection)

Reinjection α     0%    20%    40%
Beam              44    46     52
DOEM              37    46     48
DOS               26    43     46

The overall representation of the reinjection is shown in Figure 16.6.

Figure 16.6 Percentage recognition according to the reinjection percentage (mean results).

In order to show the influence of the algorithms, PCRPs are reorganized as shown in Figure 16.7, after averaging on the SNR and on the noise angle.

Figure 16.7 Recognition scores seen with the different algorithms.

4 Discussion

Results can be discussed from different points of view; the following subsections give several of them.

4.1 Source localization (PCL)

Figure 16.5 shows the localization of the acoustic source. The Beamformer, mostly without reinjection, led to the worst results.

With α = 0 (no reinjection), DOEM presented the best percentage (50%). When a reinjection was applied (20% and 40%), the results obtained with the three methods were equivalent and reached those seen with the vocoder alone.

It is worth noting that the unprocessed signal (UNPROC) led to the best localization; the vocoder gave lower results than the unprocessed signal. The positive effect of reinjection is consistent with the findings of Van den Bogaert et al. (2008), who indicated that signal reinjection helps restore the localization ability.

4.2 Recognition (PCRP)

4.2.1 Influence of the algorithms

Beam and DOEM led to the best PCRPs (Figure 16.7); the results were clearly above those obtained with the vocoder only (VOC). The difference between Beam and DOEM was very small, suggesting that the time-domain strategy and the spectrum-based processing behave similarly.

Similarly, the adaptive directional microphone (ADM) strategy was clearly an advantage for sounds coming from in front of the listener (Van den Bogaert et al., 2008).

4.2.2 Influence of SNR

At SNR = −6 dB, when no reinjection was applied, the beamformer algorithm led to the best PCRPs among the algorithms (Table 16.4). When a reinjection was introduced, recognition improved with all the algorithms, but BEAM stayed ahead. Still, in this adverse condition (low SNR), the algorithms were not efficient compared to VOC.

In the case of SNR = 0, recognition led to better results (the noise and the signal were at the same level). Without reinjection, BEAM results were the best. With reinjection, DOEM PCRPs were greatly improved (Table 16.4).

Compared to the VOC (vocoder only) situation, PCRP recognition percentages were clearly better when a noise reduction strategy was used; reinjection leveled the recognition percentages among the algorithms.

In the case of SNR = 6 dB, the results obtained with the different strategies were rather equivalent (Table 16.4). Reinjection had a positive influence (from α = 20%). The VOC situation was far inferior. Similarly, Kokkinakis and Loizou (2010) indicated that when the SNR is high, the improvements brought by the strategies level out.

4.3 Other influences

4.3.1 Reinjection

The reinjection factor α improved phoneme recognition (Figure 16.6); the best PCRPs were always obtained with α = 40%.

Regarding localization, reinjection favored the Beam strategy but lowered the PCLs obtained with the DOEM algorithm (Figure 16.5). Consequently, α improved both recognition (PCRP) and localization (PCL) with the Beam processing, which is based on the temporal properties of the signal.

If we consider the spectrum-based processing (Doerbecker), α lowered the source localization ability with DOEM while improving phoneme recognition. With DOEM, the choice seems to be between recognizing and localizing: improving one reduced the other. For DOS, the localization results were more even across α, and the phoneme recognition improved with α, similarly to DOEM, compared to VOC.

4.3.2 Noise angle

The noise angle (Table 16.3) did not drastically change the behavior in any condition. Nevertheless, results were slightly better when the noise was stimulating the right ear rather than the left ear (θ > 0). Is this linked to a brain hemisphere preference? The noise reaching the right ear projects mostly to the left hemisphere, which is more specialized for language; the left hemisphere may be more efficient at separating noise and speech. This ear preference is an interesting issue (Poeppel, 2003; Saoud et al., 2012).

4.4 Simulation with normal hearing listeners

The question of perception and localization with normal-hearing listeners (NHLs) and with implantees has been addressed by several studies. Dorman and Loizou (1997) considered this subject; they stated that performances obtained with NHLs establish a benchmark for how well implanted patients could perform if electrode arrays were able to reproduce, by artificial electrical stimulation, the stimulation produced by acoustic stimulation of the cochlea.

Kerber and Seeber (2012) studied the localization performance of CI users as opposed to NHLs; they indicated that the use of both ears is an advantage for hearing-impaired patients as well as implantees, reaching the same conclusion as Dorman and Loizou.

Finally, using NHLs in simulation instead of CI users must be considered with care: results cannot be transposed carelessly from one population to the other, but they can serve as an indication.

5 Conclusions

The improvement in acoustic source localization and in phoneme recognition brought by noise reduction strategies (Doerbecker and Beamformer algorithms) in a simulated binaural CI coding environment has been discussed in this chapter.

Without any signal processing, listeners can localize the acoustic source, and with the binaural CI processing alone (VOC), a fairly acceptable localization was maintained. The influence of speech processing on source localization is noteworthy: with the beamformer, the localization (initially poor) was markedly improved when a signal reinjection was introduced; on the contrary, reinjection degraded the DOEM performance, and with DOS no clear influence of the reinjection was seen.

Thus, with respect to the VOC situation, it can be seen that signal processing did not improve the source localization ability.

For recognition, the Beamformer and Doerbecker’s strategies showed similar behavior. They increased the phoneme recognition percentages in favorable noise situations (SNR = 0 or 6 dB) compared to the VOC situation, with an average improvement around 20%. The improvement was poor at the worst SNR (−6 dB).

Without reinjection, the Beamformer algorithm led to the best recognition (PCRP) results, mostly when the SNR was low. Compared to the VOC situation, the increase brought by the Beamformer algorithm was about 30% when the SNR was 0 or 6 dB. Doerbecker’s processing also improved phoneme recognition; the Ephraim and Malah correction showed better percentages than the Scalart strategy. A reinjection (20% or 40%) of the input signal raised the phoneme recognition results in our experiment; when the reinjection grew from 20% to 40%, the performances were not deeply modified.

Finally, this work indicates that the Beamformer algorithm and Doerbecker’s processing improved phoneme recognition, but they lowered the source localization. Reinjection had a positive effect on recognition, and on the localization with the Beamformer; it was less helpful for the localization with the Doerbecker’s processing.

Now these indications should be revisited with cochlear implantees; investigating the noise reduction in a noisy environment still remains a challenge.

References

Blauert J. The Technology of Binaural Listening. Berlin: Springer; 2013.

Bouchard C, Havelock DI, Bouchard M. Beamforming with microphone arrays for directional sources. J. Acoust. Soc. Am. 2009;125(4):2098–2104.

Cappe O. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech Audio Process. 1994;2(2):345–349.

Doerbecker M, Ernst S. Combination of two-channel spectral subtraction and adaptive Wiener post-filtering for noise reduction and dereverberation. In: European Signal Processing Conference, Trieste, Italy; 1996:995–998.

Dorman MF, Loizou PC. Speech intelligibility as a function of the number of channels of stimulation for normal-hearing listeners and patients with cochlear implants. Am. J. Otol. 1997;18(6, Suppl.):S113–S114.

Dunn CC, Noble W, Tyler RS, Kordus M, Gantz BJ, Ji H. Bilateral and unilateral cochlear implant users compared on speech perception in noise. Ear Hear. 2010;31(2):296–298.

Ephraim Y, Malah D. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 1984;32(6):1109–1121.

Francart T, Lenssen A, Wouters J. Enhancement of interaural level differences improves sound localization in bimodal hearing. J. Acoust. Soc. Am. 2011;130(5):2817–2826.

Friesen LM, Shannon RV, Baskent D, Wang X. Speech recognition in noise as a function of the number of spectral channels: comparison of acoustic hearing and cochlear implants. J. Acoust. Soc. Am. 2001;110(2):1150–1163.

Kallel F, Frikha M, Ghorbel M, Hamida AB, Berger-Vachon C. Dual-channel spectral subtraction algorithms based speech enhancement dedicated to a bilateral cochlear implant. Appl. Acoust. 2012;73:12–20.

Kerber S, Seeber BU. Sound localization in noise by normal-hearing listeners and cochlear implant users. Ear Hear. 2012;33(4):445–457.

Kokkinakis K, Loizou PC. Multi-microphone adaptive noise reduction strategies for coordinated stimulation in bilateral cochlear implant devices. J. Acoust. Soc. Am. 2010;127(5):3136–3144.

Kompis M, Dillier N. Performance of an adaptive beamforming noise reduction scheme for hearing aid applications. I. Prediction of the signal-to-noise-ratio improvement. J. Acoust. Soc. Am. 2001;109(3):1123–1133.

Lawson DT, Wilson BS, Zerbi M, van den Honert C, Finley CC, Farmer JC, McElveen JT, Roush PA. Cochlear implants controlled by a single speech processor. Am. J. Otol. 1998;19(6):758–761.

Loizou PC. Speech processing in vocoder-centric cochlear implants. Adv Otorhinolaryngol. 2006;64:109–143.

Poeppel D. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Comm. 2003;41:245–255.

Saoud H, Josse G, Bertasi E, Truy E, Chait M, Giraud A-L. Brain–speech alignment enhances auditory cortical responses and speech perception. J. Neurosci. 2012;32:275–281.

Scalart P, Filho JV. Speech enhancement based on a priori signal to noise estimation. In: Proc. Conf. IEEE Int Acoustics, Speech, and Signal Processing ICASSP-96; 1996:629–632.

van den Bogaert T, Doclo S, Wouters J, Moonen M. The effect of multimicrophone noise reduction systems on sound source localization by users of binaural hearing aids. J. Acoust. Soc. Am. 2008;124(1):484–497.

van den Bogaert T, Doclo S, Wouters J, Moonen M. Speech enhancement with multichannel Wiener filter techniques in multimicrophone binaural hearing aids. J. Acoust. Soc. Am. 2009;125(1):360–371.

van Dijk B, Moonen M, Wouters J. Speech understanding performance of cochlear implant subjects using time-frequency masking-based noise reduction. IEEE Trans. Biomed. Eng. 2012;59(5):1363–1373.

van Hoesel RJM, Jones GL, Litovsky RY. Interaural time-delay sensitivity in bilateral cochlear implant users: effects of pulse rate, modulation rate, and place of stimulation. J. Assoc. Res. Otolaryngol. 2009;10(4):557–567.

Yang L-P, Fu Q-J. Spectral subtraction-based speech enhancement for cochlear implant patients in background noise. J. Acoust. Soc. Am. 2005;117(3 Pt 1):1001–1004.

Yousefian N, Loizou PC. A dual microphone algorithm that can cope with competing talkers scenarios. IEEE Trans. Audio Speech Lang. Process. 2013;21(1):145–155.
