You are here

Spectral refinement to speech enhancement

Download pdf | Full Screen View

Date Issued:
2009
Summary:
The goal of a speech enhancement algorithm is to remove noise and recover the original signal with as little distortion and residual noise as possible. Most successful real-time algorithms thereof have done in the frequency domain where the frequency amplitude of clean speech is estimated per short-time frame of the noisy signal. The state of-the-art short-time spectral amplitude estimator algorithms estimate the clean spectral amplitude in terms of the power spectral density (PSD) function of the noisy signal. The PSD has to be computed from a large ensemble of signal realizations. However, in practice, it may only be estimated from a finite-length sample of a single realization of the signal. Estimation errors introduced by these limitations deviate the solution from the optimal. Various spectral estimation techniques, many with added spectral smoothing, have been investigated for decades to reduce the estimation errors. These algorithms do not address significantly issue on quality of speech as perceived by a human. This dissertation presents analysis and techniques that offer spectral refinements toward speech enhancement. We present an analytical framework of the effect of spectral estimate variance on the performance of speech enhancement. We use the variance quality factor (VQF) as a quantitative measure of estimated spectra. We show that reducing the spectral estimator VQF reduces significantly the VQF of the enhanced speech. The Autoregressive Multitaper (ARMT) spectral estimate is proposed as a low VQF spectral estimator for use in speech enhancement algorithms. An innovative method of incorporating a speech production model using multiband excitation is also presented as a technique to emphasize the harmonic components of the glottal speech input.
Title: Spectral refinement to speech enhancement.
133 views
68 downloads
Name(s): Charoenruengkit, Werayuth.
College of Engineering and Computer Science
Department of Computer and Electrical Engineering and Computer Science
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Date Issued: 2009
Publisher: Florida Atlantic University
Physical Form: electronic
Extent: xiv, 124 p. : ill. (some col.).
Language(s): English
Summary: The goal of a speech enhancement algorithm is to remove noise and recover the original signal with as little distortion and residual noise as possible. Most successful real-time algorithms thereof have done in the frequency domain where the frequency amplitude of clean speech is estimated per short-time frame of the noisy signal. The state of-the-art short-time spectral amplitude estimator algorithms estimate the clean spectral amplitude in terms of the power spectral density (PSD) function of the noisy signal. The PSD has to be computed from a large ensemble of signal realizations. However, in practice, it may only be estimated from a finite-length sample of a single realization of the signal. Estimation errors introduced by these limitations deviate the solution from the optimal. Various spectral estimation techniques, many with added spectral smoothing, have been investigated for decades to reduce the estimation errors. These algorithms do not address significantly issue on quality of speech as perceived by a human. This dissertation presents analysis and techniques that offer spectral refinements toward speech enhancement. We present an analytical framework of the effect of spectral estimate variance on the performance of speech enhancement. We use the variance quality factor (VQF) as a quantitative measure of estimated spectra. We show that reducing the spectral estimator VQF reduces significantly the VQF of the enhanced speech. The Autoregressive Multitaper (ARMT) spectral estimate is proposed as a low VQF spectral estimator for use in speech enhancement algorithms. An innovative method of incorporating a speech production model using multiband excitation is also presented as a technique to emphasize the harmonic components of the glottal speech input.
Summary: The preconditioning of the noisy estimates by exploiting other avenues of information, such as pitch estimation and the speech production model, effectively increases the localized narrow-band signal-to noise ratio (SNR) of the noisy signal, which is subsequently denoised by the amplitude gain. Combined with voicing structure enhancement, the ARMT spectral estimate delivers enhanced speech with sound clarity desirable to human listeners. The resulting improvements in enhanced speech are observed to be significant with both Objective and Subjective measurement.
Identifier: 318327310 (oclc), 186327 (digitool), FADT186327 (IID), fau:2875 (fedora)
Note(s): by Werayuth Charoenruengkit.
Vita.
Thesis (Ph.D.)--Florida Atlantic University, 2009.
Includes bibliography.
Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
Subject(s): Adaptive signal processing -- Digital techniques
Spectral theory (Mathematics)
Noise control
Fuzzy algorithms
Speech processing systems -- Digital techniques
Persistent Link to This Record: http://purl.flvc.org/FAU/186327
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU