Stereo-Based Stochastic Mapping for Robust Speech Recognition

Mohamed Afify; Xiaodong Cui; Yuqing Gao

doi:10.1109/tasl.2009.2018017

JOURNAL ARTICLE

Year: 2009
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Vol: 17 (7)
Pages: 1325-1334
Publisher: Institute of Electrical and Electronics Engineers

Abstract

We present a stochastic mapping technique for robust speech recognition that uses stereo data. The idea is to construct a Gaussian mixture model (GMM) for the joint distribution of the clean and noisy features and to use this distribution to predict the clean speech during testing. The proposed mapping is called stereo-based stochastic mapping (SSM). Two estimators are considered: one is iterative and is based on the maximum a posteriori (MAP) criterion, while the other uses the minimum mean square error (MMSE) criterion. The resulting estimators are effectively a mixture of linear transforms weighted by component posteriors, and the parameters of the linear transformations are derived from the joint distribution. Compared to the uncompensated result, the proposed method yields a 45% relative improvement in word error rate (WER) for digit recognition in the car. In the same setting, SSM outperforms SPLICE and gives similar results to the MMSE compensation of Huang et al. A 66% relative improvement in WER is observed when SSM is applied in conjunction with multistyle training (MST) for large-vocabulary English speech recognition in a real environment. Also, combining the proposed mapping with CMLLR leads to about 38% relative improvement in performance compared to CMLLR alone for real field data.
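To make the "mixture of linear transforms weighted by component posteriors" concrete, here is a minimal sketch of an MMSE-style estimator under a joint GMM. It is not the authors' implementation. It assumes the joint vector is stacked as z = [x; y] (clean x, noisy y), that each component has a full covariance matrix, and that the GMM parameters have already been trained on stereo data. The function name `mmse_clean_estimate` and its arguments are hypothetical. The estimate is the posterior-weighted sum of the per-component conditional means mu_x + S_xy S_yy^{-1} (y - mu_y), which is the standard conditional-Gaussian result.

```python
import numpy as np

def mmse_clean_estimate(y, weights, means, covs, dim_x):
    """Sketch of an MMSE clean-feature estimate under a joint GMM on z = [x; y].

    y       : noisy feature vector, shape (dim_y,)
    weights : component priors, shape (K,)
    means   : joint means, shape (K, dim_x + dim_y)
    covs    : joint full covariances, shape (K, dim_x + dim_y, dim_x + dim_y)
    dim_x   : dimension of the clean block (assumed to come first in z)
    """
    y = np.asarray(y, dtype=float)
    num_components = len(weights)
    log_post = np.empty(num_components)
    cond_means = np.empty((num_components, dim_x))

    for k in range(num_components):
        mu_x, mu_y = means[k][:dim_x], means[k][dim_x:]
        s_xy = covs[k][:dim_x, dim_x:]   # clean/noisy cross-covariance
        s_yy = covs[k][dim_x:, dim_x:]   # noisy-block covariance
        diff = y - mu_y
        solved = np.linalg.solve(s_yy, diff)  # S_yy^{-1} (y - mu_y)

        # log of weight_k * N(y; mu_y, S_yy), the unnormalized posterior
        _, logdet = np.linalg.slogdet(s_yy)
        log_post[k] = np.log(weights[k]) - 0.5 * (
            logdet + diff @ solved + len(y) * np.log(2.0 * np.pi)
        )
        # Per-component linear transform of y (conditional mean of x given y)
        cond_means[k] = mu_x + s_xy @ solved

    # Normalize in the log domain for numerical stability
    log_post -= np.logaddexp.reduce(log_post)
    post = np.exp(log_post)
    return post @ cond_means, post


# Toy usage with made-up parameters (1-D clean and 1-D noisy feature)
toy_weights = np.array([0.5, 0.5])
toy_means = np.array([[0.0, 0.0], [4.0, 5.0]])
toy_covs = np.array([[[1.0, 0.5], [0.5, 1.0]],
                     [[1.0, 0.8], [0.8, 1.0]]])
x_hat, posteriors = mmse_clean_estimate([0.3], toy_weights, toy_means, toy_covs, dim_x=1)
```

With a single component the estimator reduces to one linear regression from y to x. With several components, the posteriors select which linear transform dominates for a given noisy frame. The iterative MAP estimator mentioned in the abstract is not shown here.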

Keywords:

Estimator; Maximum a posteriori estimation; Minimum mean square error; Computer science; Word error rate; Speech recognition; A priori and a posteriori; Artificial intelligence; Pattern recognition (psychology); Gaussian; Word (group theory); Joint probability distribution; Mathematics; Statistics; Maximum likelihood

Metrics

FWCI (Field Weighted Citation Impact): 4.57

Citation Normalized Percentile: 0.96

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

MMSE-based stereo feature stochastic mapping for noise robust speech recognition

Synthesized stereo-based stochastic mapping with data selection for robust speech recognition

Stereo-based stochastic mapping with discriminative training for noise robust speech recognition

N-best based stochastic mapping on stereo HMM for noise robust speech recognition