Frame-dependent multi-stream reliability indicators for audio-visual speech recognition

A. Garg; Gerasimos Potamianos; C. Neti; Thomas S. Huang

doi:10.1109/icme.2003.1221384


JOURNAL ARTICLE


Year: 2003; Vol: 3; Pages: III-605


Abstract

We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators for each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio-only or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonstrate the superiority of the proposed approach on a connected-digit audio-visual speech recognition task, under varying audio channel noise conditions. Indeed, the use of estimated, frame-dependent stream exponents results in a significantly smaller word error rate than using global stream exponents. In addition, it outperforms utterance-level exponents, even though the latter utilize a priori knowledge of the utterance noise level.
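
As an illustration of the mapping described in the abstract, the following minimal Python sketch computes frame-level stream exponents as a sigmoid of a linear combination of four reliability indicators, then uses them to weight the log-likelihoods of the two streams. The abstract does not give the indicator definitions, the estimated weights, or any constraint on the exponents, so the function names, the bias term, the example values, and the assumption that the visual exponent equals one minus the audio exponent are all hypothetical.

```python
import math


def stable_sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stream_exponents(indicators, weights, bias=0.0):
    """Map one frame's reliability indicators to (audio, visual) exponents.

    Assumption (not stated in the abstract): the two exponents sum to one,
    so only the audio exponent is computed from the sigmoid.
    """
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    z = bias + sum(w * d for w, d in zip(weights, indicators))
    audio_exp = stable_sigmoid(z)
    return audio_exp, 1.0 - audio_exp


def combined_log_likelihood(log_p_audio, log_p_visual, audio_exp, visual_exp):
    """Exponent-weighted two-stream score for one HMM state and one frame."""
    return audio_exp * log_p_audio + visual_exp * log_p_visual


# Hypothetical values for a single frame: two audio and two visual indicators.
frame_indicators = [1.2, 0.8, 0.3, 0.1]
sigmoid_weights = [0.9, 0.4, -0.7, -0.2]  # made up, not estimated from data

audio_exp, visual_exp = stream_exponents(frame_indicators, sigmoid_weights, bias=0.1)
score = combined_log_likelihood(-12.5, -9.0, audio_exp, visual_exp)
print(audio_exp, visual_exp, score)
```

In the paper the sigmoid weights are estimated with maximum conditional likelihood or minimum classification error training; the sketch takes them as given and shows only how they would be applied per frame.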

Keywords:

Computer science, Speech recognition, Hidden Markov model, Frame (networking), Pattern recognition (psychology), Artificial intelligence, Noise (video), Utterance, Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 0.29

Citation Normalized Percentile: 0.62

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents


Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition

Multi-stream Asynchrony Modeling for Audio-Visual Speech Recognition

Multi-Stream Asynchrony Modeling for Audio Visual Speech Recognition

DBN based multi-stream models for audio-visual speech recognition