JOURNAL ARTICLE

Dynamic Bayesian Networks for Audio-Visual Speech Recognition

Ara Nefian, Luhong Liang, Xiaobo Pi, Xiaoxing Liu, Kevin J. Murphy

Year: 2002 Journal: EURASIP Journal on Advances in Signal Processing Vol: 2002 (11) Publisher: Springer Science+Business Media

Abstract

The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming both the FHMM and all the existing models.
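
As an illustration of how a coupled HMM can let the audio and visual chains occupy different states while remaining correlated, here is a minimal sketch of the forward algorithm for a two-chain CHMM. It assumes the common formulation in which each chain's next state depends on the previous states of both chains, and it uses discrete observation symbols and hand-picked toy parameters purely for brevity. None of the names or numbers below come from the paper; they are hypothetical.

```python
from itertools import product


def chmm_likelihood(obs_a, obs_v, pi_a, pi_v, trans_a, trans_v, emit_a, emit_v):
    """Likelihood of paired audio/visual symbol sequences under a toy coupled HMM.

    Assumed (hypothetical) parameter layout:
      trans_a[(i, j)][k] = P(audio state k at t  | audio state i, visual state j at t-1)
      trans_v[(i, j)][l] = P(visual state l at t | audio state i, visual state j at t-1)
      emit_a[k][o]       = P(audio symbol o  | audio state k)
      emit_v[l][o]       = P(visual symbol o | visual state l)
    """
    states = list(product(range(len(pi_a)), range(len(pi_v))))

    # Initialisation: the chains start independently and each emits its own symbol.
    alpha = {
        (i, j): pi_a[i] * pi_v[j] * emit_a[i][obs_a[0]] * emit_v[j][obs_v[0]]
        for i, j in states
    }

    # Recursion over the joint (audio, visual) state. The two chains may sit in
    # different states (asynchrony), but each transition is conditioned on both
    # previous states, which keeps the streams coupled over time.
    for oa, ov in zip(obs_a[1:], obs_v[1:]):
        alpha = {
            (k, l): emit_a[k][oa] * emit_v[l][ov] * sum(
                alpha[(i, j)] * trans_a[(i, j)][k] * trans_v[(i, j)][l]
                for i, j in states
            )
            for k, l in states
        }
    return sum(alpha.values())


# Toy parameters (illustrative only): two states and two symbols per stream.
PI_A, PI_V = [0.6, 0.4], [0.7, 0.3]
TRANS_A = {(0, 0): [0.8, 0.2], (0, 1): [0.6, 0.4], (1, 0): [0.3, 0.7], (1, 1): [0.1, 0.9]}
TRANS_V = {(0, 0): [0.9, 0.1], (0, 1): [0.4, 0.6], (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8]}
EMIT_A = [[0.9, 0.1], [0.2, 0.8]]
EMIT_V = [[0.7, 0.3], [0.4, 0.6]]

likelihood = chmm_likelihood([0, 0, 1], [0, 1, 1], PI_A, PI_V,
                             TRANS_A, TRANS_V, EMIT_A, EMIT_V)
print(likelihood)
```

In an isolated-word setting, one such model would typically be trained per word and the word with the highest likelihood chosen. A practical system would also use continuous observation densities and log-space arithmetic, which this sketch omits.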

Keywords:
Speech recognition, Computer science, Hidden Markov model, Statistical model, Artificial intelligence, Pattern recognition (psychology)

Metrics

Cited By: 317
FWCI (Field Weighted Citation Impact): 9.14
Refs: 41
Citation Normalized Percentile: 0.98
Is in top 1%
Is in top 10%

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Blind Source Separation Techniques
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

BOOK-CHAPTER

Dynamic Bayesian Networks for Audio-Visual Speaker Recognition

Dongdong Li, Yingchun Yang, Zhaohui Wu

Lecture Notes in Computer Science Year: 2005 Pages: 539-545

JOURNAL ARTICLE

A phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition

Louis H. Terry, Aggelos K. Katsaggelos

Journal: Proceedings - International Conference on Pattern Recognition Year: 2008 Vol: 4 Pages: 1-4

JOURNAL ARTICLE

Dynamic Bayesian networks for automatic speech recognition

Murat Deviren

Journal: National Conference on Artificial Intelligence Year: 2002 Pages: 981-981