Dynamic Bayesian Networks for Audio-Visual Speech Recognition

Ara Nefian; Luhong Liang; Xiaobo Pi; Xiaoxing Liu; Kevin J. Murphy

doi:10.1155/s1110865702206083

JOURNAL ARTICLE

Year: 2002; Journal: EURASIP Journal on Advances in Signal Processing; Vol: 2002 (11); Publisher: Springer Science+Business Media

Abstract

The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with that of the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.
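The coupling idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic two-chain coupled HMM in which each chain's next state depends on the previous states of both chains, and it assumes per-frame observation likelihoods have already been computed. The function name `chmm_forward` and all parameter names are hypothetical.

```python
from itertools import product


def chmm_forward(pi_a, pi_v, trans_a, trans_v, lik_a, lik_v):
    """Sequence likelihood under a two-chain coupled HMM (illustrative sketch).

    pi_a[k], pi_v[l]   : initial state probabilities of the audio / visual chain
    trans_a[i][j][k]   : P(audio state k at t | audio i, visual j at t-1)
    trans_v[i][j][l]   : P(visual state l at t | audio i, visual j at t-1)
    lik_a[t][k]        : likelihood of the audio frame t given audio state k
    lik_v[t][l]        : likelihood of the visual frame t given visual state l
    """
    n_a, n_v = len(pi_a), len(pi_v)
    # The forward recursion runs over joint (audio, visual) state pairs,
    # so the two chains may occupy different states at the same time step.
    states = list(product(range(n_a), range(n_v)))

    alpha = {
        (k, l): pi_a[k] * pi_v[l] * lik_a[0][k] * lik_v[0][l]
        for k, l in states
    }
    for t in range(1, len(lik_a)):
        new_alpha = {}
        for k, l in states:
            # Each chain's transition is conditioned on BOTH previous states.
            total = sum(
                alpha[(i, j)] * trans_a[i][j][k] * trans_v[i][j][l]
                for i, j in states
            )
            new_alpha[(k, l)] = total * lik_a[t][k] * lik_v[t][l]
        alpha = new_alpha
    return sum(alpha.values())
```

In this sketch the joint state space has size `n_a * n_v`, so one forward step costs on the order of `(n_a * n_v) ** 2` operations. A factorial HMM would instead let each chain evolve independently and tie the chains together only through the observations.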

Keywords:

Speech recognition; Computer science; Hidden Markov model; Statistical model; Artificial intelligence; Pattern recognition (psychology)

Metrics

Cited By: 317

FWCI (Field Weighted Citation Impact): 9.14

Citation Normalized Percentile: 0.98

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Blind Source Separation Techniques

Physical Sciences → Computer Science → Signal Processing

Related Documents

Dynamic Bayesian Networks for Audio-Visual Speaker Recognition

A phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition

Multi-Stream Asynchrony Dynamic Bayesian Network Model for Audio-Visual Continuous Speech Recognition

Dynamic Bayesian networks for automatic speech recognition

Audio-visual speaker detection using dynamic Bayesian networks