JOURNAL ARTICLE

Audio-Visual Speech Fusion Using Coupled Hidden Markov Models

Abstract

The fusion of audio and visual speech is an instance of the general sensory fusion problem, which arises when multiple channels carry complementary information about different components of a system. In audio-visual speech, the two modalities manifest two aspects of the same underlying speech production process. From an observer's view, the audio channel and the visual channel represent two interacting stochastic processes. We seek a framework that can model the two individual processes as well as their dynamic interactions. One interesting aspect of audio-visual speech is the inherent asynchrony between the audio and visual channels. Most early-integration approaches to the fusion problem assume tight synchrony between the two. However, studies have shown that human perception of bimodal speech does not require rigid synchronization of the two modalities. Furthermore, humans appear to use audio-visual asynchronies as multimodal features. For example, it is well known that voice onset time is an important cue to the voicing feature in stop consonants. This information can be conveyed bimodally by the interval between seeing the stop release and hearing the vocal cord vibration. Therefore, a successful fusion scheme should not only tolerate asynchrony between the audio and visual cues, but also capture and exploit this bimodal feature.

Keywords:
Computer science; Speech recognition; Asynchrony (computer programming); Speech processing; Perception; Process (computing); Hidden Markov model; Feature (linguistics); Artificial intelligence; Asynchronous communication; Psychology

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.31
Refs: 7
Citation Normalized Percentile: 0.57

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Multisensory perception and integration (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Hearing Loss and Rehabilitation (Life Sciences → Neuroscience → Cognitive Neuroscience)
