Audio-Visual Speech Fusion Using Coupled Hidden Markov Models

Stephen M. Chu; Thomas S. Huang

doi:10.1109/cvpr.2007.383524

JOURNAL ARTICLE

Year: 2007. Pages: 1-2.

Abstract

The fusion of audio and visual speech is an instance of the general sensory fusion problem, which arises when multiple channels carry complementary information about different components of a system. In audio-visual speech, the two modalities manifest two aspects of the same underlying speech production process. From an observer's view, the audio channel and the visual channel represent two interacting stochastic processes. We seek a framework that can model the two individual processes as well as their dynamic interactions. One interesting aspect of audio-visual speech is the inherent asynchrony between the audio and visual channels. Most early-integration approaches to the fusion problem assume tight synchrony between the two. However, studies have shown that human perception of bimodal speech does not require rigid synchronization of the two modalities. Furthermore, humans appear to use audio-visual asynchronies as multimodal features. For example, it is well known that voice onset time is an important cue to the voicing feature in stop consonants. This information can be conveyed bimodally by the interval between seeing the stop release and hearing the vocal cord vibration. Therefore, a successful fusion scheme should not only tolerate asynchrony between the audio and visual cues, but also capture and exploit this bimodal feature.
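The abstract does not give the model's equations, so the following is only a rough sketch of what "two interacting stochastic processes" can look like in a coupled hidden Markov model. It assumes a two-chain model in which each chain's next state depends on the previous states of both chains, and it uses discrete observation symbols for brevity (a real audio-visual system would typically use continuous features). The function name `chmm_likelihood`, the parameter layout, and all numeric values are hypothetical and are not taken from the paper.

```python
from itertools import product


def chmm_likelihood(pi_a, pi_v, trans_a, trans_v, emit_a, emit_v, obs_a, obs_v):
    """Forward algorithm for a two-chain coupled HMM (illustrative sketch).

    Assumed parameter layout (hypothetical, not from the paper):
      pi_a[a], pi_v[v]            initial state probabilities per chain
      trans_a[pa][pv][a]          P(a_t = a | a_{t-1} = pa, v_{t-1} = pv)
      trans_v[pa][pv][v]          P(v_t = v | a_{t-1} = pa, v_{t-1} = pv)
      emit_a[a][o], emit_v[v][o]  discrete emission probabilities
    Returns the joint likelihood of the two observation sequences.
    """
    states = list(product(range(len(pi_a)), range(len(pi_v))))

    # Initialise over joint states (a, v) using the first observation pair.
    alpha = {
        (a, v): pi_a[a] * pi_v[v] * emit_a[a][obs_a[0]] * emit_v[v][obs_v[0]]
        for a, v in states
    }

    # Recurse over the remaining frames. The coupling appears in the
    # transition terms: each chain is conditioned on both previous states.
    for oa, ov in zip(obs_a[1:], obs_v[1:]):
        new_alpha = {}
        for a, v in states:
            total = sum(
                alpha[(pa, pv)] * trans_a[pa][pv][a] * trans_v[pa][pv][v]
                for pa, pv in states
            )
            new_alpha[(a, v)] = total * emit_a[a][oa] * emit_v[v][ov]
        alpha = new_alpha

    return sum(alpha.values())


# Made-up toy parameters: two hidden states and two symbols per chain.
PI_A = [0.6, 0.4]
PI_V = [0.5, 0.5]
TRANS_A = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.1, 0.9]]]
TRANS_V = [[[0.8, 0.2], [0.5, 0.5]], [[0.3, 0.7], [0.1, 0.9]]]
EMIT_A = [[0.9, 0.1], [0.2, 0.8]]
EMIT_V = [[0.8, 0.2], [0.3, 0.7]]

if __name__ == "__main__":
    print(chmm_likelihood(PI_A, PI_V, TRANS_A, TRANS_V, EMIT_A, EMIT_V,
                          [0, 1, 1], [0, 0, 1]))
```

Because the hidden states of the two chains need not advance together, a model of this shape can tolerate a visual event (such as a stop release) preceding its acoustic counterpart, while the cross-chain transition terms can still encode how the two are related.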

Keywords: Computer science; Speech recognition; Asynchrony (computer programming); Speech processing; Perception; Process (computing); Hidden Markov model; Feature (linguistics); Artificial intelligence; Asynchronous communication; Psychology

Metrics

FWCI (Field Weighted Citation Impact): 0.31

Citation Normalized Percentile: 0.57

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Multisensory perception and integration

Social Sciences → Psychology → Experimental and Cognitive Psychology

Hearing Loss and Rehabilitation

Life Sciences → Neuroscience → Cognitive Neuroscience

Related Documents

Audio-visual speech modeling using coupled hidden Markov models

Audio-visual speaker identification using coupled hidden Markov models

Audio–visual sports highlights extraction using Coupled Hidden Markov Models
