Bimodal speech recognition is a robust technique for automated speech analysis and has received considerable attention over the last few decades. In this paper, we analyze the effect of HMM models on the performance of a bimodal speech recognizer, present a comparative analysis of the different HMM models that can be used in bimodal speech recognition, and finally propose a novel model that has been experimentally verified to outperform the others. One unique characteristic of our HMM model is its novel fusion strategy for the acoustic and visual features, which takes into account the different sampling rates of these two signals. Compared to audio-only recognition, the bimodal scheme achieves substantially higher recognition accuracy, especially in the presence of noise.
M. N. Kaynak, Zhi Qi, Adrian David Cheok, K. Sengupta, Ko Chi Chung
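The abstract does not specify the fusion strategy beyond noting that it handles the mismatched sampling rates of the acoustic and visual streams. As a rough illustration only (not the paper's method), feature-level fusion across different frame rates can be sketched by interpolating the slower visual stream onto the audio frame axis and concatenating; the rates and feature dimensions below are assumptions:

```python
import numpy as np

def fuse_features(audio_feats, visual_feats):
    """Upsample visual features to the audio frame rate by linear
    interpolation, then concatenate frame-by-frame (feature-level fusion).

    audio_feats:  (n_audio, d_a) array, e.g. MFCCs at ~100 frames/s
    visual_feats: (n_visual, d_v) array, e.g. lip features at ~25 frames/s
    """
    n_audio = audio_feats.shape[0]
    n_visual = visual_feats.shape[0]
    # Normalized time axes for the two streams (both span the utterance).
    t_audio = np.linspace(0.0, 1.0, n_audio)
    t_visual = np.linspace(0.0, 1.0, n_visual)
    # Interpolate each visual feature dimension onto the audio time axis.
    visual_up = np.stack(
        [np.interp(t_audio, t_visual, visual_feats[:, d])
         for d in range(visual_feats.shape[1])],
        axis=1)
    # Concatenated observation vectors, one per audio frame,
    # suitable as input to a single-stream HMM.
    return np.concatenate([audio_feats, visual_up], axis=1)

# Example: 1 s of hypothetical 13-dim audio features at 100 Hz
# and 6-dim visual features at 25 Hz.
audio = np.random.randn(100, 13)
visual = np.random.randn(25, 6)
fused = fuse_features(audio, visual)
print(fused.shape)  # (100, 19)
```

Linear interpolation is only one option; repeating each visual frame or using spline interpolation are common alternatives, and multi-stream HMMs avoid upsampling altogether by modeling the streams separately.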