JOURNAL ARTICLE

Product HMMs for audio-visual continuous speech recognition using facial animation parameters

Abstract

The use of visual information in addition to acoustic information can improve automatic speech recognition (ASR). In this paper we compare different approaches to audio-visual information integration and show how they affect ASR performance. As visual features we use facial animation parameters (FAPs), which the MPEG-4 standard supports for visual representation. We use both single-stream and multi-stream hidden Markov models (HMMs) to integrate audio and visual information. We performed both state-synchronous and phone-synchronous multi-stream integration. A product HMM topology is used to model the phone-synchronous integration. ASR experiments were performed under noisy audio conditions using a relatively large-vocabulary (approximately 1000 words) audio-visual database. The proposed phone-synchronous system, which performed best, reduces the word error rate (WER) by approximately 20% relative to audio-only ASR (A-ASR) at various SNRs with additive white Gaussian noise.
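
The difference between state-synchronous and phone-synchronous integration can be illustrated with a minimal sketch. It is not the authors' implementation: the three-state phone models, the asynchrony limit, the stream weight, and all function names below are assumptions chosen for illustration. The sketch enumerates the composite states of a product HMM for one phone, scores a composite state with a stream-weighted log-likelihood, and computes a relative WER reduction.

```python
import itertools


def product_states(n_audio, n_visual, max_async=None):
    """Enumerate composite (audio_state, visual_state) pairs within one phone.

    max_async=None keeps the full product; max_async=0 keeps only the
    diagonal, which corresponds to state-synchronous integration.
    """
    pairs = itertools.product(range(n_audio), range(n_visual))
    if max_async is None:
        return list(pairs)
    return [(a, v) for a, v in pairs if abs(a - v) <= max_async]


def combined_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Stream-weighted emission score; the two weights are assumed to sum to 1."""
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual


def relative_wer_reduction(wer_baseline, wer_system):
    """Fraction by which wer_system improves on wer_baseline."""
    return (wer_baseline - wer_system) / wer_baseline


# Assumed topology: 3 audio states and 3 visual states per phone.
full = product_states(3, 3)                    # streams may drift inside the phone
diagonal = product_states(3, 3, max_async=0)   # streams locked state by state
print(len(full), len(diagonal))

# Hypothetical log-likelihoods and weight, for illustration only.
print(combined_log_likelihood(-2.0, -4.0, audio_weight=0.5))
```

In this sketch both streams enter and leave the phone together, so asynchrony is confined to the phone interior; lowering audio_weight as the SNR drops is a common design choice in multi-stream systems, though the abstract does not say how the weights were set.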

Keywords:
Computer science; Speech recognition; Hidden Markov model; Phone; Word error rate; Audio mining; Vocabulary; Computer facial animation; Artificial intelligence; Animation; Visualization; Acoustic model; Computer animation; Pattern recognition (psychology); Speech processing; Computer graphics (images)

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 0.58
Refs: 12
Citation Normalized Percentile: 0.69

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
