BOOK-CHAPTER

Multi-Stream Asynchrony Modeling for Audio Visual Speech Recognition

Abstract

In this paper, two multi-stream asynchrony Dynamic Bayesian Network (DBN) models, the MS-ADBN model and the MM-ADBN model, are proposed for small-vocabulary and large-vocabulary audio-visual speech recognition. Both relax the asynchrony constraint between the audio stream and the visual stream to the word level. Essentially, the MS-ADBN model is a word model with a word-phone-observation topology whose basic recognition units are words, while the MM-ADBN model is a phone model with a word-phone-state-observation topology whose basic recognition units are phones. Speech recognition experiments are carried out on a digit audio-visual database and a continuous audio-visual database. The results show that the MS-ADBN model achieves the highest recognition rate on the digit audio-visual database, while on the continuous audio-visual database, in a clean speech environment, the MM-ADBN model improves the speech recognition rate by 35.91% and 9.97% compared with the SA-MSHMM and the MS-ADBN model, respectively. In the future work, we
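
To make the idea of word-level asynchrony concrete, the following is a minimal Python sketch and not the chapter's DBN implementation. It assumes that, inside a word, the audio and visual streams may each occupy any phone position independently, and that both streams must reach the word's last phone before the word transition is allowed. The function names joint_phone_states and can_exit_word, and the parameter word_level_async, are hypothetical and introduced only for illustration.

```python
from itertools import product


def joint_phone_states(num_phones, word_level_async=True):
    """Enumerate (audio_phone, visual_phone) index pairs allowed inside one word.

    Assumption for illustration: with word-level asynchrony the two streams
    move through the word's phones independently; without it they are forced
    to share the same phone index at every frame.
    """
    pairs = product(range(num_phones), repeat=2)
    if word_level_async:
        return list(pairs)
    return [(a, v) for a, v in pairs if a == v]


def can_exit_word(audio_phone, visual_phone, num_phones):
    """Assumed resynchronization rule: both streams must be in the word's
    last phone before the model may move on to the next word."""
    last = num_phones - 1
    return audio_phone == last and visual_phone == last
```

For a hypothetical three-phone word, the asynchronous case allows all nine joint phone positions, while the fully synchronous case allows only the three positions where both streams share the same phone. In both cases the word can only be left once the audio and visual streams are both in the final phone.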

Keywords:
Asynchrony (computer programming); Speech recognition; Audio visual; Computer science; Multimedia; Telecommunications; Asynchronous communication
