Multi-Stream Asynchrony Modeling for Audio Visual Speech Recognition

Guoyun Lv; Yangyu Fan; Dongmei Jiang; Rongchun Zhao

doi:10.5772/6373

BOOK-CHAPTER

Year: 2008 InTech eBooks

Abstract

In this paper, two multi-stream asynchrony Dynamic Bayesian Network (DBN) models, the MS-ADBN model and the MM-ADBN model, are proposed for small-vocabulary and large-vocabulary audio-visual speech recognition. Both loosen the limit on asynchrony between the audio stream and the visual stream to the word level. Essentially, the MS-ADBN model is a word model with a word-phone-observation topology whose basic recognition units are words, while the MM-ADBN model is a phone model with a word-phone-state-observation topology whose basic recognition units are phones. Speech recognition experiments are carried out on a digit audio-visual database and a continuous audio-visual database. The results show that the MS-ADBN model has the highest recognition rate on the digit audio-visual database, while on the continuous audio-visual database, in a clean speech environment, the MM-ADBN model improves the speech recognition rate by 35.91% and 9.97% compared with the SA-MSHMM and MS-ADBN models, respectively. In future work, we ...

Keywords:

Asynchrony (computer programming); Speech recognition; Audio visual; Computer science; Multimedia; Telecommunications; Asynchronous communication

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.44

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Multi-stream Asynchrony Modeling for Audio-Visual Speech Recognition

Asynchrony modeling for audio-visual speech recognition

Multi-Stream Asynchrony Dynamic Bayesian Network Model for Audio-Visual Continuous Speech Recognition

Overcoming asynchrony in Audio-Visual Speech Recognition

Dynamic Stream Weight Modeling for Audio-Visual Speech Recognition