JOURNAL ARTICLE

Adaptive stream reliability modeling based on local dispersion measures for audio visual speech recognition

Abstract

This paper proposes an adaptive stream reliability modeling technique for audio-visual speech recognition (AVSR). Because recognition conditions vary locally, we present two local measures, frame dispersion and window dispersion, to characterize the temporal discriminative power and noise level of both the audio and visual streams. The dispersions are then mapped to stream exponents according to the minimum classification error (MCE) criterion. Experiments on a connected-digits task show that our method consistently outperforms the popular discriminative training (DT) and grid search (GS) methods at various signal-to-noise ratios (SNRs); for example, it improves the word accuracy rate (WAR) from 94.7% to 96.4% at 28 dB SNR.
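The abstract does not give the formulas behind these measures, so the following is only a rough sketch of how dispersion-driven stream weighting might look. It assumes a frame dispersion defined as the mean pairwise gap among the N-best class log-likelihoods (a definition common in the AVSR literature, not confirmed by this paper), a window dispersion taken as a moving average of frame dispersions, and a sigmoid mapping from dispersions to the audio stream exponent. All function names, the `n_best` and `half_width` parameters, and the `weights` and `bias` values are hypothetical; in the paper the mapping parameters would be trained under the MCE criterion, which is not implemented here.

```python
import math


def frame_dispersion(log_likes, n_best=4):
    """Mean pairwise gap among the n_best highest log-likelihoods of one frame.

    Assumed definition: a larger value means the best classes are well
    separated, which suggests a more reliable stream at this frame.
    """
    top = sorted(log_likes, reverse=True)[:n_best]
    n = len(top)
    if n < 2:
        return 0.0
    total = sum(top[i] - top[j] for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))


def window_dispersion(frame_disps, t, half_width=2):
    """Average of the frame dispersions in a window centred on frame t (assumed)."""
    lo = max(0, t - half_width)
    hi = min(len(frame_disps), t + half_width + 1)
    return sum(frame_disps[lo:hi]) / (hi - lo)


def audio_exponent(features, weights, bias):
    """Hypothetical sigmoid mapping from dispersion features to an exponent in (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def fused_log_likelihood(audio_ll, visual_ll, lam):
    """Standard multi-stream combination: exponents weight the log-likelihoods."""
    return lam * audio_ll + (1.0 - lam) * visual_ll


# Toy usage with made-up per-class log-likelihoods for a single frame.
audio_scores = [-1.0, -2.0, -4.0]
visual_scores = [-2.0, -2.5, -3.0]
d_audio = frame_dispersion(audio_scores, n_best=3)
d_visual = frame_dispersion(visual_scores, n_best=3)
lam = audio_exponent([d_audio, d_visual], weights=[1.0, -1.0], bias=0.0)
fused = [fused_log_likelihood(a, v, lam) for a, v in zip(audio_scores, visual_scores)]
```

In this toy setup the audio stream has the larger dispersion, so the assumed mapping gives it an exponent above 0.5 and the fused scores lean toward the audio log-likelihoods.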

Keywords:
Discriminative model; Speech recognition; Computer science; Reliability (semiconductor); Pattern recognition (psychology); Noise (video); Artificial intelligence; Word error rate; Frame (networking); Image (mathematics)

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.32
Refs: 14
Citation Normalized Percentile: 0.59

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Advanced Adaptive Filtering Techniques
Physical Sciences →  Engineering →  Computational Mechanics