JOURNAL ARTICLE

Comparing audio- and a-posteriori-probability-based stream confidence measures for audio-visual speech recognition

Abstract

During the fusion of audio and video information for speech recognition, the estimation of the reliability of the noise affected audio channel is crucial to get meaningful recognition results. In this paper we compare two types of reliability measures. One is the use of the statistics of the phoneme a-posteriori probabilities and the other is the analysis of the audio signal itself. We implemented the entropy and the dispersion of the probabilities and, from the audio-based criteria, the so called Voicing Index. To test the criteria a hybrid ANN/HMM audio-visual recognition system was used and 5 different types of noise at 12 SNR levels each were added to the audio signal. The best sigmoidal fit for each criterion between the fusion parameter and the value of the criterion over all noise types and SNR values was performed. The resulting individual errors and the corresponding averaged relative errors are given.

Keywords:
Speech recognition Computer science Audio signal Hidden Markov model Pattern recognition (psychology) Audio signal processing Artificial intelligence Entropy (arrow of time) A priori and a posteriori Speech coding

Metrics

6
Cited By
1.46
FWCI (Field Weighted Citation Impact)
7
Refs
0.80
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Blind Source Separation Techniques
Physical Sciences →  Computer Science →  Signal Processing
© 2026 ScienceGate Book Chapters — All rights reserved.