Comparing audio- and a-posteriori-probability-based stream confidence measures for audio-visual speech recognition

Martin Heckmann; Thorsten Wild; Frédéric Berthommier; Kristian Kroschel

doi:10.21437/eurospeech.2001-293


JOURNAL ARTICLE


Year: 2001 Pages: 1023-1026



Abstract

When audio and video information are fused for speech recognition, estimating the reliability of the noise-affected audio channel is crucial for obtaining meaningful recognition results. In this paper we compare two types of reliability measures: one uses the statistics of the phoneme a-posteriori probabilities, the other analyzes the audio signal itself. From the probability-based criteria we implemented the entropy and the dispersion of the probabilities and, from the audio-based criteria, the so-called Voicing Index. To test the criteria, a hybrid ANN/HMM audio-visual recognition system was used, and 5 different types of noise at 12 SNR levels each were added to the audio signal. For each criterion, the best sigmoidal fit between the fusion parameter and the value of the criterion was computed over all noise types and SNR values. The resulting individual errors and the corresponding averaged relative errors are given.
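The abstract gives no formulas, so the following Python sketch only illustrates what the two probability-based criteria and a sigmoidal mapping could look like. Everything in it is an assumption rather than the authors' implementation: the dispersion is taken to be the mean pairwise difference among the N largest posteriors (one definition found in the audio-visual fusion literature), and the names `entropy`, `dispersion`, `fusion_weight`, `n_best`, `slope`, and `offset` are hypothetical. The sketch evaluates a sigmoid with given parameters; it does not perform the fit described above.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of one frame's phoneme posterior vector.

    Low entropy means one phoneme dominates, i.e. the audio stream looks reliable.
    """
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def dispersion(posteriors, n_best=4):
    """Mean pairwise difference among the n_best largest posteriors.

    Assumed definition, not taken from the paper. A large value means the
    best candidates are well separated.
    """
    top = sorted(posteriors, reverse=True)[:n_best]
    if len(top) < 2:
        raise ValueError("need at least two posteriors to compute a dispersion")
    diffs = [top[i] - top[j]
             for i in range(len(top))
             for j in range(i + 1, len(top))]
    return sum(diffs) / len(diffs)


def fusion_weight(criterion, slope, offset):
    """Sigmoid mapping from a criterion value to an audio weight in (0, 1).

    slope and offset are hypothetical parameters; in practice they would be
    fitted per criterion over the noise types and SNR levels.
    """
    return 1.0 / (1.0 + math.exp(-slope * (criterion - offset)))


# Illustrative posterior vectors (made-up values, not data from the paper).
confident = [0.97, 0.01, 0.01, 0.01]   # clean-looking frame
flat = [0.25, 0.25, 0.25, 0.25]        # frame dominated by noise

print(entropy(confident), entropy(flat))
print(dispersion(confident), dispersion(flat))
# A negative slope makes the audio weight fall as entropy rises.
print(fusion_weight(entropy(confident), slope=-4.0, offset=0.7))
print(fusion_weight(entropy(flat), slope=-4.0, offset=0.7))
```

Under these assumptions the two criteria move in opposite directions as noise increases: entropy rises and dispersion falls, so the fitted sigmoid would have a negative slope for entropy and a positive one for dispersion.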

Keywords:

Speech recognition; Computer science; Audio signal; Hidden Markov model; Pattern recognition (psychology); Audio signal processing; Artificial intelligence; Entropy (arrow of time); A priori and a posteriori; Speech coding

Metrics

FWCI (Field Weighted Citation Impact): 1.46

Citation Normalized Percentile: 0.80

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Blind Source Separation Techniques

Physical Sciences → Computer Science → Signal Processing


Related Documents

Stream confidence estimation for audio-visual speech recognition

DBN based multi-stream models for audio-visual speech recognition

Adaptive stream reliability modeling based on local dispersion measures for audio visual speech recognition

Multi-stream Confidence Analysis for Audio-Visual Affect Recognition

Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition