Robust Audio-Visual Speech Recognition Under Noisy Audio-Video Conditions

Darryl Stewart; Rowan Seymour; Adrian Pass; Ji Ming

doi:10.1109/tcyb.2013.2250954

Year: 2013. Journal: IEEE Transactions on Cybernetics, Vol. 44, No. 2, pp. 175-184. Publisher: Institute of Electrical and Electronics Engineers.

Abstract

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition; as such, it is modality-independent. This also means that MWSP complements, and can be used alongside, many of the other approaches proposed in the literature for this problem. For evaluation, we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the audio stream, the video stream, or both, using a variety of noise types (e.g., MPEG-4 video compression) and levels. The experiments show that this approach gives excellent performance compared with another well-known dynamic stream weighting approach, and also compared with any fixed-weight integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis, according both to the level of noise in the streams and to the naturally fluctuating relative reliability of the modalities, even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions while requiring no prior knowledge about the type or level of noise.
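To make the frame-by-frame weighting idea concrete, here is a minimal Python sketch of one plausible reading of the MWSP criterion as summarized above. For each frame, the audio and video state log-likelihoods are combined with weight λ on audio and 1 - λ on video, normalized into state posteriors, and the candidate λ that gives the largest best-state posterior is kept. This is an illustrative assumption, not the authors' implementation. The function names, the weight grid from 0.0 to 1.0 in steps of 0.1, and the toy two-state log-likelihoods are all hypothetical. A full recognizer would use the selected weights inside HMM decoding rather than scoring frames in isolation.

```python
import math
from typing import List, Sequence, Tuple


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a non-empty sequence."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def weighted_log_posteriors(
    log_lik_audio: Sequence[float],
    log_lik_video: Sequence[float],
    log_prior: Sequence[float],
    lam: float,
) -> List[float]:
    """Per-state log posteriors for one frame: weight lam on audio, 1 - lam on video."""
    scores = [
        lam * a + (1.0 - lam) * v + p
        for a, v, p in zip(log_lik_audio, log_lik_video, log_prior)
    ]
    z = log_sum_exp(scores)  # normalize over all states
    return [s - z for s in scores]


def select_weight_for_frame(
    log_lik_audio: Sequence[float],
    log_lik_video: Sequence[float],
    log_prior: Sequence[float],
    candidate_weights: Sequence[float],
) -> Tuple[float, float]:
    """Return (lam, best-state log posterior) maximizing the weighted posterior.

    Assumed criterion: keep the lam whose most probable state has the highest
    posterior. Ties keep the earliest candidate. No SNR or other
    signal-specific measurement is used.
    """
    best_lam, best_log_post = candidate_weights[0], -math.inf
    for lam in candidate_weights:
        log_post = max(
            weighted_log_posteriors(log_lik_audio, log_lik_video, log_prior, lam)
        )
        if log_post > best_log_post:
            best_lam, best_log_post = lam, log_post
    return best_lam, best_log_post


def select_weights(
    frames: Sequence[Tuple[Sequence[float], Sequence[float]]],
    log_prior: Sequence[float],
    candidate_weights: Sequence[float],
) -> List[float]:
    """Pick an audio weight independently for every (audio, video) frame."""
    return [
        select_weight_for_frame(a, v, log_prior, candidate_weights)[0]
        for a, v in frames
    ]


# Hypothetical weight grid: 0.0, 0.1, ..., 1.0 (weight on the audio stream).
WEIGHTS = [i / 10 for i in range(11)]

# Toy two-state example with a uniform state prior.
prior = [math.log(0.5)] * 2
frames = [
    ([0.0, -10.0], [-1.0, -1.0]),  # audio discriminates, video is flat
    ([-1.0, -1.0], [0.0, -10.0]),  # video discriminates, audio is flat
]
print(select_weights(frames, prior, WEIGHTS))  # [1.0, 0.0]
```

Because no signal-specific measurement enters the selection, the same routine applies unchanged to either stream. This mirrors the modality independence claimed in the abstract: the weight shifts toward whichever stream currently separates the states more sharply.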

Keywords:

Computer science, Speech recognition, Weighting, Noise (video), Frame (networking), Voice activity detection, Artificial intelligence, Pattern recognition (psychology), Speech processing, Image (mathematics), Telecommunications