JOURNAL ARTICLE

Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications

Sanghun Jeon, Mun Sang Kim

Year: 2022 Journal: Sensors Vol: 22 (20) Pages: 7738-7738 Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Speech is a commonly used interaction-recognition technique in edutainment-based systems and is a key technology for smooth educational learning and user–system interaction. However, its application in real environments is limited by the various kinds of noise found there. This study proposes a multimodal interaction system, based on audio and visual information, that makes speech-driven virtual aquarium systems robust to ambient noise. For audio-based speech recognition, the list of words recognized by a speech API is expressed as word vectors using a pretrained model. Vision-based speech recognition, meanwhile, uses a composite end-to-end deep neural network. The vectors derived from the API and from vision are then concatenated and classified. The signal-to-noise ratio of the proposed system was determined based on data from four types of noise environments. Furthermore, the system was tested for accuracy and efficiency against existing single-mode strategies for visual feature extraction and audio speech recognition. Its average recognition rate was 91.42% when only speech was used, and improved by 6.7 percentage points to 98.12% when audio and visual information were combined. This method can be helpful in various real-world settings where speech recognition is regularly used, such as cafés, museums, music halls, and kiosks.
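
The concatenate-then-classify step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vector sizes, the random weights, the linear softmax classifier, and the function name `fuse_and_classify` are all assumptions made for illustration, since the abstract does not specify the classifier or the feature dimensions.

```python
import math
import random


def fuse_and_classify(audio_vec, visual_vec, weights, bias):
    """Concatenate two modality vectors and score them with a linear softmax layer.

    Hypothetical helper: the paper's actual classifier is not described in the abstract.
    """
    # Feature-level fusion: join the audio-derived and vision-derived vectors.
    fused = list(audio_vec) + list(visual_vec)
    # One logit per class: dot product of a weight row with the fused vector, plus bias.
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    # Numerically stable softmax over the class logits.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return fused, probs


# Assumed sizes for illustration only (not taken from the paper).
D_AUDIO, D_VISUAL, N_CLASSES = 8, 6, 4
rng = random.Random(0)
audio_vec = [rng.gauss(0, 1) for _ in range(D_AUDIO)]    # stands in for a word vector
visual_vec = [rng.gauss(0, 1) for _ in range(D_VISUAL)]  # stands in for lip-reading features
weights = [[rng.gauss(0, 1) for _ in range(D_AUDIO + D_VISUAL)]
           for _ in range(N_CLASSES)]
bias = [0.0] * N_CLASSES

fused, probs = fuse_and_classify(audio_vec, visual_vec, weights, bias)
predicted_class = probs.index(max(probs))
```

In a trained system the weights would be learned from labeled utterances. The point of the sketch is only that both modalities contribute to a single fused vector before any class decision is made, so a noisy audio vector can be offset by the visual features.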

Keywords:
Speech recognition; Computer science; Audio mining; Concatenation (mathematics); Noise (video); Voice activity detection; Speech processing; Artificial intelligence

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 1.95
Refs: 46
Citation Normalized Percentile: 0.82

Topics

Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Hearing Loss and Rehabilitation
Life Sciences → Neuroscience → Cognitive Neuroscience

Related Documents

JOURNAL ARTICLE

Noise-Robust Speech Recognition System based on Multimodal Audio-Visual Approach Using Different Deep Learning Classification Techniques

Eslam Elmaghraby, Amr M. Gody, Mohamed Hesham Farouk

Journal: The Egyptian Journal of Language Engineering Year: 2020 Vol: 7 (1) Pages: 27-42
JOURNAL ARTICLE

Noise robust speech recognition system using multimodal audio-visual approach using different deep learning classification techniques

Eslam E. El Maghraby, Amr M. Gody

Journal: International Journal of Advanced Computer Research Year: 2020 Vol: 10 (47) Pages: 51-71
DISSERTATION

Towards Robust Audio-Visual Speech Recognition

Tofigh Naghibi

University: Repository for Publications and Research Data (ETH Zurich) Year: 2015