Abstract

Automatic speech recognition (ASR) has advanced considerably in recent years, but background noise remains one of its major hurdles. When speech corrupted by background noise, termed noisy speech, is the input to an ASR system, the predicted text does not reflect the words actually spoken, which results in low accuracy. This paper proposes a deep neural network (DNN) based noise-robust ASR system that addresses this problem by applying a non-negative matrix factorization (NMF) based speech enhancement model prior to speech recognition. The aim is to provide enhanced speech as input to the ASR model, since speech enhancement improves the intelligibility of noisy speech by removing the background noise. The proposed ASR system is evaluated on speech corrupted with four distinct noise types: airport, babble, street and white noise. The proposed model is shown to improve the accuracy of the predicted text in noisy environments, thereby reducing the word error rate (WER) of the ASR system. The proposed model is noise robust in the signal-to-noise ratio (SNR) range [-15, 5] dB.
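The two evaluation quantities named in the abstract, SNR and WER, have standard definitions that can be sketched in a few lines. The Python below is an illustrative sketch only, not the authors' evaluation code: the function names `mix_at_snr` and `word_error_rate` are hypothetical, the signals are assumed to be one-dimensional sample arrays, and WER is assumed to be the word-level edit distance divided by the number of reference words.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the speech-to-noise power ratio is `snr_db` dB.

    Assumption: `noise` is at least as long as `speech` and is not all zeros.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain so that p_speech / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumptions, a test condition at the lower end of the stated range would be produced with `mix_at_snr(clean, noise, -15)`, and the transcript predicted from the enhanced signal would be scored against the ground-truth transcript with `word_error_rate`.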

Keywords:
Speech recognition; Computer science; Intelligibility (philosophy); Word error rate; Speech enhancement; Noise (video); Noise measurement; Non-negative matrix factorization; Voice activity detection; Background noise; Acoustic model; Speech processing; Word recognition; Robustness (evolution); Artificial intelligence; Matrix decomposition; Noise reduction

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.27
Refs: 28
Citation Normalized Percentile: 0.49

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing