Automatic speech recognition (ASR) has advanced considerably in recent years, but background noise remains one of its major hurdles. When speech corrupted by background noise, also termed noisy speech, is the input to an ASR system, the predicted text does not reflect the words actually spoken, resulting in low accuracy. This paper proposes a deep neural network (DNN) based noise-robust ASR system that addresses this problem by applying a non-negative matrix factorization (NMF) based speech enhancement model prior to speech recognition. The aim is to provide enhanced speech as input to the ASR model, since speech enhancement improves the intelligibility of noisy speech by reducing the background noise. The proposed ASR system is evaluated on speech corrupted with four distinct noise types: airport, babble, street and white noise. The proposed model is shown to improve the accuracy of the predicted text in noisy environments, thereby reducing the word error rate (WER) of the ASR system. The proposed model is noise robust over the signal-to-noise ratio (SNR) range [-15, 5] dB.
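For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the predicted text, divided by the number of reference words. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for illustration; the text above does not specify the scoring procedure used for the proposed system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: tokenization (lowercase, split on whitespace)
    is an assumption, not the paper's stated scoring procedure.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the predicted text is much longer than the reference. A lower value indicates that the predicted text is closer to the words actually spoken.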
Cil Hardianto Satriawan, Dessi Puji Lestari
Thimmaraja Yadava G, B. G. Nagaraja, H. S. Jayanna
Masahiro Hamada, Yumi Takizawa, Takeshi Norimatsu