Throat microphones (TMs) are robust to highly non-stationary and adverse noise conditions. However, TM speech is narrow-band: it lacks high-frequency content and is less intelligible than clean acoustic speech. This paper proposes a Residual Convolutional Neural Network (RCNN) to recognize throat microphone speech in Hindi and compares its performance with that of a conventional Convolutional Neural Network (CNN). The input feature is the log power spectral density. The model treats the TM feature matrix as an image and maps it to the target word, so speech recognition is formulated as an image classification problem. The proposed RCNN achieves an accuracy of 69.07% for male speakers and 43.94% for female speakers. This outperforms the CNN, which achieves 42.38% and 24.24%, respectively.
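The abstract combines two ideas: a log power spectral density (log-PSD) feature laid out as a 2-D "image", and a residual network that classifies that image. The sketch below illustrates both under stated assumptions. It is not the paper's implementation. The frame length of 400 samples and hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate), the Hann window, and the helper names `log_psd_image`, `conv2d_same`, and `residual_block` are all hypothetical choices made for illustration. The residual block is a single-channel NumPy forward pass with hand-supplied kernels. A real RCNN would be trained in a deep learning framework with many channels and layers, and its configuration is not given here.

```python
import numpy as np

def log_psd_image(signal, frame_len=400, hop=160, eps=1e-10):
    """Frame the waveform, apply a Hann window, and return the log power
    spectral density as a 2-D array (frequency bins x frames).
    frame_len and hop are assumed values, not taken from the paper."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)               # (n_frames, frame_len // 2 + 1)
    psd = np.abs(spectrum) ** 2 / np.sum(window ** 2)    # periodogram-style power estimate
    return np.log(psd + eps).T                           # image: (freq_bins, n_frames)

def conv2d_same(x, kernel):
    """2-D cross-correlation with zero padding so the output keeps x's shape.
    Assumes an odd-sized kernel."""
    x = np.asarray(x, dtype=np.float64)
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, k1, k2):
    """Residual block y = ReLU(F(x) + x), where F is conv -> ReLU -> conv.
    The identity shortcut adds the input back to the learned residual F(x)."""
    f = conv2d_same(relu(conv2d_same(x, k1)), k2)
    return relu(f + np.asarray(x, dtype=np.float64))
```

For one second of a 1 kHz tone sampled at 16 kHz, `log_psd_image` returns a 201 x 98 image whose strongest row is frequency bin 25 (1000 Hz / 40 Hz per bin). The shortcut is the design point of the residual block. If both kernels are zero, F(x) = 0 and the block reduces to ReLU(x). Each block therefore only has to learn a correction to its input, which typically makes deeper stacks easier to train than a plain CNN of the same depth.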
Shubham Bharti, Pushparaj Mani Pathak
Amritha Vijayan, Bipil Mary Mathai, Karthik Valsalan, Riyanka Raji Johnson, Lani Rachel Mathew, K. Gopakumar
Raj Kumar, Manoj Tripathy, R. S. Anand, Niraj Kumar
Harinder Singh Mashiana, Abhishek Salaria, Kamaldeep Kaur