Utkarsh Singh, Akshay Gupta, Dipjyoti Bisharad, Wasim Arif
Speech analysis for extracting attributes such as the speaker, gender, accent and the like has been a field of great interest and has been widely studied. This paper presents a novel architecture for accent identification that uses a cascade of two deep-learning architectures. We design and test the proposed architecture on the Common Voice dataset. The architecture consists of a cascade of a Convolutional Neural Network (CNN) and a Convolutional Recurrent Neural Network (CRNN), and it is trained on Mel-spectrograms of the audio recordings. We consider five of the most popular English accent groups in this study, namely India, Australia, US, England and Canada. The proposed model has an accuracy of 78.48% using CNN and 83.21% using CRNN.
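The Mel-spectrogram input mentioned above rests on the mel frequency scale, which spaces frequency bands more densely at low frequencies. The following is a minimal sketch of that scale only, assuming the common HTK-style formula; the abstract does not state which mel variant, number of bands, or frequency range the authors used, so the helper names and the values 40 bands and 0 to 8000 Hz are illustrative assumptions.

```python
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Return n_mels + 2 band-edge frequencies in Hz, equally spaced on the mel scale.

    Each triangular mel filter spans three consecutive edges, so n_mels filters
    need n_mels + 2 edges.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# Illustrative values only: 40 bands over 0 to 8000 Hz (not taken from the paper).
edges = mel_band_edges(40, 0.0, 8000.0)
```

Because the edges are evenly spaced in mels, the bands are narrow at low frequencies and wide at high frequencies when measured in Hz, which is the property a Mel-spectrogram uses to compress the frequency axis.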
Divy Mohan Rai and Ms. Shikha Gupta
Abhigya Verma, Ishita Paul, Pooja Gera, Amar Kumar Mohapatra
Aaron Salazar, Rodrigo Arroyo, Noel Pérez, Diego S. Benítez