F. Mueen, Aftab Ahmed, Sanaullah Sanaullah, A. Gaba
We report on the application of a recurrent neural network (RNN) to an open-set, text-dependent speaker identification task. Mel-frequency cepstral coefficient (MFCC) features from the speech utterance are fed to a neural-network-based classifier to identify the speakers. We use a feedforward net architecture as proposed by A.J. Robinson (IEEE Trans. on Neural Networks, vol. 5, no. 2, 1994). We introduce a fully connected hidden layer between the input and state nodes and the output. We show that this hidden layer makes the learning of complex classification tasks more efficient. Training uses backpropagation through time. There is one output unit per speaker, with the training targets corresponding to speaker identity. For 10 male speakers, we obtain a true acceptance rate of 100% with a false acceptance rate of 10%. For 14 speakers, these figures are 94% and 12%, respectively. We also investigate how identification accuracy is affected by environmental factors (signal level, change of microphone), the choice of acoustic vectors (FFT or MFCC), the size of the training database, and the inclusion of fundamental frequency. MFCC features plus fundamental frequency give the best results.
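The architecture described in the abstract can be illustrated with a minimal sketch. It assumes sigmoid units, randomly initialised (untrained) weights, per-speaker scores averaged over frames, and a fixed acceptance threshold for the open-set decision; none of these details are stated in the abstract. The names `RecurrentSpeakerNet`, `identify`, and `threshold` are hypothetical, and training by backpropagation through time is not shown.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights (assumption: a real system would learn these)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def sigmoid(values):
    """Element-wise logistic function (assumed activation)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


class RecurrentSpeakerNet:
    """Input and state nodes feed a hidden layer, which feeds output and next state."""

    def __init__(self, n_features, n_state, n_hidden, n_speakers, seed=0):
        rng = random.Random(seed)
        self.n_state = n_state
        # Hidden layer sits between (input + state) and (output + next state).
        self.w_hidden = make_matrix(n_hidden, n_features + n_state, rng)
        self.w_out = make_matrix(n_speakers, n_hidden, rng)
        self.w_state = make_matrix(n_state, n_hidden, rng)

    def forward(self, frames):
        """Return one score per speaker, averaged over all frames (assumption)."""
        if not frames:
            raise ValueError("at least one feature frame is required")
        state = [0.0] * self.n_state
        totals = [0.0] * len(self.w_out)
        for frame in frames:
            hidden = sigmoid(matvec(self.w_hidden, list(frame) + state))
            outputs = sigmoid(matvec(self.w_out, hidden))
            state = sigmoid(matvec(self.w_state, hidden))
            totals = [t + o for t, o in zip(totals, outputs)]
        return [t / len(frames) for t in totals]


def identify(scores, threshold):
    """Open-set decision: best speaker index, or None if below the threshold."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None
```

With one output unit per speaker, `identify(net.forward(frames), threshold)` returns a speaker index or `None` for a rejected (unknown) speaker. Raising the hypothetical `threshold` would lower false acceptances and also lower true acceptances.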
Nestor A. Garcia Fragoso, Tetyana Baydyk, Ernst Kussul
Kharibam Jilenkumari Devi, Khelchandra Thongam
Bhargab Medhi, Prof. P.H. Talukdar