This chapter proposed a text-independent, anchor model-based speaker recognition method that uses phonetic modeling. Because the method does not require training a model for the target speaker, a single utterance suffices as reference speech. To improve recognition performance, phonetic models were used as anchor models in place of the Gaussian mixture model (GMM) scheme. The proposed method was evaluated on a Japanese speaker identification task and achieved a significant improvement over the GMM-based system. In a 30-speaker identification task, 3-state, 10-mixture HMMs yielded an identification rate of 94.21%, a relative improvement of 62.9% over the GMM-based system. The average length of the reference speech in these experiments was only 5.5 sec. These results show that phonetic modeling is effective for anchor model-based speaker recognition. We are now evaluating the method on a speaker verification task and on speaker identification in noisy conditions; some results for noisy conditions have been reported in (Goto et al., 2008). The merit of the method is that the system can capture speaker characteristics from a very short utterance of about 5 sec. The method can therefore be applied to tasks such as speaker indexing or tracking.
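The anchor-model idea summarized above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from this chapter: single diagonal Gaussians stand in for the phonetic HMM anchor models, the characterization vectors are compared by Euclidean distance, and all names (`characterization_vector`, `identify`, `demo`) and the synthetic data are hypothetical. The point is only that a speaker is represented by the scores of one utterance against a fixed set of anchor models, so no model is trained for the target speaker.

```python
import numpy as np

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian (stand-in anchor)."""
    diff = frames - mean
    per_frame = -0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var).sum(axis=1)
    return per_frame.mean()

def characterization_vector(frames, anchors):
    """Score one utterance against every anchor model and normalize the scores."""
    scores = np.array([avg_log_likelihood(frames, m, v) for m, v in anchors])
    return (scores - scores.mean()) / (scores.std() + 1e-12)

def identify(test_frames, references, anchors):
    """Return the reference speaker whose vector is closest (Euclidean, an assumption)."""
    test_vec = characterization_vector(test_frames, anchors)
    distances = {name: np.linalg.norm(test_vec - ref)
                 for name, ref in references.items()}
    return min(distances, key=distances.get)

def demo(true_speaker="spk_b", seed=0):
    """Synthetic example: two speakers, one short reference utterance each."""
    rng = np.random.default_rng(seed)
    dim, n_anchors, n_frames = 4, 8, 500
    # Anchor models are fixed in advance and shared by all speakers.
    anchors = [(rng.normal(0.0, 3.0, dim), np.ones(dim)) for _ in range(n_anchors)]
    speaker_means = {"spk_a": rng.normal(0.0, 3.0, dim),
                     "spk_b": rng.normal(0.0, 3.0, dim)}
    # Enrollment needs only one utterance per speaker; no speaker model is trained.
    references = {
        name: characterization_vector(rng.normal(mu, 1.0, (n_frames, dim)), anchors)
        for name, mu in speaker_means.items()
    }
    test_frames = rng.normal(speaker_means[true_speaker], 1.0, (n_frames, dim))
    return identify(test_frames, references, anchors)

if __name__ == "__main__":
    print(demo("spk_b"))
```

In a system of the kind described in this chapter, the Gaussian stand-ins would be replaced by the phonetic HMM anchor models, and the frames would be acoustic feature vectors rather than synthetic samples.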