An automatic speaker-recognition method, using temporal variations of pitch in speech as a speaker-identifying characteristic, is described. The pitch data was obtained from 60 utterances, consisting of six repetitions of the same sentence, spoken by 10 speakers. The pitch data for each utterance was represented by a 20-dimensional vector in the Karhunen-Loève coordinate system. The 20-dimensional vectors representing the pitch contours were linearly transformed so that the ratio of interspeaker to intraspeaker variance in the transformed space was maximized. A reference vector was formed for each speaker by averaging the transformed vectors of that speaker. The recognition procedure was based on measuring the Euclidean distance between the test vector and the reference vectors in the transformed space; the speaker corresponding to the reference vector with the smallest distance was selected as the speaker of the test utterance. The percentage of correct identifications was found to be 97%. The results suggest that temporal variations of pitch could be used effectively for automatic speaker recognition.
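The pipeline described above (Karhunen-Loève projection, a variance-ratio-maximizing linear transform, averaged reference vectors, nearest-reference decision) can be sketched roughly as follows. This is a minimal illustration, not the original implementation. The abstract does not say how the transform was computed, so the sketch assumes a Fisher-style construction: whiten the within-speaker scatter, then diagonalize the between-speaker scatter. All function names (`kl_basis`, `discriminant_transform`, `train`, `identify`, `run_synthetic_demo`), the small ridge term, and the synthetic contours are hypothetical. The synthetic data only mimics the 10-speaker, 6-repetition layout and says nothing about the reported 97%.

```python
import numpy as np


def kl_basis(X, k):
    """Mean and top-k covariance eigenvectors (Karhunen-Loeve basis) of the rows of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # keep the k largest
    return mean, vecs[:, order]


def discriminant_transform(Z, labels, ridge=1e-6):
    """Linear map that whitens within-speaker scatter and diagonalizes
    between-speaker scatter (assumed Fisher-style construction)."""
    d = Z.shape[1]
    grand = Z.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mu = Zc.mean(axis=0)
        Sw += (Zc - mu).T @ (Zc - mu)
        Sb += len(Zc) * np.outer(mu - grand, mu - grand)
    # Assumption: a small ridge keeps Sw invertible on tiny data sets.
    Sw += ridge * np.trace(Sw) / d * np.eye(d)
    w_vals, w_vecs = np.linalg.eigh(Sw)
    whiten = w_vecs / np.sqrt(w_vals)         # whiten.T @ Sw @ whiten == I
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ Sb @ whiten)
    order = np.argsort(b_vals)[::-1]          # most discriminative first
    return whiten @ b_vecs[:, order]


def train(contours, speakers, n_kl=20):
    """Fit the KL basis, the transform, and one averaged reference per speaker."""
    mean, basis = kl_basis(contours, n_kl)
    Z = (contours - mean) @ basis
    W = discriminant_transform(Z, speakers)
    T = Z @ W
    ids = np.unique(speakers)
    refs = np.stack([T[speakers == s].mean(axis=0) for s in ids])
    return {"mean": mean, "basis": basis, "W": W, "ids": ids, "refs": refs}


def identify(model, contour):
    """Return the speaker whose reference is nearest in Euclidean distance."""
    t = ((contour - model["mean"]) @ model["basis"]) @ model["W"]
    dists = np.linalg.norm(model["refs"] - t, axis=1)
    return model["ids"][np.argmin(dists)]


def run_synthetic_demo(seed=0):
    """Train on 5 repetitions per speaker, test on the 6th (synthetic data)."""
    rng = np.random.default_rng(seed)
    n_spk, n_rep, n_samp = 10, 6, 40
    speakers = np.repeat(np.arange(n_spk), n_rep)
    reps = np.tile(np.arange(n_rep), n_spk)
    templates = rng.normal(size=(n_spk, n_samp))   # one fake contour per speaker
    contours = templates[speakers] + 0.1 * rng.normal(size=(n_spk * n_rep, n_samp))
    held_out = reps == n_rep - 1
    model = train(contours[~held_out], speakers[~held_out])
    guesses = np.array([identify(model, c) for c in contours[held_out]])
    return float(np.mean(guesses == speakers[held_out]))


if __name__ == "__main__":
    print("held-out accuracy on synthetic contours:", run_synthetic_demo())
```

Because the transform whitens the within-speaker scatter, the Euclidean distance in the transformed space behaves like a Mahalanobis distance in the Karhunen-Loève coordinates, which is one plausible reading of why a plain nearest-reference rule suffices after the transform.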