Abstract

This paper introduces a novel language-independent speaker-recognition system based on differences between speakers in the dynamic realization of phonetic features (i.e., pronunciation), rather than on spectral differences in voice quality. The system exploits phonetic information from six languages to perform text-independent speaker recognition. All experiments were performed on the NIST 2001 Speaker Recognition Evaluation Extended Data Task. Recognition results are provided for unigram, bigram, and trigram models. Performance for each of the three models is examined for phones from each individual language and for the final multilanguage fused system. Additional fusion experiments demonstrate that speaker recognition capability is maintained even without phonetic information in the language of the speaker.
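The abstract does not spell out how the n-gram models are scored or fused. As a minimal sketch, assuming each language's phone recognizer has already decoded the audio into a phone string, a phone n-gram speaker model can be compared against a background model with a log-likelihood ratio, and the per-language scores combined by averaging. The helper names (ngrams, train_model, llr_score, fuse), the add-one smoothing, and the mean-score fusion are hypothetical illustrations, not the paper's published method.

```python
import math
from collections import Counter

# Illustrative sketch only: helper names, add-one smoothing, and
# mean-score fusion are assumptions, not the paper's published method.


def ngrams(phones, n):
    """Return the phone n-grams (as tuples) in one decoded phone sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def train_model(phone_seqs, n):
    """Count n-gram occurrences over a set of training phone sequences."""
    counts = Counter()
    for seq in phone_seqs:
        counts.update(ngrams(seq, n))
    return counts


def smoothed_prob(counts, total, gram, vocab_size, alpha=1.0):
    """Additive (Laplace) smoothed n-gram probability; alpha is an assumed setting."""
    return (counts[gram] + alpha) / (total + alpha * vocab_size)


def llr_score(test_seq, spk_counts, bkg_counts, n, inventory_size):
    """Mean per-n-gram log-likelihood ratio: speaker model vs. background model."""
    grams = ngrams(test_seq, n)
    if not grams:
        return 0.0
    vocab_size = inventory_size ** n  # number of possible n-grams
    spk_total = sum(spk_counts.values())
    bkg_total = sum(bkg_counts.values())
    llr = 0.0
    for g in grams:
        llr += math.log(smoothed_prob(spk_counts, spk_total, g, vocab_size))
        llr -= math.log(smoothed_prob(bkg_counts, bkg_total, g, vocab_size))
    return llr / len(grams)


def fuse(language_scores):
    """Fuse per-language scores by a simple mean (assumed fusion rule)."""
    return sum(language_scores) / len(language_scores)


if __name__ == "__main__":
    # Toy bigram example with a two-phone inventory {"a", "b"}.
    spk = train_model([["a", "b", "a", "b", "a", "b"]], n=2)
    bkg = train_model([["a", "a", "b", "b", "a", "a"]], n=2)
    print(llr_score(["a", "b", "a", "b"], spk, bkg, n=2, inventory_size=2))  # positive
    print(llr_score(["a", "a", "b", "b"], spk, bkg, n=2, inventory_size=2))  # negative
```

In a multilanguage setup under these assumptions, llr_score would be computed once per language's phone stream and the resulting scores passed to fuse; leaving the speaker's own language out of that list roughly corresponds to the fusion experiments described above. Setting n to 1, 2, or 3 gives the unigram, bigram, and trigram variants.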
