JOURNAL ARTICLE

End-To-End Phonetic Neural Network Approach for Speaker Verification

Abstract

In this work, we develop an end-to-end approach for the text-dependent speaker verification task. In this method, phonetic labels are fused with spectral features and used to train a neural network that makes a same/different speaker decision. The test data is obtained from a real call center integrated voice response system. It consists of audio from calls made by people at different times, in which they utter a specific, short sentence in Turkish. We investigate the contribution to model training of in-domain data containing the target sentence and of free-format human-human call data. To include phonetic information in the model, three different methods are applied: phoneme boundary, utterance boundary, and phoneme boundary group. Test results show that we attain an equal error rate of 10.7% for speaker verification on the given dataset.
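The equal error rate reported above is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how such a figure can be computed from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the toy scores, and the convention that a higher score means "same speaker" are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) for two sets of verification scores.

    Assumption: higher scores indicate "same speaker". A trial is
    accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([target, impostor]))

    # False rejection rate: same-speaker trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials scored at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # On finite data the two curves rarely cross exactly, so take the
    # threshold where they are closest and average the two rates.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, made up for illustration only.
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(same_speaker, different_speaker)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On this toy data, one of four same-speaker trials is rejected and one of four different-speaker trials is accepted at the chosen threshold, so both error rates are 25%. Averaging the two rates at the closest point is one common convention; interpolating between adjacent thresholds is another and can give slightly different values on small test sets.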

Keywords:
Computer science; Speech recognition; Utterance; Sentence; Artificial neural network; Task (project management); Speaker verification; Boundary (topology); Speaker recognition; Artificial intelligence; Word error rate; End-to-end principle; Domain (mathematical analysis); Natural language processing

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 10
Citation Normalized Percentile: 0.21

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing