JOURNAL ARTICLE

Text-Independent Speaker Verification with Dual Attention Network

Abstract

This paper presents a novel attention model for text-independent speaker verification. The model takes a pair of input utterances and generates an utterance-level embedding to represent the speaker-specific characteristics in each utterance. The input utterances are expected to have highly similar embeddings if they come from the same speaker. The proposed attention model consists of a self-attention module and a mutual attention module, which jointly contribute to the generation of the utterance-level embedding. The self-attention weights are computed from the utterance itself, while the mutual-attention weights are computed with the involvement of the other utterance in the input pair. As a result, each utterance is represented by a self-attention weighted embedding and a mutual-attention weighted embedding. The similarity between the embeddings is measured by a cosine distance score and a binary classifier output score. The whole model, named the Dual Attention Network, is trained end-to-end on the VoxCeleb database. Evaluation results on the VoxCeleb1 test set show that the Dual Attention Network significantly outperforms the baseline systems, with the best result yielding an equal error rate of 1.6%.
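The two pooling mechanisms described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact parameterization of the attention scores, and the use of a mean vector as the "context" of the other utterance in the mutual-attention branch, are simplifying assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(frames, w):
    """Self-attention pooling: weights come from the utterance itself.

    frames: (T, D) frame-level features of one utterance
    w:      (D,)   learned attention parameter (assumed, for illustration)
    returns a (D,) utterance-level embedding.
    """
    scores = softmax(frames @ w)          # (T,) attention weights
    return scores @ frames                # weighted sum over frames

def mutual_attention_pool(frames_a, frames_b, w):
    """Mutual-attention pooling: weights on utterance A involve utterance B.

    Here B's context is summarized by a simple frame average; the paper's
    mutual attention module is more elaborate.
    """
    summary_b = frames_b.mean(axis=0)                 # (D,) context of B
    scores = softmax(frames_a @ (w * summary_b))      # (T,) weights on A
    return scores @ frames_a

def cosine_score(u, v):
    """Cosine similarity between two utterance-level embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

In a trained system, a verification trial would combine this cosine score with the binary classifier's output score; here only the embedding and scoring plumbing is shown.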

Keywords:
Computer science, Speaker verification, Speech recognition, Natural language processing, Artificial intelligence, Speaker recognition, Linguistics

Metrics

Cited by: 14
FWCI (Field-Weighted Citation Impact): 1.91
References: 26
Citation Normalized Percentile: 0.88

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)