DS-GAU: Dual-sequences gated attention unit architecture for text-independent speaker verification

Tsung‐Han Tsai; Tran Dang Khoa

doi:10.1016/j.mlwa.2023.100469

JOURNAL ARTICLE

Year: 2023 Journal: Machine Learning with Applications Vol: 13 Pages: 100469-100469 Publisher: Elsevier BV

Abstract

Text-independent speaker verification identifies people from their voice characteristics. In this paper, we propose a new method, the Dual-Sequences Gated Attention Unit (DS-GAU), to improve the accuracy of a large-scale speaker verification system. The DS-GAU is based on the Gated Dual Attention Unit and the Gated Recurrent Unit. It takes two different inputs from the same source: the statistics pooling layer of the x-vector and the frame-level layer information of the x-vector. It is developed by applying the attention mechanism to the traditional Gated Recurrent Unit to enhance the learning ability of the x-vector system. The whole system takes the statistics pooling output from each time-delay neural network (TDNN) layer of the x-vector baseline and passes it through the DS-GAU layer to aggregate more information from the varying temporal contexts of the input features during frame-level training. We train our model on VoxCeleb2 and then evaluate its accuracy on VoxCeleb1 and the Speakers in the Wild dataset. Finally, the system is compared with the x-vector, L-vector, and ETDNN-OPGRUs x-vector systems. Our proposed method shows a clear improvement: compared with the x-vector system, the fusion system achieves an equal error rate improvement of at least 17.5% on VoxCeleb1 and 0.5% on Speakers in the Wild.
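
The abstract relies on statistics pooling, the step in an x-vector system that collapses a variable-length sequence of frame-level features into one fixed-length vector. The abstract does not give the pooling details, so the following is only a minimal sketch of the common mean-plus-standard-deviation form, not the authors' implementation. The function name `statistics_pooling`, the toy `frames` values, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool a (num_frames x feat_dim) sequence into one vector.

    Returns the per-dimension mean followed by the per-dimension
    standard deviation, so the output has length 2 * feat_dim.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    feat_dim = len(frames[0])

    # Mean of each feature dimension over time.
    means = [sum(f[d] for f in frames) / num_frames for d in range(feat_dim)]

    # Population standard deviation (divide by num_frames).
    # This normalisation is an assumption; the abstract does not specify it.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(feat_dim)
    ]
    return means + stds


# Hypothetical frame-level outputs: 3 frames, 2 feature dimensions.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
pooled = statistics_pooling(frames)
```

Whatever the number of frames, the pooled vector has twice the feature dimension, which is what lets a layer after pooling work on utterances of different lengths.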

Keywords:

Computer science; Pooling; Context (archaeology); Dual (grammatical number); Speech recognition; Unit vector; Artificial intelligence; Pattern recognition (psychology); Frame (networking); Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.51

Citation Normalized Percentile: 0.64

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing
