Global–Local Self-Attention Based Transformer for Speaker Verification

Fei Xie; Dalong Zhang; Chengming Liu

doi:10.3390/app121910154

JOURNAL ARTICLE

Year: 2022; Journal: Applied Sciences; Vol: 12 (19); Pages: 10154; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Transformer models are now widely used for speech processing tasks because of their powerful sequence modeling capabilities. Previous work found an efficient way to model speaker embeddings with the Transformer by combining it with convolutional networks. However, traditional global self-attention mechanisms lack the ability to capture local information. To alleviate this problem, we proposed a novel global–local self-attention mechanism. Instead of using local or global multi-head attention alone, this method performs local and global attention in parallel in two groups, which enhances local modeling and reduces computational cost. To better handle local location information, we introduced locally enhanced location encoding in the speaker verification task. Experimental results on the VoxCeleb1 test set and the VoxCeleb2 dev set demonstrated the improvement brought by the proposed global–local self-attention mechanism. Compared with the Transformer-based Robust Embedding Extractor Baseline System, the proposed speaker Transformer network performed better on the speaker verification task.
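
The parallel global and local attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the two groups are groups of attention heads, that the first half of the heads attend globally and the second half attend only within a fixed window of neighboring frames, and that each head uses its slice of the input directly as query, key, and value (no learned projections). The locally enhanced location encoding is omitted. The names `attend`, `global_local_attention`, and `window` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(head, window=None):
    """Scaled dot-product self-attention for one head.

    head:   list of T frames, each a list of d floats, used as Q, K and V
            (assumption: no learned projections in this sketch).
    window: None for global attention; otherwise frame i attends only to
            frames j with |i - j| <= window (local attention).
    """
    T, d = len(head), len(head[0])
    out = []
    for i in range(T):
        if window is None:
            keys = list(range(T))
        else:
            keys = list(range(max(0, i - window), min(T, i + window + 1)))
        scores = [
            sum(q * k for q, k in zip(head[i], head[j])) / math.sqrt(d)
            for j in keys
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * head[j][c] for w, j in zip(weights, keys)) for c in range(d)]
        )
    return out


def global_local_attention(x, num_heads=4, window=1):
    """Run global heads and local heads in parallel and concatenate them.

    Assumption: heads 0 .. num_heads/2 - 1 are global, the rest are local.
    """
    T, D = len(x), len(x[0])
    assert num_heads % 2 == 0 and D % num_heads == 0
    d = D // num_heads
    head_outputs = []
    for h in range(num_heads):
        head = [frame[h * d:(h + 1) * d] for frame in x]
        head_window = None if h < num_heads // 2 else window
        head_outputs.append(attend(head, head_window))
    # Concatenate the per-head outputs back into T frames of D features.
    return [[v for ho in head_outputs for v in ho[t]] for t in range(T)]


if __name__ == "__main__":
    frames = [[float(t), 1.0, 0.5 * t, -1.0] for t in range(5)]
    for row in global_local_attention(frames, num_heads=2, window=1):
        print(row)
```

In this sketch a global head compares every frame with all T frames, while a local head compares each frame with at most 2 * window + 1 neighbors, which is where the reduction in computation would come from under these assumptions.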

Keywords: Computer science; Transformer; Embedding; Artificial intelligence; Speech recognition; Engineering; Voltage

Metrics

FWCI (Field Weighted Citation Impact): 1.76

Citation Normalized Percentile: 0.83

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Max-Pooling Based Self-Attention with Transformer for Speaker Verification

Local-Global Self-Attention for Transformer-Based Object Tracking

Local Information Modeling with Self-Attention for Speaker Verification

Speaker Verification with Disentangled Self-attention

Multi-View Self-Attention Based Transformer for Speaker Recognition