On the Use of Cross-module Attention Statistics Pooling for Speaker Verification

Jahangir Alam; Abderrahim Fathan

doi:10.1109/iwbf57495.2023.10157564

JOURNAL ARTICLE

Year: 2023 Pages: 1-6

Abstract

In deep learning-based speaker verification frameworks, the extraction of a speaker embedding vector plays a key role. In this contribution, we propose a hybrid neural network that employs a cross-module attention pooling mechanism to extract speaker-discriminant utterance-level embeddings. In particular, the proposed system places a 2D Convolutional Neural Network (CNN)-based feature extraction module in cascade with a frame-level network, which is composed of a pure Time Delay Neural Network (TDNN) and a hybrid TDNN-Long Short-Term Memory (TDNN-LSTM) network connected in parallel. The proposed system also employs cross-module attention statistics pooling to aggregate speaker information within an utterance-level context by capturing the complementarity between the two parallel modules. We conduct a set of experiments on the VoxCeleb corpus to evaluate the proposed system; the proposed hybrid network provides better results than conventional approaches trained on the same dataset.
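The abstract does not state the pooling equations, so the following is only a minimal NumPy sketch of one plausible reading of cross-module attention statistics pooling. It assumes that "cross-module" means the frame weights applied to one parallel branch are computed from the other branch's frame-level features. The function names, the tanh scoring function, the dimensions, and the random parameters are all hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(h, w, v):
    # Score each frame of h (T, D) with a small tanh layer, then
    # normalize the scores over time. Returns weights of shape (T,).
    scores = np.tanh(h @ w) @ v
    return softmax(scores, axis=0)

def weighted_stats(h, alpha, eps=1e-8):
    # Attention-weighted mean and standard deviation over frames.
    mean = (alpha[:, None] * h).sum(axis=0)
    var = (alpha[:, None] * h ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, eps, None))
    return np.concatenate([mean, std])

def cross_module_attentive_stats_pooling(h_tdnn, h_lstm, params):
    # ASSUMPTION: each branch is pooled with weights derived from the
    # other branch, so the two modules can complement each other.
    alpha_from_lstm = attention_weights(h_lstm, params["w_lstm"], params["v_lstm"])
    alpha_from_tdnn = attention_weights(h_tdnn, params["w_tdnn"], params["v_tdnn"])
    pooled_tdnn = weighted_stats(h_tdnn, alpha_from_lstm)
    pooled_lstm = weighted_stats(h_lstm, alpha_from_tdnn)
    return np.concatenate([pooled_tdnn, pooled_lstm])

# Hypothetical sizes: T frames, D features per branch, A attention units.
rng = np.random.default_rng(0)
T, D, A = 200, 64, 32
h_tdnn = rng.standard_normal((T, D))   # stand-in for TDNN branch output
h_lstm = rng.standard_normal((T, D))   # stand-in for TDNN-LSTM branch output
params = {
    "w_tdnn": rng.standard_normal((D, A)) * 0.1,
    "v_tdnn": rng.standard_normal(A) * 0.1,
    "w_lstm": rng.standard_normal((D, A)) * 0.1,
    "v_lstm": rng.standard_normal(A) * 0.1,
}
embedding_input = cross_module_attentive_stats_pooling(h_tdnn, h_lstm, params)
print(embedding_input.shape)
```

In this sketch the pooled vector concatenates a weighted mean and standard deviation for each branch, so its length is four times the per-branch feature dimension. A real system would learn the attention parameters jointly with the rest of the network and feed the pooled vector to utterance-level layers.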

Keywords:

Computer science; Pooling; Artificial neural network; Speech recognition; Artificial intelligence; Time delay neural network; Feature extraction; Pattern recognition (psychology); Convolutional neural network; Speaker recognition

Metrics

FWCI (Field Weighted Citation Impact): 0.26

Citation Normalized Percentile: 0.54

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Hybrid Neural Network with Cross- and Self-Module Attention Pooling for Text-Independent Speaker Verification

On the Use of Cross- and Self-Module Attentive Statistics Pooling Techniques for Text-Independent Speaker Verification

External-Attentive Statistics Pooling for Text-Independent Speaker Verification

Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification

Max-Pooling Based Self-Attention with Transformer for Speaker Verification