Hybrid Neural Network with Cross- and Self-Module Attention Pooling for Text-Independent Speaker Verification

Jahangir Alam; Woo Hyun Kang; Abderrahim Fathan

doi:10.1109/icassp49357.2023.10096040

Year: 2023 Pages: 1-5

Abstract

Extracting a speaker embedding vector plays an important role in deep learning-based speaker verification. In this contribution, to extract speaker-discriminant utterance-level embeddings, we propose a hybrid neural network that employs both cross- and self-module attention pooling mechanisms. More specifically, the proposed system incorporates a 2D Convolutional Neural Network (CNN)-based feature extraction module in cascade with a frame-level network, which is composed of a fully Time Delay Neural Network (TDNN)-based network and a TDNN-Long Short-Term Memory (TDNN-LSTM) hybrid network connected in parallel. The proposed system also employs multi-level cross- and self-module attention pooling to aggregate the speaker information within an utterance-level context by capturing the complementarity between the two parallel modules. To evaluate the proposed system, we conduct a set of experiments on the VoxCeleb corpus, in which the proposed hybrid network outperforms the conventional approaches trained on the same dataset.
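The abstract does not give the pooling equations, so the following is only a rough sketch of what cross- and self-module attentive statistics pooling over two parallel branches could look like. Everything in it is an assumption made for illustration: the one-hidden-layer tanh scoring function, the weighted mean and standard deviation statistics, the choice to reuse one branch's attention weights on the other branch as the "cross" term, and all names (`attention_weights`, `attentive_stats`, `cross_and_self_pool`, `params`). It is not the authors' implementation, and it omits the CNN front end, the TDNN and TDNN-LSTM branches themselves, and the multi-level aspect.

```python
import numpy as np


def softmax(x, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_weights(frames, W, v):
    """Score each frame and normalise the scores over time.

    frames: (T, D) frame-level features, W: (D, H), v: (H,).
    The tanh scoring network is an assumed, commonly used form.
    """
    scores = np.tanh(frames @ W) @ v  # (T,)
    return softmax(scores, axis=0)


def attentive_stats(frames, weights):
    """Attention-weighted mean and standard deviation over time."""
    mean = weights @ frames                       # (D,)
    var = weights @ (frames ** 2) - mean ** 2     # (D,)
    std = np.sqrt(np.clip(var, 1e-8, None))
    return np.concatenate([mean, std])            # (2 * D,)


def cross_and_self_pool(a, b, params):
    """Pool two parallel branches with self and cross attention weights.

    a, b: (T, D) outputs of the two hypothetical branches
    (standing in for the TDNN and TDNN-LSTM modules).
    """
    w_a = attention_weights(a, *params["a"])  # weights derived from branch a
    w_b = attention_weights(b, *params["b"])  # weights derived from branch b
    return np.concatenate([
        attentive_stats(a, w_a),  # self-module: a pooled with its own weights
        attentive_stats(b, w_b),  # self-module: b pooled with its own weights
        attentive_stats(a, w_b),  # cross-module: a pooled with b's weights
        attentive_stats(b, w_a),  # cross-module: b pooled with a's weights
    ])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, H = 50, 8, 4  # frames, feature dimension, attention hidden size
    a = rng.standard_normal((T, D))
    b = rng.standard_normal((T, D))
    params = {
        "a": (rng.standard_normal((D, H)), rng.standard_normal(H)),
        "b": (rng.standard_normal((D, H)), rng.standard_normal(H)),
    }
    embedding = cross_and_self_pool(a, b, params)
    print(embedding.shape)
```

In this sketch the pooled vector has length 8 * D (four mean and standard deviation pairs), and in a real system it would typically feed further utterance-level layers trained with a speaker classification loss. The cross terms are one simple way to let each branch's view of "which frames matter" influence how the other branch is summarised; the paper may define the interaction differently.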

Keywords:

Computer science; Pooling; Time delay neural network; Artificial neural network; Artificial intelligence; Speech recognition; Feature extraction; Pattern recognition (psychology); Convolutional neural network; Hybrid neural network; Neocognitron; Speaker recognition

Metrics

FWCI (Field Weighted Citation Impact): 2.04

Citation Normalized Percentile: 0.85

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Text-Independent Speaker Verification with Dual Attention Network

On the Use of Cross- and Self-Module Attentive Statistics Pooling Techniques for Text-Independent Speaker Verification

Self-Attention Networks for Text-Independent Speaker Verification

Text independent speaker verification using modular neural network

On the Use of Cross-module Attention Statistics Pooling for Speaker Verification