Deep neural network-based speaker embeddings for end-to-end speaker verification

David Snyder; Pegah Ghahremani; Daniel Povey; Daniel Garcia-Romero; Yishay Carmiel; Sanjeev Khudanpur

doi:10.1109/slt.2016.7846260

JOURNAL ARTICLE

Year: 2016 Pages: 165-170

Abstract

In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable-length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and is reused during verification. Similar systems have recently shown promise for text-dependent verification, but we believe that this is unexplored for the text-independent task. We show that given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates. Relative to the baseline, the end-to-end system reduces EER by 13% on average and 29% pooled across test conditions. The fused system achieves a reduction of 32% on average and 38% pooled.
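The EER metric and the relative reductions quoted in the abstract can be illustrated with a small sketch. This is not the authors' evaluation tooling. It is a minimal pure-Python illustration, assuming each verification trial yields a score where higher means "more likely the same speaker". The function names and the toy scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores: scores for same-speaker trials.
    nontarget_scores: scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # Miss rate: same-speaker trials that are rejected.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: different-speaker trials that are accepted.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, best_eer = gap, (miss + false_alarm) / 2
    return best_eer


def relative_reduction(baseline_eer, system_eer):
    """Relative EER reduction of a system with respect to a baseline."""
    return (baseline_eer - system_eer) / baseline_eer


# Toy scores, invented purely for illustration (not data from the paper).
toy_targets = [0.9, 0.8, 0.7, 0.3]
toy_nontargets = [0.1, 0.2, 0.4, 0.6]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
```

A relative reduction is computed against the baseline's EER, so a hypothetical baseline at 8% EER and a system at 6% EER would correspond to a 25% relative reduction. With few trials the threshold sweep only approximates the crossing point; evaluation toolkits typically interpolate between adjacent operating points.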

Keywords:

Speaker verification; Computer science; End-to-end principle; Word error rate; Speech recognition; Speaker recognition; Artificial neural network; Task (project management); Speaker diarisation; Embedding; Baseline (sea); Artificial intelligence; Pattern recognition (psychology)

Metrics

Cited By: 367

FWCI (Field Weighted Citation Impact): 43.97

Refs

Citation Normalized Percentile: 1.00

Is in top 1%

Is in top 10%

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System

End-To-End Phonetic Neural Network Approach for Speaker Verification

Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios

Wav2sv: End-to-end Speaker Embeddings Learning from Raw Waveforms based on Metric Learning for Speaker Verification