Orthogonal Training for Text-Independent Speaker Verification

Yingke Zhu; Brian Mak

doi:10.1109/icassp40776.2020.9053198


JOURNAL ARTICLE


Year: 2020 Pages: 6584-6588



Abstract

In this paper, we propose orthogonal training schemes to improve the effectiveness of cosine similarity measurements in text-independent speaker verification (SV) tasks. Compared to the PLDA backend, cosine similarity is simple to compute and does not require extra data or time to build a separate model. Cosine similarity is also highly desirable for building end-to-end SV systems. However, cosine similarity implicitly assumes that the dimensions of the speaker embeddings are orthogonal, an assumption that is usually not satisfied in current SV systems. Our training scheme applies singular value decomposition (SVD) to the weight matrix of the speaker embedding extraction layer in our time delay neural network (TDNN)-based SV system, and replaces the original weight matrix with the matrix constructed from the left unitary matrix and the singular value matrix. The reconstructed matrix in the extraction layer is then held constant, and the rest of the network is fine-tuned with an orthogonality regularizer. We further investigate orthogonal training from scratch, with orthogonality regularization incorporated throughout network training. Experimental results show that our orthogonal training methods can significantly improve system performance with a cosine similarity backend.
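The SVD step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the layer shape, the choice of which axis holds the embedding dimensions, and the specific penalty (squared Frobenius norm of W^T W - I) are all assumptions made for illustration, since the abstract does not specify the form of the regularizer.

```python
import numpy as np


def reconstruct_with_svd(weight):
    """Return U @ diag(s) for the thin SVD weight = U @ diag(s) @ Vt."""
    u, s, _vt = np.linalg.svd(weight, full_matrices=False)
    # Broadcasting scales column j of U by s[j], which equals U @ diag(s).
    return u * s


def orthogonality_penalty(weight):
    """Assumed soft-orthogonality penalty: ||W^T W - I||_F^2."""
    gram = weight.T @ weight
    identity = np.eye(weight.shape[1])
    return float(np.sum((gram - identity) ** 2))


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical 8 x 5 weight matrix standing in for the extraction layer.
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 5))
new_weight = reconstruct_with_svd(weight)

# Columns of U @ diag(s) are mutually orthogonal, so the Gram matrix is diagonal.
gram = new_weight.T @ new_weight
off_diag = gram - np.diag(np.diag(gram))
```

In this sketch the reconstructed matrix has mutually orthogonal columns, while `new_weight @ new_weight.T` equals `weight @ weight.T`, so the replacement changes the basis without changing inner products taken across that axis. In a fine-tuning setup such as the one the abstract describes, a penalty like `orthogonality_penalty` would be added to the training loss for the layers that remain trainable.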

Keywords:

Orthogonality; Cosine similarity; Singular value decomposition; Computer science; Artificial neural network; Trigonometric functions; Matrix (chemical analysis); Similarity (geometry); Singular value; Orthogonal matrix; Algorithm; Discrete cosine transform; Speaker recognition; Pattern recognition (psychology); Speech recognition; Artificial intelligence; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.44

Citation Normalized Percentile: 0.67

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing


Related Documents

Text-independent Speaker Verification

Maximum model distance discriminative training for text-independent speaker verification

English-Chinese bilingual text-independent speaker verification

Text-independent speaker identification and verification

Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification