Unsupervised Learning of Total Variability Embedding for Speaker Verification with Random Digit Strings

Woo Hyun Kang; Nam Soo Kim

doi:10.3390/app9081597

JOURNAL ARTICLE

Year: 2019; Journal: Applied Sciences; Vol: 9 (8); Pages: 1597-1597; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Recently, the increasing demand for voice-based authentication systems has encouraged researchers to investigate methods for verifying users using short randomized pass-phrases drawn from a constrained vocabulary. The conventional i-vector framework, which has proven to be a state-of-the-art utterance-level feature extraction technique for speaker verification, is not considered optimal for this task because it is known to suffer severe performance degradation on short-duration speech utterances. More recent approaches that use deep-learning techniques to embed the speaker variability in a non-linear fashion have shown impressive performance in various speaker verification tasks. However, since most of these techniques are trained in a supervised manner, which requires speaker labels for the training data, they are difficult to use when only a small amount of labeled data is available for training. In this paper, we propose a novel technique for extracting an i-vector-like feature based on the variational autoencoder (VAE), which is trained in an unsupervised manner to obtain a latent variable representing the variability within a Gaussian mixture model (GMM) distribution. The proposed framework is compared with the conventional i-vector method using the TIDIGITS dataset. Experimental results showed that the proposed method could cope with the performance deterioration caused by short utterance duration. Furthermore, the performance of the proposed approach improved significantly when applied in conjunction with the conventional i-vector framework.
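
The abstract does not describe the network architecture or training objective in detail, so the following is only a generic illustration of two building blocks that any standard VAE relies on: the reparameterization step that samples a latent variable, and the KL-divergence regularizer toward a standard normal prior. It is a minimal NumPy sketch, not the authors' implementation; the function names, the latent dimension, and the zero-valued encoder outputs are all hypothetical placeholders.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


# Hypothetical encoder outputs for a batch of 2 utterances.
rng = np.random.default_rng(0)
latent_dim = 4  # assumed size, chosen only for illustration
mu = np.zeros((2, latent_dim))
log_var = np.zeros((2, latent_dim))

z = reparameterize(mu, log_var, rng)      # latent samples, shape (2, 4)
kl = kl_to_standard_normal(mu, log_var)   # one KL value per utterance
```

In a setup like the one the abstract outlines, the latent sample would play the role of the i-vector-like utterance embedding, while the reconstruction target (here omitted) would relate to the GMM distribution; how exactly the paper defines that target is not stated in this record.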

Keywords: Computer science; Artificial intelligence; Speech recognition; Utterance; Pattern recognition (psychology); Autoencoder; Mixture model; Support vector machine; Feature vector; Machine learning; Deep learning

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Adversarially Learned Total Variability Embedding for Speaker Recognition with Random Digit Strings

Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings

Text-Dependent Speaker Recognition With Random Digit Strings

JFA for speaker recognition with random digit strings

Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings