Improved Reference Speaker Weighting Using Aspect Model

Seong-Jun HAHM; Yuichi Ohkawa; Masashi Ito; Motoyuki Suzuki; Akinori Ito; Shozo Makino

doi:10.1587/transinf.e93.d.1927

Year: 2010; Journal: IEICE Transactions on Information and Systems; Vol: E93-D (7); Pages: 1927-1935; Publisher: Institute of Electronics, Information and Communication Engineers

Abstract

We propose an improved reference speaker weighting (RSW) and speaker cluster weighting (SCW) approach that uses an aspect model. The idea is that the adapted model is a linear combination of a few latent reference models obtained from a set of reference speakers. The latent space of the aspect model has characteristics that differ from the orthogonal basis vectors used in the eigenvoice approach. The aspect model is a “mixture-of-mixture” model: we first calculate a small number of latent reference models as mixtures of the distributions of the reference speakers' models, and then mix the latent reference models to obtain the adapted distribution. The mixture weights are calculated with the expectation maximization (EM) algorithm, and the obtained weights are used to interpolate the mean parameters of the distributions. Training and adaptation both maximize likelihood, with respect to the training data and the adaptation data, respectively. We conduct a continuous speech recognition experiment using a Korean database (KAIST-TRADE) and compare the results with those of conventional MAP, MLLR, RSW, eigenvoice, and SCW. The proposed method achieved an absolute word accuracy improvement of 2.06 points, even with only 0.3 s of adaptation data.
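The two-level mixing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes isotropic unit-variance Gaussians, a single distribution per speaker, and hand-picked values for P(speaker | latent), whereas the paper estimates such quantities from training data. All function names here (log_gauss, latent_reference_means, em_mixture_weights, interpolate_mean) are hypothetical. The sketch builds latent reference means as mixtures of reference speaker means, estimates the latent mixture weights on a few adaptation frames with EM, and interpolates the means with those weights.

```python
import math

def log_gauss(x, mean, var=1.0):
    """Log density of frame x under an isotropic Gaussian (assumed shared variance)."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (len(x) * math.log(2.0 * math.pi * var) + sq_dist / var)

def latent_reference_means(speaker_means, speaker_given_latent):
    """Build each latent reference mean as a mixture of reference speaker means.

    speaker_given_latent[k][s] plays the role of P(speaker s | latent k).
    Assumption: these values are supplied by hand instead of being trained.
    """
    dim = len(speaker_means[0])
    return [
        [sum(p * mu[d] for p, mu in zip(probs, speaker_means)) for d in range(dim)]
        for probs in speaker_given_latent
    ]

def em_mixture_weights(frames, means, n_iter=20):
    """Maximum-likelihood mixture weights over fixed component means, via EM."""
    k = len(means)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for x in frames:
            # E-step: posterior of each latent model for this frame,
            # computed in the log domain for numerical stability.
            scores = [
                (math.log(w) if w > 0.0 else float("-inf")) + log_gauss(x, m)
                for w, m in zip(weights, means)
            ]
            top = max(scores)
            post = [math.exp(s - top) for s in scores]
            norm = sum(post)
            for j in range(k):
                totals[j] += post[j] / norm
        # M-step: each weight becomes the average posterior over the frames.
        weights = [t / len(frames) for t in totals]
    return weights

def interpolate_mean(weights, means):
    """Adapted mean as the weighted combination of latent reference means."""
    dim = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dim)]

# Toy data: three reference speakers, two latent reference models.
speaker_means = [[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]]
speaker_given_latent = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
adaptation_frames = [[2.1, 3.9], [1.9, 4.2], [2.0, 4.0]]

latents = latent_reference_means(speaker_means, speaker_given_latent)
weights = em_mixture_weights(adaptation_frames, latents)
adapted_mean = interpolate_mean(weights, latents)
print(weights, adapted_mean)
```

Because only a handful of weights are estimated (one per latent reference model), very little adaptation data is needed, which is consistent with the abstract's emphasis on adapting from 0.3 s of speech.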

Keywords:

Weighting, Computer science, Mixture model, Expectation–maximization algorithm, Latent variable, Pattern recognition (psychology), Reference model, Set (abstract data type), Reference data, Speech recognition, Adaptation (eye), Artificial intelligence, Maximum likelihood, Statistics, Data mining, Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.40

Citation Normalized Percentile: 0.71

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing
