JOURNAL ARTICLE

Improved Reference Speaker Weighting Using Aspect Model

Seong-Jun HAHM, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Year: 2010   Journal: IEICE Transactions on Information and Systems   Vol: E93-D (7)   Pages: 1927-1935   Publisher: Institute of Electronics, Information and Communication Engineers

Abstract

We propose an improved reference speaker weighting (RSW) and speaker cluster weighting (SCW) approach that uses an aspect model. The underlying idea is that the adapted model is a linear combination of a few latent reference models obtained from a set of reference speakers. The aspect model has specific latent-space characteristics that differ from the orthogonal basis vectors of the eigenvoice approach. The aspect model is a “mixture-of-mixture” model. We first calculate a small number of latent reference models as mixtures of the distributions of the reference speakers' models, and the latent reference models are then mixed to obtain the adapted distribution. The mixture weights are calculated with the expectation maximization (EM) algorithm. We use the obtained mixture weights to interpolate the mean parameters of the distributions. Training and adaptation are both performed by likelihood maximization with respect to the training data and the adaptation data, respectively. We conduct a continuous speech recognition experiment using a Korean database (KAIST-TRADE). The results are compared with those of conventional MAP, MLLR, RSW, eigenvoice, and SCW. An absolute word accuracy improvement of 2.06 points was achieved with the proposed method, even though only 0.3 s of adaptation data was used.
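
The "mixture-of-mixture" idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single unit-covariance Gaussian per reference speaker instead of full HMM state distributions, and it assumes the speaker-within-latent-model weights P(r|z) are already known, whereas the abstract states they are obtained by likelihood maximization on the training data. All function and variable names (estimate_latent_weights, adapted_mean, p_r_given_z, and so on) are hypothetical. Only the latent weights P(z) are estimated by EM from the adaptation frames, and they are then used to interpolate the reference means.

```python
import numpy as np

def log_gauss_unit(x, means):
    """Log density of every frame under unit-covariance Gaussians (assumption).
    x: (T, D) frames, means: (R, D) reference means. Returns (T, R)."""
    d = x.shape[1]
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (diff ** 2).sum(axis=2) - 0.5 * d * np.log(2 * np.pi)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def estimate_latent_weights(x, ref_means, p_r_given_z, n_iter=20):
    """EM for the latent-model weights P(z), with P(r|z) held fixed.
    p_r_given_z: (Z, R), rows sum to 1, entries strictly positive."""
    n_latent = p_r_given_z.shape[0]
    log_n = log_gauss_unit(x, ref_means)                      # (T, R)
    # Latent reference model z: f_z(x) = sum_r P(r|z) N(x; mu_r, I)
    log_f = logsumexp(log_n[:, None, :] + np.log(p_r_given_z)[None, :, :], axis=2)
    p_z = np.full(n_latent, 1.0 / n_latent)                   # uniform start
    for _ in range(n_iter):
        # E-step: posterior of each latent model for each frame
        log_post = log_f + np.log(p_z)
        log_post -= logsumexp(log_post, axis=1)[:, None]
        # M-step: new weight is the average posterior over frames
        p_z = np.exp(log_post).mean(axis=0)
    return p_z

def adapted_mean(p_z, p_r_given_z, ref_means):
    """Interpolate reference means with weights sum_z P(z) P(r|z)."""
    speaker_weights = p_z @ p_r_given_z                       # (R,)
    return speaker_weights @ ref_means                        # (D,)

# Toy data: three reference speakers, two latent reference models.
rng = np.random.default_rng(0)
ref_means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
p_r_given_z = np.array([[0.8, 0.1, 0.1],
                        [0.1, 0.8, 0.1]])
x = rng.normal(loc=[4.0, 0.0], scale=1.0, size=(30, 2))       # few adaptation frames
p_z = estimate_latent_weights(x, ref_means, p_r_given_z)
mu = adapted_mean(p_z, p_r_given_z, ref_means)
print(p_z, mu)
```

Because only a handful of latent weights are estimated, instead of one weight per reference speaker, very short adaptation data can still constrain the solution, which is in line with the motivation given in the abstract.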

Keywords:
Weighting, Computer science, Mixture model, Expectation–maximization algorithm, Latent variable, Pattern recognition (psychology), Reference model, Set (abstract data type), Reference data, Speech recognition, Adaptation (eye), Artificial intelligence, Maximum likelihood, Statistics, Data mining, Mathematics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.40
Refs: 18
Citation Normalized Percentile: 0.71

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing