JOURNAL ARTICLE

Kernel eigenvoice speaker adaptation

Brian Mak, James T. Kwok, Simon Ho

Year: 2005   Journal: IEEE Transactions on Speech and Audio Processing   Vol: 13 (5)   Pages: 984-992   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data, say, less than 10 s, is available. At the heart of the method is principal component analysis (PCA), which is employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA using kernel methods may be even more effective. The eigenvoices thus derived will be called kernel eigenvoices (KEV), and we will call our new adaptation method kernel eigenvoice speaker adaptation. However, unlike the standard eigenvoice (EV) method, an adapted speaker model found by the kernel eigenvoice method resides in the high-dimensional kernel-induced feature space, which, in general, cannot be mapped back to an exact preimage in the input speaker supervector space. Consequently, it is not clear how to obtain the constituent Gaussians of the adapted model that are needed for the computation of state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is to use composite kernels in such a way that state observation likelihoods can be computed using only kernel functions, without the need for a speaker-adapted model in the input supervector space. We investigate two different composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, the two composite Gaussian kernels are found to be equally effective for KEV speaker adaptation, and KEV adaptation outperforms both a speaker-independent model and the adapted models found by EV, MAP, or MLLR adaptation using 2.1 and 4.1 s of speech. For example, with 2.1 s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas EV, MAP, and MLLR adaptation are not effective at all.
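As a rough illustration of the ideas in the abstract, the following NumPy sketch builds a composite Gaussian kernel (direct sum or tensor product) over speaker supervectors, runs kernel PCA on the training speakers to obtain kernel eigenvoices, and shows that the squared feature-space distance between a mapped observation frame and one Gaussian block of an adapted speaker can be evaluated from kernel values alone, with no preimage. It is a sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the data are random toy values, every Gaussian block shares one kernel width `beta`, the eigenvoice weights `w` are fixed by hand rather than estimated by maximum likelihood, and the distance shown is only one kernel-computable quantity, not the paper's exact state observation likelihood.

```python
import numpy as np

def gaussian_kernel(A, B, beta):
    """Gaussian kernel exp(-beta * ||a - b||^2) for every row pair of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-beta * np.maximum(sq, 0.0))

def composite_gram(S1, S2, R, beta, kind="sum"):
    """Gram matrix of a composite kernel over supervectors made of R equal-length
    Gaussian-mean blocks: 'sum' = direct sum kernel, 'product' = tensor product kernel."""
    per_block = [gaussian_kernel(a, b, beta)
                 for a, b in zip(np.split(S1, R, axis=1), np.split(S2, R, axis=1))]
    if kind == "sum":
        return np.sum(per_block, axis=0)
    if kind == "product":
        return np.prod(per_block, axis=0)
    raise ValueError(f"unknown composite kernel: {kind}")

def kernel_eigenvoices(K, M):
    """Kernel PCA on the N x N training-speaker Gram matrix K. Returns the top-M
    eigenvalues and coefficients alpha (N x M) such that the m-th kernel eigenvoice
    v_m = sum_i alpha[i, m] * (phi(s_i) - phi_mean) has unit norm in feature space."""
    N = K.shape[0]
    J = np.eye(N) - np.full((N, N), 1.0 / N)   # centring matrix
    lam, U = np.linalg.eigh(J @ K @ J)         # eigen-decompose the centred Gram matrix
    top = np.argsort(lam)[::-1][:M]
    return lam[top], U[:, top] / np.sqrt(lam[top])

def expansion_coeffs(alpha, w):
    """Coefficients c (length N) with psi = sum_i c_i phi(s_i), where
    psi = phi_mean + sum_m w_m v_m is the adapted speaker in feature space."""
    N = alpha.shape[0]
    b = alpha @ w
    return np.full(N, 1.0 / N) + b - b.sum() / N

def block_sq_distance(o, train_block, c, beta):
    """||phi_r(o) - psi_r||^2 for one Gaussian block r under a direct sum kernel,
    computed from kernel values only (no preimage of psi is needed)."""
    k_oo = 1.0                                  # Gaussian kernel gives k(o, o) = 1
    k_oi = gaussian_kernel(o[None, :], train_block, beta)[0]
    K_r = gaussian_kernel(train_block, train_block, beta)
    return k_oo - 2.0 * c @ k_oi + c @ K_r @ c

# Toy usage: 6 training speakers, 3 Gaussians per model, 2-D acoustic features.
rng = np.random.default_rng(0)
N, R, D = 6, 3, 2
S = rng.normal(size=(N, R * D))                 # toy training-speaker supervectors
K = composite_gram(S, S, R, beta=0.5, kind="sum")
lam, alpha = kernel_eigenvoices(K, M=2)
c = expansion_coeffs(alpha, w=np.array([0.3, -0.1]))   # hypothetical eigenvoice weights
o = rng.normal(size=D)                          # toy observation frame
d = block_sq_distance(o, S[:, :D], c, beta=0.5) # distance to Gaussian block r = 0
print(f"squared feature-space distance: {d:.4f}")
```

With the direct sum kernel the feature map is a concatenation of per-Gaussian feature maps, which is why a single Gaussian block can be compared against an observation frame in isolation; the tensor product Gram matrix is included for the kernel PCA step only.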

Keywords:
Kernel (algebra); Kernel principal component analysis; Speech recognition; Computer science; Pattern recognition (psychology); Artificial intelligence; Speaker recognition; Kernel method; Mathematics; Support vector machine
