Speaker adaptation of an acoustic-articulatory inversion model using cascaded Gaussian mixture regressions

Thomas Hueber; Gérard Bailly; Pierre Badin; Frédéric Elisei

doi:10.21437/interspeech.2013-631

Year: 2013 Pages: 2753-2757

Abstract

The article presents a method for adapting a GMM-based acoustic-articulatory inversion model, trained on a reference speaker, to another speaker. The goal is to estimate articulatory trajectories in the geometrical space of the reference speaker from the speech audio signal of another speaker. The method is developed in the context of a visual biofeedback system for pronunciation training, which provides speakers with visual information about their own articulation via a 3D orofacial clone. In previous work, we proposed using GMM-based voice conversion for speaker adaptation. Acoustic-articulatory mapping was performed in two consecutive steps: 1) converting the spectral trajectories of the target speaker (i.e., the system user) into spectral trajectories of the reference speaker (voice conversion), and 2) estimating the most likely articulatory trajectories of the reference speaker from the converted spectral features (acoustic-articulatory inversion). In this work, we propose to combine these two steps into a single statistical mapping framework by fusing multiple regressions based on trajectory GMMs and maximum likelihood estimation (MLE). The proposed technique is compared to two standard speaker adaptation techniques, based respectively on maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR).
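The sketch below is a rough illustration of the two-step cascade from the earlier work. It is not the fused trajectory-GMM/MLE method proposed here. It chains two frame-wise Gaussian mixture regressions, first voice conversion and then inversion, under several assumptions:

- features are scalars rather than vectors;
- the GMM parameters are hand-picked toy values, not trained models;
- the names `Component`, `gmr` and `cascaded_mapping` are hypothetical.

Each mapping returns the conditional mean E[y | x], a minimum mean-square-error estimate, rather than a maximum-likelihood trajectory.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    """One component of a joint GMM over scalar input x and scalar output y (toy 1-D case)."""
    weight: float
    mu_x: float
    mu_y: float
    var_x: float   # Sigma_xx
    cov_yx: float  # Sigma_yx


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def gmr(x: float, components: List[Component]) -> float:
    """Frame-wise Gaussian mixture regression: E[y | x] under the joint GMM."""
    # Component posteriors p(m | x), computed in the log domain to avoid underflow.
    log_likes = [math.log(c.weight) + log_gaussian(x, c.mu_x, c.var_x) for c in components]
    top = max(log_likes)
    resp = [math.exp(l - top) for l in log_likes]
    total = sum(resp)
    # Posterior-weighted sum of the per-component conditional means:
    # mu_y + Sigma_yx * Sigma_xx^-1 * (x - mu_x).
    return sum(
        (r / total) * (c.mu_y + c.cov_yx / c.var_x * (x - c.mu_x))
        for r, c in zip(resp, components)
    )


def cascaded_mapping(
    x_frames: Sequence[float], vc_gmm: List[Component], inv_gmm: List[Component]
) -> List[float]:
    """Step 1: user spectra -> reference-speaker spectra (voice conversion).
    Step 2: converted spectra -> reference-speaker articulation (inversion)."""
    converted = [gmr(x, vc_gmm) for x in x_frames]
    return [gmr(y, inv_gmm) for y in converted]


if __name__ == "__main__":
    # Hypothetical toy models: identity-like voice conversion, linear inversion y -> 1 + 2y.
    vc = [Component(1.0, 0.0, 0.0, 1.0, 1.0)]
    inv = [Component(1.0, 0.0, 1.0, 1.0, 2.0)]
    print(cascaded_mapping([0.0, 1.0], vc, inv))
```

Real systems would use multidimensional spectral and articulatory vectors with full covariance blocks. Trajectory-GMM mapping also models dynamic (delta) features and solves for the whole output trajectory at once, which this frame-by-frame sketch does not attempt.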
