JOURNAL ARTICLE

Autoregressive Articulatory WaveNet Flow for Speaker-Independent Acoustic-to-Articulatory Inversion

Abstract

In this paper we introduce a new speaker-independent method for acoustic-to-articulatory inversion. The proposed architecture, Speaker-Independent Articulatory WaveNet (SI-AWN), models the relationship between acoustic and articulatory features by conditioning the articulatory trajectories on acoustic features, and then applies the learned model to unseen target speakers. We evaluate SI-AWN on the Electromagnetic Articulography corpus of Mandarin Accented English (EMA-MAE), pooling acoustic-articulatory data from 35 reference speakers and testing on target speakers that include male, female, native, and non-native speakers. The results suggest that SI-AWN improves acoustic-to-articulatory inversion performance by 21 percent compared to the baseline Maximum Likelihood Linear Regression-Parallel Reference Speaker Weighting (MLLR-PRSW) method. To the best of our knowledge, this is the first application of a WaveNet-like synthesis approach to speaker-independent acoustic-to-articulatory inversion, and the results are comparable to or better than those of the best currently published systems.
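The abstract does not give implementation details. As an illustration only, the sketch below shows the general WaveNet-style idea of conditioned autoregressive prediction: each articulatory frame is predicted from frames that were already generated (a causal, dilated receptive field) together with the current acoustic feature vector, through the gated tanh/sigmoid activation used in WaveNet. The class `GatedCausalLayer`, the single articulatory channel, the scalar weights, and the dilation value are hypothetical simplifications. This is not the SI-AWN architecture or its trained parameters, and it omits the speaker-independent pooling over reference speakers entirely.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic sigmoid used as the gate nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class GatedCausalLayer:
    """Hypothetical single-channel gated unit with WaveNet-style conditioning.

    The articulatory input at step t comes only from already generated
    frames y[t-1] and y[t-1-dilation] (causal, dilated), and the acoustic
    feature vector c[t] enters both the filter and the gate.
    """

    def __init__(self, dilation: int,
                 wf: Tuple[float, float], wg: Tuple[float, float],
                 vf: Sequence[float], vg: Sequence[float],
                 w_out: float, b_out: float) -> None:
        self.dilation = dilation
        self.wf, self.wg = wf, wg  # weights on (y[t-1], y[t-1-dilation])
        self.vf, self.vg = vf, vg  # acoustic conditioning weights
        self.w_out, self.b_out = w_out, b_out

    def predict(self, past: List[float], cond: Sequence[float]) -> float:
        """Predict y[t] from past = y[0..t-1] and acoustic frame cond = c[t]."""
        t = len(past)
        # Frames before the start of the utterance are treated as zero padding.
        y1 = past[t - 1] if t >= 1 else 0.0
        yd = past[t - 1 - self.dilation] if t - 1 - self.dilation >= 0 else 0.0
        f = self.wf[0] * y1 + self.wf[1] * yd + dot(self.vf, cond)
        g = self.wg[0] * y1 + self.wg[1] * yd + dot(self.vg, cond)
        h = math.tanh(f) * sigmoid(g)  # gated activation, values in (-1, 1)
        return self.w_out * h + self.b_out


def generate(layer: GatedCausalLayer,
             acoustic: Sequence[Sequence[float]]) -> List[float]:
    """Autoregressive inference: each prediction is fed back as input."""
    traj: List[float] = []
    for c_t in acoustic:
        traj.append(layer.predict(traj, c_t))
    return traj


# Usage with made-up weights and 3-dimensional acoustic frames.
layer = GatedCausalLayer(dilation=2, wf=(0.5, 0.2), wg=(0.3, -0.1),
                         vf=(0.4, -0.2, 0.1), vg=(0.1, 0.1, 0.1),
                         w_out=1.0, b_out=0.0)
acoustic = [[0.1 * t, -0.05 * t, 0.2] for t in range(5)]
trajectory = generate(layer, acoustic)
```

Because `predict` reads only frames that were already generated, changing the acoustic frame at step 3 leaves the predictions for steps 0 to 2 unchanged; this causality is what allows a trajectory to be produced frame by frame. A WaveNet-style model would typically stack many such layers with increasing dilations and learned, multi-channel weights rather than the single hand-set unit shown here.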

