JOURNAL ARTICLE

Text-to-audio-visual speech synthesis based on parameter generation from HMM

Abstract

We present a new self-organizing neural network which performs unsupervised learning and can be used for vector quantization. The main advantage over existing approaches, e.g., the Kohonen feature map, is the ability of the model to automatically find a suitable network structure and size. This is achieved through a controlled growth process which also includes occasional removal of units. The algorithm is evaluated on a database of 25 speakers, each recorded in 12 different sessions. The overall performance was 99.5%; that is, in 99.5% of the trials the true speaker was correctly accepted and the impostor correctly rejected.
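
The growth-and-removal idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified growing vector quantizer with no network topology, and every name and parameter in it (`train_growing_vq`, `max_units`, `grow_every`, the learning rate, the removal rule) is an assumption made for illustration. After some epochs a new unit is inserted next to the unit with the largest accumulated quantization error, and units that win no samples in an epoch are removed.

```python
import random


def nearest(units, x):
    """Return the index of the codebook vector closest to x (squared Euclidean)."""
    return min(
        range(len(units)),
        key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)),
    )


def train_growing_vq(data, max_units=8, epochs=20, lr=0.1, grow_every=2, seed=0):
    """Simplified growing vector quantizer (illustrative assumption, not the paper's method)."""
    rng = random.Random(seed)
    # Start small: two units initialised from randomly chosen samples.
    units = [list(rng.choice(data)) for _ in range(2)]
    for epoch in range(1, epochs + 1):
        errors = [0.0] * len(units)
        hits = [0] * len(units)
        for x in rng.sample(data, len(data)):
            i = nearest(units, x)
            errors[i] += sum((u - v) ** 2 for u, v in zip(units[i], x))
            hits[i] += 1
            # Move the winning unit a small step toward the sample.
            units[i] = [u + lr * (v - u) for u, v in zip(units[i], x)]
        # Removal: drop units that won no samples during this epoch.
        keep = [i for i in range(len(units)) if hits[i] > 0]
        units = [units[i] for i in keep]
        errors = [errors[i] for i in keep]
        # Growth: every few epochs, add a unit beside the one with the largest error.
        if epoch % grow_every == 0 and len(units) < max_units:
            worst = max(range(len(units)), key=errors.__getitem__)
            units.append([u + rng.uniform(-0.01, 0.01) for u in units[worst]])
    return units


# Usage: quantize a toy two-cluster data set.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
codebook = train_growing_vq(samples, max_units=4, epochs=10)
print(len(codebook), "units; sample (5, 5) maps to unit", nearest(codebook, (5.0, 5.0)))
```

The codebook size is not fixed in advance; it is only bounded by `max_units`, which is the property the abstract contrasts with a fixed-size Kohonen feature map.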

Keywords:
Computer science; Vector quantization; Learning vector quantization; Self-organizing map; Artificial neural network; Quantization (signal processing); Artificial intelligence; Feature (linguistics); Speech recognition; Pattern recognition (psychology); Process (computing); Speaker verification; Feature vector; Speaker recognition; Unsupervised learning; Computer vision

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.36
Refs: 6
Citation Normalized Percentile: 0.53

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Neural Networks and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Coarticulation method for audio-visual text-to-speech synthesis

Eric Cosatto

Journal: The Journal of the Acoustical Society of America; Year: 2004; Vol: 116 (3); Pages: 1331-1331
JOURNAL ARTICLE

Emotion Audio-Visual Text-To-Speech

M. Abou Zliekha, Nada Ghneim, S. Al-Moubayed, Oumayma Al Dakkak

Journal: Zenodo (CERN European Organization for Nuclear Research); Year: 2006
JOURNAL ARTICLE

Humanoid Audio–Visual Avatar With Emotive Text-to-Speech Synthesis

Hao Tang, Yun Fu, Jilin Tu, Mark Hasegawa‐Johnson, Thomas S. Huang

Journal: IEEE Transactions on Multimedia; Year: 2008; Vol: 10 (6); Pages: 969-981