Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings

Pengcheng Zhu; Lei Xie; Yunlin Chen

doi:10.21437/interspeech.2015-493

JOURNAL ARTICLE

Year: 2015 Pages: 2192-2196

Abstract

Automatic prediction of articulatory movements from speech or text can benefit many applications, such as speech recognition and synthesis. A recent approach has reported state-of-the-art performance in speech-to-articulatory prediction using feed-forward neural networks. In this paper, we investigate the feasibility of using bidirectional long short-term memory based recurrent neural networks (BLSTM-RNNs) for articulatory movement prediction, because they can model trajectories over a long context. We show on the MNGU0 dataset that the BLSTM-RNN outperforms feed-forward networks and reduces the state-of-the-art RMSE from 0.885 mm to 0.565 mm. Predicting articulatory information from text, on the other hand, relies heavily on handcrafted linguistic and prosodic features, e.g., POS and ToBI labels. We propose to use word and phone embeddings as substitutes for these manual features. The word/phone embedding features are learned automatically from unlabeled text data by a neural network language model. We show that word and phone embeddings achieve comparable performance without using POS and ToBI features. More promisingly, combining the conventional full feature set with phone embeddings achieves the lowest RMSE.
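The RMSE figures quoted in the abstract compare predicted and measured articulator positions in millimetres. The sketch below shows one common way to compute such a score. It is an illustration only: the function names `rmse_mm` and `relative_reduction`, and the choice to pool all frames and channels into a single average, are assumptions and are not taken from the paper.

```python
import math


def rmse_mm(predicted, measured):
    """Root-mean-square error pooled over all frames and channels.

    Both arguments are lists of frames; each frame is a list of
    articulator coordinates in millimetres (hypothetical layout).
    """
    if len(predicted) != len(measured):
        raise ValueError("frame counts differ")
    squared_sum = 0.0
    count = 0
    for p_frame, m_frame in zip(predicted, measured):
        if len(p_frame) != len(m_frame):
            raise ValueError("channel counts differ")
        for p, m in zip(p_frame, m_frame):
            squared_sum += (p - m) ** 2
            count += 1
    if count == 0:
        raise ValueError("no samples to score")
    return math.sqrt(squared_sum / count)


def relative_reduction(baseline, improved):
    """Fractional error reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy trajectories (made-up numbers, two frames with two channels each).
    pred = [[1.0, 2.0], [3.0, 4.0]]
    meas = [[1.5, 2.0], [3.0, 3.0]]
    print(f"RMSE: {rmse_mm(pred, meas):.3f} mm")
    # RMSE values reported in the abstract: 0.885 mm and 0.565 mm.
    print(f"Relative reduction: {relative_reduction(0.885, 0.565):.1%}")
```

Applied to the two values reported in the abstract, `relative_reduction` gives a reduction of roughly 36%.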

Keywords: Computer science; Speech recognition; Recurrent neural network; Context (archaeology); Embedding; Word (group theory); Artificial neural network; Feature (linguistics); Phone; Artificial intelligence; Term (time); Word embedding; Natural language processing; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 4.71

Citation Normalized Percentile: 0.97

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Phonetics and Phonology Research

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Multimodal Affective Dimension Prediction Using Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks

Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks

Twitter Bot Detection Using Bidirectional Long Short-Term Memory Neural Networks and Word Embeddings

Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks

Multimodal dimensional affect recognition using deep bidirectional long short-term memory recurrent neural networks