Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders

Shigeki Karita; Shinji Watanabe; Tomoharu Iwata; Marc Delcroix; Atsunori Ogawa; Tomohiro Nakatani

doi:10.1109/icassp.2019.8682890


JOURNAL ARTICLE


Year: 2019 Pages: 6166-6170



Abstract

We introduce speech and text autoencoders that share encoders and decoders with an automatic speech recognition (ASR) model, in order to improve ASR performance using large speech-only and text-only training datasets. To build the speech and text autoencoders, we leverage state-of-the-art ASR and text-to-speech (TTS) encoder-decoder architectures. The autoencoders learn features from the speech-only and text-only datasets by switching between the encoders and decoders used in the ASR and TTS models. At the same time, a multi-task loss encourages them to encode features that are compatible with the ASR and TTS models. We also expect joint training with TTS to improve ASR performance, because both the ASR and TTS models learn transformations between speech and text. In experiments with our semi-supervised end-to-end ASR/TTS training, we took a model initially trained on a small paired subset of the LibriSpeech corpus and retrained it with a large unpaired subset of the corpus. This reduced the character error rate from 10.4% to 8.4% and the word error rate from 20.6% to 18.0%.
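The encoder and decoder switching described in the abstract can be pictured with a small routing sketch. This is only an illustration under stated assumptions, not the authors' implementation: the task names, the `ROUTES` table, the `tasks_for_batch` and `multitask_loss` helpers, and the loss weights are all hypothetical, and the choice to train the autoencoders on paired batches as well is an assumption the abstract does not confirm.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical routing table: each task pairs one shared encoder with one
# shared decoder. Swapping the decoder (or encoder) turns an ASR/TTS model
# into a speech or text autoencoder.
ROUTES: Dict[str, Tuple[str, str]] = {
    "asr": ("speech", "text"),          # speech encoder -> text decoder
    "tts": ("text", "speech"),          # text encoder -> speech decoder
    "speech_ae": ("speech", "speech"),  # speech encoder -> speech decoder
    "text_ae": ("text", "text"),        # text encoder -> text decoder
}


def tasks_for_batch(speech: Optional[object], text: Optional[object]) -> List[str]:
    """Return the tasks a batch can train, given which modalities it contains."""
    tasks: List[str] = []
    if speech is not None and text is not None:
        # Paired data supervises both directions between speech and text.
        tasks += ["asr", "tts"]
    if speech is not None:
        # Assumption: any batch with speech can also train the speech autoencoder.
        tasks.append("speech_ae")
    if text is not None:
        # Assumption: any batch with text can also train the text autoencoder.
        tasks.append("text_ae")
    return tasks


def multitask_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine the per-task losses that are present into one weighted sum."""
    return sum(weights[task] * value for task, value in losses.items())
```

Under these assumptions a speech-only batch, `tasks_for_batch(audio, None)`, trains only `speech_ae`, and a text-only batch trains only `text_ae`. Because both autoencoders reuse the ASR and TTS components, unpaired data still updates parameters that the ASR model depends on.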

Keywords:

Computer science; Speech recognition; Encoder; Leverage (statistics); Word error rate; Artificial intelligence; End-to-end principle; Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 5.07

Citation Normalized Percentile: 0.96

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents

Semi-Supervised End-to-End Speech Recognition

Semi-Supervised End-to-End Speech-to-Text Translation with Joint Text-to-Text and Speech-to-Text Decoding

Improving End-to-End Bangla Speech Recognition with Semi-supervised Training

Semi-Supervised end-to-end Speech Recognition via Local Prior Matching

Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition