Improved Training for Online End-to-end Speech Recognition Systems

Suyoun Kim; Michael L. Seltzer; Jinyu Li; Rui Zhao

doi:10.21437/interspeech.2018-2517

JOURNAL ARTICLE

Year: 2018 Pages: 2913-2917

Abstract

Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training. Otherwise, the networks may fail to find a good local optimum. This is particularly true for online networks, such as unidirectional LSTMs. Currently, the best strategy for training such systems is to bootstrap the training from a tied-triphone system. However, this is time-consuming and, more importantly, impossible for languages without a high-quality pronunciation lexicon. In this work, we propose an initialization strategy that uses teacher-student learning to transfer knowledge from a large, well-trained, offline end-to-end speech recognition model to an online end-to-end model, eliminating the need for a lexicon or any other linguistic resources. We also explore curriculum learning and label smoothing and show how they can be combined with the proposed teacher-student learning for further improvements. We evaluate our methods on a Microsoft Cortana personal assistant task and show that the proposed method yields a 19% relative improvement in word error rate compared to a randomly initialized baseline system.
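The abstract names teacher-student learning, label smoothing, and a relative word error rate improvement without giving formulas. The following is a minimal sketch of what these terms commonly mean, not the authors' implementation. It assumes a per-frame cross-entropy between teacher and student output distributions, uniform label smoothing with a hypothetical weight `epsilon`, and frames that are already aligned one-to-one between the two models. All function names are illustrative, and the word error rates in the usage note are made up.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Uniform label smoothing (assumed form): the target keeps 1 - epsilon
    plus its share of epsilon, which is spread evenly over all classes."""
    base = epsilon / num_classes
    return [base + (1.0 - epsilon if i == target_index else 0.0)
            for i in range(num_classes)]

def cross_entropy(target_dist, student_logits):
    """Cross-entropy of the student's softmax output against a target distribution."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs) if t > 0.0)

def teacher_student_loss(teacher_logits, student_logits):
    """Average per-frame cross-entropy of the student against the teacher's
    soft targets (assumes both sequences have the same number of frames)."""
    losses = [cross_entropy(softmax(t), s)
              for t, s in zip(teacher_logits, student_logits)]
    return sum(losses) / len(losses)

def relative_wer_reduction(baseline_wer, new_wer):
    """Relative improvement: the fraction of the baseline error that was removed."""
    return (baseline_wer - new_wer) / baseline_wer
```

As a usage illustration with hypothetical numbers, `relative_wer_reduction(10.0, 8.1)` is approximately 0.19, which is how a drop from 10.0% to 8.1% word error rate would be reported as a 19% relative improvement. The teacher-student loss is smallest when the student reproduces the teacher's distribution on every frame.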

Keywords:

Computer science; End-to-end principle; Speech recognition; Training (meteorology); Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 4.37

Citation Normalized Percentile: 0.94

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Improved training of end-to-end attention models for speech recognition

Joint Training End-to-End Speech Recognition Systems with Speaker Attributes

Self-Training for End-to-End Speech Recognition

Improved Training Strategies for End-to-End Speech Recognition in Digital Voice Assistants

Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation