JOURNAL ARTICLE

Sequence Noise Injected Training for End-to-end Speech Recognition

Abstract

We present a simple noise injection algorithm for training end-to-end ASR models which consists of adding to the spectra of training utterances the scaled spectra of random utterances of comparable length. We conjecture that the sequence information of the "noise" utterances is important and verify this via a contrast experiment in which the frames of the utterances to be added are randomly shuffled. Experiments with both CTC and attention-based models show that the proposed scheme yields up to 9% relative word error rate improvement (depending on the model and test set) on the Switchboard 300-hour English conversational telephony database. Additionally, we set a new benchmark for attention-based encoder-decoder models on this corpus.
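The mixing scheme described in the abstract can be sketched in a few lines of NumPy; the feature shapes, the `scale` weight, and the function name below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def sequence_noise_inject(feats, noise_feats, scale=0.4):
    """Add the scaled spectral features of a random "noise" utterance
    to a training utterance, frame by frame from the start.

    feats, noise_feats: (T, F) arrays of spectral features (e.g. log-mel).
    scale: mixing weight for the noise utterance (illustrative value).
    """
    # Utterances are of comparable but not identical length,
    # so mix only over the overlapping frames.
    T = min(len(feats), len(noise_feats))
    out = feats.copy()
    out[:T] += scale * noise_feats[:T]
    return out
```

The contrast experiment mentioned in the abstract would correspond to shuffling the rows of `noise_feats` (e.g. with `np.random.permutation`) before mixing, destroying its sequence information while preserving its frame-level statistics.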

Keywords:
Computer science, Speech recognition, Noise, Telephony, Encoder, Benchmark, Sequence, Word error rate, Training set, Test set, End-to-end principle, Decoding methods, Artificial intelligence, Pattern recognition, Algorithm, Mathematics, Telecommunications

Metrics

Cited by: 43
FWCI (Field-Weighted Citation Impact): 4.30
References: 40
Citation Normalized Percentile: 0.95 (in top 1%, in top 10%)

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)