Abstract

In this paper we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep architecture performs as well as or better than more complex choices. Our deepest Jasper variant uses 54 convolutional layers. With this architecture, we achieve 2.95% WER using a beam-search decoder with an external neural language model and 3.86% WER with a greedy decoder on LibriSpeech test-clean. We also report competitive results on Wall Street Journal and the Hub5'00 conversational evaluation datasets.
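The layer-wise NovoGrad optimizer mentioned in the abstract maintains, unlike Adam, a single scalar second moment per layer, computed from the squared norm of that layer's whole gradient. The following is a minimal NumPy sketch of one such update step; the function name and the Adam-style hyperparameter defaults are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def novograd_step(w, g, m, v, lr=0.01, beta1=0.95, beta2=0.98,
                  eps=1e-8, weight_decay=0.0):
    """One NovoGrad-style update for a single layer (sketch).

    w : layer weights, g : gradient, m : first-moment buffer,
    v : scalar per-layer second moment (None on the first step).
    """
    g_norm_sq = float(np.sum(g * g))
    if v is None:
        # first step: initialize the second moment with the gradient norm
        v = g_norm_sq
    else:
        v = beta2 * v + (1.0 - beta2) * g_norm_sq
    # gradient rescaled by the layer-wise norm, plus decoupled weight decay
    update = g / (np.sqrt(v) + eps) + weight_decay * w
    m = beta1 * m + update
    w = w - lr * m
    return w, m, v

# hypothetical usage on a two-parameter "layer"
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
w, m, v = novograd_step(w, np.array([0.5, 0.5]), m, None)
```

Because `v` is a scalar per layer rather than per parameter, the optimizer's memory overhead is roughly half that of Adam, which is one of the motivations the paper gives for the layer-wise design.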

Keywords:
Speech recognition; end-to-end models; convolutional neural networks; batch normalization; dropout; residual connections; language models; beam search; decoding

Metrics

- Cited by: 213
- FWCI (Field-Weighted Citation Impact): 21.51
- References: 35
- Citation Normalized Percentile: 0.99 (top 1%)

Topics

Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
© 2026 ScienceGate Book Chapters — All rights reserved.