Streaming End-to-end Speech Recognition for Mobile Devices

Yanzhang He; Tara N. Sainath; Rohit Prabhavalkar; Ian McGraw; Raziel Álvarez; Ding Zhao; David Rybach; Anjuli Kannan; Yonghui Wu; Ruoming Pang; Qiao Liang; Deepti Bhatia; Yuan Shangguan; Bo Li; Golan Pundak; Khe Chai Sim; Tom Bagby; Shuo-Yiin Chang; Kanishka Rao; Alexander Gruenstein

doi:10.1109/icassp.2019.8682336

JOURNAL ARTICLE

Abstract

End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.

Keywords: Computer science; End-to-end principle; Leverage (statistics); Speech recognition; Latency (audio); Artificial neural network; Voice activity detection; Mobile device; Context (archaeology); Artificial intelligence; Speech processing; Telecommunications; World Wide Web

Metrics

Cited By: 593

FWCI (Field Weighted Citation Impact): 69.44

Citation Normalized Percentile: 1.00

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing
