Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Anjuli Kannan; Arindrima Datta; Tara N. Sainath; Eugene Weinstein; Bhuvana Ramabhadran; Yonghui Wu; Ankur Bapna; Zhifeng Chen; Seungji Lee

doi:10.21437/interspeech.2019-2858

JOURNAL ARTICLE

Year: 2019

Abstract

Multilingual end-to-end (E2E) models have shown great promise in expanding automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work presents an E2E multilingual system which is equipped to operate in low-latency interactive applications, as well as handle a key challenge of real-world data: the imbalance in training data across languages. Using nine Indic languages, we compare a variety of techniques, and find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves a lower word error rate (WER) than both monolingual E2E models (eight of nine languages) and monolingual conventional systems (all nine languages).
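
The abstract names two techniques, conditioning on a language vector and language-specific adapter layers, without giving implementation details. The following is a minimal NumPy sketch of what such components might look like, assuming a one-hot language vector appended to every input frame and a bottleneck adapter with a residual connection. The names `condition_on_language` and `ResidualAdapter`, the layer sizes, and the zero initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

NUM_LANGS = 9  # the abstract mentions nine Indic languages


def one_hot(lang_id: int, num_langs: int) -> np.ndarray:
    """Return a one-hot vector identifying the language."""
    vec = np.zeros(num_langs)
    vec[lang_id] = 1.0
    return vec


def condition_on_language(features: np.ndarray, lang_id: int,
                          num_langs: int) -> np.ndarray:
    """Append the same language vector to every frame.

    features has shape (num_frames, feature_dim); the result has shape
    (num_frames, feature_dim + num_langs).
    """
    lang = np.tile(one_hot(lang_id, num_langs), (features.shape[0], 1))
    return np.concatenate([features, lang], axis=1)


class ResidualAdapter:
    """Hypothetical per-language adapter: down-project, ReLU, up-project, add."""

    def __init__(self, dim: int, bottleneck: int, rng: np.random.Generator):
        self.down = rng.normal(scale=0.02, size=(dim, bottleneck))
        # Assumed zero init, so the adapter starts out as the identity map.
        self.up = np.zeros((bottleneck, dim))

    def __call__(self, hidden: np.ndarray) -> np.ndarray:
        return hidden + np.maximum(hidden @ self.down, 0.0) @ self.up


rng = np.random.default_rng(0)
# One adapter per language; the rest of the model would be shared.
adapters = [ResidualAdapter(dim=16, bottleneck=4, rng=rng)
            for _ in range(NUM_LANGS)]

frames = rng.normal(size=(5, 8))    # 5 frames of 8-dim features
conditioned = condition_on_language(frames, lang_id=2, num_langs=NUM_LANGS)

hidden = rng.normal(size=(5, 16))   # stand-in for a shared layer's output
adapted = adapters[2](hidden)       # route through the adapter for language 2
```

In a design like this only the small adapter weights differ between languages, which is one plausible way to specialize a shared model for languages with little training data.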

Keywords:

End-to-end principle; Computer science; Speech recognition; Scale (ratio); End user; Artificial intelligence; World Wide Web; Geography

Metrics

Cited By: 156

FWCI (Field Weighted Citation Impact): 13.21

Citation Normalized Percentile: 0.99

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

Multilingual Speech Recognition with a Single End-to-End Model

Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

End-to-End Multilingual Multi-Speaker Speech Recognition