Abstract

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.
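The two data-preparation ideas in the abstract — taking the union of language-specific grapheme sets to form one joint output vocabulary, and feeding the model a language identifier as an extra input feature — can be sketched as follows. This is a minimal illustration, not the paper's implementation; all function names and the one-hot encoding choice are assumptions.

```python
def union_grapheme_vocab(transcripts_by_lang):
    """Union of graphemes across all languages -> one sorted joint vocabulary.

    transcripts_by_lang: dict mapping a language code to a list of
    transcript strings in that language's script.
    """
    graphemes = set()
    for transcripts in transcripts_by_lang.values():
        for text in transcripts:
            graphemes.update(text)  # each Unicode character is one grapheme
    return sorted(graphemes)


def append_lang_id(frame, lang, langs):
    """Concatenate a one-hot language-ID vector to a single acoustic frame."""
    one_hot = [1.0 if l == lang else 0.0 for l in langs]
    return frame + one_hot


# Toy usage with two languages whose scripts do not overlap:
data = {"hi": ["नमस"], "bn": ["নম"]}
vocab = union_grapheme_vocab(data)          # 5 distinct graphemes in total
langs = sorted(data)                        # ["bn", "hi"]
frame = append_lang_id([0.1, 0.2], "hi", langs)
```

Because the Indian-language scripts share few characters, the joint vocabulary is close to the sum of the per-language grapheme sets; the one-hot language ID is what lets a single network disambiguate between them.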

Keywords:
Speech recognition · Sequence-to-sequence models · Multilingual ASR · Grapheme · Pronunciation · Lexicon · Language model · Natural language processing · Language identifier

Metrics

Cited by: 217
FWCI (Field-Weighted Citation Impact): 25.22
References: 34
Citation Normalized Percentile: 0.99 (top 1%)

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)