JOURNAL ARTICLE

A Real-Time End-to-End Multilingual Speech Recognition Architecture

Javier Gónzalez-Domínguez, David Eustis, Ignacio López Moreno, Andrew Senior, Françoise Beaufays, Pedro J. Moreno

Year: 2014
Journal: IEEE Journal of Selected Topics in Signal Processing
Vol: 9 (4)
Pages: 749-759
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches, or facilitate data input on small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the monolingual character of many ASR systems, in which users are constrained to a single preset language. In this work, we present an end-to-end multi-language ASR architecture, developed and deployed at Google, that allows users to select arbitrary combinations of spoken languages. We leverage recent advances in language identification and a novel method of real-time language selection to achieve recognition accuracy similar to that of a monolingual system, with nearly identical latency characteristics.

Keywords:
Computer science, Speech recognition, Leverage (statistics), Latency (audio), Architecture, Spoken language, End user, Low latency (capital markets), Obstacle, Natural language processing, Artificial intelligence, World Wide Web, Computer network, Telecommunications

Metrics

Cited By: 51
FWCI (Field Weighted Citation Impact): 5.31
Refs: 35
Citation Normalized Percentile: 0.95

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing