Javier González-Domínguez, David Eustis, Ignacio López Moreno, Andrew Senior, Françoise Beaufays, Pedro J. Moreno
Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches, or facilitate data input on small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the monolingual character of many ASR systems, which constrain users to a single preset language. In this work, we present an end-to-end multi-language ASR architecture, developed and deployed at Google, that allows users to select arbitrary combinations of spoken languages. We leverage recent advances in language identification and a novel method of real-time language selection to achieve recognition accuracy similar to, and latency characteristics nearly identical to, those of a monolingual system.
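The selection scheme the abstract describes can be sketched as follows: run one monolingual recognizer per user-selected language in parallel, score each hypothesis with a language-identification (LID) model, and return the best-scoring transcript. This is a minimal illustrative sketch, not the paper's actual implementation; the function names (`recognize`, `multilingual_recognize`) and the stubbed LID scores are assumptions introduced here for illustration.

```python
# Hypothetical sketch of LID-based language selection over parallel
# monolingual recognizers. Not Google's API; all names are illustrative.
from concurrent.futures import ThreadPoolExecutor


def recognize(audio: bytes, lang: str) -> tuple[str, float]:
    """Stub monolingual recognizer returning (transcript, LID score).

    A real system would decode the audio and score the hypothesis with a
    language-identification model; here we return fixed placeholder scores.
    """
    scores = {"en-US": 0.92, "es-ES": 0.40, "fr-FR": 0.35}
    return (f"transcript[{lang}]", scores.get(lang, 0.0))


def multilingual_recognize(audio: bytes, languages: list[str]) -> str:
    """Decode `audio` once per candidate language in parallel, then keep
    the transcript whose LID confidence is highest."""
    with ThreadPoolExecutor(max_workers=len(languages)) as pool:
        results = list(pool.map(lambda lang: recognize(audio, lang), languages))
    best_transcript, _ = max(results, key=lambda r: r[1])
    return best_transcript
```

Running the recognizers concurrently rather than sequentially is what keeps end-to-end latency close to that of a single monolingual decode.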