JOURNAL ARTICLE

Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition

Abstract

Code-switching (CS) occurs when a speaker alternates between words of two or more languages within a single sentence or across sentences. Automatic speech recognition (ASR) of CS speech must handle two or more languages at the same time. In this study, we propose a Transformer-based architecture with two symmetric language-specific encoders that capture the individual language attributes and improve the acoustic representation of each language. These representations are combined by a language-specific multi-head attention mechanism in the decoder module. Each encoder and its corresponding attention module in the decoder are pre-trained on a large monolingual corpus to alleviate the impact of limited CS training data. We call such a network a multi-encoder-decoder (MED) architecture. Experiments on the SEAME corpus show that the proposed MED architecture achieves 10.2% and 10.8% relative error rate reductions on the CS evaluation sets with Mandarin and English as the matrix language, respectively.
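The abstract's core idea can be illustrated with a minimal sketch: two language-specific encoders produce separate representations of the same acoustic frames, and the decoder attends to each stream with its own attention parameters before combining the contexts. This is a hypothetical NumPy toy, not the authors' implementation; all names, dimensions, and the single-layer "encoders" are illustrative stand-ins.

```python
import numpy as np

# Toy sketch of the multi-encoder-decoder (MED) idea (hypothetical, not the
# paper's code): two language-specific encoders over shared acoustic features,
# each attended to by its own attention module in the decoder.

rng = np.random.default_rng(0)
d = 8   # model dimension (illustrative)
T = 5   # number of acoustic frames (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, memory, W_q, W_k, W_v):
    """Single-head dot-product attention over one encoder's output."""
    q = query @ W_q                        # (d,)
    k = memory @ W_k                       # (T, d)
    v = memory @ W_v                       # (T, d)
    weights = softmax(k @ q / np.sqrt(d))  # (T,) attention distribution
    return weights @ v                     # (d,) context vector

feats = rng.standard_normal((T, d))        # shared acoustic features

# Language-specific encoders (stand-ins: one nonlinear layer each).
W_man, W_eng = rng.standard_normal((2, d, d))
enc_man = np.tanh(feats @ W_man)           # Mandarin-specific representation
enc_eng = np.tanh(feats @ W_eng)           # English-specific representation

# The decoder state attends to each stream with separate parameters.
dec_state = rng.standard_normal(d)
params = [rng.standard_normal((3, d, d)) for _ in range(2)]
ctx_man = attend(dec_state, enc_man, *params[0])
ctx_eng = attend(dec_state, enc_eng, *params[1])

# Combine the two language-specific contexts for the next decoder step.
context = ctx_man + ctx_eng
print(context.shape)  # (8,)
```

In the paper's full model each encoder and its attention module can be pre-trained on a monolingual corpus before fine-tuning on the CS data; in this sketch that would correspond to initializing the Mandarin and English parameter sets separately.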

Keywords:
Computer science; Encoder; Transformer; Speech recognition; Code-switching; Electrical engineering; Engineering; Voltage

Metrics

Cited By: 36
FWCI (Field Weighted Citation Impact): 3.52
Refs: 33
Citation Normalized Percentile: 0.93

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Phonetics and Phonology Research
Social Sciences →  Psychology →  Experimental and Cognitive Psychology