We propose two end-to-end neural configurations for language diarization on bilingual code-switching speech. The first, a BLSTM-E2E architecture, uses a set of stacked bidirectional LSTMs to compute embeddings and incorporates the deep clustering loss to enforce the grouping of speech segments belonging to the same language class. The second, an XSA-E2E architecture, is based on an x-vector model followed by a self-attention encoder. The former encodes frame-level features into segment-level embeddings, while the latter attends over all of those embeddings to generate a sequence of segment-level language labels. We evaluated the proposed methods on the dataset from shared task B of WSTCSMC 2020 and on our simulated data derived from the SEAME dataset. Experimental results show that the proposed XSA-E2E architecture achieved a 12.1% relative improvement in equal error rate and a 7.4% relative improvement in accuracy over the baseline algorithm on the WSTCSMC 2020 dataset. On the simulated data derived from the SEAME dataset, the XSA-E2E architecture achieved an accuracy of 89.84%, compared with the baseline accuracy of 85.60%.
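To make the XSA-E2E design concrete, below is a minimal PyTorch sketch of the two-stage pipeline: a statistics-pooling front-end (standing in for the x-vector extractor) maps the frame-level features of each segment to a fixed embedding, and a Transformer self-attention encoder maps the resulting embedding sequence to per-segment language labels. All module names, layer sizes, and hyperparameters here (SegmentEmbedder, XSADiarizer, feat_dim=40, etc.) are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class SegmentEmbedder(nn.Module):
    """Encode the frame-level features of one segment into a fixed embedding.

    A simplified stand-in for an x-vector extractor: a frame-level network
    followed by statistics pooling (mean and std over frames).
    """

    def __init__(self, feat_dim=40, hidden_dim=256, emb_dim=128):
        super().__init__()
        self.frame_net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Statistics pooling doubles the feature size (mean ++ std).
        self.proj = nn.Linear(2 * hidden_dim, emb_dim)

    def forward(self, frames):              # frames: (batch, n_frames, feat_dim)
        h = self.frame_net(frames)          # (batch, n_frames, hidden_dim)
        stats = torch.cat([h.mean(dim=1), h.std(dim=1)], dim=-1)
        return self.proj(stats)             # (batch, emb_dim)


class XSADiarizer(nn.Module):
    """Self-attention over the sequence of segment embeddings.

    The encoder lets every segment attend to all others, so each language
    decision is made in the context of the whole utterance.
    """

    def __init__(self, emb_dim=128, n_heads=4, n_layers=2, n_langs=2):
        super().__init__()
        self.embedder = SegmentEmbedder(emb_dim=emb_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(emb_dim, n_langs)

    def forward(self, segments):
        # segments: (batch, n_segments, n_frames, feat_dim)
        b, s, t, f = segments.shape
        emb = self.embedder(segments.reshape(b * s, t, f)).reshape(b, s, -1)
        ctx = self.encoder(emb)             # attend across all segments
        return self.classifier(ctx)         # per-segment language logits


if __name__ == "__main__":
    model = XSADiarizer()
    x = torch.randn(2, 10, 50, 40)          # 2 utterances, 10 segments of 50 frames
    print(model(x).shape)                   # torch.Size([2, 10, 2])
```

The design choice this sketch highlights is the division of labor stated in the abstract: the front-end summarizes each segment independently, while the self-attention encoder is the only component that reasons across segments to produce the label sequence.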