Previous works utilize the context-independent (CI) label smoothing regularization (LSR) method to prevent an attention-based end-to-end (E2E) automatic speech recognition (ASR) model, trained with a cross-entropy loss and hard labels, from making over-confident predictions. However, CI LSR does not exploit linguistic knowledge within and across languages in code-switching speech recognition (CSSR). In this paper, we propose a context-dependent (CD) LSR method. Guided by code-switching linguistic knowledge, the output units are classified into several categories, and a set of context dependency rules is defined. Under these rules, the prior label distribution is generated dynamically according to the category of the historical context, rather than being fixed. The CD LSR method can thus exploit linguistic knowledge in CSSR to further improve model performance. Experiments on the SEAME corpus demonstrate the effectiveness of the proposed method. The final system with CD LSR achieves the best performance, a 37.21% mixed error rate (MER), corresponding to a relative MER reduction of up to 3.7% over the baseline system without LSR.
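The core idea can be sketched as follows: instead of mixing the one-hot target with a fixed uniform prior, the smoothing mass is redistributed by a prior that depends on the category of the preceding token. This is a minimal illustrative sketch, not the paper's implementation; the toy vocabulary, category labels, and the specific rule (favoring same-category and shared units after a token of a given category) are assumptions for demonstration only.

```python
import numpy as np

# Toy vocabulary and category assignment (illustrative, not from the paper):
# Mandarin units ("man"), English units ("eng"), and shared symbols.
VOCAB = ["ni", "hao", "hello", "world", "<sil>"]
CATEGORY = {"ni": "man", "hao": "man", "hello": "eng", "world": "eng", "<sil>": "shared"}

def cd_prior(prev_token, same_weight=0.8):
    """Hypothetical context dependency rule: after a token of a given category,
    place more smoothing mass on same-category and shared units."""
    prev_cat = CATEGORY[prev_token]
    mass = np.array([same_weight if CATEGORY[t] in (prev_cat, "shared")
                     else 1.0 - same_weight
                     for t in VOCAB])
    return mass / mass.sum()  # normalize into a prior distribution

def cd_label_smoothing(target_idx, prev_token, eps=0.1):
    """Soft target: (1 - eps) on the true label, eps spread by the CD prior
    (CI LSR would instead spread eps uniformly over the vocabulary)."""
    onehot = np.zeros(len(VOCAB))
    onehot[target_idx] = 1.0
    return (1.0 - eps) * onehot + eps * cd_prior(prev_token)

# After the Mandarin token "ni", smoothing mass leans toward Mandarin units:
soft = cd_label_smoothing(VOCAB.index("hao"), prev_token="ni")
```

The soft target still sums to one, but cross-lingual units receive less smoothing mass than same-language units, encoding the context dependency that a fixed uniform prior cannot express.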