Previous works utilize the context-independent (CI) label smoothing regularization (LSR) method to prevent an attention-based end-to-end (E2E) automatic speech recognition (ASR) model, trained with a cross-entropy loss and hard labels, from making over-confident predictions. However, CI LSR does not exploit linguistic knowledge within and across languages in code-switching speech recognition (CSSR). In this paper, we propose a context-dependent (CD) LSR method. Guided by code-switching linguistic knowledge, the output units are classified into several categories, and a set of context dependency rules is defined. Under these rules, the prior label distribution is generated dynamically according to the category of the historical context, rather than being fixed. The CD LSR method can thus exploit linguistic knowledge in CSSR to further improve model performance. Experiments on the SEAME corpus demonstrate the effectiveness of the proposed method. The final system with CD LSR achieves the best performance, a 37.21% mixed error rate (MER), corresponding to a relative MER reduction of up to 3.7% over the baseline system without LSR.
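The core idea can be sketched as follows: instead of mixing the one-hot target with a fixed uniform prior, the smoothing mass is redistributed by a prior that depends on the category of the preceding token. This is a minimal illustrative sketch, not the paper's implementation; the toy vocabulary, category labels, and the specific rule (favoring same-category and shared units after a token of a given category) are assumptions for demonstration only.

```python
import numpy as np

# Toy vocabulary and category assignment (illustrative, not from the paper):
# Mandarin units ("man"), English units ("eng"), and shared symbols.
VOCAB = ["ni", "hao", "hello", "world", "<sil>"]
CATEGORY = {"ni": "man", "hao": "man", "hello": "eng", "world": "eng", "<sil>": "shared"}

def cd_prior(prev_token, same_weight=0.8):
    """Hypothetical context dependency rule: after a token of a given category,
    place more smoothing mass on same-category and shared units."""
    prev_cat = CATEGORY[prev_token]
    mass = np.array([same_weight if CATEGORY[t] in (prev_cat, "shared")
                     else 1.0 - same_weight
                     for t in VOCAB])
    return mass / mass.sum()  # normalize into a prior distribution

def cd_label_smoothing(target_idx, prev_token, eps=0.1):
    """Soft target: (1 - eps) on the true label, eps spread by the CD prior
    (CI LSR would instead spread eps uniformly over the vocabulary)."""
    onehot = np.zeros(len(VOCAB))
    onehot[target_idx] = 1.0
    return (1.0 - eps) * onehot + eps * cd_prior(prev_token)

# After the Mandarin token "ni", smoothing mass leans toward Mandarin units:
soft = cd_label_smoothing(VOCAB.index("hao"), prev_token="ni")
```

The soft target still sums to one, but cross-lingual units receive less smoothing mass than same-language units, encoding the context dependency that a fixed uniform prior cannot express.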