Zhili Tan, Xinghua Fan, Hui Zhu, Ed Lin
Automatic speech recognition systems lose accuracy when they encounter code-switching, where multiple languages are spoken in a single utterance. This is especially common for non-native speakers, for whom there is a mismatch between their speech and the acoustic model. In this paper, we experiment on Mandarin-English code-switching audio spoken by native Chinese speakers and evaluate three techniques to improve accuracy: data adaptation, individual senone modeling, and lexicon enrichment. Our results show that recognition of accented speech improves by up to 12% on various code-switching datasets. We also propose several metrics for measuring code-switching recognition quality that the typical word error rate (WER) measurement does not capture.
Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-yi Lee
Yanhua Long, Shuang Wei, Jie Lian, Yijie Li
Yanhua Long, Yijie Li, Qiaozheng Zhang, Shuang Wei, Hong Ye, Jichen Yang
Cao Hong Nga, Duc-Quang Vu, Huong Hoang Luong, Chien-Lin Huang, Jia-Ching Wang
Changhao Shan, Chao Weng, Guangsen Wang, Dan Su, Min Luo, Dong Yu, Lei Xie