Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition

Jian Luo; Jianzong Wang; Ning Cheng; Edward Xiao; Jing Xiao; Georg Kucsko; Patrick O’Neill; Jagadeesh Balam; Slyne Deng; Adriana Flores; Boris Ginsburg; Jocelyn Huang; Oleksii Kuchaiev; Vitaly Lavrukhin; Jason Li

doi:10.1109/icme51207.2021.9428334

Year: 2021 Pages: 1-6

Abstract

In this paper, we demonstrate the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks using end-to-end models trained with CTC loss. We start with a large pre-trained English ASR model and show that transfer learning can be performed effectively and easily on: (1) different English accents, (2) different languages (from English to German, Spanish, or Russian, and from Mandarin to Cantonese), and (3) application-specific domains. Our extensive set of experiments demonstrates that, in all three cases, transfer learning from a good base model achieves higher accuracy than training a model from scratch. Our results indicate that, for fine-tuning, larger pre-trained models are better than smaller ones, even when the fine-tuning dataset is small. We also show that transfer learning significantly speeds up convergence, which could yield substantial cost savings when training on large datasets.
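As a rough illustration of the cross-language case, one common way to set up such a transfer is to keep the pre-trained encoder weights and replace only the CTC output layer, since the target language has a different alphabet. The abstract does not describe the authors' exact procedure, so the sketch below is an assumption throughout: the dictionary-based model, the helper names `init_ctc_head` and `transfer_to_new_language`, the toy encoder, and both alphabets are hypothetical and stand in for a real network.

```python
import copy
import random


def init_ctc_head(hidden_dim, alphabet, seed=0):
    """Randomly initialise a linear output layer for CTC training.

    One output row per character in the alphabet, plus one extra row
    for the CTC blank symbol.
    """
    rng = random.Random(seed)
    num_labels = len(alphabet) + 1  # +1 for the CTC blank
    return {
        "labels": list(alphabet) + ["<blank>"],
        "weight": [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
            for _ in range(num_labels)
        ],
        "bias": [0.0] * num_labels,
    }


def transfer_to_new_language(pretrained, new_alphabet, seed=0):
    """Build a target-language model from a pre-trained one.

    Assumption: encoder weights are reused unchanged as the starting
    point, while the output layer is re-initialised because its size
    depends on the target alphabet. Fine-tuning would then update all
    weights on target-language data.
    """
    hidden_dim = pretrained["encoder"]["hidden_dim"]
    return {
        "encoder": copy.deepcopy(pretrained["encoder"]),
        "ctc_head": init_ctc_head(hidden_dim, new_alphabet, seed),
    }


# Toy stand-in for a pre-trained English model (hypothetical values).
ENGLISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz' "
GERMAN_ALPHABET = ENGLISH_ALPHABET + "äöüß"

english = {
    "encoder": {
        "hidden_dim": 4,
        "layers": [[0.1 * (i + j) for j in range(4)] for i in range(3)],
    },
    "ctc_head": init_ctc_head(4, ENGLISH_ALPHABET),
}

german = transfer_to_new_language(english, GERMAN_ALPHABET, seed=1)
```

For accent or domain adaptation, where the alphabet does not change, the output layer could presumably be kept as well, so that fine-tuning starts from the complete pre-trained model.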

Keywords:

Computer science; Transfer of learning; Speech recognition; Artificial intelligence; Language model; Set (abstract data type); Adaptation (eye); German; End-to-end principle; Mandarin Chinese; Domain adaptation; Transfer (computing); Training set; Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 1.69

Citation Normalized Percentile: 0.86

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Real-Time End-to-End Speech Emotion Recognition with Cross-Domain Adaptation

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition

Incremental Learning for End-to-End Automatic Speech Recognition

Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition