Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition can be integrated into a deep neural network (DNN)-based system, with promising experimental results on the CHiME-4 and AMI corpora. Separately, the effectiveness of speaker adaptation has been well established for DNN-based hidden Markov model (DNN-HMM) hybrid architectures. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR that combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results on CHiME-4 show that (1) the proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.
Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, John R. Hershey
Yue Gu, Zhihao Du, Shiliang Zhang, Qian Chen, Jiqing Han