Speaker Adaptation for Multichannel End-to-End Speech Recognition

Tsubasa Ochiai; Shinji Watanabe; Shigeru Katagiri; Takaaki Hori; John R. Hershey

doi:10.1109/icassp.2018.8462161

Year: 2018 Pages: 6707-6711

Abstract

Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.
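The multi-path idea in the abstract, feeding the recognizer both an enhanced signal and the unprocessed noisy features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the two paths are combined, so frame-wise concatenation is used here, and a plain channel average stands in for the neural beamformer. The names `enhance` and `multi_path_input` are hypothetical.

```python
from typing import List

Frame = List[float]       # one feature vector (e.g. filterbank bins) at one time step
Utterance = List[Frame]   # frames over time for a single microphone channel


def enhance(channels: List[Utterance]) -> Utterance:
    """Placeholder enhancement path: average the channels frame by frame.

    Assumption: this stands in for the neural beamformer, which it is not.
    """
    n_ch = len(channels)
    return [
        [sum(vals) / n_ch for vals in zip(*frames)]  # average each bin across channels
        for frames in zip(*channels)                 # iterate over time steps
    ]


def multi_path_input(channels: List[Utterance], ref_channel: int = 0) -> Utterance:
    """Combine the enhanced path with the unprocessed features of one channel.

    Assumption: the two paths are joined by concatenating their feature
    vectors at every time step; the source does not specify the operator.
    """
    enhanced = enhance(channels)
    noisy = channels[ref_channel]
    return [e + n for e, n in zip(enhanced, noisy)]  # list concatenation per frame


# Two channels, two frames, two feature bins each.
channels = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
combined = multi_path_input(channels)
```

In this toy setup each combined frame has twice the original feature dimension, and an encoder consuming it would see both the enhanced and the noisy view of the same time step.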

Keywords:

End-to-end principle; Speech recognition; Computer science; Hidden Markov model; Adaptation (eye); Artificial neural network; Speaker recognition; Encoder; Path (computing); Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 6.14

Citation Normalized Percentile: 0.97

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Speaker Adaptation for Attention-Based End-to-End Speech Recognition

Multichannel End-to-end Speech Recognition

End-to-End Multi-Speaker Speech Recognition

Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition

Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments