JOURNAL ARTICLE

Multichannel End-to-end Speech Recognition

Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey

Year: 2017 · Journal: arXiv (Cornell University) · Pages: 2632-2641 · Publisher: Cornell University

Abstract

The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.
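The beamforming the abstract refers to can be illustrated with the simplest fixed variant, delay-and-sum: each microphone channel is time-aligned toward the target speaker and the channels are averaged, which reinforces the speech and attenuates noise arriving from other directions. The paper's contribution is to replace such fixed processing with a neural beamformer trained jointly with the recognizer; the sketch below is only the classical baseline idea, and the function name and integer sample delays are illustrative assumptions, not from the paper.

```python
def delay_and_sum(channels, delays):
    """Minimal delay-and-sum beamformer sketch (not the paper's method).

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-channel integer lags (in samples); channel i is assumed
              to receive the target signal delays[i] samples late, so we
              advance it by that amount before averaging.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc, count = 0.0, 0
        for ch, d in zip(channels, delays):
            if 0 <= t + d < n:  # skip samples shifted past the edges
                acc += ch[t + d]
                count += 1
        out.append(acc / count if count else 0.0)
    return out
```

With two channels where the second is a one-sample-delayed copy of the first, compensating that lag recovers the aligned signal; a learned beamformer would instead estimate such filters from the data and optimize them for the recognition objective.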

Keywords:
End-to-end principle · Computer science · Speech recognition · Hidden Markov model · Speech enhancement · Speech processing · Beamforming · Microphone · Adaptive beamformer · Acoustic model · Artificial intelligence · Noise reduction · Telecommunications

Metrics

Cited By: 46
FWCI (Field Weighted Citation Impact): 0.00
References: 31

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing