JOURNAL ARTICLE

Improving Attention-Based End-to-End Speech Recognition by Monotonic Alignment Attention Matrix Reconstruction

Abstract

In automatic speech recognition (ASR) tasks, the output sequence should correspond to a linear transcription of the input sequence. Much work has been done on learning monotonic alignment in end-to-end (E2E) ASR models, but these methods mainly target streaming and usually degrade ASR performance. In contrast, some studies have shown that monotonic alignment benefits non-streaming attention-based models. Motivated by this, we propose the enhanced Gaussian Monotonic Alignment (e-GMA), which reduces the difficulty of learning monotonic alignment; the reconstructed attention matrix improves accuracy on ASR tasks. Experiments on the LibriSpeech dataset demonstrate the effectiveness of the proposed approach. Compared with a strong baseline obtained from WeNet, the proposed model yields a 12.2% relative word error rate (WER) reduction on the test-clean benchmark and 9.9% on test-other.
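
The relative WER reductions quoted in the abstract follow the usual convention: the drop in WER divided by the baseline WER. The sketch below shows only that arithmetic. The function name `relative_wer_reduction` and the example WER values are hypothetical, chosen for illustration; the abstract does not state the absolute WERs of either system.

```python
def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative WER reduction of a proposed system, in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WERs (in percent), used only to illustrate the calculation;
# they are NOT results reported in the paper.
baseline = 5.0
proposed = 4.5
print(f"{relative_wer_reduction(baseline, proposed):.1f}% relative reduction")
# prints "10.0% relative reduction"
```

Under this convention, a 12.2% relative reduction means the proposed model's WER is 87.8% of the baseline's on the same test set, whatever the absolute values are.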

Keywords:
Monotonic function; Benchmark (surveying); Computer science; Gaussian; Speech recognition; Task (project management); Sequence (biology); Focus (optics); End-to-end principle; Artificial intelligence; Algorithm; Pattern recognition (psychology); Mathematics; Engineering

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.64
Refs: 23
Citation Normalized Percentile: 0.63

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Explicit Alignment of Text and Speech Encodings for Attention-Based End-to-End Speech Recognition

Jennifer Drexler, James Glass

Journal: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Year: 2019 Pages: 913-919

JOURNAL ARTICLE

Character-Aware Attention-Based End-to-End Speech Recognition

Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

Journal: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Year: 2019 Vol: abs 1612 2695 Pages: 949-955