JOURNAL ARTICLE

Explicit Alignment of Text and Speech Encodings for Attention-Based End-to-End Speech Recognition

Jennifer Drexler, James Glass

Year: 2019 Journal: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Pages: 913-919

Abstract

In this work, we present a novel training procedure for attention-based end-to-end automatic speech recognition. Our goal is to push the encoder network to output only linguistic information, improving generalization performance particularly in low-resource scenarios. We accomplish this with the addition of a text encoder network, which the speech encoder is encouraged to mimic. Our main innovation is the comparison of the attention-weighted speech encoder outputs to the outputs of the text encoder - this guarantees two sequences of the same length that can be directly aligned. We show that our training procedure significantly decreases word error rates in all experiments and has the biggest absolute impact in the lowest resource scenarios.
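The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy values, and the choice of mean squared Euclidean distance as the comparison metric are all assumptions made for illustration. The point it shows is that applying the attention weights (one row per output token) to the speech encoder outputs yields exactly one vector per token, so the result has the same length as the text encoder output and the two can be compared position by position.

```python
from typing import List

Matrix = List[List[float]]


def attention_contexts(attn: Matrix, speech_enc: Matrix) -> Matrix:
    """Return one attention-weighted context vector per output token.

    attn has shape (U, T): one row of weights over T speech frames per token.
    speech_enc has shape (T, d). The result has shape (U, d).
    """
    dim = len(speech_enc[0])
    contexts = []
    for weights in attn:
        if len(weights) != len(speech_enc):
            raise ValueError("each attention row needs one weight per speech frame")
        contexts.append([
            sum(w * frame[k] for w, frame in zip(weights, speech_enc))
            for k in range(dim)
        ])
    return contexts


def alignment_loss(contexts: Matrix, text_enc: Matrix) -> float:
    """Mean squared Euclidean distance between paired vectors.

    The distance metric is an assumption; the abstract does not specify one.
    """
    if len(contexts) != len(text_enc):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for c, e in zip(contexts, text_enc):
        total += sum((ci - ei) ** 2 for ci, ei in zip(c, e))
    return total / len(contexts)


# Toy example (hypothetical values): T = 4 speech frames, U = 2 tokens, d = 2.
speech_enc = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
attn = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
text_enc = [[1.0, 0.0], [0.0, 1.0]]

contexts = attention_contexts(attn, speech_enc)  # shape (U, d), same as text_enc
loss = alignment_loss(contexts, text_enc)
```

In a real system this term would presumably be added to the usual recognition loss and minimized by gradient descent, so that the speech encoder is pushed toward the text encoder's representation; the weighting between the two terms is not stated in the abstract.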

Keywords:
Encoder, Computer science, Generalization, Speech recognition, End-to-end principle, Word (group theory), Resource (disambiguation), Word error rate, Artificial intelligence, Natural language processing, Mathematics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.15
Refs: 22
Citation Normalized Percentile: 0.60

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Character-Aware Attention-Based End-to-End Speech Recognition

Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

Journal: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Year: 2019 Vol: abs 1612 2695 Pages: 949-955
JOURNAL ARTICLE

End-to-end Speech-to-Punctuated-Text Recognition

Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto

Journal: Interspeech 2022 Year: 2022 Pages: 1811-1815
JOURNAL ARTICLE

Visual analysis of attention-based end-to-end speech recognition

Seongmin Lim, Jahyun Goo, Hoirin Kim

Journal: Phonetics and Speech Sciences Year: 2019 Vol: 11 (1) Pages: 41-49