JOURNAL ARTICLE

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

Gene-Ping Yang, Hao Tang

Year: 2022
Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages: 7222-7226

Abstract

The attention mechanism in sequence-to-sequence models is designed to model the alignments between acoustic features and output tokens in speech recognition. However, attention weights produced by models trained end to end do not always correspond well with actual alignments, and several studies have further argued that attention weights might not even correspond well with the relevance attribution of frames. Regardless, visual similarity between attention weights and alignments is widely used during training as an indicator of the model's quality. In this paper, we treat the correspondence between attention weights and alignments as a learning problem by imposing a supervised attention loss. Experiments show significantly improved performance, suggesting that learning the alignments well during training critically determines the performance of sequence-to-sequence models.
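
The abstract does not spell out the form of the supervised attention loss, so the following is only a minimal sketch of one plausible formulation: a cross-entropy between each output token's attention distribution over frames and a reference alignment. The function names, the uniform-over-segment targets, the toy segment boundaries, and the weighting factor `lam` mentioned afterwards are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def alignment_targets(segments, num_frames):
    """Build reference alignments from (start, end) frame spans (end exclusive).

    Assumption: each token's target is uniform over the frames it is aligned to.
    """
    targets = np.zeros((len(segments), num_frames))
    for i, (start, end) in enumerate(segments):
        targets[i, start:end] = 1.0 / (end - start)
    return targets


def supervised_attention_loss(attn, align, eps=1e-8):
    """Mean cross-entropy between attention weights and reference alignments.

    attn:  (num_tokens, num_frames), each row a distribution over frames.
    align: (num_tokens, num_frames), each row a reference distribution.
    """
    attn = np.asarray(attn, dtype=float)
    align = np.asarray(align, dtype=float)
    per_token = -(align * np.log(attn + eps)).sum(axis=1)
    return float(per_token.mean())


# Toy example: 3 output tokens aligned to 6 acoustic frames (hypothetical spans).
segments = [(0, 2), (2, 5), (5, 6)]
targets = alignment_targets(segments, num_frames=6)

# Attention that ignores the alignment versus attention that matches it.
uniform_attn = np.full((3, 6), 1.0 / 6)
loss_uniform = supervised_attention_loss(uniform_attn, targets)
loss_matched = supervised_attention_loss(targets, targets)
```

In a training setup of this kind, such a term would typically be added to the usual sequence-to-sequence objective with a weight, for example `total = seq2seq_loss + lam * attention_loss`, where `lam` is a hypothetical hyperparameter. Attention that follows the reference alignment yields a lower value than attention spread evenly over all frames.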

Keywords:
Sequence (biology), Computer science, Similarity (geometry), Relevance (law), Speech recognition, Artificial intelligence, Sequence learning, Pattern recognition (psychology), Natural language processing, Machine learning, Image (mathematics)

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.59
Refs: 41
Citation Normalized Percentile: 0.62

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing