Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

Gene-Ping Yang; Hao Tang

doi:10.1109/icassp43922.2022.9746310

Year: 2022
Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages: 7222-7226

Abstract

The attention mechanism in sequence-to-sequence models is designed to model the alignments between acoustic features and output tokens in speech recognition. However, attention weights produced by models trained end to end do not always correspond well with actual alignments, and several studies have further argued that attention weights might not even correspond well with the relevance attribution of frames. Regardless, visual similarity between attention weights and alignments is widely used during training as an indicator of the model's quality. In this paper, we treat the correspondence between attention weights and alignments as a learning problem by imposing a supervised attention loss. Experiments show significantly improved performance, suggesting that learning the alignments well during training critically determines the performance of sequence-to-sequence models.
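
The abstract does not state the exact form of the supervised attention loss, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a cross-entropy between each token's attention weights and a target distribution spread uniformly over that token's aligned frames. The function names, the `(start, end)` segment format, and the `weight` hyperparameter are all hypothetical.

```python
import math


def alignment_targets(segments, num_frames):
    """Build one target distribution over frames per output token.

    segments: list of (start, end) frame indices, end exclusive.
    This segment format is an assumption made for illustration.
    """
    targets = []
    for start, end in segments:
        if not (0 <= start < end <= num_frames):
            raise ValueError(f"invalid segment ({start}, {end})")
        width = end - start
        # Uniform probability mass over the frames aligned to this token.
        row = [1.0 / width if start <= t < end else 0.0 for t in range(num_frames)]
        targets.append(row)
    return targets


def supervised_attention_loss(attention, targets, eps=1e-8):
    """Mean cross-entropy between target alignments and attention weights.

    attention: one list of weights per output token, each summing to 1.
    Cross-entropy is an assumed choice; the abstract does not name the loss.
    """
    if len(attention) != len(targets):
        raise ValueError("attention and targets must cover the same tokens")
    total = 0.0
    for attn_row, target_row in zip(attention, targets):
        total -= sum(
            q * math.log(p + eps)
            for p, q in zip(attn_row, target_row)
            if q > 0.0
        )
    return total / len(targets)


def combined_loss(recognition_loss, attention_loss, weight=1.0):
    """Add the attention term to the usual recognition loss (hypothetical weighting)."""
    return recognition_loss + weight * attention_loss


# Two tokens aligned to frames 0-1 and 2-3 of a four-frame utterance.
targets = alignment_targets([(0, 2), (2, 4)], num_frames=4)
aligned = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
diffuse = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
aligned_loss = supervised_attention_loss(aligned, targets)
diffuse_loss = supervised_attention_loss(diffuse, targets)
```

In this sketch, attention that matches the alignment receives a lower loss than diffuse attention, which is the behavior a supervised attention term is meant to encourage during training.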

Keywords:

Sequence (biology); Computer science; Similarity (geometry); Relevance (law); Speech recognition; Artificial intelligence; Sequence learning; Pattern recognition (psychology); Natural language processing; Machine learning; Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 0.59
Citation Normalized Percentile: 0.62

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition

Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

Sequence-to-Sequence Models in Italian Atypical Speech Recognition

Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition