Weak-Attention Suppression for Transformer Based Speech Recognition

Yangyang Shi; Yongqiang Wang; Chunyang Wu; Christian Fuegen; Frank Zhang; Duc Le; Ching-Feng Yeh; Michael L. Seltzer

doi:10.21437/interspeech.2020-1363

JOURNAL ARTICLE

Year: 2020 Pages: 4996-5000

Abstract

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR). However, unlike text units, adjacent acoustic units (i.e., frames) are highly correlated, and long-distance dependencies between them are weak. This suggests that ASR will likely benefit from sparse and localized attention. In this paper, we propose Weak-Attention Suppression (WAS), a method that dynamically induces sparsity in attention probabilities. We demonstrate that WAS leads to consistent Word Error Rate (WER) improvement over strong transformer baselines. On the widely used LibriSpeech benchmark, our proposed method reduced WER by 10% on test-clean and 5% on test-other for streamable transformers, resulting in a new state of the art among streaming models. Further analysis shows that WAS learns to suppress attention to non-critical and redundant continuous acoustic frames, and is more likely to suppress past frames than future ones. This indicates the importance of lookahead in attention-based ASR models.
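To make "dynamically induces sparsity in attention probabilities" concrete, here is a minimal sketch for a single query's attention row. The abstract does not state the suppression rule, so the threshold used here (mean of the probabilities minus `gamma` times their standard deviation) is an assumption, as are the function names and the `gamma` parameter. Treat it as an illustration of per-query thresholding, not as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def suppress_weak_attention(scores, gamma=0.5):
    """Sparsify one query's attention row (assumed rule, see the note above).

    scores: raw attention logits for one query over its context frames.
    gamma:  hypothetical non-negative knob; larger values lower the
            threshold, so fewer frames are suppressed.
    """
    probs = softmax(scores)
    n = len(probs)
    mean = sum(probs) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in probs) / n)

    # Assumed dynamic threshold: it depends on this row's own statistics,
    # so the number of suppressed frames varies from query to query.
    threshold = mean - gamma * std

    # Zero out weak probabilities, then renormalize the survivors.
    # For gamma >= 0 the largest probability always survives, so total > 0.
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]
```

For example, with logits `[4.0, 3.5, 0.0, -1.0, 0.2]` and `gamma=0.5`, only the first two frames keep non-zero weight, and the returned row still sums to 1. Renormalizing the surviving probabilities is equivalent to re-applying softmax with the suppressed logits masked out.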

Keywords: Transformer; Computer science; Speech recognition; Word error rate; Language model; Benchmark (surveying); Artificial intelligence; Natural language processing; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 2.20

Citation Normalized Percentile: 0.89

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Transformer-Based Multi-Head Attention for Noisy Speech Recognition

A window attention based Transformer for Automatic Speech Recognition

Transformer Based End-to-End Speech Recognition with Linear Attention

Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

Speech emotion recognition based on crossmodal transformer and attention weight correction