Lahiru Samarakoon, Ivan W. H. Fung
Self-attention has become a vital component of end-to-end (E2E) automatic speech recognition (ASR). The convolution-augmented Transformer (Conformer) with relative positional encoding (RPE) has achieved state-of-the-art performance. This paper proposes a positional encoding (PE) mechanism called Scaled Untied RPE that unties the feature-position correlations in the self-attention computation and computes feature correlations and positional correlations separately, using different projection matrices. In addition, we propose to scale the feature correlations by the positional correlations; the aggressiveness of this multiplicative interaction is configured by a parameter called the amplitude. Moreover, we show that the PE matrix can be sliced to reduce the number of model parameters. Our results on the National Speech Corpus (NSC) show that Transformer encoders with Scaled Untied RPE achieve relative improvements of 1.9% in accuracy and up to 50.9% in latency over a Conformer baseline.
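To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of a single-head self-attention score in the spirit of Scaled Untied RPE: feature (content) correlations and positional correlations are computed with separate projection matrices, the feature scores are scaled multiplicatively by the positional scores through an amplitude parameter, and the relative PE table is sliced to a maximum relative distance. The module name, the tanh-gated form of the multiplicative interaction, and the clip-based slicing are illustrative assumptions, not the paper's exact formulation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledUntiedRPEAttention(nn.Module):
    # Illustrative single-head self-attention with a Scaled-Untied-RPE-style score.
    def __init__(self, d_model: int, max_rel_dist: int = 64, amplitude: float = 1.0):
        super().__init__()
        self.amplitude = amplitude          # controls aggressiveness of the multiplicative interaction
        self.max_rel_dist = max_rel_dist    # PE matrix is "sliced" to 2*max_rel_dist+1 rows
        # Feature (content) projections.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        # Separate projections for positional correlations (untied from the content projections).
        self.u_q = nn.Linear(d_model, d_model, bias=False)
        self.u_k = nn.Linear(d_model, d_model, bias=False)
        # Sliced relative positional encoding table.
        self.rel_pe = nn.Parameter(torch.randn(2 * max_rel_dist + 1, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        b, t, d = x.shape
        scale = 1.0 / math.sqrt(d)

        # Feature correlations from the content projections.
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        content_scores = torch.matmul(q, k.transpose(-2, -1)) * scale          # (b, t, t)

        # Relative distances clipped to the sliced PE table, then embedded.
        pos = torch.arange(t, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        pe = self.rel_pe[rel + self.max_rel_dist]                              # (t, t, d)

        # Positional correlations from their own projection matrices.
        pos_scores = torch.einsum("ijd,ijd->ij", self.u_q(pe), self.u_k(pe)) * scale  # (t, t)

        # Scale the feature correlations by the positional correlations;
        # the amplitude sets how strongly positions modulate the content scores (assumed form).
        scores = content_scores * (1.0 + self.amplitude * torch.tanh(pos_scores))

        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)

Under these assumptions, setting amplitude to 0 falls back to content-only attention, while larger values let the positional correlations modulate the feature correlations more aggressively; shrinking max_rel_dist slices the PE table and reduces its parameter count.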