CONFERENCE PAPER

Untied Positional Encodings for Efficient Transformer-Based Speech Recognition

Lahiru Samarakoon, Ivan W. H. Fung

Year: 2023 · Venue: 2022 IEEE Spoken Language Technology Workshop (SLT) · Pages: 108-114

Abstract

Self-attention has become a vital component of end-to-end (E2E) automatic speech recognition (ASR). The convolution-augmented Transformer (Conformer) with relative positional encoding (RPE) has achieved state-of-the-art performance. This paper proposes a positional encoding (PE) mechanism called Scaled Untied RPE that unties the feature-position correlations in the self-attention computation and computes feature correlations and positional correlations separately using different projection matrices. In addition, we propose to scale the feature correlations by the positional correlations; the aggressiveness of this multiplicative interaction can be configured with a parameter called amplitude. Moreover, we show that the PE matrix can be sliced to reduce model parameters. Our results on the National Speech Corpus (NSC) show that Transformer encoders with Scaled Untied RPE achieve relative improvements of 1.9% in accuracy and up to 50.9% in latency over a Conformer baseline.
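To make the mechanism concrete, below is a minimal NumPy sketch of untied self-attention with a multiplicative positional scaling. Only the separation of feature and positional projections, the amplitude parameter, and the idea of slicing a larger PE matrix follow the abstract; the function name, the tanh squashing, and the exact form of the interaction are illustrative assumptions, not the paper's formulation.

import numpy as np

def scaled_untied_rpe_attention(x, pos, Wq, Wk, Uq, Uk, amplitude=1.0):
    """Illustrative sketch only; not the paper's exact formulation.

    x:      (T, d) input features
    pos:    (T, d) positional encodings; per the abstract, this could be
            a slice of a larger PE matrix to reduce model parameters
    Wq, Wk: (d, d) projection matrices for feature correlations
    Uq, Uk: (d, d) separate ("untied") projections for positional
            correlations
    amplitude: configures how aggressively positional correlations
            scale the feature correlations
    """
    d = x.shape[-1]
    # Feature correlations, computed with their own projections.
    feat = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)
    # Positional correlations, computed with different ("untied") projections.
    posc = (pos @ Uq) @ (pos @ Uk).T / np.sqrt(d)
    # Multiplicative interaction (assumed form): positional correlations
    # scale the feature correlations; tanh is an illustrative squashing.
    scores = feat * (1.0 + amplitude * np.tanh(posc))
    # Row-wise softmax over keys, numerically stabilized.
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ x

# Example usage with random inputs:
T, d = 6, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))
pos = rng.standard_normal((T, d))
Wq, Wk, Uq, Uk = (rng.standard_normal((d, d)) for _ in range(4))
out = scaled_untied_rpe_attention(x, pos, Wq, Wk, Uq, Uk, amplitude=0.5)
print(out.shape)  # (6, 8)

Note the contrast with additive untied schemes, in which the positional term is summed onto the feature term; the abstract describes a multiplicative interaction whose strength is governed by the amplitude parameter, which the sketch above mimics.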

Keywords:
Computer science; Computation; Transformer; Encoder; Speech recognition; Latency (audio); Convolution (computer science); Artificial intelligence; Pattern recognition (psychology); Feature (linguistics); Correlation; Algorithm; Mathematics; Engineering; Artificial neural network

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.29
Refs: 43
Citation Normalized Percentile: 0.36

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing