CONFERENCE PAPER

Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices

Leila Ben Letaifa, Jean-Luc Rouas

Year: 2022
Venue: 2022 30th European Signal Processing Conference (EUSIPCO)
Pages: 439-443

Abstract

Transformer-based models have achieved state-of-the-art performance in various areas of machine learning, including automatic speech recognition. However, their cost in terms of computational power, memory or energy consumption can be exorbitant, hence the interest in compression techniques. Transformer models are mostly composed of attention and feedforward components. In this paper, we propose to reduce the size of a transformer model in an end-to-end speech recognition system by decreasing the number and precision of linear layer parameters. Specifically, we investigate the impact of weight pruning on system performance. We then consider model quantization. To further reduce the model size, we address the combination of pruning and quantization methods. Experiments carried out on several speech datasets from different languages show that the memory footprint can be reduced by up to 84% with an insignificant loss of accuracy.
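As a rough illustration of the recipe the abstract describes, the sketch below applies magnitude pruning followed by post-training dynamic quantization to the linear layers of a toy transformer encoder. It is a minimal sketch assuming PyTorch; the model dimensions, the 50% pruning amount, and the model_size_mb helper are illustrative stand-ins, not the authors' actual architecture or settings.

import io

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in encoder; the paper targets the linear layers of an
# end-to-end speech recognition transformer (dimensions here are arbitrary).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, dim_feedforward=1024),
    num_layers=6,
)

# Step 1: unstructured magnitude (L1) pruning of each linear layer's weights.
# The 50% sparsity level is an illustrative choice, not the paper's setting.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeroed weights in

# Step 2: post-training dynamic quantization of the feedforward linear
# layers to 8-bit integer weights, cutting their storage roughly 4x.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# model_size_mb is a hypothetical helper comparing serialized sizes.
# Note: unstructured pruning alone zeroes weights but keeps dense storage;
# its footprint gain only materializes once zeros are stored or compressed
# sparsely, which is why combining it with quantization pays off.
def model_size_mb(m: nn.Module) -> float:
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32 model: {model_size_mb(model):.1f} MB")
print(f"int8 model: {model_size_mb(quantized):.1f} MB")

The sketch only mirrors the shape of the pipeline; the up to 84% footprint reduction reported in the abstract comes from the authors' measured combination of the two methods on their speech datasets, not from this toy example.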

Keywords:
Computer science, Transformer, Quantization (signal processing), End-to-end principle, Speech recognition, Memory footprint, Speech coding, Artificial intelligence, Algorithm, Engineering

Metrics

Cited by: 5
FWCI (Field-Weighted Citation Impact): 0.59
References: 31
Citation Normalized Percentile: 0.65

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)