JOURNAL ARTICLE

TAFP-ViT: A Transformer Accelerator via QKV Computational Fusion and Adaptive Pruning for Vision Transformer

Liang Xu, HongRui Song, R.C. Wang, Tian Lan, Zhongfeng Wang

Year: 2025 | Journal: ACM Transactions on Embedded Computing Systems | Vol: 24 (5) | Pages: 1-21 | Publisher: Association for Computing Machinery

Abstract

The remarkable progress of Vision Transformer (ViT) models has significantly advanced performance in computer vision tasks. However, deploying ViTs in resource-constrained environments remains challenging, because the attention computation in these models is a major bottleneck that demands substantial memory and computational resources. To address this challenge, we introduce TAFP-ViT, a hardware-software co-design framework tailored to Vision Transformers. At the software level, TAFP-ViT uses a learnable compressor to perform multi-head shared compression of feature maps, and fuses decompression (reconstruction), QKV generation, and QKV processing into a single computation, greatly reducing memory and computation requirements. TAFP-ViT further combines dynamic inter-layer token pruning, which eliminates unimportant tokens, with hardware-friendly intra-block row pruning, which removes redundant computations. Together, these software techniques convert the computations before and after SoftMax into dense and sparse triple matrix multiplication (TMM) forms, respectively. At the hardware level, TAFP-ViT provides a configurable systolic array (SA) that adapts efficiently to the fused QKV computation pattern. Its flexible processing elements (PEs) support general matrix multiplication (GEMM) as well as dense and sparse TMM. The TMM formulation and flexible dataflows let TAFP-ViT avoid explicit transpositions and the storage of intermediate results, substantially improving computational efficiency. In addition, TAFP-ViT includes a novel Top-k engine that supports dynamic pruning on the fly with high throughput and low resource consumption. Experiments show that TAFP-ViT achieves speedups of 123.91× and 29.5× over conventional CPUs and GPUs, respectively, and of 3.01× to 20.65× over previous state-of-the-art works. TAFP-ViT also reaches a throughput of up to 731.5 GOP/s and an energy efficiency of 77.9 GOPS/W.
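The abstract states that decompression, QKV generation and QKV processing are fused, and that the pre-SoftMax computation becomes a triple matrix multiplication (TMM). One way such a fusion can arise is sketched below. It assumes (this is our illustration, not the paper's stated formulation) that a compressed feature map C of shape N × r is shared across heads, that a decompression matrix D of shape r × d reconstructs X ≈ CD, and that each head has projections W_q and W_k. Then QKᵀ = (CDW_q)(CDW_k)ᵀ = C·M·Cᵀ with M = (DW_q)(DW_k)ᵀ, so M can be folded offline per head and the online work is a TMM on the compressed features. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain GEMM: (n x k) @ (k x m) -> (n x m)."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    m = len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def unfused_scores(c: Matrix, d: Matrix, w_q: Matrix, w_k: Matrix) -> Matrix:
    """Reference path: decompress X = C D, project to Q and K, then form Q K^T."""
    x = matmul(c, d)
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    return matmul(q, transpose(k))


def fold_head_kernel(d: Matrix, w_q: Matrix, w_k: Matrix) -> Matrix:
    """Offline step (per head, assumed): M = (D W_q)(D W_k)^T, an r x r matrix."""
    return matmul(matmul(d, w_q), transpose(matmul(d, w_k)))


def fused_scores(c: Matrix, m: Matrix) -> Matrix:
    """Online step: triple matrix multiplication C M C^T on compressed features."""
    return matmul(matmul(c, m), transpose(c))


if __name__ == "__main__":
    c = [[1, 2], [0, 1], [3, -1]]    # N = 3 tokens, r = 2 (shared across heads)
    d = [[1, 0, 2], [0, 1, 1]]       # decompression: r = 2 -> d = 3
    w_q = [[1, 0], [0, 1], [1, 1]]   # one head's query projection: d = 3 -> d_h = 2
    w_k = [[2, 1], [0, 1], [1, 0]]   # the same head's key projection
    m = fold_head_kernel(d, w_q, w_k)
    # Integer inputs make both paths agree exactly.
    assert fused_scores(c, m) == unfused_scores(c, d, w_q, w_k)
```

Under the same assumptions, the post-SoftMax product P·V can be written P·C·(DW_v), which is again a triple product; how TAFP-ViT maps these forms onto its systolic array dataflows is not described in the abstract.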
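The abstract mentions dynamic inter-layer token pruning backed by a Top-k engine but does not give the scoring rule. The sketch below shows the software-level behavior of such a step, assuming (hypothetically) that each token already has an importance score, for example the attention the class token pays to it, which is a common heuristic in token-pruning work. Token 0 is assumed to be the class token and is always kept; the k highest-scoring patch tokens survive in their original order. The functions `topk_token_indices` and `prune_tokens` and the `keep_ratio` parameter are illustrative names, not part of TAFP-ViT.

```python
import heapq
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def topk_token_indices(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of tokens to keep: token 0 (assumed [CLS]) plus the k
    highest-scoring patch tokens, returned in their original order."""
    if len(scores) < 2:
        raise ValueError("need a class token and at least one patch token")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_patches = len(scores) - 1
    k = max(1, round(num_patches * keep_ratio))
    # nlargest behaves like sorted(..., reverse=True)[:k], so ties keep lower indices.
    top = heapq.nlargest(k, range(1, len(scores)), key=lambda i: scores[i])
    return [0] + sorted(top)


def prune_tokens(tokens: Sequence[T], scores: Sequence[float],
                 keep_ratio: float) -> List[T]:
    """Drop the lowest-scoring patch tokens before the next encoder layer."""
    return [tokens[i] for i in topk_token_indices(scores, keep_ratio)]


if __name__ == "__main__":
    # Hypothetical importance scores for [CLS] and five patch tokens.
    scores = [1.0, 0.1, 0.7, 0.05, 0.4, 0.9]
    print(prune_tokens(["cls", "p1", "p2", "p3", "p4", "p5"], scores, keep_ratio=0.6))
    # ['cls', 'p2', 'p4', 'p5']
```

Keeping the survivors in their original order keeps positions aligned for later layers. A hardware Top-k engine would likely avoid a full sort in favor of a streaming or partial-selection circuit; the abstract does not describe its microarchitecture.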

Keywords:
Computer science, Computation, Software, Parallel computing, Matrix multiplication, Bottleneck, Computer engineering, Computer hardware, Embedded system, Algorithm

Metrics

Cited by: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 28
Citation Normalized Percentile: 0.19

Topics

CCD and CMOS Imaging Sensors
Physical Sciences → Engineering → Electrical and Electronic Engineering
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition