TAFP-ViT: A Transformer Accelerator via QKV Computational Fusion and Adaptive Pruning for Vision Transformer

Liang Xu; HongRui Song; R.C. Wang; Tian Lan; Zhongfeng Wang

doi:10.1145/3745028

JOURNAL ARTICLE

Year: 2025
Journal: ACM Transactions on Embedded Computing Systems, Vol. 24 (5), pp. 1-21
Publisher: Association for Computing Machinery

Abstract

The remarkable progress of Vision Transformer (ViT) models has significantly advanced performance in computer vision tasks. However, deploying ViTs in resource-constrained environments remains challenging, because the attention computation in these models is a major bottleneck that requires substantial memory and computational resources. To address this challenge, we introduce TAFP-ViT, a hardware-software co-design framework tailored to Vision Transformers. At the software level, TAFP-ViT uses a learnable compressor to perform multi-head shared compression of feature maps, and fuses decompression (reconstruction), QKV generation, and QKV processing into a single computation, greatly reducing memory and computation requirements. Furthermore, TAFP-ViT combines dynamic inter-layer token pruning, which eliminates unimportant tokens, with hardware-friendly intra-block row pruning, which reduces redundant computation. The proposed software design converts the computations before and after SoftMax into dense and sparse triple matrix multiplication (TMM) forms, respectively. At the hardware level, TAFP-ViT proposes a configurable systolic array (SA) that efficiently adapts to the QKV fusion computation pattern. The SA has flexible processing element (PE) units that support general matrix multiplication (GEMM) as well as dense and sparse TMM. The TMM formulation and flexible dataflows allow TAFP-ViT to avoid explicit transpositions and the storage of intermediate results, greatly improving computational efficiency. In addition, TAFP-ViT introduces a Top-k engine that supports dynamic pruning on the fly with high throughput and low resource consumption. Experiments show that TAFP-ViT achieves speedups of 123.91×, 29.5×, and 3.01× to 20.65× over conventional CPUs, GPUs, and previous state-of-the-art works, respectively. TAFP-ViT also reaches a throughput of up to 731.5 GOP/s and an energy efficiency of 77.9 GOPS/W.
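
Why does QKV fusion turn the pre-SoftMax step into a triple matrix multiplication? The abstract does not give the exact formulation, so the following is a minimal single-head sketch under stated assumptions. It assumes the compressed features are a matrix Z (N × d_c) and that decompression is a linear map R (d_c × D). Then QK^T = (Z R W_Q)(Z R W_K)^T = Z (R W_Q W_K^T R^T) Z^T, a dense product Z M Z^T with a small d_c × d_c matrix M. After SoftMax, AV = A Z (R W_V), a second triple product whose left operand A would become sparse once pruning is applied. The names Z, R, M, `fuse_weights`, and `fused_attention` are hypothetical and used only for illustration. The paper's actual compressor, scaling, and per-head layout may differ. The check confirms that the fused path reproduces ordinary attention without materializing the reconstructed features, Q, K, or V.

```python
import math
import random

# Assumption (not from the paper): Z = compressed features (N x d_c) are given,
# and decompression is a linear map R (d_c x D). Single head only.


def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def scaled(a, s):
    return [[s * v for v in row] for row in a]


def softmax_rows(s):
    """Numerically stable row-wise SoftMax."""
    out = []
    for row in s:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def reference_attention(z, r, wq, wk, wv, scale):
    """Unfused path: reconstruct X_hat, project to Q/K/V, then attend."""
    x_hat = matmul(z, r)  # N x D reconstruction of the compressed features
    q, k, v = matmul(x_hat, wq), matmul(x_hat, wk), matmul(x_hat, wv)
    attn = softmax_rows(scaled(matmul(q, transpose(k)), scale))
    return matmul(attn, v)


def fuse_weights(r, wq, wk, wv):
    """Hypothetical offline folding: M = R Wq Wk^T R^T and Wv' = R Wv."""
    m = matmul(matmul(r, wq), transpose(matmul(r, wk)))  # d_c x d_c
    wv_fused = matmul(r, wv)                              # d_c x d_h
    return m, wv_fused


def fused_attention(z, m, wv_fused, scale):
    """Fused path: X_hat, Q, K and V are never materialized."""
    # Before SoftMax: dense triple product Z M Z^T.
    scores = matmul(matmul(z, m), transpose(z))
    attn = softmax_rows(scaled(scores, scale))
    # After SoftMax: triple product A Z Wv'; with pruning, A would be sparse.
    return matmul(matmul(attn, z), wv_fused)


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, d_c, d_h = 6, 16, 4, 8  # tokens, embed dim, compressed dim, head dim
    z = rand_matrix(n, d_c, rng)
    r = rand_matrix(d_c, d, rng)
    wq, wk, wv = (rand_matrix(d, d_h, rng) for _ in range(3))
    scale = 1.0 / math.sqrt(d_h)
    ref = reference_attention(z, r, wq, wk, wv, scale)
    fused = fused_attention(z, *fuse_weights(r, wq, wk, wv), scale)
    err = max(abs(a - b) for ra, rb in zip(ref, fused) for a, b in zip(ra, rb))
    print(f"max |reference - fused| = {err:.2e}")
```

In this sketch, M and W_V' depend only on weights, so they can be computed once offline. Under the multi-head shared-compression assumption, every head would reuse the same Z, and only its own M and W_V' would differ.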

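The abstract states that dynamic inter-layer token pruning keeps only important tokens, with selection performed by a hardware Top-k engine. It does not say how importance is scored or how k is set per layer. The sketch below assumes a common choice from the token-pruning literature: the CLS token's attention to each patch, averaged over heads. It uses Python's `heapq.nlargest` as a software stand-in for the Top-k engine. The function `prune_tokens` and its arguments are hypothetical.

```python
import heapq


def prune_tokens(tokens, cls_attention, k):
    """Keep the CLS token plus the k most important patch tokens.

    tokens:        list of token vectors; index 0 is assumed to be CLS.
    cls_attention: one list per head of CLS-to-token attention weights,
                   each with len(tokens) entries (hypothetical score source).
    k:             number of patch tokens to keep (the "k" of Top-k).
    Returns the pruned token list and the original indices that survived.
    """
    n = len(tokens)
    if any(len(head) != n for head in cls_attention):
        raise ValueError("each head needs one score per token")
    num_heads = len(cls_attention)
    # Importance of patch token i (1..n-1): CLS attention averaged over heads.
    importance = {i: sum(head[i] for head in cls_attention) / num_heads
                  for i in range(1, n)}
    # Top-k selection; like sorted(..., reverse=True), ties keep the lower index.
    top = heapq.nlargest(k, range(1, n), key=importance.__getitem__)
    kept = [0] + sorted(top)  # keep CLS first, restore original token order
    return [tokens[i] for i in kept], kept


if __name__ == "__main__":
    tokens = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # CLS + 4 patch tokens
    heads = [[0.0, 0.1, 0.4, 0.2, 0.3],
             [0.0, 0.3, 0.2, 0.1, 0.4]]
    pruned, kept = prune_tokens(tokens, heads, k=2)
    print(kept)  # [0, 2, 4]
```

The sketch re-sorts the surviving indices so that later layers see tokens in their original order. A hardware Top-k unit would perform the same selection in a streaming fashion rather than with a heap.
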
Keywords:

Computer science, Computation, Software, Parallel computing, Matrix multiplication, Bottleneck, Computer engineering, Computer hardware, Embedded system, Algorithm
