Liang Xu, HongRui Song, R.C. Wang, Tian Lan, Zhongfeng Wang
Vision Transformer (ViT) models have significantly advanced performance on computer vision tasks. However, deploying ViTs in resource-constrained environments remains challenging because attention computation is a major bottleneck that demands substantial memory and compute. To address this challenge, we introduce TAFP-ViT, a hardware-software co-design framework tailored to Vision Transformers. At the software level, TAFP-ViT uses a learnable compressor to perform multi-head shared compression on feature maps, and fuses decompression reconstruction, QKV generation, and QKV processing into a single computation, thereby greatly reducing memory and computation requirements. TAFP-ViT also combines dynamic inter-layer token pruning, which eliminates unimportant tokens, with hardware-friendly intra-block row pruning, which removes redundant computations. The resulting software design converts the calculations before and after SoftMax into dense and sparse triple matrix multiplication (TMM) forms, respectively. At the hardware level, TAFP-ViT proposes a configurable systolic array (SA) that adapts efficiently to the QKV fusion computation pattern. The SA's flexible processing elements (PEs) support general matrix multiplication (GEMM) as well as dense and sparse TMM. The TMM formulation and flexible dataflows let TAFP-ViT avoid handling transpositions and storing intermediate computation results, greatly improving computational efficiency. In addition, TAFP-ViT includes a Top-k engine that supports on-the-fly dynamic pruning with high throughput and low resource consumption. Experiments show that TAFP-ViT achieves speedups of 123.91×, 29.5×, and 3.01× to 20.65× over conventional CPUs, GPUs, and previous state-of-the-art works, respectively. TAFP-ViT also reaches a throughput of up to 731.5 GOPS and an energy efficiency of 77.9 GOPS/W.
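The abstract does not give the exact TMM formulation, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses the standard identity Q·Kᵀ = X·(W_q·W_kᵀ)·Xᵀ to show how pre-SoftMax attention scores can be written as a triple matrix product that never materializes Q or K, and then applies a simple top-k token selection. The function names, the matrix sizes, and the column-sum importance score are all hypothetical; the learnable compressor, the row pruning, and the hardware dataflow are not modeled.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def scores_unfused(x, w_q, w_k):
    """Baseline: materialize Q and K, transpose K, then multiply."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    return matmul(q, transpose(k))

def scores_fused(x, w_qk):
    """Triple-product form X @ W_qk @ X^T, with W_qk = W_q @ W_k^T precomputed.

    Q and K are never stored. The transpose of X is still formed explicitly
    here; a hardware dataflow could read X in a different order instead.
    """
    return matmul(matmul(x, w_qk), transpose(x))

def top_k_tokens(importance, k):
    """Indices of the k highest-scoring tokens, in original token order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])

# Illustrative sizes only (assumptions, not values from the paper).
rng = random.Random(0)
n_tokens, d_model, d_head = 6, 8, 4
x = rand_matrix(n_tokens, d_model, rng)
w_q = rand_matrix(d_model, d_head, rng)
w_k = rand_matrix(d_model, d_head, rng)
w_qk = matmul(w_q, transpose(w_k))  # computed once, offline

unfused = scores_unfused(x, w_q, w_k)
fused = scores_fused(x, w_qk)
max_diff = max(abs(a - b) for ru, rf in zip(unfused, fused) for a, b in zip(ru, rf))

# Hypothetical importance score: column sums of the raw score matrix.
importance = [sum(col) for col in zip(*fused)]
kept = top_k_tokens(importance, k=4)
```

Both score functions agree up to floating-point rounding, which is what makes the fused form a drop-in replacement in this sketch. Whether the fusion saves work depends on the dimensions, since `w_qk` is `d_model × d_model` rather than `d_model × d_head`.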
Seungju Lee, Kyu-Min Cho, Eunji Kwon, Se Jin Park, Seojeong Kim, Seokhyeong Kang
Han Cho, Joongho Jo, Seung-Eon Hwang, Jongsun Park
Yuchao Yang, Xiaofang Hu, Yue Zhou
Yujin Cho, A. R. Lee, Byung-Gyu Kim, Jan Plato
Hongxu Yin, Arash Vahdat, José M. Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov