The Devil is in Details: Delving Into Lite FFN Design for Vision Transformers

Zhiyang Chen; Yousong Zhu; Zhaowen Li; Fan Yang; Chaoyang Zhao; Jinqiao Wang; Ming Tang

doi:10.1109/icassp48485.2024.10447756

JOURNAL ARTICLE

Year: 2024 Pages: 4130-4134

Abstract

Transformers have demonstrated exceptional performance on a variety of vision tasks, but their high computational complexity can become problematic. In this paper, we conduct a systematic analysis of the complexity of each component in vision transformers and identify an easily overlooked detail: the Feed-Forward Network (FFN), rather than the Multi-Head Self-Attention (MHSA) mechanism, is the primary computational bottleneck. Inspired by this, we propose a lightweight FFN module, named SparseFFN, that reduces dense computation in both the channel and spatial dimensions. Specifically, SparseFFN consists of two components, Channel-Sparse FFN (CS-FFN) and Spatial-Sparse FFN (SS-FFN), which can be seamlessly incorporated into various vision transformers and even pure MLP models with significantly fewer FLOPs. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method. For example, our approach reduces model complexity by 23%-39% for most vision transformers and MLP models while maintaining comparable accuracy.
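The claim that the FFN outweighs MHSA can be checked with a back-of-the-envelope count. The sketch below is not the paper's analysis. It assumes a standard transformer block with an FFN expansion ratio of 4, counts only multiply-accumulates (MACs) in the matrix products, and ignores softmax, normalization, and biases. The function names and the ViT-B/16-like setting (196 tokens, width 768) are illustrative choices.

```python
def mhsa_macs(num_tokens: int, dim: int) -> int:
    """Approximate MACs for one standard multi-head self-attention layer."""
    # Q, K, V and output projections: four (dim x dim) matmuls per token.
    projections = 4 * num_tokens * dim * dim
    # Q @ K^T and attention @ V: two (tokens x tokens x dim) matmuls,
    # summed over heads (the head count cancels out).
    attention = 2 * num_tokens * num_tokens * dim
    return projections + attention


def ffn_macs(num_tokens: int, dim: int, expansion: int = 4) -> int:
    """Approximate MACs for a two-layer FFN (expansion ratio is an assumption)."""
    hidden = expansion * dim
    # dim -> hidden, then hidden -> dim.
    return 2 * num_tokens * dim * hidden


if __name__ == "__main__":
    n, c = 196, 768  # assumed ViT-B/16-like block at 224x224 input
    mhsa, ffn = mhsa_macs(n, c), ffn_macs(n, c)
    print(f"MHSA MACs: {mhsa:,}")
    print(f"FFN  MACs: {ffn:,}")
    print(f"FFN share of block: {ffn / (mhsa + ffn):.1%}")
```

Under these assumptions the FFN costs 8NC² MACs against 4NC² + 2N²C for MHSA, so the FFN dominates whenever the token count N is below 2C. That holds for typical classification settings; at high resolutions with global attention, the quadratic term can reverse the ordering.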

Keywords:

Bottleneck; Transformer; Computer science; Computation; Computational complexity theory; Artificial intelligence; FLOPS; Machine learning; Computer engineering; Pattern recognition (psychology); Algorithm; Engineering; Parallel computing; Embedded system; Voltage; Electrical engineering

Metrics

FWCI (Field Weighted Citation Impact): 1.11

Citation Normalized Percentile: 0.69

Topics

CCD and CMOS Imaging Sensors

Physical Sciences → Engineering → Electrical and Electronic Engineering

Visual Attention and Saliency Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Memory and Neural Computing

Physical Sciences → Engineering → Electrical and Electronic Engineering
