JOURNAL ARTICLE

The Devil is in Details: Delving Into Lite FFN Design for Vision Transformers

Abstract

Transformers have demonstrated exceptional performance on a variety of vision tasks. However, their high computational complexity can become problematic. In this paper, we conduct a systematic analysis of the complexity of each component in vision transformers and identify an easily overlooked detail: the Feed-Forward Network (FFN) is the primary computational bottleneck, even more so than the Multi-Head Self-Attention (MHSA) mechanism. Motivated by this finding, we propose a lightweight FFN module, named SparseFFN, that reduces dense computation in both the channel and spatial dimensions. Specifically, SparseFFN consists of two components, Channel-Sparse FFN (CS-FFN) and Spatial-Sparse FFN (SS-FFN), which can be seamlessly incorporated into various vision transformers and even pure MLP models while requiring significantly fewer FLOPs. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method. For example, our approach reduces model complexity by 23%-39% for most vision transformers and MLP models while maintaining comparable accuracy.
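The claim that the FFN outweighs MHSA can be checked with a back-of-the-envelope count. The sketch below is not taken from the paper: it uses the commonly cited per-layer multiply-accumulate estimates for a standard transformer block (biases, softmax and normalization ignored), and the function names, the expansion ratio of 4, and the ViT-Base-like sizes (197 tokens, width 768) are illustrative assumptions.

```python
def mhsa_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one standard MHSA layer.

    Assumption: biases, softmax and scaling are ignored.
    """
    projections = 4 * num_tokens * dim * dim        # Q, K, V and output projections
    attention = 2 * num_tokens * num_tokens * dim   # Q @ K^T plus the weighted sum over V
    return projections + attention


def ffn_macs(num_tokens: int, dim: int, expansion: int = 4) -> int:
    """Approximate multiply-accumulates for a two-layer FFN: dim -> expansion*dim -> dim."""
    return 2 * expansion * num_tokens * dim * dim


def ffn_share(num_tokens: int, dim: int, expansion: int = 4) -> float:
    """Fraction of one block's (MHSA + FFN) cost that is spent in the FFN."""
    ffn = ffn_macs(num_tokens, dim, expansion)
    return ffn / (ffn + mhsa_macs(num_tokens, dim))


if __name__ == "__main__":
    # Illustrative ViT-Base-like sizes (assumed, not reported in the abstract).
    tokens, width = 197, 768
    print(f"MHSA MACs: {mhsa_macs(tokens, width):,}")
    print(f"FFN  MACs: {ffn_macs(tokens, width):,}")
    print(f"FFN share of block: {ffn_share(tokens, width):.1%}")
```

Under these assumptions, with an expansion ratio of 4 the FFN costs more than MHSA whenever the token count is below twice the channel width, which holds for typical classification-resolution inputs. At higher resolutions the quadratic attention term grows and the balance can shift.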

Keywords:
Bottleneck, Transformer, Computer science, Computation, Computational complexity theory, Artificial intelligence, FLOPS, Machine learning, Computer engineering, Pattern recognition (psychology), Algorithm, Engineering, Parallel computing, Embedded system, Voltage, Electrical engineering

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 1.11
Refs: 18
Citation Normalized Percentile: 0.69

Topics

CCD and CMOS Imaging Sensors
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Memory and Neural Computing
Physical Sciences →  Engineering →  Electrical and Electronic Engineering

Related Documents

JOURNAL ARTICLE

Devil in the details: Delving into accurate quality scoring for DensePose

Junyao Sun, Qiong Liu

Journal: Pattern Recognition Year: 2023 Vol: 148 Pages: 110197-110197

BOOK-CHAPTER

Delving into the Details

Jason S. McIntosh

Year: 2024 Pages: 111-114

JOURNAL ARTICLE

Delving Deeper Into Astromorphic Transformers

Md Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta

Journal: IEEE Transactions on Cognitive and Developmental Systems Year: 2025 Vol: 17 (6) Pages: 1436-1446

JOURNAL ARTICLE

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu

Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022 Pages: 7267-7276