JOURNAL ARTICLE

FSwin Transformer: Feature-Space Window Attention Vision Transformer for Image Classification

Dayeon Yoo, Jeesu Kim, Jinwoo Yoo

Year: 2024 Journal: IEEE Access Vol: 12 Pages: 72598-72606 Publisher: Institute of Electrical and Electronics Engineers

Abstract

The vision transformer (ViT) with global self-attention has computational complexity that grows quadratically with image size. To address this issue, window-based self-attention ViTs restrict attention to a specific window, which reduces the computational cost. However, they cannot effectively capture relationships between windows. The Swin Transformer, a representative window-based self-attention ViT, introduces shifted-window multi-head self-attention (SW-MSA) to capture cross-window information. However, SW-MSA groups tokens that are close to each other in the image into one window and thus cannot capture relationships between distant tokens. Therefore, this paper introduces a feature-space window attention transformer (FSwin Transformer) that places distant but similar tokens in one window. The proposed FSwin Transformer clusters similar tokens in the feature space and conducts self-attention within each cluster. This approach helps the model understand the global context of the image by compensating for interactions between long-distance tokens, which cannot be captured when windows are defined in the image space. In addition, we incorporate a feature-space refinement method with channel and spatial attention to emphasize key parts and suppress non-essential parts. The refined feature map improves the representation power of the model, resulting in improved classification performance. Consequently, on ImageNet-1K classification, the FSwin Transformer outperforms existing Transformer-based backbones, including the Swin Transformer.
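
As a rough illustration of the idea in the abstract, the following Python sketch groups tokens by feature similarity and then applies self-attention only within each group. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the clustering procedure, so nearest-centroid assignment by cosine similarity is used here as a stand-in, the centroids are fixed by hand, and queries, keys, and values are the raw token vectors with no learned projections. The names `assign_clusters` and `cluster_attention` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_clusters(tokens, centroids):
    """Assign each token to the centroid with the highest cosine similarity.

    ASSUMPTION: nearest-centroid assignment stands in for whatever
    clustering the paper actually uses.
    """
    def cosine(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)

    return [
        max(range(len(centroids)), key=lambda c: cosine(t, centroids[c]))
        for t in tokens
    ]


def cluster_attention(tokens, labels):
    """Scaled dot-product self-attention restricted to same-cluster tokens.

    ASSUMPTION: queries, keys, and values are the raw tokens
    (no learned projections, single head).
    """
    dim = len(tokens[0])
    out = [None] * len(tokens)
    for label in set(labels):
        members = [i for i, l in enumerate(labels) if l == label]
        for i in members:
            scores = [dot(tokens[i], tokens[j]) / math.sqrt(dim) for j in members]
            weights = softmax(scores)
            out[i] = [
                sum(w * tokens[j][d] for w, j in zip(weights, members))
                for d in range(dim)
            ]
    return out


# Four tokens in raster order. Tokens 0 and 3 are far apart in position
# but close in feature space; the same holds for tokens 1 and 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, fixed for illustration

labels = assign_clusters(tokens, centroids)
attended = cluster_attention(tokens, labels)
```

In this toy input, tokens 0 and 3 sit at opposite ends of the sequence but land in the same group, so they attend to each other, which a position-based window of two adjacent tokens would not allow.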

Keywords:
Computer science; Transformer; Computer vision; Artificial intelligence; Window (computing); Pattern recognition (psychology); Electrical engineering; Voltage; Engineering

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 5.30
Refs: 49
Citation Normalized Percentile: 0.92

Topics

Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
CCD and CMOS Imaging Sensors
Physical Sciences → Engineering → Electrical and Electronic Engineering
Image Enhancement Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Refined Feature-Space Window Attention Vision Transformer for Image Classification

Dayeon Yoo, Jinwoo Yoo

Journal: The Transactions of The Korean Institute of Electrical Engineers Year: 2024 Vol: 73 (6) Pages: 1004-1011

JOURNAL ARTICLE

Local Window Attention Vision Transformer for Mammogram Classification

K. Sreekala, Jayakrushna Sahoo

Journal: IETE Journal of Research Year: 2025 Vol: 71 (6) Pages: 1920-1928

JOURNAL ARTICLE

Local Window Attention Transformer for Polarimetric SAR Image Classification

Ali Jamali, Swalpa Kumar Roy, Avik Bhattacharya, Pedram Ghamisi

Journal: IEEE Geoscience and Remote Sensing Letters Year: 2023 Vol: 20 Pages: 1-5

JOURNAL ARTICLE

Spectral Spatial Window Attention Transformer for Hyperspectral Image Classification

Xi Wu, Tahir Arshad, Bo Peng

Journal: IEEE Transactions on Geoscience and Remote Sensing Year: 2025 Vol: 63 Pages: 1-13