JOURNAL ARTICLE

Vision Transformers Are Robust Learners

Sayak Paul, Pin-Yu Chen

Year: 2022   Journal: Proceedings of the AAAI Conference on Artificial Intelligence   Vol: 36 (2)   Pages: 2071-2081   Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Transformers, composed of multiple self-attention layers, hold strong promise as a generic learning primitive applicable to different data modalities, including the recent breakthroughs in computer vision achieving state-of-the-art (SOTA) standard accuracy. What remains largely unexplored is their robustness evaluation and attribution. In this work, we study the robustness of the Vision Transformer (ViT) (Dosovitskiy et al. 2021) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT (Dosovitskiy et al. 2021) models and SOTA convolutional neural networks (CNNs), Big-Transfer (Kolesnikov et al. 2020). Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. For example, with fewer parameters and similar dataset and pre-training combinations, ViT gives a top-1 accuracy of 28.10% on ImageNet-A, which is 4.3x higher than a comparable variant of BiT. Our analyses on image masking, Fourier spectrum sensitivity, and spread on discrete cosine energy spectrum reveal intriguing properties of ViT contributing to improved robustness. Code for reproducing our experiments is available at https://git.io/J3VO0.
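The headline comparison in the abstract is top-1 accuracy on robustness benchmarks such as ImageNet-A. As a minimal illustrative sketch (not the authors' released code; the toy logits and labels below are invented for demonstration), this is how top-1 accuracy is typically computed from a model's class scores:

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class matches the true label."""
    preds = np.argmax(logits, axis=1)  # predicted class per sample
    return float(np.mean(preds == labels))

# Toy batch: 4 samples, 3 classes (illustrative values only).
logits = np.array([
    [2.0, 0.1, 0.3],  # predicted class 0
    [0.2, 1.5, 0.1],  # predicted class 1
    [0.3, 0.2, 0.9],  # predicted class 2
    [1.1, 0.4, 0.2],  # predicted class 0
])
labels = np.array([0, 1, 2, 1])  # last sample is misclassified

print(top1_accuracy(logits, labels))  # 0.75
```

In the paper's setting, the logits would come from a pretrained ViT or BiT model evaluated over an entire benchmark split (e.g., ImageNet-A), and the resulting accuracies are what figures like the reported 28.10% refer to.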

Keywords:
Robustness, Computer science, Artificial intelligence, Machine learning, Transformer, Pattern recognition, Engineering

Metrics

Cited By: 196
FWCI (Field-Weighted Citation Impact): 20.45
Refs: 99
Citation Normalized Percentile: 1.00 (in top 1%; in top 10%)

Topics

Adversarial Robustness in Machine Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Granular learning for robust vision transformers

Haoyang Tang, Zongkai Shao, Xiao Pan, Kai Zeng

Journal: Applied Soft Computing   Year: 2025   Vol: 187   Pages: 114296
BOOK-CHAPTER

Siamese Vision Transformers are Scalable Audio-Visual Learners

Yan-Bo Lin, Gedas Bertasius

Lecture Notes in Computer Science   Year: 2024   Pages: 303-321
JOURNAL ARTICLE

Are Vision Transformers Robust to Spurious Correlations?

Soumya Suvra Ghosal, Yifei Ming

Journal: International Journal of Computer Vision   Year: 2023   Vol: 132 (3)   Pages: 689-709