JOURNAL ARTICLE

Vision Transformers Are Robust Learners

Sayak Paul, Pin-Yu Chen

Year: 2022   Journal: Proceedings of the AAAI Conference on Artificial Intelligence   Vol: 36 (2)   Pages: 2071-2081   Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Transformers, composed of multiple self-attention layers, hold strong promise as a generic learning primitive applicable to different data modalities, including the recent breakthroughs in computer vision achieving state-of-the-art (SOTA) standard accuracy. What remains largely unexplored is their robustness evaluation and attribution. In this work, we study the robustness of the Vision Transformer (ViT) (Dosovitskiy et al. 2021) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT (Dosovitskiy et al. 2021) models and SOTA convolutional neural networks (CNNs), Big-Transfer (Kolesnikov et al. 2020). Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. For example, with fewer parameters and similar dataset and pre-training combinations, ViT gives a top-1 accuracy of 28.10% on ImageNet-A, which is 4.3x higher than a comparable variant of BiT. Our analyses on image masking, Fourier spectrum sensitivity, and spread on discrete cosine energy spectrum reveal intriguing properties of ViT contributing to improved robustness. Code for reproducing our experiments is available at https://git.io/J3VO0.
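The headline comparison in the abstract is top-1 accuracy on robustness benchmarks such as ImageNet-A. As a minimal illustrative sketch (not the authors' released code; the toy logits and labels below are invented for demonstration), this is how top-1 accuracy is typically computed from a model's class scores:

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class matches the true label."""
    preds = np.argmax(logits, axis=1)  # predicted class per sample
    return float(np.mean(preds == labels))

# Toy batch: 4 samples, 3 classes (illustrative values only).
logits = np.array([
    [2.0, 0.1, 0.3],  # predicted class 0
    [0.2, 1.5, 0.1],  # predicted class 1
    [0.3, 0.2, 0.9],  # predicted class 2
    [1.1, 0.4, 0.2],  # predicted class 0
])
labels = np.array([0, 1, 2, 1])  # last sample is misclassified

print(top1_accuracy(logits, labels))  # 0.75
```

In the paper's setting, the logits would come from a pretrained ViT or BiT model evaluated over an entire benchmark split (e.g., ImageNet-A), and the resulting accuracies are what figures like the reported 28.10% refer to.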

Keywords:
Robustness, Computer science, Artificial intelligence, Machine learning, Transformer, Pattern recognition, Engineering

Metrics

Cited By: 196
FWCI (Field-Weighted Citation Impact): 20.45
Refs: 99
Citation Normalized Percentile: 1.00 (in top 1%; in top 10%)

Topics

Adversarial Robustness in Machine Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Granular learning for robust vision transformers

Haoyang Tang, Zongkai Shao, Xiao Pan, Kai Zeng

Journal: Applied Soft Computing   Year: 2025   Vol: 187   Pages: 114296
BOOK-CHAPTER

Siamese Vision Transformers are Scalable Audio-Visual Learners

Yan-Bo Lin, Gedas Bertasius

Lecture Notes in Computer Science   Year: 2024   Pages: 303-321
JOURNAL ARTICLE

Are Vision Transformers Robust to Spurious Correlations?

Soumya Suvra Ghosal, Yifei Ming

Journal: International Journal of Computer Vision   Year: 2023   Vol: 132 (3)   Pages: 689-709