JOURNAL ARTICLE

Multi-level Contrastive Learning for Self-Supervised Vision Transformers

Shentong Mo, Zhun Sun, Chao Li

Year: 2023
Journal: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Pages: 2777-2786

Abstract

Recent studies aim to establish contrastive self-supervised learning (CSL) algorithms specialized for the family of Vision Transformers (ViTs), so that they behave during training as ordinary convolution-based backbones do. Despite obtaining promising performance on related downstream tasks, those approaches ignore one compelling property of ViTs. As previous studies have demonstrated, vision transformers benefit from global attention mechanisms in their early stages, obtaining feature representations that contain information from distant patches even in their shallow layers. Motivated by this, we present a simple yet effective framework to facilitate the self-supervised feature learning of transformer-based vision architectures, namely Multi-level Contrastive learning for Vision Transformers (MCVT). Specifically, we equip the vision transformers with individual-based (InfoNCE) and prototypical-based (ProtoNCE) contrastive losses at different stages of the architecture to capture low-level invariance and high-level invariance between views of samples, respectively. We conduct extensive experiments to demonstrate the effectiveness of the proposed method, using two well-known vision transformer backbones, on several downstream vision tasks, including linear classification, detection, and semantic segmentation.
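
The two loss types named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature of 0.2, the `proto_weight` parameter, and the simplified single-clustering form of ProtoNCE are all assumptions made for illustration. The sketch reads "respectively" as InfoNCE applied to features from an earlier stage and ProtoNCE applied to features from a later stage; how MCVT actually selects stages, builds prototypes, and weights the terms is not stated in the abstract.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nce_term(query, keys, target, temperature):
    """Return -log softmax(cos(query, keys) / temperature)[target]."""
    q = normalize(query)
    logits = [sum(a * b for a, b in zip(q, normalize(k))) / temperature for k in keys]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]


def info_nce(view_a, view_b, temperature=0.2):
    """Instance-level loss: view_b[i] is the positive for view_a[i], the rest are negatives."""
    terms = [nce_term(q, view_b, i, temperature) for i, q in enumerate(view_a)]
    return sum(terms) / len(terms)


def proto_nce(embeddings, prototypes, assignments, temperature=0.2):
    """Simplified prototype-level loss (assumption: one clustering, one shared temperature).

    assignments[i] is the index of the prototype (cluster centroid) for embeddings[i].
    """
    terms = [nce_term(z, prototypes, c, temperature) for z, c in zip(embeddings, assignments)]
    return sum(terms) / len(terms)


def multi_level_loss(shallow_a, shallow_b, deep_a, prototypes, assignments, proto_weight=1.0):
    """Hypothetical combination: InfoNCE on shallow features plus weighted ProtoNCE on deep ones."""
    return info_nce(shallow_a, shallow_b) + proto_weight * proto_nce(deep_a, prototypes, assignments)
```

In this sketch both terms share one cross-entropy helper, which makes the difference between them explicit: InfoNCE contrasts each sample against other samples in the batch, while ProtoNCE contrasts it against cluster prototypes that would come from a separate clustering step not shown here.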

Keywords:
Transformer; Computer science; Artificial intelligence; Segmentation; Feature learning; Convolutional neural network; Architecture; Machine vision; Machine learning; Computer vision; Engineering; Voltage

Metrics

Cited By: 14
FWCI (Field Weighted Citation Impact): 2.02
Refs: 45
Citation Normalized Percentile: 0.85

Topics

Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)
Remote-Sensing Image Classification (Physical Sciences → Engineering → Media Technology)
Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)