Local masking meets progressive freezing: crafting efficient vision transformers for self-supervised learning

Utku Mert Topcuoglu; Erdem Akagündüz

doi:10.1117/12.3055190

Year: 2025 Pages: 30-30

Abstract

This paper presents an approach to self-supervised learning for Vision Transformers (ViTs) that integrates local masked image modeling with progressive layer freezing. The method makes training of the initial layers of a ViT faster and more efficient. By systematically freezing specific layers at strategic points during training, we reduce computational demands while maintaining learning capability. Our approach employs a novel multi-scale reconstruction process that fosters efficient learning in the initial layers and improves semantic comprehension across scales. The results show a substantial reduction in training time (12.5%) with minimal impact on model accuracy (a 0.6% decrease in top-1 accuracy). Our method achieves top-1 and top-5 accuracies of 82.6% and 96.2%, respectively, underscoring its potential in scenarios where computational resources and time are critical. The implementation is available in the project's GitHub repository: https://github.com/utkutpcgl/ViTFreeze.
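To make "freezing specific layers at strategic points during training" concrete, here is a minimal sketch of a progressive freezing schedule. It assumes a FreezeOut-style linear schedule in which layers freeze in order from the input side, and that every layer has equal backward cost. The function names and the `first_frac` parameter are hypothetical; the abstract does not state the actual freeze points, and this is not the code from the ViTFreeze repository.

```python
from typing import List


def freeze_times(num_layers: int, total_iters: int, first_frac: float = 0.5) -> List[int]:
    """Iteration at which each layer stops training (hypothetical linear schedule).

    Layer 0 (closest to the input) freezes at first_frac * total_iters; later
    layers freeze at evenly spaced points, and the last layer trains to the end.
    """
    if num_layers < 1 or total_iters < 1:
        raise ValueError("num_layers and total_iters must be positive")
    if num_layers == 1:
        return [total_iters]
    times = []
    for i in range(num_layers):
        frac = first_frac + (1.0 - first_frac) * i / (num_layers - 1)
        times.append(round(frac * total_iters))
    return times


def frozen_layers(t: int, times: List[int]) -> List[int]:
    """Indices of layers already frozen at iteration t (always a prefix)."""
    return [i for i, ti in enumerate(times) if t >= ti]


def backward_cost_fraction(times: List[int], total_iters: int) -> float:
    """Share of the full backward-pass work still performed, assuming equal
    per-layer cost and that backpropagation stops at the first trainable layer."""
    return sum(times) / (len(times) * total_iters)


if __name__ == "__main__":
    times = freeze_times(num_layers=4, total_iters=100)
    print(times)                                # [50, 67, 83, 100]
    print(frozen_layers(70, times))             # [0, 1]
    print(backward_cost_fraction(times, 100))   # 0.75
```

Because frozen layers always form a prefix of the network, backpropagation can stop at the first trainable layer, which is where the savings come from. The forward pass still runs through every layer, so end-to-end savings are smaller than the backward-only fraction suggests. The numbers above are illustrative and are not the paper's 12.5% figure. In PyTorch, freezing a block typically amounts to setting `requires_grad = False` on its parameters once its freeze iteration is reached.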
