JOURNAL ARTICLE

A Survey on Masked Autoencoder for Visual Self-supervised Learning

Abstract

With the increasing popularity of masked autoencoders, self-supervised learning (SSL) in vision is following a trajectory similar to that in NLP. Specifically, generative pretext tasks based on masked prediction have become the de facto standard SSL practice in NLP (e.g., BERT). By contrast, early generative methods in vision were outperformed by their discriminative counterparts (such as contrastive learning). However, the success of masked image modeling has revived autoencoder-based visual pretraining. As a milestone that bridges the gap with BERT in NLP, the masked autoencoder in vision has attracted unprecedented attention. This work surveys masked autoencoders for visual SSL.
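The masked-prediction pretext task mentioned above hides a large fraction of image patches and trains the model to reconstruct them. A minimal sketch of the random patch-masking step, assuming an MAE-style setup (the function name `random_mask_patches` and the 75% ratio are illustrative, not from this survey):

```python
import numpy as np

def random_mask_patches(num_patches: int, mask_ratio: float, seed: int = 0):
    """Randomly split patch indices into visible and masked sets, MAE-style.

    The encoder would process only the visible patches; the decoder
    reconstructs the masked ones from the encoded representation.
    """
    rng = np.random.default_rng(seed)
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    masked = np.sort(perm[:num_masked])    # patches hidden from the encoder
    visible = np.sort(perm[num_masked:])   # patches the encoder sees
    return visible, masked

# A 224x224 image with 16x16 patches yields 14*14 = 196 patches;
# masking 75% leaves only 49 patches visible to the encoder.
visible, masked = random_mask_patches(196, 0.75)
```

The high mask ratio is the key design choice: it makes reconstruction non-trivial and lets the encoder skip most patches, which is what makes this pretraining computationally cheap.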

Keywords: Autoencoder, Artificial intelligence, Computer science, Discriminative model, Machine learning, Deep learning, Natural language processing, Pattern recognition

Metrics

Cited by: 24
FWCI (Field-Weighted Citation Impact): 6.13
References: 51
Citation Normalized Percentile: 0.96 (top 1%)

Topics

Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)
Generative Adversarial Networks and Image Synthesis (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)