JOURNAL ARTICLE

Evolved Hierarchical Masking for Self-Supervised Learning

Zhanzhou Feng, Shiliang Zhang

Year: 2024   Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence   Vol: 47 (2)   Pages: 1013-1027   Publisher: IEEE Computer Society

Abstract

Existing Masked Image Modeling methods apply fixed mask patterns to guide self-supervised training. Because these mask patterns rely on different criteria to depict image contents, sticking to a single fixed pattern limits the ability to model visual cues. This paper introduces an evolved hierarchical masking method to pursue general visual cue modeling in self-supervised learning. The proposed method leverages the vision model being trained to parse the input visual cues into a hierarchical structure, which is then used to generate masks. The accuracy of the hierarchy is on par with the capability of the model being trained, leading to evolved mask patterns at different training stages. Initially, the generated masks focus on low-level visual cues to grasp basic textures; they then gradually evolve to depict higher-level cues, reinforcing the learning of more complicated object semantics and contexts. Our method does not require extra pre-trained models or annotations and ensures training efficiency by evolving the training difficulty. We conduct extensive experiments on seven downstream tasks, including partial-duplicate image retrieval, which relies on low-level details, as well as image classification and semantic segmentation, which require semantic parsing capability. Experimental results demonstrate that the method substantially boosts performance across these tasks. For instance, it surpasses the recent MAE by 1.1% in ImageNet-1K classification and 1.4% in ADE20K segmentation with the same number of training epochs. We also align the proposed method with the current research focus on LLMs. The proposed approach bridges the gap with large-scale pre-training on semantically demanding tasks and enhances intricate detail perception in tasks requiring low-level feature recognition.

Keywords:
Artificial intelligence, Computer science, Masking (illustration), Pattern recognition (psychology), Machine learning

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.53
Refs: 65
Citation Normalized Percentile: 0.59

Topics

Face recognition and analysis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning

Sheng Feng, Xiaoqian Zhu, Shuqing Ma

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing   Year: 2024   Vol: 32   Pages: 1365-1379

JOURNAL ARTICLE

Self-Supervised Learning With Segmental Masking for Speech Representation

Xianghu Yue, Jingru Lin, Fabian Ritter Gutierrez, Haizhou Li

Journal: IEEE Journal of Selected Topics in Signal Processing   Year: 2022   Vol: 16 (6)   Pages: 1367-1379

JOURNAL ARTICLE

SHERLock: Self-Supervised Hierarchical Event Representation Learning

Sumegh Roychowdhury, Sumedh Sontakke, Laurent Itti, Mausoom Sarkar, Milan Aggarwal, Pinkesh Badjatiya, Nikaash Puri, Balaji Krishnamurthy

Journal: 2022 26th International Conference on Pattern Recognition (ICPR)   Year: 2022   Vol: 9   Pages: 2672-2678