JOURNAL ARTICLE

Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection

Tong Zhang, Yin Zhuang, He Chen, Liang Chen, Guanqun Wang, Peng Gao, Hao Dong

Year: 2023 Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Vol: 16 Pages: 5013-5025 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Masked image modeling (MIM) has proven to be an optimal pretext task for self-supervised pretraining (SSP): it helps the model capture an effective task-agnostic representation during pretraining, which in turn improves fine-tuning performance on various downstream tasks. However, with the high random masking ratio used in MIM, scene-level MIM-based SSP struggles to capture small-scale objects and local details in complex remote sensing scenes. Consequently, when pretrained models that capture mostly scene-level information are applied directly to object-level fine-tuning, there is a clear representation learning misalignment between the pretraining and fine-tuning steps. This article therefore proposes a novel object-centric masked image modeling (OCMIM) strategy that helps the model capture object-level information during pretraining and thereby improves object detection fine-tuning. First, to better learn object-level representations covering full scales and multiple categories in MIM-based SSP, a novel object-centric data generator is proposed to automatically set up targeted pretraining data according to the objects themselves, providing data conditions specific to object detection model pretraining. Second, an attention-guided mask generator is designed to generate a guided mask for the MIM pretext task, which leads the model to learn a more discriminative representation of highly attended object regions than a random masking strategy does. Finally, experiments on six remote sensing object detection benchmarks show that the proposed OCMIM-based SSP strategy is a better pretraining approach for remote sensing object detection than commonly used methods.
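
The abstract does not specify how the attention-guided mask is built, so the following is only a minimal sketch of the general idea, contrasting random masking with masking guided by per-patch attention scores. The function names, the top-k selection rule, and the toy attention values are all assumptions made for illustration and are not taken from the article.

```python
import random


def random_mask(num_patches, mask_ratio, seed=0):
    """Baseline MIM masking: hide a uniformly random subset of patches."""
    num_masked = int(num_patches * mask_ratio)
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), num_masked))


def attention_guided_mask(attention, mask_ratio):
    """Hypothetical guided masking: hide the most-attended patches.

    `attention` holds one non-negative score per patch. The top-k rule used
    here is an assumption for illustration, not the rule from the article.
    """
    num_masked = int(len(attention) * mask_ratio)
    # Rank patch indices by descending attention; ties keep index order.
    ranked = sorted(range(len(attention)), key=lambda i: -attention[i])
    return sorted(ranked[:num_masked])


# Toy 4x4 patch grid (flattened) with an "object" in the centre 2x2 block.
attention = [
    0.0, 0.1, 0.1, 0.0,
    0.1, 0.9, 0.8, 0.1,
    0.1, 0.7, 0.9, 0.1,
    0.0, 0.1, 0.1, 0.0,
]
guided = attention_guided_mask(attention, mask_ratio=0.25)
baseline = random_mask(len(attention), mask_ratio=0.25)
print(guided)    # [5, 6, 9, 10], the four centre patches
print(baseline)  # four indices chosen without regard to content
```

In this toy case the guided mask covers exactly the high-attention centre patches, so the reconstruction target concentrates on the object region, whereas the random mask may fall mostly on background.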

Keywords:
Computer science; Artificial intelligence; Object detection; Discriminative model; Object (grammar); Computer vision; Representation (politics); Masking (illustration); Task (project management); Generator (circuit theory); Pretext; Viola–Jones object detection framework; Pattern recognition (psychology); Facial recognition system; Face detection; Power (physics)

Metrics

Cited By: 16
FWCI (Field Weighted Citation Impact): 3.47
Refs: 77
Citation Normalized Percentile: 0.91

Topics

Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Self-Supervised Pretraining for RGB-D Salient Object Detection

Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Xiang Ruan

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2022 Vol: 36 (3) Pages: 3463-3471

JOURNAL ARTICLE

Saliency supervised masked autoencoder pretrained salient location mining network for remote sensing image salient object detection

Yuxiang Fu, Wei Fang, Victor S. Sheng

Journal: ISPRS Journal of Photogrammetry and Remote Sensing Year: 2025 Vol: 224 Pages: 222-234