JOURNAL ARTICLE

Contrastive Tokens and Label Activation for Remote Sensing Weakly Supervised Semantic Segmentation

Zaiyi Hu, Junyu Gao, Yuan Yuan, Xuelong Li

Year: 2024
Journal: IEEE Transactions on Geoscience and Remote Sensing
Vol: 62
Pages: 1-11
Publisher: Institute of Electrical and Electronics Engineers

Abstract

In recent years, Weakly Supervised Semantic Segmentation (WSSS) has made remarkable progress, and Vision Transformer (ViT) architectures have emerged as a natural fit for the task because their global attention captures comprehensive object information. However, applying ViT directly to WSSS introduces challenges. The characteristics of ViT can lead to an over-smoothing problem, particularly in the dense scenes of remote sensing images, which significantly degrades the quality of Class Activation Maps (CAM) and makes segmentation harder. Moreover, existing methods often adopt multi-stage strategies, which add complexity and reduce training efficiency. To overcome these challenges, we present CTFA (Contrastive Token and Foreground Activation), a comprehensive ViT-based framework for WSSS of remote sensing images. The proposed method includes a Contrastive Token Learning Module (CTLM) that incorporates both patch-wise and class-wise token learning to enhance model performance. In patch-wise learning, we leverage the semantic diversity preserved in the intermediate layers of ViT: we derive a relation matrix from these layers and use it to supervise the final output tokens, thereby improving the quality of CAM. In class-wise learning, we enforce consistent representations between global and local tokens, which reveals more complete object regions. Additionally, by activating foreground features in the generated pseudo labels using a dual-branch decoder, we further improve CAM generation. Our approach demonstrates outstanding results across three well-established datasets, providing a more efficient and streamlined solution for WSSS. Code will be available at: https://github.com/ZaiyiHu/CTFA.
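
The patch-wise idea in the abstract (a relation matrix taken from intermediate tokens and used to supervise the final tokens) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the use of cosine similarity, the `threshold` value, the loss form, and the function names `cosine_similarity_matrix`, `relation_matrix`, and `patch_relation_loss` are all hypothetical.

```python
import math


def cosine_similarity_matrix(tokens):
    """Pairwise cosine similarity between token vectors (a list of lists)."""
    # Guard against zero-length vectors by substituting a norm of 1.0.
    norms = [math.sqrt(sum(x * x for x in t)) or 1.0 for t in tokens]
    n = len(tokens)
    return [
        [
            sum(a * b for a, b in zip(tokens[i], tokens[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def relation_matrix(intermediate_tokens, threshold=0.5):
    """Binary relation: 1 where two intermediate tokens are similar, else 0.

    Assumption: similarity is thresholded; the paper may define it differently.
    """
    sim = cosine_similarity_matrix(intermediate_tokens)
    return [[1 if s > threshold else 0 for s in row] for row in sim]


def patch_relation_loss(final_tokens, relation):
    """Illustrative loss: pull related final tokens together, push others apart."""
    sim = cosine_similarity_matrix(final_tokens)
    n = len(final_tokens)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if relation[i][j]:
                total += 1.0 - sim[i][j]      # related pair: reward high similarity
            else:
                total += max(0.0, sim[i][j])  # unrelated pair: penalize similarity
            count += 1
    return total / count if count else 0.0


# Toy example: three patch tokens, two of which belong to the same region.
intermediate = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
relation = relation_matrix(intermediate)

over_smoothed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # all final tokens identical
diverse = intermediate                                # final tokens keep the diversity

loss_smoothed = patch_relation_loss(over_smoothed, relation)
loss_diverse = patch_relation_loss(diverse, relation)
```

In this toy setting the over-smoothed tokens incur the larger loss, which is the behavior one would want from a term meant to counteract over-smoothing. A real implementation would operate on batched tensors in a deep learning framework rather than on Python lists.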

Keywords:
Computer science; Segmentation; Leverage (statistics); Artificial intelligence; Feature learning; Security token; Machine learning; Pattern recognition (psychology)

Metrics

Cited By: 15
FWCI (Field Weighted Citation Impact): 9.22
Refs: 66
Citation Normalized Percentile: 0.96

Topics

Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition