JOURNAL ARTICLE

Token Contrast for Weakly-Supervised Semantic Segmentation

Abstract

Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels typically relies on Class Activation Maps (CAM) to generate pseudo labels. Limited by the local structure perception of CNNs, CAM usually cannot identify the integral object regions. Although the recent Vision Transformer (ViT) can remedy this flaw, we observe that it also introduces an over-smoothing issue, i.e., the final patch tokens tend to become uniform. In this work, we propose Token Contrast (ToCo) to address this issue and further explore the virtue of ViT for WSSS. First, motivated by the observation that intermediate layers in ViT can still retain semantic diversity, we design a Patch Token Contrast module (PTC). PTC supervises the final patch tokens with pseudo token relations derived from intermediate layers, allowing them to align with the semantic regions and thus yield more accurate CAM. Second, to further differentiate the low-confidence regions in CAM, we devise a Class Token Contrast module (CTC), inspired by the fact that class tokens in ViT can capture high-level semantics. CTC promotes representation consistency between uncertain local regions and global objects by contrasting their class tokens. Experiments on the PASCAL VOC and MS COCO datasets show that the proposed ToCo significantly surpasses other single-stage competitors and achieves performance comparable to state-of-the-art multi-stage methods. Code is available at https://github.com/rulixiang/ToCo.
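
The patch-token idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes that each patch token has already been given a pseudo label derived from an intermediate layer, and that the loss pulls together final-layer tokens sharing a label and pushes apart tokens with different labels, measured by cosine similarity. The function names, the `ignore_label` convention, and the exact form of the loss are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(tokens):
    """Pairwise cosine similarity between the rows of an (N, D) array."""
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-8, None)
    return unit @ unit.T


def patch_token_contrast_loss(final_tokens, pseudo_labels, ignore_label=-1):
    """Illustrative contrast loss over final-layer patch tokens.

    final_tokens:  (N, D) final-layer patch token embeddings.
    pseudo_labels: (N,) per-token pseudo labels, assumed here to come from
                   an intermediate layer; ignore_label marks uncertain tokens.
    """
    labels = np.asarray(pseudo_labels)
    sim = cosine_similarity_matrix(np.asarray(final_tokens, dtype=float))

    # Only pairs in which both tokens carry a confident pseudo label count.
    valid = labels != ignore_label
    pair_valid = valid[:, None] & valid[None, :]

    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    positive = pair_valid & same & off_diagonal   # same pseudo label
    negative = pair_valid & ~same                 # different pseudo labels

    # Positive pairs should have similarity near 1, negative pairs near 0.
    pos_term = (1.0 - sim[positive]).mean() if positive.any() else 0.0
    neg_term = np.clip(sim[negative], 0.0, None).mean() if negative.any() else 0.0
    return float(pos_term + neg_term)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    separated = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    uniform = np.ones((4, 2))  # an over-smoothed case: all tokens identical
    print(patch_token_contrast_loss(separated, labels))
    print(patch_token_contrast_loss(uniform, labels))
```

Under these assumptions, uniform (over-smoothed) tokens are penalized through the negative pairs, while tokens that separate by pseudo label incur a loss near zero.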

Keywords:
Computer science; Security token; Segmentation; Contrast (vision); Artificial intelligence; Semantics (computer science); Class (philosophy); Pattern recognition (psychology); Programming language

Metrics

Cited By: 155
FWCI (Field Weighted Citation Impact): 28.21
Refs: 58
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Cross-Block Sparse Class Token Contrast for Weakly Supervised Semantic Segmentation

Keyang Cheng, Jingfeng Tang, Hongjian Gu, Hao Wan, Maozhen Li

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2024 Vol: 34 (12) Pages: 13004-13015

JOURNAL ARTICLE

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaïd, Dan Xu

Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022 Pages: 4300-4309

JOURNAL ARTICLE

Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Jimin Xiao, Fei Ma, Ouyang Ren-rong

Journal: Engineering Applications of Artificial Intelligence Year: 2024 Vol: 139 Pages: 109626-109626