JOURNAL ARTICLE

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

Lian Xu, Mohammed Bennamoun, Farid Boussaïd, Hamid Laga, Wanli Ouyang, Dan Xu

Year: 2024 Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Vol: 46 (12) Pages: 8380-8395 Publisher: IEEE Computer Society

Abstract

This paper proposes a novel transformer-based framework to generate accurate class-specific object localization maps for weakly supervised semantic segmentation (WSSS). Leveraging the insight that the attended regions of the one-class token in the standard vision transformer can generate class-agnostic localization maps, we investigate the transformer's capacity to capture class-specific attention for class-discriminative object localization by learning multiple class tokens. We present the Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with patch tokens. This is facilitated by a class-aware training strategy that establishes a one-to-one correspondence between output class tokens and ground-truth class labels. We also introduce a Contrastive-Class-Token (CCT) module to enhance the learning of discriminative class tokens, enabling the model to better capture the unique characteristics of each class. Consequently, the proposed framework effectively generates class-discriminative object localization maps from the class-to-patch attentions associated with different class tokens. To refine these localization maps, we propose the utilization of patch-level pairwise affinity derived from the patch-to-patch transformer attention. Furthermore, the proposed framework seamlessly complements the Class Activation Mapping (CAM) method, yielding significant improvements in WSSS performance on PASCAL VOC 2012 and MS COCO 2014. These results underline the importance of the class token for WSSS.
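As a rough illustration of the two attention-based steps the abstract describes (class-to-patch attention as localization maps, then refinement with patch-to-patch affinity), the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention matrix that has already been averaged over heads and layers, with tokens ordered as class tokens first and patch tokens second. The function name `class_localization_maps` and its parameters are hypothetical.

```python
import numpy as np


def class_localization_maps(attn, num_classes, grid_hw):
    """Sketch: slice class-specific maps out of one transformer attention matrix.

    Assumptions (not taken from the paper's code):
      * attn has shape (C + N, C + N), where C = num_classes and N = h * w,
      * tokens are ordered [C class tokens, N patch tokens],
      * attn is already averaged over heads (and, if desired, layers).
    """
    h, w = grid_hw
    num_patches = h * w
    expected = num_classes + num_patches
    if attn.shape != (expected, expected):
        raise ValueError(f"expected attention of shape {(expected, expected)}")

    # Class-to-patch attention: how strongly each class token attends to each patch.
    class_to_patch = attn[:num_classes, num_classes:]      # (C, N)

    # Patch-to-patch attention, used here as a pairwise affinity between patches.
    patch_affinity = attn[num_classes:, num_classes:]      # (N, N)

    # Refinement: the score of patch i becomes the affinity-weighted sum of the
    # scores of all patches j, i.e. refined[c, i] = sum_j affinity[i, j] * map[c, j].
    refined = class_to_patch @ patch_affinity.T            # (C, N)

    raw_maps = class_to_patch.reshape(num_classes, h, w)
    refined_maps = refined.reshape(num_classes, h, w)
    return raw_maps, refined_maps


# Toy usage with random attention: 3 classes, a 2 x 2 patch grid.
rng = np.random.default_rng(0)
logits = rng.normal(size=(7, 7))
toy_attn = np.exp(logits)
toy_attn /= toy_attn.sum(axis=1, keepdims=True)  # row-wise softmax
raw_maps, refined_maps = class_localization_maps(toy_attn, num_classes=3, grid_hw=(2, 2))
print(raw_maps.shape, refined_maps.shape)
```

In this sketch each class gets its own map because each class token owns one row of the attention matrix; with a single class token there would be only one row, which is why a standard vision transformer yields a class-agnostic map.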

Keywords:
Computer science, Discriminative model, Security token, Artificial intelligence, Transformer, Class (philosophy), Pairwise comparison, Class hierarchy, Segmentation, Pattern recognition (psychology), Machine learning, Object-oriented programming, Computer network, Programming language

Metrics

Cited By: 44
FWCI (Field Weighted Citation Impact): 27.47
Refs: 90
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaïd, Dan Xu

Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022 Pages: 4300-4309

JOURNAL ARTICLE

Structural Relation Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

Dingjie Peng, Wataru Kameyama

Journal: IEICE Transactions on Information and Systems Year: 2024 Vol: E108.D (7) Pages: 752-759

JOURNAL ARTICLE

Enhancing weakly supervised semantic segmentation through multi-class token attention learning

Huilan Luo, Zhen Zeng

Journal: The Journal of Supercomputing Year: 2024 Vol: 81 (1)