JOURNAL ARTICLE

Decoupling Zero-Shot Semantic Segmentation

Jian DingNan XueGui-Song XiaDengxin Dai

Year: 2022 Journal:   2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pages: 11573-11582

Abstract

Zero-shot semantic segmentation (ZS3) aims to segment the novel categories that have not been seen in the training. Existing works formulate ZS3 as a pixel-level zeroshot classification problem, and transfer semantic knowledge from seen classes to unseen ones with the help of language models pre-trained only with texts. While simple, the pixel-level ZS3 formulation shows the limited capability to integrate vision-language models that are often pre-trained with image-text pairs and currently demonstrate great potential for vision tasks. Inspired by the observation that humans often perform segment-level semantic labeling, we propose to decouple the ZS3 into two sub-tasks: 1) a classagnostic grouping task to group the pixels into segments. 2) a zero-shot classification task on segments. The former task does not involve category information and can be directly transferred to group pixels for unseen classes. The latter task performs at segment-level and provides a natural way to leverage large-scale vision-language models pre-trained with image-text pairs (e.g. CLIP) for ZS3. Based on the decoupling formulation, we propose a simple and effective zero-shot semantic segmentation model, called ZegFormer, which outperforms the previous methods on ZS3 standard benchmarks by large margins, e.g., 22 points on the PAS-CAL VOC and 3 points on the COCO-Stuff in terms of mIoU for unseen classes. Code will be released at https://github.com/dingjiansw101/ZegFormer.

Keywords:
Computer science Segmentation Leverage (statistics) Artificial intelligence Pixel Decoupling (probability) Task (project management) Pattern recognition (psychology) Contextual image classification Image segmentation Natural language processing Zero (linguistics) Simple (philosophy) Shot (pellet) Image (mathematics) Computer vision Machine learning

Metrics

221
Cited By
25.74
FWCI (Field Weighted Citation Impact)
79
Refs
1.00
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
COVID-19 diagnosis using AI
Health Sciences →  Medicine →  Radiology, Nuclear Medicine and Imaging
© 2026 ScienceGate Book Chapters — All rights reserved.