Zero-shot Referring Image Segmentation with Global-Local Context Features

Seonghoon Yu; Paul Hongsuck Seo; Jeany Son

doi:10.1109/cvpr52729.2023.01864

CONFERENCE PAPER, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

Year: 2023 Pages: 19456-19465

Abstract

Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. In order to obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment fine-detailed instance-level groundings. We also introduce a global-local text encoder where the global feature captures complex sentence-level semantics of the entire input expression while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines of the task and even a weakly supervised referring expression segmentation method by substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.
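The abstract describes ranking off-the-shelf mask proposals by how well a combined global-local visual feature matches a combined global-local text feature. The Python sketch below illustrates only that final matching step, under stated assumptions: each mask proposal and the expression are already encoded into vectors (standing in for CLIP image and text embeddings), global and local features are fused by a convex combination with hypothetical weights `alpha` (visual) and `beta` (text), and the best proposal is picked by cosine similarity. The helper names (`mix`, `select_mask`) and the toy vectors are illustrative only; this is not the authors' implementation, which is in the linked repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (CLIP-style embeddings are compared this way)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def mix(global_feat: Sequence[float], local_feat: Sequence[float], weight: float) -> Vector:
    """Hypothetical fusion: weight * global + (1 - weight) * local, on unit-normalized inputs."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(global_feat) != len(local_feat):
        raise ValueError("features must have the same length")
    g, l = l2_normalize(global_feat), l2_normalize(local_feat)
    return [weight * gi + (1.0 - weight) * li for gi, li in zip(g, l)]


def select_mask(mask_globals, mask_locals, text_global, text_local,
                alpha: float = 0.5, beta: float = 0.5) -> int:
    """Index of the mask proposal whose fused visual feature best matches the fused text feature."""
    if not mask_globals or len(mask_globals) != len(mask_locals):
        raise ValueError("need one global and one local feature per mask proposal")
    text_feat = mix(text_global, text_local, beta)
    scores = [cosine(mix(g, l, alpha), text_feat)
              for g, l in zip(mask_globals, mask_locals)]
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Toy 3-d "embeddings" for two mask proposals (illustrative values only).
    mask_globals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    mask_locals = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
    text_global = [0.8, 0.2, 0.0]  # stands in for the whole expression
    text_local = [1.0, 0.0, 0.0]   # stands in for the target noun phrase
    print(select_mask(mask_globals, mask_locals, text_global, text_local))  # prints 0
```

With these toy vectors the first proposal wins because both of its features point roughly along the same axis as the text features. In the method itself, the vectors would come from CLIP, with the local text feature computed from the noun phrase found by a dependency parser; producing those embeddings, and the exact fusion the paper uses, is outside this sketch.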

Keywords:

Computer science, Segmentation, Artificial intelligence, Encoder, Feature, Context, Shot, Image segmentation, Noun phrase, Semantics, Computer vision, Task, Pattern recognition, Dependency, Sentence, Noun

Metrics

FWCI (Field Weighted Citation Impact): 9.10

Citation Normalized Percentile: 0.98

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation

Bidirectional Mask Selection for Zero-Shot Referring Image Segmentation

Text Augmented Spatial Aware Zero-shot Referring Image Segmentation

LGD: Leveraging generative descriptions for zero-shot referring image segmentation

: Localized text prompt refinement for zero-shot referring image segmentation