JOURNAL ARTICLE

Unsupervised Domain Adaptation for Referring Semantic Segmentation

Abstract

In this paper, we study referring semantic segmentation in a highly practical setting: labeled visual data with corresponding text descriptions are available in the source domain, but only unlabeled visual data (without text descriptions) are available in the target domain. The task poses three main difficulties: (1) how to obtain proper queries for the target domain; (2) how to adapt to shifts in the joint visual-text distribution; and (3) how to maintain the original segmentation performance. To address them, we propose a cycle-consistent vision-language matching network that narrows the domain gap and eases adaptation. The model is of practical value because it can generalise to new data sources without requiring corresponding text annotations. First, a pseudo-text selector handles the missing modality by using the pre-trained CLIP model to measure the gap between query features of the source domain and visual features of the target domain. Next, a cross-domain segmentation predictor encourages the joint representations to be domain-invariant and minimizes the discrepancy between the two domains. Then, a cycle-consistent query matcher learns discriminative features by reconstructing visual features from masks; instead of comparing text directly, it matches the visual features to the pseudo queries. Extensive experiments show the effectiveness of our method.
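As an illustration of the pseudo-text selection idea, the following is a minimal sketch, not the authors' implementation. It assumes that source query embeddings and target image embeddings have already been extracted into a shared space (for example by a CLIP-style encoder) and that a pseudo query is chosen by highest cosine similarity; the abstract does not state the exact selection rule. The function name `select_pseudo_queries`, the `top_k` parameter, and the toy vectors are hypothetical.

```python
import numpy as np


def select_pseudo_queries(target_visual, source_text, top_k=1):
    """For each target image embedding, return the indices and cosine
    similarities of the top_k closest source query embeddings.

    Assumption: both inputs are 2-D arrays of precomputed embeddings
    living in the same space (e.g. from a CLIP-style encoder).
    """
    # L2-normalise rows so that the dot product equals cosine similarity.
    v = target_visual / np.linalg.norm(target_visual, axis=1, keepdims=True)
    t = source_text / np.linalg.norm(source_text, axis=1, keepdims=True)
    sim = v @ t.T  # shape: (num_target_images, num_source_queries)
    # Sort each row by descending similarity and keep the first top_k columns.
    order = np.argsort(-sim, axis=1)[:, :top_k]
    return order, np.take_along_axis(sim, order, axis=1)


# Toy embeddings standing in for real encoder outputs (hypothetical values).
source_text = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
target_visual = np.array([[0.9, 0.1, 0.0],
                          [0.1, 0.2, 0.95]])

idx, scores = select_pseudo_queries(target_visual, source_text)
print(idx.ravel().tolist())  # [0, 2]
```

In this toy case the first target image is paired with source query 0 and the second with source query 2. A real pipeline would replace the toy arrays with encoder outputs and might keep several candidates per image (`top_k > 1`) or filter by a similarity threshold.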

Keywords:
Computer science; Discriminative model; Artificial intelligence; Segmentation; Task (project management); Domain adaptation; Natural language processing; Matching (statistics); Pattern recognition (psychology); Domain (mathematical analysis); Classifier (UML)

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.73
Refs: 52
Citation Normalized Percentile: 0.67

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition