Empowering Unsupervised Domain Adaptation with Large-scale Pre-trained Vision-Language Models

Zhengfeng Lai; Haoping Bai; Haotian Zhang; Xianzhi Du; Jiulong Shan; Yinfei Yang; Chen‐Nee Chuah; Meng Cao

doi:10.1109/wacv57701.2024.00267

JOURNAL ARTICLE

Year: 2024 Pages: 2679-2689

Abstract

Unsupervised Domain Adaptation (UDA) aims to leverage a labeled source domain to solve tasks on an unlabeled target domain. Traditional UDA methods face a tradeoff between domain alignment and semantic class discriminability, especially when a large domain gap exists between the source and target domains. Efforts to apply large-scale pre-training to bridge domain gaps remain limited. In this work, we propose that Vision-Language Models (VLMs) can empower UDA tasks thanks to their language-aligned training and their large-scale pre-training datasets. For example, CLIP and GLIP have shown promising zero-shot generalization in classification and detection tasks. However, directly fine-tuning these VLMs for downstream tasks may be computationally expensive and may not scale when multiple domains need to be adapted. Therefore, we first study an efficient adaptation of VLMs that preserves their original knowledge while maximizing their flexibility for learning new knowledge. Then, we design a domain-aware pseudo-labeling scheme tailored to VLMs for domain disentanglement. We show the superiority of the proposed methods on four UDA classification and two UDA detection benchmarks, with a significant improvement (+9.9%) on DomainNet.

Keywords:

Domain adaptation; Computer science; Adaptation (eye); Scale (ratio); Artificial intelligence; Domain (mathematical analysis); Natural language processing; Psychology; Geography; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 6.89

Citation Normalized Percentile: 0.95

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Cross-domain distillation for unsupervised domain adaptation with large vision-language models

Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models

Pivot-based Unsupervised Domain Adaptation for Pre-trained Language Model

Leveraging Vision-Language Pre-training for Unsupervised Domain Adaptation

Construction safety inspection with large pre-trained vision-language models