CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation

Ziyang Gong; Zhixiang Wei; Di Wang; Xiaoxing Hu; Xianzheng Ma; Hongruixuan Chen; Yuru Jia; Yun Deng; Zhenming Ji; Xiangwei Zhu; X. Jessie Yang; Naoto Yokoya; Jing Zhang; Bo Du; Junchi Yan; Liangpei Zhang

doi:10.1109/tpami.2025.3649001

JOURNAL ARTICLE

Year: 2025
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Vol: PP
Pages: 1-18
Publisher: IEEE Computer Society

Abstract

Because Remote Sensing (RS) images exhibit substantial domain gaps arising from variations in location, wavelength, and sensor type, Remote Sensing Domain Generalization (RSDG) has emerged as a critical research frontier focused on developing models that generalize effectively across diverse scenarios. However, this area remains underexplored: (1) current cross-domain methods primarily focus on Domain Adaptation (DA), which adapts models to predefined domains rather than to unseen ones; (2) few studies target RSDG, especially for semantic segmentation, and existing related models are developed for specific unknown domains, so they underfit other unseen scenarios; (3) existing RS foundation models tend to prioritize in-domain performance over cross-domain generalization. To this end, we introduce CrossEarth, the first vision foundation model for RSDG semantic segmentation. CrossEarth achieves strong cross-domain generalization through a specially designed data-level Earth-Style Injection pipeline and a model-level Multi-Task Training pipeline. In addition, we have curated an RSDG semantic segmentation benchmark comprising 32 scenarios across various regions, spectral bands, platforms, and climates, providing comprehensive evaluations of the generalizability of future RSDG models. Extensive experiments on this benchmark demonstrate the superiority of CrossEarth over existing state-of-the-art methods.

Keywords:

Segmentation; Generalizability; Benchmark; Domain; Geospatial analysis; Generalization; Domain adaptation; Focus

Metrics

FWCI (Field Weighted Citation Impact): 3.52

Citation Normalized Percentile: 0.89

Topics

Remote-Sensing Image Classification

Physical Sciences → Engineering → Media Technology

Geographic Information Systems Studies

Social Sciences → Social Sciences → Geography, Planning and Development

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Domain generalization for semantic segmentation of remote sensing images via vision foundation model fine-tuning

Vision Foundation Model Guided Multimodal Fusion Network for Remote Sensing Semantic Segmentation

Vision Foundation Model-Driven Multiscale Expert Tuning for Multimodal Remote Sensing Semantic Segmentation

Deep Relearning in the Geospatial Domain for Semantic Remote Sensing Image Segmentation

Boosting remote semantic segmentation using vision-and-language foundation model