JOURNAL ARTICLE

Multimodal Fusion Methods with Vision Transformers for Remote Sensing Semantic Segmentation

Abstract

This paper presents a comparative analysis of transformer-based fusion methods applied to a novel multimodal dataset for remote sensing semantic segmentation. The investigation evaluates how several fusion methods affect the accuracy of the results. For early fusion, we investigate Early Concatenation. For middle fusion, we investigate four methods: Token Patch Embedding, Channel Patch Embedding, Token Fusion at Attention Level, and Cross-Attention. Finally, as a representative of late fusion, we investigate Late Concatenation. The methods presented here are specifically designed to operate effectively with all modalities under investigation. Experiments conducted on the Ticino dataset show that Late Concatenation outperforms the best single-modality (RGB) method by 4.04%, 2.24%, and 3.47% in accuracy, precision, and mIoU, respectively. This study provides an opportunity to further explore transformer-based fusion methods, thereby enhancing our understanding of the potential of data fusion.
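The difference between the early and late concatenation strategies named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no architectural details, so the sketch assumes the generic definitions (early fusion concatenates modalities along the channel axis before a single encoder; late fusion encodes each modality separately and concatenates the resulting features). The names `toy_encoder`, `concat_channels`, and the second modality `other` are hypothetical, and `toy_encoder` is only a stand-in for a transformer encoder.

```python
from typing import Dict, List

Pixel = List[float]   # channel (or feature) values of one pixel
Image = List[Pixel]   # flattened image: one entry per pixel


def concat_channels(images: List[Image]) -> Image:
    """Concatenate the per-pixel vectors of co-registered images."""
    n_pixels = len(images[0])
    if any(len(img) != n_pixels for img in images):
        raise ValueError("all modalities must cover the same pixels")
    return [sum((img[i] for img in images), []) for i in range(n_pixels)]


def toy_encoder(image: Image, out_dim: int) -> Image:
    """Hypothetical stand-in for a transformer encoder.

    Maps every pixel to `out_dim` features derived from its channel mean.
    """
    return [[sum(px) / len(px) * (k + 1) for k in range(out_dim)] for px in image]


def early_concatenation(modalities: Dict[str, Image], out_dim: int) -> Image:
    """Fuse at the input: stack all channels, then run one shared encoder."""
    fused_input = concat_channels(list(modalities.values()))
    return toy_encoder(fused_input, out_dim)


def late_concatenation(modalities: Dict[str, Image], out_dim: int) -> Image:
    """Fuse at the output: one encoder per modality, then stack the features."""
    encoded = [toy_encoder(img, out_dim) for img in modalities.values()]
    return concat_channels(encoded)


# Two co-registered pixels: a 3-channel RGB image and a hypothetical
# 1-channel second modality (illustrative values only).
modalities = {
    "rgb": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "other": [[1.0], [2.0]],
}

early = early_concatenation(modalities, out_dim=4)
late = late_concatenation(modalities, out_dim=4)
print(len(early[0]), len(late[0]))
```

In this sketch the early variant yields `out_dim` features per pixel regardless of the number of modalities, while the late variant yields `out_dim` features per modality, so a downstream segmentation head would see a wider input.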

Keywords:
Computer science; Artificial intelligence; Computer vision; Segmentation; Fusion; Transformer; Sensor fusion; Image segmentation; Remote sensing; Geology; Engineering

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.43
Refs: 18
Citation Normalized Percentile: 0.66

Topics

Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image Fusion Techniques
Physical Sciences →  Engineering →  Media Technology
