Multimodal Semantic Segmentation Based On Improved Vision Transformers

Weimin Qi; H. T. Chen; Zhiming Wang; Meng Wang

doi:10.1145/3650400.3650493

JOURNAL ARTICLE

Year: 2023 Pages: 565-569

Abstract

Although semantic segmentation networks based on CNNs or RNNs already perform the task well, multimodal input and the Transformer leave further room to improve their performance. In this paper, we apply the Transformer to the multimodal input scenario. However, the Transformer does not handle multimodal inputs well on its own, and deciding how and where features from different modalities should interact poses a major challenge for the design of the model's fusion scheme. To address this, we improve the Vision Transformer using the Token Fusion model and efficiently complete the image semantic segmentation task for RGB-Depth multimodal input.
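The abstract does not spell out the fusion scheme, so the following is only a minimal sketch of the general token-substitution idea behind token-level fusion, not the authors' implementation. It assumes that RGB and depth tokens are spatially aligned, that each token already carries an importance score in [0, 1], and that a low-scoring token is replaced by the token at the same position in the other modality. The function name `fuse_tokens`, the threshold value, and the use of an identity mapping in place of a learned cross-modal projection are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def fuse_tokens(
    rgb: Sequence[Token],
    depth: Sequence[Token],
    rgb_scores: Sequence[float],
    depth_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[Token], List[Token]]:
    """Replace low-scoring tokens with the aligned token of the other modality.

    Assumption: a real model would learn the scores and project the
    substituted token; here the scores are given and the projection is
    the identity.
    """
    if not (len(rgb) == len(depth) == len(rgb_scores) == len(depth_scores)):
        raise ValueError("all inputs must have one entry per token position")

    fused_rgb: List[Token] = []
    fused_depth: List[Token] = []
    for r, d, score_r, score_d in zip(rgb, depth, rgb_scores, depth_scores):
        # Substitution always reads from the original tokens, so the two
        # streams are updated independently of each other.
        fused_rgb.append(list(d) if score_r < threshold else list(r))
        fused_depth.append(list(r) if score_d < threshold else list(d))
    return fused_rgb, fused_depth


# Toy example: three aligned token positions, two features per token.
rgb = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
depth = [[-1.0, -1.0], [-2.0, -2.0], [-3.0, -3.0]]
fused_rgb, fused_depth = fuse_tokens(
    rgb, depth, rgb_scores=[0.9, 0.1, 0.8], depth_scores=[0.2, 0.7, 0.6]
)
```

In this toy run the second RGB token and the first depth token fall below the threshold, so each is swapped for its counterpart from the other modality while the remaining tokens pass through unchanged.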

Keywords:

Computer science; Segmentation; Transformer; Artificial intelligence; Computer vision; Image segmentation; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.23

Topics

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Infrared Target Detection Methodologies

Physical Sciences → Engineering → Aerospace Engineering

Related Documents

Multimodal Fusion Methods with Vision Transformers for Remote Sensing Semantic Segmentation

Self-supervised vision transformers for semantic segmentation

Semantic segmentation using Vision Transformers: A survey

Training Vision Transformers for Semi-Supervised Semantic Segmentation

Vision Transformers: From Semantic Segmentation to Dense Prediction