Xinjie Hao, Jiahui Wang, Wei Leng, Rongting Zhang, Guangyun Zhang
The semantic segmentation of textured 3D meshes is a critical step in constructing city-scale realistic 3D models. Compared to colored point clouds, textured 3D meshes have the advantage of high-resolution texture image patches embedded in each mesh face. However, existing studies predominantly focus on geometric structure, making limited use of these high-resolution textures. Inspired by human binocular perception, this paper proposes a multimodal feature fusion network that combines 3D geometric structure with 2D high-resolution texture images for the semantic segmentation of textured 3D meshes. Methodologically, the 3D feature extraction branch computes the centroid coordinates and face normals of mesh faces as initial 3D features, followed by a multi-scale Transformer network that extracts high-level 3D features. The 2D feature extraction branch employs orthographic views of city scenes captured from a top-down perspective and uses a U-Net to extract high-level 2D features. To align features across the 2D and 3D modalities, a bridge-view-based alignment algorithm is proposed, which renders 3D mesh face indices to establish pixel-level associations with the orthographic views, achieving precise alignment of the multimodal features. Experimental results demonstrate that the proposed method achieves competitive performance in city-scale textured 3D mesh semantic segmentation, validating the effectiveness and potential of the cross-modal fusion strategy.
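The two concrete steps in the abstract, the initial 3D features and the bridge-view association, can be illustrated with a short sketch. The following is a minimal, hypothetical NumPy sketch, not the authors' code: `initial_3d_features` computes per-face centroids and unit normals, and `gather_2d_features_per_face` pools a U-Net feature map onto mesh faces via a rendered face-index map. All function names, the mean-pooling choice, and the `-1` background convention are assumptions made here for illustration.

```python
import numpy as np

def initial_3d_features(vertices, faces):
    """Per-face centroid coordinates and unit normals as initial 3D features.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Returns an (F, 6) array of [centroid_xyz, normal_xyz] per face.
    """
    tri = vertices[faces]                       # (F, 3, 3) triangle corner coordinates
    centroids = tri.mean(axis=1)                # (F, 3) face centroids
    normals = np.cross(tri[:, 1] - tri[:, 0],   # (F, 3) unnormalized face normals
                       tri[:, 2] - tri[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-12
    return np.concatenate([centroids, normals], axis=1)

def gather_2d_features_per_face(face_index_map, feat_2d, num_faces):
    """Pool 2D features onto mesh faces via a rendered face-index map (assumed mean pooling).

    face_index_map: (H, W) int array from rendering face indices into the
    orthographic view (-1 marks background, an assumption of this sketch).
    feat_2d: (H, W, C) U-Net feature map for the same view.
    Returns (num_faces, C) pooled 2D features; unseen faces get zeros.
    """
    h, w, c = feat_2d.shape
    idx = face_index_map.ravel()
    valid = idx >= 0
    pooled = np.zeros((num_faces, c), dtype=feat_2d.dtype)
    counts = np.zeros(num_faces, dtype=np.int64)
    # Scatter-add each visible pixel's feature vector to its face, then average.
    np.add.at(pooled, idx[valid], feat_2d.reshape(-1, c)[valid])
    np.add.at(counts, idx[valid], 1)
    seen = counts > 0
    pooled[seen] /= counts[seen][:, None]
    return pooled
```

Pooling by averaging all pixels that see a face is one plausible reading of the pixel-level association; the paper's fusion network could equally consume per-pixel features directly, so this detail should be treated as an assumption.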
Grégoire Grzeczkowicz, Bruno Vallet
Mohammad Rouhani, Florent Lafarge, Pierre Alliez
Florian Fervers, Timo Breuer, Gregor Stachowiak, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens
Hon Yu, Qingsong Yan, Teng Xiao, Fei Deng