JOURNAL ARTICLE

CMPFFNet: Cross-Modal and Progressive Feature Fusion Network for RGB-D Indoor Scene Semantic Segmentation

Wujie Zhou, Yuxiang Xiao, Weiqing Yan, Lu Yu

Journal: IEEE Transactions on Automation Science and Engineering  Year: 2023  Vol: 21 (4)  Pages: 5523-5533  Publisher: Institute of Electrical and Electronics Engineers

Abstract

Depth information can contribute to the semantic segmentation of scenes from red–green–blue (RGB) images; RGB-depth (RGB-D) images therefore provide significantly more information for this task than RGB images alone. However, the RGB and depth modalities differ in how they represent objects, so extracting features from these modalities and fusing them effectively is key to scene semantic segmentation. In addition, complete segmentation requires the fusion of multiscale features to unify global information, whereas existing approaches primarily integrate multiscale features sequentially. This study introduces a cross-modal and progressive feature fusion network (CMPFFNet) for semantic segmentation of indoor scenes in RGB-D images. First, a multimodal adaptive alignment fusion (MAAF) module based on an attention mechanism is introduced. This module aligns the two modal channels through additive attention and then computes the spatial similarity between the two modalities via a dot product to incorporate the complementary information of the depth modality into the RGB modality. In addition, a reverse attention augmentation (RAA) module augments the more abstract high-level features of two adjacent multilevel features using the concrete semantic information of the lower-level features. After the extracted multilevel features are augmented, a multilevel feature progressive fusion (MFPF) module sequentially fuses the neighboring features in a progressive manner, with emphasis on spatial semantics. To enhance segmentation capability, the network adopts SegFormer, which performs strongly in multiple computer vision tasks, as its backbone. Experimental results on two publicly available indoor-scene datasets show that the proposed CMPFFNet outperforms existing models in semantic segmentation of indoor scenes from RGB-D images.
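As a rough illustration only, not the authors' implementation, the MAAF idea described above (channel alignment via additive attention, then dot-product spatial similarity to inject depth information into the RGB stream) can be sketched in NumPy. The function name, the global-average pooling, and the sigmoid gating below are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def maaf_fuse(rgb, depth):
    """Hypothetical sketch of MAAF-style cross-modal fusion.

    rgb, depth: (C, H, W) feature maps from the two encoder streams.
    Step 1: channel alignment from the additive combination of the modalities.
    Step 2: spatial similarity via a per-pixel dot product across channels.
    """
    # channel attention: global average pool of the summed features (assumed)
    chan = sigmoid((rgb + depth).mean(axis=(1, 2)))                  # (C,)
    depth_aligned = depth * chan[:, None, None]                      # re-weight depth channels
    # spatial similarity: per-pixel dot product between the modalities
    sim = sigmoid((rgb * depth_aligned).sum(axis=0, keepdims=True))  # (1, H, W)
    # inject complementary depth information into the RGB modality
    return rgb + sim * depth_aligned
```

The gate `sim` is large where the two modalities agree spatially, so depth evidence is added most strongly in regions where it is consistent with the RGB features.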
Note to Practitioners —This study introduces a cross-modal and progressive feature fusion network (CMPFFNet) for indoor scene semantic segmentation in RGB-D images. The complementary information of the depth modality is incorporated into the RGB modality in both channel and spatial forms to produce a discriminative representation that eases segmentation. A multilevel feature aggregation decoder is proposed to predict the scene semantic segmentation results. To enhance segmentation capability, the network adopts SegFormer, which performs strongly in multiple computer vision tasks, as its backbone.
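The progressive multilevel aggregation in the decoder (MFPF) can likewise be sketched, again purely as an assumed illustration: adjacent feature levels are fused pairwise from the deepest (coarsest) level to the shallowest, assuming equal channel counts and a 2x spatial ratio between levels:

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour 2x upsampling along the spatial axes of a (C, H, W) map
    return x.repeat(2, axis=1).repeat(2, axis=2)

def progressive_fuse(features):
    """Hypothetical sketch of MFPF-style progressive fusion.

    features: list of (C, H, W) maps ordered shallow -> deep,
    each level half the spatial size of the previous one.
    Neighbouring levels are merged sequentially, coarsest first.
    """
    fused = features[-1]
    for feat in reversed(features[:-1]):
        # element-wise merge of two neighbouring levels (assumed additive)
        fused = feat + upsample2x(fused)
    return fused
```

A learned fusion block would replace the plain addition, but the sequential neighbour-by-neighbour structure is the point being illustrated.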

Keywords:
RGB color model, Computer science, Artificial intelligence, Segmentation, Computer vision, Modality (human–computer interaction), Semantics, Pattern recognition, Image segmentation, Feature extraction

Metrics

Cited By: 20
FWCI (Field Weighted Citation Impact): 3.64
Refs: 51
Citation Normalized Percentile: 0.92 (top 10%)

Topics

Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Video Surveillance and Tracking Methods (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Related Documents

JOURNAL ARTICLE

CFANet: The Cross-Modal Fusion Attention Network for Indoor RGB-D Semantic Segmentation

Longtao Wu, Dan Wei, Chang‐An Xu

Journal: Journal of Imaging  Year: 2025  Vol: 11 (6)  Pages: 177
JOURNAL ARTICLE

Cross-modal attention fusion network for RGB-D semantic segmentation

Qiankun Zhao, Yingcai Wan, Jiqian Xu, Lijin Fang

Journal: Neurocomputing  Year: 2023  Vol: 548  Pages: 126389
JOURNAL ARTICLE

DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation

Jianzhong Yuan, Wujie Zhou, Ting Luo

Journal: IEEE Access  Year: 2019  Vol: 7  Pages: 169350-169358
JOURNAL ARTICLE

FGMNet: Feature grouping mechanism network for RGB-D indoor scene semantic segmentation

Yuming Zhang, Wujie Zhou, Lv Ye, Lu Yu, Ting Luo

Journal: Digital Signal Processing  Year: 2024  Vol: 149  Pages: 104480