JOURNAL ARTICLE

Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Siqi Li, Changqing Zou, Yipeng Li, Xibin Zhao, Yue Gao

Year: 2020
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 34 (07)
Pages: 11402-11409
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

This paper presents an end-to-end 3D convolutional network, the attention-based multi-modal fusion network (AMFNet), for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Unlike previous methods, which use only the semantic features extracted from RGB-D images, AMFNet learns to perform 3D scene completion and semantic segmentation simultaneously by leveraging both the experience of inferring 2D semantic segmentation from RGB-D images and the reliable depth cues in the spatial dimension. This is achieved with a multi-modal fusion architecture boosted by 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, where it achieves gains of 2.5% and 2.6%, respectively, over the state-of-the-art method.
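
The abstract does not describe the internals of the residual attention blocks. Purely as an illustration of the general idea, the Python sketch below applies one common form of residual attention to a voxel feature volume: each channel is scaled by a gate and the result is added back to the input, so the output is x * (1 + gate). The gate used here (a sigmoid of the channel mean), the [C][D][H][W] nested-list layout, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function, used here as a hypothetical attention gate."""
    return 1.0 / (1.0 + math.exp(-z))


def channel_means(volume):
    """Global average of each channel in a [C][D][H][W] nested-list volume."""
    means = []
    for channel in volume:
        values = [v for plane in channel for row in plane for v in row]
        means.append(sum(values) / len(values))
    return means


def residual_attention(volume):
    """Return x + x * gate per channel, i.e. x * (1 + gate).

    The gate is sigmoid(mean of the channel). This is an assumed,
    parameter-free stand-in for a learned attention branch.
    """
    gates = [sigmoid(m) for m in channel_means(volume)]
    return [
        [[[v * (1.0 + g) for v in row] for row in plane] for plane in channel]
        for channel, g in zip(volume, gates)
    ]


# Toy feature volume: 2 channels, depth 1, height 2, width 2.
features = [
    [[[0.0, 0.0], [0.0, 0.0]]],
    [[[1.0, 2.0], [3.0, 4.0]]],
]
refined = residual_attention(features)
```

Because the gated branch is added to the input rather than replacing it, a gate near zero leaves the features close to unchanged instead of suppressing them. That is the usual motivation for the residual form.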

Keywords:
Computer science; Artificial intelligence; Segmentation; RGB color model; Convolutional neural network; Pattern recognition (psychology); Modal; Semantics (computer science); Computer vision; Categorization; Residual; Dimension (graph theory); Fusion; Task (project management)

Metrics

Cited By: 51
FWCI (Field Weighted Citation Impact): 2.61
Refs: 41
Citation Normalized Percentile: 0.92

Topics

Advanced Vision and Imaging
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Image Processing Techniques and Applications
Physical Sciences → Engineering → Media Technology
Advanced Image Processing Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
