Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Siqi Li; Changqing Zou; Yipeng Li; Xibin Zhao; Yue Gao

doi:10.1609/aaai.v34i07.6803

JOURNAL ARTICLE

Year: 2020; Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Vol: 34 (07); Pages: 11402-11409; Publisher: Association for the Advancement of Artificial Intelligence

Abstract

This paper presents an end-to-end 3D convolutional network, the attention-based multi-modal fusion network (AMFNet), for semantic scene completion (SSC): inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Whereas previous methods use only the semantic features extracted from RGB-D images, AMFNet learns to perform 3D scene completion and semantic segmentation simultaneously by leveraging both the experience of inferring 2D semantic segmentation from RGB-D images and the reliable depth cues in the spatial dimension. This is achieved with a multi-modal fusion architecture boosted by 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate the method on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, where it achieves gains of 2.5% and 2.6%, respectively, over the state-of-the-art method.
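As a rough illustration of the two ideas named in the abstract (fusing features from two modalities, then refining them with a residual attention block), the following is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the block design, so the sketch assumes a generic squeeze-and-excitation style channel gate with a residual connection and an element-wise sum for fusion. The function names, weight shapes, and reduction ratio `r` are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_modalities(feat_semantic, feat_depth):
    # Assumption: fusion by element-wise sum of two aligned feature volumes.
    # The abstract does not say which fusion operator AMFNet uses.
    return feat_semantic + feat_depth

def residual_channel_attention(features, w_reduce, w_expand):
    """Apply a channel gate with a residual connection.

    features: array of shape (C, D, H, W), a 3D feature volume.
    w_reduce: array of shape (C // r, C), hypothetical bottleneck weights.
    w_expand: array of shape (C, C // r), hypothetical expansion weights.
    """
    # Squeeze: average over the three spatial axes, one value per channel.
    pooled = features.mean(axis=(1, 2, 3))
    # Excite: a ReLU bottleneck followed by a sigmoid gives gates in (0, 1).
    hidden = np.maximum(w_reduce @ pooled, 0.0)
    gate = sigmoid(w_expand @ hidden)
    # Residual connection: the input plus its channel-gated copy.
    return features + features * gate[:, None, None, None]

# Usage with random stand-in features (no real data involved).
rng = np.random.default_rng(0)
C, D, H, W, r = 8, 4, 4, 4, 2
semantic = rng.standard_normal((C, D, H, W))
depth = rng.standard_normal((C, D, H, W))
w_reduce = rng.standard_normal((C // r, C)) * 0.1
w_expand = rng.standard_normal((C, C // r)) * 0.1

fused = fuse_modalities(semantic, depth)
out = residual_channel_attention(fused, w_reduce, w_expand)
```

The residual path means the block can only rescale each channel between 1x and 2x of its input in this sketch, which keeps the original signal intact even when the gate is poorly trained.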

Keywords:

Computer science; Artificial intelligence; Segmentation; RGB color model; Convolutional neural network; Pattern recognition (psychology); Modal; Semantics (computer science); Computer vision; Categorization; Residual; Dimension (graph theory); Fusion; Task (project management)

Metrics

FWCI (Field Weighted Citation Impact): 2.61

Citation Normalized Percentile: 0.92

Topics

Advanced Vision and Imaging

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Advanced Image Processing Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Multi-modal fusion architecture search for camera-based semantic scene completion

Multi-Head Multi-Scale Feature Fusion Network for Semantic Scene Completion

TwinAMFNet: Twin Attention-based Multi-modal Fusion Network for 3D Semantic Segmentation

FFNet: Frequency Fusion Network for Semantic Scene Completion

Semantic Scene Completion through Multi-Level Feature Fusion