JOURNAL ARTICLE

Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition

Abstract

RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide range of applications. Although much research has been conducted, most work has used hand-crafted features, which struggle to capture high-level semantic structures. Recently, features extracted from deep convolutional neural networks (CNNs) have produced state-of-the-art results for various computer vision tasks, which has inspired researchers to explore incorporating CNN-learned features for RGBD scene understanding. However, most existing work combines RGB and depth features without adequately exploiting the consistency and complementary information between them. Inspired by recent work on RGBD object recognition using multi-modal feature fusion, we introduce, for the first time, a novel discriminative multi-modal fusion framework for RGBD scene recognition that simultaneously considers the inter- and intra-modality correlation for all samples while regularizing the learned features to be discriminative and compact. The results from the multimodal layer can be back-propagated to the lower CNN layers, so the parameters of the CNN layers and multimodal layers are updated iteratively until convergence. Experiments on the recently proposed large-scale SUN RGB-D dataset show that our method achieves state-of-the-art results without any image segmentation.
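
The abstract names three ingredients of the fusion objective: agreement between the RGB and depth modalities, compactness of the learned features, and discriminativeness across scene classes. The abstract does not give the actual formulation, so the following is only a minimal sketch of how such terms could be combined. The function name `fusion_loss`, the concatenation-based fusion, the class-center terms, and the `margin`, `alpha`, and `beta` parameters are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fusion_loss(rgb, depth, labels, margin=1.0, alpha=1.0, beta=1.0):
    """Toy fusion objective (illustrative assumption, not the paper's loss).

    rgb, depth: per-sample feature vectors (lists of floats), assumed to be
                already mapped into a shared space of equal dimension.
    labels:     per-sample scene labels.
    """
    # Consistency between modalities: paired RGB/depth features should agree.
    consistency = sum(sq_dist(r, d) for r, d in zip(rgb, depth)) / len(rgb)

    # Fuse by concatenation so complementary information is kept side by side.
    fused = [r + d for r, d in zip(rgb, depth)]

    # Group fused features by class and compute one center per class.
    by_class = defaultdict(list)
    for f, y in zip(fused, labels):
        by_class[y].append(f)
    centers = {y: mean_vec(fs) for y, fs in by_class.items()}

    # Compactness: fused features should stay close to their class center.
    compact = sum(sq_dist(f, centers[y]) for f, y in zip(fused, labels)) / len(fused)

    # Discriminativeness: penalize pairs of class centers whose squared
    # distance falls below `margin` (hinge penalty).
    classes = sorted(centers)
    separation = 0.0
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            separation += max(0.0, margin - sq_dist(centers[a], centers[b]))

    total = consistency + alpha * compact + beta * separation
    return {"consistency": consistency, "compact": compact,
            "separation": separation, "total": total}
```

In an end-to-end setting such as the one the abstract describes, the gradient of a loss of this kind would be propagated back into the CNN layers; the sketch above only evaluates the loss value and leaves out the optimization.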

Keywords:
Discriminative model, Computer science, Artificial intelligence, Convolutional neural network, RGB color model, Feature (linguistics), Pattern recognition (psychology), Modal, Computer vision, Segmentation, Deep learning, Feature extraction

Metrics

Cited By: 113
FWCI (Field Weighted Citation Impact): 10.87
Refs: 31
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Scene Recognition Based on Multi-feature Fusion for Indoor Robot

Xiaocheng Liu, Wei Hong, Huiqiu Lu

Lecture Notes in Computer Science Year: 2017 Pages: 160-169

JOURNAL ARTICLE

Multi-type and Multi-level Feature Fusion Network for RGBD Indoor Semantic Segmentation

Yuwen Xia, Chaochen Gu, Kaijie Wu

Journal: 2022 34th Chinese Control and Decision Conference (CCDC) Year: 2022 Pages: 6142-6148

JOURNAL ARTICLE

Indoor scene classification model based on multi-modal fusion

Yaning Wang, Weifeng Liu, Jianning Li, Zhangming Peng

Journal: 2021 International Conference on Control, Automation and Information Sciences (ICCAIS) Year: 2021 Pages: 88-93