RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning

Zhitong Xiong; Yuan Yuan; Qi Wang

doi:10.1109/access.2019.2932080

JOURNAL ARTICLE

Year: 2019 Journal: IEEE Access Vol: 7 Pages: 106739-106747 Publisher: Institute of Electrical and Electronics Engineers

Abstract

RGB-D image-based scene recognition has improved significantly with the development of deep learning methods. While convolutional neural networks (CNNs) can learn high-level semantic features for object recognition, these methods still have limitations for RGB-D scene classification. One limitation is that learning better multi-modal features for RGB-D scene recognition remains an open problem. Another is that scene images are usually not object-centric and exhibit great spatial variability. Thus, vanilla full-image CNN features may not be optimal for scene recognition. Considering these problems, in this paper we propose a compact and effective framework for RGB-D scene recognition. Specifically, we make the following contributions: 1) a novel RGB-D scene recognition framework is proposed to explicitly learn the global modal-specific and local modal-consistent features simultaneously. Unlike existing approaches, local CNN features are considered for the learning of modal-consistent representations; 2) a Key Feature Selection (KFS) module is designed, which can adaptively select important local features from the high-level semantic CNN feature maps. It is more efficient and effective than methods based on object detection or dense patch sampling; and 3) a triplet correlation loss and a spatial-attention similarity loss are proposed for training the KFS module. Under the supervision of the proposed loss functions, the network can learn important local features of the two modalities without extra annotations. Finally, by concatenating the global and local features, the proposed framework achieves new state-of-the-art scene recognition performance on the SUN RGB-D dataset and the NYU Depth version 2 (NYUD v2) dataset.
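The fusion step described in the abstract (select a few important local features from each modality's feature map, then concatenate them with a global feature) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the KFS module scores locations, so the L2 norm below is a hypothetical stand-in for a learned importance score, average pooling is an assumed choice of global feature, and all function names and toy values are made up for illustration. The sketch also selects locations independently per modality, whereas the paper trains the selection with dedicated losses.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(x * x for x in vec))


def flatten(feature_map):
    """Turn an H x W x C nested list into a flat list of C-dim local features."""
    return [cell for row in feature_map for cell in row]


def global_feature(feature_map):
    """Average-pool all spatial locations into one C-dim vector (assumed pooling)."""
    cells = flatten(feature_map)
    channels = len(cells[0])
    return [sum(c[i] for c in cells) / len(cells) for i in range(channels)]


def select_key_features(feature_map, k):
    """Return the k local features with the largest L2 norm.

    The norm is only a placeholder for a learned importance score.
    """
    ranked = sorted(flatten(feature_map), key=l2_norm, reverse=True)
    return ranked[:k]


def fuse(rgb_map, depth_map, k=2):
    """Concatenate, per modality, the global feature and k selected local features."""
    fused = []
    for fmap in (rgb_map, depth_map):
        fused.extend(global_feature(fmap))
        for local in select_key_features(fmap, k):
            fused.extend(local)
    return fused


# Toy 2 x 2 feature maps with 3 channels each (made-up numbers).
rgb_map = [[[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
           [[0.0, 0.0, 3.0], [0.5, 0.5, 0.5]]]
depth_map = [[[0.1, 0.1, 0.1], [4.0, 0.0, 0.0]],
             [[0.0, 1.0, 0.0], [0.0, 0.0, 0.2]]]

descriptor = fuse(rgb_map, depth_map, k=2)
print(len(descriptor))
```

With C channels and k selected locations, each modality contributes C + k*C values, so the fused descriptor grows linearly in k; a classifier would then operate on this concatenated vector.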

Keywords:

Computer science; Artificial intelligence; RGB color model; Convolutional neural network; Feature (linguistics); Pattern recognition (psychology); Computer vision; Modal; Cognitive neuroscience of visual object recognition; Deep learning; Feature learning; Object (grammar); Similarity (geometry); Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 1.60

Citation Normalized Percentile: 0.86

Topics

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Robotics and Sensor-Based Localization

Physical Sciences → Engineering → Aerospace Engineering

Related Documents

RGB-D Scene Classification via Multi-modal Feature Learning

Multi-modal Unsupervised Feature Learning for RGB-D Scene Labeling

MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition

Multi-modal feature fusion for action recognition in RGB-D sequences

Multi-modal deep feature learning for RGB-D object detection