RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey

Mingliang Gao; Jun Jiang; Guofeng Zou; Vijay John; Zheng Liu

doi:10.1109/access.2019.2907071

JOURNAL ARTICLE

Year: 2019; Journal: IEEE Access; Vol: 7; Pages: 43110-43136; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Object recognition in real-world environments is a fundamental task in the computer vision and robotics communities. With advanced sensing technologies and low-cost depth sensors, high-quality RGB and depth images can be recorded synchronously, and object recognition performance can be improved by jointly exploiting them. RGB-D-based object recognition has evolved from early methods that used hand-crafted representations to the current state-of-the-art deep learning-based methods. With the undeniable success of deep learning, especially convolutional neural networks (CNNs), in the visual domain, the natural progression of deep learning research points to problems involving larger and more complex multimodal data. In this paper, we provide a comprehensive survey of recent approaches based on multimodal CNNs (MMCNNs) that have demonstrated significant improvements over previous methods. We highlight two key issues, namely, training data deficiency and multimodal fusion. In addition, we summarize and discuss the publicly available RGB-D object recognition datasets and present a comparative performance evaluation of the proposed methods on these benchmark datasets. Finally, we identify promising avenues of research in this rapidly evolving field. This survey will not only give researchers a good overview of the state-of-the-art methods for RGB-D-based object recognition but also provide a reference for other multimodal machine learning applications, e.g., multimodal medical image fusion, audio-visual speech recognition, and multimedia retrieval and generation.

Keywords:

Computer science; Artificial intelligence; Convolutional neural network; Deep learning; Benchmark (surveying); Machine learning; RGB color model; Cognitive neuroscience of visual object recognition; 3D single-object recognition; Field (mathematics); Key (lock); Sketch recognition; Object (grammar); Computer vision; Gesture recognition

Metrics

FWCI (Field Weighted Citation Impact): 3.63

Refs: 243

Citation Normalized Percentile: 0.94

Topics

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
