JOURNAL ARTICLE

Cross-Modal Matching and Adaptive Graph Attention Network for RGB-D Scene Recognition

Abstract

Despite significant advances in RGB-D scene recognition, several major limitations need further investigation. For example, simply extracting modal-specific features neglects the complex relationships among features from multiple modalities. Moreover, most existing methods do not consider cross-modal features. To address these concerns, we propose to integrate the tasks of cross-modal matching and modal-specific recognition in a network termed the Matching-to-Recognition Network (MRNet). Specifically, the cross-modal matching network enhances the descriptive power of the recognition network via a layer-wise semantic loss. The recognition network obtains multi-modal features from a two-stream CNN: global features are taken from a higher layer of the CNN to preserve semantic content, and local layout features are learned by a graph attention network, which better captures the key object regions and models their relationships. Extensive experimental results demonstrate that MRNet achieves superior performance to state-of-the-art methods, especially for recognition based solely on a single modality. © 2023 IEEE.
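
The abstract does not specify how the graph attention network over object regions is built. As a rough illustration only, the following sketch shows a generic single-head graph attention layer in the style of the standard GAT formulation, written in plain Python. The function names (`graph_attention_layer`, `matvec`, `leaky_relu`), the region features, the adjacency matrix, and the weights are all hypothetical and are not taken from MRNet.

```python
import math


def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity applied to a raw attention score."""
    return x if x > 0 else slope * x


def graph_attention_layer(features, adjacency, W, a):
    """Generic single-head graph attention layer (illustrative, not MRNet's).

    features:  list of N node vectors, e.g. one per detected object region
    adjacency: N x N 0/1 matrix; adjacency[i][j] == 1 means j neighbours i
    W:         projection matrix of shape (out_dim, in_dim)
    a:         attention vector of length 2 * out_dim
    """
    projected = [matvec(W, h) for h in features]
    n = len(features)
    outputs = []
    for i in range(n):
        # Each node attends to its neighbours and to itself (self-loop).
        neighbours = [j for j in range(n) if adjacency[i][j] or j == i]
        # Raw score e_ij = LeakyReLU(a . [W h_i || W h_j]).
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, projected[i] + projected[j])))
            for j in neighbours
        ]
        # Numerically stable softmax over the neighbourhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New node feature: attention-weighted sum of projected neighbours.
        out_dim = len(projected[i])
        outputs.append([
            sum(alpha * projected[j][d] for alpha, j in zip(alphas, neighbours))
            for d in range(out_dim)
        ])
    return outputs


# Hypothetical toy input: three "regions" connected in a chain 0 - 1 - 2.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_attention = [0.0, 0.0, 0.0, 0.0]  # all scores equal, so attention is uniform
updated = graph_attention_layer(regions, chain, identity, zero_attention)
```

With the zero attention vector used here, every neighbour receives equal weight, so each updated feature is simply the mean of the node and its neighbours. A learned attention vector would instead emphasise the regions most relevant to the scene category.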

Keywords:
Computer science; Modal; Artificial intelligence; Matching (statistics); Pattern recognition (psychology); Modality (human–computer interaction); Graph; Key (lock); Cognitive neuroscience of visual object recognition; RGB color model; Feature extraction; Theoretical computer science; Mathematics

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.36
Refs: 22
Citation Normalized Percentile: 0.51

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

ACM: Adaptive Cross-Modal Graph Convolutional Neural Networks for RGB-D Scene Recognition

Yuan Yuan, Zhitong Xiong, Qi Wang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Year: 2019; Vol: 33 (01); Pages: 9176-9184

JOURNAL ARTICLE

Cross-Modal Pyramid Translation for RGB-D Scene Recognition

Dapeng Du, Limin Wang, Zhaoyang Li, Gangshan Wu

Journal: International Journal of Computer Vision; Year: 2021; Vol: 129 (8); Pages: 2309-2327

JOURNAL ARTICLE

Cross-modal attention fusion network for RGB-D semantic segmentation

Qiankun Zhao, Yingcai Wan, Jiqian Xu, Lijin Fang

Journal: Neurocomputing; Year: 2023; Vol: 548; Pages: 126389-126389

JOURNAL ARTICLE

Cross-Modal Adaptive Interaction Network for RGB-D Saliency Detection

Qinsheng Du, Yingxu Bian, Jianyu Wu, Shiyan Zhang, Jian Zhao

Journal: Applied Sciences; Year: 2024; Vol: 14 (17); Pages: 7440-7440

JOURNAL ARTICLE

CACFNet: Cross-Modal Attention Cascaded Fusion Network for RGB-T Urban Scene Parsing

Wujie Zhou, Shaohua Dong, Meixin Fang, Lu Yu

Journal: IEEE Transactions on Intelligent Vehicles; Year: 2023; Vol: 9 (1); Pages: 1919-1929