Measuring Similarity of Dual-Modal Academic Data Based on Multi-Fusion Representation Learning

Li Zhang; Qiang Gao; Ming Liu; Zepeng Gu; Bo Lang

doi:10.1109/access.2024.3427731

Year: 2024; Journal: IEEE Access; Vol: 12; Pages: 97701-97711; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Nowadays, academic materials such as articles, patents, lecture notes, and observation records often use both text and images (i.e., dual-modal data) to illustrate scientific issues. Measuring the similarity of such dual-modal academic data depends largely on the quality of the dual-modal features, which remains far from satisfactory in practice. To learn dual-modal feature representations, most current approaches mine interactions between texts and images on top of their fusion networks. This work proposes a multi-fusion deep learning framework that learns semantically richer dual-modal representations. The framework places multiple fusion points in feature spaces at different levels and gradually integrates the fused information from low-level to high-level features. In addition, we develop a multi-channel decoding network with alternating fine-tuning strategies to thoroughly mine both modality-specific features and cross-modal correlations. To our knowledge, this is the first work to apply deep learning to dual-modal academic data. The method reduces the semantic and statistical-attribute differences between the two modalities and thereby learns robust representations. Extensive experiments on real-world datasets show that our method significantly outperforms state-of-the-art approaches.
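The abstract does not give the architecture's details, so the following is only a minimal, hypothetical Python sketch of two ideas it names: fusion points at several feature levels whose outputs are integrated progressively from low to high level, and a similarity score between the resulting dual-modal representations. The fixed element-wise mean at each fusion point, the `carry` weight, the shared per-level dimension, and cosine similarity as the measure are all assumptions standing in for the paper's learned networks; the function names `fuse_pair`, `progressive_fusion`, and `cosine_similarity` are illustrative and do not come from the paper. The multi-channel decoder and alternating fine-tuning are not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse_pair(text_vec: Sequence[float], image_vec: Sequence[float]) -> Vector:
    """Hypothetical fusion point: element-wise mean of text and image features.

    In a trained model this fixed rule would be a learned fusion layer.
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image features must have the same dimension")
    return [(t + v) / 2.0 for t, v in zip(text_vec, image_vec)]


def progressive_fusion(
    text_levels: Sequence[Sequence[float]],
    image_levels: Sequence[Sequence[float]],
    carry: float = 0.5,
) -> Vector:
    """Fuse at every level, integrating results from the lowest level upward.

    `carry` (hypothetical) is the share of the lower-level fused state passed
    up to the next level; the rest comes from that level's fusion point.
    Assumes all levels share one dimension (a real network would project).
    """
    if not text_levels or len(text_levels) != len(image_levels):
        raise ValueError("both modalities need the same non-zero number of levels")
    state = fuse_pair(text_levels[0], image_levels[0])
    for text_vec, image_vec in zip(text_levels[1:], image_levels[1:]):
        fused = fuse_pair(text_vec, image_vec)
        if len(fused) != len(state):
            raise ValueError("all levels must share one dimension in this sketch")
        state = [carry * s + (1.0 - carry) * f for s, f in zip(state, fused)]
    return state


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two toy documents, each with a low- and a high-level feature per modality.
    doc_a = progressive_fusion([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    doc_b = progressive_fusion([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]])
    print(round(cosine_similarity(doc_a, doc_b), 4))  # prints 0.8944
```

In the toy example the two documents have identical text features and differ only in their low-level image feature, so their fused representations remain close. Swapping `fuse_pair` for a learned layer, or cosine for another metric, would leave the low-to-high integration loop unchanged.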

Keywords:

Computer science; Similarity; Representation; Artificial intelligence; Modalities; Feature learning; Sensor fusion; Machine learning; Data mining; Pattern recognition; Image

Topics

Multimodal Machine Learning Applications (Computer Vision and Pattern Recognition)

Advanced Image and Video Retrieval Techniques (Computer Vision and Pattern Recognition)

Domain Adaptation and Few-Shot Learning (Artificial Intelligence)

Related Documents

Academic Prediction in Multi-modal Learning Environments Using Data Fusion

QuatFuse: Quaternion-based orthogonal representation learning for multi-modal image fusion

Dual-Discriminator Based Multi-modal Medical Fusion

Learning Multi-modal Similarity
