JOURNAL ARTICLE

Measuring Similarity of Dual-Modal Academic Data Based on Multi-Fusion Representation Learning

Li Zhang, Qiang Gao, Ming Liu, Zepeng Gu, Bo Lang

Year: 2024 Journal: IEEE Access Vol: 12 Pages: 97701-97711 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Academic materials such as articles, patents, lecture notes, and observation records often use both text and images (i.e., dual-modal data) to illustrate scientific issues. Measuring the similarity of such dual-modal academic data depends largely on the quality of the dual-modal features, which remains far from satisfactory in practice. To learn dual-modal feature representations, most current approaches mine interactions between text and images on top of their fusion networks. This work proposes a multi-fusion deep learning framework that learns semantically richer dual-modal representations. The framework places multiple fusion points in the feature spaces of different levels and gradually integrates the fusion information from the low level to the high level. In addition, we develop a multi-channel decoding network with alternate fine-tuning strategies to thoroughly mine modal-specific features and cross-modal correlations. To our knowledge, this is the first work to bring forward deep learning functions for dual-modal academic data. It reduces the semantic and statistical attribute differences between the two modalities, thereby learning robust representations. Extensive experiments on real-world data sets show that our method achieves significant performance gains compared with state-of-the-art approaches.
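The idea of fusing text and image features at several levels and carrying the fused information upward can be illustrated with a minimal sketch. This is not the authors' network: the function names, the toy feature vectors, and the fixed tanh fusion rule (standing in for learned fusion layers) are all assumptions made for illustration, and cosine similarity is assumed as the comparison measure.

```python
import math


def fuse_level(text_vec, image_vec, prev_fused=None):
    """Fuse one level of text and image features with the fusion from the level below."""
    if prev_fused is None:
        prev_fused = [0.0] * len(text_vec)
    # Hypothetical fusion rule: tanh of the element-wise sum stands in
    # for a learned fusion layer, which the abstract does not specify.
    return [math.tanh(t + i + p)
            for t, i, p in zip(text_vec, image_vec, prev_fused)]


def multi_fusion(text_levels, image_levels):
    """Integrate fusion information gradually from the lowest level to the highest."""
    fused = None
    for text_vec, image_vec in zip(text_levels, image_levels):
        fused = fuse_level(text_vec, image_vec, fused)
    return fused


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy per-level features (low level first, then high level); values are made up.
doc_a_text = [[0.2, 0.1, -0.3], [0.5, -0.2, 0.1]]
doc_a_image = [[0.1, 0.0, -0.1], [0.4, -0.1, 0.2]]
doc_b_text = [[0.3, 0.2, -0.2], [0.4, -0.3, 0.0]]
doc_b_image = [[0.0, 0.1, -0.2], [0.5, 0.0, 0.1]]

rep_a = multi_fusion(doc_a_text, doc_a_image)
rep_b = multi_fusion(doc_b_text, doc_b_image)
similarity = cosine_similarity(rep_a, rep_b)
print("similarity between document A and document B:", similarity)
```

In a trained model the per-level features would come from text and image encoders and the fusion rule would be learned; the sketch only shows how each level's fusion can feed the next before two documents are compared.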

Keywords:
Modal, Computer science, Dual (grammatical number), Similarity (geometry), Representation (politics), Artificial intelligence, Modalities, Feature (linguistics), Feature learning, Sensor fusion, Machine learning, External Data Representation, Data mining, Pattern recognition (psychology), Image (mathematics)

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 57
Citation Normalized Percentile: 0.11

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

BOOK-CHAPTER

Dual-Discriminator Based Multi-modal Medical Fusion

Haoran Wang, Zhen Hua, Jinjiang Li

Lecture Notes in Electrical Engineering Year: 2022 Pages: 1164-1172
JOURNAL ARTICLE

Learning Multi-modal Similarity

Brian McFee, Gert Lanckriet

Journal: arXiv (Cornell University) Year: 2010 Vol: 12 (15) Pages: 491-523
JOURNAL ARTICLE

Learning Multi-modal Similarity

Brian McFee, Gert Lanckriet

Journal: Journal of Machine Learning Research Year: 2011