JOURNAL ARTICLE

Scalable Multimodal Learning and Multimedia Recommendation

Abstract

Driven by the rapid growth of multimedia big data, multimodal learning (especially multimodal deep learning) has gained significant importance and achieved great success in various multimedia computing applications. The complexity and scale of modern multimedia recommender systems often require much more sophisticated statistical models, learning architectures, and data processing algorithms than ever before to support effective and efficient content understanding and analysis. In this work, we discuss several major research challenges for future multimedia recommender systems supported by advanced multimodal learning. We also 1) explain why multimodal learning is important for large-scale multimedia recommendation, 2) review various limitations of the current generation of learning models and architectures, and 3) review key challenges and technical issues in developing and evaluating modern multimedia recommender systems with multimodal learning under different contexts. We hope that our discussion and predictions provide an impetus for further research in this important direction.

Keywords:
Computer science, Multimedia, Scalability, Human–computer interaction, World Wide Web, Database

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.77
Refs: 31
Citation Normalized Percentile: 0.74

Topics

Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence
Wikis in Education and Collaboration
Social Sciences →  Social Sciences →  Communication