Driven by the rapid growth of multimedia big data, multimodal learning (especially multimodal deep learning) has gained significant importance and achieved great success in various multimedia computing applications. The complexity and scale of modern multimedia recommender systems often require far more sophisticated statistical models, learning architectures, and data processing algorithms than ever before to support effective and efficient content understanding and analysis. In this work, we discuss several major research challenges for future multimedia recommender systems supported by advanced multimodal learning. We also 1) explain why multimodal learning is important for large-scale multimedia recommendation, 2) review the limitations of the current generation of learning models and architectures, and 3) review key challenges and technical issues in developing and evaluating modern multimedia recommender systems with multimodal learning under different contexts. We hope that our discussion and predictions provide an impetus for further research in this important direction.
Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, Felice Antonio Merra, Tommaso Di Noia, Eugenio Di Sciascio
Shuaiyang Li, Dan Guo, Kang Liu, Richang Hong, Feng Xue
Kang Liu, Feng Xue, Dan Guo, Peijie Sun, Shengsheng Qian, Richang Hong
T. Y. Fu, Jiang Cao, Z. Wang, Min Yang