Recommendation systems play an increasingly important role in our lives, helping people quickly retrieve valuable information from massive datasets. Previous recommendation systems have achieved significant success by incorporating multimodal data and deep learning techniques. However, current multimodal recommendation algorithms rely primarily on officially provided content, often overlooking the multimodal data contributed by users. Incorporating user-generated multimodal data, which reflects the users' own perspective, can further improve recommendation accuracy. This paper introduces a novel recommender algorithm, the Multi-Modal and Multi-View Knowledge Graph Attention Network (3MVGAT), which leverages both multi-modal and multi-view data. In the item encoder, we unify and fuse images and text from different views to learn item representations, and we apply an attention mechanism in this encoder to select the modalities essential to item representation learning. In the user encoder, we learn user representations from users' historical interactions with items, again applying an attention mechanism to capture user preferences. Extensive experiments on real-world datasets demonstrate that our approach effectively improves recommendation performance.
Lei Chen, Jie Cao, Youquan Wang, Weichao Liang, Guixiang Zhu
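As an illustration of the attention-based design the abstract describes, the following is a minimal PyTorch sketch of (i) an item encoder that fuses image and text embeddings with modality-level attention and (ii) a user encoder that attends over the embeddings of historically interacted items. All module names, shapes, and dimensions here are illustrative assumptions, not the paper's actual 3MVGAT architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAttentionFusion(nn.Module):
    # Item encoder sketch: fuses per-modality embeddings (e.g., image and
    # text) into one item vector using learned attention weights, so the
    # model can emphasize the modality most informative for each item.
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per modality

    def forward(self, modality_embs: torch.Tensor) -> torch.Tensor:
        # modality_embs: (batch, num_modalities, dim)
        weights = F.softmax(self.score(modality_embs), dim=1)  # (batch, M, 1)
        return (weights * modality_embs).sum(dim=1)            # (batch, dim)

class HistoryAttentionUserEncoder(nn.Module):
    # User encoder sketch: aggregates the fused embeddings of a user's
    # historically interacted items, with attention weights standing in
    # for the preference-capturing mechanism the abstract mentions.
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, item_embs: torch.Tensor) -> torch.Tensor:
        # item_embs: (batch, history_len, dim)
        weights = F.softmax(self.score(item_embs), dim=1)
        return (weights * item_embs).sum(dim=1)

# Usage with random tensors (dimensions are arbitrary):
fusion = ModalityAttentionFusion(dim=64)
user_enc = HistoryAttentionUserEncoder(dim=64)
item_vec = fusion(torch.randn(8, 2, 64))     # image + text embedding per item
user_vec = user_enc(torch.randn(8, 10, 64))  # 10 past items per user
score = (user_vec * item_vec).sum(dim=-1)    # dot-product relevance score

The dot-product scoring at the end is a common default for matching user and item representations; the paper may well use a different interaction function.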