JOURNAL ARTICLE

Self-Supervised Learning for Multimedia Recommendation

Zhulin Tao, Xiaohao Liu, Yewei Xia, Xiang Wang, Lifang Yang, Xianglin Huang, Tat-Seng Chua

Year: 2022 Journal: IEEE Transactions on Multimedia Vol: 25 Pages: 5107-5116 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Learning representations for multimedia content is critical for multimedia recommendation. Current representation learning methods fall roughly into two groups: (1) using historical interactions to create ID embeddings of users and items, and (2) treating multi-modal data as side information of items to enrich their ID embeddings. Each user-item interaction provides a supervisory signal for optimizing the representations under the traditional supervised learning paradigm. Because these methods overlook the multi-modal patterns hidden in the data (e.g., the co-occurrence of visual, acoustic, and textual features in the micro-videos a user has watched, together with her behavioral features), they fail to create powerful representations or achieve satisfactory recommendation accuracy. To capture multi-modal patterns in the data itself, we go beyond the supervised learning paradigm and incorporate the idea of self-supervised learning (SSL) into multimedia recommendation. Specifically, SSL consists of two components: (1) data augmentation upon multi-modal contents, where we design three operators, feature dropout (FD), feature masking (FM), and feature fine and coarse spaces (FAC), to generate multiple views of individual items; and (2) contrastive learning, which differentiates the views of an item from those of other items to distill additional supervisory signals. SSL thus enables us to explore and exhibit the underlying relations among modalities, resulting in more powerful representations. We denote the generic framework as Self-supervised Learning-guided Multimedia Recommendation (SLMRec). Extensive experiments on three real-world datasets show that SLMRec achieves significant improvements over several state-of-the-art baselines, such as LightGCN [1] and MMGCN [2]. Further analysis shows how SSL affects recommendation performance.
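The two SSL components named in the abstract, view generation by augmentation and contrastive learning over those views, can be illustrated with a minimal NumPy sketch. This is not the SLMRec implementation: the abstract does not define the operators precisely, so the function names, the drop rates, the temperature value, and the use of an InfoNCE-style loss are all assumptions made for illustration, and the FAC operator is not sketched at all.

```python
import numpy as np

def feature_dropout(x, rate, rng):
    """Assumed FD: zero each feature entry independently with probability `rate`."""
    keep = rng.random(x.shape) >= rate
    return x * keep

def feature_masking(x, rate, rng):
    """Assumed FM: zero the same randomly chosen feature dimensions for all items."""
    keep = rng.random(x.shape[1]) >= rate
    return x * keep  # the 1-D mask broadcasts over the item axis

def info_nce(view_a, view_b, temperature=0.2):
    """InfoNCE-style loss: row i of view_a is the positive for row i of view_b;
    every other row of view_b acts as a negative."""
    a = view_a / (np.linalg.norm(view_a, axis=1, keepdims=True) + 1e-12)
    b = view_b / (np.linalg.norm(view_b, axis=1, keepdims=True) + 1e-12)
    logits = a @ b.T / temperature                       # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
items = rng.normal(size=(8, 16))            # 8 hypothetical items, 16-dim features
view_fd = feature_dropout(items, 0.2, rng)  # first view of each item
view_fm = feature_masking(items, 0.2, rng)  # second view of each item
loss = info_nce(view_fd, view_fm)           # lower when matching views agree
```

In a recommender, a loss of this kind would typically be added to the supervised interaction loss as an auxiliary term; how the two are weighted is a design choice the abstract does not specify.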

Keywords:
Computer science; Feature learning; Feature (linguistics); Representation (politics); Modal; Collaborative filtering; Artificial intelligence; Recommender system; Masking (illustration); Modalities; Multimedia; Machine learning; Information retrieval; Natural language processing

Metrics

Cited By: 174
FWCI (Field Weighted Citation Impact): 65.74
Refs: 61
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Recommender Systems and Techniques
Physical Sciences →  Computer Science →  Information Systems
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Self-Supervised Learning for Recommendation

Chao Huang, Lianghao Xia, Xiang Wang, Xiangnan He, Dawei Yin

Journal: Proceedings of the 31st ACM International Conference on Information & Knowledge Management Year: 2022 Pages: 5136-5139

JOURNAL ARTICLE

Self-Supervised learning for Conversational Recommendation

Shuokai Li, Ruobing Xie, Yongchun Zhu, Fuzhen Zhuang, Zhenwei Tang, Wayne Xin Zhao, Qing He

Journal: Information Processing & Management Year: 2022 Vol: 59 (6) Pages: 103067-103067

JOURNAL ARTICLE

Self-supervised contrastive learning for itinerary recommendation

Lei Chen, Guixiang Zhu

Journal:   Expert Systems with Applications Year: 2024 Vol: 268 Pages: 126246-126246