Self-Supervised Learning for Multimedia Recommendation

Zhulin Tao; Xiaohao Liu; Yewei Xia; Xiang Wang; Lifang Yang; Xianglin Huang; Tat‐Seng Chua

doi:10.1109/tmm.2022.3187556

Year: 2022; Journal: IEEE Transactions on Multimedia; Vol.: 25; Pages: 5107-5116; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Learning representations for multimedia content is critical for multimedia recommendation. Current representation learning methods roughly fall into two groups: (1) those that use historical interactions to create ID embeddings of users and items, and (2) those that treat multi-modal data as side information of items to enrich their ID embeddings. In both cases, each user-item interaction provides the supervisory signal for optimizing representation learning under the traditional supervised learning paradigm. Because they overlook the multi-modal patterns hidden in the data (e.g., the co-occurrence of visual, acoustic, and textual features in the micro-videos a user has watched, together with her behavioral features), these methods are insufficient to create powerful representations and achieve satisfactory recommendation accuracy. To capture multi-modal patterns in the data itself, we go beyond the supervised learning paradigm and incorporate the idea of self-supervised learning (SSL) into multimedia recommendation. Specifically, SSL consists of two components: (1) data augmentation on multi-modal content, for which we design three operators, namely feature dropout (FD), feature masking (FM), and feature fine-and-coarse spaces (FAC), to generate multiple views of individual items; and (2) contrastive learning, which distinguishes the views of an item from those of other items to distill additional supervisory signals. SSL thus enables us to explore and reveal the underlying relations among modalities, resulting in more powerful representations. We denote the generic framework Self-supervised Learning-guided Multimedia Recommendation (SLMRec). Extensive experiments on three real-world datasets show that SLMRec achieves significant improvements over several state-of-the-art baselines such as LightGCN [1] and MMGCN [2]. Further analysis shows how SSL affects recommendation performance.
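
The abstract names the two SSL components but does not give formulas for them. The sketch below is a minimal NumPy illustration under our own assumptions, not the authors' implementation: the hypothetical `feature_dropout` (FD) zeroes individual feature entries at random, the hypothetical `feature_masking` (FM) zeroes one randomly chosen modality per item, and `info_nce` is a standard InfoNCE loss in which two views of the same item form the positive pair and the other items in the batch act as negatives. The feature sizes, the linear encoder `W`, the dropout rate, and the temperature `tau` are all illustrative choices; FAC is not sketched, and the paper's actual operators and loss may differ.

```python
import numpy as np


def feature_dropout(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical FD: zero each feature entry with probability p,
    rescaling survivors by 1 / (1 - p) as in inverted dropout."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def feature_masking(modalities, rng):
    """Hypothetical FM: for each item, zero out the features of one randomly chosen modality."""
    masked = [m.copy() for m in modalities]
    choice = rng.integers(len(modalities), size=modalities[0].shape[0])
    for k, m in enumerate(masked):
        m[choice == k] = 0.0
    return masked


def info_nce(z1, z2, tau=0.2):
    """InfoNCE: row i of z1 and row i of z2 are two views of item i (positive pair);
    the other rows of z2 serve as in-batch negatives."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau                     # scaled cosine similarities, shape (n, n)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))    # mean negative log-prob of the positives


rng = np.random.default_rng(0)
n_items, dims = 8, (16, 12, 10)  # hypothetical visual, acoustic, textual feature sizes
feats = [rng.normal(size=(n_items, d)) for d in dims]
W = rng.normal(size=(sum(dims), 32)) / np.sqrt(sum(dims))  # hypothetical linear encoder


def encode(mods):
    """Concatenate modality features and project them into a shared embedding space."""
    return np.concatenate(mods, axis=1) @ W


view1 = encode([feature_dropout(m, 0.2, rng) for m in feats])
view2 = encode(feature_masking(feats, rng))
loss = info_nce(view1, view2)
print(f"contrastive loss: {loss:.4f}")
```

Minimizing `info_nce` pulls the two views of each item together while pushing them away from other items' views. Because this objective needs no user-item labels, it is one plausible reading of how contrastive learning "distills additional supervisory signals" beyond the interactions.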

Keywords:

Computer science; Feature learning; Feature; Representation; Modal; Collaborative filtering; Artificial intelligence; Recommender system; Masking; Modalities; Multimedia; Machine learning; Information retrieval; Natural language processing

Metrics

Cited by: 174
FWCI (Field Weighted Citation Impact): 65.74
Citation Normalized Percentile: 1.00 (in top 1%; in top 10%)

Topics

Recommender Systems and Techniques

Physical Sciences → Computer Science → Information Systems

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Retrieval and Classification Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Self-Supervised Learning for Recommendation

Automated Self-Supervised Learning for Recommendation

Self-Supervised learning for Conversational Recommendation

Self-supervised Graph Learning for Recommendation

Self-supervised contrastive learning for itinerary recommendation