Zhicheng Zhou, Bi Liang, Yujie Zhang
Multimodal data can portray changes in user interests more comprehensively, and multimodal sequential recommendation (MSRS) has therefore gained widespread attention in recent years. However, MSRS faces two key challenges: (1) how to effectively model long-range dependencies in user interaction sequences; and (2) how to efficiently fuse multimodal features. To address these challenges, this article proposes a novel multimodal sequential recommendation architecture based on a pure convolutional neural network (CNN), named PCMSRec. PCMSRec contains two key innovations: first, by exploiting the global receptive field of large-kernel convolution, it models long-range dependencies in multimodal user interaction sequences, overcoming the limitation of existing CNN-based methods, which capture only local, short-range dependencies; second, by taking advantage of the flexibility of the CNN architecture, it models the relationships among the multimodal features of items through a carefully designed convolutional layer architecture and fusion strategy. Specifically, PCMSRec consists of two blocks: a sequence-feature block and a modal block. The sequence-feature block models long-range dependencies in user interaction sequences through a large-kernel convolutional layer and extracts item features via a bottleneck architecture. The modal block models the complex relationships between multimodal features using multiple convolutional layers. Experimental results on five public datasets show that PCMSRec outperforms existing methods.
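The abstract describes the two blocks only at a high level. Below is a minimal, hedged PyTorch sketch of the two ideas it names: a large-kernel 1-D convolution over the interaction sequence (approximating a global receptive field) combined with a pointwise bottleneck, and a small convolutional block that fuses per-item multimodal features. This is not the authors' implementation; all layer shapes, kernel sizes, class names, and the concatenation-based fusion scheme are illustrative assumptions.

```python
# Hedged sketch, not the PCMSRec reference code. Layer sizes, kernel sizes,
# and the fusion scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class SequenceFeatureBlock(nn.Module):
    """Large-kernel depthwise conv over the sequence axis plus a pointwise bottleneck."""

    def __init__(self, dim: int, seq_len: int, bottleneck_ratio: int = 4):
        super().__init__()
        # A kernel spanning the whole sequence approximates a global receptive field.
        self.seq_conv = nn.Conv1d(dim, dim, kernel_size=seq_len,
                                  padding="same", groups=dim)
        hidden = dim // bottleneck_ratio
        self.bottleneck = nn.Sequential(
            nn.Conv1d(dim, hidden, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(hidden, dim, kernel_size=1),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        h = x.transpose(1, 2)
        h = self.seq_conv(h)
        h = self.bottleneck(h)
        # Residual connection is an assumption, not stated in the abstract.
        return self.norm(x + h.transpose(1, 2))


class ModalBlock(nn.Module):
    """Fuse per-item features from several modalities with 1x1 convolutions (assumed design)."""

    def __init__(self, dim: int, num_modalities: int = 3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv1d(num_modalities * dim, dim, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=1),
        )

    def forward(self, modal_feats: list) -> torch.Tensor:
        # Each tensor: (batch, seq_len, dim); concatenate along the feature axis.
        h = torch.cat(modal_feats, dim=-1).transpose(1, 2)
        return self.fuse(h).transpose(1, 2)


if __name__ == "__main__":
    B, L, D = 8, 50, 64
    id_emb, txt_emb, img_emb = (torch.randn(B, L, D) for _ in range(3))
    fused = ModalBlock(D)([id_emb, txt_emb, img_emb])     # (B, L, D)
    seq_out = SequenceFeatureBlock(D, seq_len=L)(fused)   # (B, L, D)
    print(seq_out.shape)
```

The depthwise large-kernel convolution keeps the parameter count linear in the embedding dimension while still letting every position attend to the whole sequence, which is one plausible reading of how a pure-CNN model could replace attention here.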