JOURNAL ARTICLE

Rethinking Convolutional Neural Network in Multimodal Sequential Recommendation

Zhicheng Zhou, Bi Liang, Yujie Zhang

Year: 2025 | Journal: ACM Transactions on Information Systems | Vol: 44 (2) | Pages: 1-35

Abstract

Multimodal data can portray changes in user interests more comprehensively, and multimodal sequential recommendation (MSRS) has therefore gained widespread attention in recent years. However, MSRS faces two key challenges: (1) how to effectively model long-range dependencies in user interaction sequences; and (2) how to efficiently fuse multimodal features. To address these challenges, this article proposes a novel multimodal sequential recommendation architecture based on a pure convolutional neural network (CNN), named PCMSRec. PCMSRec contains two key innovations. First, by using the global receptive field of large-kernel convolution, it models long-range dependencies in multimodal user interaction sequences, breaking through the limitation that existing CNN-based methods can capture only local, short-distance dependencies. Second, by exploiting the high flexibility of the CNN architecture, it models the relationships among the multimodal features of items through a carefully designed convolutional layer architecture and fusion strategy. Specifically, PCMSRec consists of two blocks: a sequence-feature block and a modal block. The sequence-feature block models long-range dependencies in user interaction sequences through a large-kernel convolutional layer and extracts item features via a bottleneck architecture. The modal block models the complex relationships between multimodal features using multiple convolutional layers. Experimental results on five public datasets show that PCMSRec outperforms existing methods.
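The two-block design described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the layer shapes, kernel size, modality count, and all function names below are assumptions chosen only to show the three ideas (a depthwise large-kernel 1D convolution for near-global receptive field, a channel bottleneck for item-feature extraction, and a simple multimodal fusion step).

```python
import numpy as np

rng = np.random.default_rng(0)

def large_kernel_conv1d(x, kernel):
    """Depthwise 1D convolution along the sequence axis with 'same' padding.
    x: (seq_len, dim); kernel: (k, dim), one filter per channel.
    A kernel size approaching seq_len gives every position a near-global
    receptive field -- the long-range-dependency idea in the abstract."""
    k, dim = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        out[t] = np.sum(xp[t:t + k] * kernel, axis=0)
    return out

def bottleneck(x, w_down, w_up):
    """Reduce then restore the channel dimension (item-feature extraction)."""
    return np.maximum(x @ w_down, 0.0) @ w_up  # ReLU between the two maps

def modal_block(feats, w_fuse):
    """Fuse per-modality item features (e.g., text and image embeddings)
    by concatenating them channel-wise and mixing with a linear map,
    analogous to a 1x1 convolution over the modality channels."""
    stacked = np.concatenate(feats, axis=-1)  # (seq_len, n_modal * dim)
    return stacked @ w_fuse                   # (seq_len, dim)

# Illustrative sizes: 50 interactions, 16-dim features, kernel ~ seq length.
seq_len, dim, kernel_size = 50, 16, 49
text = rng.normal(size=(seq_len, dim))    # stand-in text-modality features
image = rng.normal(size=(seq_len, dim))   # stand-in image-modality features

fused = modal_block([text, image], rng.normal(size=(2 * dim, dim)))
seq_out = large_kernel_conv1d(fused, rng.normal(size=(kernel_size, dim)))
item_feats = bottleneck(seq_out, rng.normal(size=(dim, dim // 4)),
                        rng.normal(size=(dim // 4, dim)))
print(item_feats.shape)  # (50, 16)
```

The key contrast with earlier CNN recommenders is the kernel size: with `kernel_size` close to `seq_len`, each output position aggregates (almost) the whole interaction history in a single layer, whereas a small kernel would need many stacked layers to reach the same range.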
