Zhicheng Zhou, Bi Liang, Yujie Zhang
Multimodal data can portray changes in user interests more comprehensively, and multimodal sequential recommendation (MSRS) has therefore gained widespread attention in recent years. However, MSRS faces two key challenges: (1) how to effectively model long-range dependencies in user interaction sequences; and (2) how to efficiently fuse multimodal features. To address these challenges, this article proposes a novel multimodal sequential recommendation architecture based on a pure convolutional neural network (CNN), named PCMSRec. PCMSRec contains two key innovations: first, by exploiting the global receptive field of large-kernel convolution, it models long-range dependencies in multimodal user interaction sequences, overcoming the limitation of existing CNN-based methods, which capture only local, short-range dependencies; second, by taking advantage of the flexibility of the CNN architecture, it models the relationships among the multimodal features of items through a carefully designed convolutional layer architecture and fusion strategy. Specifically, PCMSRec consists of two blocks: a sequence-feature block and a modal block. The sequence-feature block models long-range dependencies in user interaction sequences through a large-kernel convolutional layer and extracts item features via a bottleneck architecture. The modal block models the complex relationships between multimodal features using multiple convolutional layers. Experimental results on five public datasets show that PCMSRec outperforms existing methods.
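The abstract describes the two blocks only at a high level. Below is a minimal, hedged PyTorch sketch of the two ideas it names: a large-kernel 1-D convolution over the interaction sequence (approximating a global receptive field) combined with a pointwise bottleneck, and a small convolutional block that fuses per-item multimodal features. This is not the authors' implementation; all layer shapes, kernel sizes, class names, and the concatenation-based fusion scheme are illustrative assumptions.

```python
# Hedged sketch, not the PCMSRec reference code. Layer sizes, kernel sizes,
# and the fusion scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class SequenceFeatureBlock(nn.Module):
    """Large-kernel depthwise conv over the sequence axis plus a pointwise bottleneck."""

    def __init__(self, dim: int, seq_len: int, bottleneck_ratio: int = 4):
        super().__init__()
        # A kernel spanning the whole sequence approximates a global receptive field.
        self.seq_conv = nn.Conv1d(dim, dim, kernel_size=seq_len,
                                  padding="same", groups=dim)
        hidden = dim // bottleneck_ratio
        self.bottleneck = nn.Sequential(
            nn.Conv1d(dim, hidden, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(hidden, dim, kernel_size=1),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        h = x.transpose(1, 2)
        h = self.seq_conv(h)
        h = self.bottleneck(h)
        # Residual connection is an assumption, not stated in the abstract.
        return self.norm(x + h.transpose(1, 2))


class ModalBlock(nn.Module):
    """Fuse per-item features from several modalities with 1x1 convolutions (assumed design)."""

    def __init__(self, dim: int, num_modalities: int = 3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv1d(num_modalities * dim, dim, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=1),
        )

    def forward(self, modal_feats: list) -> torch.Tensor:
        # Each tensor: (batch, seq_len, dim); concatenate along the feature axis.
        h = torch.cat(modal_feats, dim=-1).transpose(1, 2)
        return self.fuse(h).transpose(1, 2)


if __name__ == "__main__":
    B, L, D = 8, 50, 64
    id_emb, txt_emb, img_emb = (torch.randn(B, L, D) for _ in range(3))
    fused = ModalBlock(D)([id_emb, txt_emb, img_emb])     # (B, L, D)
    seq_out = SequenceFeatureBlock(D, seq_len=L)(fused)   # (B, L, D)
    print(seq_out.shape)
```

The depthwise large-kernel convolution keeps the parameter count linear in the embedding dimension while still letting every position attend to the whole sequence, which is one plausible reading of how a pure-CNN model could replace attention here.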