With the increasing popularity of micro-video sharing, where people shoot short videos effortlessly and share their daily stories on social media platforms, micro-video recommendation has attracted extensive research efforts to provide users with micro-videos that interest them. In this paper, the hypothesis we explore is that not only do users have multi-modal interests, but micro-videos also have multi-modal targeted audience segments. Accordingly, we propose a novel framework, the User-Video Co-Attention Network (UVCAN), which learns multi-modal information from both the user side and the micro-video side using an attention mechanism. In addition, UVCAN reasons about attention in a stacked-attention-network fashion for both the user and the micro-video. Extensive experiments on two datasets collected from Toffee show that the proposed UVCAN outperforms state-of-the-art recommendation methods, demonstrating the effectiveness of the framework.
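The co-attention idea described above can be illustrated with a minimal sketch: each side's summary vector guides an additive-attention step over the other side's modality features, and the refined query is reused in the stacked-attention style. All function names, weight shapes, and the random features below are illustrative assumptions, not the paper's actual architecture or parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over attention scores
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_step(query, features, W_q, W_f, w):
    # additive attention: score each feature vector against the guiding query
    scores = np.tanh(features @ W_f + query @ W_q) @ w      # (n,)
    alpha = softmax(scores)                                  # attention weights
    attended = alpha @ features                              # weighted sum, (d,)
    return query + attended, alpha                           # refined query (stacked style)

rng = np.random.default_rng(0)
d = 8
user_feats = rng.normal(size=(5, d))    # hypothetical user-side modality features
video_feats = rng.normal(size=(4, d))   # hypothetical micro-video modality features
W_q, W_f = rng.normal(size=(d, d)), rng.normal(size=(d, d))
w = rng.normal(size=d)

# co-attention: each side's mean summary guides attention over the other side
u_query = user_feats.mean(axis=0)
v_query = video_feats.mean(axis=0)
u_repr, u_alpha = attention_step(v_query, user_feats, W_q, W_f, w)   # video guides user
v_repr, v_alpha = attention_step(u_query, video_feats, W_q, W_f, w)  # user guides video
score = float(u_repr @ v_repr)   # matching score used for ranking
```

Stacking further `attention_step` calls with the refined queries would mimic the multi-step reasoning the abstract refers to; the dot-product matching score at the end is one common choice for the final ranking signal.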