JOURNAL ARTICLE

Hierarchical Multi-Modal Attention Network for Time-Sync Comment Video Recommendation

Weihao Zhao, Han Wu, Weidong He, Haoyang Bi, Hao Wang, Chen Zhu, Tong Xu, Enhong Chen

Year: 2023; Journal: IEEE Transactions on Circuits and Systems for Video Technology; Vol: 34 (4); Pages: 2694-2705; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Owing to their inherent interactivity, time-sync comments on videos have attracted increasing attention and have been widely adopted by online video platforms. Beyond enhancing user engagement, time-sync comments provide abundant semantic information that can greatly enhance video understanding; however, this information is largely overlooked in mainstream video recommender systems. To address this issue, we propose a Hierarchical Multi-modal Attention Network (HMAN) that effectively utilizes time-sync comments for recommendation. Specifically, we design a Multi-level Text Condense (MTC) Module to capture the accurate semantics of time-sync comments via text-level and vision-level condense operations. We then propose a Range Convolution Block (RCB) that captures both visual and textual information from variable-length event segments by leveraging a variable receptive field. After that, we design a Hierarchical Multi-modal Branch Fusion (HMBF) Module to obtain a comprehensive multi-modal representation of the time-sync comment video. Finally, recommendation scores are computed as the inner product of the obtained video representation and the user embedding. Extensive experiments demonstrate the effectiveness of the proposed HMAN, and ablation studies on different variants of HMAN further validate the utility of each component and the necessity of the hierarchical multi-modal branch fusion method.
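The abstract describes the pipeline only at a high level. The toy Python sketch below illustrates, under loudly stated assumptions, two of its ideas: pooling per-timestep features over several window widths (a crude stand-in for the variable receptive field of the Range Convolution Block) and scoring a video by the inner product of its representation with a user embedding. The names `window_means`, `range_pool`, `fuse`, and `score`, the fixed mean-filter kernels, and the plain averaging used for fusion are all hypothetical. They are not the authors' MTC, RCB, or HMBF designs, whose details the abstract does not specify.

```python
# Toy sketch only: NOT the HMAN authors' implementation.
# Hypothetical assumptions:
#   * each modality of a video is a non-empty sequence of per-timestep
#     feature vectors (plain lists of floats, all the same length);
#   * the "variable receptive field" is modelled with fixed mean filters of
#     several widths instead of learned convolution kernels;
#   * branch fusion is an element-wise average, not the paper's HMBF module.
from typing import List, Sequence

Vector = List[float]


def window_means(seq: Sequence[Vector], width: int) -> List[Vector]:
    """Average every run of `width` consecutive timesteps (stride 1, no padding)."""
    dim = len(seq[0])
    means = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        means.append([sum(step[d] for step in window) / width for d in range(dim)])
    return means


def range_pool(seq: Sequence[Vector], widths: Sequence[int]) -> Vector:
    """For each window width, slide a mean filter over time and max-pool the
    results; concatenate the pooled vectors (one block per width)."""
    dim = len(seq[0])
    pooled: Vector = []
    for w in widths:
        windows = window_means(seq, min(w, len(seq)))  # clamp to sequence length
        pooled.extend(max(win[d] for win in windows) for d in range(dim))
    return pooled


def fuse(text_vec: Vector, visual_vec: Vector) -> Vector:
    """Element-wise average of the textual and visual branch vectors."""
    return [(t + v) / 2 for t, v in zip(text_vec, visual_vec)]


def score(user: Vector, video: Vector) -> float:
    """Recommendation score: inner product of user and video embeddings."""
    return sum(u * v for u, v in zip(user, video))


if __name__ == "__main__":
    text_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # comment features per timestep
    visual_seq = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]   # frame features per timestep
    video = fuse(range_pool(text_seq, [1, 2]), range_pool(visual_seq, [1, 2]))
    print(video)                                  # [1.0, 0.75, 0.625, 0.625]
    print(score([1.0, 0.0, 0.0, 2.0], video))     # 2.25
```

With `widths=[1, 2]`, each branch yields a vector twice as long as a per-timestep feature, one block per window width; larger widths summarise longer event segments. Ranking candidate videos for a user then amounts to sorting them by `score`.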

Keywords:
Computer science; sync; Modal; Semantics (computer science); Information retrieval; Real-time computing; Multimedia; Channel (broadcasting); Computer network

Metrics

Cited by: 7
FWCI (Field Weighted Citation Impact): 1.27
References: 78
Citation Normalized Percentile: 0.77


Topics (all under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Multimodal Machine Learning Applications
Video Analysis and Summarization
Advanced Image and Video Retrieval Techniques