The proliferation of multimedia fake news misleads public opinion, damages social harmony, and seriously challenges the authority of news media. Current multimodal fake news detection methods focus on extracting stronger unimodal features with pre-trained models and on designing complex networks for fusing visual and textual features, while largely ignoring social contextual information. In this paper, we present a novel Multi-View Fusion Network (MVFN) for fake news detection, which aims to achieve fusion learning over heterogeneous multimodal features. The model not only explores cross-modal graph interactions between visual regions and sentence words, but also investigates the effect of social context on feature learning. In addition, we model the alignment between features from both views to achieve better fusion. Experimental results on Chinese and English datasets show that the proposed model outperforms current multimodal detection approaches.
Zhu Ji, Guangjin Wang, Fuyong Xu, Peiyu Liu
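The abstract gives no implementation details, so the following PyTorch sketch is only a rough illustration of the kind of heterogeneous fusion it describes. Generic multi-head cross-attention stands in for the paper's graph-based cross-modal interaction, and the module name, feature dimensions, and the treatment of social context as a single feature vector are all assumptions for illustration, not the authors' MVFN implementation.

```python
# Illustrative sketch only: cross-attention here is a stand-in for MVFN's
# cross-modal graph interaction; all dimensions and names are assumed.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Toy fusion of visual regions, word tokens, and social context."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Text tokens attend to image regions, and vice versa.
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Binary classifier over the concatenated multi-view representation.
        self.classifier = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, words, regions, social):
        # words:   (B, num_words, dim)   unimodal text features
        # regions: (B, num_regions, dim) unimodal visual features
        # social:  (B, dim)              social-context features
        t, _ = self.txt2img(words, regions, regions)  # text enriched by image
        v, _ = self.img2txt(regions, words, words)    # image enriched by text
        fused = torch.cat([t.mean(1), v.mean(1), social], dim=-1)
        return self.classifier(fused)                 # logits: real vs. fake

model = CrossModalFusion()
logits = model(torch.randn(2, 20, 256),   # 20 word tokens
               torch.randn(2, 36, 256),   # 36 visual regions
               torch.randn(2, 256))       # social-context vector
print(logits.shape)  # torch.Size([2, 2])
```

Mean-pooling the attended token sequences is the simplest way to obtain fixed-size per-view vectors before concatenation; the actual model would presumably use its alignment mechanism between the two views instead.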