JOURNAL ARTICLE

Fusion and Discrimination: A Multimodal Graph Contrastive Learning Framework for Multimodal Sarcasm Detection

Bin Liang, Lin Gui, Yulan He, Erik Cambria, Ruifeng Xu

Journal: IEEE Transactions on Affective Computing, Year: 2024, Vol: 15 (4), Pages: 1874-1888, Publisher: Institute of Electrical and Electronics Engineers

Abstract

Identifying sarcastic clues from both textual and visual information has become an important research issue, called Multimodal Sarcasm Detection. In this paper, we investigate multimodal sarcasm detection from a novel perspective, in which a multimodal graph contrastive learning strategy is proposed to fuse and distinguish the sarcastic clues of the textual and visual modalities. Specifically, we first utilize object detection to derive the crucial visual regions of the images together with their captions, which allows better learning of the key regions of the visual modality. In addition, to make full use of the semantic information of the visual modality, we employ optical character recognition to extract the textual content embedded in the images. Then, based on the image regions, the textual content of the visual modality, and the context of the textual modality, we build a multimodal graph for each sample to model the intricate sarcastic relations between modalities. Furthermore, we devise a graph-oriented contrastive learning strategy to leverage the correlations among samples with the same label and the differences between samples with different labels, so as to capture better multimodal representations for multimodal sarcasm detection. Extensive experiments show that our method outperforms the previous best baseline models (with a 2.47% improvement in Accuracy, a 1.99% improvement in F-score, and a 2.20% improvement in Macro F-score). The ablation study shows that both the multimodal graph structure and graph-oriented contrastive learning are important to our framework. Further, experiments with different pre-trained methods show that the proposed multimodal graph contrastive learning framework can work directly with various pre-trained models and achieve outstanding performance in multimodal sarcasm detection.
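The graph-oriented contrastive objective described in the abstract pulls together representations of samples sharing a sarcasm label and pushes apart those with different labels. The snippet below is an illustrative sketch of that idea (not the authors' released code): a supervised contrastive loss applied to graph-level embeddings, where same-label samples serve as positives. The function name, shapes, and temperature value are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's implementation): a supervised
# contrastive loss over graph-level sample embeddings. Samples that share
# a sarcasm label act as positives; all other samples act as negatives.
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean InfoNCE-style loss with same-label samples as positives.

    embeddings: (n, d) array of graph-level representations.
    labels:     (n,) array of sarcasm labels (0 / 1).
    """
    # L2-normalize so the dot product is cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                       # pairwise similarities
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    # Exclude each sample from its own softmax denominator.
    sim_masked = np.where(mask_self, -np.inf, sim)
    log_prob = sim_masked - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))
    # Positives: different sample, same label.
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    # Negative mean log-probability of positives, averaged over anchors.
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

With well-separated embeddings whose clusters match the labels, the loss is low; if the same embeddings are paired with mismatched labels, the loss rises, which is the discriminative signal the framework exploits.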

Keywords:
Sarcasm, Artificial intelligence, Multimodal therapy, Computer science, Psychology, Natural language processing, Linguistics, Irony, Philosophy, Psychotherapist

Metrics

Cited By: 34
FWCI (Field-Weighted Citation Impact): 21.72
References: 85
Citation Normalized Percentile: 0.99 (in top 1%)

Topics

Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2024, Vol: 38 (16), Pages: 18354-18362
JOURNAL ARTICLE

Multimodal Graph Meta Contrastive Learning

Feng Zhao, Donglin Wang

Year: 2021 Pages: 3657-3661