JOURNAL ARTICLE

Fusion and Discrimination: A Multimodal Graph Contrastive Learning Framework for Multimodal Sarcasm Detection

Bin Liang, Lin Gui, Yulan He, Erik Cambria, Ruifeng Xu

Journal: IEEE Transactions on Affective Computing, Year: 2024, Vol: 15 (4), Pages: 1874-1888, Publisher: Institute of Electrical and Electronics Engineers

Abstract

Identifying sarcastic clues from both textual and visual information has become an important research issue, called Multimodal Sarcasm Detection. In this paper, we investigate multimodal sarcasm detection from a novel perspective, in which a multimodal graph contrastive learning strategy is proposed to fuse and distinguish the sarcastic clues of the textual and visual modalities. Specifically, we first utilize object detection to derive the crucial visual regions of the images together with their captions, which allows better learning of the key regions of the visual modality. In addition, to make full use of the semantic information of the visual modality, we employ optical character recognition to extract the textual content embedded in the images. Then, based on the image regions, the textual content of the visual modality, and the context of the textual modality, we build a multimodal graph for each sample to model the intricate sarcastic relations between modalities. Furthermore, we devise a graph-oriented contrastive learning strategy to leverage the correlations among samples with the same label and the differences between samples with different labels, so as to capture better multimodal representations for multimodal sarcasm detection. Extensive experiments show that our method outperforms the previous best baseline models (with a 2.47% improvement in Accuracy, a 1.99% improvement in F-score, and a 2.20% improvement in Macro F-score). The ablation study shows that both the multimodal graph structure and graph-oriented contrastive learning are important to our framework. Further, experiments with different pre-trained methods show that the proposed multimodal graph contrastive learning framework can work directly with various pre-trained models and achieve outstanding performance in multimodal sarcasm detection.
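The graph-oriented contrastive objective described in the abstract pulls together representations of samples sharing a sarcasm label and pushes apart those with different labels. The snippet below is an illustrative sketch of that idea (not the authors' released code): a supervised contrastive loss applied to graph-level embeddings, where same-label samples serve as positives. The function name, shapes, and temperature value are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's implementation): a supervised
# contrastive loss over graph-level sample embeddings. Samples that share
# a sarcasm label act as positives; all other samples act as negatives.
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean InfoNCE-style loss with same-label samples as positives.

    embeddings: (n, d) array of graph-level representations.
    labels:     (n,) array of sarcasm labels (0 / 1).
    """
    # L2-normalize so the dot product is cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                       # pairwise similarities
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    # Exclude each sample from its own softmax denominator.
    sim_masked = np.where(mask_self, -np.inf, sim)
    log_prob = sim_masked - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))
    # Positives: different sample, same label.
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    # Negative mean log-probability of positives, averaged over anchors.
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

With well-separated embeddings whose clusters match the labels, the loss is low; if the same embeddings are paired with mismatched labels, the loss rises, which is the discriminative signal the framework exploits.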

Keywords:
Sarcasm, Artificial intelligence, Multimodal therapy, Computer science, Psychology, Natural language processing, Linguistics, Irony, Philosophy, Psychotherapist

Metrics

Cited By: 34
FWCI (Field-Weighted Citation Impact): 21.72
References: 85
Citation Normalized Percentile: 0.99 (in top 1%)

Topics

Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2024, Vol: 38 (16), Pages: 18354-18362
JOURNAL ARTICLE

Multimodal Graph Meta Contrastive Learning

Feng Zhao, Donglin Wang

Year: 2021 Pages: 3657-3661