JOURNAL ARTICLE

Context-Aware Dual Attention Network for Multimodal Sarcasm Detection

Abstract

Multimodal sarcasm is often used to express strong emotions online through a discrepancy between the literal and figurative scene across modalities. Current research retrofits transformer-based pretrained language models to integrate text and images for sarcasm detection. However, these methods struggle to distinguish subtle semantic and emotional differences between the image and text within the same instance. To address this issue, this paper proposes a new context-aware dual attention network that collaboratively performs textual and visual attention through a shared memory module. This approach enables reasoning about the interconnected portions of both text and image that carry the sarcasm. Additionally, we use implicit context derived from a multimodal commonsense graph to establish a holistic perspective encompassing semantics and emotions across modalities. Finally, a multi-view cross-modal matching technique is employed to effectively identify contradictions. We evaluate our method on the widely used HFM dataset and achieve a 1.01% improvement in F1-score. Extensive experiments demonstrate the effectiveness of the proposed method.
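The core mechanism the abstract describes, textual and visual attention coupled through a shared memory module, can be sketched in a few lines. The following is a minimal illustration (not the authors' implementation): both modality streams perform scaled dot-product attention over the same learnable memory slots, so the two attentions are coupled through shared parameters. All dimensions and shapes here are illustrative assumptions.

```python
# Minimal sketch of dual attention over a SHARED memory module.
# NOT the paper's implementation; shapes and memory size are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, memory):
    """Scaled dot-product attention of `queries` over `memory` slots."""
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)   # (n_queries, n_slots)
    weights = softmax(scores, axis=-1)         # attention distribution
    return weights @ memory                    # (n_queries, d)

rng = np.random.default_rng(0)
d, n_slots = 64, 8
memory = rng.normal(size=(n_slots, d))     # shared memory module
text_feats = rng.normal(size=(12, d))      # e.g. 12 token embeddings
image_feats = rng.normal(size=(49, d))     # e.g. 7x7 image-region embeddings

# Both modalities read from the SAME memory, so updating the memory
# during training couples the textual and visual attention maps.
text_ctx = attend(text_feats, memory)
image_ctx = attend(image_feats, memory)
print(text_ctx.shape, image_ctx.shape)     # (12, 64) (49, 64)
```

In a trained model the memory would be a learnable parameter updated jointly by both streams; here it is random purely to show the data flow.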

Keywords:
Sarcasm; Computer science; Modalities; Artificial intelligence; Dual attention; Context; Natural language processing; Semantics; Machine learning; Linguistics

Metrics

Cited By: 2
FWCI (Field-Weighted Citation Impact): 1.28
References: 23
Citation Normalized Percentile: 0.75


Topics

Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Language, Metaphor, and Cognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence