JOURNAL ARTICLE

Embrace Smaller Attention: Efficient Cross-Modal Matching with Dual Gated Attention Fusion

Abstract

Cross-modal matching is one of the most fundamental and widely studied tasks in data science. To better capture complicated cross-modal correspondences, the powerful attention mechanism has been widely adopted in recent work. In this paper, we propose a novel Dual Gated Attention Fusion (DGAF) unit that frees cross-modal matching from heavy attention computation. Specifically, the attention unit in the main information flow is replaced with a single-head, low-dimensional, lightweight attention bypass that serves as a gate to selectively discard noise in both modalities. To strengthen the interaction between modalities, an auxiliary memory unit is appended, and a gated memory fusion unit is designed to fuse the memorized inter-modality information into both modality streams. Extensive experiments on two benchmark datasets show that the proposed DGAF achieves a good balance between efficiency and effectiveness.
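The two components described above can be illustrated with a minimal sketch. The sketch below is a hypothetical NumPy rendering, not the authors' implementation: the function names, projection shapes, and the exact gating form (sigmoid of the attention read-out) are assumptions, since the abstract only specifies a single-head low-dimensional attention gate and a gated fusion of a shared memory into each modality stream.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention_bypass(X, d_low=16, rng=None):
    """Hypothetical sketch of the lightweight attention bypass.

    A single-head attention in a low dimension d_low produces a sigmoid
    gate that selectively suppresses noisy features of one modality.
    X: (n, d) features of one modality stream.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    # Random projections stand in for learned weights (assumption).
    Wq = rng.standard_normal((d, d_low)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_low)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d_low), axis=-1)   # (n, n) low-dim attention
    gate = 1.0 / (1.0 + np.exp(-(A @ V)))            # sigmoid gate in (0, 1)
    return gate * X                                  # element-wise noise suppression

def gated_memory_fusion(X, M, rng=None):
    """Hypothetical sketch of the gated memory fusion unit.

    Reads a shared cross-modal memory M (m, d) with attention and fuses
    the read-out into the modality stream X (n, d) through a learned gate.
    """
    rng = np.random.default_rng(1) if rng is None else rng
    n, d = X.shape
    A = softmax(X @ M.T / np.sqrt(d), axis=-1)       # (n, m) memory read weights
    R = A @ M                                        # memory read-out
    Wg = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([X, R], axis=1) @ Wg)))
    return g * X + (1.0 - g) * R                     # gated blend of stream and memory
```

In this reading, the bypass replaces the full attention block in the main flow (one head, dimension d_low instead of d), while the same `gated_memory_fusion` would be applied to both the image and the text stream against a shared memory.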

Keywords:
Computer science; modality; benchmark; matching; artificial intelligence; sensor fusion; dimension; computation; algorithm; engineering; mathematics

Metrics

Cited By: 1
FWCI (Field-Weighted Citation Impact): 0.18
References: 34
Citation Normalized Percentile: 0.38

Topics

Multimodal Machine Learning Applications
Advanced Image and Video Retrieval Techniques
Visual Attention and Saliency Detection
(all under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
© 2026 ScienceGate Book Chapters — All rights reserved.