Referring image segmentation aims to segment the particular object in an image that is referred to by a natural language expression. One major challenge of this task is understanding and aligning vision and language so that the referent can be distinguished. Another major challenge is refining the segmentation mask of the referent. In this paper, we focus on dissecting and enhancing the interaction between the two modalities to address these challenges. Specifically, we propose a Structured Multimodal Fusion Network (SMFN), which consists of a multimodal tree, a cross-modal transformer, and a mask refinement module. SMFN first exploits multimodal fusion structures to deeply integrate visual and linguistic features so that the referent can be accurately distinguished, and then utilizes the mask refinement module to aggregate multi-scale visual features and clarify object boundaries. Extensive experiments on four benchmark datasets show that SMFN achieves new state-of-the-art performance under different evaluation metrics.
Liang Lin, Pengxiang Yan, Xiaoqian Xu, Sibei Yang, Kun Zeng, Guanbin Li