Abstract

In traditional remote sensing image captioning models, the attention mechanism plays a dominant role and is used to integrate image features and infer the latent visual-semantic alignment. However, remote sensing scenes are complex and diverse, and using only one attention module to capture features often leads to insufficient semantic representation. In this work, we present a novel Multi-view Attention Network (MAN) model that integrates features from different views. With MAN, more semantically rich ensemble attended features can be obtained from multiple attention modules. Specifically, we enforce diversity among the weights of the attention modules through a cosine distance loss, giving the model distinct views from which to make semantic predictions for each feature. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed model for remote sensing image captioning.
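The diversity constraint described above can be sketched as a pairwise cosine-similarity penalty over the attention modules' weight vectors: the loss is high when modules point in similar directions and low when they are orthogonal. This is a minimal illustrative sketch, not the paper's exact formulation; the function name and averaging scheme are assumptions.

```python
import numpy as np

def diversity_loss(weight_vectors):
    """Average pairwise cosine similarity between attention-module
    weight vectors (flattened). Minimizing this pushes the modules
    toward distinct directions, i.e. distinct 'views'.
    Illustrative sketch only -- not the authors' exact loss."""
    # Normalize each weight vector to unit length.
    normed = [w / np.linalg.norm(w) for w in weight_vectors]
    total, pairs = 0.0, 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            # Cosine similarity of unit vectors is their dot product.
            total += float(np.dot(normed[i], normed[j]))
            pairs += 1
    return total / pairs
```

Identical weight vectors yield a loss of 1.0 (no diversity), while mutually orthogonal ones yield 0.0, so gradient descent on this term drives the modules apart.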


Metrics

Cited by: 8
FWCI (Field-Weighted Citation Impact): 0.51
References: 13
Citation Normalized Percentile: 0.66

Topics

- Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
- Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
- Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)