Current methods for image captioning tend to generate sentences that are overly rigid and composed of the most frequent words/phrases, leading to inaccurate and indistinguishable descriptions. This is primarily due to the uneven word distribution of the ground-truth captions, which encourages the model to generate high-frequency words/phrases while suppressing the less frequent but more concrete ones. In this work, we propose a new Content Sensitive and Global Discriminative objective, formulated as two constraints on top of a reference model, to facilitate generating concrete and discriminative image captions. More specifically, the content sensitive constraint places greater weight on the less frequent and more concrete words/phrases, thus facilitating the generation of sentences that better describe the visual details of the given images. To further improve discriminability, the global discriminative constraint pulls the generated sentence toward better distinguishing the corresponding image from others. We evaluate the proposed method on the widely used MS-COCO dataset, where it achieves superior performance over existing competing methods. We also conduct self-retrieval experiments to demonstrate the discriminability of the proposed method.
Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin
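
The two constraints described in the abstract can be read as two loss terms added to a standard captioning objective: an inverse-frequency reweighting of the word-level loss (content sensitive) and a batch-wise retrieval loss that favors the generated caption matching its own image over others (global discriminative). Below is a minimal sketch of that reading in PyTorch; the inverse-frequency weighting, the dot-product similarity, and all names (content_sensitive_loss, global_discriminative_loss, word_freq, alpha, margin) are illustrative assumptions, not the paper's actual formulation.

import torch
import torch.nn.functional as F

def content_sensitive_loss(log_probs, targets, word_freq, alpha=1.0):
    # Cross-entropy reweighted so that rarer (more concrete) words
    # contribute more. `word_freq` is a (vocab,) tensor of corpus
    # frequencies; inverse-frequency weighting is an assumption here.
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    weights = (1.0 / word_freq[targets]).pow(alpha)
    weights = weights / weights.mean()   # keep the overall loss scale stable
    return -(weights * token_ll).mean()

def global_discriminative_loss(sent_embs, img_embs, margin=0.2):
    # Hinge-style retrieval loss: the i-th generated caption should be
    # more similar to the i-th image (diagonal) than to the hardest
    # other image in the batch.
    sims = sent_embs @ img_embs.t()                        # (batch, batch)
    pos = sims.diag().unsqueeze(1)                         # (batch, 1)
    mask = torch.eye(sims.size(0), dtype=torch.bool, device=sims.device)
    hardest = sims.masked_fill(mask, float('-inf')).max(dim=1, keepdim=True).values
    return F.relu(margin - pos + hardest).mean()

In this sketch the full training objective would combine both terms with the reference model's loss, e.g. loss = xent + lambda_cs * content_sensitive_loss(...) + lambda_gd * global_discriminative_loss(...), where the weighting coefficients are likewise assumed for illustration.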