JOURNAL ARTICLE

Concrete Image Captioning by Integrating Content Sensitive and Global Discriminative Objective

Abstract

Current image captioning methods tend to generate sentences that are overly rigid and composed of the most frequent words/phrases, leading to inaccurate and indistinguishable descriptions. This is primarily due to the uneven word distribution of the ground-truth captions, which encourages the generation of high-frequency words/phrases while suppressing the less frequent but more concrete ones. In this work, we propose a new Content Sensitive and Global Discriminative objective, formulated as two constraints on top of a reference model, to facilitate generating concrete and discriminative image captions. More specifically, the content sensitive constraint is designed to place greater focus on the less frequent and more concrete words/phrases, thus facilitating the generation of sentences that better describe the visual details of the given images. To further improve discriminability, the global discriminative constraint is designed to pull the generated sentence toward better discerning the corresponding image from others. We evaluate the proposed method on the widely used MS-COCO dataset, where it achieves superior performance over existing competing methods. We also conduct self-retrieval experiments to demonstrate the discriminability of the proposed method.
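
The abstract does not give the exact form of either constraint, so the following is only an illustrative sketch of the two ideas and not the authors' formulation. It assumes (hypothetically) that the content sensitive constraint can be approximated by an inverse-frequency weighted negative log-likelihood, and that the global discriminative constraint can be approximated by a hinge ranking loss over caption-to-image similarity scores. The function names and the `alpha` and `margin` parameters are made up for this example.

```python
import math
from collections import Counter


def inverse_frequency_weights(captions, alpha=0.5):
    """Map each token to a weight that grows as the token gets rarer.

    Assumption: weight = (total_tokens / token_count) ** alpha.
    alpha = 0 gives uniform weights; larger alpha favors rare tokens more.
    """
    counts = Counter(tok for cap in captions for tok in cap)
    total = sum(counts.values())
    return {tok: (total / c) ** alpha for tok, c in counts.items()}


def weighted_nll(token_probs, tokens, weights):
    """Frequency-weighted negative log-likelihood of one caption.

    token_probs[i] is the probability the model assigned to tokens[i].
    The result is normalized by the sum of the weights used.
    """
    num = sum(-weights[t] * math.log(p) for t, p in zip(tokens, token_probs))
    den = sum(weights[t] for t in tokens)
    return num / den


def retrieval_hinge_loss(sim, margin=0.2):
    """Hinge ranking loss over a square caption-by-image similarity matrix.

    sim[i][j] scores caption i against image j; the diagonal holds the
    matched pairs. Each mismatched image that scores within `margin` of
    the matched image adds a penalty.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                loss += max(0.0, margin - sim[i][i] + sim[i][j])
    return loss / n
```

Under these assumptions, a mistake on a rare token such as "cat" raises the weighted loss more than the same mistake on a frequent token such as "a", and the ranking term is zero only when every caption scores its own image at least `margin` higher than any other image.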

Keywords:
Discriminative model; Closed captioning; Computer science; Constraint (computer-aided design); Focus (optics); Sentence; Artificial intelligence; Ground truth; Word (group theory); Natural language processing; Image (mathematics); Pattern recognition (psychology); Mathematics

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.43
Refs: 41
Citation Normalized Percentile: 0.65

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition