JOURNAL ARTICLE

Positional Feature Generator-Based Transformer for Image Captioning

Abstract

The Transformer-based architecture achieves state-of-the-art results in image captioning. Because it is non-recurrent, positional information must be supplied separately. However, existing advanced methods attach positional information to the model through an additional encoding or embedding that is decoupled from the original input features. In addition, whether absolute or relative, these encodings are fused with the input features by addition, which causes interference between the two types of features and degrades model performance. In this paper, we propose a novel module, the positional feature generator (PFG), to remedy these limitations. PFG models the spatial positional frame of an image as a graph, which lets it learn absolute position explicitly and relative position implicitly. We then concatenate the captured positional features with the original features, so that positional information is kept as a separate, additional feature and feature interference is avoided. Extensive experiments on MS COCO validate the effectiveness of PFG. Moreover, PFG outperforms some state-of-the-art positional representation methods, and the positional feature generator-based Transformer (PFGT) is competitive with some state-of-the-art image captioning algorithms.
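The contrast the abstract draws between fusing positional information by addition and by concatenation can be illustrated with a minimal NumPy sketch. This is not the authors' PFG: the graph-based generator is not described here, so the sketch substitutes plain normalized grid coordinates for the learned positional features. The function names, the grid size, the feature widths, and the random projection used for the additive case are all assumptions made for illustration.

```python
import numpy as np


def grid_positions(height, width):
    """Normalized (row, col) centre coordinates of each grid cell, shape (height*width, 2)."""
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(np.float64)
    return (coords + 0.5) / np.array([height, width])


def fuse_by_addition(features, pos_features):
    """Additive fusion: both inputs share one width and are mixed in the same channels."""
    return features + pos_features


def fuse_by_concatenation(features, pos_features):
    """Concatenation: positional channels are appended, the original channels stay untouched."""
    return np.concatenate([features, pos_features], axis=-1)


# Hypothetical sizes: a 3x4 grid of regions with 8-dimensional visual features.
rng = np.random.default_rng(0)
height, width, d_model, d_pos = 3, 4, 8, 2
features = rng.normal(size=(height * width, d_model))

# Stand-in positional features (assumption: simple coordinates, not a learned graph module).
pos = grid_positions(height, width)

# Additive fusion needs matching widths, so positions are first projected to d_model.
projection = rng.normal(size=(d_pos, d_model))  # hypothetical, untrained projection
added = fuse_by_addition(features, pos @ projection)

# Concatenation keeps the two kinds of information in separate channels.
concatenated = fuse_by_concatenation(features, pos)

print(added.shape, concatenated.shape)
```

After addition, the original visual features can no longer be read back from the fused tensor without knowing the positional term, whereas after concatenation the first `d_model` channels are still exactly the visual features. That separation is the property the abstract attributes to concatenation.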

Keywords:
Closed captioning; Computer science; Transformer; Generator (circuit theory); Artificial intelligence; Feature (linguistics); Feature extraction; Image (mathematics); Computer vision; Speech recognition; Electrical engineering; Engineering; Voltage

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 51
Citation Normalized Percentile: 0.23

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition