JOURNAL ARTICLE

Image Captioning Method Based on Layer Feature Attention

Abstract

High-level image features are often used to represent the scene in image captioning because they carry rich semantic information. However, they mainly express global information, so the local information of small objects is easily ignored. This makes it difficult to generate descriptions of small objects and therefore to meet finer-grained description requirements. To describe the rich semantic information in an image while retaining more description of small objects, an image captioning method based on layer feature attention is proposed. A layer feature attention module is designed on top of the existing Transformer decoder structure. Using the multi-layer features of the image, each layer of the decoder stack determines how much attention to pay to the features of each layer during decoding, and dynamically learns the similarity between the features of each layer and the semantic features of the sequence, which improves the quality of the generated description.
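The abstract gives no formulas, so the following is only a minimal sketch of what attending over layer features might look like, not the authors' implementation. It assumes each image feature layer has already been pooled and projected to a single vector of the same dimension as a decoder-side query, and it uses scaled dot-product similarity followed by a softmax as a stand-in for the learned similarity. The names `layer_feature_attention`, `query`, and `layer_features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer_feature_attention(query, layer_features):
    """Weight per-layer image features by similarity to a decoder query.

    query: list of d floats (assumed semantic feature of the sequence
        at one decoder layer).
    layer_features: list of L vectors of d floats each (assumed pooled
        feature of one image feature layer, projected to dimension d).
    Returns (weights, fused): L attention weights that sum to 1 and the
    d-dimensional weighted sum of the layer features.
    """
    d = len(query)
    # Scaled dot-product similarity between the query and each layer feature.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
        for feat in layer_features
    ]
    weights = softmax(scores)
    # Fuse the layer features using the attention weights.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, layer_features))
        for i in range(d)
    ]
    return weights, fused


# Toy inputs: three feature layers, dimension 4 (illustrative values only).
query = [1.0, 0.0, 0.0, 0.0]
layer_features = [
    [0.0, 1.0, 0.0, 0.0],  # stands in for a low-level layer
    [0.5, 0.5, 0.0, 0.0],  # stands in for a mid-level layer
    [2.0, 0.0, 0.0, 0.0],  # stands in for a high-level layer
]
weights, fused = layer_feature_attention(query, layer_features)
```

In this toy case the third layer is most similar to the query and so receives the largest weight. In a trained model the query and layer features would come from learned projections, and each decoder layer would compute its own weights, so different decoding depths could favor different feature layers.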

Keywords:
Closed captioning; Computer science; Feature (linguistics); Layer (electronics); Decoding methods; Artificial intelligence; Transformer; Granularity; Image (mathematics); Search engine indexing; Information retrieval; Pattern recognition (psychology)

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 18
Citation Normalized Percentile: 0.17

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)