JOURNAL ARTICLE

Cross-scale Feature Fusion Self-attention for Image Captioning

Abstract

In recent years, the encoder-decoder framework based on the self-attention mechanism has become the mainstream model for image captioning. However, self-attention in the encoder only models the visual relations of low-scale features, ignoring effective information in high-scale visual features and thereby degrading the quality of the generated descriptions. To address this problem, this paper proposes a cross-scale feature fusion self-attention (CFFSA) method for image captioning. Specifically, CFFSA integrates low-scale and high-scale visual features within self-attention to broaden the range of attention from a visual perspective, which increases the effective visual information and reduces noise, enabling the model to learn more accurate visual and semantic relationships. Experiments on the MS COCO dataset show that the proposed method more accurately captures the relationships between cross-scale visual features and generates more accurate descriptions. In addition, CFFSA is a general method that can further improve performance when combined with other self-attention based image captioning models.
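To make the idea concrete, below is a minimal PyTorch sketch of one way to fuse cross-scale features inside self-attention. All names, dimensions, and the fusion strategy (projecting high-scale features and concatenating them into the key/value set) are illustrative assumptions based on the abstract, not the authors' exact formulation.

import torch
import torch.nn as nn

class CrossScaleFusionSelfAttention(nn.Module):
    """Hypothetical sketch of cross-scale feature fusion self-attention.

    Queries stay at the low scale; keys and values are drawn from the
    concatenation of low- and high-scale features, so each low-scale
    feature can attend to complementary high-scale visual context.
    """

    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.proj_high = nn.Linear(d_model, d_model)  # align high-scale features
        self.norm = nn.LayerNorm(d_model)

    def forward(self, low, high):
        # low:  (B, N_low, d_model),  e.g. fine-grained region/grid features
        # high: (B, N_high, d_model), e.g. coarser, higher-level features
        fused = torch.cat([low, self.proj_high(high)], dim=1)
        out, _ = self.attn(query=low, key=fused, value=fused)
        return self.norm(low + out)  # residual + layer norm, standard Transformer practice

# Toy usage with random features
low = torch.randn(2, 49, 512)
high = torch.randn(2, 10, 512)
layer = CrossScaleFusionSelfAttention()
print(layer(low, high).shape)  # torch.Size([2, 49, 512])

In this sketch, widening the key/value set across scales is what "improves the range of attention from a visual perspective": each low-scale query can select relevant information from either scale rather than being restricted to low-scale relations.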

Keywords:
Closed captioning, Feature, Image, Visualization, Encoder, Image fusion, Feature extraction, Pattern recognition



Related Documents

JOURNAL ARTICLE

Cross on Cross Attention: Deep Fusion Transformer for Image Captioning

Jing Zhang, Yingshuai Xie, Weichao Ding, Zhe Wang

Journal: IEEE Transactions on Circuits and Systems for Video Technology, Year: 2023, Vol: 33(8), Pages: 4257-4268