WANG Ming-zhan, JI Jun-zhong, JIA Ao-zhe, ZHANG Xiao-dan
In recent years, the encoder-decoder framework based on the self-attention mechanism has become the mainstream model for image captioning. However, self-attention in the encoder models visual relations only among low-scale features and ignores effective information in high-scale visual features, which degrades the quality of the generated descriptions. To address this problem, this paper proposes a cross-scale feature fusion self-attention (CFFSA) method for image captioning. Specifically, CFFSA integrates low-scale and high-scale visual features within self-attention to widen the attention range from a visual perspective, which increases the effective visual information and reduces noise, thereby learning more accurate visual and semantic relationships. Experiments on the MS COCO dataset show that the proposed method captures relationships between cross-scale visual features more accurately and generates more accurate descriptions. In addition, CFFSA is a general method that can further improve performance when combined with other self-attention-based image captioning models.
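The core idea of fusing two feature scales inside self-attention can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual architecture: the function and variable names are hypothetical, and the specific fusion choice (queries from low-scale features, keys/values from the concatenation of both scales) is one plausible way to widen the attention range across scales.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_scale_attention(low, high, d, seed=0):
    """Hypothetical sketch of cross-scale feature fusion self-attention.

    low:  (n_low, d)  low-scale visual features (e.g., region features)
    high: (n_high, d) high-scale visual features (e.g., grid/global features)
    Queries come from the low-scale features; keys and values come from the
    concatenation of both scales, so each low-scale position can attend to
    visual information at either scale.
    """
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    fused = np.concatenate([low, high], axis=0)        # (n_low + n_high, d)
    Q, K, V = low @ Wq, fused @ Wk, fused @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))               # (n_low, n_low + n_high)
    return attn @ V                                    # (n_low, d)
```

The output keeps the low-scale sequence length, so the module can be dropped into an encoder layer in place of plain self-attention, as the abstract suggests for combining CFFSA with existing self-attention-based captioning models.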