JOURNAL ARTICLE

A Multi-layer Feature Parallel Processing Method for Image Captioning

Abstract

Most neural-network-based image captioning methods use only the high-level features extracted by a CNN, but high-level features struggle to retain information about small objects, so the generated descriptions cannot meet finer-grained requirements. To address this problem, we propose a multi-layer feature parallel processing method for image captioning, which feeds each layer of CNN features to a corresponding stacked layer of the decoder in a fixed order, using the resulting multi-feature representation to generate finer-grained descriptions. We provide two design schemes for the proposed method: Sequential Parallel Connection (SPC) and Reverse Parallel Connection (RPC). This work focuses on exploring a more effective and robust model connection method that can generate finer-grained descriptions. Extensive experiments on the COCO dataset show that our connection method generates higher-quality sentences.
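The two connection schemes can be illustrated with a minimal sketch. The routing logic below is inferred from the abstract only: it assumes the decoder has one stacked layer per CNN feature level, with SPC pairing them shallow-to-deep and RPC pairing them deep-to-shallow. The function names and the toy decoder are illustrative, not the authors' implementation.

```python
# Sketch of the two feature-routing schemes described in the abstract,
# assuming one stacked decoder layer per CNN feature level. The names
# route_features/decode and the string feature tags are hypothetical.

def route_features(cnn_features, scheme="SPC"):
    """Assign one CNN feature level to each stacked decoder layer.

    SPC (Sequential Parallel Connection): decoder layer i receives the
    i-th feature level, ordered shallow to deep.
    RPC (Reverse Parallel Connection): decoder layer i receives the
    (L - 1 - i)-th level, i.e. the deepest features go to the first layer.
    """
    if scheme == "SPC":
        return list(cnn_features)
    if scheme == "RPC":
        return list(reversed(cnn_features))
    raise ValueError(f"unknown scheme: {scheme}")

def decode(cnn_features, scheme="SPC"):
    # Toy stacked decoder: each layer consumes its assigned feature level
    # (here just a string tag) and records the pairing it received.
    pairings = []
    for layer_idx, feat in enumerate(route_features(cnn_features, scheme)):
        pairings.append((layer_idx, feat))
    return pairings

# Example with three feature levels, shallow ("conv3") to deep ("conv5"):
levels = ["conv3", "conv4", "conv5"]
print(decode(levels, "SPC"))  # layer 0 paired with conv3, layer 2 with conv5
print(decode(levels, "RPC"))  # layer 0 paired with conv5, layer 2 with conv3
```

Under this reading, RPC gives the first decoder layer access to the most semantic features while later layers refine with shallower, detail-preserving maps; SPC does the opposite. Which ordering works better is exactly what the paper's experiments compare.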

Keywords:
Image captioning; Computer science; Neural networks; Layers; Features; Connections; Artificial intelligence; Pattern recognition

Metrics

- Cited by: 1
- FWCI (Field-Weighted Citation Impact): 0.10
- References: 25
- Citation Normalized Percentile: 0.40
Topics

- Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
- Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
- Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
