Chan He, Qiujuan Tong, Xiaobao Yang, Jun Wang, Tingge Zhu
Most neural-network-based image captioning methods use high-level features extracted by CNNs, but high-level features struggle to retain information about small objects, so the generated descriptions cannot meet finer-grained requirements. To address this problem, we propose a multi-layer feature parallel processing method for image captioning, which feeds each layer of features to each stacked layer of the decoder in a certain order, thereby using multi-feature representations to generate more fine-grained descriptions. We provide two design schemes for the proposed method: Sequential Parallel Connection (SPC) and Reverse Parallel Connection (RPC). This work focuses on exploring a more effective and robust model connection method that generates finer-grained descriptions. Extensive experiments on the COCO dataset show that our connection methods generate higher-quality sentences.
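A minimal sketch of the routing idea described above, assuming the two schemes differ only in the order in which CNN feature levels are paired with stacked decoder layers. The function and variable names (`spc_routing`, `rpc_routing`, `features`) are illustrative assumptions, not the authors' code.

```python
# Hedged sketch: pairing multi-layer CNN features with stacked decoder layers.
# Assumption: SPC pairs shallow feature levels with early decoder layers,
# while RPC reverses the order so deep features reach early decoder layers.

def spc_routing(feature_maps):
    """Sequential Parallel Connection: decoder layer i receives feature level i."""
    return list(feature_maps)

def rpc_routing(feature_maps):
    """Reverse Parallel Connection: decoder layer i receives feature level L-1-i."""
    return list(reversed(feature_maps))

# Toy example: three CNN stages feeding three stacked decoder layers.
features = ["conv3_features", "conv4_features", "conv5_features"]
print(spc_routing(features))  # shallow-to-deep order
print(rpc_routing(features))  # deep-to-shallow order
```

Under this reading, each decoder layer attends to a different feature level in parallel, so fine detail from shallow layers is not lost to the final high-level feature map.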