JOURNAL ARTICLE

Diverse Image Captioning via Panoptic Segmentation and Sequential Conditional Variational Transformer

Bing Liu, Jinfu Lu, Mingming Liu, Hao Liu, Yong Zhou, Dongping Yang

Year: 2024
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Vol: 20 (12)
Pages: 1-17
Publisher: Association for Computing Machinery

Abstract

Recently, transformer-based image captioning models have achieved significant performance improvements. However, due to the limitations of region visual features and deterministic projections between image space and caption space, existing methods still suffer from disentangled visual features and rigid sentences. To address these issues, we first introduce panoptic segmentation to extract segmentation region features, which can effectively alleviate the visual confusion caused by the widely adopted region visual features. Then, we propose a panoptic segmentation based sequential conditional variational transformer (PS-SCVT) framework for diverse image captioning, which not only accurately extracts image visual representations by fusing the segmentation region features and object detection features, but also learns one-to-many mappings from image space to caption space. Experimental results demonstrate that our approach achieves better interpretability and generalization performance than state-of-the-art diverse image captioning models.

Keywords:
Computer science, Closed captioning, Transformer, Computer vision, Artificial intelligence, Segmentation, Panopticon, Image segmentation, Image (mathematics), Pixel, Natural language processing, Electrical engineering

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.53
Refs: 39
Citation Normalized Percentile: 0.56

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
