JOURNAL ARTICLE

Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis

Jun Peng, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji

Year: 2021
Journal: IEEE Transactions on Multimedia
Vol: 24
Pages: 4356-4366
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Text-to-Image (T2I) synthesis is a challenging task that aims to convert natural language descriptions into realistic images. It remains an open problem mainly due to the diversity of text descriptions, which poses a major obstacle to generating vivid and relevant images. Moreover, the existing evaluation metrics in T2I synthesis are mainly used to evaluate the visual quality of the generated images, while the semantic consistency between the two modalities is often ignored. To address these issues, we present a novel Knowledge-Driven Generative Adversarial Network, termed KD-GAN, and a new evaluation system, named Pseudo Turing Test (PTT for short). Concretely, KD-GAN takes a further step in imitating the behavior of human painting, i.e., drawing an image according to reference knowledge. The introduction of reference knowledge in KD-GAN not only improves the quality of the generated images but also enhances the semantic consistency between them and the input texts. In addition, KD-GAN largely avoids flaws that contradict common sense during image generation, e.g., skiing in the blue sky. The proposed PTT is an important supplement to the existing evaluation system of T2I synthesis. It includes a set of pseudo-experts from different multimedia tasks that evaluate the semantic consistency between the given texts and the generated images. To validate the proposed KD-GAN, we conducted extensive experiments on two benchmark datasets, i.e., Caltech-UCSD Birds (CUB) and MS-COCO (COCO). The experimental results demonstrate that KD-GAN outperforms state-of-the-art methods on IS, FID, and the proposed PTT metrics. The code of KD-GAN is available at https://github.com/pengjunn/KD-GAN, and the code and models of PTT are available at https://github.com/pengjunn/PTT.

Keywords:
Computer science, Consistency (knowledge bases), Artificial intelligence, Information retrieval, Generative adversarial network, Image (mathematics), Image synthesis, Natural language processing

Metrics

Cited By: 35
FWCI (Field Weighted Citation Impact): 2.76
Refs: 54
Citation Normalized Percentile: 0.92

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition