Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis

Jun Peng; Yiyi Zhou; Xiaoshuai Sun; Liujuan Cao; Yongjian Wu; Feiyue Huang; Rongrong Ji

doi:10.1109/tmm.2021.3116416

JOURNAL ARTICLE

Year: 2021 Journal: IEEE Transactions on Multimedia Vol: 24 Pages: 4356-4366 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Text-to-Image (T2I) synthesis is a challenging task that aims to convert natural language descriptions into realistic images. It remains an open problem mainly because of the diversity of text descriptions, which makes it hard to generate vivid and relevant images. Moreover, the existing evaluation metrics in T2I synthesis mainly assess the visual quality of the generated images, while the semantic consistency between the two modalities is often ignored. To address these issues, we present a novel Knowledge-Driven Generative Adversarial Network, termed KD-GAN, and a new evaluation system, named Pseudo Turing Test (PTT for short). Concretely, KD-GAN takes a further step in imitating the behavior of human painting, i.e., drawing an image according to reference knowledge. The introduction of reference knowledge in KD-GAN not only improves the quality of the generated images but also enhances the semantic consistency between them and the input texts. In addition, KD-GAN can largely avoid some flaws that violate common sense during image generation, e.g., skiing in the blue sky. The proposed PTT is an important supplement to the existing evaluation system of T2I synthesis. It includes a set of pseudo-experts from different multimedia tasks that evaluate the semantic consistency between the given texts and the generated images. To validate the proposed KD-GAN, we conducted extensive experiments on two benchmark datasets, i.e., Caltech-UCSD Birds (CUB) and MS-COCO (COCO). The experimental results demonstrate that KD-GAN outperforms state-of-the-art methods on IS, FID, and the proposed PTT metrics. The code for KD-GAN is available at https://github.com/pengjunn/KD-GAN, and the code and models for PTT are available at https://github.com/pengjunn/PTT.
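
For readers unfamiliar with IS (Inception Score), one of the visual-quality metrics named in the abstract, the following is a minimal sketch of its standard definition, exp(E_x[KL(p(y|x) || p(y))]). It is not taken from the KD-GAN or PTT repositories. The function name `inception_score` and the toy inputs are assumptions for illustration; in practice the class probabilities p(y|x) come from a pretrained Inception classifier applied to generated images, and the score is usually averaged over several splits.

```python
import math


def inception_score(class_probs):
    """Inception Score from per-image class probabilities p(y|x).

    class_probs: list of equal-length lists, one probability vector per
    generated image (hypothetical input; normally produced by a
    pretrained Inception classifier).
    """
    n = len(class_probs)
    k = len(class_probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    # Sum of KL(p(y|x) || p(y)) over images; zero-probability terms add 0.
    total_kl = 0.0
    for p in class_probs:
        total_kl += sum(
            pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0
        )
    # Exponentiate the mean KL divergence.
    return math.exp(total_kl / n)
```

With this definition, images whose predictions are all identical and uniform score 1 (the minimum), while confident predictions spread evenly over K classes approach K. The metric therefore rewards images that are individually recognizable and collectively diverse, but it says nothing about whether an image matches its text, which is the gap the abstract says PTT is meant to cover.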

Keywords:

Computer science; Consistency (knowledge bases); Artificial intelligence; Information retrieval; Generative adversarial network; Image (mathematics); Image synthesis; Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 2.76

Citation Normalized Percentile: 0.92

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

KT-GAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis

Text to Image Synthesis With Bidirectional Generative Adversarial Network

KnHiGAN: Knowledge-enhanced Hierarchical Generative Adversarial Network for Fine-grained Text-to-Image Synthesis

Hybrid Attention Driven Text-to-Image Synthesis via Generative Adversarial Networks

Usage of Generative Adversarial Network to Improve Text to Image Synthesis