This paper traces the evolution of text-to-image generation (TIG) techniques from Generative Adversarial Networks (GANs) to Diffusion Models (DMs). It first introduces GAN variants, including DCGAN, WGAN, MGGAN, and StyleGAN. While these popular GANs pioneered image synthesis through adversarial training of a generator against a discriminator, they suffered from training instability, mode collapse, and limited sample diversity. The paper then systematically introduces representative DMs, including DDPM, Guided Diffusion, GLIDE, Stable Diffusion, and Imagen, and shows how iterative denoising addresses these issues, achieving unprecedented image fidelity, semantic alignment, and generation stability. Quantitative comparisons on datasets such as COCO and CUB show that DMs consistently outperform GANs on metrics such as FID, IS, and CLIP score, although GANs retain shorter inference times. Nevertheless, critical challenges remain in generation efficiency, understanding of complex prompts, and safety controls. This paper analyses the likely causes of these problems and identifies key directions for future work.
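For reference, the two training mechanisms and the headline metric named above have standard formulations in the literature; the following is a brief sketch in conventional notation, which may differ from the notation used in the body of the paper.

Adversarial training (generator G against discriminator D):
\[ \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right] \]

Iterative denoising (the simplified DDPM objective: predict the noise \(\epsilon\) mixed into a clean image \(x_0\) at timestep \(t\)):
\[ \mathcal{L}_{\mathrm{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\!\left[ \bigl\| \epsilon - \epsilon_\theta\!\bigl(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t\bigr) \bigr\|^2 \right] \]

FID (Fréchet distance between Gaussian fits to the feature statistics of real images \(r\) and generated images \(g\)):
\[ \mathrm{FID} = \left\| \mu_r - \mu_g \right\|_2^2 + \operatorname{Tr}\!\bigl(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\bigr) \]

Note that sampling from a DM requires many sequential evaluations of \(\epsilon_\theta\), whereas a GAN generates an image in a single forward pass of G, which accounts for the inference-time advantage of GANs noted above.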