JOURNAL ARTICLE

Evaluating Text-to-Image Generation Methods: Stable Diffusion vs Generative Adversarial Networks (GANs)

D. Indumathi

Year: 2024 · Journal: International Journal for Research in Applied Science and Engineering Technology · Vol: 12 (11) · Pages: 2523-2534 · Publisher: International Journal for Research in Applied Science and Engineering Technology (IJRASET)

Abstract

Text-to-image generation is a fast-developing field of artificial intelligence that converts textual descriptions into realistic or creative visuals. This study investigates the differences between two cutting-edge approaches for generating images from text, examining their performance, efficiency, and practical applicability across multiple areas. The dominant techniques in this discipline are Generative Adversarial Networks (GANs) and Stable Diffusion models. While GANs have long been the preferred architecture for image generation tasks, newer diffusion-based models such as Stable Diffusion have emerged as viable alternatives, offering distinct approaches to noise reduction and image synthesis. Attentional GAN (AttnGAN), a GAN-based approach, uses attention mechanisms to improve the semantic alignment between text descriptions and generated images, resulting in more contextually appropriate outputs. These methodologies are compared, with an emphasis on architectural differences, performance, and applicability to varied applications. GANs use adversarial training, in which two networks (the generator and the discriminator) compete to produce increasingly realistic images. This method is highly effective for producing high-quality images, but it has drawbacks such as mode collapse and training instability. In contrast, Stable Diffusion models use a probabilistic diffusion process to iteratively denoise noisy inputs into coherent outputs, yielding greater processing efficiency and the ability to handle high-resolution images. Experimental evaluation on benchmark datasets reveals each method's strengths and limitations in real applications such as digital art, content development, and product design. Stable Diffusion produces more diverse and high-resolution images with fewer computational resources, whereas GANs generate extremely detailed and realistic visuals.
The comparative insights gathered from this research can guide the choice of the most suitable technique for a given text-to-image generation problem.
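The diffusion process summarized in the abstract (iteratively denoising a noisy input) rests on a closed-form forward step: after t steps of a noise schedule, a clean image x0 can be corrupted in one shot as x_t = sqrt(alpha_bar_t)·x0 + sqrt(1 − alpha_bar_t)·eps. The sketch below illustrates that step with NumPy; the schedule values and function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Corrupt x0 with t steps of Gaussian noise in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta_i)."""
    alpha_bar = np.prod(1.0 - betas[:t])
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # a common linear noise schedule (assumed)
x0 = rng.standard_normal((8, 8))        # stand-in for an image tensor
x_t, eps = forward_diffuse(x0, 1000, betas, rng)
# After the full schedule, alpha_bar is near zero, so x_t is almost pure noise;
# training teaches a network to predict eps so sampling can reverse this process.
```

A trained denoiser inverts this corruption step by step, which is why diffusion models trade the single adversarial game of a GAN for many cheap, stable denoising updates.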

Keywords:
Computer science, Generative adversarial networks, Discriminator, Benchmark, Artificial intelligence, Generator, Image, Stability, Noise, Machine learning

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.24

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Computer Graphics and Visualization Techniques
Physical Sciences →  Computer Science →  Computer Graphics and Computer-Aided Design

Related Documents

JOURNAL ARTICLE

Text-to-Image Generation Using Stack Generative Adversarial Networks (GANs) and Stable Diffusion Models

Shuhaab Shafi

Journal: International Journal for Research in Applied Science and Engineering Technology · Year: 2024 · Vol: 12 (11) · Pages: 1426-1429
JOURNAL ARTICLE

Application of Generative-Adversarial Networks to Text-to-Image Generation

Artyom O. Levin, Yu. S. Belov

Journal: Научное обозрение. Технические науки (Scientific Review. Technical Sciences) · Year: 2023 · Pages: 11-15