JOURNAL ARTICLE

Text to Video using GANs and Diffusion Models

Abstract

Text-to-video generation is a challenging endeavour that requires transforming text descriptions into realistic and coherent videos. The field has made substantial progress in recent years with the development of diffusion models and generative adversarial networks (GANs). This study examines the most recent text-to-video generation models, as well as the main steps involved in text-to-video generation, including temporal coherence, video generation, and text encoding. We also emphasise the challenges involved in text-to-video generation and the recent advances that address them. The most frequently used datasets and metrics in this field are also analysed and reviewed.

Keywords:
Diffusion, Computer science, Physics, Thermodynamics
