JOURNAL ARTICLE

Abstractive Summarization for Urdu Video Description Generation

Ali Faheem, Faizad Ullah, Muhammad Sohaib Ayub, Asim Karim

Year: 2025 Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Vol: 24 (10) Pages: 1-21 Publisher: Association for Computing Machinery

Abstract

Automatic summarization condenses content while retaining key ideas and details. Urdu, with over 230 million speakers globally, is one of the most widely spoken languages. The growing popularity of social media has increased the volume of Urdu content, including instructional videos, and has driven the need for tools that enhance accessibility and engagement. Well-written video descriptions can boost viewer engagement and improve search engine optimization; however, many videos lack them. An automatic description generation system for Urdu videos is therefore needed, and it can be achieved by abstractive summarization of video transcripts. However, no public datasets of this kind are available in Urdu. To address this problem, we investigate the usability of high-resource language datasets for Urdu abstractive text summarization. We created the first Urdu video transcription dataset, Urdu How2, and evaluated its quality using intrinsic evaluation. We leverage transfer learning, a technique in which knowledge from pretrained models (such as mT5) is adapted to new tasks, to develop the uT5 model for generating Urdu text summaries. We further trained the model to improve its Urdu text generation capability. The machine-generated summaries are evaluated using ROUGE scores, human evaluation scores, and adversarial evaluation, providing a reliable assessment of the quality of the generated descriptions and the robustness of the model against noisy text data. The human evaluation shows that the proposed method generates accurate and coherent summaries when compared with the translated ground truth. To the best of our knowledge, this is the first attempt to utilize a cross-lingual dataset for Urdu abstractive text summarization and video description generation. This research enhances Urdu content accessibility and lays the groundwork for advancing multilingual content generation and multimodal analysis in other low-resource languages.
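
The abstract names ROUGE as one of the automatic evaluation measures. As a rough illustration of what such a score captures, the following is a minimal sketch of ROUGE-N computed from n-gram overlap between a reference summary and a generated one. It assumes simple whitespace tokenization and is not the authors' evaluation code; the abstract does not state which ROUGE variants, tokenizer, or toolkit were used, and the function names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (assumed helper, not from the paper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Sketch of ROUGE-N precision, recall, and F1.

    Assumption: whitespace tokenization, which is only a rough
    approximation for Urdu text.
    """
    ref_counts = ngrams(reference.split(), n)
    cand_counts = ngrams(candidate.split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Placeholder tokens stand in for a reference and a generated description.
    print(rouge_n("a b c d", "a b e", n=1))
```

Recall here measures how much of the reference description the generated one recovers, while precision penalizes generated text that does not appear in the reference.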

Keywords:
Automatic summarization, Urdu, Computer science, Natural language processing, Artificial intelligence, Information retrieval, Art, Literature

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 51
Citation Normalized Percentile: 0.14

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

BOOK-CHAPTER

RUATS: Abstractive Text Summarization for Roman Urdu

Laraib Kaleem, Arif Ur Rahman, Momina Moetesum

Lecture Notes in Computer Science Year: 2024 Pages: 258-273

JOURNAL ARTICLE

ASoVS: Abstractive Summarization of Video Sequences

Aniqa Dilawari, Muhammad Usman Ghani Khan

Journal: IEEE Access Year: 2019 Vol: 7 Pages: 29253-29263

JOURNAL ARTICLE

Abstractive Text Summarization for the Urdu Language: Data and Methods

Muhammad Awais, Rao Muhammad Adeel Nawab

Journal: IEEE Access Year: 2024 Vol: 12 Pages: 61198-61210

JOURNAL ARTICLE

Bidirectional Encoder Approach for Abstractive Text Summarization of Urdu Language

Muhammad Asif, Syed Ali Raza, Javed Iqbal, Nousheen Perwaiz, Tauqeer Faiz, Shan Khan

Journal: 2022 International Conference on Business Analytics for Technology and Security (ICBATS) Year: 2022 Pages: 1-8