Ali Faheem, Faizad Ullah, Muhammad Sohaib Ayub, Asim Karim
Automatic summarization condenses content while retaining its key ideas and details. Urdu, with over 230 million speakers worldwide, is one of the most widely spoken languages, and the growth of social media has increased the volume of Urdu content, including instructional videos, driving the need for tools that improve accessibility and engagement. Well-written video descriptions can boost viewer engagement and improve search engine optimization; however, many videos lack them. An automatic description generation system for Urdu videos is therefore needed, and it can be built by abstractive summarization of video transcripts. However, no public datasets for this task are available in Urdu. To address this problem, we investigate the usability of high-resource language datasets for Urdu abstractive text summarization. We create the first Urdu video transcription dataset, Urdu How2, and assess its quality using intrinsic evaluation. We leverage transfer learning, a technique in which knowledge from pretrained models (such as mT5) is adapted to new tasks, to develop the uT5 model for generating Urdu text summaries, and we further train the model to improve its Urdu text generation capability. The machine-generated summaries are evaluated using ROUGE scores, human evaluation scores, and adversarial evaluation, providing a reliable assessment of the quality of the generated descriptions and the robustness of the model against noisy text data. The human evaluation shows that the proposed method generates accurate and coherent summaries compared to the translated ground truth. To the best of our knowledge, this is the first attempt to use a cross-lingual dataset for Urdu abstractive text summarization and video description generation. This research enhances Urdu content accessibility and lays the groundwork for advancing multilingual content generation and multimodal analysis in other low-resource languages.
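The abstract names ROUGE as one of its evaluation metrics. As a rough illustration of what such a score measures, the following is a minimal sketch of ROUGE-N computed from n-gram overlap. It is not the authors' evaluation code: the function names `ngrams` and `rouge_n` are hypothetical, whitespace tokenization is an assumption (a real Urdu pipeline would likely use a dedicated tokenizer), and the two Urdu sentences are invented examples rather than dataset samples.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall, and F1 for two strings.

    Assumption: tokens are separated by whitespace. This is a simplification
    for illustration, not a statement about the paper's preprocessing.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears
    # in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example: a generated summary and a reference that differ in one word.
generated = "یہ ویڈیو کھانا پکانا سکھاتی ہے"
reference = "یہ ویڈیو کھانا بنانا سکھاتی ہے"

rouge1 = rouge_n(generated, reference, n=1)
rouge2 = rouge_n(generated, reference, n=2)
print(rouge1)
print(rouge2)
```

Because ROUGE rewards only surface overlap, a single substituted word lowers the bigram score more than the unigram score, which is one reason the abstract pairs ROUGE with human and adversarial evaluation.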
Laraib Kaleem, Arif Ur Rahman, Momina Moetesum
Aniqa Dilawari, Muhammad Usman Ghani Khan
Muhammad Awais, Rao Muhammad Adeel Nawab
Muhammad Asif, Syed Ali Raza, Javed Iqbal, Nousheen Perwaiz, Tauqeer Faiz, Shan Khan
Jiaxin Duan, Fengyu Lu, Junfei Liu