Mustafa Abdul Salam, Mohamed Aldawsari, Mostafa Gamal, Hesham F.A. Hamed, Sara Sweidan
The exponential growth of online content has made locating specific information increasingly challenging, highlighting the necessity for automated text summarization. Deep learning techniques, particularly neural abstractive models such as Seq2Seq, have emerged as prominent solutions to this problem. The use of pre-trained models, including GPT, BERT, BART, and T5, has notably enhanced the quality of text summarization by addressing key aspects such as saliency, fluency, and semantic coherence. However, while these advancements have greatly benefited English, there remains a significant gap in support for low-resource languages. To bridge this gap, monolingual BERT and multilingual Seq2Seq models have been developed, enabling the application of state-of-the-art summarization techniques to languages such as Arabic. Our research capitalizes on pre-trained Seq2Seq models to achieve superior results on Arabic text summarization tasks by leveraging datasets such as XLSum and Hindawi Books. Notably, the titles within these datasets serve as robust benchmarks for evaluating the effectiveness of our summarization techniques, underscoring the importance of high-quality input data. A key contribution of our work is fine-tuning the model with reinforcement learning, which enhances its adaptability and performance. Our findings indicate that monolingual BERT models outperform their multilingual counterparts, yielding a 2.4% increase in ROUGE scores and further improving the quality of Arabic text summarization. Our study encompasses cross-dataset evaluations, exploration of various text generation methodologies, and in-depth preprocessing analysis tailored specifically to Arabic text.
By presenting a comprehensive approach to address the challenges in Arabic text summarization, our study contributes to the advancement of the field and underscores the significance of supporting low-resource languages in natural language processing tasks.
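The ROUGE scores cited above measure n-gram overlap between a generated summary and a reference. As a simplified illustration (not the authors' evaluation pipeline, which would typically use the `rouge-score` package with language-aware tokenization), ROUGE-1 F1 can be sketched as:

```python
import re
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a candidate and a reference summary.

    A minimal sketch: production evaluations on datasets like XLSum use
    dedicated libraries with stemming and proper Unicode tokenization.
    \\w+ matches Unicode word characters, so Arabic tokens are handled too.
    """
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A 2.4% ROUGE gain, as reported for the monolingual models, means this
# score averaged over the test set rose by 0.024 (on the 0-1 scale).
print(rouge1_f1("the cat sat on the mat", "the cat sat on a mat"))
```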