Text summarization is essential in natural language processing because of the rapid growth of textual data, which users need to condense into meaningful summaries quickly. There are two standard approaches to text summarization: extractive and abstractive. Many efforts have addressed summarizing texts in Latin-script languages. However, summarizing Arabic texts is challenging for many reasons, including the language's complexity, structure, and morphology. In addition, Arabic summarization lacks benchmark datasets and gold-standard evaluation metrics. Thus, the contribution of this research is multi-fold. First, it introduces a new Arabic benchmark dataset, called HASD, which includes 43k articles with their extractive and abstractive summaries. Second, it presents another new Arabic benchmark dataset, called AASD, which includes 150k articles with their abstractive summaries. Third, this work extends the well-known extractive EASC benchmark by adding an abstractive summary to each text. Fourth, this paper proposes a new measure, called Arabic-Rouge, for evaluating abstractive summaries based on structure and similarity between words. Finally, it investigates the impact of abstractive Arabic text summarization on different transformer models across different datasets. The models are tested on the proposed HASD, AASD, and modified EASC benchmarks and evaluated using Rouge, Bleu, and Arabic-Rouge. The experimental results are satisfactory compared to state-of-the-art methods.
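The Arabic-Rouge measure itself is defined in the paper; as background for the evaluation setup, a minimal sketch of the standard ROUGE-1 F1 score (unigram overlap between a reference and a candidate summary) might look like the following. The function name and the whitespace tokenization are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Illustrative ROUGE-1 F1: unigram precision/recall over whitespace tokens.

    Note: real Arabic evaluation would normally apply normalization and
    morphology-aware tokenization, which is part of what motivates the
    paper's Arabic-Rouge measure.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Clipped unigram overlap between reference and candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: a 3-token candidate against a 4-token reference.
ref = "النص العربي ملخص جيد"
cand = "النص العربي ملخص"
score = rouge1_f1(ref, cand)  # precision = 1.0, recall = 0.75
```

Surface-overlap metrics like this penalize valid paraphrases, which is especially limiting for morphologically rich Arabic; the proposed Arabic-Rouge instead accounts for structure and word similarity.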
Heidi Ahmed Holiel, Nancy Mohamed, Arwa Ahmed, Walaa Medhat