Mustafa Abdul Salam, Mohamed Aldawsari, Mostafa Gamal, Hesham F.A. Hamed, Sara Sweidan
The exponential growth of online content has made locating specific information increasingly challenging, highlighting the necessity for automated text summarization. Deep learning techniques, particularly neural abstractive models such as Seq2Seq, have emerged as prominent solutions to this problem. The use of pre-trained models, including GPT, BERT, BART, and T5, has notably enhanced the quality of text summarization by addressing key aspects such as saliency, fluency, and semantic coherence. However, while these advancements have greatly benefited English, there remains a significant gap in support for low-resource languages. To bridge this gap, monolingual BERT and multilingual Seq2Seq models have been developed, enabling the application of state-of-the-art summarization techniques to languages such as Arabic. Our research capitalizes on pre-trained Seq2Seq models to achieve superior results on Arabic text summarization tasks by leveraging datasets such as XLSum and Hindawi Books. Notably, the titles within these datasets serve as robust benchmarks for evaluating the effectiveness of our summarization techniques, underscoring the importance of high-quality input data. A key contribution of our work is fine-tuning the model with reinforcement learning, which enhances its adaptability and performance. Our findings indicate that monolingual BERT models outperform their multilingual counterparts, yielding a 2.4% increase in ROUGE scores and further improving the quality of Arabic text summarization. Our study encompasses cross-dataset evaluations, exploration of various text generation methodologies, and in-depth preprocessing analysis tailored specifically to Arabic text.
By presenting a comprehensive approach to address the challenges in Arabic text summarization, our study contributes to the advancement of the field and underscores the significance of supporting low-resource languages in natural language processing tasks.
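The ROUGE scores cited above measure n-gram overlap between a generated summary and a reference. As a simplified illustration (not the authors' evaluation pipeline, which would typically use the `rouge-score` package with language-aware tokenization), ROUGE-1 F1 can be sketched as:

```python
import re
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a candidate and a reference summary.

    A minimal sketch: production evaluations on datasets like XLSum use
    dedicated libraries with stemming and proper Unicode tokenization.
    \\w+ matches Unicode word characters, so Arabic tokens are handled too.
    """
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A 2.4% ROUGE gain, as reported for the monolingual models, means this
# score averaged over the test set rose by 0.024 (on the 0-1 scale).
print(rouge1_f1("the cat sat on the mat", "the cat sat on a mat"))
```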