Extractive Text Summarization for Ge'ez Language

Dejen Wuletaw

doi:10.20372/nadre:10505

JOURNAL ARTICLE

Year: 2024 Journal: National Academic Digital Repository of Ethiopia

Abstract

The amount of text available online is increasing rapidly, making it difficult to find essential or relevant information. Because of this volume, significant time and resources are spent trying to understand the relevant material. Effective information reduction techniques are therefore needed to extract concise, accurate, and coherent summaries while preserving the original meaning. Automatic text summarization is a helpful tool for this, since it can provide a concise overview of a document. To address the challenges of summarizing Ge’ez texts, this study proposed an improved extractive approach to automatic text summarization. It modified the existing research methodology by using term frequency-inverse document frequency (TF*IDF) and TextRank algorithms to create a more effective system that generates summaries by selecting important sentences from the original document. A graph was created in which the sentences of the document were nodes and the edges between them represented similarity. The most important sentences were selected from the original document, and the summary was formed from these extracted sentences. To achieve this goal, the researcher prepared a dataset of 120 Ge’ez text documents, manually labeled by experts. Documents range from 101 to 1041 words, with an average length of 18 sentences and 256 words. The proposed method was tested at 25% and 30% extraction rates against the prepared reference summaries (240 documents) and compared to standard methods such as the TextRank and TF*IDF algorithms, using precision, recall, and F1-score as evaluation measures. ROUGE-1, ROUGE-2, and ROUGE-L scores were calculated for all methods. At a 30% extraction rate, the average F1-scores in the TF*IDF experiment were 72.5%, 59.2%, and 68.44% for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. In the TextRank experiment, these scores were 79%, 65.67%, and 75% for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. The results of the experiment show that the proposed method can effectively summarize documents irrespective of the category they belong to.
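The pipeline described in the abstract (TF*IDF sentence vectors, a sentence-similarity graph, TextRank ranking, and selection at a fixed extraction rate) can be illustrated with a minimal sketch. This is not the study's implementation. Everything in it is an assumption made for illustration: the function names, cosine similarity as the edge weight, treating each sentence as a "document" for IDF, the smoothed IDF formula, the damping factor of 0.85, and tokenization on whitespace and the Ethiopic wordspace (U+1361), with sentences split on the Ethiopic full stop (U+1362) or Latin punctuation. A simple ROUGE-1 F1 function is included to show how unigram overlap with a reference summary might be scored.

```python
import math
import re
from collections import Counter

# Assumed delimiters: Ethiopic full stop (U+1362) and wordspace (U+1361).
SENTENCE_END = re.compile(r"[\u1362.!?]+")
WORD_SEP = re.compile(r"[\s\u1361]+")

def tokenize(text):
    return [t for t in WORD_SEP.split(text) if t]

def split_sentences(text):
    return [p.strip() for p in SENTENCE_END.split(text) if p.strip()]

def tfidf_vectors(sentences):
    # Assumption: each sentence counts as one "document" when computing IDF.
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def textrank(sim, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence graph; sim[i][j] is the edge weight.
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores

def summarize(text, rate=0.30):
    # Keep the top-ranked sentences, returned in their original order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = textrank(sim)
    k = max(1, round(rate * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

def rouge1_f1(candidate, reference):
    # Unigram-overlap F1 between a candidate and a reference summary.
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Calling `summarize(text, rate=0.25)` or `summarize(text, rate=0.30)` would mirror the two extraction rates mentioned in the abstract, and `rouge1_f1(" ".join(summary), reference)` would score the result against one reference summary. ROUGE-2 and ROUGE-L are not covered by this sketch.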

Keywords:

Automatic summarization; Keyword extraction; Term (time); Information extraction; Graph; Range (aeronautics); Text graph

Topics

Topic Modeling; Text and Document Classification Technologies; Natural Language Processing Techniques (all under Physical Sciences → Computer Science → Artificial Intelligence)
