JOURNAL ARTICLE

Automatic Extractive Text Summarization using Multiple Linguistic Features

Pooja Gupta, Swati Nigam, Rajiv Singh

Year: 2024 Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Publisher: Association for Computing Machinery

Abstract

Automatic text summarization (ATS) uses natural language processing (NLP) to produce summaries of distinct categories of information. For low-resource languages such as Hindi, applications of these techniques remain limited. This study proposes an extractive method for automatically generating summaries of Hindi documents. The approach retrieves pertinent sentences from the source documents by combining multiple linguistic features with machine learning (ML), using maximum likelihood estimation (MLE) and maximum entropy (ME). The input documents are pre-processed by removing Hindi stop words and stemming. Fifteen linguistic feature scores are computed for each document to identify the high-scoring phrases for summary generation. Experiments were performed on BBC News articles, CNN News, DUC 2004, the Hindi Text Short Summarization Corpus, the Indian Language News Text Summarization Corpus, and Wikipedia articles. The Hindi Text Short Summarization Corpus and the Indian Language News Text Summarization Corpus are in Hindi, whereas the BBC News, CNN News, and DUC 2004 datasets were translated into Hindi using the Google, Microsoft Bing, and Systran translators. Summarization results are reported for both Hindi and English to compare performance on a low-resource and a resource-rich language. Evaluation uses multiple ROUGE metrics along with precision, recall, and F-measure, and the proposed method shows better performance across multiple ROUGE scores. The proposed method is also compared with supervised and unsupervised machine learning methods, including support vector machine (SVM), Naive Bayes (NB), decision tree (DT), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and K-means clustering, and is found to outperform them.
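
The pipeline outlined in the abstract (stop-word removal, per-sentence feature scoring, selection of high-scoring sentences, ROUGE evaluation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not list the 15 linguistic features or describe how MLE and ME are applied, so the sketch assumes just two illustrative features (normalized term frequency and sentence position) with equal weights, omits stemming, and uses a tiny hypothetical stop-word list. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only (not the paper's list).
STOP_WORDS = {"है", "की", "का", "के", "में", "और", "से", "को"}


def split_sentences(text):
    """Split Hindi text on the danda (।) sentence terminator."""
    return [s.strip() for s in text.split("।") if s.strip()]


def tokenize(sentence):
    """Whitespace tokenization, punctuation stripping, stop-word removal."""
    tokens = [t.strip("।,.!?") for t in sentence.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def score_sentences(sentences):
    """Score sentences with two assumed features, equally weighted."""
    tokenized = [tokenize(s) for s in sentences]
    freq = Counter(t for toks in tokenized for t in toks)
    max_freq = max(freq.values()) if freq else 1
    n = len(sentences)
    scores = []
    for i, toks in enumerate(tokenized):
        # Feature 1: average document frequency of the sentence's terms, in [0, 1].
        tf = sum(freq[t] for t in toks) / (len(toks) * max_freq) if toks else 0.0
        # Feature 2: earlier sentences receive a higher position score.
        position = 1.0 - i / n
        scores.append(0.5 * tf + 0.5 * position)
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge_1(candidate_tokens, reference_tokens):
    """Unigram-overlap precision, recall, and F-measure (ROUGE-1 style)."""
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    precision = overlap / len(candidate_tokens) if candidate_tokens else 0.0
    recall = overlap / len(reference_tokens) if reference_tokens else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Under these assumptions, `summarize(text, k)` returns the selected sentences, and `rouge_1(tokenize(summary), tokenize(reference))` gives the precision, recall, and F-measure of unigram overlap against a reference summary. A learned weighting of many features, as the abstract describes, would replace the fixed 0.5/0.5 weights used here.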

Keywords:
Automatic summarization, Computer science, Natural language processing, Artificial intelligence, Linguistics, Information retrieval, Philosophy

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 4.47
Refs: 30
Citation Normalized Percentile: 0.91

Topics

Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

BOOK-CHAPTER

Statistical Features for Extractive Automatic Text Summarization

Yogesh Kumar Meena, Dinesh Gopalani

Natural Language Processing Year: 2019 Pages: 619-637

BOOK-CHAPTER

Statistical Features for Extractive Automatic Text Summarization

Yogesh Kumar Meena, Dinesh Gopalani

Advances in Business Information Systems and Analytics book series Year: 2016 Pages: 126-144

BOOK-CHAPTER

Extractive Text Summarization Using Topological Features

Ankit Kumar, Apurba Sarkar

Lecture Notes in Computer Science Year: 2023 Pages: 105-121

JOURNAL ARTICLE

Automatic Persian Text Summarization Using Linguistic Features from Text Structure Analysis

Ebrahim Heidary, Hamïd Parvïn, Samad Nejatian, Karamollah Bagherifard, Vahideh Rezaie

Journal: Computers, Materials & Continua Year: 2021 Vol: 69 (3) Pages: 2845-2861