JOURNAL ARTICLE

Extractive summarization in Hindi using BERT-based ensemble model

Aravind Dendukuri, Sagar Goyal, Jannat Arora, Abhinav Pradeep

Year: 2022
Journal: 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT)
Pages: 1-7

Abstract

The past few years have seen a massive growth in the number of daily internet users whose primary language of communication is Hindi. Hindi is now one of the most spoken languages in the world and the official language of the Indian Government. Given this considerable rise in the amount of data in Hindi, managing, analyzing, and summarizing documents becomes a significant task with many applications. But language models and Natural Language Processing tasks catering to this demographic have been very limited in scope. Even state-of-the-art multilingual models cannot handle the nuances of the language. To bridge this gap, the MuRIL [37] language model was implemented and trained on large-scale Indian text corpora. The present work focuses on the summarization task for Hindi documents. We leverage the power of the MuRIL model and develop a novel extractive summarization-based solution using the language model's embeddings. Newspaper articles spanning several categories are extracted as our training data, and comprehensive testing shows that our model exceeds the performance of the previous baselines on the accuracy metric.

Keywords:
Automatic summarization, Hindi, Computer science, Natural language processing, Artificial intelligence

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.47
Refs: 49
Citation Normalized Percentile: 0.60

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Speech Recognition and Synthesis: Physical Sciences → Computer Science → Artificial Intelligence