Extractive summarization in Hindi using BERT-based ensemble model

Aravind Dendukuri; Sagar Goyal; Jannat Arora; Abhinav Pradeep

doi:10.1109/icccnt54827.2022.9984413


Year: 2022

Journal: 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT)

Pages: 1-7


Abstract

The past few years have seen massive growth in the number of daily internet users whose primary language of communication is Hindi. Hindi is now one of the most spoken languages in the world and an official language of the Indian Government. Given this considerable rise in the amount of data in Hindi, managing, analyzing, and summarizing documents has become a significant task with many applications. However, language models and Natural Language Processing tasks catering to this demographic have been very limited in scope. Even state-of-the-art multilingual models cannot handle the nuances of the language. To bridge this gap, the MuRIL [37] language model was implemented and trained on large-scale Indian text corpora. The present work focuses on the summarization task for Hindi documents. We leverage the MuRIL model and develop a novel extractive summarization solution using the language model's embeddings. Newspaper articles spanning several categories are collected as our training data, and comprehensive testing shows that our model exceeds the performance of the previous baselines on the accuracy metric.

Keywords: Automatic summarization; Hindi; Computer science; Natural language processing; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 0.47

Citation Normalized Percentile: 0.60

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents

BERT-based ensemble model for Hindi summarization

Hindi Text Summarization: Using BERT

Extractive Text Summarization Using BERT

Extractive Summarization Utilizing Keyphrases by Finetuning BERT-Based Model

BERT: Extractive Text Summarization