JOURNAL ARTICLE

Performance Analysis of Supervised Machine Learning Approaches for Bengali Text Categorization

Abstract

In this digital era, an enormous amount of data is generated every day, most of it unstructured text. An automated text classifier assigns texts to predefined categories. Machine learning makes it possible to learn the features of pre-categorized documents and predict the category of a new document. Bengali is one of the most widely spoken languages in the world, so automated text categorization for Bengali has become essential. Text categorization mostly uses data mining algorithms together with NLP tools, feature extraction and selection methods, and vector space modeling. In this paper, we measure the performance of Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Stochastic Gradient Descent (SGD), and Logistic Regression (LR) methods on an open-source Bengali newspaper article corpus containing 84,906 articles in 10 categories. We examine the impact of training dataset size on classification accuracy for the different algorithms. We also document the execution time needed to train each method and discuss issues and challenges in Bengali text categorization. This paper can serve as a reference for future research in Bengali text categorization.
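
The kind of experiment the abstract describes (train a classifier on bag-of-words features, then track accuracy and training time as the training set grows) can be illustrated with a minimal sketch. The code below is not the authors' implementation. It assumes whitespace tokenization, a hypothetical two-category toy corpus with English stand-in tokens, and a from-scratch Multinomial Naive Bayes with Laplace smoothing. The names `TRAIN`, `TEST`, `train_mnb`, `predict`, `accuracy`, and `sweep` are illustrative only.

```python
import math
import time
from collections import Counter, defaultdict

# Hypothetical toy corpus (English stand-ins, NOT the paper's Bengali data).
# Classes are interleaved so every prefix of even length contains both.
TRAIN = [
    ("team wins cricket match", "sports"),
    ("bank raises interest rate", "economy"),
    ("football team scores goal", "sports"),
    ("market price of rice rises", "economy"),
    ("player scores in final match", "sports"),
    ("central bank budget report", "economy"),
    ("cricket player injured", "sports"),
    ("stock market falls", "economy"),
]
TEST = [
    ("cricket team wins final", "sports"),
    ("bank interest rate falls", "economy"),
]


def tokenize(text):
    # Assumption: whitespace tokenization; real Bengali text would likely
    # need normalization and stop-word handling first.
    return text.lower().split()


def train_mnb(docs, alpha=1.0):
    """Fit Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    total_docs = sum(class_docs.values())
    return {
        "alpha": alpha,
        "vocab": vocab,
        "counts": word_counts,
        "log_prior": {c: math.log(n / total_docs) for c, n in class_docs.items()},
        "totals": {c: sum(word_counts[c].values()) for c in class_docs},
    }


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        denom = model["totals"][label] + alpha * len(vocab)
        score = log_prior
        for tok in tokenize(text):
            if tok in vocab:  # out-of-vocabulary tokens are ignored
                score += math.log((model["counts"][label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    return sum(predict(model, text) == label for text, label in docs) / len(docs)


def sweep(train, test, sizes):
    """Train on growing prefixes; return (size, accuracy, train_seconds)."""
    results = []
    for n in sizes:
        start = time.perf_counter()
        model = train_mnb(train[:n])
        elapsed = time.perf_counter() - start
        results.append((n, accuracy(model, test), elapsed))
    return results


if __name__ == "__main__":
    for n, acc, secs in sweep(TRAIN, TEST, [2, 4, 8]):
        print(f"train_size={n} accuracy={acc:.2f} train_time={secs:.6f}s")
```

The same `sweep` loop would apply to any classifier that exposes a train and a predict step, which is one way the four methods named in the abstract could be compared under identical training-set sizes.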

Keywords:
Bengali, Computer science, Artificial intelligence, Categorization, Natural language processing, Support vector machine, Bigram, Machine learning, Naive Bayes classifier, Classifier (UML)

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.40
Refs: 18
Citation Normalized Percentile: 0.70

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Spam and Phishing Detection
Physical Sciences →  Computer Science →  Information Systems
Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Arabic Text Categorization using Machine Learning Approaches

Riyad Alshammari

Journal: International Journal of Advanced Computer Science and Applications, Year: 2018, Vol: 9 (3)
JOURNAL ARTICLE

Text Categorization in Indian Languages using Machine Learning Approaches.

K. Raghuveer, K. N. Balasubramanya Murthy

Journal: Indian International Conference on Artificial Intelligence, Year: 2007, Vol: 108 (6), Pages: 1864-1883
JOURNAL ARTICLE

Text categorization Performance examination Using Machine Learning Algorithms

Bonthala Prabhanjan Yadav, Sukhaveerji Ghate, A. Harshavardhan, G. Jhansi, Komuravelly Sudheer Kumar, E. C. G. Sudarshan

Journal: IOP Conference Series Materials Science and Engineering, Year: 2020, Vol: 981 (2), Pages: 022044-022044