Performance Analysis of Supervised Machine Learning Approaches for Bengali Text Categorization

Ronald Tudu; Shaibal Saha; Prasun Nandy Pritam; Rajesh Palit

doi:10.1109/apwconcse.2018.00043

JOURNAL ARTICLE

Year: 2018 Vol: 3 Pages: 221-226

Abstract

In this digital era, an enormous amount of data is generated every day, most of it unstructured text. An automated text classifier assigns texts to predefined categories. Machine learning makes it possible to learn the features of pre-categorized documents and predict the category of a new document. Bengali is one of the most widely spoken languages in the world, so automated text categorization for Bengali has become essential. Text categorization mostly uses data mining algorithms along with NLP tools, feature extraction and selection methods, and vector space modeling. In this paper, we measure the performance of Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Stochastic Gradient Descent (SGD), and Logistic Regression (LR) methods on an open-source Bengali newspaper article corpus containing 84,906 articles in 10 categories. We examine the impact of training dataset size on classification accuracy for the different algorithms. We also document the execution time needed to train each method and discuss issues and challenges in Bengali text categorization. This paper can serve as a reference for future research on Bengali text categorization.
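The kind of experiment the abstract describes (train a classifier on a vector space representation, vary the training set size, record accuracy and training time) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it implements only Multinomial Naive Bayes with add-one smoothing, one of the four methods named above, and the tiny English corpus, the `tokenize` helper, and the `evaluate` function are hypothetical placeholders standing in for the Bengali newspaper corpus and its preprocessing.

```python
import math
import time
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Bengali text would
    # likely need language-specific normalization and stop-word handling.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; the paper's corpus has 84,906 articles in 10 categories.
TRAIN = [
    ("team wins cricket match", "sports"),
    ("bank raises interest rate", "economy"),
    ("player scores in final match", "sports"),
    ("market prices rise as bank lends", "economy"),
    ("cricket team names new captain", "sports"),
    ("export market growth slows", "economy"),
]
TEST = [
    ("captain leads team in match", "sports"),
    ("bank interest and market prices", "economy"),
]


def evaluate(train_size):
    """Train on the first `train_size` examples; return (accuracy, train seconds)."""
    docs, labels = zip(*TRAIN[:train_size])
    start = time.perf_counter()
    model = MultinomialNB().fit(docs, labels)
    elapsed = time.perf_counter() - start
    correct = sum(model.predict(d) == y for d, y in TEST)
    return correct / len(TEST), elapsed


if __name__ == "__main__":
    for n in (2, 4, 6):
        acc, secs = evaluate(n)
        print(f"train_size={n} accuracy={acc:.2f} train_time={secs:.6f}s")
```

Comparing SVM, SGD, and LR in the same loop would mean swapping in other classifiers behind the same `fit`/`predict` interface; on a corpus of realistic size, an established library would be a more sensible choice than this hand-written model.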

Keywords:

Bengali; Computer science; Artificial intelligence; Categorization; Natural language processing; Support vector machine; Bigram; Machine learning; Naive Bayes classifier; Classifier (UML)

Metrics

FWCI (Field Weighted Citation Impact): 0.40

Citation Normalized Percentile: 0.70

Topics

Text and Document Classification Technologies

Physical Sciences → Computer Science → Artificial Intelligence

Spam and Phishing Detection

Physical Sciences → Computer Science → Information Systems

Imbalanced Data Classification Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Text Categorization using Supervised Machine Learning Techniques

Arabic Text Categorization using Machine Learning Approaches

Text Categorization in Indian Languages using Machine Learning Approaches.

Text categorization Performance examination Using Machine Learning Algorithms

A Study on Optimized Multi-Language Classification and Text Categorization Using Supervised Hybrid Machine Learning Approaches