JOURNAL ARTICLE

Increasing Accuracy of Support Vector Machine (SVM) By Applying N-Gram and Chi-Square Feature Selection for Text Classification

Abstract

Text mining is a technique that can be used to process data, and news classification is one of its applications. Support Vector Machine (SVM) is an algorithm that can be used for news classification. However, SVM performance is less than optimal on large datasets, and the number of attributes (features) also affects the performance of the classifier. This research aims to increase the accuracy of SVM by applying N-gram and Chi-square feature selection. Without N-gram and feature selection, SVM achieves an accuracy of 96.40%. With bigram and Chi-square feature selection at 70% feature reduction, accuracy increases by 0.95 percentage points to 97.35%. With unigram and Chi-square feature selection at 90% feature reduction, accuracy increases by 1.58 percentage points to 97.98%, the highest value obtained. These best configurations were then evaluated on the testing data, and the results also show improvement: SVM without N-gram and feature selection reaches an accuracy of 76.80%; with bigram and Chi-square at 70% feature reduction, 82%; and with unigram and Chi-square at 90% feature reduction, the highest accuracy of 82.40%. These results indicate that SVM performance is influenced by N-gram and Chi-square, which together determine the number of features. The best text classification performance is obtained when the N-gram value and the number of features are chosen appropriately.
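The feature side of the approach described in the abstract (N-gram extraction followed by Chi-square ranking and pruning) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the whitespace tokenizer, the document-presence counts, the per-class maximum used to combine scores, and the names `ngrams`, `chi_square`, and `select_features` are all assumptions made for illustration. A `keep_fraction` of 0.10 is read here as the abstract's "90% feature reduction". The SVM training step itself is omitted.

```python
def ngrams(tokens, n):
    """Return contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, n=1, keep_fraction=0.10):
    """Rank n-gram features by chi-square and keep the top fraction.

    Assumption: a feature's score is its maximum chi-square over all classes,
    computed from document-level presence (not term frequency).
    """
    # Assumed tokenization: lowercase and split on whitespace.
    doc_feats = [set(ngrams(text.lower().split(), n)) for text in docs]
    vocab = sorted(set().union(*doc_feats))
    classes = sorted(set(labels))
    scores = {}
    for feat in vocab:
        best = 0.0
        for cls in classes:
            a = sum(1 for f, y in zip(doc_feats, labels) if feat in f and y == cls)
            b = sum(1 for f, y in zip(doc_feats, labels) if feat in f and y != cls)
            c = sum(1 for f, y in zip(doc_feats, labels) if feat not in f and y == cls)
            d = len(docs) - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[feat] = best
    # Keep at least one feature; ties are broken alphabetically.
    k = max(1, round(len(vocab) * keep_fraction))
    ranked = sorted(vocab, key=lambda f: (-scores[f], f))
    return ranked[:k], scores
```

Calling `select_features(docs, labels, n=2, keep_fraction=0.30)` would correspond to the bigram setting with 70% feature reduction under the same assumptions; the retained features would then be passed to whichever SVM implementation is in use.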

Keywords:
Support vector machine; n-gram; Gram; Feature selection; Selection (genetic algorithm); Computer science; Pattern recognition (psychology); Artificial intelligence; Feature (linguistics); Structured support vector machine; Chi-square test; Machine learning; Data mining; Statistics; Mathematics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.42
Refs: 23
Citation Normalized Percentile: 0.69

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Spam and Phishing Detection
Physical Sciences →  Computer Science →  Information Systems
Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Influence of Word Normalization and Chi-Squared Feature Selection on Support Vector Machine (SVM) Text Classification

Ardy Wibowo Haryanto, Edy Kholid Mawardi, Muljono Muljono

Journal: 2018 International Seminar on Application for Technology of Information and Communication; Year: 2018; Pages: 229-233
JOURNAL ARTICLE

Penerapan Algoritma Support Vector Machine (SVM) dengan TF-IDF N-Gram untuk Text Classification

Nur Arifin, Ultach Enri, Nina Sulistiyowati

Journal: STRING (Satuan Tulisan Riset dan Inovasi Teknologi); Year: 2021; Vol: 6 (2); Pages: 129-129
JOURNAL ARTICLE

Toxic Comment Classification on Social Media Using Support Vector Machine and Chi Square Feature Selection

Nadhia Salsabila Azzahra, Danang Triantoro Murdiansyah, Kemas Muslim Lhaksmana

Journal: International Journal on Information and Communication Technology (IJoICT); Year: 2021; Vol: 7 (1); Pages: 64-76
JOURNAL ARTICLE

Lung Cancer Detection Using Chi-Square Feature Selection and Support Vector Machine Algorithm

Prabhpreet Kaur, N Banerjee, S Das, S Bharati, P Podder, R Mondal, A Mahmood, M Al-Masud, S Bhatia, Y Sinha, L Goel

Journal: International Journal of Advanced Trends in Computer Science and Engineering; Year: 2021; Vol: 10 (3); Pages: 2050-2060