JOURNAL ARTICLE

Sparsity adjusted information gain for feature selection in sentiment analysis

Abstract

The widespread use of social media and the internet is an emerging trend that offers companies an additional interaction channel for better understanding customer sentiment about their brands and products. Sentiment analysis uses text data from social media, such as customer comments and reviews, which is inherently high-dimensional. Without selection, at least thousands of features (words or phrases) can typically be extracted from a text corpus, and many of them are redundant or irrelevant for the sentiment classification task. Thus, it is critical to select a compact yet effective set of features to avoid complex classifier design and slow classification running time. However, very few existing metrics improve the efficacy of feature selection by addressing the sparsity of the feature matrix for text data, i.e., the fact that many features may appear in only a few documents. In this paper, an improved feature selection metric known as sparsity adjusted information gain (SAIG) is proposed, which modifies the conventional information gain metric and adjusts the feature ranking scores according to the sparsity of the feature vector. It is able to use fewer features to reach a targeted performance level. The experimental results show that SAIG is able to improve the performance of sentiment classification.
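As an illustration of the conventional information gain metric that SAIG is said to modify, the following is a minimal sketch, not the authors' implementation. The abstract does not give the SAIG adjustment formula, so none is attempted here; the sketch only computes standard information gain for binary term-presence features and reports each term's document frequency as a simple indicator of sparsity. The toy corpus, the term names, and the helper functions `entropy` and `information_gain` are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(present, labels):
    """Conventional IG of a binary term-presence feature: H(C) - H(C | term)."""
    with_term = [c for p, c in zip(present, labels) if p]
    without_term = [c for p, c in zip(present, labels) if not p]
    total = len(labels)
    conditional = ((len(with_term) / total) * entropy(with_term)
                   + (len(without_term) / total) * entropy(without_term))
    return entropy(labels) - conditional


# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "great": [1, 1, 1, 0, 0, 0],   # present in every positive document
    "superb": [1, 0, 0, 0, 0, 0],  # sparse: present in a single document
    "the": [1, 1, 1, 1, 1, 1],     # present everywhere, carries no signal
}

# Score every term, then rank terms from highest to lowest information gain.
scores = {term: information_gain(col, labels) for term, col in features.items()}
# Fraction of documents containing each term (low value = sparse feature).
doc_freq = {term: sum(col) / len(col) for term, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for term in ranking:
    print(f"{term}: IG={scores[term]:.3f}, document frequency={doc_freq[term]:.2f}")
```

In this toy setting the sparse term still receives a nonzero score from a single occurrence, which is the kind of situation a sparsity-aware adjustment would presumably need to account for; how SAIG actually does so is specified in the paper, not here.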

Keywords:
Computer science; Feature selection; Sentiment analysis; Classifier (UML); Dimensionality reduction; Artificial intelligence; Feature (linguistics); Social media; Curse of dimensionality; Machine learning; Ranking (information retrieval); Data mining; The Internet; World Wide Web

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 1.57
Refs: 23
Citation Normalized Percentile: 0.92

Topics

Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Information Gain Based Feature Selection for Improved Textual Sentiment Analysis

R. Madhumathi, A. Meena Kowshalya

Journal: Wireless Personal Communications, Year: 2022, Vol: 125 (2), Pages: 1203-1219

JOURNAL ARTICLE

Assessment of Sentiment Analysis Using Information Gain Based Feature Selection Approach

R. Madhumathi, A. Meena Kowshalya, R. Shruthi

Journal: Computer Systems Science and Engineering, Year: 2022, Vol: 43 (2), Pages: 849-860

JOURNAL ARTICLE

Sentiment Analysis using Naive Bayes Classifier and Information Gain Feature Selection over Twitter

Manjit Singh, Swati Gupta

Journal: International Journal of Computer Trends and Technology, Year: 2020, Vol: 68 (5), Pages: 84-91

JOURNAL ARTICLE

On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis

Asriyanti Indah Pratiwi, Adiwijaya Adiwijaya

Journal: Applied Computational Intelligence and Soft Computing, Year: 2018, Vol: 2018, Pages: 1-5