JOURNAL ARTICLE

Information Gain Based Feature Selection for Improved Textual Sentiment Analysis

R. Madhumathi, A. Meena Kowshalya

Year: 2022   Journal: Wireless Personal Communications   Vol: 125 (2)   Pages: 1203-1219   Publisher: Springer Science+Business Media

Abstract

Sentiment analysis, or opinion mining, is the process of mining emotion from a given text. It is a text mining technique that measures the inclination of public opinion and aids in analysing subjective information in a given context. Sentiment analysis classifies the opinion expressed in a text as positive, negative or neutral. Sentiments are highly specific to the underlying content and play a crucial role in depicting the real-world scenario. Sentiment analysis can be performed at three levels: document level, sentence level and feature level. This paper proposes a novel Information Gain based Feature Selection algorithm that selects highly correlated features by removing inappropriate content. Using this algorithm, extensive sentiment analysis is performed at the document, sentence and feature levels. Datasets from Cornell and Kaggle are used for the experiments. Experimental results show that, compared to other baseline classifiers, the proposed Information Gain based classifier achieves accuracies of 95%, 96.3% and 97.4% at the document, sentence and feature levels respectively. The proposed method is also tested on the higher-dimensional MovieLens 1M, 10M and 25M datasets. Experimental results show that the proposed method works better even for high-dimensional datasets.
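The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of generic information-gain term scoring, not the authors' method. It assumes each document is a set of tokens with a single class label, and the names `entropy`, `information_gain`, `select_top_k` and the toy data are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG(term) = H(labels) - H(labels | term present or absent)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Toy data (hypothetical): each document is a set of tokens.
docs = [
    {"great", "movie"},
    {"great", "acting"},
    {"boring", "movie"},
    {"boring", "plot"},
]
labels = ["pos", "pos", "neg", "neg"]

top_terms = select_top_k(docs, labels, 2)
```

In this toy corpus, terms that split the classes cleanly ("great", "boring") score highest, while "movie", which occurs equally often in both classes, scores zero and would be dropped. A real pipeline would apply the same scoring to tokenised reviews before training a classifier.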

Keywords:
Sentiment analysis; Computer science; Sentence; Feature selection; Classifier (UML); MovieLens; Feature (linguistics); Artificial intelligence; Information gain; Context (archaeology); Selection (genetic algorithm); Data mining; Natural language processing; Machine learning; Recommender system; Collaborative filtering

Metrics

Cited By: 16
FWCI (Field Weighted Citation Impact): 3.13
Refs: 30
Citation Normalized Percentile: 0.89
Is in top 1%
Is in top 10%

Topics

Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence