JOURNAL ARTICLE

Chi-Square Feature Selection Technique for Student’s performance prediction

Himanshi Bhoria, Amita Dhankhar, Kamna Solanki

Year: 2023
Journal: Indian Journal of Science and Technology
Vol: 16 (38)
Pages: 3250-3257
Publisher: Indian Society for Education and Environment

Abstract

Objectives: The main goals of this study are: 1) to assess students’ performance using several machine learning models; 2) to identify the attributes influencing students’ performance using feature selection; 3) to assess and compare machine learning model performance using accuracy, precision, recall, F-1 score, and AUC (Area Under Curve) score as performance indicators; and 4) to compare the effectiveness of machine learning models built with feature selection against models built without it.

Methods: The student performance dataset from UCI is used for this study. It consists of 650 records with 32 features. The pertinent features are selected by applying the Chi-square method so that the model can be constructed effectively. Classification models are then trained on the data. Finally, the models are compared in terms of accuracy, precision, recall, F-1 score, and AUC score.

Findings: For the first objective, student performance is treated as a two-class outcome: pass or fail. Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost are evaluated in terms of accuracy, precision, recall, F-1 score, and AUC score. The F-1 scores achieved by DT, RF, SVM, KNN, and XGBoost are 92.16, 95.06, 95.19, 93.8, and 94.59 respectively. For the second objective, the attributes found to influence students’ performance are Failures, Schoolsup, First Period Grade (G1), Second Period Grade (G2), and Final Grade (G3). For the third objective, the Support Vector Machine classification model outperforms the other models, with an F-1 score of 95.19%. For the fourth objective, models that use feature selection perform better than models that do not.

Novelty: Using machine learning to predict students’ performance can revolutionize the education sector by providing a data-driven approach to evaluating academic performance. This research work proposes a new “Chi-Square Based Feature Selection” (CBFS) technique for the prediction of students’ performance. Using chi-square for feature selection keeps only the most relevant features, which helps reduce the model’s complexity and improves its performance.

Keywords: Machine Learning, Prediction, Dataset Problem, Early Warning System, Educational Data Mining
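The chi-square selection step described in the abstract can be illustrated with a minimal sketch. It assumes the classical Pearson chi-square statistic computed from a contingency table of each categorical feature against the pass/fail label; the abstract does not state which implementation the authors used, so this is not their CBFS code. The function names `chi_square_score` and `select_top_k` are hypothetical, and the eight-row table is hand-made toy data, not the UCI dataset.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the label.

    Sums (observed - expected)^2 / expected over every cell of the
    feature-value x label contingency table.
    """
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_value, f_count in feature_counts.items():
        for l_value, l_count in label_counts.items():
            # Expected cell count if feature and label were independent.
            expected = f_count * l_count / n
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score and keep the k highest (hypothetical helper)."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data for illustration only (NOT the UCI student performance dataset).
labels = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "fail"]
columns = {
    "failures": ["0", "0", "0", "0", "1", "1", "1", "1"],           # tracks the label exactly
    "schoolsup": ["no", "no", "no", "yes", "yes", "yes", "yes", "no"],  # partly related
    "sex": ["F", "M", "F", "M", "F", "M", "F", "M"],                # unrelated to the label
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected)
print(scores)
```

In this toy table the feature that tracks the label exactly receives the highest score and the unrelated feature scores zero, which is the behaviour feature selection relies on. A higher score only indicates stronger dependence on the label; choosing how many features to keep (`k`) remains a modelling decision.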

Keywords:
Support vector machine, Artificial intelligence, Computer science, Random forest, F1 score, Feature selection, Machine learning, Precision and recall, Recall, Feature (linguistics), Decision tree, Mean squared error, Outcome (game theory), Pattern recognition (psychology), Statistics, Mathematics

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.49
Refs: 14
Citation Normalized Percentile: 0.77

Topics

Online Learning and Analytics
Physical Sciences →  Computer Science →  Computer Science Applications
Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Artificial Intelligence in Healthcare
Health Sciences →  Health Professions →  Health Information Management