A feature selection method that combines the Spearman Correlation Coefficient (SCC), the Maximal Information Coefficient (MIC), and the Boruta algorithm is proposed to address the low classification accuracy of traditional machine learning algorithms on enterprise credit data. The method is evaluated with three classifiers: Decision Tree, Extreme Gradient Boosting (XGBoost), and Gradient Boosting Decision Tree (GBDT). First, SCC is used to remove highly correlated features. MIC is then used to identify the features most strongly correlated with the labels. Next, Boruta, built on a Random Forest model, is used to find the optimal feature subset. Finally, this subset is applied to the three classification models. Experimental results show that the selected feature subset improves the classification accuracy of the three models by 1.18%, 1.18%, and 3.53%, respectively.
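The first stage, removing highly correlated features with SCC, can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.9 threshold, the greedy keep-first rule, the function names, and the toy feature names are all assumptions made for illustration, since the abstract does not specify them. The MIC and Boruta stages are omitted because they typically rely on third-party packages.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no rank information
    return cov / (var_x * var_y) ** 0.5


def drop_redundant(features, threshold=0.9):
    """Greedily keep a feature only if |SCC| with every kept feature is below threshold.

    The threshold and the keep-first order are assumptions, not values from the paper.
    """
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: "revenue_sq" is a monotone function of "revenue".
features = {
    "revenue": [10, 20, 30, 40, 50],
    "revenue_sq": [100, 400, 900, 1600, 2500],
    "debt_ratio": [0.5, 0.1, 0.4, 0.2, 0.3],
}
selected = drop_redundant(features)  # ['revenue', 'debt_ratio']
```

Because SCC works on ranks, it flags any monotone relationship as redundant, not only linear ones, which is why the squared feature is dropped in this toy example.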
Tri Handhika, Murni Murni, Rafi Mochamad Fahreza
Yutika Agarwal, Rita Chhikara, Sanjeev Rana
Ajay Kaushik, Sanchit Anand, Riya Sehgal, Neeyati Anand
Neeyati Anand, Riya Sehgal, Sanchit Anand, Ajay Kaushik