A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression

Amjad Ali; Muhammad Hamraz; Poom Kumam; Dost Muhammad Khan; Umair Khalil; Muhammad Sulaiman; Zardad Khan

doi:10.60692/w57kw-4p822

JOURNAL ARTICLE

Year: 2020 Journal: Greater South Information System

Abstract

Ensemble methods based on $k$-NN models minimise the effect of outliers in a training dataset by searching for groups of the $k$ closest data points to estimate the response of an unseen observation. However, traditional $k$-NN based ensemble methods estimate the response as the arithmetic mean of the neighbouring training points' responses, an approach with several weaknesses. Traditional $k$-NN based models are also adversely affected by non-informative features in the data. This paper proposes a novel ensemble procedure consisting of base $k$-NN models, each constructed on a bootstrap sample drawn from the training dataset using a random subset of features. Within the $k$ nearest neighbours identified by each $k$-NN model, a stepwise regression is fitted to predict the test point. The final estimate for the target observation is obtained by averaging the estimates of all models in the ensemble. The proposed method is compared with other state-of-the-art procedures on 16 benchmark datasets using the coefficient of determination ($R^{2}$), Pearson's product-moment correlation coefficient ($r$), mean squared prediction error ($MSPE$), root mean squared error ($RMSE$) and mean absolute error ($MAE$) as performance metrics, and boxplots of the results are also presented. The proposed ensemble outperformed the other procedures on almost all the datasets. Its efficacy was further assessed by adding non-informative features to the datasets and repeating the comparison; the results show that the proposed method is more robust to non-informative features than the other methods.
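The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the stepwise criterion, the number of base models, the value of $k$ or the size of the random feature subsets, so forward selection by AIC, `n_models=100`, `k=20` and a subset of about $\sqrt{p}$ features are hypothetical choices, as are all function names. The block also computes the five performance measures named in the abstract, taking $R^{2}$ as $1 - SSE/SST$ (one common definition).

```python
import numpy as np


def _ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, RSS)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return coef, rss


def _aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant; the floor on RSS avoids log(0)."""
    return n * np.log(max(rss, 1e-12) / n) + 2 * n_params


def forward_stepwise(X, y):
    """Greedy forward selection by AIC (an assumed stepwise criterion).

    Returns the selected column indices and the coefficients
    [intercept, beta for selected[0], beta for selected[1], ...].
    """
    n, p = X.shape
    selected = []
    coef, rss = _ols_fit(X[:, selected], y)  # intercept-only model
    best_aic = _aic(rss, n, 1)
    while True:
        best_j = None
        for j in range(p):
            # Skip used columns and keep at least one residual degree of freedom.
            if j in selected or len(selected) + 2 >= n:
                continue
            cand_coef, cand_rss = _ols_fit(X[:, selected + [j]], y)
            cand_aic = _aic(cand_rss, n, len(selected) + 2)
            if cand_aic < best_aic:
                best_aic, best_j, coef = cand_aic, j, cand_coef
        if best_j is None:  # no candidate lowers the AIC: stop
            return selected, coef
        selected.append(best_j)


def knn_stepwise_ensemble_predict(X_train, y_train, x_test, n_models=100, k=20,
                                  n_feat=None, seed=0):
    """Average the predictions of n_models bootstrap k-NN + stepwise models."""
    rng = np.random.default_rng(seed)
    n, p = X_train.shape
    if n_feat is None:
        n_feat = max(1, int(np.sqrt(p)))  # hypothetical default subset size
    preds = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                  # bootstrap sample
        feats = rng.choice(p, size=n_feat, replace=False)  # random feature subset
        Xb, yb = X_train[np.ix_(rows, feats)], y_train[rows]
        xt = x_test[feats]
        # k nearest neighbours of the test point in the selected feature subspace
        nearest = np.argsort(np.linalg.norm(Xb - xt, axis=1))[:k]
        # stepwise regression fitted only on those neighbours
        sel, coef = forward_stepwise(Xb[nearest], yb[nearest])
        preds.append(coef[0] + xt[sel] @ coef[1:])
    return float(np.mean(preds))  # final estimate: average over the ensemble


def regression_metrics(y, y_hat):
    """R^2 (as 1 - SSE/SST), Pearson's r, MSPE, RMSE and MAE."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mspe = float(np.mean(err ** 2))
    return {
        "R2": float(1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
        "r": float(np.corrcoef(y, y_hat)[0, 1]),
        "MSPE": mspe,
        "RMSE": mspe ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
    }
```

Fitting a local regression inside each neighbourhood, rather than taking the plain mean of the $k$ responses, is what lets each base model follow a trend across the neighbourhood; the random feature subsets mean that only some base models see any given non-informative column. To mimic the robustness check in spirit, one could append columns of random noise to `X_train` (and the matching entries to `x_test`) and compare the metrics before and after.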

Keywords:

Outlier; Mean squared error; Benchmark; Regression; Ensemble learning; Random forest; Correlation coefficient; Model selection; Feature selection; Ensemble forecasting
