JOURNAL ARTICLE

An over-sampling method based on probability density estimation for imbalanced datasets classification

Abstract

Imbalanced data sets are common in real-world applications, and correctly identifying the minority class is usually the main goal when classifying them. Traditional support vector machines, however, perform poorly on imbalanced data sets. To improve recognition accuracy for the minority class, an over-sampling method that combines probability density function estimation with Gibbs sampling is proposed. First, the probability density function of the minority class is estimated using a Parzen window; then, Gibbs sampling is used to generate new samples that follow the minority-class distribution described by the estimated density. This yields a relatively balanced training data set. Finally, a support vector machine is trained on the new data set. Experimental results on a synthetic dataset and five benchmark UCI datasets show the effectiveness of the proposed method.
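
The first two steps of the abstract (Parzen-window density estimation, then Gibbs sampling from the estimated density) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the kernel or the bandwidth, so the Gaussian kernel, the single isotropic bandwidth `h`, the burn-in length, and the function names `parzen_density` and `gibbs_oversample` are all assumptions made for illustration. With a Gaussian kernel the Parzen estimate is an equal-weight mixture of Gaussians centred on the minority samples, so each Gibbs conditional is a one-dimensional Gaussian mixture that can be sampled exactly.

```python
import math
import random

def parzen_density(x, data, h):
    """Gaussian Parzen-window density estimate at point x (assumed isotropic bandwidth h)."""
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)
    total = 0.0
    for p in data:
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq / (2.0 * h * h))
    return total / (len(data) * norm)

def gibbs_oversample(data, n_new, h, burn_in=50, seed=0):
    """Draw n_new synthetic points from the Gaussian Parzen estimate of `data` by Gibbs sampling."""
    rng = random.Random(seed)
    d = len(data[0])
    x = list(rng.choice(data))          # start the chain at a real minority sample
    samples = []
    for step in range(burn_in + n_new):
        for k in range(d):
            # Conditional of coordinate k given the others: a 1-D Gaussian mixture whose
            # component weights depend on the distance in all coordinates except k.
            log_w = [
                -sum((x[j] - p[j]) ** 2 for j in range(d) if j != k) / (2.0 * h * h)
                for p in data
            ]
            m = max(log_w)              # subtract the max for numerical stability
            w = [math.exp(v - m) for v in log_w]
            i = rng.choices(range(len(data)), weights=w)[0]
            x[k] = rng.gauss(data[i][k], h)
        if step >= burn_in:             # keep one point per sweep after burn-in
            samples.append(list(x))
    return samples

# Hypothetical minority class: 20 two-dimensional points.
gen = random.Random(1)
minority = [[gen.gauss(0.0, 1.0), gen.gauss(2.0, 0.5)] for _ in range(20)]
synthetic = gibbs_oversample(minority, n_new=30, h=0.4)
```

In this sketch, consecutive draws come from one chain and are therefore correlated; thinning or multiple chains may be preferable in practice. The final step in the abstract, training a support vector machine on the original data plus the synthetic minority points, is omitted here.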

Keywords:
Support vector machine; Benchmark (surveying); Computer science; Probability density function; Kernel density estimation; Pattern recognition (psychology); Artificial intelligence; Sampling (signal processing); Data set; Density estimation; Set (abstract data type); Data mining; Machine learning; Mathematics; Statistics

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 0.56
Refs: 18
Citation Normalized Percentile: 0.88

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Currency Recognition and Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Rough Sets and Fuzzy Logic
Physical Sciences →  Computer Science →  Computational Theory and Mathematics