JOURNAL ARTICLE

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data


Year: 2004 Journal: Korean Journal of Applied Statistics Vol: 17 (3) Pages: 445-457

Abstract

In two-class classification problems where the class sizes are severely imbalanced, a common practice is to over-sample and/or under-sample the training data so that the two classes end up with similar sizes. We investigate the validity of this practice and study its effect on boosting of classification trees. Experiments on twelve real datasets indicate that, when boosting with tree models is applied to class-imbalanced data, it is best to keep the natural class distribution of the training data.
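As an illustration of the sampling practice discussed in the abstract, the following is a minimal sketch of random under-sampling and random over-sampling for a two-class training set. The abstract does not state which sampling scheme the study used, so the function names `random_under_sample` and `random_over_sample`, the uniform random selection, and the toy data are all assumptions made for this example rather than the authors' procedure.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly drop rows until every class has the minority-class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_size = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample without replacement so no row is repeated.
        keep.extend(rng.sample(idx, minority_size))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows until every class has the majority-class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority_size = max(counts.values())
    keep = list(range(len(y)))  # start from every original row
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample with replacement; the majority class adds zero rows.
        keep.extend(rng.choices(idx, k=majority_size - n))
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 2 minority rows (label 1), 8 majority rows (label 0).
X = [[float(i)] for i in range(10)]
y = [1, 1] + [0] * 8

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)
print(Counter(y_under), Counter(y_over))
```

Under-sampling discards majority-class observations, while over-sampling repeats minority-class observations; both change the class proportions seen by the learner, which is the practice whose validity the study examines.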

Keywords:
Boosting (machine learning), Computer science, Data sampling, Training set, Machine learning, Artificial intelligence, Class (philosophy), Sampling (signal processing), Oversampling, Data mining, Decision tree

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.00
Refs: 2
Citation Normalized Percentile: 0.51

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Face and Expression Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Machine Learning and Data Classification
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Over-sampling via under-sampling in strongly imbalanced data

Rozita Jamili Oskouei, Bahram Sadeghi Bigham

Journal:   OPAL (Open@LaTrobe) (La Trobe University) Year: 2025
JOURNAL ARTICLE

Over-sampling via under-sampling in strongly imbalanced data

Bahram Sadeghi Bigham, Rozita Jamili Oskouei

Journal: International Journal of Advanced Intelligence Paradigms Year: 2016 Vol: 9 (1) Pages: 58-58
JOURNAL ARTICLE

Borderline over-sampling for imbalanced data classification

Hien M. Nguyen, Eric W. Cooper, Katsuari Kamei

Journal: International Journal of Knowledge Engineering and Soft Data Paradigms Year: 2011 Vol: 3 (1) Pages: 4-4