JOURNAL ARTICLE

Optimizing breast cancer classification using SMOTE, Boruta, and XGBoost

Cicin Hardiyanti P

Year: 2025 Journal: Science in Information Technology Letters Vol: 6 (1) Pages: 16-33

Abstract

Breast cancer remains one of the leading causes of death among women worldwide. This study develops a breast cancer classification framework based on clinical data that integrates the Synthetic Minority Oversampling Technique (SMOTE), the Boruta feature selection algorithm, and the XGBoost classifier. The approach is evaluated on the Wisconsin Breast Cancer Diagnostic (WBCD) dataset, which consists of 569 samples and 30 numerical features. SMOTE addresses class imbalance, Boruta selects the most relevant diagnostic features, and XGBoost serves as the main classifier because of its robustness on tabular and imbalanced data. The model is validated with Repeated Stratified K-Fold Cross-Validation with 30 repetitions to ensure statistical stability. The resulting model achieves an average accuracy of 0.9608 ± 0.0274, precision of 0.9465 ± 0.0481, recall of 0.9512 ± 0.0524, and F1-score of 0.9475 ± 0.0374. The ROC-AUC reaches 0.9926 ± 0.0094, the PR-AUC is 0.9906 ± 0.0113, and the Matthews Correlation Coefficient (MCC) is 0.9179 ± 0.0575, indicating a well-balanced model. Clinically, the model can aid early diagnosis: it discards irrelevant diagnostic attributes and retains only 10 key features without compromising accuracy, offering a lightweight yet reliable diagnostic tool. Limitations include the relatively small dataset and the absence of hyperparameter tuning. Future research should explore larger datasets, advanced ensemble methods, and interpretability techniques such as SHAP or LIME to improve clinical transparency and adoption.
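
The oversampling step named in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the study's code: the function name `smote_sketch`, the toy data, and the parameter defaults are assumptions made for illustration, and the sketch uses only the Python standard library rather than any particular SMOTE implementation. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    Hypothetical helper for illustration only. For each synthetic point:
    pick a minority sample, pick one of its k nearest minority neighbours,
    and interpolate at a random position between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature data (made up for illustration, not WBCD values).
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]

# Create enough synthetic minority points to match the majority class size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a cross-validated setup, this kind of resampling is normally applied to the training portion of each fold only, so that synthetic points do not leak into the evaluation data.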

Keywords:
Pattern recognition (psychology), Computer science, Artificial intelligence

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.12

Topics

AI in cancer detection
Physical Sciences → Computer Science → Artificial Intelligence
Artificial Intelligence in Healthcare
Health Sciences → Health Professions → Health Information Management
Brain Tumor Detection and Classification
Life Sciences → Neuroscience → Neurology

Related Documents

JOURNAL ARTICLE

Breast Cancer Classification using XGBoost

Rahmanul Hoque, Suman G. Das, Mahmudul Hoque, Mahmudul Hoque

Journal: World Journal of Advanced Research and Reviews Year: 2024 Vol: 21 (2) Pages: 1985-1994

JOURNAL ARTICLE

Breast Cancer Classification using XGBoost

Rahmanul Hoque, Suman Das, Mahmudul Hoque, Ehteshamul Haque

Journal: Zenodo (CERN European Organization for Nuclear Research) Year: 2024