Parkinson's disease (PD) is chronic, permanent, and life-threatening. Neuroprotective treatments for PD rely on early detection. Recent studies have demonstrated that clinical data, cerebrospinal fluid (CSF) based proteomes, and gene mutations are important biomarkers for accurate and early detection of PD. This study investigates heterogeneous data comprising CSF-based clinical data, CSF-based proteomic analysis data, and mutation information for the glucocerebrosidase (GBA) and leucine-rich repeat kinase 2 (LRRK2) genes to classify subjects as PD-affected or Healthy Control (HC). The dataset contains 1103 subjects (569 PD-affected and 534 HC). An Automated Machine Learning (AutoML) framework, PyCaret, is used. The study proposes an Extra Tree Classifier (ETC) as a feature selection mechanism to select the features that significantly affect PD classification. The selected features are then used to train Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT) classifiers. Accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC-ROC), and the confusion matrix are used to evaluate the classifiers. RF performed best, with an accuracy of 96.12%, a sensitivity of 95.59%, and a specificity of 95.34%, while LR achieved the highest AUC value of 98.33. RF also made the highest number of correct predictions, 316 out of 331.
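The pipeline described above (tree-based feature selection, three classifiers, and confusion-matrix metrics) can be illustrated with a minimal sketch. It rests on several assumptions: it uses scikit-learn directly rather than PyCaret, the data are synthetic and generated by `make_classification` (not the study's CSF, proteomic, or genetic data), and the 30% hold-out, the median importance threshold, and all hyperparameters are illustrative choices rather than the study's settings. The printed numbers therefore say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heterogeneous feature table (assumption:
# 40 numeric features; label 1 = PD-affected, label 0 = HC).
X, y = make_classification(n_samples=1103, n_features=40,
                           n_informative=8, random_state=0)

# Assumed 30% hold-out; on 1103 rows this yields 331 test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Feature selection: keep features whose extra-trees importance is at
# least the median importance (the threshold is an illustrative choice).
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    threshold="median")
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
}

def evaluate(model):
    """Fit on the selected features and compute hold-out metrics."""
    model.fit(X_train_sel, y_train)
    y_pred = model.predict(X_test_sel)
    y_score = model.predict_proba(X_test_sel)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": roc_auc_score(y_test, y_score),
        "correct": int(tp + tn),
        "n_test": int(tp + tn + fp + fn),
    }

results = {name: evaluate(model) for name, model in models.items()}
for name, m in results.items():
    print(f"{name}: acc={m['accuracy']:.4f} sens={m['sensitivity']:.4f} "
          f"spec={m['specificity']:.4f} auc={m['auc']:.4f} "
          f"correct={m['correct']}/{m['n_test']}")
```

Sensitivity and specificity are derived from the same confusion matrix as accuracy, so the count of correct predictions (`correct`) divided by the test-set size (`n_test`) reproduces the accuracy on that hold-out split.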
Diana Baby, Sujitha Juliet, D. Jude Hemanth, M. M. Anishin Raj
Ahmad Sanmorino, Luis Marnisah, Hastha Sunardi
Prabhleen Kaur Chawla, Meera Surendran Nair, Dattakumar Gajanan Malkhede, Hemprasad Yashwant Patil, Sumit Kumar Jindal, Avinash Chandra, Mahadev A. Gawas
Muhammad Akmal Al Ghifari, Irwan Budiman, Triando Hamonangan Saragih, Muhammad Itqan Mazdadi, Rudy Herteno, Hasri Akbar Awal Rozaq