Wenqi Shi, Mio Murakoso, Xiaoyan Guo, May D. Wang
Drug discovery demands extensive time and financial investment, so computational approaches such as protein-ligand docking prediction are increasingly crucial for accelerating drug repurposing. However, the growing number of identified protein targets has exposed a critical gap: robust models for docking score prediction that are both generalizable and interpretable are still lacking. To address this, our study presents a machine learning-based surrogate model that employs interpretable artificial intelligence techniques for accurate docking score prediction on SARS-CoV-2 protein targets. We demonstrate that the model generalizes to unseen protein targets when protein target information is integrated through feature concatenation. Moreover, we leverage the SHapley Additive exPlanations (SHAP) method to identify the data-driven feature importance of molecular substructures for knowledge-based validation. Our experiments indicate that combining data-driven prediction with knowledge-driven validation can provide biomedical insights into the interactions between drugs and SARS-CoV-2 proteins and into their consequent effects on docking scores.
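The two ideas named in the abstract, feature concatenation and Shapley-based attribution, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: `toy_surrogate` is a hypothetical stand-in with made-up weights, not the authors' trained model, the four features are invented, and the attribution is an exact Shapley computation over an all-zero baseline rather than a call to the SHAP library.

```python
from itertools import combinations
from math import factorial


def concat_features(ligand_bits, protein_desc):
    """Join ligand substructure bits and protein descriptors into one input vector."""
    return list(ligand_bits) + list(protein_desc)


def toy_surrogate(x):
    """Hypothetical docking-score regressor; the weights are made up for illustration."""
    # Three ligand terms plus one ligand-protein interaction term.
    return -2.0 * x[0] - 0.5 * x[1] + 1.0 * x[2] - 1.5 * x[0] * x[3]


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Three hypothetical substructure bits followed by one hypothetical protein descriptor.
x = concat_features([1, 1, 0], [1])
baseline = [0, 0, 0, 0]
phi = shapley_values(toy_surrogate, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that makes Shapley-style scores usable for ranking substructures. Exact enumeration grows exponentially with the number of features, so realistic fingerprint sizes require an approximation such as the one SHAP provides.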
Alexandra Zamitalo, Qingtong Xie, Mayar Allam, Phinu Philip, Wenqi Shi, Felipe Giuste, Benoit Marteau, Mio Murakoso, May D. Wang
Mauro S. Nogueira (1665586), Oliver Koch (382222)
Xu Zhang, Lin Liu, Minxuan Lan, Guangwen Song, Luzi Xiao, Jianguo Chen
Babagana Modu, Ibrahim Adamu Fika