JOURNAL ARTICLE

Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning

Abstract

One of the main challenges of structure-based virtual screening (SBVS) is incorporating the receptor's flexibility, because representing it explicitly in every docking run carries a high computational cost. A common alternative is the approach known as ensemble docking, which uses a set of receptor conformations and docks the compound library against each of them. However, there is still no agreement on how to combine the ensemble docking results to obtain the final ligand ranking. A common choice is to aggregate the ensemble docking scores with consensus strategies, but these strategies yield only a slight improvement over the single-structure approach. Here, we hypothesize that applying machine learning (ML) methodologies to the ensemble docking results could improve the predictive power of SBVS. To test this hypothesis, four proteins were selected as case studies: CDK2, FXa, EGFR, and HSP90. Protein conformational ensembles were built from crystallographic structures, whereas the evaluated compound library comprised up to three benchmarking data sets (DUD, DEKOIS 2.0, and CSAR-2012) and cocrystallized molecules. Ensemble docking results were processed through 30 repetitions of 4-fold cross-validation to train and validate two ML classifiers: logistic regression and gradient boosting trees. Our results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best-performing case achieved with single-structure docking. We provide statistical evidence that supports the effectiveness of ML in improving ensemble docking performance.
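
The two baseline ideas in the abstract, consensus aggregation of ensemble docking scores and repeated 4-fold cross-validation, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy score matrix, the choice of "lowest score" and "mean score" as consensus strategies, and the stratification of the folds are all assumptions made for illustration. Each row of the matrix holds one ligand's docking scores across the receptor conformations, and lower scores are assumed to mean better predicted binding.

```python
import random
from statistics import mean


def consensus_scores(score_matrix, strategy="min"):
    """Collapse a ligands x conformations score matrix to one score per ligand."""
    if strategy == "min":   # best (lowest) score found over the ensemble
        return [min(row) for row in score_matrix]
    if strategy == "mean":  # average score over the ensemble
        return [mean(row) for row in score_matrix]
    raise ValueError(f"unknown strategy: {strategy}")


def roc_auc(labels, scores):
    """ROC-AUC for docking scores, where a LOWER score should indicate an active.

    Computed as the fraction of (active, decoy) pairs in which the active
    has the lower score; ties count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    decoys = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((a < d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def repeated_stratified_kfold(labels, n_splits=4, n_repeats=30, seed=0):
    """Yield (train_idx, test_idx) for n_repeats reshuffled stratified k-fold splits.

    Stratification (keeping the active/decoy ratio similar in every fold)
    is an assumption of this sketch, not a detail stated in the abstract.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for cls in (0, 1):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for k, i in enumerate(idx):  # deal indices round-robin into folds
                folds[k % n_splits].append(i)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield train_idx, test_idx


# Toy, hand-made data (NOT from the paper): 4 ligands x 3 conformations.
toy_scores = [
    [-9.1, -7.0, -8.2],  # active
    [-6.5, -9.4, -7.1],  # active
    [-7.2, -6.8, -7.5],  # decoy
    [-6.0, -9.2, -5.9],  # decoy
]
toy_labels = [1, 1, 0, 0]

for strategy in ("min", "mean"):
    auc = roc_auc(toy_labels, consensus_scores(toy_scores, strategy))
    print(f"consensus={strategy}: AUC={auc:.2f}")

# Single-structure baseline: use only the first conformation's scores.
single = [row[0] for row in toy_scores]
print(f"single structure (conformation 0): AUC={roc_auc(toy_labels, single):.2f}")

# 30 repetitions x 4 folds = 120 train/test splits on a 12-ligand toy label set.
n_splits_total = sum(1 for _ in repeated_stratified_kfold([1] * 4 + [0] * 8))
print(f"number of train/test splits: {n_splits_total}")
```

In an ML variant of this workflow, each row of the score matrix would serve as the feature vector for a classifier (for example logistic regression or gradient boosting trees) fitted on the training indices of every split and evaluated on the held-out indices; that step is omitted here to keep the sketch free of external dependencies.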

Keywords:
Docking (animal); Boosting (machine learning); Ensemble learning; Virtual screening; Protein–ligand docking; Benchmarking; Training set; Support vector machine

Related Documents

JOURNAL ARTICLE

Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning

Joel Ricci-López, Sergio A. Águila, Michael K. Gilson, Carlos A. Brizuela

Journal: Journal of Chemical Information and Modeling Year: 2021 Vol: 61 (11) Pages: 5362-5376

JOURNAL ARTICLE

Ensemble Machine Learning Approaches in Molecular Fingerprint based Virtual screening

Vinay Kumar, K Aparna, R. Ani, O. S. Deepa

Journal: 2021 2nd Global Conference for Advancement in Technology (GCAT) Year: 2021 Pages: 1-6

JOURNAL ARTICLE

Improved method of structure-based virtual screening based on ensemble learning

Jin Li, Weichao Liu, Yongping Song, JiYi Xia

Journal: RSC Advances Year: 2020 Vol: 10 (13) Pages: 7609-7618

JOURNAL ARTICLE

Machine‐learning scoring functions for structure‐based virtual screening

Hongjian Li, Kam‐Heung Sze, Gang Lü, Pedro J. Ballester

Journal: Wiley Interdisciplinary Reviews Computational Molecular Science Year: 2020 Vol: 11 (1)