JOURNAL ARTICLE

The Development of Target-Specific Machine Learning Models as Scoring Functions for Docking-Based Target Prediction

Mauro S. Nogueira (1665586), Oliver Koch (382222)

Year: 2019
Journal: OPAL (Open@LaTrobe) (La Trobe University)
Publisher: La Trobe University

Abstract

The identification of possible targets for a known bioactive compound is of the utmost importance for drug design and development. Molecular docking is one possible approach for in silico protein target prediction, whereby a molecule is docked into several different protein structures to identify potential targets. This reverse docking approach is hampered by the limited ability of current scoring functions to correctly discriminate between targets and nontargets. In this work, the development of target-specific scoring functions is described that showed improved prediction performance for the correct target prediction of both actives and decoys on three validation data sets. In contrast to pure ligand-based approaches, which are in general faster and cover a greater target space, docking-based approaches can also cover unknown chemical space that lies outside the known bioactivity data. These target-specific scoring functions are based on known bioactivity data retrieved from ChEMBL and on supervised machine learning approaches. Neural network and support vector machine (SVM) models were trained for 20 different protein targets. Our protein–ligand interaction fingerprint PADIF (Protein Atom Score Contributions Derived Interaction Fingerprint) serves as the input for training; the PADIFs are calculated from docking poses of active and inactive compounds. Different data sets of previously unseen molecules were used for the final evaluation and analysis of the prediction performance of the created models. For a single-target selectivity data set, the correct target model returns, in most cases, the highest probability scores for its active molecules, with statistically significant differences from the other targets. These probability scores were also predicted and successfully used to rank the targets for molecules of a multitarget data set with activity data described simultaneously for two, three, and four to seven protein targets.
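The ranking idea described in the abstract (one classifier per target, with targets ordered by the predicted probability that the query molecule is active) can be illustrated with a minimal sketch. This is not the authors' implementation: the PADIF calculation and docking step are not reproduced, and random vectors stand in for the fingerprints. The function names, the feature count, the target labels, and the SVM settings are all assumptions made for illustration, using scikit-learn.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_set(rng, centre, n_per_class=60, n_features=16):
    """Hypothetical stand-in for PADIF vectors of docked compounds.

    Actives are drawn around `centre`, inactives around zero. Real PADIFs
    would be computed from docking poses, which is not done here.
    """
    actives = rng.normal(loc=centre, scale=1.0, size=(n_per_class, n_features))
    inactives = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
    X = np.vstack([actives, inactives])
    y = np.array([1] * n_per_class + [0] * n_per_class)
    return X, y


def train_target_models(training_sets):
    """Fit one probabilistic SVM per target (assumed settings, not the paper's)."""
    models = {}
    for target, (X, y) in training_sets.items():
        model = make_pipeline(
            StandardScaler(),
            SVC(kernel="rbf", probability=True, random_state=0),
        )
        model.fit(X, y)
        models[target] = model
    return models


def rank_targets(models, fingerprints_by_target):
    """Rank targets by the predicted probability of the 'active' class.

    `fingerprints_by_target` maps each target to the fingerprint of the query
    molecule docked into that target, since each pose gives its own vector.
    """
    scores = {}
    for target, fingerprint in fingerprints_by_target.items():
        model = models[target]
        active_column = list(model.classes_).index(1)
        proba = model.predict_proba(fingerprint.reshape(1, -1))
        scores[target] = float(proba[0, active_column])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


rng = np.random.default_rng(0)
# Hypothetical targets, each with its own synthetic "active" region.
centres = {"target_A": 2.0, "target_B": -2.0, "target_C": 1.5}
training_sets = {t: make_synthetic_set(rng, c) for t, c in centres.items()}
models = train_target_models(training_sets)

# A query molecule that looks active only when docked into target_A.
query = {
    "target_A": np.full(16, 2.0),
    "target_B": np.zeros(16),
    "target_C": np.zeros(16),
}
ranking = rank_targets(models, query)
for target, probability in ranking:
    print(f"{target}: {probability:.3f}")
```

In this toy setup the query vector for target_A sits in that model's active region, so target_A is expected to rank first; with real data the ordering depends entirely on the docking poses and the trained models.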

Keywords:
ChEMBL; Support vector machine; Training set; Chemical space; Set (abstract data type); Cross-validation; Data set; Rank (graph theory)

