JOURNAL ARTICLE

Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

Biqing Li, Kaiyan Feng, Lei Chen, Tao Huang, Yu-Dong Cai

Year: 2012   Journal: PLoS ONE   Vol: 7 (8)   Pages: e43927   Publisher: Public Library of Science

Abstract

Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm, with features ranked by the Minimum Redundancy Maximal Relevance (mRMR) method and then chosen by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features. The resulting predictor achieved an overall accuracy of 0.672997 and a Matthews correlation coefficient (MCC) of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of PPI sites. Site-specific feature analysis further showed that the features of the individual residues at PPI sites contribute most to determining those sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
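
The pipeline named in the abstract (rank features with mRMR, then grow the feature set one feature at a time and keep the best-scoring prefix) can be illustrated with a small sketch. This is not the authors' implementation: it assumes discrete features, the difference form of mRMR (relevance minus mean redundancy), and a leave-one-out lookup-table classifier standing in for the Random Forest. The feature names and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_rank(columns, labels):
    """Greedy mRMR: repeatedly pick the feature maximising
    relevance(feature, labels) - mean redundancy(feature, already picked)."""
    relevance = {f: mutual_information(col, labels) for f, col in columns.items()}
    ranked, remaining = [], list(columns)
    while remaining:
        def score(f):
            if not ranked:
                return relevance[f]
            redundancy = sum(mutual_information(columns[f], columns[s])
                             for s in ranked) / len(ranked)
            return relevance[f] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def incremental_feature_selection(ranked, evaluate):
    """Score the nested prefixes ranked[:1], ranked[:2], ... and keep the best."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        current = evaluate(ranked[:k])
        if current > best_score:  # strict '>' prefers the smaller set on ties
            best_subset, best_score = ranked[:k], current
    return best_subset, best_score

def loo_mcc(columns, labels, subset):
    """Stand-in evaluator (assumption, NOT a Random Forest): leave-one-out
    majority vote among samples with the same feature pattern, scored by MCC."""
    rows = list(zip(*(columns[f] for f in subset)))
    tp = tn = fp = fn = 0
    for i, (row, truth) in enumerate(zip(rows, labels)):
        others = [(r, l) for j, (r, l) in enumerate(zip(rows, labels)) if j != i]
        # Unseen pattern: fall back to the majority class of the other samples.
        votes = Counter(l for r, l in others if r == row) or Counter(l for _, l in others)
        pred = votes.most_common(1)[0][0]
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)

# Hypothetical toy data: 1 = interface residue, 0 = non-interface residue.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "depth":      [1, 1, 1, 0, 0, 0, 0, 0],  # informative
    "depth_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # fully redundant with "depth"
    "curvature":  [1, 1, 0, 1, 0, 0, 1, 0],  # weaker but complementary
    "noise":      [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the labels
}

ranked = mrmr_rank(columns, labels)
subset, score = incremental_feature_selection(
    ranked, lambda feats: loo_mcc(columns, labels, feats))
print("mRMR order:", ranked)
print("IFS subset:", subset, "MCC:", round(score, 3))
```

In this toy setting the duplicated feature is pushed down the ranking despite its high relevance, which is the point of the redundancy term. To approximate the setup described in the abstract, the `evaluate` callable would instead wrap a cross-validated Random Forest, and continuous features would need to be discretized before computing mutual information.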

Keywords:
Random forest; Protein structure prediction; Feature (linguistics); Feature selection; Computer science; Redundancy (engineering); Accessible surface area; Artificial intelligence; Protein structure; Pattern recognition (psychology); Computational biology; Machine learning; Algorithm; Bioinformatics; Biology; Biochemistry

Metrics

Cited By: 111
FWCI (Field Weighted Citation Impact): 3.92
Refs: 56
Citation Normalized Percentile: 0.94

Topics

Machine Learning in Bioinformatics
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Protein Structure and Dynamics
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Computational Drug Discovery Methods
Physical Sciences →  Computer Science →  Computational Theory and Mathematics