JOURNAL ARTICLE

Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm

Minjie Li, Ziheng Wu, Wenyan Wang, Kun Lu, Jun Zhang, Yuming Zhou, Zhaoquan Chen, Dan Li, Shicheng Zheng, Peng Chen, Bing Wang

Year: 2021
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Vol: 19 (6)
Pages: 3646-3654
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Computational methods for predicting protein-protein interaction sites avoid the high cost and time consumption of traditional experimental approaches. However, the severe class imbalance between interface and non-interface residues on protein sequences limits the prediction performance of these methods. This work therefore proposes a new strategy, NearMiss-based under-sampling of imbalanced datasets combined with Random Forest classification (NM-RF), to predict protein interaction sites. Residues on protein sequences are represented by features derived from the position-specific scoring matrix (PSSM), the hydropathy index (HI) and the relative solvent accessibility (RSA). To resolve the class imbalance problem, an under-sampling method based on the NearMiss algorithm removes some of the non-interface residues, and a random forest then performs binary classification on the balanced feature datasets. Experiments show that the NM-RF model reaches accuracies of 87.6% and 84.3% on Dtestset72 and PDBtestset164, respectively, demonstrating the effectiveness of NM-RF in distinguishing interface from non-interface residues.
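The under-sampling step can be illustrated with a minimal sketch of NearMiss-1, one of the three NearMiss variants. The abstract does not say which variant NM-RF uses, so the choice of version 1, Euclidean distance, the neighbour count `k`, and the function name `nearmiss1` are all assumptions made for illustration. The sketch keeps every interface residue (minority class) and retains only the non-interface residues (majority class) whose mean distance to their `k` nearest interface residues is smallest, until both classes are the same size.

```python
import math
from typing import List, Sequence, Tuple


def nearmiss1(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    minority_label: int = 1,
    k: int = 3,
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical NearMiss-1 under-sampler for a binary-labelled dataset.

    Assumption: the source does not state which NearMiss variant NM-RF uses;
    version 1 with Euclidean distance is chosen here for illustration.
    Every minority sample (e.g. interface residue) is kept; majority samples
    (e.g. non-interface residues) are ranked by their mean distance to their
    k nearest minority samples, and the closest ones are kept until both
    classes have the same size.
    """
    minority = [list(x) for x, lab in zip(X, y) if lab == minority_label]
    majority = [list(x) for x, lab in zip(X, y) if lab != minority_label]
    if not minority or len(majority) <= len(minority):
        return [list(x) for x in X], list(y)  # nothing to under-sample

    majority_label = next(lab for lab in y if lab != minority_label)
    k = max(1, min(k, len(minority)))

    scored = []
    for idx, sample in enumerate(majority):
        # Distances from this majority sample to every minority sample.
        dists = sorted(math.dist(sample, m) for m in minority)
        scored.append((sum(dists[:k]) / k, idx))

    scored.sort()  # smallest mean distance first; ties broken by index
    keep = sorted(idx for _, idx in scored[: len(minority)])

    X_bal = minority + [majority[i] for i in keep]
    y_bal = [minority_label] * len(minority) + [majority_label] * len(keep)
    return X_bal, y_bal


if __name__ == "__main__":
    # Toy 2-D vectors standing in for residue features; 1 = interface.
    X = [[0.0, 0.0], [0.1, 0.2],
         [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]
    y = [1, 1, 0, 0, 0, 0, 0]
    X_bal, y_bal = nearmiss1(X, y, minority_label=1, k=2)
    print(y_bal)  # [1, 1, 0, 0]
    print(X_bal)  # [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
```

In a full pipeline, the balanced vectors (for instance, PSSM-derived features concatenated with HI and RSA) would then be passed to a random forest such as scikit-learn's `RandomForestClassifier`. The imbalanced-learn package also provides a tested `NearMiss` sampler (`version=1`, `2` or `3`) that could replace this hand-written helper.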


Metrics

Cited by: 18
FWCI (Field-Weighted Citation Impact): 0.78
References: 50
Citation Normalized Percentile: 0.67

Topics

Protein Structure and Dynamics
Machine Learning in Bioinformatics
RNA and protein synthesis mechanisms
(All three topics fall under Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology.)