Yang Wang, Yanjiao Zeng, Dongning Liu, Zhuowei Wang
Complexes formed through protein-protein interactions (PPIs) are among the fundamental molecular units that carry out a variety of biological functions, and they are of great importance in studying protein functions and mechanisms of action. In this paper, we summarize the common protein sequence encoding methods and propose a novel multiple-channel encoding designed for convolutional neural networks (CNNs). The proposed encoding consists of basic sequence information and additional sequence characteristics, such as amino acid content and local sequence fragments. This composite encoding provides both specific and combined features from the original sequence data to enhance the feature abstraction capability of the CNN model. Encoding tests showed a performance improvement of 8.46% over the original SSC encoding method and of 4.13%-10.88% over methods from the literature. In 5-fold cross-validation experiments on 718,306 PPIs involving 16,470 proteins, the proposed method achieved an overall accuracy of 94.35% and an MCC of 0.8871. The predictions were validated by molecular docking and enrichment analysis, which suggested that the predicted interactions may correspond to real PPIs. The proposed method may provide new insights into PPI identification techniques and help improve PPI prediction methods.
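The two metrics reported in the abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the counts of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names and the example counts are illustrative assumptions and do not come from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when the denominator is zero (a common convention
    for degenerate confusion matrices).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative counts only (hypothetical, not from the paper).
    tp, tn, fp, fn = 50, 40, 5, 5
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when interacting and non-interacting pairs are imbalanced.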
Zengyan Xie, Xiaoya Deng, Kunxian Shu
Elizaveta Alexandrovna Bogdanova, Valery Novoseletsky, К. В. Шайтан