JOURNAL ARTICLE

Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection

Yufei Gao, Biqing Li, Yu‐Dong Cai, Kaiyan Feng, Zhandong Li, Yang Jiang

Year: 2012
Journal: Molecular BioSystems
Vol: 9 (1)
Pages: 61-69
Publisher: Royal Society of Chemistry

Abstract

Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, prediction accuracy remains relatively low, with a high rate of false positives. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features describing physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict the active sites of enzymes, and achieved an overall accuracy of 0.885687 and a Matthews correlation coefficient (MCC) of 0.689226 on an independent test dataset. Feature analysis showed that every category of features except disorder contributed to the identification of active sites. Site-specific feature analysis further showed that the features derived from the active site itself contributed most to active site determination. Our prediction method may become a useful tool for identifying active sites, and the key features identified in this work may provide valuable insights into the mechanism of catalysis.
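
The abstract names the pipeline (mRMR ranking, then IFS, scored by MCC) without giving implementation details, so the following is only a minimal sketch under stated assumptions. It uses the common "relevance minus mean redundancy" form of mRMR on discrete features, and a simple pattern-lookup scorer in place of Random Forest, evaluated on the training data itself. The toy columns, the function names and the stand-in scorer are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2, sqrt


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_rank(columns, labels):
    """Greedy mRMR order: relevance to labels minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining, selected = list(range(len(columns))), []

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(columns[j], columns[s])
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    pairs = Counter(zip(y_true, y_pred))
    tp, tn = pairs[(1, 1)], pairs[(0, 0)]
    fp, fn = pairs[(0, 1)], pairs[(1, 0)]
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def make_lookup_scorer(columns, labels):
    """Stand-in scorer (NOT a Random Forest): predicts the majority label
    seen for each feature pattern and returns MCC on the same data."""
    def evaluate(subset):
        rows = list(zip(*(columns[j] for j in subset)))
        votes = {}
        for row, y in zip(rows, labels):
            votes.setdefault(row, Counter())[y] += 1
        preds = [votes[row].most_common(1)[0][0] for row in rows]
        return mcc(labels, preds)
    return evaluate


def incremental_feature_selection(ranking, evaluate):
    """Score each top-k prefix of the ranking; keep the first best one."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        s = evaluate(ranking[:k])
        if s > best_score:
            best_k, best_score = k, s
    return ranking[:best_k], best_score


# Hypothetical toy data: label = a OR b; a_dup copies a; c is noise.
a = [0, 0, 0, 0, 1, 1, 1, 1]
a_dup = list(a)
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [0, 0, 1, 1, 1, 1, 1, 1]
columns = [a, a_dup, b, c]

ranking = mrmr_rank(columns, labels)
best_subset, best_score = incremental_feature_selection(
    ranking, make_lookup_scorer(columns, labels))
print(ranking, best_subset, best_score)
```

In this toy run the duplicated column is ranked last because its redundancy with an already selected column outweighs its relevance, and IFS stops growing the subset once the two complementary columns are included. A real application would swap the lookup scorer for a properly validated classifier and score each prefix on held-out data.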

Keywords:
Active site; Feature selection; Random forest; Redundancy (engineering); False positive paradox; Computer science; Relevance (law); Minimum redundancy feature selection; Artificial intelligence; Feature (linguistics); Identification (biology); Data mining; Machine learning; Pattern recognition (psychology); Chemistry; Enzyme; Biology; Biochemistry; Ecology

Metrics

Cited By: 33
FWCI (Field Weighted Citation Impact): 4.35
Refs: 59
Citation Normalized Percentile: 0.95

Topics

Computational Drug Discovery Methods
Physical Sciences →  Computer Science →  Computational Theory and Mathematics
Machine Learning in Bioinformatics
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Protein Structure and Dynamics
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology