JOURNAL ARTICLE

Optimized tree-classification algorithm for classification of protein sequences

Abstract

Computational intelligence is an active area of research that has been successfully used to analyze and model the enormous amount of biological data accumulated by high-throughput genome sequencing projects. These data consist mainly of DNA, RNA and protein sequences, which are imprecise, incomplete and growing exponentially. Classifying protein sequences into superfamilies can help reveal the structure, function or hidden characteristics of an unknown protein sequence. Classifying protein sequences from primary sequence information alone is a complex and challenging task in the analysis and understanding of sequenced data. Existing classification methods perform well only on very limited data, and the rapid growth of genomic data calls for improved computational methods. In this work, we propose an optimized tree-classification technique that uses a cluster k-nearest-neighbor classification algorithm to classify protein sequences into superfamilies. The proposed technique is alignment-free, and the experimental results show that it outperforms previous state-of-the-art approaches. The best overall classification accuracy achieved is 97–98% on a previously used dataset taken from the well-known UniProtKB database.
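To make the idea of alignment-free, cluster-based nearest-neighbor classification more concrete, the following is a minimal sketch, not the authors' method. The abstract does not state the feature representation, how clusters are formed, or how the tree is optimized, so this sketch assumes: dipeptide (2-mer) composition as the alignment-free feature vector, a few k-means centroids per superfamily as "cluster" prototypes, and a majority vote among the k nearest prototypes. All function names (`dipeptide_composition`, `train`, `classify`) and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Fixed order of all 400 dipeptides; this feature set is an illustrative assumption.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_SET = set(DIPEPTIDES)


def dipeptide_composition(seq):
    """Alignment-free features: relative frequency of each overlapping 2-mer."""
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in DIPEPTIDE_SET)  # skip non-standard residues
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, m, iters=20):
    """Plain k-means with a deterministic init (the first m vectors)."""
    m = min(m, len(vectors))
    centroids = [list(v) for v in vectors[:m]]
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for v in vectors:
            nearest = min(range(m), key=lambda c: euclidean(v, centroids[c]))
            groups[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(g) if g else centroids[c] for c, g in enumerate(groups)]
    return centroids


def train(labeled_seqs, clusters_per_class=2):
    """Summarize each superfamily by a few cluster centroids (prototypes)."""
    by_class = {}
    for seq, label in labeled_seqs:
        by_class.setdefault(label, []).append(dipeptide_composition(seq))
    return [(c, label) for label, vecs in by_class.items()
            for c in kmeans(vecs, clusters_per_class)]


def classify(seq, prototypes, k=3):
    """Majority vote among the k nearest prototypes; ties go to the closest one."""
    x = dipeptide_composition(seq)
    nearest = sorted(prototypes, key=lambda p: euclidean(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    best = max(votes.values())
    for _, label in nearest:  # nearest is sorted by distance
        if votes[label] == best:
            return label


if __name__ == "__main__":
    # Hypothetical toy "superfamilies"; real data would come from UniProtKB.
    training = [
        ("AGAGAGAGAG", "fam1"), ("GAGAGAGAGA", "fam1"), ("AGAGAGAGGA", "fam1"),
        ("KLKLKLKLKL", "fam2"), ("LKLKLKLKLK", "fam2"), ("KLKLKLKLLK", "fam2"),
    ]
    prototypes = train(training)
    print(classify("AGAGAGAG", prototypes))
    print(classify("KLKLKLKL", prototypes))
```

Comparing a query against a handful of centroids per superfamily, rather than against every training sequence, is one common reason to cluster before a nearest-neighbor step: the number of distance computations grows with the number of prototypes instead of the size of the database.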

Keywords:
UniProt, Computer science, Protein function prediction, Data mining, Protein sequencing, Tree (set theory), Pattern recognition (psychology), Artificial intelligence, Statistical classification, Task (project management), Sequence (biology), Protein function, Gene, Mathematics, Biology, Peptide sequence

Metrics

Cited by: 2
FWCI (Field Weighted Citation Impact): 0.00
References: 17
Citation Normalized Percentile: 0.16

Topics

Machine Learning in Bioinformatics
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Genomics and Phylogenetic Studies
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Algorithms and Data Compression
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Pseudorandom Sequences Classification Algorithm

A. A. Spirin, Alexander Kozachok

Journal: Journal of Science and Technology on Information security, Year: 2021, Vol: 2 (12), Pages: 3-10

JOURNAL ARTICLE

Document Classification of Protein Sequences

Betty Yee Man Cheng, Jaime G. Carbonell, Judith Klein‐Seetharaman

Journal: OPAL (Open@LaTrobe) (La Trobe University), Year: 2003

JOURNAL ARTICLE

Protein Classification Prediction Based on Sparrow Search Algorithm Optimized Random Forest Algorithm

Qianqian Zhou

Journal: Journal of Physics Conference Series, Year: 2025, Vol: 3072 (1), Pages: 012018-012018