JOURNAL ARTICLE

Semi-Supervised Learning for Classification of Protein Sequence Data

Brian R. King, Chittibabu Guda

Year: 2008 Journal: Scientific Programming Vol: 16 (1) Pages: 5-29 Publisher: Hindawi Publishing Corporation

Abstract

Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.
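The abstract does not name the specific text-classification methods that were adapted, so the following is only an illustrative sketch of one well-known technique from that family: multinomial naive Bayes trained with expectation-maximization (EM) over labeled and unlabeled examples, with overlapping k-mers standing in for words. The toy sequences, the class labels `alpha` and `beta`, the function names, and the choice of `k=2` are all assumptions made for illustration and do not come from the article.

```python
import math
from collections import Counter

def kmers(seq, k=2):
    """Split a protein sequence into overlapping k-mers (the 'words')."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial naive Bayes from soft class weights.

    docs: list of Counter (k-mer counts per sequence)
    weights: list of dict mapping class -> P(class | sequence)
    """
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for term, n in doc.items():
                counts[c][term] += w[c] * n
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_like

def posterior(doc, model, classes):
    """Return P(class | doc) as a dict; k-mers outside the vocabulary are skipped."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c]
        + sum(n * log_like[c][t] for t, n in doc.items() if t in log_like[c])
        for c in classes
    }
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def em_naive_bayes(labeled, unlabeled, k=2, iters=10):
    """Train on labeled data, then refine with soft labels on unlabeled data."""
    classes = sorted({y for _, y in labeled})
    lab_docs = [Counter(kmers(s, k)) for s, _ in labeled]
    unl_docs = [Counter(kmers(s, k)) for s in unlabeled]
    vocab = set().union(*lab_docs, *unl_docs)
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(lab_docs, hard, classes, vocab)
    for _ in range(iters):
        # E-step: estimate class probabilities for each unlabeled sequence.
        soft = [posterior(d, model, classes) for d in unl_docs]
        # M-step: refit on labeled (hard) plus unlabeled (soft) examples.
        model = train_nb(lab_docs + unl_docs, hard + soft, classes, vocab)
    return model, classes

def predict(seq, model, classes, k=2):
    """Return the most probable class for a sequence."""
    post = posterior(Counter(kmers(seq, k)), model, classes)
    return max(classes, key=post.get)

# Toy data (hypothetical): two labeled sequences, four unlabeled ones.
labeled = [("AAAAAAGA", "alpha"), ("KKKKKKRK", "beta")]
unlabeled = ["AAGAAAAA", "GAGAGAAA", "KRKKKKKK", "RKRKRKKK"]
model, classes = em_naive_bayes(labeled, unlabeled)
print(predict("GAGAGA", model, classes))
```

In this sketch the unlabeled sequences contribute fractional counts weighted by their current class probabilities, which is how unlabeled data can shift the k-mer statistics when few labeled sequences are available.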

Keywords:
Computer science, Sequence (biology), Artificial intelligence, Support vector machine, Annotation, Labeled data, Machine learning, Semi-supervised learning, Supervised learning, Protein sequencing, Pattern recognition (psychology), Peptide sequence, Biology, Artificial neural network

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 0.14
Refs: 70
Citation Normalized Percentile: 0.51

Topics

Machine Learning in Bioinformatics
Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology
Genomics and Phylogenetic Studies
Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology
RNA and protein synthesis mechanisms
Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology

Related Documents

JOURNAL ARTICLE

Semi-Supervised Learning for Classification of Protein Sequence Data

Brian R. King, Chittibabu Guda

Journal: DOAJ (DOAJ: Directory of Open Access Journals) Year: 2008

JOURNAL ARTICLE

Semi-supervised Sequence Learning

Andrew M. Dai, Quoc V. Le

Journal: arXiv (Cornell University) Year: 2015 Vol: 28 Pages: 3079-3087

JOURNAL ARTICLE

SEMI-SUPERVISED SEQUENCE CLASSIFICATION WITH HMMs

Shi Zhong

Journal: International Journal of Pattern Recognition and Artificial Intelligence Year: 2005 Vol: 19 (02) Pages: 165-182