JOURNAL ARTICLE

scPretrain: multi-task self-supervised learning for cell-type classification

Ruiyi Zhang, Yunan Luo, Jianzhu Ma, Ming Zhang, Sheng Wang

Year: 2022  Journal: Bioinformatics  Vol: 38 (6)  Pages: 1607-1614  Publisher: Oxford University Press

Abstract

Motivation: Rapidly generated scRNA-seq datasets enable us to understand cellular differences and the function of each individual cell at single-cell resolution. Cell-type classification, which aims at characterizing and labeling groups of cells according to their gene expression, is one of the most important steps in single-cell analysis. To facilitate the manual curation process, supervised learning methods have been used to automatically classify cells. Most existing supervised learning approaches only utilize annotated cells in the training step while ignoring the more abundant unannotated cells. In this article, we propose scPretrain, a multi-task self-supervised learning approach that jointly considers annotated and unannotated cells for cell-type classification. scPretrain consists of a pre-training step and a fine-tuning step. In the pre-training step, scPretrain uses a multi-task learning framework to train a feature extraction encoder based on each dataset's pseudo-labels, where only unannotated cells are used. In the fine-tuning step, scPretrain fine-tunes this feature extraction encoder using the limited annotated cells in a new dataset.

Results: We evaluated scPretrain on 60 diverse datasets from different technologies, species and organs, and obtained a significant improvement on both cell-type classification and cell clustering. Moreover, the representations obtained by scPretrain in the pre-training step also enhanced the performance of conventional classifiers, such as random forest, logistic regression and support-vector machines. scPretrain is able to effectively utilize the massive amount of unlabeled data and be applied to annotating increasingly generated scRNA-seq datasets.

Availability and implementation: The data and code underlying this article are available at https://github.com/ruiyi-zhang/scPretrain and https://zenodo.org/record/5802306.
Supplementary information Supplementary data are available at Bioinformatics online.
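The abstract describes a two-step scheme: pre-train a shared encoder on per-dataset pseudo-labels from unannotated cells (one classification task per dataset), then fine-tune on the few annotated cells of a new dataset. The following is a minimal, purely illustrative sketch of that flow; the toy k-means, the single-layer "encoder" and all dimensions are assumptions for demonstration, not the paper's actual architecture (which trains a neural encoder end-to-end).

```python
# Hypothetical sketch of a pseudo-label pre-train / fine-tune pipeline.
# NOT the paper's implementation; names and model sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Toy k-means used to assign per-dataset pseudo-labels."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_head(H, y, n_classes, lr=0.5, steps=200):
    """Fit a softmax classification head on encoder outputs H."""
    W = np.zeros((H.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(steps):
        P = softmax(H @ W)
        W -= lr * H.T @ (P - Y) / len(H)   # cross-entropy gradient step
    return W

d_in, d_hid, k = 50, 16, 3
encoder = rng.normal(scale=0.1, size=(d_in, d_hid))   # shared across tasks

# Pre-training: each unannotated dataset contributes one pseudo-label task.
for _ in range(2):
    X = rng.normal(size=(120, d_in))
    pseudo = kmeans(X, k)
    H = np.tanh(X @ encoder)
    train_head(H, pseudo, k)   # the real method also backpropagates into the encoder

# Fine-tuning: a small annotated dataset reuses the pre-trained encoder.
X_ann = rng.normal(size=(30, d_in))
y_ann = rng.integers(0, k, size=30)
H_ann = np.tanh(X_ann @ encoder)
W_cls = train_head(H_ann, y_ann, k)
pred = softmax(H_ann @ W_cls).argmax(1)
acc = (pred == y_ann).mean()
```

The design point the sketch mirrors is that the encoder is the only part carried between steps: each pre-training dataset gets its own throwaway head, while the fine-tuning dataset trains a fresh head on top of the shared representation.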

Keywords:
Computer science, Artificial intelligence, Cluster analysis, Support vector machine, Random forest, Task (project management), Machine learning, Supervised learning, Encoder, Feature (linguistics), Pattern recognition (psychology), Artificial neural network

Metrics

Cited By: 19
FWCI (Field-Weighted Citation Impact): 2.22
Refs: 61
Citation Normalized Percentile: 0.82


Topics

Single-cell and spatial transcriptomics
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and ELM
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

BOOK-CHAPTER

Few-Shot Classification with Multi-task Self-supervised Learning

Fan Shi, Rui Wang, Sanyi Zhang, Xiaochun Cao

Lecture Notes in Computer Science  Year: 2021  Pages: 224-236
JOURNAL ARTICLE

Hierarchical multi-task learning with self-supervised auxiliary task

Seung‐Han Lee, Taeyoung Park

Journal: Korean Journal of Applied Statistics  Year: 2024  Vol: 37 (5)  Pages: 631-641
JOURNAL ARTICLE

Multi-task Self-Supervised Adaptation for Reinforcement Learning

Keyu Wu, Zhenghua Chen, Min Wu, Shili Xiang, Ruibing Jin, Le Zhang, Xiaoli Li

Proceedings: 2022 IEEE 17th Conference on Industrial Electronics and Applications (ICIEA)  Year: 2022  Vol: 518  Pages: 15-20