JOURNAL ARTICLE

Representation Learning for the Clustering of Multi-Omics Data

G. Viaud, P. Mayilvahanan, Paul-Henry Cournède

Year: 2021 Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics Vol: 19 (1) Pages: 135-145 Publisher: Institute of Electrical and Electronics Engineers

Abstract

The integration of several sources of data for the identification of subtypes of diseases has gained attention over the past few years. The heterogeneity and the high dimensions of the data sets call for an adequate representation of the data. We summarize the field of representation learning for the multi-omics clustering problem and we investigate several techniques to learn relevant combined representations, using methods from group factor analysis (PCA, MFA and extensions) and from machine learning with autoencoders. We highlight the importance of appropriately designing and training the latter, notably with a novel combination of a disjointed deep autoencoder (DDAE) architecture and a layer-wise reconstruction loss. These different representations can then be clustered to identify biologically meaningful clusters of patients. We provide a unifying framework for model comparison between statistical and deep learning approaches with the introduction of a new weighted internal clustering index that evaluates how well the clustering information is retained from each source, favoring contributions from all data sets. We apply our methodology to two case studies for which previous works of integrative clustering exist, TCGA Breast Cancer and TARGET Neuroblastoma, and show how our method can yield good and well-balanced clusters across the different data sources.
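
The abstract does not give the formula of the weighted internal clustering index, so the following is only a minimal sketch of the general idea, under stated assumptions: a single cluster assignment is scored separately on each data source with the silhouette coefficient, and the per-source scores are combined with a weighted mean. The choice of silhouette, the function names `silhouette` and `weighted_index`, and the `weights` parameter are all illustrative assumptions, not the index defined in the paper.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient of `labels` on X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score left at 0 by convention
        # Mean distance to the other members of the same cluster (D[i, i] is 0).
        a = D[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return float(scores.mean())

def weighted_index(sources, labels, weights=None):
    """Hypothetical combined index: weighted mean of per-source silhouettes.

    `sources` is a list of data matrices sharing the same samples (rows).
    Returns the combined score and the array of per-source scores.
    """
    per_source = np.array([silhouette(X, labels) for X in sources])
    if weights is None:
        weights = np.ones(len(sources))  # assumption: equal weight per source
    weights = np.asarray(weights, dtype=float)
    combined = float(np.dot(weights, per_source) / weights.sum())
    return combined, per_source

# Toy usage: the clustering fits the first source but not the second.
labels = [0, 0, 1, 1]
source_a = [[0, 0], [0, 1], [10, 0], [10, 1]]
source_b = [[0, 0], [10, 0], [0, 1], [10, 1]]
combined, per_source = weighted_index([source_a, source_b], labels)
print(combined, per_source)
```

In this toy case the per-source scores expose that the clustering is supported by only one source, which a score computed on a single source or on the concatenated data could hide. A scheme that favors balanced contributions could, for instance, penalize the spread of `per_source`; that variant is likewise an assumption rather than something stated in the abstract.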

Keywords:
Cluster analysis, Computer science, Representation (politics), Artificial intelligence, Autoencoder, Machine learning, Feature learning, Field (mathematics), Identification (biology), Data mining, Clustering high-dimensional data, External Data Representation, Deep learning, Mathematics, Biology

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 1.07
Refs: 54
Citation Normalized Percentile: 0.72

Topics

Bioinformatics and Genomic Networks
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Gene expression and cancer classification
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Medical Imaging Techniques and Applications
Health Sciences →  Medicine →  Radiology, Nuclear Medicine and Imaging