Adjacency-constrained hierarchical clustering of a band similarity matrix with application to Genomics

Christophe Ambroise; Alia Dehman; Pierre Neuvial; Guillem Rigaill; Nathalie Vialaneix

JOURNAL ARTICLE

Year: 2019 Journal: arXiv (Cornell University) Publisher: Cornell University

Abstract

Motivation: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. A major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of 10^4 to 10^5 for each chromosome.

Results: By assuming that the similarity between physically distant objects is negligible, we propose an implementation of this adjacency-constrained HAC with quasi-linear complexity. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds.

Availability and Implementation: Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
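To make the idea of adjacency-constrained HAC concrete, here is a minimal Python sketch of the naive, quadratic-time version that the abstract describes as the starting point. It is not the quasi-linear algorithm of the paper and not the adjclust API. It assumes Ward's linkage expressed in terms of similarities, which the abstract does not specify, and the names `band` and `adjacency_constrained_hac` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def band(S, h):
    """Zero out similarities between loci more than h positions apart
    (illustrates the band assumption; h is a hypothetical parameter)."""
    idx = np.arange(S.shape[0])
    return np.where(np.abs(idx[:, None] - idx[None, :]) <= h, S, 0.0)


def adjacency_constrained_hac(S):
    """Naive adjacency-constrained HAC on a symmetric similarity matrix S.

    Clusters are contiguous intervals of positions; at each step only two
    neighbouring intervals may be merged. Assumes Ward's linkage.
    Returns a list of merges (left_start, boundary, right_end, linkage).
    """
    S = np.asarray(S, dtype=float)
    p = S.shape[0]

    # 2D prefix sums: P[i, j] = sum of S[:i, :j], so any block sum is O(1).
    P = np.zeros((p + 1, p + 1))
    P[1:, 1:] = S.cumsum(axis=0).cumsum(axis=1)

    def block_sum(a, b, c, d):
        # Sum of S[a:b, c:d].
        return P[b, d] - P[a, d] - P[b, c] + P[a, c]

    def ward(a, m, b):
        # Ward linkage between the adjacent clusters [a, m) and [m, b).
        u, v = m - a, b - m
        s_uu = block_sum(a, m, a, m)
        s_vv = block_sum(m, b, m, b)
        s_uv = block_sum(a, m, m, b)
        return (u * v / (u + v)) * (
            s_uu / u**2 + s_vv / v**2 - 2.0 * s_uv / (u * v)
        )

    # Cluster k is the interval [bounds[k], bounds[k + 1]).
    bounds = list(range(p + 1))
    merges = []
    while len(bounds) > 2:
        # Only adjacent pairs are candidates: this is the adjacency constraint.
        costs = [
            ward(bounds[k], bounds[k + 1], bounds[k + 2])
            for k in range(len(bounds) - 2)
        ]
        k = int(np.argmin(costs))
        merges.append((bounds[k], bounds[k + 1], bounds[k + 2], float(costs[k])))
        del bounds[k + 1]  # remove the boundary between the merged clusters
    return merges


if __name__ == "__main__":
    # Toy example: two blocks of three perfectly similar loci each.
    S = np.kron(np.eye(2), np.ones((3, 3)))
    for merge in adjacency_constrained_hac(band(S, h=2)):
        print(merge)
```

On this toy matrix the within-block merges come first and the last merge joins the two blocks at boundary 3. The sketch recomputes every adjacent linkage at each step, so it stays quadratic in the number of loci; the speed-up reported in the abstract relies on the band assumption and is not reproduced here.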

Keywords:

Computer science; Cluster analysis; Similarity (geometry); Adjacency list; Adjacency matrix; Edit distance; Algorithm; Data mining; Artificial intelligence; Theoretical computer science; Graph

Topics

Gene expression and cancer classification

Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology

Bioinformatics and Genomic Networks

Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology

Genomics and Chromatin Dynamics

Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology

Related Documents

Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics

MOESM1 of Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics

Multi-View Adjacency-Constrained Hierarchical Clustering

Enhanced Adjacency-Constrained Hierarchical Clustering Using Fine-Grained Pseudo Labels