JOURNAL ARTICLE

Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics

Christophe Ambroise, Alia Dehman, Pierre Neuvial, Mark Hoebeke, Nathalie Vialaneix

Year: 2019   Journal: Algorithms for Molecular Biology   Vol: 14 (1)   Pages: 22-22   Publisher: BioMed Central

Abstract

Background: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies often face the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), in which only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. A major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of $$10^4$$ to $$10^5$$ for each chromosome.

Results: By assuming that the similarity between physically distant objects is negligible, we are able to propose an implementation of adjacency-constrained HAC with quasi-linear complexity. This is achieved by pre-calculating specific sums of similarities and storing candidate fusions in a min-heap. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption and show that the method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds.

Availability and implementation: Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
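The two ingredients named in the Results (pre-calculated sums of similarities and a min-heap of candidate fusions) can be illustrated with a small sketch. It is not the adjclust implementation: the function name `adjacency_constrained_hac` is hypothetical, Ward linkage on a kernel-like similarity matrix is assumed as the merge criterion, and for simplicity the sums are stored in a full 2-D prefix-sum table, which is quadratic in memory and therefore does not reproduce the quasi-linear, band-restricted scheme the abstract describes.

```python
import heapq


def adjacency_constrained_hac(sim):
    """Merge only neighbouring clusters, cheapest (assumed) Ward cost first.

    `sim` is a square similarity matrix given as a list of lists.
    Returns the merges in order as (left_start, right_start, cost).
    """
    n = len(sim)
    # Pre-calculated sums: a 2-D prefix-sum table gives any block sum in O(1).
    pre = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            pre[i + 1][j + 1] = (sim[i][j] + pre[i][j + 1]
                                 + pre[i + 1][j] - pre[i][j])

    def block(a, b, c, d):
        # Sum of sim[i][j] for a <= i < b and c <= j < d.
        return pre[b][d] - pre[a][d] - pre[b][c] + pre[a][c]

    def ward(a, b, c):
        # Ward cost of fusing the adjacent clusters [a, b) and [b, c).
        na, nb = b - a, c - b
        return (na * nb / (na + nb)) * (
            block(a, b, a, b) / na ** 2
            + block(b, c, b, c) / nb ** 2
            - 2.0 * block(a, b, b, c) / (na * nb)
        )

    end = {i: i + 1 for i in range(n)}    # cluster start -> exclusive end
    prev = {i: i - 1 for i in range(n)}   # cluster start -> left neighbour start
    # Candidate fusions of adjacent clusters, kept in a min-heap.
    heap = [(ward(i, i + 1, i + 2), i, i + 1, i + 2) for i in range(n - 1)]
    heapq.heapify(heap)

    merges = []
    while heap and len(merges) < n - 1:
        cost, a, b, c = heapq.heappop(heap)
        # Lazy deletion: skip candidates whose clusters have changed.
        if end.get(a) != b or end.get(b) != c:
            continue
        merges.append((a, b, cost))
        end[a] = c
        del end[b]
        if c < n:  # new candidate with the right neighbour
            prev[c] = a
            heapq.heappush(heap, (ward(a, c, end[c]), a, c, end[c]))
        p = prev[a]
        if p >= 0:  # new candidate with the left neighbour
            heapq.heappush(heap, (ward(p, a, c), p, a, c))
    return merges
```

For example, calling the function on a block-diagonal matrix with two groups of three mutually similar loci should leave the fusion of the two groups for the final step. Restricting `sim` to a band around the diagonal, as the abstract assumes, is what would allow the sums to be stored in near-linear space.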

Keywords:
Similarity (geometry), Computer science, Adjacency matrix, Cluster analysis, Genomics, Adjacency list, Hierarchical clustering, Matrix algebra, Matrix (chemical analysis), Computational biology, Artificial intelligence, Data mining, Theoretical computer science, Genome, Algorithm, Biology, Genetics, Eigenvalues and eigenvectors, Physics, Materials science, Graph

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 33
Citation Normalized Percentile: 0.04

Topics

Gene expression and cancer classification
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Bioinformatics and Genomic Networks
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Genomics and Chromatin Dynamics
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology