JOURNAL ARTICLE

Parallel k-Means Clustering of Geospatial Data Sets Using Manycore CPU Architectures

Abstract

The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of weather, climate, ecological, and other geoscientific data sets fused from disparate sources. Many of the standard tools used on individual workstations are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of parallelism available in state-of-the-art high-performance computing platforms can enable such analysis. Here, we describe pKluster, an open-source tool we have developed for accelerated k-means clustering of geospatial and geospatiotemporal data, and discuss the algorithmic modifications and code optimizations we have made to enable it to effectively use parallel machines based on novel CPU architectures (such as the Intel “Knights Landing” Xeon Phi and Skylake Xeon processors) that have many cores and hardware threads and employ significant single instruction, multiple data (SIMD) parallelism. We outline some applications of the code in ecology and climate science contexts and present a detailed discussion of the performance of the code for one such application, LiDAR-derived vertical vegetation structure classification.
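
For readers unfamiliar with the algorithm being accelerated, the following is a minimal sketch of the standard (Lloyd's) k-means iteration in plain Python. It is not pKluster's implementation, and the function names (`squared_distance`, `assign`, `update`, `kmeans`) are hypothetical, chosen only for illustration. It is meant to show that the assignment step treats every point independently, which is the property that many-core and SIMD hardware can exploit.

```python
# Illustrative sketch only: a plain Lloyd's k-means loop, NOT pKluster's code.
# All function names here are hypothetical.
from math import fsum


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point.

    Each point is handled independently of the others, so this loop is the
    natural target for thread-level and SIMD parallelism.
    """
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centroids.append(list(old))  # keep an empty cluster's centroid
            continue
        new_centroids.append(
            [fsum(m[d] for m in members) / len(members) for d in range(len(old))]
        )
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids
```

Calling `kmeans(points, initial_centroids)` returns the final cluster label of each point together with the final centroids. A production code would replace the Python loops with vectorized, multithreaded kernels; this sketch only fixes the structure of the computation.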

Keywords:
Computer science; Cluster analysis; Parallel computing; SIMD; Geospatial analysis; Workstation; Supercomputer; Xeon Phi; Compiler; Data parallelism; Parallelism (grammar); Code (set theory); Computer architecture; Operating system; Programming language; Artificial intelligence; Remote sensing; Set (abstract data type)

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.00
Refs: 22
Citation Normalized Percentile: 0.18

Topics

Advanced Clustering Algorithms Research
Physical Sciences →  Computer Science →  Artificial Intelligence
Remote Sensing in Agriculture
Physical Sciences →  Environmental Science →  Ecology
Species Distribution and Climate Change
Physical Sciences →  Environmental Science →  Ecological Modeling

Related Documents

BOOK-CHAPTER

Parallel k/h-Means Clustering for Large Data Sets

Kilian Stoffel, Abdelkader Belkoniene

Lecture notes in computer science Year: 1999 Pages: 1451-1454
BOOK-CHAPTER

Data Decomposition for Parallel K-means Clustering

Attila Gürsoy

Lecture notes in computer science Year: 2004 Pages: 241-248
BOOK-CHAPTER

Parallel Pruning for K-Means Clustering on Shared Memory Architectures

Attila Gürsoy, İlker Cengiz

Lecture notes in computer science Year: 2001 Pages: 321-325
JOURNAL ARTICLE

Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering

Zahid Ansari, Asif Afzal, Tanvir Habib Sardar

Journal: Journal of The Institution of Engineers (India) Series B Year: 2019 Vol: 100 (2) Pages: 95-103