JOURNAL ARTICLE

Clustering over High-Dimensional Data Streams Based on Grid Density and Effective Dimension

Wuzhou Dong, Jingyan Cui, Haitao He, Jiadong Ren

Year: 2011
Journal: International Journal of Advancements in Computing Technology
Vol: 3 (8)
Pages: 154-162
Publisher: The International Association for Information, Culture, Human and Industry Technology (AICIT)

Abstract

Clustering algorithms based on grids and density have many attractive properties. For high-dimensional data streams, however, the number of grid cells grows sharply as the dimensionality of the space increases. To address this problem, we propose GDH-Stream, a clustering method for high-dimensional data streams based on effective dimensions and grid density, which consists of an online component and an offline component. First, we define the effective value of a dimension and give a method for partitioning each dimension into projection intervals. The effective values of the dimensions are then calculated and ranked in decreasing order, and a subset of dimensions is chosen to generate a subspace. In the online component, as the data stream arrives, GDH-Stream maps each data point to the original grid structure. In the offline component, when a clustering request arrives, the subspace is generated from the effective values of the dimensions; the original grid structure is then projected onto this subspace to form a new grid structure, and clustering is performed on the new grid structure according to the connectivity of the dense grids. Experimental results show that GDH-Stream achieves better clustering quality and efficiency. GDH-Stream also scales well when clustering data streams.
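
The online/offline split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the definition of the effective value, so `effective_value` below uses a stand-in score (the share of points falling in the densest interval of a dimension). The function names, the fixed interval `width`, the number of kept dimensions `k`, and the `min_density` threshold are all assumptions made for illustration. Any density decay over time that a stream algorithm might apply is omitted.

```python
from collections import Counter


def cell_of(point, width):
    """Online step: map a point to its grid cell (one interval index per dimension)."""
    return tuple(int(x // width) for x in point)


def effective_value(grid, dim):
    """Stand-in score (assumption, not the paper's definition):
    fraction of all points that fall in the densest interval of `dim`."""
    counts = Counter()
    for cell, n in grid.items():
        counts[cell[dim]] += n
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


def project(grid, dims):
    """Offline step: project the original grid onto the chosen dimensions,
    summing the counts of cells that coincide in the subspace."""
    sub = Counter()
    for cell, n in grid.items():
        sub[tuple(cell[d] for d in dims)] += n
    return sub


def cluster(sub, min_density):
    """Group dense cells into clusters by connectivity (cells that differ
    by one interval in exactly one dimension are treated as neighbours)."""
    dense = {c for c, n in sub.items() if n >= min_density}
    seen, clusters = set(), []
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            c = stack.pop()
            component.add(c)
            for i in range(len(c)):
                for step in (-1, 1):
                    nb = c[:i] + (c[i] + step,) + c[i + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(component)
    return clusters


def gdh_sketch(points, width, k, min_density):
    """Run the online mapping, then one offline clustering request."""
    grid = Counter(cell_of(p, width) for p in points)            # online
    n_dims = len(next(iter(grid))) if grid else 0
    ranked = sorted(range(n_dims),
                    key=lambda d: effective_value(grid, d),
                    reverse=True)                                # rank dimensions
    dims = sorted(ranked[:k])                                    # chosen subspace
    return dims, cluster(project(grid, dims), min_density)      # offline
```

With this layout the online step only increments a counter per cell, and the cost of ranking dimensions and finding connected dense cells is paid only when a clustering request arrives.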

Keywords:
Computer science, Cluster analysis, Dimension (graph theory), Grid, Data stream mining, Data mining, STREAMS, Clustering high-dimensional data, Artificial intelligence, Geology, Mathematics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 10
Citation Normalized Percentile: 0.19

Topics

Data Stream Mining Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Clustering Algorithms Research
Physical Sciences →  Computer Science →  Artificial Intelligence
Complex Network Analysis Techniques
Physical Sciences →  Physics and Astronomy →  Statistical and Nonlinear Physics
