JOURNAL ARTICLE

Probability Density Grid-based Online Clustering for Uncertain Data Streams

Haitao He, Lijuan Chen, Jiadong Ren, Wenyan Guo

Year: 2011 Journal: International Journal on Advances in Information Sciences and Service Sciences Vol: 3 (8) Pages: 204-211

Abstract

Most existing stream clustering algorithms combine an online component with an offline component. The disadvantage of such two-phase algorithms is that they cannot generate the final clusters online; accurate clustering results must be obtained through offline analysis. Furthermore, existing clustering algorithms for uncertain data streams are unable to find clusters of arbitrary shape as uncertain data streams vary. To address these issues, this paper proposes a novel algorithm, PDG-OCUStream (Probability Density Grid-based Online Clustering for Uncertain Data Streams), in which the summary information of an uncertain data stream is stored in probability density grids together with the associated statistical values. Setting a probability density threshold allows clustering quality to be controlled effectively, and the probability density grid structure is easy to maintain and update, which improves the efficiency of online clustering. The algorithm also uses a count-based sliding window, which reflects the current state of the uncertain data stream; system resources can be saved by adjusting the step of the sliding window. In addition, the paper defines a grid probability density similarity, which is used to initialize and update clusters by merging connected probability density grids, so the algorithm can distinguish dense regions from sparse regions and quickly find the clusters in the data distribution in real time. Experimental results show that PDG-OCUStream clusters quickly online while maintaining good clustering quality.
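The general idea of the abstract (a grid that accumulates probability mass, a count-based sliding window, and clusters formed by merging connected dense grids) can be illustrated with a minimal sketch. This is not the PDG-OCUStream algorithm itself: the abstract does not give the paper's definitions, so everything here is an assumption made for illustration. In particular, the class name `GridWindowClusterer`, the use of the summed existence probability of a cell as its "probability density", face adjacency as the meaning of "connected", and a window step of one record are all hypothetical choices, and the paper's grid probability density similarity is not modeled.

```python
from collections import deque, defaultdict


class GridWindowClusterer:
    """Illustrative sketch only; names and definitions are assumptions."""

    def __init__(self, cell_size, window_size, density_threshold):
        self.cell_size = cell_size
        self.window_size = window_size            # count-based window length
        self.density_threshold = density_threshold
        self.window = deque()                     # (cell, probability) per record
        self.density = defaultdict(float)         # summed probability per cell
        self.count = defaultdict(int)             # records per cell in the window

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_size) for x in point)

    def insert(self, point, prob):
        # Add one uncertain record: a point with an existence probability.
        cell = self._cell(point)
        self.window.append((cell, prob))
        self.density[cell] += prob
        self.count[cell] += 1
        # Slide the window by one record once it exceeds its capacity.
        if len(self.window) > self.window_size:
            old_cell, old_prob = self.window.popleft()
            self.density[old_cell] -= old_prob
            self.count[old_cell] -= 1
            if self.count[old_cell] == 0:
                del self.density[old_cell]
                del self.count[old_cell]

    def _neighbors(self, cell):
        # Cells sharing a face: differ by 1 along exactly one dimension.
        for i in range(len(cell)):
            for delta in (-1, 1):
                yield cell[:i] + (cell[i] + delta,) + cell[i + 1:]

    def clusters(self):
        # Dense cells are those whose summed probability meets the threshold;
        # each cluster is a connected component of dense cells.
        dense = {c for c, d in self.density.items()
                 if d >= self.density_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], set()
            while stack:
                cell = stack.pop()
                component.add(cell)
                for n in self._neighbors(cell):
                    if n in dense and n not in seen:
                        seen.add(n)
                        stack.append(n)
            result.append(component)
        return result
```

In this sketch, raising `density_threshold` makes fewer cells count as dense, which yields tighter clusters, while `window_size` bounds how many recent records influence the result. Because clusters are read directly from the current grid, no separate offline phase is needed in this simplified setting.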

Keywords:
Cluster analysis; Data stream clustering; Computer science; Data mining; Data stream mining; CURE data clustering algorithm; Sliding window protocol; Grid; Correlation clustering; Fuzzy clustering; Probability distribution; Constrained clustering; Determining the number of clusters in a data set; Canopy clustering algorithm; Data stream; Artificial intelligence; Window (computing); Mathematics; Statistics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.00
Refs: 12
Citation Normalized Percentile: 0.19

Topics

Data Stream Mining Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Clustering Algorithms Research
Physical Sciences →  Computer Science →  Artificial Intelligence
Caching and Content Delivery
Physical Sciences →  Computer Science →  Computer Networks and Communications