Data stream clustering is an important task in data stream mining. In this paper, we propose SDStream, a new method for density-based clustering of data streams over sliding windows. SDStream adopts the CluStream clustering framework, which divides the work into an online and an offline component. In the online component, potential core-micro-cluster and outlier micro-cluster structures are introduced to maintain the potential clusters and outliers. They are stored in main memory as exponential histograms of cluster features (EHCFs) and are kept up to date through the EHCF maintenance procedure. Outdated micro-clusters that need to be deleted are identified by the value of t in their temporal cluster feature (TCF). In the offline component, the DBSCAN algorithm is applied to all potential core-micro-clusters maintained online to generate the final clusters of arbitrary shape. Experimental results show that SDStream, which can generate clusters of arbitrary shape, achieves much higher clustering quality than CluStream, which generates only spherical clusters.
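The two-phase flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential histogram (EHCF) is not modelled, each micro-cluster is reduced to a count, a linear sum, and a timestamp, and all names (`MicroCluster`, `drop_outdated`, `dbscan`, `window`, `min_weight`) are hypothetical. It assumes that t records the time of the most recent point absorbed by a micro-cluster, and that the offline step runs a weighted DBSCAN over micro-cluster centers.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Simplified stand-in for a temporal cluster feature (assumed fields)."""
    n: int      # number of absorbed points
    ls: list    # per-dimension linear sum of absorbed points
    t: int      # timestamp of the most recent absorbed point (assumption)

    def center(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point, timestamp):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.t = timestamp


def drop_outdated(clusters, now, window):
    """Online step: keep micro-clusters whose latest point is inside the window."""
    return [mc for mc in clusters if mc.t > now - window]


def dbscan(points, weights, eps, min_weight):
    """Offline step: weighted DBSCAN; returns one label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    def is_core(nbrs):
        return sum(weights[j] for j in nbrs) >= min_weight

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if not is_core(nbrs):
            labels[i] = -1          # provisional noise, may become a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id   # noise reached from a core: border point
            if labels[j] is not None:
                continue                 # already assigned, do not expand again
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if is_core(j_nbrs):
                queue.extend(j_nbrs)
        cluster_id += 1
    return labels


# Illustrative data: two nearby micro-clusters, one isolated, one outdated.
clusters = [
    MicroCluster(n=5, ls=[0.0, 0.0], t=90),
    MicroCluster(n=5, ls=[2.5, 0.0], t=95),
    MicroCluster(n=5, ls=[50.0, 50.0], t=99),
    MicroCluster(n=5, ls=[0.0, 2.5], t=10),
]
live = drop_outdated(clusters, now=100, window=50)
labels = dbscan([mc.center() for mc in live], [mc.n for mc in live],
                eps=1.0, min_weight=8)
```

In this toy run the micro-cluster last updated at time 10 falls outside the window and is discarded, the two nearby micro-clusters merge into one final cluster, and the isolated one is reported as noise. Weighting each center by its point count is one plausible way to let sparse micro-clusters count for less than dense ones; the paper's exact density criterion may differ.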
Jiadong Ren, Shiyuan Cao, Changzhen Hu
K. Shyam Sunder Reddy, C. Shoba Bindu
Ta Minh Thuy, Hoai An Le Thi, Lydia Boudjeloud-Assala
Jonghem Youn, Jihun Choi, Junho Shim, Sang-Goo Lee