Kehua Yang, Heqing Gao, Lin Chen, Qiong Yuan
Because a data stream is real-time, fast, unbounded, and can be read only once, clustering it requires algorithms that can process the stream within limited time and memory. In this paper, we propose a clustering algorithm based on an improved similarity search tree (SSMC-Tree). By introducing a buffer, hitchhike processing, and a local aggregation strategy, the algorithm adapts to data streams of different speeds. To handle noise in the stream, we adopt an outlier processing mechanism that maintains a potential core-micro-cluster buffer and an outlier micro-cluster buffer. Experimental results show that our algorithm can adapt to high-speed data streams with noise.
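As a rough illustration of the two-buffer idea for noise handling, the following is a minimal sketch and not the SSMC-Tree algorithm itself. It assumes a simple count-based promotion rule and a linear scan over micro-clusters instead of a tree index. The names `MicroCluster`, `TwoBufferClusterer`, `radius`, and `promote_at` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality so list.remove drops the exact object
class MicroCluster:
    """Running summary of nearby points: count and per-dimension linear sum."""
    n: int
    linear_sum: list

    @property
    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


@dataclass
class TwoBufferClusterer:
    radius: float      # hypothetical: max center distance for absorbing a point
    promote_at: int    # hypothetical: point count needed to leave the outlier buffer
    potential: list = field(default_factory=list)  # potential core-micro-clusters
    outliers: list = field(default_factory=list)   # outlier micro-clusters

    def _nearest(self, buffer, point):
        # Linear scan; a real implementation would use an index such as a tree.
        best = min(buffer, key=lambda mc: math.dist(mc.center, point), default=None)
        if best is not None and math.dist(best.center, point) <= self.radius:
            return best
        return None

    def insert(self, point):
        # 1. Try the potential core-micro-cluster buffer first.
        mc = self._nearest(self.potential, point)
        if mc is not None:
            mc.absorb(point)
            return
        # 2. Otherwise try the outlier buffer, creating a new entry if needed.
        mc = self._nearest(self.outliers, point)
        if mc is None:
            self.outliers.append(MicroCluster(1, list(point)))
            return
        mc.absorb(point)
        # 3. Promote an outlier micro-cluster once it has gathered enough points.
        if mc.n >= self.promote_at:
            self.outliers.remove(mc)
            self.potential.append(mc)


clusterer = TwoBufferClusterer(radius=1.0, promote_at=3)
for p in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0)]:
    clusterer.insert(p)
```

In this sketch, the three points near the origin end up in one promoted micro-cluster, while the isolated point stays in the outlier buffer, where it could later be discarded as noise. Weight decay over time, which stream-clustering methods commonly apply, is omitted here for brevity.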
Shifei Ding, Jian Zhang, Hongjie Jia, Jun Qian
Yanni Li, Hui Li, Zhi Wang, Bing Liu, Jiangtao Cui, Hang Fei
Liang Gu, Wanli Ma, Yi An, Dacheng Huang