Wuzhou Dong, Jingyan Cui, Haitao He, Jiadong Ren
Clustering algorithms based on grids and density have many attractive properties. For high-dimensional data streams, however, the number of grid cells grows sharply as the dimensionality of the space increases. To address this problem, we propose GDH-Stream, a clustering method for high-dimensional data streams based on effective dimensions and grid density, consisting of an online component and an offline component. We first define the effective value of a dimension and give a method for partitioning each dimension into projection intervals. The effective values of the dimensions are then calculated and ranked in decreasing order, and a subset of the dimensions is chosen to generate a subspace. In the online component, GDH-Stream maps each arriving data point to the original grid structure. In the offline component, when a clustering request arrives, the subspace is generated from the effective values of the dimensions, and the original grid structure is projected onto it to form a new grid structure. Clustering is then performed on the new grid structure according to the connectivity of the dense grids. Experimental results show that GDH-Stream achieves better clustering quality and efficiency, and that it scales well for clustering data streams.
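The online/offline split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the definition of the effective value or the interval partition, so the sketch makes the following assumptions: cells have a uniform width (`width`), the chosen subspace is passed in directly as `dims` instead of being derived from effective values, a cell is "dense" when its count reaches a hypothetical threshold `min_count`, and two cells are connected when their coordinates differ by at most one on every axis. All function and parameter names are hypothetical.

```python
from collections import Counter, deque
from itertools import product


def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell (uniform width assumed)."""
    return tuple(int(x // width) for x in point)


def update_grid(grid, point, width):
    """Online step: count the arriving point in its cell of the original grid."""
    grid[cell_of(point, width)] += 1


def project_grid(grid, dims):
    """Offline step: merge cell counts onto the chosen subset of dimensions."""
    projected = Counter()
    for cell, count in grid.items():
        projected[tuple(cell[d] for d in dims)] += count
    return projected


def cluster_dense_cells(grid, min_count):
    """Group dense cells into clusters by adjacency (assumed: diagonal neighbours count)."""
    dense = {cell for cell, count in grid.items() if count >= min_count}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over connected dense cells
            cell = queue.popleft()
            members.append(cell)
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return clusters


# Toy stream: the third coordinate is noise, so the subspace keeps dimensions 0 and 1.
stream = [(0.1, 0.2, 5.0), (0.3, 0.1, 9.0), (1.2, 0.4, 2.0),
          (5.5, 5.1, 7.0), (5.7, 5.3, 1.0)]
grid = Counter()
for point in stream:
    update_grid(grid, point, width=1.0)

subspace_grid = project_grid(grid, dims=(0, 1))
clusters = cluster_dense_cells(subspace_grid, min_count=1)
```

In this toy run the five occupied cells of the three-dimensional grid collapse to three cells in the two-dimensional subspace, which shows why projecting before clustering keeps the number of cells manageable. The neighbour enumeration itself is exponential in the subspace dimensionality, so the sketch is only practical when `dims` is small.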
Hou Guibin, Rui-Xia Yao, Jiadong Ren, Changzhen Hu
Irene Ntoutsi, Arthur Zimek, Themis Palpanas, Peer Kröger, Hans‐Peter Kriegel
Jiadong Ren, Binlei Cai, Changzhen Hu