Data clustering is a fundamental machine learning task found in many real-world applications. However, real data usually contain noise or outliers, and handling outliers within a clustering algorithm can improve clustering accuracy. In this paper, we propose a variant of the k-means algorithm that performs data clustering and outlier detection simultaneously. In the proposed algorithm, outlier detection is integrated into the clustering process and is achieved via an additional term in the objective function of the k-means algorithm. The proposed algorithm generates two partition matrices: one provides the cluster groups and the other can be used to detect outliers. We use both synthetic and real data to demonstrate the effectiveness and efficiency of the proposed algorithm and show that its clustering performance is better than that of other similar methods.
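To make the idea of integrating outlier detection into the k-means iteration concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper, whose exact objective term is not given here. It assumes a simple, hypothetical rule: a point is flagged as an outlier when its squared distance to the nearest centre exceeds `gamma` times the mean squared distance of the current inliers. The names `kmeans_with_outliers`, `gamma`, `max_outliers`, and `init_centres` are illustrative assumptions, and the two partition matrices are collapsed into a single label vector in which `-1` marks an outlier.

```python
import numpy as np


def kmeans_with_outliers(X, k, gamma=3.0, max_outliers=None,
                         init_centres=None, n_iter=50, seed=0):
    """Illustrative k-means with an extra outlier group (label -1).

    Assumed rule (not taken from the paper): a point becomes an outlier when
    its squared distance to the nearest centre exceeds gamma times the mean
    squared distance of the points currently treated as inliers.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if max_outliers is None:
        max_outliers = n // 10  # assumed default cap on the outlier count
    if init_centres is None:
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centres = np.asarray(init_centres, dtype=float).copy()

    outlier = np.zeros(n, dtype=bool)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every centre, shape (n, k).
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        dmin = d2[np.arange(n), nearest]

        # Flag points that are far away relative to the current inliers,
        # keeping at most max_outliers of them (the most distant first).
        threshold = gamma * dmin[~outlier].mean()
        idx = np.flatnonzero(dmin > threshold)
        idx = idx[np.argsort(dmin[idx])[::-1][:max_outliers]]
        new_outlier = np.zeros(n, dtype=bool)
        new_outlier[idx] = True

        # Update each centre from its inlier members only.
        new_centres = centres.copy()
        for j in range(k):
            members = (nearest == j) & ~new_outlier
            if members.any():
                new_centres[j] = X[members].mean(axis=0)

        labels = np.where(new_outlier, -1, nearest)
        converged = (np.allclose(new_centres, centres)
                     and np.array_equal(new_outlier, outlier))
        centres, outlier = new_centres, new_outlier
        if converged:
            break
    return labels, centres
```

Because flagged points are excluded from the centre update, a single distant point no longer drags a centre away from its cluster, which is the intuition behind handling outliers inside the clustering loop rather than as a separate preprocessing step.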
Dajiang Lei, Qingsheng Zhu, Jun Chen, Hai Xiang Lin, Peng Yang
Gunasekar Thangarasu, Kesava Rao Alla, K Nattar Kannan
Wenfen Liu, Nan Wang, Yuehua Huang