COMPARATIVE STUDY OF K-MEANS AND K-MEANS++ CLUSTERING ALGORITHMS ON CRIME DOMAIN

Aubaidan

doi:10.60692/ccyje-1d642

JOURNAL ARTICLE

Year: 2014 Journal: Greater South Information System

Abstract

This study presents the results of an experimental comparison of two document clustering techniques, k-means and k-means++, as applied to crime document clustering. The drawback of k-means is that the user must supply the initial centroids. This becomes more critical in document clustering, because each center point is represented by words and calculating the distance between words is not a trivial task. K-means++ was introduced to overcome this problem by finding good initial center points. Since k-means++ has not been applied before in crime document clustering, this study compares k-means and k-means++ to investigate whether the initialization process in k-means++ helps to obtain better results than k-means. We propose the k-means++ clustering algorithm to identify the best seeds for the initial cluster centers when clustering crime documents. The aim of this study is to conduct a comparative study of the two clustering algorithms, k-means and k-means++. The method includes a preprocessing phase, which involves tokenization, stop-word removal and stemming. In addition, we evaluate the impact of two similarity/distance measures (cosine similarity and the Jaccard coefficient) on the results of the two clustering algorithms. Experimental results on several settings of the crime data set showed that, by identifying the best seeds for the initial cluster centers, k-means++ performs significantly better than k-means (at the 95% confidence level). These results demonstrate the accuracy of the k-means++ clustering algorithm in clustering crime documents.
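The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the sample texts, the function names, and the use of the squared cosine or Jaccard distance as the sampling weight in the seeding step are all assumptions made for illustration (k-means++ is usually stated with squared Euclidean distance). Stemming is left out to keep the example free of external libraries.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the paper's list is not given.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "was", "and"}

def preprocess(text):
    """Tokenize and drop stop words (stemming is omitted in this sketch)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def jaccard_distance(a, b):
    """1 - Jaccard coefficient between the term sets of two Counters."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0

def kmeanspp_seeds(docs, k, distance, rng):
    """Choose k seed indices (k <= len(docs)) by distance-squared weighted sampling.

    Assumption: the squared value of `distance` replaces the squared
    Euclidean distance of the standard k-means++ formulation.
    """
    seeds = [rng.randrange(len(docs))]  # first seed: uniform at random
    # Squared distance from every document to its nearest chosen seed.
    d2 = [distance(doc, docs[seeds[0]]) ** 2 for doc in docs]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every document coincides with a seed; fall back to a uniform pick.
            nxt = rng.choice([i for i in range(len(docs)) if i not in seeds])
        else:
            # Documents far from all current seeds are more likely to be picked.
            nxt = rng.choices(range(len(docs)), weights=d2, k=1)[0]
        seeds.append(nxt)
        d2 = [min(old, distance(doc, docs[nxt]) ** 2)
              for old, doc in zip(d2, docs)]
    return seeds

# Made-up example texts, not taken from the paper's data set.
texts = ["Car theft in the city", "Theft of a car",
         "Bank fraud and forgery", "Fraud in the bank"]
docs = [preprocess(t) for t in texts]
seeds = kmeanspp_seeds(docs, 2, jaccard_distance, random.Random(0))
```

The returned indices would then serve as the initial centers for the ordinary k-means iterations, whereas plain k-means would start from centers chosen by the user or uniformly at random. Passing `cosine_distance` instead of `jaccard_distance` switches the similarity measure without changing the seeding logic.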

Keywords:

Cluster analysis; Fuzzy clustering; CURE data clustering algorithm; Correlation clustering; Jaccard index; Single-linkage clustering; k-medians clustering; Complete-linkage clustering; Set (abstract data type); Similarity (geometry)

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.36

Topics

Advanced Clustering Algorithms Research

Physical Sciences → Computer Science → Artificial Intelligence

Digital and Cyber Forensics

Physical Sciences → Computer Science → Information Systems

Authorship Attribution and Profiling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

A comparative study of K-Means, K-Means++ and Fuzzy C-Means clustering algorithms

Comparative Analysis of K-Means and Enhanced K-Means Algorithms for Clustering

Bengali Document Clustering: A Comparative Study of K-Means, K-Means++, Spectral K-Means