This paper proposes a soft computing approach to the document clustering problem. Document clustering is a specialized clustering problem in which textual documents are autonomously segregated into a number of identifiable, subject-homogeneous, smaller sub-collections (also called clusters). Identifying implicit textual patterns within the documents is challenging because there can be thousands of such textual features. Partitional clustering algorithms such as k-means are commonly used for this problem. However, k-means has several drawbacks, including (i) dependency on the initial seeds and (ii) a tendency to become trapped in locally optimal solutions. Nevertheless, every k-means solution may contain some good partial arrangements for clustering. Meta-heuristic algorithms such as Black Hole (BH) use a trade-off between randomization and local search to find optimal and near-optimal solutions. Our motivation is the observation that meta-heuristic optimization can quickly produce a globally optimal or near-optimal solution starting from random k-means initial solutions. The contributions of this research are (i) an implementation of the Black Hole algorithm with k-means embedding and (ii) the use of global and local search optimization as parameter adjustments. A series of experiments was performed with the proposed method on standard text mining datasets, namely (i) NEWS20, (ii) Reuters and (iii) WebKB, and the results were evaluated using Purity and the Silhouette Index. The proposed method outperforms basic k-means and a genetic algorithm (GA) with k-means embedding, and it quickly converges to a globally or near-globally optimal solution.
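The abstract names Black Hole search with k-means embedding but gives no algorithmic details, so the following is only a minimal sketch of one plausible reading. It assumes the commonly cited Black Hole rules (each star moves toward the best star, the "black hole", by x_i ← x_i + rand·(x_BH − x_i), and any star closer than the event horizon R = f_BH / Σ f_i is replaced by a new random star). It also assumes that a star is a set of k centroids seeded by a few k-means iterations, that fitness is the within-cluster sum of squared errors (SSE), and that "k-means embedding" means one Lloyd step after every move. The function and parameter names (`black_hole_kmeans`, `random_kmeans_star`, `n_stars`, `n_iter`) are hypothetical and not taken from the paper. A `purity` helper is included because Purity is one of the evaluation measures mentioned above.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def sse(points, centroids):
    """Fitness to minimise: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd_step(points, centroids):
    """One k-means (Lloyd) iteration; an empty cluster keeps its old centroid."""
    labels = assign(points, centroids)
    updated = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            updated.append(tuple(old))
    return updated


def random_kmeans_star(points, k, rng, iters=3):
    """A star is a set of k centroids: random seeds refined by a few Lloyd steps."""
    star = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        star = lloyd_step(points, star)
    return star


def flatten(star):
    """Concatenate all centroid coordinates into one vector."""
    return [x for centroid in star for x in centroid]


def black_hole_kmeans(points, k, n_stars=10, n_iter=30, seed=0):
    """Black Hole search over k-means solutions; returns (centroids, sse, history)."""
    rng = random.Random(seed)
    stars = [random_kmeans_star(points, k, rng) for _ in range(n_stars)]
    fit = [sse(points, s) for s in stars]
    bh = min(range(n_stars), key=fit.__getitem__)  # best star becomes the black hole
    history = [fit[bh]]
    for _ in range(n_iter):
        for i in range(n_stars):
            if i == bh:
                continue
            r = rng.random()
            # Move every centroid coordinate toward the black hole: x += r * (x_BH - x).
            moved = [tuple(x + r * (b - x) for x, b in zip(c, cb))
                     for c, cb in zip(stars[i], stars[bh])]
            # Assumed "k-means embedding": one Lloyd step as local search.
            stars[i] = lloyd_step(points, moved)
            fit[i] = sse(points, stars[i])
            if fit[i] < fit[bh]:
                bh = i  # a better star takes the black hole's place
        total = sum(fit)
        radius = fit[bh] / total if total > 0 else 0.0  # event horizon R = f_BH / sum f_i
        for i in range(n_stars):
            if i != bh and math.dist(flatten(stars[i]), flatten(stars[bh])) < radius:
                # A swallowed star is replaced by a fresh random k-means solution.
                stars[i] = random_kmeans_star(points, k, rng)
                fit[i] = sse(points, stars[i])
        history.append(fit[bh])
    return stars[bh], fit[bh], history


def purity(labels, truth):
    """Purity = (1/N) * sum over clusters of the size of its majority class."""
    counts = {}
    for lab, t in zip(labels, truth):
        counts.setdefault(lab, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in counts.values()) / len(labels)


if __name__ == "__main__":
    # Synthetic 2-D blobs stand in for document vectors in this illustration.
    rng = random.Random(42)
    centres = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    points, truth = [], []
    for t, (cx, cy) in enumerate(centres):
        for _ in range(30):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            truth.append(t)
    cents, best, hist = black_hole_kmeans(points, k=3)
    print("SSE:", round(best, 2), "purity:", purity(assign(points, cents), truth))
```

Because the black hole is only ever replaced by a strictly better star, the best SSE in `history` never increases. The coordinate-wise move ignores label switching (centroid j of one star need not correspond to centroid j of another), so a fuller implementation might align centroids before moving them. For real document clustering, the documents would first have to be converted into numeric vectors before being passed in as `points`.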