In this research, we propose the version of K Nearest Neighbor which considers similarity among attributes for computing the similarity between feature vectors. The text segmentation task is viewed into the binary classification where each pair of sentences or paragraphs is classified into whether we put the boundary or not, and the proposed version resulted in the successful results in previous works concerned with the text categorization and clustering. In this research, we define the similarity measure based on both attributes and values, modify the KNN using it, and apply the modified version into the text segmentation task. We may expect more compact representation of data items and improved performance in the text segmentation task as well as other tasks of text mining. Therefore, the goal of this research is to implement the text segmentation system which provides the benefits.
Arild Brandrud NæssKaren LivescuRohit Prabhavalkar
Lin SunJiuxiao ZhangWeiping DingJiucheng Xu
Tzeviya Sylvia FuchsYedid HoshenYossi Keshet