JOURNAL ARTICLE

Text Categorization via Attribute Distance Weighted k-Nearest Neighbor Classification

Abstract

Text categorization entails deciding whether a document belongs to one of a set of pre-specified classes of documents. This can be done in a supervised manner, as in classification tasks, or in an unsupervised manner, as in clustering-related tasks. Categorization can be challenging, especially when the number of discriminating words is large. k-Nearest Neighbor (KNN) is an instance-based learning algorithm that has proven effective in such classification tasks, including document classification. The key element of this algorithm is its similarity measure, which can identify the neighbors of a given document with high accuracy. The main drawback of this approach is that it uses all features to compute the distance between the documents in question. This is not only time-consuming but also wastes computing resources without substantially improving the overall results. In our approach, Attribute Distance Weighted KNN, we do not use all features in the corpus; instead, we first extract the most relevant ones by weighting them in relation to the corpus. We then compute the distance using only the highly ranked features, treating them as representative of the entire document set. To our knowledge, no existing literature has taken this approach, so we compare our method against classical KNN. Our approach showed only marginal performance differences in the distance measure compared with classical KNN.
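The abstract does not specify the weighting scheme or the distance function, so the following Python sketch only illustrates the idea under stated assumptions: documents are lists of word tokens, features are TF-IDF weighted, similarity is cosine, and a term's relevance to the corpus is taken (hypothetically) as its total TF-IDF weight across the training documents. `KNNText`, `top_features`, and the parameter `m` (number of retained features) are illustrative names, not the authors' code. Setting `m=None` gives classical KNN over all features; an integer `m` restricts both training and query vectors to the top-ranked features.

```python
import math
from collections import Counter

# Sketch only: TF-IDF weights, cosine similarity, and the corpus-level relevance
# score used in top_features() are assumptions, not the authors' specification.


def fit_idf(train_docs):
    """Inverse document frequency log(N / df) for every term in the training corpus."""
    n = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(doc, idf, vocab=None):
    """Sparse TF-IDF vector; terms unseen in training (or outside vocab) are dropped."""
    tf = Counter(doc)
    return {t: c * idf[t] for t, c in tf.items()
            if t in idf and (vocab is None or t in vocab)}


def top_features(vectors, m):
    """Hypothetical relevance score: a term's total TF-IDF weight over the corpus."""
    score = Counter()
    for v in vectors:
        score.update(v)  # adds each term's weight to its running total
    return {term for term, _ in score.most_common(m)}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class KNNText:
    """k-NN text classifier. m=None (or 0) uses every feature (classical KNN);
    a positive integer m keeps only the m highest-ranked corpus features."""

    def __init__(self, k=3, m=None):
        self.k, self.m = k, m

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        full = [vectorize(d, self.idf) for d in docs]
        self.vocab = top_features(full, self.m) if self.m else None
        self.train = [vectorize(d, self.idf, self.vocab) for d in docs]
        self.labels = list(labels)
        return self

    def predict(self, doc):
        query = vectorize(doc, self.idf, self.vocab)
        # Rank training documents by similarity; ties keep training order.
        ranked = sorted(zip(self.train, self.labels),
                        key=lambda pair: cosine(query, pair[0]), reverse=True)
        # Majority vote among the k most similar documents.
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        "stock stock price market".split(),
        "stock price price shares".split(),
        "goal goal team match".split(),
        "team team goal league".split(),
    ]
    labels = ["finance", "finance", "sport", "sport"]
    classical = KNNText(k=3).fit(train, labels)
    reduced = KNNText(k=3, m=4).fit(train, labels)
    query = "stock price rally".split()
    print(classical.predict(query), reduced.predict(query))  # finance finance
    print(sorted(reduced.vocab))  # ['goal', 'price', 'stock', 'team']
```

Restricting every vector to `m` features means each similarity computation touches fewer terms; whether accuracy holds up then depends on how well the chosen relevance score picks out the discriminating words.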

Keywords:
Categorization, k-nearest neighbors algorithm, Text categorization, Computer science, Pattern recognition, Artificial intelligence, Nearest neighbor search, Data mining

Related Documents

BOOK-CHAPTER

Text Categorization with K-Nearest Neighbor Approach

Suneetha Manne, Sita Kumari Kotha, S. Sameen Fatima

Advances in Intelligent and Soft Computing, Year: 2011, Pages: 413-420

JOURNAL ARTICLE

Binary k‐nearest neighbor for text categorization

Songbo Tan

Journal: Online Information Review, Year: 2005, Vol: 29 (4), Pages: 391-399

BOOK-CHAPTER

C × K-Nearest Neighbor Classification with Ordered Weighted Averaging Distance

Gözde Ulutagay, Efendi Nasıbov

Studies in Computational Intelligence, Year: 2016, Pages: 105-122