JOURNAL ARTICLE

Features for unsupervised document classification

Abstract

Unsupervised document classification is an important problem in practical text mining because training data is seldom available. In this paper we study the problem of term selection and the performance of various features for unsupervised text classification. The features studied are principal components, independent components, and non-negative components. The clustering algorithm is based on bipartite graph partitioning (Zha et al., 2001). The evaluation is performed on the newsgroups corpus.
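To make the clustering step concrete, here is a minimal sketch of a two-way spectral split of a term-document bipartite graph. It is written in the spirit of bipartite graph partitioning and is not claimed to be the exact procedure of Zha et al. (2001) or of this paper. The function name `bipartite_bipartition`, the toy matrix `A`, and the sign-based threshold are illustrative assumptions.

```python
import numpy as np

def bipartite_bipartition(A):
    """Split terms (rows) and documents (columns) into two co-clusters.

    Assumes A is a nonnegative term-document matrix in which every row
    and every column has a positive sum (no empty terms or documents).
    """
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)  # term degrees in the bipartite graph
    d_cols = A.sum(axis=0)  # document degrees in the bipartite graph

    # Degree-normalized matrix D_rows^{-1/2} A D_cols^{-1/2}.
    An = A / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]

    # Singular values come back in descending order; the first pair is
    # the trivial solution, the second pair carries the partition.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    u2 = U[:, 1] / np.sqrt(d_rows)
    v2 = Vt[1, :] / np.sqrt(d_cols)

    # Illustrative cut: threshold both vectors at zero. The overall sign
    # of a singular pair is arbitrary, so label 0/1 may swap between runs.
    term_labels = (u2 > 0).astype(int)
    doc_labels = (v2 > 0).astype(int)
    return term_labels, doc_labels

# Hypothetical toy data: 6 terms x 4 documents, two topical blocks
# plus a weak background weight of 0.1 that links the blocks.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float) + 0.1

term_labels, doc_labels = bipartite_bipartition(A)
print("term labels:", term_labels)
print("document labels:", doc_labels)
```

Terms and documents that receive the same label form one co-cluster. For more than two clusters, a common variant keeps several singular vectors and runs k-means on the rows, which is likewise outside what the abstract specifies.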

Keywords:
Computer science, Artificial intelligence, Document classification, Bipartite graph, Cluster analysis, Document clustering, Unsupervised learning, Pattern recognition (psychology), Principal component analysis, Graph, Selection (genetic algorithm), Machine learning, Data mining, Theoretical computer science

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 0.37
Refs: 14
Citation Normalized Percentile: 0.65

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Algorithms and Data Compression
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

BOOK-CHAPTER

Novel Unsupervised Features for Czech Multi-label Document Classification

Tomáš Brychcín, Pavel Král

Lecture Notes in Computer Science, Year: 2014, Pages: 70-79

BOOK-CHAPTER

Unsupervised Document Classification and Topic Detection

Jaromír Novotný, Pavel Ircing

Lecture Notes in Computer Science, Year: 2017, Pages: 748-756

BOOK-CHAPTER

The Benefit of Document Embedding in Unsupervised Document Classification

Jaromír Novotný, Pavel Ircing

Lecture Notes in Computer Science, Year: 2018, Pages: 470-478

JOURNAL ARTICLE

Unsupervised document classification using sequential information maximization

Noam Slonim, Nir Friedman, Naftali Tishby

Journal: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Year: 2002