JOURNAL ARTICLE

Unsupervised document classification using sequential information maximization

Noam Slonim, Nir Friedman, Naftali Tishby

Year: 2002 Journal: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02

Abstract

We present a novel sequential clustering algorithm motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential IB (sIB) approach is guaranteed to converge to a local maximum of the information, as required by the original IB principle. Moreover, its time and space complexity are significantly improved and are typically linear in the data size. We apply this algorithm to unsupervised document classification. In our evaluation on small and medium-size corpora, sIB is consistently superior to all the other clustering methods we examine, typically by a significant margin. Moreover, the sIB results are comparable to those obtained by a supervised Naive Bayes classifier. Finally, we propose a simple procedure for trading a cluster's recall for higher precision, and show how this approach can extract clusters that match the existing topics of the corpus almost perfectly.
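The abstract describes sIB only at a high level. The following is a minimal NumPy sketch of a sequential, hard-clustering loop in that spirit, not the authors' implementation. It assumes (1) the cost of placing a document x into cluster t is the agglomerative-IB merge cost (p(x) + p(t)) · JS_Π(p(y|x), p(y|t)), (2) a random initial partition, and (3) a fixed sweep budget. The names `sib` and `merge_cost` and the toy corpus are hypothetical. Each document is drawn out of its cluster and re-merged wherever the least information is lost; because returning to its old cluster is always a candidate, no single reassignment can lower I(T;Y), which is one way to see why such a loop ends at a local maximum.

```python
import numpy as np


def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def mutual_information(joint: np.ndarray) -> float:
    """I(T;Y) in bits for a joint distribution p(t, y) stored as a k x |Y| array."""
    pt = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pt @ py)[mask])))


def merge_cost(pxy_row: np.ndarray, px: float, pty_row: np.ndarray) -> float:
    """Assumed cost: information lost by merging singleton {x} into cluster t,
    (p(x) + p(t)) * JS_pi(p(y|x), p(y|t)), as in agglomerative IB."""
    pt = pty_row.sum()
    if pt == 0.0:
        return 0.0  # joining an empty cluster keeps x as a singleton: no loss
    total = px + pt
    wx, wt = px / total, pt / total
    p, q = pxy_row / px, pty_row / pt
    m = wx * p + wt * q
    return total * (wx * kl(p, m) + wt * kl(q, m))


def sib(counts: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Hypothetical sequential hard clustering of the rows of `counts`
    (documents x words) into k clusters. Every row needs a nonzero count."""
    rng = np.random.default_rng(seed)
    pxy = counts / counts.sum()            # joint p(x, y)
    px = pxy.sum(axis=1)                   # document priors p(x)
    n_docs = pxy.shape[0]
    labels = rng.permutation(n_docs) % k   # random start; no empty cluster if n_docs >= k
    pty = np.vstack([pxy[labels == t].sum(axis=0) for t in range(k)])
    history = [mutual_information(pty)]
    for _ in range(n_iter):
        changed = 0
        for x in rng.permutation(n_docs):
            old = labels[x]
            labels[x] = -1                             # draw x out as a singleton
            pty[old] = pxy[labels == old].sum(axis=0)  # recompute to avoid float drift
            costs = [merge_cost(pxy[x], px[x], pty[t]) for t in range(k)]
            new = int(np.argmin(costs))                # cheapest merge keeps the most I(T;Y)
            labels[x] = new
            pty[new] += pxy[x]
            changed += int(new != old)
        history.append(mutual_information(pty))
        if changed == 0:  # a full sweep without moves: no single reassignment helps
            break
    return labels, history


if __name__ == "__main__":
    # Hypothetical toy corpus: docs 0-2 use words 0-2, docs 3-5 use words 3-5.
    counts = np.array([
        [2, 1, 1, 0, 0, 0],
        [4, 2, 2, 0, 0, 0],
        [6, 3, 3, 0, 0, 0],
        [0, 0, 0, 1, 1, 2],
        [0, 0, 0, 2, 2, 4],
        [0, 0, 0, 3, 3, 6],
    ], dtype=float)
    labels, history = sib(counts, k=2)
    print(labels, [round(h, 3) for h in history])
```

On this toy input the two vocabularies are disjoint, so the sketch should separate the documents by vocabulary, and the recorded I(T;Y) values never decrease from sweep to sweep. Recomputing the emptied cluster from the labels, rather than subtracting, is a design choice that keeps an empty cluster at exactly zero mass.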

Keywords:
Computer science, Cluster analysis, Information bottleneck method, Artificial intelligence, Naive Bayes classifier, Margin (machine learning), Bottleneck, Pattern recognition (psychology), Classifier (UML), Minimum description length, Precision and recall, Machine learning, Data mining, Support vector machine

Metrics

Cited by: 21
FWCI (Field Weighted Citation Impact): 0.62
References: 0
Citation Normalized Percentile: 0.72

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Data Mining Algorithms and Applications
Physical Sciences → Computer Science → Information Systems
