JOURNAL ARTICLE

Web Text Categorization for Large-scale Corpus

Abstract

A corpus is a set of language materials stored on computers that can be searched, queried, and analyzed by computer to support enterprise decision-makers. Automated text categorization has been studied extensively, and various techniques for document categorization have been proposed. However, Chinese corpora are currently scarce, and Chinese corpora for text categorization are especially rare. In addition, most experimental prototypes built to evaluate different techniques have been restricted to the heterogeneous, autonomic, dynamic, and distributed internet environment. This paper proposes and implements an incremental learning algorithm for Chinese text categorization on a large-scale corpus. An approach based on Support Vector Machines (SVMs) is developed for web text mining in large-scale systems on GBODSS to support enterprise decision making. Experimental results show that the approach achieves good classification accuracy through incremental learning and that the speedup in computation time is almost superlinear.
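The abstract does not spell out the incremental learning algorithm, so the following is only a minimal sketch of the general idea, not the authors' method: a linear SVM trained by stochastic subgradient steps on the hinge loss, updated batch by batch without revisiting earlier documents. The class name `IncrementalLinearSVM`, the hyperparameters `lam` and `eta`, the bias-free model, and the toy English documents are all assumptions made for illustration. Chinese text would first need a word segmenter in place of the whitespace split used here.

```python
import random


def featurize(text):
    """Bag-of-words counts over lowercased, whitespace-separated tokens."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


class IncrementalLinearSVM:
    """Binary linear SVM (no bias term) trained by hinge-loss subgradient steps.

    Hypothetical illustration class; labels are +1 or -1.
    """

    def __init__(self, lam=0.01, eta=0.1, seed=0):
        self.lam = lam                  # L2 regularization strength (assumed)
        self.eta = eta                  # constant learning rate (assumed)
        self.w = {}                     # sparse weights, keyed by token
        self.rng = random.Random(seed)  # fixed seed for repeatable shuffles

    def score(self, text):
        vec = featurize(text)
        return sum(self.w.get(tok, 0.0) * v for tok, v in vec.items())

    def predict(self, text):
        return 1 if self.score(text) >= 0 else -1

    def partial_fit(self, batch, epochs=20):
        """Update the model on a new batch of (text, label) pairs only."""
        data = list(batch)
        for _ in range(epochs):
            self.rng.shuffle(data)
            for text, y in data:
                # Check the margin before touching the weights.
                violated = y * self.score(text) < 1.0
                # Regularization step: shrink every known weight.
                shrink = 1.0 - self.eta * self.lam
                for tok in self.w:
                    self.w[tok] *= shrink
                # Hinge-loss step: move toward the example if margin < 1.
                if violated:
                    for tok, v in featurize(text).items():
                        self.w[tok] = self.w.get(tok, 0.0) + self.eta * y * v


# Toy data: +1 = sports, -1 = finance (made-up documents).
batch1 = [
    ("team wins match goal", 1),
    ("stock market shares fall", -1),
    ("player scores goal", 1),
    ("bank raises interest rates", -1),
]
batch2 = [
    ("coach praises team", 1),
    ("investors buy shares", -1),
]

model = IncrementalLinearSVM()
model.partial_fit(batch1)
score_before = model.score("coach praises")  # tokens not seen yet
model.partial_fit(batch2)                    # incremental update, batch1 not reused
score_after = model.score("coach praises")
```

The second call to `partial_fit` extends the vocabulary and adjusts the weights using only the new batch, which is the property that makes this style of training attractive for corpora too large to retrain on from scratch. A production system would more likely rely on an established library implementation than on this hand-written loop.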

Keywords:
Computer science; Categorization; Text categorization; Artificial intelligence; The Internet; Support vector machine; Field (mathematics); Natural language processing; Set (abstract data type); Text corpus; Scale (ratio); Machine learning; Computation; Information retrieval; Data mining; World Wide Web

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 2.00
Refs: 6
Citation Normalized Percentile: 0.89

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Semi-supervised learning in large scale text categorization

Zewen Xu, Jianqiang Li, Bo Liu, Jing Bi, Rong Li, Rui Mao

Journal: Journal of Shanghai Jiaotong University (Science) Year: 2017 Vol: 22 (3) Pages: 291-302

JOURNAL ARTICLE

Exploiting semantic resources for large scale text categorization

Jian Qiang Li, Yu Zhao, Bo Liu

Journal: Journal of Intelligent Information Systems Year: 2012 Vol: 39 (3) Pages: 763-788

JOURNAL ARTICLE

Large-Scale Bayesian Logistic Regression for Text Categorization

Alexander Genkin, David Lewis, David Madigan

Journal: Technometrics Year: 2007 Vol: 49 (3) Pages: 291-304