Web text classification is the process of automatically assigning texts to categories in a given classification scheme according to their content. A web text categorization system draws on machine learning, knowledge engineering, and related fields: it retrieves text from the web, applies text preprocessing and Chinese word segmentation, trains a classifier, and then uses a classification algorithm to categorize texts automatically. This paper designs a model of a Chinese web text categorization system and tests the system; the experimental results show that its two main characteristics are efficiency and accuracy.
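The stages named above (text preprocessing, Chinese word segmentation, classifier training, classification) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, since the abstract does not name the algorithms used: forward maximum matching stands in for the segmenter, multinomial Naive Bayes with add-one smoothing stands in for the classifier, and the names `preprocess`, `segment`, `tokenize`, `NaiveBayes`, and the toy dictionary `VOCAB` are hypothetical. Fetching pages from the web is omitted; the input is an HTML string.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preprocess(html):
    """Strip markup and return runs of CJK characters (punctuation splits runs)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[\u4e00-\u9fff]+", "".join(parser.parts))


def segment(text, vocab, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def tokenize(html, vocab):
    """Preprocess an HTML string and segment every CJK run into words."""
    words = []
    for run in preprocess(html):
        words.extend(segment(run, vocab))
    return words


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed stand-in classifier)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocabulary = set()

    def train(self, words, label):
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def predict(self, words):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = (sum(counts.values()) + len(self.vocabulary)) or 1
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy dictionary and two training texts, invented for illustration only.
VOCAB = {"足球", "比赛", "球队", "股票", "市场", "上涨"}
clf = NaiveBayes()
clf.train(tokenize("<p>足球比赛，球队获胜</p>", VOCAB), "sports")
clf.train(tokenize("<p>股票市场上涨</p>", VOCAB), "finance")
print(clf.predict(tokenize("<div>今天的足球比赛</div>", VOCAB)))  # prints: sports
```

A real system would replace the toy dictionary with a full segmentation lexicon and train on a labeled corpus; the sketch only shows how the stages connect.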
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
Qiaowei Jiang, Wen Wang, Han Xu, Shasha Zhang, Xinyan Wang, Cong Wang