The main problems in Web page classification are the lack of labeled data and the cost of labeling unlabeled data. In this paper we discuss the application of co-training, a semi-supervised machine learning method, to the classification of Deep Web query interfaces in order to boost classifier performance. Bayes and Maximum Entropy classifiers cooperate to incorporate unlabeled data alongside labeled data incrementally during training. Our experimental results show that the proposed approach achieves promising performance.
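The abstract does not give the details of the training loop, so the following is only a minimal sketch of how such an incremental scheme might look, not the authors' implementation. It assumes that both classifiers see the same bag of tokens from each query interface, that each classifier adds its single most confident prediction to a shared labeled set per round, and that a small softmax regression stands in for Maximum Entropy. The class names, the `co_train` function, its parameters, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def _softmax(scores):
    # Turn per-class scores into probabilities in a numerically stable way.
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}

class NaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(d)
        return self

    def predict_proba(self, doc):
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            s = math.log(self.prior[c])
            for t in doc:
                if t in self.vocab:  # tokens never seen in training are skipped
                    s += math.log((self.counts[c][t] + 1) / total)
            scores[c] = s
        return _softmax(scores)

class MaxEnt:
    """Softmax regression on binary token features (assumed MaxEnt stand-in)."""
    def fit(self, docs, labels, epochs=200, lr=0.5):
        self.classes = sorted(set(labels))
        self.w = {c: Counter() for c in self.classes}
        for _ in range(epochs):
            for d, y in zip(docs, labels):
                p = self.predict_proba(d)
                for c in self.classes:
                    grad = (1.0 if c == y else 0.0) - p[c]
                    for t in set(d):
                        self.w[c][t] += lr * grad
        return self

    def predict_proba(self, doc):
        return _softmax({c: sum(self.w[c][t] for t in set(doc))
                         for c in self.classes})

def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Each round, both models move their most confident pool items
    into the shared labeled set, using their own predicted labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        docs, ys = zip(*labeled)
        for model in (NaiveBayes().fit(docs, ys), MaxEnt().fit(docs, ys)):
            scored = []
            for doc in pool:
                p = model.predict_proba(doc)
                label = max(p, key=p.get)
                scored.append((p[label], label, doc))
            scored.sort(key=lambda item: item[0], reverse=True)
            for _, label, doc in scored[:per_round]:
                labeled.append((doc, label))
                pool.remove(doc)
    docs, ys = zip(*labeled)
    return NaiveBayes().fit(docs, ys), MaxEnt().fit(docs, ys), labeled

# Hypothetical toy data: field names taken from query interfaces of two domains.
labeled = [(["title", "author", "isbn"], "books"),
           (["from", "to", "depart", "return"], "flights")]
unlabeled = [["author", "publisher", "title"], ["depart", "airline", "from"],
             ["isbn", "keyword"], ["return", "passengers", "to"]]

nb, me, final = co_train(labeled, unlabeled)
nb_probs = nb.predict_proba(["author", "title"])
me_probs = me.predict_proba(["depart", "to"])
print(max(nb_probs, key=nb_probs.get), max(me_probs, key=me_probs.get))
```

In this sketch the two classifiers share one labeled set, so a confident prediction from one becomes training data for the other in the next round. A variant that keeps separate feature views per classifier would follow the original co-training formulation more closely.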