Classification aims to build an abstract model of a set of classes, called a classifier, from a set of labeled data, the training set. However, on large or correlated data sets, association rule mining may yield huge rule sets. Hence, several pruning techniques have been proposed to select a small subset of high-quality rules. Since the availability of a "rich" rule set may improve the accuracy of the classifier, we argue that rule pruning should be reduced to a minimum. A small subset of high-quality rules is considered first. When this set is not able to classify the data, a larger rule set is exploited; this second set includes rules usually discarded by previous approaches. To cope with the need to mine large rule sets and to use them efficiently for classification, a compact form is proposed that represents a complete rule set in a space-efficient way and without information loss. An extensive experimental evaluation on real and synthetic data sets shows that the proposed approach improves classification accuracy with respect to previous approaches.
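The two-level strategy described above can be sketched as follows. This is an illustrative assumption of how the fallback works, not the paper's actual algorithm or its compact rule representation: rules are modeled as simple (body, label, confidence) triples, and a record is covered by a rule when the rule body is a subset of the record's items.

```python
# Hypothetical sketch of two-level associative classification:
# try a small set of high-quality rules first, and fall back to a
# larger rule set (rules usually pruned away) only when needed.
from dataclasses import dataclass

@dataclass
class Rule:
    body: frozenset      # items that must all appear in the record
    label: str           # predicted class
    confidence: float    # rule quality used for ranking

def best_match(rules, record):
    """Return the label of the highest-confidence rule covering the record."""
    covering = [r for r in rules if r.body <= record]
    if not covering:
        return None
    return max(covering, key=lambda r: r.confidence).label

def classify(primary, fallback, record, default="unknown"):
    # Level 1: small subset of high-quality rules.
    label = best_match(primary, record)
    if label is not None:
        return label
    # Level 2: larger rule set, including rules other approaches discard.
    label = best_match(fallback, record)
    return label if label is not None else default

# Toy example with made-up rules.
primary = [Rule(frozenset({"a", "b"}), "pos", 0.95)]
fallback = [Rule(frozenset({"c"}), "neg", 0.60)]

print(classify(primary, fallback, frozenset({"a", "b", "x"})))  # pos
print(classify(primary, fallback, frozenset({"c", "x"})))       # neg
```

The design point is that pruning is deferred rather than avoided: the primary set keeps classification fast on easy records, while the fallback set preserves the "rich" rules needed for records the primary set cannot cover.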