Naive Bayes (NB) classification is one of the most widely used algorithms in data mining and machine learning because of its high efficiency and the structural simplicity that follows from its assumption of conditional independence among attributes. In this paper, we present a dependence metric that quantifies the dependence among attributes and between attributes and the class, and we propose feature-feature significance (FFS) and feature-class significance (FCS) to distinguish highly predictive attributes from less predictive ones in NB classification. We show how to derive feature weights from FFS and FCS and propose a novel dependent feature weighted (DFW) NB classifier. To improve performance further, because feature dependence is not homogeneous across the data, we recommend clustering a random sample of interest and then applying feature weighting within each cluster to alleviate the conditional independence assumption. Accordingly, we propose a cluster-based DFW (CDFW) NB, which weights the DFW filters of random sub-samples by their accuracy and merges them for improved performance. Experimental results show that NB with the DFW filter compares favorably with conventional NB and other feature weighting techniques.
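To make the feature-weighting idea concrete, below is a minimal sketch of a weighted naive Bayes classifier for categorical data. In weighted NB, each class is scored as log P(c) plus a per-feature weighted sum of conditional log-likelihoods, i.e. each feature's log P(x_f | c) term is scaled by its weight. Note the weights here are supplied by the caller; the paper derives them from the FFS and FCS dependence measures, which are not reproduced here, so treat the weight source as an assumption. Function names and the Laplace-smoothing constant `alpha` are illustrative choices, not the authors' exact implementation.

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(X, y, alpha=1.0):
    """Estimate log class priors and per-feature conditional count tables.

    X: list of equal-length tuples of categorical feature values.
    y: list of class labels, parallel to X.
    alpha: Laplace smoothing constant (illustrative default).
    """
    n = len(y)
    classes = sorted(set(y))
    log_priors = {c: math.log(sum(1 for t in y if t == c) / n) for c in classes}
    n_feats = len(X[0])
    # cond[f][c] counts feature-f values observed among class-c rows
    cond = [defaultdict(Counter) for _ in range(n_feats)]
    values = [set() for _ in range(n_feats)]  # domain of each feature
    for row, c in zip(X, y):
        for f, v in enumerate(row):
            cond[f][c][v] += 1
            values[f].add(v)
    return log_priors, cond, values, alpha

def predict_weighted_nb(model, weights, x):
    """Score each class as log P(c) + sum_f weights[f] * log P(x_f | c)."""
    log_priors, cond, values, alpha = model
    best, best_score = None, float("-inf")
    for c, log_prior in log_priors.items():
        score = log_prior
        for f, v in enumerate(x):
            count = cond[f][c][v]
            total = sum(cond[f][c].values())
            # Laplace-smoothed conditional probability of value v given c
            p = (count + alpha) / (total + alpha * len(values[f]))
            score += weights[f] * math.log(p)  # weighted log-likelihood term
        if score > best_score:
            best, best_score = c, score
    return best
```

With uniform weights this reduces to conventional NB; driving a feature's weight toward zero removes its influence on the decision, which is how a dependence-derived weighting scheme can demote redundant or weakly predictive attributes.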