ZHAO Bowen, WANG Lingjiao, GUO Hua
The Naive Bayes (NB) algorithm is simple and efficient when applied to text classification, but its accuracy is limited by its intrinsic assumptions that attributes are independent and equally important. To address this problem, this paper proposes a feature-weighted NB text classification algorithm based on the Poisson distribution. The algorithm combines the Poisson distribution model with the NB algorithm, introducing a Poisson random variable into the weights of feature words. On this basis, the Information Gain Ratio (IGR) is defined to weight the feature words of texts, thereby reducing the impact of the attribute independence assumption made by traditional algorithms. Experimental results on the 20-newsgroups dataset show that, compared with the NB algorithm and its improved variants RW, C-MNB and CFSNB, the proposed algorithm improves the precision, recall and F1 value of text classification, while its execution efficiency is higher than that of the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms.
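To illustrate the general idea of feature-weighted NB described above, the following is a minimal sketch of a multinomial NB classifier in which each term's log-likelihood contribution is scaled by an Information Gain Ratio weight. It does not reproduce the paper's Poisson-based weighting; the IGR computation, the toy corpus, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def igr_weight(docs, labels, term):
    """Gain ratio of the binary feature 'term present' over the class labels."""
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    cond = sum(len(part) / n * entropy(list(Counter(part).values()))
               for part in (with_t, without) if part)
    split = entropy([len(with_t), len(without)])  # intrinsic (split) information
    return (base - cond) / split if split else 0.0

def train(docs, labels):
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    weights = {t: igr_weight(docs, labels, t) for t in vocab}
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    tf = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        tf[y].update(d)
    def loglik(c, t):
        # Laplace-smoothed multinomial term likelihood
        return math.log((tf[c][t] + 1) / (sum(tf[c].values()) + len(vocab)))
    return classes, weights, prior, loglik

def classify(doc, model):
    classes, weights, prior, loglik = model
    # Each term's log-likelihood is multiplied by its IGR weight.
    return max(classes, key=lambda c: prior[c] +
               sum(weights.get(t, 0.0) * loglik(c, t) for t in doc))

# Toy corpus (illustrative only)
docs = [["ball", "goal", "team"], ["goal", "match", "team"],
        ["stock", "market", "price"], ["price", "market", "trade"]]
labels = ["sport", "sport", "finance", "finance"]
model = train(docs, labels)
print(classify(["goal", "team"], model))     # sport
print(classify(["market", "trade"], model))  # finance
```

Terms that discriminate poorly between classes receive IGR weights near zero, so they contribute little to the decision; this is how feature weighting softens the equal-importance assumption of plain NB.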