Web spam has been recognized as one of the top challenges in the search engine industry [14]. Much recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, whenever an anti-spam technique is developed, spammers design new techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can adapt quickly to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of the spam in our data. We then design several heuristics that decide whether a node should be relabeled, based on the preclassified result and knowledge about its neighborhood. Our experimental results show visible improvements in both precision and recall.
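To make the second stage concrete, the following is a minimal sketch of one possible neighborhood-based relabeling rule. It is an illustration under stated assumptions and not the heuristics evaluated in this work: the function name `relabel_by_neighborhood`, the two thresholds, the majority-style rule, and the toy graph are all hypothetical. It assumes the first-stage classifier has already produced a spam/non-spam label for every node.

```python
from typing import Dict, List


def relabel_by_neighborhood(
    labels: Dict[str, bool],
    neighbors: Dict[str, List[str]],
    spam_threshold: float = 0.75,  # hypothetical cutoff, not from the paper
    ham_threshold: float = 0.25,   # hypothetical cutoff, not from the paper
) -> Dict[str, bool]:
    """Revisit each first-stage label using the first-stage labels of neighbors.

    labels maps node -> True (spam) / False (non-spam) from stage one.
    neighbors maps node -> list of linked nodes.
    """
    relabeled: Dict[str, bool] = {}
    for node, is_spam in labels.items():
        # Only neighbors that stage one actually labeled can vote.
        known = [labels[n] for n in neighbors.get(node, []) if n in labels]
        if not known:
            relabeled[node] = is_spam  # no evidence: keep the original label
            continue
        spam_fraction = sum(known) / len(known)
        if spam_fraction >= spam_threshold:
            relabeled[node] = True     # neighborhood is overwhelmingly spam
        elif spam_fraction <= ham_threshold:
            relabeled[node] = False    # neighborhood is overwhelmingly non-spam
        else:
            relabeled[node] = is_spam  # mixed neighborhood: keep stage-one label
    return relabeled


# Toy first-stage output and link graph (made-up data for illustration).
first_stage = {"a": True, "b": True, "c": False, "d": False, "e": True}
graph = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "b", "e"],
    "d": ["c"],
    "e": ["c", "d"],
}
second_stage = relabel_by_neighborhood(first_stage, graph)
```

In this toy run, node `c` is flipped to spam because all of its labeled neighbors are spam, and node `e` is flipped to non-spam because none of its neighbors are. The sketch always reads the stage-one labels rather than labels updated during the pass, so the result does not depend on the order in which nodes are visited.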