JOURNAL ARTICLE

Improving web spam classifiers using link structure

Abstract

Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.

Keywords:
Spamming Spamdexing Computer science Machine learning Classifier (UML) Spambot Heuristics Search engine Precision and recall Information retrieval Data mining Ranking (information retrieval) Artificial intelligence The Internet World Wide Web Metasearch engine Web search query

Metrics

52
Cited By
15.79
FWCI (Field Weighted Citation Impact)
31
Refs
0.99
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Spam and Phishing Detection
Physical Sciences →  Computer Science →  Information Systems
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
© 2026 ScienceGate Book Chapters — All rights reserved.