Improving web spam classifiers using link structure

Qingqing Gan; Torsten Suel

doi:10.1145/1244408.1244412

JOURNAL ARTICLE

Year: 2007 Pages: 17-20

Abstract

Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.
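The second stage described above (relabeling a node from its preclassified label and its neighborhood) can be illustrated with a small sketch. The abstract does not spell out the heuristics, so the majority-vote rule, the function name `relabel_by_neighborhood`, the `threshold` parameter, and the example host names below are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def relabel_by_neighborhood(labels, edges, threshold=0.75):
    """Hypothetical second-stage heuristic (not taken from the paper).

    labels:    dict mapping node -> bool (True = spam) from a first-stage classifier
    edges:     iterable of (source, target) links between nodes
    threshold: fraction of agreeing neighbors required to override a label
    """
    # Treat links as undirected so both in-links and out-links count as neighbors.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    relabeled = {}
    for node, label in labels.items():
        # Only neighbors that were themselves preclassified can vote.
        voters = [n for n in neighbors.get(node, ()) if n in labels]
        if not voters:
            relabeled[node] = label  # no neighborhood evidence: keep the label
            continue
        spam_fraction = sum(labels[n] for n in voters) / len(voters)
        if spam_fraction > threshold:
            relabeled[node] = True   # neighborhood is overwhelmingly spam
        elif spam_fraction < 1 - threshold:
            relabeled[node] = False  # neighborhood is overwhelmingly non-spam
        else:
            relabeled[node] = label  # mixed evidence: keep the first-stage label
    return relabeled


# Made-up first-stage output: three hosts flagged as spam, two as non-spam.
first_stage = {
    "a.example": True,
    "b.example": True,
    "c.example": True,
    "d.example": False,
    "e.example": False,
}
links = [
    ("a.example", "b.example"),
    ("b.example", "c.example"),
    ("a.example", "c.example"),
    ("d.example", "a.example"),
    ("d.example", "b.example"),
    ("d.example", "c.example"),
]
second_stage = relabel_by_neighborhood(first_stage, links)
```

In this toy graph, `d.example` links only to hosts already flagged as spam, so the sketch flips it to spam, while `e.example` has no labeled neighbors and keeps its first-stage label. All votes are read from the first-stage labels in a single pass, so the result does not depend on the order in which nodes are visited.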

Keywords:

Spamming; Spamdexing; Computer science; Machine learning; Classifier (UML); Spambot; Heuristics; Search engine; Precision and recall; Information retrieval; Data mining; Ranking (information retrieval); Artificial intelligence; The Internet; World Wide Web; Metasearch engine; Web search query

Metrics

FWCI (Field Weighted Citation Impact): 15.79

Citation Normalized Percentile: 0.99

Topics

Spam and Phishing Detection

Physical Sciences → Computer Science → Information Systems

Text and Document Classification Technologies

Physical Sciences → Computer Science → Artificial Intelligence

Web Data Mining and Analysis

Physical Sciences → Computer Science → Information Systems

Related Documents

Evaluating Arabic spam classifiers using link analysis

SPAM EMAIL DETECTION USING DIFFERENT CLASSIFIERS

URL Spam Detection Using Machine Learning Classifiers

Spam message detection using machine learning classifiers

Spam Detection on Twitter Using Traditional Classifiers