JOURNAL ARTICLE

Ensemble Learning Approach for Clickbait Detection Using Article Headline Features

Dilip Singh Sisodia

Year: 2019
Journal: Informing Science: The International Journal of an Emerging Transdiscipline
Vol: 22
Pages: 031-044
Publisher: Informing Science Institute

Abstract

Aim/Purpose: This paper proposes an ensemble-learning classification model for distinguishing clickbait from genuine article headlines.

Background: Clickbait refers to online articles with deliberately misleading titles designed to lure as many readers as possible into opening the intended web page. Clickbait is used to tempt visitors to click on a particular link, either to monetize the landing page or to spread false news for sensationalism. The presence of clickbait on a news aggregator portal may lead to an unpleasant experience for readers. It is therefore essential to distinguish clickbait from authentic headlines to mitigate its impact on readers' perception.

Methodology: A total of one hundred thousand article headlines, consisting of clickbait and authentic news headlines, were collected from news aggregator sites. The collected samples were divided into five training sets of balanced and unbalanced data. Natural language processing techniques were used to extract 19 manually selected features from the article headlines.

Contribution: Three ensemble learning techniques (bagging, boosting, and random forests) are used to build a classifier that labels a given headline as clickbait or non-clickbait. The performance of the learners is evaluated using accuracy, precision, recall, and F-measure.

Findings: The random forest classifier detects clickbait better than the other classifiers, with an accuracy of 91.16% and an overall precision, recall, and F-measure of 91%.
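The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 for clickbait, 0 for non-clickbait) and treats clickbait as the positive class. The function name `binary_metrics` and the toy labels are hypothetical and do not come from the paper, and the abstract does not say how its "overall" precision, recall, and F-measure were averaged across classes, so this shows only the standard single-class definitions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F-measure for the positive class (1).

    Hypothetical helper for illustration; labels are assumed to be 0 or 1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts, with clickbait (1) as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels (made up for illustration, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On unbalanced training sets such as those the abstract mentions, accuracy alone can look high while recall on the minority class is poor, which is one reason to report precision, recall, and F-measure alongside it.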

Keywords:
Headline; News aggregator; Computer science; Random forest; Ensemble learning; Boosting (machine learning); Classifier (UML); Artificial intelligence; Recall; Machine learning; Precision and recall; Natural language processing; Information retrieval; World Wide Web; Advertising; Linguistics

Metrics

Cited By: 20
FWCI (Field Weighted Citation Impact): 6.06
Refs: 3
Citation Normalized Percentile: 0.96

Topics

Misinformation and Its Impacts
Social Sciences →  Social Sciences →  Sociology and Political Science
Online Learning and Analytics
Physical Sciences →  Computer Science →  Computer Science Applications
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence