JOURNAL ARTICLE

Clickbait Headline Detection in Indonesian News Sites using Robustly Optimized BERT Pre-training Approach (RoBERTa)

Abstract

The abuse of clickbait headlines by online news media keeps increasing, degrading the reader's experience and reducing engagement with online news. Since the advances in self-attention models, BERT and its variants have been considered the state-of-the-art method in many natural language processing (NLP) tasks. In this experiment, we focus on one BERT variant, RoBERTa, which is an improved model over BERT. The goal of this experiment is to compare how well the currently available RoBERTa models perform in detecting clickbait in Indonesian news headlines. Using a total of 6,632 annotated news headlines sampled from the CLICK-ID dataset, we experimented with some of the Hugging Face community's top-voted RoBERTa and BERT Indonesian language models and compared their performance in classifying Indonesian clickbait headlines. We evaluated the accuracy, precision, recall, and F1 score of each model and found cahya/XLM-RoBERTa-large and indobenchmark/IndoBERT-p1 to be the top-performing models in this experiment, with 92% accuracy. In terms of resources and performance, we recommend indobenchmark/IndoBERT-p1 as the more suitable model; however, XLM-RoBERTa-large has the merit of producing more consistent output across the validation and unseen sets. In this paper, we present each model configuration along with the exploratory data analysis, the data preprocessing method, and the training performance of each configuration.
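The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = clickbait, 0 = non-clickbait) and reports precision, recall, and F1 for the clickbait class; the function name `binary_classification_metrics` and the toy labels are hypothetical and are not taken from the paper or from the CLICK-ID dataset.

```python
from typing import Dict, Sequence


def binary_classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # clickbait, predicted clickbait
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-clickbait, predicted clickbait
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # clickbait, predicted non-clickbait
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-clickbait, predicted non-clickbait

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (hypothetical, not real headlines).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

In practice, the predicted labels would come from a fine-tuned classifier; the same function could then be applied separately to the validation set and the unseen set to compare consistency between them.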

Keywords:
Headline; Computer science; Language model; Artificial intelligence; Preprocessor; Set (abstract data type); Natural language processing; Training set; Machine learning; Linguistics

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 3.80
Refs: 31
Citation Normalized Percentile: 0.93

Topics

Educational Methods and Media Use
Physical Sciences →  Computer Science →  Information Systems
Data Mining and Machine Learning Applications
Physical Sciences →  Computer Science →  Information Systems
Educational Technology Systems
Physical Sciences →  Computer Science →  Artificial Intelligence