JOURNAL ARTICLE

Leveraging Large Language Model on Spam Email Detection and Classification

Charles Okechukwu Ugwunna, Ibidun Christiana Obagbuwa, Mayowa Samuel Obadina, Ifeanyi Godspower Akawuku, Kingsley Ukaoha

Year: 2025 | Journal: Applied Computational Intelligence and Soft Computing | Vol: 2025 (1) | Publisher: Hindawi Publishing Corporation

Abstract

Spam emails remain a serious cybersecurity problem, frequently resulting in financial theft, data breaches, and a general erosion of user trust in online communication platforms. Although conventional spam detection algorithms have been effective in the past, they often fail to identify the more complex and context‐aware strategies employed by contemporary spammers. In this work, we assessed how well both sophisticated large language models (LLMs) and traditional machine learning methods performed in classifying spam messages. The Enron Spam dataset was used to train four baseline models: multinomial Naïve Bayes (MNB), K‐nearest neighbors (KNN), support vector machine (SVM), and multilayer perceptron (MLP). With an accuracy of 98.45%, the MLP was the strongest of the conventional models. We also investigated the potential of a proposed GPT‐4o approach and a fine‐tuned Bidirectional Encoder Representations from Transformers (BERT) model for spam identification. With an accuracy and F1‐score of 99.45%, the BERT model outperformed all other models, while the GPT‐4o model also produced impressive results with an accuracy of 98.77%. To ensure our results were not dataset specific, we validated all models on additional datasets, SMS Spam, SpamAssassin, and Ling‐Spam, which made it possible to assess how consistently they generalized across different formats and writing styles. We also examined the impact of activation functions in the BERT model: the GELU activation function provided a minor performance advantage over ReLU, supporting its continued use in transformer‐based systems. Lastly, to strengthen the statistical validity of our findings, we applied 5‐fold cross‐validation. Overall, this research demonstrates how effectively LLMs, especially fine‐tuned BERT, handle intricate linguistic structures and enhance spam detection mechanisms, while also showing that the MLP may remain a sensible option when computing resources are limited.
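To make the baseline setup concrete, the sketch below shows one plausible way to reproduce the conventional pipeline with scikit-learn: TF-IDF features feeding MNB, KNN, SVM, and MLP classifiers, each scored with 5-fold cross-validation as described in the abstract. The feature settings, hyperparameters, and the tiny inline corpus are illustrative assumptions, not the authors' published configuration; in the study the data would be the Enron Spam dataset.

```python
# Hedged sketch of the baseline comparison: TF-IDF features + four classic
# classifiers, each evaluated with 5-fold cross-validation. Hyperparameters
# are illustrative guesses, not the paper's reported configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Toy placeholder corpus standing in for the Enron Spam dataset.
emails = [
    "Congratulations! You won a free prize, click here now",
    "Cheap meds online, limited offer, act fast",
    "Claim your lottery winnings today!!!",
    "Urgent: verify your account or it will be suspended",
    "Exclusive deal just for you, buy now and save big",
    "Meeting moved to 3pm, see attached agenda",
    "Quarterly report draft is ready for your review",
    "Lunch tomorrow? Let me know what works",
    "Can you send the slides from yesterday's call?",
    "Reminder: project review is scheduled for Friday",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = spam, 0 = ham

models = {
    "MNB": MultinomialNB(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": LinearSVC(),  # linear kernel is an assumption; the paper says only "SVM"
    "MLP": MLPClassifier(hidden_layer_sizes=(128,), max_iter=500),
}

for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    # 5-fold CV mirrors the validation protocol mentioned in the abstract.
    scores = cross_val_score(pipe, emails, labels, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.4f}")
```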
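The GPT‐4o result suggests a zero-shot or prompted classification setup. The abstract does not reveal the authors' prompting strategy, so the sketch below is only one plausible arrangement using the OpenAI Python client; the system prompt wording and the temperature setting are assumptions.

```python
# Hedged sketch of prompted spam classification with GPT-4o via the OpenAI
# API. The prompt design is an assumption; the paper's actual prompting
# strategy is not described in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_email(body: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a spam filter. Reply with exactly one word: "
                        "'spam' or 'ham'."},
            {"role": "user", "content": body},
        ],
        temperature=0,  # deterministic labels keep the evaluation reproducible
    )
    return response.choices[0].message.content.strip().lower()

print(classify_email("Congratulations, you won a free cruise! Click here."))
```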
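The GELU-versus-ReLU comparison can be expressed through Hugging Face's BertConfig, whose hidden_act field selects the activation used in BERT's feed-forward sublayers. The sketch below shows how one might instantiate the two variants for binary spam classification; the "bert-base-uncased" checkpoint and the omission of the fine-tuning loop are assumptions, since the abstract does not give the exact training recipe.

```python
# Hedged sketch: swapping the feed-forward activation in BERT with the
# Hugging Face transformers library. "bert-base-uncased" is an assumed
# checkpoint; the paper's fine-tuning recipe is not reproduced here.
import torch
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

def build_spam_bert(activation: str) -> BertForSequenceClassification:
    # hidden_act controls the activation inside BERT's feed-forward layers;
    # the abstract reports GELU slightly outperforming ReLU.
    config = BertConfig.from_pretrained(
        "bert-base-uncased",
        num_labels=2,           # spam vs. ham
        hidden_act=activation,  # "gelu" or "relu"
    )
    return BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", config=config
    )

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
for act in ("gelu", "relu"):
    model = build_spam_bert(act)
    batch = tokenizer(
        ["Claim your free prize now!!!"],
        return_tensors="pt", truncation=True, padding=True,
    )
    with torch.no_grad():
        logits = model(**batch).logits
    print(act, logits.softmax(dim=-1))  # untrained head: outputs are illustrative
```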

