JOURNAL ARTICLE

A Distributed Training Approach on Email Spam Classification using DistilBERT

Abstract

The rapid growth in daily email volume raises concerns about spam, which is intrusive and can put user data at risk. Effective email classification is therefore essential. This study proposes a system that uses the DistilBERT model to classify emails as spam or non-spam (ham). We apply distributed training with Hugging Face's Accelerate library to reduce training time. Compared with a non-distributed baseline, this approach reduces training time by 46.39% while maintaining 96% accuracy. We recommend exploring multi-GPU training in future work for further efficiency gains.
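To make the reported figure concrete, the sketch below shows how a percentage reduction in training time relates to a speedup factor. The helper names and the example timings (1000 s and 536.1 s) are hypothetical and chosen only to reproduce the 46.39% figure. The abstract does not report absolute training times.

```python
def time_reduction_pct(baseline_s: float, distributed_s: float) -> float:
    """Percentage of wall-clock training time saved relative to the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return 100.0 * (baseline_s - distributed_s) / baseline_s


def speedup_from_reduction(reduction_pct: float) -> float:
    """Speedup factor implied by a percentage reduction in wall-clock time."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Hypothetical timings (assumption, not from the article): a 1000 s baseline
# run and a 536.1 s distributed run give a 46.39% reduction.
example_reduction = time_reduction_pct(1000.0, 536.1)

# The reported 46.39% reduction corresponds to roughly a 1.87x speedup.
implied_speedup = speedup_from_reduction(46.39)
print(f"reduction: {example_reduction:.2f}%  speedup: {implied_speedup:.2f}x")
```

Under these assumptions, a 46.39% reduction means the distributed run takes about 53.61% of the baseline time, which is a speedup of a little under 2x.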

Keywords:
Computer science, Leverage (statistics), Training set, Training (meteorology), Machine learning, Artificial intelligence, Data mining

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 1.53
Refs: 20
Citation Normalized Percentile: 0.77

Topics

Spam and Phishing Detection: Physical Sciences → Computer Science → Information Systems
Internet Traffic Analysis and Secure E-voting: Physical Sciences → Computer Science → Artificial Intelligence
Network Security and Intrusion Detection: Physical Sciences → Computer Science → Computer Networks and Communications
