A Distributed Training Approach on Email Spam Classification using DistilBERT

Dionis A. Padilla; Benjamin David P. Fernandez; Vance I. Del Rosario

doi:10.1109/icict62343.2024.00028

JOURNAL ARTICLE

Year: 2024 Pages: 139-144

Abstract

The exponential rise of daily emails raises concerns about spam, which can be intrusive and harmful to user data. Effective email classification is crucial to address this issue. This study proposes a system using the DistilBERT model to identify spam and non-spam (ham) emails. We leverage distributed training with Hugging Face's Accelerate library to significantly reduce training time. Compared to a non-distributed approach, this method achieves a 46.39% reduction in training time while maintaining 96% accuracy. We recommend exploring multi-GPU training in future work for further efficiency gains.
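To put the reported figure in perspective, the arithmetic below converts the 46.39% training-time reduction from the abstract into an equivalent speedup factor. This is only an illustrative sketch: the helper name `speedup_from_reduction` is hypothetical, and because the abstract does not state absolute training times or the number of processes used, nothing beyond the ratio is computed.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in training time into a speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    # Fraction of the baseline (non-distributed) time that remains.
    remaining_fraction = 1 - reduction_pct / 100
    return 1 / remaining_fraction


# The only input taken from the abstract; everything else is derived from it.
reported_reduction_pct = 46.39
speedup = speedup_from_reduction(reported_reduction_pct)

print(f"remaining time: {100 - reported_reduction_pct:.2f}% of baseline")
print(f"speedup: {speedup:.2f}x")
```

Under this reading, the distributed run takes about 53.61% of the baseline time, which corresponds to a speedup of roughly 1.87x.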

Keywords:

Computer science; Leverage (statistics); Training set; Training (meteorology); Machine learning; Artificial intelligence; Data mining

Metrics

FWCI (Field Weighted Citation Impact): 1.53

Citation Normalized Percentile: 0.77

Topics

Spam and Phishing Detection

Physical Sciences → Computer Science → Information Systems

Internet Traffic Analysis and Secure E-voting

Physical Sciences → Computer Science → Artificial Intelligence

Network Security and Intrusion Detection

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Email Spam Classification using DistilBERT

A Comparative Evaluation of a Multimodal Approach for Spam Email Classification Using DistilBERT and Structural Features

Email Spam Classification in a Distributed Environment

Comparative study of DistilBERT and ELECTRA-Small Models in Spam Email Classification

Email Spam Classification Using LBSVM