The rapid growth in daily email volume has heightened concerns about spam, which is intrusive and can put user data at risk, making effective email classification essential. This study proposes a system that uses the DistilBERT model to classify emails as spam or non-spam (ham). We use distributed training with Hugging Face's Accelerate library to reduce training time. Compared with a non-distributed approach, this method reduces training time by 46.39% while maintaining 96% accuracy. For future work, we recommend exploring multi-GPU training for further efficiency gains.
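To put the reported 46.39% figure in more familiar terms, the short sketch below converts a percentage reduction in training time into an equivalent speedup factor. The function names and the 100-minute baseline are hypothetical, chosen only for illustration; the study does not state absolute training times here.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in wall-clock time into a speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    # If the new time is (1 - r) of the old time, the speedup is 1 / (1 - r).
    return 1.0 / (1.0 - reduction_pct / 100.0)


def distributed_time(baseline_time: float, reduction_pct: float) -> float:
    """Training time remaining after applying the percentage reduction."""
    return baseline_time * (1.0 - reduction_pct / 100.0)


# 46.39% is the reduction reported above.
speedup = speedup_from_reduction(46.39)
print(f"speedup: {speedup:.2f}x")

# Hypothetical baseline of 100 minutes, for illustration only.
print(f"distributed time: {distributed_time(100.0, 46.39):.2f} min")
```

Under these assumptions, a 46.39% reduction corresponds to a speedup of roughly 1.87x, that is, a little under half the original training time.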
Vance I. Del Rosario, Benjamin David P. Fernandez, Dionis A. Padilla
Halim Asliyuksek, Özgür Tonkal, Ramazan Kocaoğlu
Lehan Zhang, Xiaoyu Jiang, Qinyuan Zheng, Kai Guo, Dongqi Liu, Shuai Xie