Phishing is a social engineering technique commonly used to deceive users into revealing sensitive information such as usernames, passwords, or credit card details. Although machine learning-based phishing detection has been researched extensively, some prior works proposed a large number of features, not all of which are feasible to extract for real-time detection. This work combined two datasets, with 30 and 48 features respectively, and identified 18 features common to both. Feature selection was then conducted to identify 13 optimal features for a more robust model. Compared with prior research on the same datasets, the best models built on all features using the random forest algorithm scored lower on the 30-feature dataset and achieved better performance on the 48-feature dataset. The best model on the 13 features achieved an accuracy of 0.937.
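
The dataset-combination step can be illustrated with a minimal sketch. It assumes both datasets already use the same names for equivalent features; the column names, sample rows, and the helper functions `common_features` and `combine` are hypothetical and are not taken from the work itself. In practice the two datasets may name or encode the same feature differently, so a manual mapping would likely be needed first.

```python
# Minimal sketch: find the features shared by two datasets and stack their rows.
# All column names and values below are hypothetical placeholders.

def common_features(cols_a, cols_b, label="label"):
    """Return the features present in both column lists, in cols_a order."""
    shared = set(cols_a) & set(cols_b)
    return [c for c in cols_a if c in shared and c != label]

def combine(rows_a, rows_b, features, label="label"):
    """Stack rows from both datasets, keeping only shared features and the label."""
    keep = features + [label]
    return [{k: row[k] for k in keep} for row in rows_a + rows_b]

# Toy stand-ins for the 30-feature and 48-feature datasets.
rows_a = [{"url_length": 54, "has_ip": 0, "num_dots": 2, "label": 0}]
rows_b = [{"url_length": 120, "num_dots": 5, "uses_https": 0, "label": 1}]

features = common_features(list(rows_a[0]), list(rows_b[0]))
combined = combine(rows_a, rows_b, features)
```

Here `features` holds only the columns found in both toy datasets, and `combined` is a single table on which feature selection and a classifier such as random forest could then be run.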