Mourad Jbene, Smail Tigani, Rachid Saadane, Abdellah Chehri
In recent years, natural language processing has become one of the most active areas of research, especially with the emergence of deep learning algorithms. More attention has been given to Latin-script languages, e.g., English, French, and Spanish, given the availability of high-quality datasets and compute resources. In this paper, we present a Moroccan News Articles Corpus collected from four of the major Moroccan news websites. The corpus contains more than 418k news articles corresponding to 19 different categories, and can thus be considered one of the largest Arabic news article corpora. The collection and processing steps are described, and an exploratory analysis is performed. To demonstrate the utility of the dataset, an evaluation step was conducted in the context of text classification using four different machine learning baselines: Random Forest (RF), Multinomial Naive Bayes (MNB), Support Vector Machine (SVC), and Gradient Boosting (GradBoost) classifiers. The experimental results are presented in terms of accuracy, F1-score, and confusion matrix.
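To make the three reported metrics concrete, the following is a minimal sketch of how accuracy, F1-score, and a confusion matrix can be computed for a multi-class classifier. It is not the authors' evaluation code: the category names, the toy labels, and the choice of macro-averaged F1 are all assumptions made for illustration, since the abstract does not state the averaging scheme.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, true other
        fn = sum(matrix[i]) - tp                       # true i, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n if n else 0.0


# Hypothetical categories and predictions, for illustration only.
labels = ["politics", "sports", "economy"]
y_true = ["politics", "sports", "sports", "economy", "politics", "economy"]
y_pred = ["politics", "sports", "economy", "economy", "sports", "economy"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm)
print(round(accuracy(cm), 3), round(macro_f1(cm), 3))
```

With 19 categories of uneven size, macro averaging weights every category equally, whereas accuracy is dominated by the most frequent categories; reporting both alongside the confusion matrix shows which categories are confused with one another.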
Omar Einea, Ashraf Elnagar, Ridhwan Al Debsi
Dhafar Hamed, Ahmed T. Sadiq, Ayad R. Abbas
Mehmet Fatih Amasyalı, T. Yildrum
Muhammad Swaileh A. Alzaidi, Alya Alshammari, Abdulkhaleq Q. A. Hassan, Shouki A. Ebad, Hanan Al Sultan, Mohammed Alliheedi, Ali Abdulaziz Aljubailan, Khadija Abdullah Alzahrani