JOURNAL ARTICLE

A Moroccan News Articles Dataset (MNAD) For Arabic Text Categorization

Mourad Jbene, Smail Tigani, Rachid Saadane, Abdellah Chehri

Year: 2021 Journal: 2021 International Conference on Decision Aid Sciences and Application (DASA) Pages: 350-353

Abstract

In recent years, natural language processing has become one of the most active areas of research, especially with the emergence of deep learning algorithms. More attention has been given to Latin-script languages, e.g. English, French, and Spanish, given the availability of high-quality datasets and compute resources. In this paper, we present a Moroccan news articles corpus collected from four of the major Moroccan news websites. The corpus contains more than 418k news articles corresponding to 19 different categories, and is thus considered to be one of the largest Arabic news article corpora. The collection and processing steps are described, and an exploratory analysis is performed. To demonstrate the utility of the dataset, an evaluation was conducted in the context of text classification using four different machine learning baselines: Random Forest (RF), Multinomial Naive Bayes (MNB), Support Vector Machine (SVC), and Gradient Boosting (GradBoost) classifiers. The experimental results are presented in terms of accuracy, F1-score, and confusion matrix.
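The three reported metrics can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration: the category names and toy labels are invented (they are not taken from MNAD), the function names are hypothetical, and the abstract does not say which F1 averaging was used, so macro-averaging is shown as one common choice for multi-class classification.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                        # true class i, predicted otherwise
        fp = sum(matrix[r][i] for r in range(n)) - tp   # predicted i, true class otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n if n else 0.0


# Invented toy data: three hypothetical categories, six articles.
labels = ["sports", "politics", "economy"]
y_true = ["sports", "sports", "politics", "politics", "economy", "economy"]
y_pred = ["sports", "politics", "politics", "politics", "economy", "sports"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm)
print(round(accuracy(cm), 3), round(macro_f1(cm), 3))
```

With 19 categories, as in the corpus described above, the same functions would produce a 19 × 19 matrix; per-class F1 is then useful for spotting categories that are frequently confused with one another.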

Keywords:
Computer science; Artificial intelligence; Natural language processing; Support vector machine; Naive Bayes classifier; Confusion matrix; Categorization; Context (archaeology); Arabic; Random forest; Gradient boosting; Confusion; Machine learning; Linguistics; Geography

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 0.61
Refs: 16
Citation Normalized Percentile: 0.72

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Spam and Phishing Detection
Physical Sciences → Computer Science → Information Systems
Advanced Text Analysis Techniques
Physical Sciences → Computer Science → Artificial Intelligence