JOURNAL ARTICLE

NADiA: News Articles Dataset in Arabic for Multi-Label Text Categorization

Abstract

To the best of our knowledge, the NADiA dataset is the largest source of Arabic textual data that can be used in NLP-related tasks such as text classification. We chose the abbreviation NADiA because it is a common Arabic name. The data was collected by scraping the ‘SkyNewsArabia’ and ‘Masrawy’ news websites with Python scripts tuned for each website. The SkyNewsArabia data is referred to as NADiA1 and the Masrawy data as NADiA2. NADiA1 initially contained 37,445 files and NADiA2 contained 678,563 files; after filtering and cleaning, these were reduced to 35,416 and 451,230 files, respectively.

NADiA1 consists of the following 24 categories (displayed in English for easy referencing): News, North Africa, Levant, Middle East, The Americas, Research, Finance & Economy, War & Terrorism, Gulf, Europe, Political Figures, Iran, Technology, Russia, Sports, Tennis, Football, English League, Arabian Sports, Spanish League, Health, East Asia, Environment, Other Countries.

NADiA2 consists of the following 28 categories (displayed in English for easy referencing): Politics, Middle East, Asia, Africa, United States, Europe, Other Countries, Leaders, Sports, Arabian Sports, Football Clubs, Spanish League, Egyptian League, Finance, Arts, Cinema & TV, Fashion, Health, Pregnancy & Delivery, Cancer, Obesity, Social Media, Technology, Religion, Islamic, Fatawa, Worship, Prophet Biography.
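Because an article in a multi-label dataset can belong to several categories at once, its labels are commonly encoded as a 0/1 vector with one position per category. The following is a minimal sketch of that encoding in Python, using the NADiA1 category names listed in the abstract. The function name `multi_hot` and the example article are illustrative assumptions; the abstract does not describe how labels are stored in the dataset files.

```python
# Category names copied from the abstract (NADiA1, 24 labels).
NADIA1_CATEGORIES = [
    "News", "North Africa", "Levant", "Middle East", "The Americas",
    "Research", "Finance & Economy", "War & Terrorism", "Gulf", "Europe",
    "Political Figures", "Iran", "Technology", "Russia", "Sports", "Tennis",
    "Football", "English League", "Arabian Sports", "Spanish League",
    "Health", "East Asia", "Environment", "Other Countries",
]


def multi_hot(labels, categories=NADIA1_CATEGORIES):
    """Return a 0/1 list with a 1 at the position of each given label.

    Hypothetical helper for illustration; raises KeyError on an unknown label.
    """
    index = {name: i for i, name in enumerate(categories)}
    vector = [0] * len(categories)
    for label in labels:
        vector[index[label]] = 1
    return vector


# Hypothetical article tagged with two categories (not taken from the dataset).
example = multi_hot(["Sports", "Tennis"])
```

A vector of this kind can be used as the target for a classifier that predicts each category independently, which is one common way to set up multi-label text categorization.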

Keywords:
Categorization; Arabic; Natural language processing; Text categorization; Computer science; Information retrieval; Artificial intelligence; Linguistics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
