Sentence Classification Using N-Grams in Urdu Language Text

Malik Daler Ali Awan; Sikandar Ali; Ali Samad; Nadeem Iqbal; Malik Muhammad Saad Missen; Niamat Ullah

doi:10.1155/2021/1296076

Year: 2021; Journal: Scientific Programming; Vol: 2021; Pages: 1-11; Publisher: Hindawi Publishing Corporation

Abstract

The use of local languages is becoming common on social media and news channels. People share valuable insights on various topics related to their lives in different languages. A large body of text in various local languages exists on the Internet and contains invaluable information. Analyzing this local-language text can help improve a number of Natural Language Processing (NLP) tasks, and the information extracted from it can be used to develop new NLP applications. In this paper, we present an applied research task: multiclass classification of Urdu text at the sentence level, using N-gram features, for text drawn from social networks (Twitter and Facebook) and news channels. Our dataset consists of more than 100,000 instances covering twelve (12) different topics. The Random Forest machine learning classifier is used to classify the sentences. It achieved 80.15%, 76.88%, and 64.41% accuracy for unigram, bigram, and trigram features, respectively.
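To make the unigram, bigram, and trigram features mentioned in the abstract concrete, the following is a minimal sketch of word-level N-gram extraction. It is an illustration only, not the authors' pipeline: the whitespace tokenization, the function names `word_ngrams` and `ngram_counts`, and the example sentence are all assumptions, and the Random Forest training step is not shown.

```python
from collections import Counter


def word_ngrams(sentence, n):
    """Return the word-level n-grams of a sentence as a list of tuples."""
    # Assumption: simple whitespace tokenization; the paper's actual
    # tokenizer and preprocessing are not described in this abstract.
    tokens = sentence.split()
    # Slide a window of n tokens across the sentence; a sentence with
    # fewer than n tokens yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentence, n):
    """Count each n-gram; such counts could serve as classifier features."""
    return Counter(word_ngrams(sentence, n))


# Hypothetical example sentence (not taken from the paper's dataset).
sentence = "پاکستان نے میچ جیت لیا"

unigrams = word_ngrams(sentence, 1)
bigrams = word_ngrams(sentence, 2)
trigrams = word_ngrams(sentence, 3)
```

A sentence of k tokens yields k - n + 1 N-grams, so longer N-grams give fewer and sparser features per sentence. That sparsity may be one reason accuracy drops from unigrams to trigrams in the reported results, although the abstract itself does not state a cause.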

Keywords:

Computer science; Natural language processing; Bigram; Artificial intelligence; Urdu; Sentence; Trigram; Classifier (UML); Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 0.42

Citation Normalized Percentile: 0.70

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)

Sentiment Analysis and Opinion Mining (Physical Sciences → Computer Science → Artificial Intelligence)

Advanced Text Analysis Techniques (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

Copy detection in Urdu language documents using n-grams model

Text Mining Using N-Grams

Weighted N-grams CNN for Text Classification

Feature Selection on Chinese Text Classification Using Character N-Grams

Using Word N-Grams as Features in Arabic Text Classification