Abstract

This work uses chunking to mark the noun phrases in Urdu sentences. The approach is a hybrid that combines a statistical method with hand-crafted rules. The statistical model is a hidden Markov model (HMM) with IOB chunk annotation. From a POS-tagged corpus of 100,000 words, around 90,000 word tokens are used for training and 10,000 word tokens for testing. Several experiments with different combinations of input, output, and rule-application patterns are conducted to achieve high accuracy. An overall accuracy of 97.52% is achieved using the TnT tagger. The most successful input sequence is the one that merges POS annotation with IOB annotation.
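As a rough illustration of the IOB scheme and of merging POS annotation with IOB annotation, the following Python sketch may help. It is not the authors' implementation: the function names `merge_pos_iob` and `extract_noun_phrases`, the underscore-joined label format, the placeholder tokens, and the POS tag names are all assumptions made for this example, and the abstract does not specify the exact merged format used.

```python
from typing import List


def merge_pos_iob(pos_tags: List[str], iob_tags: List[str]) -> List[str]:
    """Join each POS tag with its IOB chunk tag into one combined label.

    The "POS_IOB" format (e.g. "NN_B") is an assumption for illustration.
    """
    if len(pos_tags) != len(iob_tags):
        raise ValueError("POS and IOB sequences must have the same length")
    return [f"{pos}_{iob}" for pos, iob in zip(pos_tags, iob_tags)]


def extract_noun_phrases(tokens: List[str], iob_tags: List[str]) -> List[List[str]]:
    """Recover noun-phrase spans from IOB tags.

    "B" begins a chunk, "I" continues the open chunk, and "O" is outside
    any chunk. An "I" with no open chunk is treated as outside.
    """
    phrases: List[List[str]] = []
    current: List[str] = []
    for token, tag in zip(tokens, iob_tags):
        if tag == "B":
            if current:
                phrases.append(current)
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases


# Placeholder tokens and tags (hypothetical, not taken from the corpus).
tokens = ["w1", "w2", "w3", "w4", "w5"]
pos_tags = ["ADJ", "NN", "VB", "NN", "VB"]
iob_tags = ["B", "I", "O", "B", "O"]

merged = merge_pos_iob(pos_tags, iob_tags)
phrases = extract_noun_phrases(tokens, iob_tags)
print(merged)
print(phrases)
```

In a setup like the one described, combined labels of this kind could serve as the observation or state sequence for an HMM tagger, so that chunk boundaries are predicted with the help of part-of-speech information.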

Keywords:
Computer science; Chunking (psychology); Noun phrase; Annotation; Natural language processing; Artificial intelligence; Hidden Markov model; Phrase; Urdu; Word (group theory); Part of speech; Noun; Nominalization; Speech recognition; Linguistics

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.40
Refs: 14
Citation Normalized Percentile: 0.70
Is in top 1%
Is in top 10%

Topics

Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Handwritten Text Recognition Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Noun phrase chunking with APL2

Suresh Manandhar, Enrique Alfonseca

Journal: ACM SIGAPL APL Quote Quad Year: 2000 Vol: 30 (4) Pages: 136-144
BOOK-CHAPTER

A Semi-supervised Approach for Chinese Noun Phrase Chunking

Zhao-Ming Gao, Yen-Hsi Lin, Ruben G. Tsui

Text, speech and language technology Year: 2023 Pages: 497-525