Shahid Siddiq, Sarmad Hussain, Aasim Ali, Muhammad Kamran Malik, Wajid Ali
In this work, chunking is used to mark the noun phrases of Urdu sentences. The approach is a hybrid that combines a statistical method with hand-crafted rules. The statistical model is a hidden Markov model (HMM) used with IOB chunk annotation. From a POS-tagged corpus of 100,000 words, around 90,000 word tokens are used for training and 10,000 word tokens for testing. Several experiments are conducted with different combinations of input, output, and rule-application patterns to achieve high accuracy. An overall accuracy of 97.52% is achieved using the TnT tagger. The input sequence that proves most successful merges the POS annotation with the IOB annotation.
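The IOB annotation and the merging of POS tags with IOB tags mentioned above can be illustrated with a minimal sketch. It is only one plausible reading of the scheme, not the authors' implementation. The function names, the `_` separator, the placeholder words (`w1` ... `w5`), and the POS labels are all assumptions made for illustration, and no real Urdu data or tagset is reproduced here.

```python
from typing import List, Tuple


def merge_pos_iob(pos_tags: List[str], iob_tags: List[str], sep: str = "_") -> List[str]:
    """Combine each POS tag with its IOB chunk tag into one label, e.g. 'NN_B'.

    The separator and label layout are assumptions, not taken from the paper.
    """
    if len(pos_tags) != len(iob_tags):
        raise ValueError("POS and IOB sequences must have the same length")
    return [f"{pos}{sep}{iob}" for pos, iob in zip(pos_tags, iob_tags)]


def split_merged(merged: List[str], sep: str = "_") -> List[Tuple[str, str]]:
    """Recover (POS, IOB) pairs from merged labels, splitting at the last separator."""
    pairs = []
    for label in merged:
        pos, iob = label.rsplit(sep, 1)
        pairs.append((pos, iob))
    return pairs


def extract_chunks(words: List[str], iob_tags: List[str]) -> List[str]:
    """Collect noun-phrase chunks from IOB tags: B opens a chunk, I extends it, O closes it."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for word, tag in zip(words, iob_tags):
        if tag == "B":
            if current:
                chunks.append(current)
            current = [word]
        elif tag == "I":
            # An 'I' with no open chunk is treated as a chunk start (a lenient choice).
            current.append(word)
        else:
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


# Hypothetical five-token sentence with placeholder words and illustrative POS labels.
words = ["w1", "w2", "w3", "w4", "w5"]
pos_tags = ["ADJ", "NN", "PSP", "NN", "VB"]
iob_tags = ["B", "I", "O", "B", "O"]

merged = merge_pos_iob(pos_tags, iob_tags)
noun_phrases = extract_chunks(words, iob_tags)
```

In a setup like this, a sequence tagger such as an HMM could be trained on the merged labels instead of on the IOB tags alone, and `split_merged` would recover the chunk tags from its output. Whether the paper merges the tags on the input side, the output side, or both is not specified in the abstract.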
Suresh Manandhar, Enrique Alfonseca
Yoav Goldberg, Meni Adler, Michael Elhadad
Suresh Manandhar, Enrique Alfonseca
Zhao-Ming Gao, Yen-Hsi Lin, Ruben G. Tsui