Automatic text summarization is a growing area of natural language processing research. Extractive text summarization builds a concise summary of the input sources by selecting phrases and sentences according to a selection criterion, which can draw on syntactic, semantic, temporal, positional, and other features. Summarization for under-resourced languages such as Urdu is more challenging still, because the basic computational resources needed to extract textual features effectively are scarce. To address these challenges, this paper presents an extractive text summarization methodology for Urdu documents based on a sentence weight algorithm that uses segmentation, tokenization, and stopwords as its prominent features. The system is evaluated with the ROUGE metric by comparing system-generated summaries against human-generated ones. System accuracy at the unigram, bigram, and trigram levels is 67 percent.
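To make the pipeline concrete, the following is a minimal sketch of a sentence-weight extractive summarizer with an n-gram overlap score for evaluation. The abstract does not give the exact weighting formula, so the weight used here (average corpus frequency of a sentence's non-stopword tokens) is an assumption, as are the function names, the tiny sample stopword list, the whitespace tokenizer, and the `ratio` parameter. The evaluation function computes ROUGE-N recall, which is one common ROUGE variant and not necessarily the one used in the paper.

```python
import re
from collections import Counter

# Tiny illustrative stopword sample (an assumption, not the paper's list).
URDU_STOPWORDS = {"کا", "کی", "کے", "میں", "ہے", "اور", "سے", "کو", "پر", "یہ", "ہیں"}


def split_sentences(text):
    """Segment on the Urdu full stop (U+06D4), Arabic question mark (U+061F) and '!'."""
    return [s.strip() for s in re.split(r"[\u06d4\u061f!]", text) if s.strip()]


def tokenize(sentence):
    """Whitespace tokenization; a simplification, real Urdu text may need more."""
    return sentence.split()


def sentence_weights(sentences, stopwords):
    """Assumed weight: mean document frequency of a sentence's non-stopword tokens."""
    content = [[t for t in tokenize(s) if t not in stopwords] for s in sentences]
    freq = Counter(t for tokens in content for t in tokens)
    return [
        sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0
        for tokens in content
    ]


def summarize(text, ratio=0.3, stopwords=URDU_STOPWORDS):
    """Return the highest-weighted sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    weights = sentence_weights(sentences, stopwords)
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate_tokens, reference_tokens, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    reference = ngram_counts(reference_tokens, n)
    candidate = ngram_counts(candidate_tokens, n)
    total = sum(reference.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, candidate[gram]) for gram, count in reference.items())
    return overlap / total
```

In use, `summarize(document_text, ratio=0.3)` would return roughly the top 30 percent of sentences, and `rouge_n_recall(tokenize(system_summary), tokenize(human_summary), n)` could be called with `n` set to 1, 2, and 3 for the unigram, bigram, and trigram comparisons. Selected sentences are kept in document order so that the summary remains readable.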
V. Sherlin Solomi, Ch. Keertana Sarvani, N. Supriya
Xiaoyue Liu, Jonathan J. Webster, Chunyu Kit
Vaishali V. Sarwadnya, Sheetal Sonawane
Neelam Phadnis, Gurveen Kaur Bans