V. Sherlin Solomi, Ch. Keertana Sarvani, N. Supriya
Text summarization means creating a summary of a text document that retains its main ideas and points of contention. Its aim is to generate a coherent and precise synopsis of the source document. Generating a coherent summary of a long text is time-consuming, so this paper proposes a novel generic text summarizer for English. It accepts an input of at most 160-170 words and generates a summary of 60-80 words that retains the original context of the input. The model uses a heap queue algorithm: the heap retains the top-scoring sentences of the input text, so they can be extracted in order of importance. The input text is tokenized and all stop words are removed. Word frequencies are then calculated and used to compute a score for each sentence, and the highest-scoring sentences are joined to form a coherent summary. The model was evaluated with several available scoring methods and obtained an accuracy of 86 percent. The cosine similarity between the model-generated output and a manual reference summary is also 0.86. The proposed model is a generic text summarizer that can summarize any type of data, irrespective of its domain.
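The pipeline described in the abstract (stop-word removal, word frequencies, sentence scores, heap-based selection of the top sentences) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stop-word list, the regex sentence splitter, the normalization of frequencies by the maximum count, and the `num_sentences` parameter (used here in place of the paper's 60-80 word target) are all hypothetical choices. The bag-of-words `cosine_similarity` helper only shows one common way to compare a generated summary with a reference summary; the abstract does not specify the vectorization used.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption, not the paper's list).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were", "be",
    "of", "to", "in", "on", "for", "with", "as", "by", "it", "this", "that",
    "from", "at", "which", "its", "can", "has", "have",
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, num_sentences: int = 2) -> str:
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))
    if not freq:
        return ""
    max_freq = max(freq.values())
    # Normalize word frequencies to [0, 1] (an assumed weighting scheme).
    weights = {w: c / max_freq for w, c in freq.items()}

    # Sentence score = sum of the normalized frequencies of its non-stop words.
    scores = [sum(weights.get(w, 0.0) for w in tokenize(s)) for s in sentences]

    # heapq.nlargest keeps only the indices of the top-k scoring sentences.
    top = heapq.nlargest(num_sentences, range(len(sentences)), key=lambda i: scores[i])
    # Re-emit the selected sentences in their original order for coherence.
    return " ".join(sentences[i] for i in sorted(top))


def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words cosine similarity between two texts (illustrative only).
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    doc = (
        "The heap queue picks sentences. Word frequency drives the sentence score. "
        "Cats sleep. The heap queue uses word frequency to score each sentence."
    )
    summary = summarize(doc, num_sentences=2)
    print(summary)
    print(round(cosine_similarity(summary, doc), 2))
```

Using `heapq.nlargest` instead of sorting every sentence keeps selection at O(n log k) for n sentences and k selected sentences. Restoring the original sentence order afterwards is a design choice that helps the extracted sentences read as a coherent summary.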
Muhammad Aslam, Noman Jazeb, A. M. Martínez-Enríquez, Sikander Ali
Xiaoyue Liu, Jonathan J. Webster, Chunyu Kit
Vaishali V. Sarwadnya, Sheetal Sonawane
Neelam Phadnis, Gurveen Kaur Bans