Text summarization shortens one or more long text documents into an ordered summary of informative sentences. Extractive summarization retrieves the important sentences without altering them. Each sentence in the document is represented as a vector of real numbers, called an embedding. To embed a sentence, the values of specific features are analyzed and plotted in an n-dimensional space. The embedding supports effective prediction of the sentences that follow and precede the current sentence, so sentences with similar semantics lie close to each other. Unsupervised summarization groups similar sentences by estimating the distance between their vectors and decides whether each sentence should be included in the summary. Hierarchical summarization builds a tree structure from the input text and truncates the tree according to the number of clusters. Each cluster holds semantically similar sentences. By determining the nearest neighbor in each cluster, a specified number of sentences is retrieved from each cluster for inclusion in the summary, which is at least half the size of the input document(s). The performance of hierarchical summarization is measured on the CNN/Daily Mail dataset using performance metrics. The evaluation scores indicate that the BIRCH algorithm performs better than the other hierarchical clustering techniques.
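The pipeline outlined in the abstract (embed each sentence, cluster the vectors hierarchically, then keep the sentences nearest to each cluster center) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a toy bag-of-words vector stands in for the learned sentence embedding, a naive bottom-up centroid-linkage merge stands in for BIRCH and the other hierarchical algorithms evaluated, and the function names (`embed`, `agglomerate`, `summarize`) and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    """Lowercased alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def embed(sentence, vocab):
    """Toy bag-of-words vector (assumption: stand-in for a learned embedding)."""
    counts = Counter(tokens(sentence))
    return [float(counts[w]) for w in vocab]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agglomerate(vectors, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain.

    Assumption: a simple bottom-up scheme, not BIRCH. Stopping at n_clusters
    plays the role of truncating the tree at the desired number of clusters.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = distance(centroid([vectors[i] for i in clusters[a]]),
                             centroid([vectors[i] for i in clusters[b]]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def summarize(sentences, n_clusters, per_cluster=1):
    """Keep the per_cluster sentences nearest to each cluster centroid."""
    vocab = sorted({w for s in sentences for w in tokens(s)})
    vectors = [embed(s, vocab) for s in sentences]
    chosen = []
    for members in agglomerate(vectors, n_clusters):
        center = centroid([vectors[i] for i in members])
        ranked = sorted(members, key=lambda i: distance(vectors[i], center))
        chosen.extend(ranked[:per_cluster])
    # Restore document order so the summary reads in an orderly way.
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical input: two sentences about a cat, two about markets.
doc = [
    "The cat sat on the mat.",
    "A cat was sitting on a mat.",
    "Stock markets fell sharply today.",
    "Markets dropped as stocks fell.",
]
summary = summarize(doc, n_clusters=2)
print(summary)
```

Raising `per_cluster` lengthens the summary, which is one way to approach the at-least-half-the-input size mentioned in the abstract. The pairwise merge loop is far too slow for a corpus the size of CNN/Daily Mail and is meant only to make the selection logic concrete.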
Aditya Aditya, Akanksha Shrivastava, Saurabh Bilgaiyan
R. Akila, J Brindha Merin, Haeedir Mohameed, S. Muthusundari, S. S. Bhaviya, M. Elango