A multi-document summarizer finds the key topics in multiple textual sources and organizes information around them. In this paper, we propose a summarization method for Persian text using paragraph vectors, which can represent textual units of arbitrary length. We use these vectors to calculate the semantic relatedness between documents, cluster them into a predetermined number of groups, weight them by their distance to the cluster centroids and by intra-cluster homogeneity, and extract the key paragraphs. We compare the final summaries with the gold-standard summaries of 21 digital topics using the ROUGE evaluation metric. Experimental results show the advantages of paragraph vectors over earlier attempts at developing similar methods for a low-resource language like Persian.
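The cluster-weight-extract pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the paragraph vectors have already been computed (for example by a Doc2Vec-style model) and are passed in as plain lists of floats. The cosine-based k-means, the farthest-point initialization, the scoring rule (similarity to the centroid multiplied by the mean similarity within the cluster), and the names `kmeans` and `select_key_paragraphs` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign(vectors, centroids):
    """Label each vector with the index of its most similar centroid."""
    return [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors]


def kmeans(vectors, k, iters=20):
    """Small deterministic k-means using cosine similarity (assumed variant)."""
    # Farthest-point initialization: start from the first vector, then keep
    # adding the vector least similar to the centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(vectors, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return assign(vectors, centroids), centroids


def select_key_paragraphs(vectors, k):
    """Return (paragraph_index, score) for one key paragraph per cluster."""
    labels, centroids = kmeans(vectors, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        sims = {i: cosine(vectors[i], centroids[c]) for i in members}
        # Assumed homogeneity measure: mean similarity to the centroid.
        homogeneity = sum(sims.values()) / len(members)
        best = max(members, key=lambda i: sims[i])
        # Assumed weighting: closeness to centroid times cluster homogeneity.
        picks.append((best, sims[best] * homogeneity))
    picks.sort(key=lambda p: p[1], reverse=True)
    return picks
```

Calling `select_key_paragraphs(vectors, k)` with one vector per paragraph yields one representative paragraph index per non-empty cluster, ordered by score. Multiplying the two factors favors central paragraphs from tight clusters; other combinations are equally plausible and would need to be checked against the paper's actual weighting.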
Aniket Suryavanshi, Bhavika Gujare, Allan Mascarenhas, Bhanu Tekwani
Richeeka Bathija, Pranav Agarwal, Rakshith Somanna, G. Pallavi