With the exponential growth of textual data on the Internet, multi-document summarization (MDS), the NLP task of producing a concise representation of the main ideas of multiple related documents, has become increasingly important. Producing a non-redundant summary is challenging because documents written by different authors express the same content with different wording. This paper proposes a new multi-document summarization system based on phrase embeddings and a greedy Maximal Marginal Relevance (MMR) algorithm. The approach treats phrases as the basic semantic units of sentences for understanding and summarizing documents. Embedding techniques are used to learn vector representations of phrases so that semantically similar phrases can be identified. Finally, an MMR-based greedy algorithm selects sentences that contain important phrases while reducing redundancy among similar phrases. Experimental results on the DUC 2004 benchmark dataset show performance gains over state-of-the-art baselines.
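The greedy MMR selection step can be illustrated with a minimal sketch. It assumes each sentence has already been mapped to a vector (for example by pooling its phrase embeddings) and that a vector standing for the document set's main content is available as `query`. The function names, the pooling assumption, and the trade-off weight `lam` are illustrative choices, not details taken from the paper, whose scoring operates on phrases and may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(vectors, query, k, lam=0.7):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    vectors: one vector per sentence (assumed precomputed, e.g. pooled phrase embeddings)
    query:   vector representing the main content of the document set (an assumption)
    lam:     hypothetical trade-off weight; 1.0 ranks by relevance only
    """
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        best, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(vectors[i], query)
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a lower `lam`, a sentence that nearly duplicates an already selected one is penalized and a less similar sentence is preferred; with `lam=1.0` the selection reduces to ranking by relevance alone.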
Yuning Mao, Yanru Qu, Yiqing Xie, Xiang Ren, Jiawei Han
Salima Lamsiyah, Abdelkader El Mahdaouy, Bernard Espinasse, Saïd Ouatik El Alaoui
Zhaolin ZENG, Xin Yan, Bingbing Yu, Feng Zhou, Guangyi XU
Abdulrahman Mohsen Ahmed Zeyad, Arun Biradar