JOURNAL ARTICLE

Phrase Embedding Based Multi Document Summarization with Reduced Redundancy using Maximal Marginal Relevance

Abstract

With the exponential growth of textual data in the Internet era, Multi-Document Summarization (MDS), the NLP task of producing a concise representation of the main ideas of multiple related documents, is becoming indispensable. Producing a non-redundant summary is challenging because different authors express the same content with different wording. This paper proposes a new multi-document summarization system based on phrase embedding and a greedy Maximal Marginal Relevance (MMR) algorithm. The approach treats phrases as the basic meaningful semantic units of sentences for understanding and summarizing documents. Embedding techniques are used to learn vector representations of phrases so that semantically similar phrases can be identified. Finally, an MMR-based greedy algorithm selects sentences that contain important phrases while reducing redundancy among similar phrases. Experimental results on the DUC 2004 benchmark dataset show performance gains over state-of-the-art baselines.
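The greedy MMR selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard MMR trade-off, score = lam * relevance - (1 - lam) * redundancy, and it assumes that each sentence has already been reduced to a single vector (for example, by pooling its phrase embeddings) and that relevance is measured against the centroid of all sentence vectors. The names `cosine`, `mmr_select`, and `lam` are hypothetical, chosen only for this illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(vectors, k, lam=0.7):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    vectors: one embedding per sentence (assumed precomputed; hypothetical input).
    lam:     trade-off in [0, 1]; 1.0 ranks by relevance only.
    """
    if not vectors:
        return []
    dim = len(vectors[0])
    # Assumption: relevance is similarity to the centroid of all sentences.
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    relevance = [cosine(v, centroid) for v in vectors]

    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy input: sentences 0 and 1 are duplicates, sentence 2 is different.
    toy = [[1.0, 0.1], [1.0, 0.1], [0.1, 1.0]]
    print(mmr_select(toy, k=2, lam=0.5))
```

In this toy input the first two vectors are identical, so with `lam=0.5` the redundancy penalty steers the second pick away from the duplicate, while `lam=1.0` reduces the procedure to plain relevance ranking. How the paper scores phrase importance and pools phrase vectors is not specified in the abstract, so those parts are left as assumptions here.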

Keywords:
Computer science; Automatic summarization; Phrase; Redundancy (engineering); Natural language processing; Artificial intelligence; Embedding; Relevance (law); Information retrieval

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.15
Refs: 27
Citation Normalized Percentile: 0.58

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Advanced Text Analysis Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)