The oft-decried problem of information overload impairs the comprehension of useful information, and efforts to address it have spurred growing interest in research on multi-document summarization. Seeking a new method for assessing the importance and similarity of sentences in multi-document summarization, this study proposes a novel approach based on well-known hierarchical Bayesian topic models. By investigating hierarchical topics and their correlations with respect to the lexical co-occurrences of words, the proposed contextual topic model determines the relevance of sentences more effectively, and it also recognizes latent topics and arranges them hierarchically. Quantitative evaluation results show that the model outperforms hLDA and LDA in document modeling. In addition, a practical application demonstrates that a summarization system implementing this model significantly improves summarization performance, making it comparable to state-of-the-art summarization systems.
Guangbing Yang, Dunwei Wen, Kinshuk, Nian-Shing Chen, Erkki Sutinen