Neural sequence-to-sequence models have provided a viable new approach to abstractive text summarization. However, they face challenges of low efficiency and accuracy when dealing with long text: they cannot handle very long inputs, they cannot reproduce factual details accurately, and they tend to repeat themselves. In this paper, we propose a hybrid extractive-abstractive model. In the extractive part, we construct a graph model and propose a hybrid sentence similarity measure that combines sentence vectors and Levenshtein distance. We use this measure to rank and extract key sentences, then concatenate them into a shorter text that serves as input to the summary generator. In the abstractive part, we make two improvements to the standard sequence-to-sequence attentional model. First, we use a pointer mechanism to copy words from the source text, which helps the seq2seq generator handle the out-of-vocabulary (OOV) problem. Second, we use a coverage mechanism to avoid repetition. We collect a financial news dataset and apply our model to the financial news summarization task, outperforming the state-of-the-art method by at least 4.7 ROUGE points.
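As a rough illustration of the hybrid sentence similarity described above, the sketch below combines an embedding-based cosine similarity with a normalized Levenshtein similarity via a weight `alpha`. The bag-of-words vectors and the `alpha` parameter are assumptions for demonstration only; the paper's actual sentence vectors and weighting scheme may differ.

```python
from collections import Counter
import math

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        prev = cur
    return prev[n]

def lev_similarity(a: str, b: str) -> float:
    """Edit distance normalized to a [0, 1] similarity score."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words counts stand in for learned sentence embeddings
    # (an assumption; the paper uses trained sentence vectors).
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Weighted blend of semantic and surface-form similarity.

    `alpha` is a hypothetical weighting parameter, not from the paper.
    """
    return alpha * cosine_similarity(a, b) + (1 - alpha) * lev_similarity(a, b)
```

In a graph-based ranker such as TextRank, scores like `hybrid_similarity` would serve as edge weights between sentence nodes before running the ranking iteration.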
Yangbin Chen, Yun Ma, Xudong Mao, Qing Li