In this paper, we present a novel sequence-to-sequence architecture with multi-head attention for the automatic summarization of long texts. Summaries generated by previous abstractive methods commonly suffer from two problems: repeated content and missing source information. To address these problems, we propose a multi-head attention summarization (MHAS) model, which uses a multi-head attention mechanism to learn relevant information in different representation subspaces. The MHAS model considers previously predicted words when generating new words, which helps it avoid producing summaries with redundant repetitions. By adding self-attention layers to the traditional encoder and decoder, it also learns the internal structure of the article and better preserves the original information. In addition, we integrate the multi-head attention distribution into the pointer network to further improve performance. Experiments are conducted on the CNN/Daily Mail dataset, a long-text English corpus. Experimental results show that our proposed model outperforms previous extractive and abstractive models.
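The core mechanism the abstract relies on, multi-head attention over several representation subspaces, can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual model: the random projection matrices stand in for learned weights, and all names, shapes, and head counts are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, num_heads, rng):
    """Toy multi-head attention: project the inputs into `num_heads`
    lower-dimensional subspaces, attend in each subspace independently,
    then concatenate the per-head contexts and project back."""
    d_model = Q.shape[-1]
    d_k = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Random projections stand in for learned weight matrices.
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        q, k, v = Q @ Wq, K @ Wk, V @ Wv
        scores = softmax(q @ k.T / np.sqrt(d_k))  # attention distribution
        heads.append(scores @ v)                  # context in this subspace
    # Concatenate heads and mix them with an output projection.
    Wo = rng.standard_normal((num_heads * d_k, d_model))
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # 5 tokens, model dimension 8
out = multi_head_attention(x, x, x, num_heads=2, rng=rng)
print(out.shape)  # (5, 8)
```

Each head produces its own attention distribution; in the MHAS model these per-head distributions are what gets integrated into the pointer network, rather than a single averaged distribution.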