Copy modules are widely used in recent abstractive summarization models, enabling the decoder to copy words from the source into the summary. The encoder-decoder attention generally serves as the copy distribution, but guaranteeing that important source words are copied remains a challenge. In this work, we propose a Transformer-based model that enhances the copy mechanism. Specifically, we measure the importance of each source word by its degree centrality in a directed graph built from the self-attention layer of the Transformer, and we use this centrality to guide the copy process explicitly. Experimental results show that the self-attention graph provides useful guidance for the copy distribution. Our proposed models significantly outperform the baseline methods on the CNN/Daily Mail and Gigaword datasets.
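The idea of centrality-guided copying can be illustrated with a minimal sketch. It is not the model described in the abstract: it assumes a single self-attention matrix in which `attn[i][j]` is the weight that source word `i` places on source word `j`, it uses weighted indegree as the centrality measure, and it combines centrality with the copy attention by elementwise multiplication and renormalization. The function names `indegree_centrality` and `guided_copy_distribution` and the toy numbers are hypothetical.

```python
def indegree_centrality(attn):
    """Weighted indegree of each source word in the self-attention graph.

    attn[i][j] is treated as the weight of a directed edge i -> j.
    Self-loops are ignored (an assumption of this sketch), and the scores
    are normalized to sum to 1.
    """
    n = len(attn)
    if n == 0:
        return []
    scores = [sum(attn[i][j] for i in range(n) if i != j) for j in range(n)]
    total = sum(scores)
    if total == 0:
        # No incoming attention at all: fall back to a uniform distribution.
        return [1.0 / n] * n
    return [s / total for s in scores]


def guided_copy_distribution(copy_attn, centrality):
    """Reweight a copy distribution by word centrality and renormalize.

    Multiplication is an illustrative choice, not necessarily the
    combination used by the proposed model.
    """
    weighted = [a * c for a, c in zip(copy_attn, centrality)]
    total = sum(weighted)
    if total == 0:
        return list(copy_attn)
    return [w / total for w in weighted]


# Toy example with three source words; word 1 receives the most attention.
attn = [
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.8, 0.1],
]
centrality = indegree_centrality(attn)

# Hypothetical encoder-decoder attention at one decoding step.
copy_attn = [0.4, 0.3, 0.3]
guided = guided_copy_distribution(copy_attn, centrality)
```

In this toy case the unguided copy attention favors word 0, while the guided distribution shifts most of its mass to word 1, the word that the other source words attend to most.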
Weijun Yang, Zhi-Cheng Tang, Xinhuai Tang
Xiangyu Duan, Hongfei Yu, Mingming Yin, Min Zhang, Weihua Luo, Yue Zhang
Rashed Z. AlMazrouei, Jenophia Nelci, Said A. Salloum, Khaled Shaalan
Junpeng Liu, Yanyan Zou, Yuxuan Xi, Shengjie Li, Mian Ma, Zhuoye Ding, Bo Long