The scattering of event arguments across a document is a key difficulty for document-level event extraction. The classic LSTM cannot let the input interact with its context while collecting long-sequence features, and previous document-level approaches feed the entire document in as a single input, so the document's sequence features lack deeper contextual information. This paper proposes a document-level event extraction method based on the Mogrifier LSTM to address this issue. We split the text into multiple paragraphs and feed each paragraph into the Mogrifier LSTM individually. To strengthen context modeling over long text sequences, this improved LSTM first lets the input at the current step and the output of the previous step gate each other through several rounds of mutual computation. An attention mechanism is then introduced to capture the internal correlations within each paragraph and to integrate the contextual semantics across paragraphs. Finally, sequence labeling is used to extract the scattered event arguments and match event types. Experimental results on a Chinese financial dataset show that the proposed method effectively mitigates both the loss of deep information in long document sequence features and the scattering of event arguments, improving the effectiveness of document-level event extraction.
FANG Yiqiu, LIU Fei, GE Junwei
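The mutual gating step mentioned in the abstract, where the Mogrifier LSTM repeatedly recomputes the current input against the previous output before the ordinary LSTM cell runs, can be sketched as below. This is a minimal NumPy illustration: the dimensions, round count, and random weights are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h, Q, R, rounds=5):
    """Mogrifier gating: the input x and previous hidden state h
    alternately rescale each other for a fixed number of rounds
    before the standard LSTM cell computation would run.
    Q[i] and R[i] are per-round projection matrices (hypothetical
    random weights here; a real model learns them)."""
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            # odd round: h modulates x
            x = 2.0 * sigmoid(Q[i // 2] @ h) * x
        else:
            # even round: x modulates h
            h = 2.0 * sigmoid(R[i // 2 - 1] @ x) * h
    return x, h

# Illustrative sizes: 4-dim input, 3-dim hidden state, 5 gating rounds.
d_x, d_h, rounds = 4, 3, 5
Q = [rng.standard_normal((d_x, d_h)) for _ in range((rounds + 1) // 2)]
R = [rng.standard_normal((d_h, d_x)) for _ in range(rounds // 2)]

x = rng.standard_normal(d_x)
h = rng.standard_normal(d_h)
x_m, h_m = mogrify(x, h, Q, R, rounds)
# The gating changes the values but preserves the shapes, so the
# mogrified pair drops straight into an ordinary LSTM step.
print(x_m.shape, h_m.shape)
```

Because each round only rescales one vector by a sigmoid gate of the other, the mogrified `x_m` and `h_m` keep their original dimensions and can be passed unchanged into a standard LSTM cell.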