In recent years, Pre-trained Language Models (PLMs) have shown their superiority by pre-training on unstructured text corpora and then fine-tuning on downstream tasks. On entity-rich textual resources like Wikipedia, Knowledge-Enhanced PLMs (KEPLMs) incorporate the interactions between tokens and mentioned entities during pre-training, and are thus more effective on entity-centric tasks such as entity linking and relation classification. Although exploiting Wikipedia's rich structures to some extent, conventional KEPLMs still neglect a unique layout of the corpus: each Wikipedia page is centered around a topic entity (identified by the page URL and shown in the page title). In this paper, we demonstrate that KEPLMs that do not incorporate topic entities suffer from insufficient entity interaction and biased (relation) word semantics. We thus propose KÉPLET, a novel Knowledge-Énhanced Pre-trained LanguagE model with Topic entity awareness. In an end-to-end manner, KÉPLET identifies where to add the topic entity's information in a Wikipedia sentence, fuses such information into token and mentioned-entity representations, and supervises the network learning accordingly, thereby taking topic entities back into consideration. Experiments demonstrated the generality and superiority of KÉPLET, which was applied to two representative KEPLMs and achieved significant improvements on four entity-centric tasks.
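To make the fusion idea concrete, the sketch below shows one way a gate-controlled layer could (i) score where in a sentence the topic entity is relevant and (ii) inject its embedding into the token representations at those positions. This is a minimal illustration under our own assumptions, not the authors' actual KÉPLET architecture; the module name `TopicEntityFusion` and all tensor names are hypothetical.

```python
# Illustrative sketch only, NOT the authors' KEPLET implementation.
import torch
import torch.nn as nn


class TopicEntityFusion(nn.Module):
    """Gate-controlled fusion of a page-level topic entity embedding
    into per-token hidden states: the gate decides *where* the topic
    entity matters, the projection decides *what* gets fused in."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Scores how relevant the topic entity is to each token position.
        self.gate = nn.Linear(2 * hidden_size, 1)
        # Projects each concatenated [token; topic] pair back to hidden size.
        self.fuse = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor,
                topic_emb: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); topic_emb: (batch, hidden)
        topic = topic_emb.unsqueeze(1).expand_as(hidden_states)
        pair = torch.cat([hidden_states, topic], dim=-1)
        g = torch.sigmoid(self.gate(pair))           # (batch, seq_len, 1): "where"
        fused = torch.tanh(self.fuse(pair))          # candidate fused states
        return g * fused + (1 - g) * hidden_states   # inject only where gated in


if __name__ == "__main__":
    layer = TopicEntityFusion(hidden_size=768)
    tokens = torch.randn(2, 16, 768)   # toy batch of token hidden states
    topic = torch.randn(2, 768)        # toy topic entity embeddings
    out = layer(tokens, topic)
    print(out.shape)                   # torch.Size([2, 16, 768])
```

The gating design reflects the abstract's claim that KÉPLET first identifies where to add the topic entity's information before fusing it, rather than injecting it uniformly into every token.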