Improving Biomedical Pretrained Language Models with Knowledge

Zheng Yuan; Yijia Liu; Chuanqi Tan; Songfang Huang; Fei Huang

doi:10.18653/v1/2021.bionlp-1.20

JOURNAL ARTICLE

Year: 2021 Pages: 180-190

Abstract

Pretrained language models have shown success in many natural language processing tasks, and many works explore incorporating knowledge into them. In the biomedical domain, experts have spent decades building large-scale knowledge bases. For example, UMLS contains millions of entities with their synonyms and defines hundreds of relations among entities. Leveraging this knowledge can benefit a variety of downstream tasks such as named entity recognition and relation extraction. To this end, we propose KeBioLM, a biomedical pretrained language model that explicitly leverages knowledge from the UMLS knowledge bases. Specifically, we extract entities from PubMed abstracts and link them to UMLS. We then train a knowledge-aware language model that first applies a text-only encoding layer to learn entity representations and then applies a text-entity fusion encoding to aggregate entity representations. In addition, we add two training objectives: entity detection and entity linking. Experiments on the named entity recognition and relation extraction tasks from the BLURB benchmark demonstrate the effectiveness of our approach. Further analysis on a collected probing dataset shows that our model is better able to model medical knowledge.
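
To make the two auxiliary objectives concrete, the sketch below shows one way per-token training labels for entity detection and entity linking could be derived once mentions in a sentence have been linked to a knowledge base. It is an illustration only: the BIO tagging scheme, the greedy longest-match lookup, the `label_tokens` helper, and the toy lexicon with placeholder concept IDs are all assumptions, not details stated in the abstract and not real UMLS identifiers.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping surface forms to placeholder concept IDs.
# These IDs are illustrative stand-ins, not real UMLS identifiers.
TOY_LEXICON: Dict[str, str] = {
    "aspirin": "CONCEPT_A",
    "acetylsalicylic acid": "CONCEPT_A",  # a synonym shares the concept ID
    "headache": "CONCEPT_B",
}


def label_tokens(
    tokens: List[str], lexicon: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (BIO tags, per-token concept IDs) using greedy longest match."""
    max_len = max(len(name.split()) for name in lexicon)
    tags = ["O"] * len(tokens)       # entity detection labels (assumed BIO)
    links = ["NONE"] * len(tokens)   # entity linking labels
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in lexicon:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
                for j in range(i, i + n):
                    links[j] = lexicon[span]
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags, links


tokens = "Acetylsalicylic acid relieves headache".split()
tags, links = label_tokens(tokens, TOY_LEXICON)
print(list(zip(tokens, tags, links)))
```

In this toy setup the two label sequences would serve as targets for the detection and linking objectives respectively; a real pipeline would rely on a proper entity linker rather than exact string lookup.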

Keywords:

Unified Medical Language System; Computer science; Natural language processing; Relationship extraction; Benchmark (surveying); Artificial intelligence; Named-entity recognition; Entity linking; Language model; Relation (database); Representation (politics); Knowledge base; Domain (mathematical analysis); Natural language; Encoding (memory); Variety (cybernetics); Question answering; Information retrieval; Information extraction; Data mining; Task (project management)

Metrics

FWCI (Field Weighted Citation Impact): 8.75

Citation Normalized Percentile: 0.98

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence

Biomedical Text Mining and Ontologies: Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology

Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Ensemble pretrained language models to extract biomedical knowledge from literature

Improving Generalization of Pretrained Language Models

Developing Pretrained Language Models for Turkish Biomedical Domain

KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model

TabMedBERT: A Tabular Knowledge Enhanced Biomedical Pretrained Language Model