Abstract

This work presents the first large-scale biomedical Spanish language models trained from scratch, using biomedical corpora totaling 1.1B tokens and an EHR corpus of 95M tokens. We compared them against general-domain and other domain-specific Spanish models on three clinical NER tasks. Our models outperformed the alternatives across all three tasks, making them better suited for clinical NLP applications. Furthermore, our findings indicate that, when enough data is available, pre-training from scratch outperforms continual pre-training on clinical tasks, raising an open question about which approach is optimal. Our models and fine-tuning scripts are publicly available at HuggingFace and GitHub.
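As a minimal sketch of how a published checkpoint like these might be applied to clinical NER with the HuggingFace transformers library, the snippet below loads a fine-tuned token-classification model. The model identifier is a placeholder, not the authors' actual checkpoint name; substitute the identifier from their HuggingFace repository.

    # Minimal sketch: clinical NER with a pretrained biomedical Spanish model.
    # MODEL_ID is a hypothetical placeholder; use the authors' real checkpoint.
    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

    MODEL_ID = "your-org/biomedical-spanish-ner"  # hypothetical identifier

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

    # The token-classification pipeline runs the model and merges subword
    # predictions into entity spans via the aggregation strategy.
    ner = pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
        aggregation_strategy="simple",
    )

    print(ner("Paciente con antecedentes de diabetes mellitus tipo 2."))

The aggregation_strategy argument groups per-subword labels into whole-entity spans, which is the granularity at which clinical NER benchmarks are typically scored.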

Keywords:
Natural language processing, Computer science, Artificial intelligence, Linguistics, Library science, Philosophy
