JOURNAL ARTICLE

Comparison of BERT Implementations for Enhanced Cancer Symptoms Extraction from Electronic Health Records

Abstract

Effective management of cancer symptoms is pivotal for optimal clinical outcomes. This research aims to harness Electronic Health Records (EHRs), particularly unstructured clinical notes, as a rich source of cancer symptom information. Given the complexity of extracting information from EHRs, we investigate the performance of several Large Language Models (LLMs), namely Bidirectional Encoder Representations from Transformers (BERT) and its variants, for cancer symptom identification. Using a carefully curated dataset of 1112 clinical notes annotated by experts for 13 prevalent cancer symptoms, we present a comparative analysis of five models: BERT-based, Span BERT, Bio BERT, Clinical BERT, and PubMed BERT. Our findings show that Clinical BERT outperforms the other models on precision, recall, and F1-score. This result underscores the potential of Clinical BERT to improve cancer symptom management through EHRs, and to support oncological research and treatment decision-making.
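
The abstract compares models on precision, recall, and F1-score. As a rough illustration of how such scores can be computed when each note carries a set of symptom labels, here is a minimal sketch using micro-averaging. The function name `micro_prf`, the symptom names, and the toy data are hypothetical; the abstract does not state which averaging scheme or label set the authors used.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-note symptom label sets.

    Assumption: each note is represented as a set of symptom labels. This is an
    illustrative setup, not the evaluation protocol of the article.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of notes")

    # Pool true positives, false positives, and false negatives over all notes.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy annotations for three notes (symptom names are made up here).
gold_labels = [{"pain", "fatigue"}, {"nausea"}, set()]
pred_labels = [{"pain"}, {"nausea", "fatigue"}, set()]

precision, recall, f1 = micro_prf(gold_labels, pred_labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Micro-averaging pools counts across all symptoms, so frequent symptoms dominate the score; a per-symptom (macro) average would weight each of the 13 symptoms equally instead.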

Keywords:
Implementation; Computer science; Health records; Extraction (chemistry); Programming language; Health care; Political science; Chemistry

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.28
Refs: 6
Citation Normalized Percentile: 0.75

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Machine Learning in Healthcare: Physical Sciences → Computer Science → Artificial Intelligence
Data Quality and Management: Social Sciences → Decision Sciences → Management Science and Operations Research