JOURNAL ARTICLE

Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records

Abstract

Objectives: To develop classification algorithms that accurately identify axial SpA (axSpA) patients in electronic health records, and to compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes.

Methods: An enriched cohort of 7853 eligible patients was created from the electronic health records of two large hospitals using automated searches (≥1 ICD code combined with simple text searches). Key disease concepts were extracted from free-text data using natural language processing (NLP) and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms (on a training set of 127 axSpA cases and 423 non-cases) and unsupervised algorithms to identify patients with a high probability of having axSpA from the enriched cohort. Their performance was compared against classifications using ICD codes only.

Results: NLP extracted four disease concepts of high predictive value: ankylosing spondylitis (AS), sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and the ICD code for AS, identified the greatest number of patients. With the probability threshold set to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male). Sensitivity was 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80–0.87).

Conclusion: Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research.
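The abstract reports that the probability threshold was set to attain 80% positive predictive value, but does not say how that cutoff was found. The sketch below shows one plausible way to do it, assuming a labelled validation set with a predicted probability per patient. The function names `threshold_for_ppv` and `sensitivity_specificity` and the toy data are hypothetical and are not taken from the article.

```python
from typing import List, Optional, Tuple


def threshold_for_ppv(probs: List[float], labels: List[int],
                      target_ppv: float = 0.80) -> Optional[float]:
    """Return the lowest cutoff whose PPV on the labelled data is >= target_ppv.

    Hypothetical helper: patients with probability >= cutoff are classified
    as cases. Returns None if no cutoff reaches the target.
    """
    best = None
    # Walk candidate cutoffs from strictest to most lenient; the last one
    # that still meets the target is the lowest qualifying cutoff.
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        if tp + fp > 0 and tp / (tp + fp) >= target_ppv:
            best = t
    return best


def sensitivity_specificity(probs: List[float], labels: List[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity at a cutoff.

    Assumes the labelled data contain at least one case and one non-case.
    """
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data for illustration only (not from the study): 1 = axSpA case.
probs = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

cutoff = threshold_for_ppv(probs, labels, target_ppv=0.80)
sens, spec = sensitivity_specificity(probs, labels, cutoff)
```

Choosing the lowest cutoff that still meets the target PPV maximises the number of patients identified at that precision, which matches the trade-off the abstract describes between PPV and cohort size.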

Keywords:
Medicine; Artificial intelligence; Machine learning; Sacroiliitis; Diagnosis code; Cohort; Axial spondyloarthritis; Ankylosing spondylitis; Receiver operating characteristic; Health records; Electronic health record; Natural language processing; Computer science; Data mining; Population; Health care; Internal medicine

Metrics

Cited by: 43
FWCI (Field-Weighted Citation Impact): 3.98
References: 28
Citation Normalized Percentile: 0.92

Topics

Spondyloarthritis Studies and Treatments
Health Sciences → Medicine → Rheumatology
Rheumatoid Arthritis Research and Therapies
Health Sciences → Medicine → Rheumatology
Psoriasis: Treatment and Pathogenesis
Life Sciences → Immunology and Microbiology → Immunology

Related Documents

JOURNAL ARTICLE

Using Natural Language Processing to Predict Risk in Electronic Health Records

Duy Van Le, James Montgomery, Kenneth C. Kirkby, Joel Scanlan

Journal: Studies in Health Technology and Informatics, Year: 2024, Vol: 310, Pages: 574-578