JOURNAL ARTICLE

Automated extraction of post-stroke functional outcomes from unstructured electronic health records

Abstract

Purpose: Population-level tracking of post-stroke functional outcomes is critical to guide interventions that reduce the burden of stroke-related disability. However, functional outcomes are often missing or documented only in unstructured notes. We developed a natural language processing (NLP) model that reads electronic health record (EHR) notes to automatically determine the modified Rankin Scale (mRS) score.

Method: We included consecutive patients (≥18 years) with acute stroke admitted to our center (2015–2024). mRS scores were obtained from the Get With the Guidelines registry and from clinical notes (if documented), and served as the gold standard against which NLP-generated scores were compared. We used text-based features from notes, along with age, sex, discharge status, and outpatient follow-up, to train a logistic regression model to predict good (0–2) versus poor (3–6) mRS, and a linear regression model to predict the full range of mRS scores. The models were trained to predict mRS at hospital discharge and post-discharge, and were externally validated in a dataset of patients with brain injuries from a different healthcare center.

Findings: We included 5307 patients, 5006 in the training and test sets and 301 in the validation set; mean age was 69 (SD 15) and 65 (SD 17) years, respectively; 47% were female. The logistic regression model achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 [CI 0.93–0.95] (test) and 0.94 [0.91–0.96] (validation), and the linear model achieved a root mean squared error (RMSE) of 0.91 [0.87–0.94] (test) and 1.17 [1.06–1.28] (validation).

Discussion and Conclusion: The NLP-based model is suitable for use in large-scale phenotyping of stroke functional outcomes and population health research.
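The evaluation described in the abstract (good versus poor mRS, AUROC, RMSE, and interval estimates) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names, the toy scores, and the percentile bootstrap used for the interval are all assumptions made for illustration, and the abstract does not state how its confidence intervals were computed.

```python
import math
import random


def dichotomize_mrs(score):
    """Map an mRS score (0-6) to a binary label: 0 = good (0-2), 1 = poor (3-6)."""
    if not 0 <= score <= 6:
        raise ValueError("mRS must be between 0 and 6")
    return 1 if score >= 3 else 0


def auroc(labels, scores):
    """AUROC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUROC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted mRS scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_ci(metric, y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (an assumed method,
    chosen here only for illustration)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(metric([y_true[i] for i in idx],
                                [y_pred[i] for i in idx]))
        except ValueError:
            continue  # resample contained a single class; skip it
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Made-up toy data, not taken from the study.
gold_mrs = [0, 1, 2, 3, 4, 5, 6, 2, 3, 1]
pred_mrs = [0.4, 1.2, 2.9, 2.5, 4.1, 4.6, 5.8, 1.7, 3.3, 0.8]
pred_poor_prob = [0.05, 0.10, 0.55, 0.45, 0.80, 0.90, 0.97, 0.20, 0.60, 0.08]

gold_poor = [dichotomize_mrs(s) for s in gold_mrs]
print("AUROC:", auroc(gold_poor, pred_poor_prob))
print("AUROC interval:", bootstrap_ci(auroc, gold_poor, pred_poor_prob))
print("RMSE:", rmse(gold_mrs, pred_mrs))
print("RMSE interval:", bootstrap_ci(rmse, gold_mrs, pred_mrs))
```

The binary metric is computed on the dichotomized gold label, while RMSE is computed on the full 0–6 scale, mirroring the two models named in the abstract.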

Keywords:
Logistic regression; Receiver operating characteristic; Medicine; Stroke (engine); Medical record; Gold standard (test); Population; Modified Rankin Scale; Linear regression; Test (biology); Psychological intervention; Physical therapy; Machine learning; Computer science; Surgery; Internal medicine; Ischemic stroke

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 4.60
Refs: 22
Citation Normalized Percentile: 0.79

Topics

Acute Ischemic Stroke Management (Health Sciences → Medicine → Epidemiology)
Machine Learning in Healthcare (Physical Sciences → Computer Science → Artificial Intelligence)
Stroke Rehabilitation and Recovery (Health Sciences → Medicine → Rehabilitation)
