Automated extraction of post-stroke functional outcomes from unstructured electronic health records

Marta Fernandes; Kaileigh Gallagher; Niels Turley; Aditya Gupta; M. Brandon Westover; Aneesh B. Singhal; Sahar F. Zafar

doi:10.1177/23969873251314340

JOURNAL ARTICLE

Year: 2025; Journal: European Stroke Journal; Vol: 10 (3); Pages: 829-836; Publisher: SAGE Publishing

Abstract

Purpose: Population-level tracking of post-stroke functional outcomes is critical to guide interventions that reduce the burden of stroke-related disability. However, functional outcomes are often missing or documented in unstructured notes. We developed a natural language processing (NLP) model that reads electronic health record (EHR) notes to automatically determine the modified Rankin Scale (mRS) score.

Method: We included consecutive patients (⩾18 years) with acute stroke admitted to our center (2015–2024). mRS scores were obtained from the Get With the Guidelines registry and clinical notes (if documented), and used as the gold standard to compare against NLP-generated scores. We used text-based features from notes, along with age, sex, discharge status, and outpatient follow-up, to train a logistic regression model to predict good (0–2) versus poor (3–6) mRS, and a linear regression model to predict the full range of mRS scores. The models were trained to predict mRS at hospital discharge and post-discharge, and were externally validated in a dataset of patients with brain injuries from a different healthcare center.

Findings: We included 5307 patients: 5006 in the training and test sets and 301 in the validation set. Average age was 69 (SD 15) and 65 (SD 17) years, respectively; 47% were female. The logistic regression achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 [CI 0.93–0.95] (test) and 0.94 [0.91–0.96] (validation), and the linear model achieved a root mean squared error (RMSE) of 0.91 [0.87–0.94] (test) and 1.17 [1.06–1.28] (validation).

Discussion and Conclusion: The NLP-based model is suitable for use in large-scale phenotyping of stroke functional outcomes and population health research.
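The two evaluation metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`dichotomize_mrs`, `auroc`, `rmse`) and the toy scores are hypothetical, and the abstract does not say how the confidence intervals were computed, so none are estimated here. The sketch only shows how an mRS score is split into good (0–2) versus poor (3–6), how AUROC can be computed as a pairwise ranking probability, and how RMSE is computed for full-range predictions.

```python
import math


def dichotomize_mrs(score):
    """Return 1 for poor outcome (mRS 3-6) and 0 for good outcome (mRS 0-2)."""
    if not 0 <= score <= 6:
        raise ValueError(f"mRS must be between 0 and 6, got {score}")
    return 1 if score >= 3 else 0


def auroc(labels, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (invented for illustration, not from the study).
gold_mrs = [0, 1, 2, 3, 4, 6]                    # gold-standard mRS
prob_poor = [0.1, 0.2, 0.6, 0.4, 0.9, 0.8]       # hypothetical classifier output
predicted_mrs = [0.5, 1.0, 2.5, 2.0, 4.0, 5.0]   # hypothetical regression output

gold_labels = [dichotomize_mrs(s) for s in gold_mrs]
print("AUROC:", round(auroc(gold_labels, prob_poor), 3))
print("RMSE:", round(rmse(gold_mrs, predicted_mrs), 3))
```

The pairwise AUROC is quadratic in the number of patients, which is fine for illustration; for a cohort of thousands, a rank-based implementation such as the one in a standard statistics library would be the usual choice.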

Keywords:

Logistic regression; Receiver operating characteristic; Medicine; Stroke (engine); Medical record; Gold standard (test); Population; Modified Rankin Scale; Linear regression; Test (biology); Psychological intervention; Physical therapy; Machine learning; Computer science; Surgery; Internal medicine; Ischemic stroke

Metrics

FWCI (Field Weighted Citation Impact): 4.60

Citation Normalized Percentile: 0.79

Topics

Acute Ischemic Stroke Management

Health Sciences → Medicine → Epidemiology

Machine Learning in Healthcare

Physical Sciences → Computer Science → Artificial Intelligence

Stroke Rehabilitation and Recovery

Health Sciences → Medicine → Rehabilitation
