Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives

Alaa Albashayreh; Anindita Bandyopadhyay; Nahid Zeinali; Min Zhang; Weiguo Fan; Stephanie Gilbertson‐White

doi:10.1200/cci.23.00235

JOURNAL ARTICLE

Year: 2024; Journal: JCO Clinical Cancer Informatics; Vol: 8 (8); Pages: e2300235-e2300235; Publisher: Lippincott Williams & Wilkins

Abstract

PURPOSE Identifying cancer symptoms in electronic health record (EHR) narratives is feasible with natural language processing (NLP). However, more efficient NLP systems are needed to detect various symptoms and distinguish observed symptoms from negated symptoms and medication-related side effects. We evaluated the accuracy of NLP in (1) detecting 14 symptom groups (ie, pain, fatigue, swelling, depressed mood, anxiety, nausea/vomiting, pruritus, headache, shortness of breath, constipation, numbness/tingling, decreased appetite, impaired memory, disturbed sleep) and (2) distinguishing observed symptoms in EHR narratives among patients with cancer.

METHODS We extracted 902,508 notes for 11,784 unique patients diagnosed with cancer and developed a gold standard corpus of 1,112 notes labeled for presence or absence of 14 symptom groups. We trained an embeddings-augmented NLP system integrating human and machine intelligence and conventional machine learning algorithms. NLP metrics were calculated on a gold standard corpus subset for testing.

RESULTS The interannotator agreement for labeling the gold standard corpus was excellent at 92%. The embeddings-augmented NLP model achieved the best performance (F1 score = 0.877). The highest NLP accuracy was observed in pruritus (F1 score = 0.937) while the lowest accuracy was in swelling (F1 score = 0.787). After classifying the entire data set with embeddings-augmented NLP, we found that 41% of the notes included symptom documentation. Pain was the most documented symptom (29% of all notes) while impaired memory was the least documented (0.7% of all notes).

CONCLUSION We illustrated the feasibility of detecting 14 symptom groups in EHR narratives and showed that an embeddings-augmented NLP system outperforms conventional machine learning algorithms in detecting symptom information and differentiating observed symptoms from negated symptoms and medication-related side effects.
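The per-symptom F1 scores reported above can be illustrated with a minimal sketch. It assumes each note carries a binary present/absent label per symptom group, as the gold standard corpus is described. The abstract does not say how the overall score of 0.877 was averaged, so the macro-average here is an assumption, and the function names and toy labels are hypothetical, not the authors' data or code.

```python
from typing import Dict, List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_label_f1(
    gold: Dict[str, List[int]], pred: Dict[str, List[int]]
) -> Dict[str, float]:
    """Compute one F1 score per symptom group (label)."""
    return {label: f1_score(gold[label], pred[label]) for label in gold}


# Toy labels for four hypothetical notes (1 = symptom observed, 0 = not).
# These values are illustrative only and do not come from the study.
gold = {"pain": [1, 1, 0, 1], "pruritus": [0, 1, 0, 0]}
pred = {"pain": [1, 0, 0, 1], "pruritus": [0, 1, 0, 0]}

scores = per_label_f1(gold, pred)

# Macro-average: unweighted mean of the per-label scores (an assumption;
# the abstract does not state which averaging was used).
macro_f1 = sum(scores.values()) / len(scores)
```

In this sketch a note that mentions a symptom only as negated or as a medication side effect would carry a gold label of 0, so a system that flags it anyway is counted as a false positive and its F1 drops.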

Keywords:

Nausea; Mood; Artificial intelligence; Medicine; Narrative; Natural language processing; Computer science; Internal medicine; Clinical psychology

Metrics

FWCI (Field Weighted Citation Impact): 7.03

Citation Normalized Percentile: 0.95

Topics

Machine Learning in Healthcare

Physical Sciences → Computer Science → Artificial Intelligence

Biomedical Text Mining and Ontologies

Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Natural Language Processing Techniques for Electronic Health Record Analysis

Developing Natural Language Processing to Extract Complementary and Integrative Health Information from Electronic Health Record Data

Natural Language Processing for Electronic Health Record Optimization in Android Applications

O3.17: Natural language processing of electronic health record data can accurately identify hospitalized elderly patients with venous thromboembolism (VTE)

Enhancing the National Cancer Database content using natural language processing and electronic health record data