JOURNAL ARTICLE

Natural Language Processing Improves Identification of Colorectal Cancer Testing in the Electronic Medical Record

Abstract

Background. Difficulty identifying patients in need of colorectal cancer (CRC) screening contributes to low screening rates.

Objective. To use electronic health record (EHR) data to identify patients with prior CRC testing.

Design. A clinical natural language processing (NLP) system was modified to identify 4 CRC tests (colonoscopy, flexible sigmoidoscopy, fecal occult blood testing, and double contrast barium enema) within electronic clinical documentation. The system interpreted text phrases in clinical notes that referenced CRC tests to determine whether testing was planned or completed and to estimate the date of completed tests.

Setting. Large academic medical center.

Patients. 200 patients ≥50 years old who had completed ≥2 non-acute primary care visits within a 1-year period.

Measures. Recall and precision of the NLP system, billing records, and human chart review were compared to a reference standard of human review of all available information sources.

Results. For identification of all CRC tests, recall and precision were as follows: NLP system (recall 93%, precision 94%), chart review (recall 74%, precision 98%), and billing records review (recall 44%, precision 83%). Recall and precision for identification of patients in need of screening were: NLP system (recall 95%, precision 88%), chart review (recall 99%, precision 82%), and billing records (recall 99%, precision 67%).

Limitations. Small sample size and requirement for a robust EHR.

Conclusions. Applying NLP to EHR data detected more CRC tests than either manual chart review or billing records review alone. For identifying patients who were due for CRC screening, NLP had better precision but marginally lower recall than billing records review.
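As an illustration of the two measures reported in the abstract, the following minimal sketch computes precision and recall for a set of identified CRC tests against a reference standard. The patient identifiers, test labels, the `TestEvent` type, and the `precision_recall` helper are all hypothetical and chosen only for this example; they are not taken from the study, and the toy data do not reproduce its results.

```python
from typing import Set, Tuple

# Hypothetical representation: one (patient_id, test_name) pair per CRC test.
TestEvent = Tuple[str, str]


def precision_recall(identified: Set[TestEvent],
                     reference: Set[TestEvent]) -> Tuple[float, float]:
    """Return (precision, recall) of `identified` against `reference`.

    precision = true positives / everything the method identified
    recall    = true positives / everything in the reference standard
    """
    true_positives = len(identified & reference)
    precision = true_positives / len(identified) if identified else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy reference standard (assumed data, not from the study).
reference = {
    ("p1", "colonoscopy"),
    ("p2", "fecal occult blood testing"),
    ("p3", "flexible sigmoidoscopy"),
    ("p4", "colonoscopy"),
}

# Toy output of one identification method: it misses p4's colonoscopy
# and reports a colonoscopy for p5 that the reference does not contain.
identified = {
    ("p1", "colonoscopy"),
    ("p2", "fecal occult blood testing"),
    ("p3", "flexible sigmoidoscopy"),
    ("p5", "colonoscopy"),
}

precision, recall = precision_recall(identified, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four identified tests are correct and three of the four reference tests are found, so a method can miss tests (lower recall) and report spurious ones (lower precision) independently, which is why the abstract reports both figures for each source.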

Keywords:
Medicine; Medical record; Colonoscopy; Chart; Recall; Fecal occult blood; Sigmoidoscopy; Precision and recall; Colorectal cancer; Artificial intelligence; Natural language processing; Medical physics; Cancer; Internal medicine; Computer science

Metrics

Cited By: 81
FWCI (Field Weighted Citation Impact): 2.39
Refs: 37
Citation Normalized Percentile: 0.88

Topics

Colorectal Cancer Screening and Detection
Health Sciences → Medicine → Oncology
Biomedical Text Mining and Ontologies
Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology
Global Cancer Incidence and Screening
Health Sciences → Medicine → Oncology