JOURNAL ARTICLE

Interpretable Phenotyping for Electronic Health Records

Abstract

Datasets from Electronic Health Records (EHRs) are increasingly large and complex, creating challenges in their use for predictive modeling. The two major challenges are large scale and high dimensionality. One common way to address the large-scale challenge is through the use of data phenotypes: clinically relevant characteristic groupings that can be expressed as logical queries (e.g., "senior patients with diabetes"). With the increasing use of machine learning across the continuum of care, phenotypes play an important role in modeling for population management, clinical trials, observational and interventional research, and quality measures. Yet phenotype interpretation is often difficult and can require post-hoc clarification from experienced clinicians. For example, detailed analysis may be needed to find that all patients in a phenotype are diabetic seniors with complications from previous surgery. Moreover, the high-dimensionality problem is often addressed, either separately or simultaneously with phenotyping, by dimension reduction methods that may further hinder interpretability. In this paper, we introduce the notion of interpretable data phenotypes generated by an unsupervised learning technique. The methods are designed to disambiguate relative feature memberships, thus facilitating general clinical validation and alleviating the problem of high dimensionality. The empirical study applies the proposed unsupervised interpretable phenotyping method to a real-world healthcare dataset (MIMIC), then uses hospital length of stay as a reference prediction task. The results demonstrate that the proposed method produces phenotypes with improved interpretability without diminishing the quality of prediction results.
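
To make the idea of a phenotype "expressed as a logical query" concrete, the following is a minimal sketch, not the paper's method. It encodes the abstract's example, "senior patients with diabetes", as a boolean predicate over toy patient records. The `Patient` class, the field names, the diagnosis labels, and the age cutoff of 65 are all assumptions made for illustration; none of them come from the paper or from MIMIC.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical record layout, chosen only for this illustration.
    patient_id: int
    age: int
    diagnoses: set = field(default_factory=set)


def senior_with_diabetes(p, senior_age=65):
    # The phenotype as a logical query: (age >= cutoff) AND (has diabetes).
    # The cutoff of 65 is an assumed definition of "senior".
    return p.age >= senior_age and "diabetes" in p.diagnoses


def select_phenotype(patients, predicate):
    # Return the ids of all patients that satisfy the phenotype predicate.
    return [p.patient_id for p in patients if predicate(p)]


# Toy data, invented for the example.
patients = [
    Patient(1, 72, {"diabetes", "hypertension"}),
    Patient(2, 58, {"diabetes"}),
    Patient(3, 81, {"asthma"}),
    Patient(4, 67, {"diabetes"}),
]

cohort = select_phenotype(patients, senior_with_diabetes)
print(cohort)  # [1, 4]
```

A phenotype written this way is interpretable by construction, because the query itself states who belongs to the group. The interpretability problem described in the abstract arises when groupings are produced by a learning method and a query of this kind has to be recovered afterwards.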

Keywords:
Health records, Computer science, Electronic health record, Data science, Artificial intelligence, Information retrieval, Health care, Political science

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.14
Refs: 31
Citation Normalized Percentile: 0.56

Topics

Machine Learning in Healthcare
Physical Sciences →  Computer Science →  Artificial Intelligence
Biomedical Text Mining and Ontologies
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Artificial Intelligence in Healthcare
Health Sciences →  Health Professions →  Health Information Management
