Ya-Yun Yeh, Hsin-Yueh Lin, Jingchuan Guo, Ramon C Sun, S. Brian Jiang, Jiang Bian, Hao Dai
Abstract

Objectives: Electronic health records (EHRs) rarely capture dietary detail, which limits diet–disease research. We aimed to develop machine learning (ML) computable phenotypes that identify a high-fat diet (HFD) using variables typically available in EHRs.

Materials and Methods: We used National Health and Nutrition Examination Survey (NHANES) 1999-2020 data, in which 24-hour dietary recall served as the ground truth. Dietary fat intake was summarized into a score (0-30) based on the percentage of energy from fat, carbohydrate, and protein; lower scores indicated HFD. We defined HFD at cutoffs of 10, 15, and 20 and trained ML models (Extreme Gradient Boosting, logistic regression, and random forest) using EHR-compatible variables (demographics, comorbidities, laboratory results, and anthropometrics). Model interpretability was assessed using Shapley Additive Explanations (SHAP). To evaluate clinical relevance, we compared cancer associations estimated with ML-predicted versus true diet labels.

Results: Machine learning models classified HFD with good performance, which was strongest at the broader HFD definitions. At cutoff 20, random forest achieved an F1-score of 0.79 (recall 0.74, precision 0.84). Key predictors included race/ethnicity, triglycerides, obesity metrics (body mass index and derived indices), and metabolic panel results.

Discussion: These findings indicate that dietary patterns, though seldom recorded in EHRs, can be inferred from routinely available variables. The ability of ML-derived phenotypes to reproduce known diet–disease relationships underscores their epidemiologic validity. The top predictors also align with established biological pathways linking obesity, lipid metabolism, and cancer risk, which supports their biological plausibility.

Conclusion: A high-fat dietary pattern can be inferred from EHR-compatible variables using ML-based phenotyping. This approach offers a scalable tool for integrating diet into EHR-based research and precision medicine.
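The abstract reports a score-based HFD label at several cutoffs and evaluates the classifiers with precision, recall, and F1. The sketch below shows how such binary labels and metrics could be computed. It rests on assumptions not stated in the abstract: the exact formula for the 0-30 fat score is not given, so the score is taken as an input, and HFD is assumed to mean "score at or below the cutoff" (consistent with lower scores indicating HFD). The helper names `hfd_label`, `precision_recall_f1`, and `f1_from_pr` are hypothetical. The final line checks that the reported random forest F1 (0.79) is consistent with its reported precision (0.84) and recall (0.74), since F1 is their harmonic mean.

```python
"""Hypothetical sketch: binary HFD labels from a diet-fat score, plus F1 evaluation.

Assumptions (not from the source): HFD is defined as score <= cutoff, and the
0-30 score has already been computed from percent energy intake.
"""
from typing import Sequence, Tuple

CUTOFFS = (10, 15, 20)  # cutoffs named in the abstract


def hfd_label(score: float, cutoff: int) -> int:
    """Return 1 (HFD) if the score is at or below the cutoff, else 0 (assumed rule)."""
    if not 0 <= score <= 30:
        raise ValueError("score must lie in [0, 30]")
    return int(score <= cutoff)


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive (HFD = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


if __name__ == "__main__":
    scores = [8, 12, 18, 25]  # illustrative scores, not NHANES data
    for c in CUTOFFS:
        # A higher cutoff labels more people as HFD, i.e. a broader definition.
        print(c, [hfd_label(s, c) for s in scores])
    # Consistency check on the reported random forest metrics at cutoff 20.
    print(round(f1_from_pr(0.84, 0.74), 2))  # 0.79
```

With these illustrative scores, cutoff 10 labels one person as HFD and cutoff 20 labels three, which is why higher cutoffs act as broader definitions under the assumed rule. In practice a library routine such as scikit-learn's `f1_score` would replace the hand-written metric helper.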
Jenna Wong, Mara E. Murray Horwitz, Li Zhou, Sengwee Toh
Sara K. Tedeschi, Tianrun Cai, Zhe He, Yuri Ahuja, Chuan Hong, Katherine A. Yates, Kumar Dahal, Chang Xu, Houchen Lyu, Kazuki Yoshida, Daniel H. Solomon, Tianxi Cai, Katherine P. Liao
Milena Gianfrancesco, Suzanne Tamang, Jinoos Yazdany, Gabriela Schmajuk
Michael Burns, Michael R. Mathis, John Vandervest, Xinyu Tan, Bo Lu, Douglas A. Colquhoun, Nirav Shah, Sachin Kheterpal, Leif Saager