Zijian Guo, Xiudi Li, Larry Han, Tianxi Cai
Synthesizing information from multiple data sources is critical to ensure knowledge generalizability. Integrative analysis of multi-source data is challenging due to the heterogeneity across sources and data-sharing constraints. In this paper, we consider a general robust inference framework for federated meta-learning of data from multiple sites, enabling statistical inference for the prevailing model, defined as the one matching the majority of the sites. Statistical inference for the prevailing model is challenging since it requires a data-adaptive mechanism to select eligible sites and subsequently account for the selection uncertainty. We propose a novel sampling method to address the additional variation arising from the selection. Our devised confidence interval does not require sites to share individual-level data and is shown to be valid without requiring the selection of eligible sites to be error-free. The proposed robust inference for federated meta-learning (RIFL) methodology is broadly applicable and illustrated with three inference problems: aggregation of parametric models, high-dimensional prediction models, and inference for average treatment effects. We use RIFL to perform federated learning of mortality risk for patients hospitalized with COVID-19 using real-world EHR data from 15 healthcare centers representing 274 hospitals across four countries.
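As a toy illustration of the "prevailing model" idea, the sketch below picks the sites whose estimates agree with a majority of all sites, using only site-level summaries (a point estimate and a standard error per site). It is not the RIFL procedure: the pairwise z-test rule, the function names `majority_sites` and `pooled_estimate`, and the numbers are all assumptions made for illustration. The pooled interval it yields ignores the uncertainty in selecting sites, which is the gap the abstract says the proposed sampling method addresses.

```python
import math


def majority_sites(estimates, std_errors, z=1.96):
    """Return indices of sites whose estimate agrees with more than half of all sites.

    Two sites "agree" when their difference is within z standard errors
    of the difference (a hypothetical rule, not taken from the paper).
    """
    n = len(estimates)
    selected = []
    for i in range(n):
        agree = 0
        for j in range(n):
            diff = abs(estimates[i] - estimates[j])
            se_diff = math.sqrt(std_errors[i] ** 2 + std_errors[j] ** 2)
            if diff <= z * se_diff:
                agree += 1  # counts the site itself as well
        if agree > n / 2:
            selected.append(i)
    return selected


def pooled_estimate(estimates, std_errors, sites):
    """Inverse-variance weighted estimate and its naive standard error."""
    weights = [1.0 / std_errors[i] ** 2 for i in sites]
    total = sum(weights)
    est = sum(w * estimates[i] for w, i in zip(weights, sites)) / total
    return est, math.sqrt(1.0 / total)


# Made-up summaries for five sites; the last one departs from the rest.
estimates = [1.0, 1.1, 0.9, 1.05, 3.0]
std_errors = [0.1, 0.1, 0.1, 0.1, 0.1]

sites = majority_sites(estimates, std_errors)
est, se = pooled_estimate(estimates, std_errors, sites)
print(sites, round(est, 4), round(se, 4))
```

Only the summaries leave each site, so no individual-level data is shared. The interval `est ± 1.96 * se` is valid only if the selection step made no mistakes, which is exactly the assumption the abstract says the proposed confidence interval does not need.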
Huan Zhang, Yuxiang Chen, Kuanching Li, Yuhui Li, Sisi Zhou, Wei Liang, Aneta Poniszewska-Marańda
C. L. Fu, Yuwen Pu, Zhang Qiao, Jiayu Pan, Jing Qiu, Xuhong Zhang, Yiming Wu, Shouling Ji
Farnaz Tahmasebian, Jian Lou, Li Xiong
Bruce Li, Stella Ho, Youyang Qu, Chenhao Xu, Tom H. Luan, Longxiang Gao