Electronic health records (EHRs) are inherently irregular, with many missing values and varying time intervals between observations, because patient conditions and treatment needs vary. This irregularity makes health risk prediction challenging. EHRs contain longitudinal patient data that record a chronological set of clinical observations for each patient. Existing methods focus on modeling variable correlations in patient data with deep neural networks to impute missing values, and then feed the completed data matrices into machine learning models to perform downstream healthcare prediction tasks. However, these methods have paid little attention to the reliability of the imputed values. Further, the pattern of missing data in EHRs likely carries important information that affects the relationships among variables, including time intervals. We propose a novel deep imputation-prediction network that performs imputation and prediction simultaneously on EHR data. Our method can: 1) learn from longitudinal patient data in both forward and backward directions, 2) generate both predicted and imputed values while enhancing the reliability of the imputed values, and 3) incorporate three common decay functions to capture how input variables vary over time, adaptively enhancing the temporal representation of each pattern with adjustable weights. In addition, our method examines the associations between input variables to identify critical indicative variables regardless of how long ago the associated event happened. Experimental results on the MIMIC-III and eICU datasets demonstrate the effectiveness and superiority of our method for both imputation and prediction, as well as its transparency and interpretability, compared with existing state-of-the-art methods.
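To make the idea of combining several decay functions with adjustable weights more concrete, the following is a minimal sketch and not the implementation proposed here. The abstract does not name the three decay functions, so the exponential, reciprocal, and log-reciprocal forms below are assumptions chosen for illustration. The names `time_since_last_observation`, `decay_bank`, `mixed_decay`, `rate`, and `logits` are likewise hypothetical, and in a trained model the mixing weights would be learned rather than fixed.

```python
import numpy as np

def time_since_last_observation(mask, timestamps):
    """Return delta[t, d]: time elapsed at step t since variable d was last observed.

    mask[t, d] is 1 if variable d was observed at step t, else 0.
    timestamps[t] is the time of step t. The first step has delta = 0.
    """
    T, D = mask.shape
    delta = np.zeros((T, D))
    for t in range(1, T):
        gap = timestamps[t] - timestamps[t - 1]
        # Observed at t-1: the clock restarts at `gap`; otherwise it keeps accumulating.
        delta[t] = np.where(mask[t - 1] == 1, gap, delta[t - 1] + gap)
    return delta

def decay_bank(delta, rate=1.0):
    """Three candidate decay curves (assumed forms), each equal to 1 at delta = 0."""
    exponential = np.exp(-rate * delta)
    reciprocal = 1.0 / (1.0 + rate * delta)
    log_reciprocal = 1.0 / (1.0 + np.log1p(rate * delta))
    return np.stack([exponential, reciprocal, log_reciprocal])  # shape (3, T, D)

def mixed_decay(delta, logits, rate=1.0):
    """Blend the three curves with softmax weights derived from `logits`."""
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()
    # Weighted sum over the first axis of the bank gives a (T, D) decay factor.
    return np.tensordot(weights, decay_bank(delta, rate), axes=1)
```

A decay factor of this kind could, for example, down-weight a carried-forward measurement as the gap since its last observation grows; how the proposed network actually uses the factor is not specified in the abstract.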
Yu-xi Liu, Shaowen Qin, Zhenhao Zhang, Wei Shao
Yu-xi Liu, Zhenhao Zhang, Shaowen Qin
Benjamin A. Goldstein, Ann Marie Návar, Michael Pencina