Abstract

Imputation of missing data in high-dimensional datasets, where the number of variables P far exceeds the number of samples N (P≫N), is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill-conditioned and cannot be properly estimated. For fully conditional imputation, the regression models for imputation cannot include all the variables. Thus, high-dimensional data require special imputation approaches. In this paper, we provide an overview and realistic comparisons of imputation approaches for high-dimensional data when applied to a linear mixed modelling (LMM) framework. We examine approaches from three different classes using simulation studies: multiple imputation with penalized regression, multiple imputation with recursive partitioning and predictive mean matching, and multiple imputation with Principal Component Analysis (PCA). We illustrate the methods on a real case study in which a multivariate outcome, i.e., an extracted set of correlated biomarkers from human urine samples, was collected and monitored over time, and we discuss the proposed methods alongside more standard imputation techniques that could be applied by ignoring either the multivariate or the longitudinal dimension. Our simulations demonstrate the superiority of the recursive partitioning and predictive mean matching algorithm over the other methods in terms of bias, mean squared error, and coverage of the LMM parameter estimates when compared to those obtained from a data analysis without missingness, although it comes at the expense of high computational costs. It is worthwhile to reconsider much faster methodologies, such as the one relying on PCA.
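
The covariance problem mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `sample_covariance_rank`, the dimensions (N = 20, P = 100), and the use of simulated standard-normal data are all assumptions chosen for illustration. With P > N, the P×P sample covariance matrix has rank at most N − 1, so it is singular and cannot be inverted for a joint multivariate imputation model.

```python
import numpy as np


def sample_covariance_rank(n_samples, n_features, seed=0):
    """Return the shape and numerical rank of a sample covariance matrix.

    Hypothetical helper for illustration only: the data are simulated
    standard-normal draws, not the data analysed in the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    # rowvar=False: rows are samples, columns are variables.
    S = np.cov(X, rowvar=False)
    return S.shape, np.linalg.matrix_rank(S)


# N = 20 samples, P = 100 variables (P >> N).
shape, rank = sample_covariance_rank(20, 100)
print("covariance shape:", shape)
print("numerical rank:", rank, "(cannot exceed N - 1 = 19)")
```

Because centering removes one degree of freedom, the rank is bounded by N − 1 regardless of P, which is one way to see why the approaches compared in the paper rely on penalization, tree-based models, or dimension reduction rather than on an unrestricted covariance estimate.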

Keywords:
Imputation (statistics); Missing data; Multivariate statistics; Principal component analysis; Covariance; Regression; Data set; Linear regression

Topics

Statistical Methods and Bayesian Inference
Physical Sciences →  Mathematics →  Statistics and Probability
Bayesian Methods and Mixture Models
Physical Sciences →  Computer Science →  Artificial Intelligence
Statistical Methods and Inference
Physical Sciences →  Mathematics →  Statistics and Probability

Related Documents

JOURNAL ARTICLE

Missing Data Imputation with High-Dimensional Data

Alberto Brini, Edwin R. van den Heuvel

Journal: The American Statistician Year: 2023 Vol: 78 (2) Pages: 240-252
JOURNAL ARTICLE

Missing Data Imputation in High Dimensional Data Set using Local Similarity

C. Nalini, J. Sudeeptha (Application development associate in Accenture, Chennai, India)

Journal: International Journal of Recent Technology and Engineering (IJRTE) Year: 2019 Vol: 8 (3) Pages: 8070-8074
JOURNAL ARTICLE

High-dimensional missing data imputation via undirected graphical model

Yoonah Lee, Seongoh Park

Journal: Statistics and Computing Year: 2024 Vol: 34 (5)
BOOK-CHAPTER

Missing Data Imputation

Chapman & Hall/CRC biostatistics series Year: 2011 Pages: 275-290