JOURNAL ARTICLE

Multiple imputation and analysis for high‐dimensional incomplete proteomics data

Abstract

Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case–control study of 135 incident cases of myocardial infarction and 135 pair‐matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case–control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of a simulation study to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction.

© 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
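
The shuffle, bin, impute and select loop summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (solve, fit_clr, forward_select, impute_bin, selection_frequency) are hypothetical, and the abstract does not say which imputation method was found optimal. As a stand-in, each protein in a bin is imputed by stochastic regression on the mean of the other observed proteins in that bin. This is an improper imputation, since regression parameters are not redrawn between imputations, and it assumes the proteins are standardized. Selection uses forward steps with a likelihood-ratio threshold of 3.84 (chi-square, 1 df, alpha = 0.05) in place of full stepwise selection. A protein counts as selected in a bin if it is chosen in at least half of the imputed data sets, which is also an assumed rule. For 1:1 matched pairs, the conditional logistic likelihood reduces to a no-intercept logistic model on case-minus-control differences, and fit_clr maximizes that likelihood.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [v] for row, v in zip(a, b)]  # augmented matrix
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[piv] = m[piv], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit_clr(diffs):
    """1:1 conditional logistic regression (no intercept) on case-minus-control diffs."""
    p = len(diffs[0])
    beta = [0.0] * p
    for _ in range(50):  # Newton-Raphson iterations
        grad = [0.0] * p
        info = [[1e-8 if r == c else 0.0 for c in range(p)] for r in range(p)]  # tiny ridge
        for d in diffs:
            pr = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, d))))
            for r in range(p):
                grad[r] += d[r] * (1.0 - pr)  # score of log sigmoid(beta . d)
                for c in range(p):
                    info[r][c] += pr * (1.0 - pr) * d[r] * d[c]  # negative Hessian
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    ll = -sum(math.log1p(math.exp(-sum(b * x for b, x in zip(beta, d)))) for d in diffs)
    return beta, ll


def forward_select(diffs, cols, crit=3.84):
    """Add the best-fitting protein while the LR statistic is >= crit (chi2_1, 5%)."""
    chosen, ll_cur = [], -len(diffs) * math.log(2.0)  # null model, beta = 0
    while True:
        best = None
        for j in cols:
            if j not in chosen:
                _, ll = fit_clr([[d[k] for k in chosen + [j]] for d in diffs])
                if best is None or ll > best[1]:
                    best = (j, ll)
        if best is None or 2.0 * (best[1] - ll_cur) < crit:
            return chosen
        chosen.append(best[0])
        ll_cur = best[1]


def impute_bin(X, cols, rng):
    """One stochastic-regression imputation of the bin's proteins (hypothetical stand-in)."""
    out = [row[:] for row in X]
    for j in cols:
        # Predictor: per-sample mean of the other observed proteins in this bin.
        z = []
        for row in X:
            vals = [row[c] for c in cols if c != j and row[c] is not None]
            z.append(sum(vals) / len(vals) if vals else 0.0)
        pts = [(z[i], X[i][j]) for i in range(len(X)) if X[i][j] is not None]
        mz, my = (sum(t) / len(pts) for t in zip(*pts))
        sxx = sum((zz - mz) ** 2 for zz, _ in pts)
        slope = sum((zz - mz) * (y - my) for zz, y in pts) / sxx if sxx > 0 else 0.0
        icpt = my - slope * mz
        sd = math.sqrt(sum((y - icpt - slope * zz) ** 2 for zz, y in pts) / max(len(pts) - 2, 1))
        for i, row in enumerate(out):
            if X[i][j] is None:  # fitted value plus a Gaussian residual draw
                row[j] = icpt + slope * z[i] + rng.gauss(0.0, sd)
    return out


def selection_frequency(X, bin_size, n_imp, n_shuffles, seed=0):
    """Share of shuffles selecting each protein; rows 2i/2i+1 = case/control, None = missing."""
    rng = random.Random(seed)
    K = len(X[0])
    hits = [0] * K
    for _ in range(n_shuffles):
        order = rng.sample(range(K), K)  # fresh random assignment of proteins to bins
        for start in range(0, K, bin_size):
            cols = order[start:start + bin_size]
            votes = dict.fromkeys(cols, 0)
            for _ in range(n_imp):
                Xi = impute_bin(X, cols, rng)
                diffs = [{k: Xi[2 * i][k] - Xi[2 * i + 1][k] for k in cols}
                         for i in range(len(X) // 2)]
                for j in forward_select(diffs, cols):
                    votes[j] += 1
            for j, v in votes.items():
                if 2 * v >= n_imp:  # assumed rule: chosen in >= half of the imputations
                    hits[j] += 1
    return [h / n_shuffles for h in hits]
```

For example, selection_frequency(X, bin_size=4, n_imp=3, n_shuffles=10) returns one selection frequency per protein. The bin size, number of imputations and number of shuffles are the tuning parameters that the abstract says were chosen by simulation; the values here are arbitrary. Calling impute_bin on complete data with some known values set to None, then comparing the imputed values with the truth, mimics the masking step used for that tuning.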
