JOURNAL ARTICLE

Testing for Associations with Missing High-Dimensional Categorical Covariates

Jennifer Schumi, A. Gregory DiRienzo, Victor DeGruttola

Year: 2008 Journal: The International Journal of Biostatistics Vol: 4 (1) Pages: Article 18 Publisher: De Gruyter

Abstract

Understanding how long-term clinical outcomes relate to short-term response to therapy is an important topic of research with a variety of applications. In HIV, early measures of viral RNA levels are known to be a strong prognostic indicator of future viral load response. However, mutations observed in the high-dimensional viral genotype at an early time point may change this prognosis. Unfortunately, some subjects may not have a viral genetic sequence measured at the early time point, and the sequence may be missing for reasons related to the outcome. Complete-case analyses of missing data are generally biased when the assumption that data are missing completely at random is not met, and methods incorporating multiple imputation may not be well-suited for the analysis of high-dimensional data. We propose a semiparametric multiple testing approach to the problem of identifying associations between potentially missing high-dimensional covariates and response. Following the recent exposition by Tsiatis, unbiased nonparametric summary statistics are constructed by inversely weighting the complete cases according to the conditional probability of being observed, given data that is observed for each subject. Resulting summary statistics will be unbiased under the assumption of missing at random. We illustrate our approach through an application to data from a recent AIDS clinical trial, and demonstrate finite sample properties with simulations.
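The inverse weighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ipw_mean`, the single categorical stratum variable, and the toy data are all hypothetical, chosen only to show how complete cases are reweighted by the inverse of their estimated probability of being observed. The sketch assumes missingness depends only on a fully observed categorical variable (a simple missing-at-random setting) and that every stratum contains at least one complete case. It covers only the weighted summary statistic, not the multiple testing procedure.

```python
from collections import defaultdict


def ipw_mean(values, strata):
    """Inverse-probability-weighted mean of a covariate with missing entries.

    values: list of numbers, with None marking a missing measurement.
    strata: list of fully observed category labels, one per subject.
    Hypothetical helper for illustration only.
    """
    n_total = defaultdict(int)  # subjects per stratum
    n_obs = defaultdict(int)    # complete cases per stratum
    for v, s in zip(values, strata):
        n_total[s] += 1
        if v is not None:
            n_obs[s] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for v, s in zip(values, strata):
        if v is None:
            continue  # only complete cases contribute
        # Estimated probability of being observed, given the stratum.
        p_obs = n_obs[s] / n_total[s]
        w = 1.0 / p_obs
        weighted_sum += w * v
        weight_total += w
    # Normalizing by the total weight gives a weighted average.
    return weighted_sum / weight_total


# Toy data (assumed): a binary mutation indicator, missing for half of
# the subjects in the "high" stratum.
values = [0, 0, 0, 1, 1, 1, None, None]
strata = ["low"] * 4 + ["high"] * 4

observed = [v for v in values if v is not None]
complete_case = sum(observed) / len(observed)
weighted = ipw_mean(values, strata)
print(complete_case, weighted)
```

In this toy data the complete-case mean under-represents the "high" stratum, where values are more often missing; the weighted mean counts each observed "high" subject twice to compensate.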

Keywords:
Missing data, Categorical variable, Imputation (statistics), Covariate, Statistics, Inverse probability weighting, Weighting, Econometrics, Nonparametric statistics, Mathematics, Computer science, Medicine, Estimator

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.27
Refs: 14
Citation Normalized Percentile: 0.60

Topics

Statistical Methods in Clinical Trials
Physical Sciences →  Mathematics →  Statistics and Probability
Bayesian Methods and Mixture Models
Physical Sciences →  Computer Science →  Artificial Intelligence
HIV/AIDS drug development and treatment
Health Sciences →  Medicine →  Infectious Diseases

Related Documents

JOURNAL ARTICLE

Logistic Models with Missing Categorical Covariates

Jeremiah Rounds

Journal: Utah State Research and Scholarship (Utah State University) Year: 2021

JOURNAL ARTICLE

Testing endogeneity with high dimensional covariates

Zijian Guo, Hyunseung Kang, Tommaso Cai, Dylan S. Small

Journal: Journal of Econometrics Year: 2018 Vol: 207 (1) Pages: 175-187

JOURNAL ARTICLE

Feature screening for ultrahigh dimensional categorical data with covariates missing at random

Lyu Ni, Fang Fang, Jun Shao

Journal: Computational Statistics & Data Analysis Year: 2019 Vol: 142 Pages: 106824

JOURNAL ARTICLE

Bias correction in logistic regression with missing categorical covariates

Ujjwal Das, Tapabrata Maiti, Vivek Pradhan

Journal: Journal of Statistical Planning and Inference Year: 2010 Vol: 140 (9) Pages: 2478-2485