JOURNAL ARTICLE

Nearest neighbour imputation and variance estimation methods.

Murthy Mittinty

Year: 2004 Journal: University of Canterbury Research Repository (University of Canterbury) Publisher: University of Canterbury

Abstract

In large-scale surveys, non-response is common. It can be of two types: unit non-response and item non-response. In this thesis we deal with item non-response, because the other responses from the survey unit can be used for adjustment. Non-response adjustment is usually carried out in one of three ways: weighting, imputation, or no adjustment. Imputation is the most commonly used method, either as single imputation or as multiple imputation. In this thesis we study single imputation, in particular nearest neighbour methods, and we have developed a new method. Our method is based on dissimilarity measures; it is nonparametric and handles both categorical and continuous covariates without requiring any transformations. One drawback of this method is that it is relatively computationally intensive, so we investigated data reduction methods. For data reduction we developed a new method that uses propensity scores. The propensity score was chosen because its properties suggest that it would be a good basis for matching respondents and non-respondents. We also looked at subset selection of the covariates using graphical modelling and principal component analysis. We found that the data reduction methods gave results as good as those obtained using all variables, with a considerable reduction in computation time, especially for the propensity score method.

Because imputed values are not true values, standard variance estimation methods would underestimate the variance of the estimator of the parameter of interest unless allowance is made for the extra uncertainty introduced by imputation. We examined various existing methods of variance estimation, particularly the bootstrap, because both nearest neighbour imputation and the bootstrap are nonparametric. The bootstrap is also a unified method of variance estimation for smooth as well as non-smooth parameters. Shao and Sitter (1996) proposed a bootstrap method, but this method has problems in some extreme situations.
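The abstract says the new nearest neighbour method uses a dissimilarity measure that handles categorical and continuous covariates without transformation, but it does not give the measure itself. As a minimal sketch, assuming a Gower-style dissimilarity (range-scaled absolute difference for continuous covariates, simple mismatch for categorical ones, averaged over covariates), single nearest neighbour imputation of one item might look like this. The function names `gower_dissimilarity` and `nn_impute` are hypothetical, and this is not presented as the author's method.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def gower_dissimilarity(a: Sequence[Value], b: Sequence[Value],
                        ranges: Sequence[float],
                        is_categorical: Sequence[bool]) -> float:
    """Assumed Gower-style dissimilarity in [0, 1]: mean of per-covariate distances."""
    total = 0.0
    for j, (u, v) in enumerate(zip(a, b)):
        if is_categorical[j]:
            total += 0.0 if u == v else 1.0  # simple mismatch for categories
        elif ranges[j] > 0:
            total += abs(u - v) / ranges[j]  # range-scaled absolute difference
    return total / len(a)


def nn_impute(covariates: Sequence[Sequence[Value]],
              y: Sequence[Optional[float]],
              is_categorical: Sequence[bool]) -> List[Optional[float]]:
    """Replace each missing y (None) by the y of the least dissimilar respondent."""
    # Covariates are assumed fully observed (item non-response on y only).
    ranges = []
    for j, cat in enumerate(is_categorical):
        col = [row[j] for row in covariates]
        ranges.append(0.0 if cat else max(col) - min(col))
    donors = [i for i, v in enumerate(y) if v is not None]
    imputed = list(y)
    for i, v in enumerate(y):
        if v is None:
            # min() keeps the first donor on ties, i.e. the lowest index wins
            best = min(donors, key=lambda d: gower_dissimilarity(
                covariates[i], covariates[d], ranges, is_categorical))
            imputed[i] = y[best]
    return imputed


X = [(1.0, "a"), (2.0, "b"), (10.0, "a"), (1.5, "a"), (9.0, "b")]
y = [5.0, 7.0, 20.0, None, None]
print(nn_impute(X, y, [False, True]))  # [5.0, 7.0, 20.0, 5.0, 7.0]
```

In this toy example the last non-respondent receives 7.0 rather than 20.0: its continuous value is closer to 10.0, but the categorical mismatch with that donor outweighs the smaller numeric gap. Every non-respondent is compared with every respondent, which illustrates why the abstract describes the approach as computationally intensive.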
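For the propensity score data reduction, the abstract does not say how the score is estimated or how matching is done. The sketch below assumes a logistic regression of the response indicator on numeric covariates, fitted by plain batch gradient ascent (categorical covariates would need dummy coding first), followed by matching each non-respondent to the respondent with the closest estimated score. All function names are hypothetical; this only illustrates the general idea of reducing the covariates to one scalar before nearest neighbour matching.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w: Sequence[float], x: Sequence[float]) -> float:
    """w[0] is the intercept; w[1:] are the slopes."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Sequence[Sequence[float]], r: Sequence[int],
                 lr: float = 0.1, iters: int = 5000) -> List[float]:
    """Fit P(r = 1 | x) = sigmoid(w0 + w.x) by batch gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, ri in zip(X, r):
            resid = ri - sigmoid(linear_predictor(w, xi))
            grad[0] += resid
            for j, xj in enumerate(xi):
                grad[j + 1] += resid * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity_nn_impute(X: Sequence[Sequence[float]],
                         y: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Impute each missing y (None) from the respondent with the closest estimated propensity score."""
    r = [0 if v is None else 1 for v in y]  # response indicator
    w = fit_logistic(X, r)
    score = [sigmoid(linear_predictor(w, xi)) for xi in X]
    donors = [i for i, ri in enumerate(r) if ri == 1]
    return [v if v is not None
            else y[min(donors, key=lambda d: abs(score[d] - score[i]))]
            for i, v in enumerate(y)]


X = [[0.0], [1.0], [2.0], [3.0], [2.0], [3.0]]
y = [10.0, 11.0, 12.0, 13.0, None, None]
print(propensity_nn_impute(X, y))  # [10.0, 11.0, 12.0, 13.0, 12.0, 13.0]
```

Once the scores are computed, each match is a search over one number instead of a multivariate dissimilarity, which is one plausible reason for the speed-up the abstract reports. Sorting the donor scores and using binary search would make each lookup logarithmic; the linear scan here is kept for clarity.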
We have modified the bootstrap method of Shao and Sitter to overcome its problems in extreme situations, and simulations indicate that both methods give good results. The conclusions from the study are that our new multivariate nearest neighbour method is at least as good as regression-based nearest neighbour imputation, and often better. For large data sets, data reduction may be desirable, and we recommend our propensity score method because it was observed to be the fastest of the subset selection methods and has some other advantages over the others. Imputing with any of the subset selection methods we examined appears to give results similar to imputing with all covariates. To estimate the variance of estimators computed from imputed data, we recommend the method proposed by Shao and Sitter or our modification of it.
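The key idea behind the Shao and Sitter (1996) bootstrap, as we understand it, is that each bootstrap sample keeps the non-respondents flagged as missing, and their values are re-imputed using only the respondents in that bootstrap sample, so the spread of the bootstrap estimates reflects imputation uncertainty. The sketch below assumes simple random sampling with replacement from a single population (no strata, clusters or weights), 1-NN imputation on one covariate, and the sample mean as the estimator. It does not include the thesis's modification, which the abstract does not describe, and the function names are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence


def nn_impute_1d(x: Sequence[float], y: Sequence[Optional[float]]) -> List[Optional[float]]:
    """1-NN imputation on a single covariate; None marks a missing y."""
    donors = [i for i, v in enumerate(y) if v is not None]
    return [v if v is not None
            else y[min(donors, key=lambda d: abs(x[d] - x[i]))]
            for i, v in enumerate(y)]


def bootstrap_variance_mean(x: Sequence[float], y: Sequence[Optional[float]],
                            B: int = 500, seed: int = 1) -> float:
    """Bootstrap variance of the imputed-data mean, re-imputing in every replicate."""
    if all(v is None for v in y):
        raise ValueError("at least one respondent is needed")
    rng = random.Random(seed)
    n = len(y)
    estimates: List[float] = []
    while len(estimates) < B:
        idx = [rng.randrange(n) for _ in range(n)]  # SRS with replacement
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]  # non-respondents stay missing (None)
        if all(v is None for v in yb):
            continue  # no donors in this resample, so draw again
        # Re-impute using only the respondents in the bootstrap sample.
        estimates.append(statistics.fmean(nn_impute_1d(xb, yb)))
    return statistics.variance(estimates)


x = [float(i) for i in range(10)]
y = [1.0, 2.2, None, 3.9, 5.1, None, 7.0, 8.1, None, 9.8]
naive = statistics.variance(nn_impute_1d(x, y)) / len(y)  # treats imputed values as observed
print(naive, bootstrap_variance_mean(x, y))
```

The `naive` figure is the textbook variance of a mean computed as if every imputed value had been observed, which is the kind of estimate the abstract warns can understate the variance. Because this toy setting is far simpler than a real survey design, the printed numbers are only for contrasting the two approaches, not for judging either method.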

Keywords:
Imputation (statistics), Statistics, Computer science, Variance (accounting), Artificial intelligence, Mathematics, Data mining, Missing data, Business

Related Documents

JOURNAL ARTICLE

Jackknife Variance Estimation for Nearest-Neighbor Imputation

Jiahua Chen, Jun Shao

Journal: Journal of the American Statistical Association Year: 2001 Vol: 96 (453) Pages: 260-269

JOURNAL ARTICLE

Convergence of random k-nearest-neighbour imputation

Fredrik A. Dahl

Journal: Computational Statistics & Data Analysis Year: 2006 Vol: 51 (12) Pages: 5913-5917

JOURNAL ARTICLE

Balanced k-nearest neighbour imputation

Caren Hasler, Yves Tillé

Journal: Statistics Year: 2016 Vol: 50 (6) Pages: 1310-1331

JOURNAL ARTICLE

Nearest neighbour imputation under single index models

Jun Shao, Lei Wang

Journal: Statistical Theory and Related Fields Year: 2019 Vol: 3 (2) Pages: 208-212