JOURNAL ARTICLE

Variable Selection in Nonparametric Regression with Categorical Covariates

Peter J. Bickel, Ping Zhang

Year: 1992 Journal: Journal of the American Statistical Association Vol: 87 (417) Pages: 90-97

Abstract

This article extends the problem of variable selection to a nonparametric regression model with categorical covariates. Two selection criteria are considered: the cross-validation (CV) criterion and the accumulated prediction error (APE) criterion. We find that, asymptotically, the CV criterion performs well only when the true model is infinite-dimensional, while the APE criterion is appropriate when the true model is finite-dimensional. This is very similar to the case of the linear regression model. A simulation study reveals some interesting small-sample properties of these criteria.

To be more specific, suppose that we have observations (X_1, Y_1), …, (X_n, Y_n) that are iid random vectors and X = (X(1), X(2), …), where the X(i)'s are categorical. We allow Y to be of any type. Now a new observation X has arrived and we want to predict the corresponding Y. Such a framework is more appropriate than regression with fixed covariates in situations where the covariates are observational rather than controlled. For instance, Y could be the time from HIV infection to developing clinical AIDS, and the covariates (mostly categorical or reducible to categorical) could be observations from blood tests, a physical examination, or further personal information, such as sexual practices, obtained from an interview. As another example, Y could be the premium of an insurance policy, with the covariates being the customer's general demographic information. Our goal is to select a subset of covariates that best predicts Y. We define the true model dimension as d_0 if the regression function E(Y|X(1), X(2), …) is a d_0-variate function.

The main conclusions of the article are:
(1) The popular CV criterion performs well only when d_0 = ∞.
(2) There exist other criteria that are more appropriate than CV when d_0 < ∞.
(3) There is no difference between conditional and unconditional prediction errors, as far as asymptotics are concerned.
(4) The selection range has to depend on the sample size. In fact, we argue that, for a given sample size n, we should only select models with the number of covariates not exceeding the order of magnitude of o(log n).
(5) The simulation study indicates that the CV criterion has nice small-sample properties.

Key Words: Cross-validation; Model selection; Prediction
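The two criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the estimator or the loss, so everything below is an assumption made for illustration: predictions are cell means of Y over the selected categorical covariates, the loss is squared error, an empty cell falls back to the overall mean, and the names `cell_key`, `loo_cv_score`, `ape_score`, `select_subset`, `simulate`, and the cap `max_size` are hypothetical. It is not the authors' implementation.

```python
import random
from collections import defaultdict
from itertools import combinations


def cell_key(x, subset):
    """Project one observation onto the covariates listed in `subset`."""
    return tuple(x[j] for j in subset)


def loo_cv_score(X, y, subset):
    """Leave-one-out CV mean squared error of the cell-mean predictor (assumed form)."""
    n, total = len(y), sum(y)
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, yi in zip(X, y):
        k = cell_key(xi, subset)
        sums[k] += yi
        counts[k] += 1
    err = 0.0
    for xi, yi in zip(X, y):
        k = cell_key(xi, subset)
        if counts[k] > 1:
            pred = (sums[k] - yi) / (counts[k] - 1)  # cell mean without point i
        else:
            pred = (total - yi) / (n - 1)  # assumed fallback: mean of all other points
        err += (yi - pred) ** 2
    return err / n


def ape_score(X, y, subset):
    """Accumulated squared error when y[t] is predicted from observations 0..t-1."""
    sums, counts = defaultdict(float), defaultdict(int)
    total, err = 0.0, 0.0
    for t, (xi, yi) in enumerate(zip(X, y)):
        k = cell_key(xi, subset)
        if t > 0:  # the first observation has no past to predict from
            pred = sums[k] / counts[k] if counts[k] else total / t
            err += (yi - pred) ** 2
        sums[k] += yi
        counts[k] += 1
        total += yi
    return err


def select_subset(X, y, score, max_size):
    """Exhaustive search over covariate subsets of size at most `max_size`."""
    p = len(X[0])
    candidates = [s for r in range(max_size + 1) for s in combinations(range(p), r)]
    return min(candidates, key=lambda s: score(X, y, s))


def simulate(n, p=4, seed=0):
    """Toy data (an assumption): binary covariates, Y depends only on the first two."""
    rng = random.Random(seed)
    X = [tuple(rng.randint(0, 1) for _ in range(p)) for _ in range(n)]
    y = [2.0 * x[0] - 3.0 * x[1] + rng.gauss(0.0, 0.5) for x in X]
    return X, y
```

Calling `select_subset(X, y, loo_cv_score, max_size=3)` or `select_subset(X, y, ape_score, max_size=3)` on data from `simulate(400)` returns the subset minimizing each criterion. The `max_size` cap is a stand-in for conclusion (4), which ties the selection range to the sample size.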

Keywords:
Covariate; Categorical variable; Statistics; Nonparametric statistics; Feature selection; Mathematics; Selection (genetic algorithm); Nonparametric regression; Econometrics; Regression analysis; Regression; Cross-sectional regression; Semiparametric regression; Computer science; Artificial intelligence; Bayesian multivariate linear regression

Metrics

Cited By: 33
FWCI (Field Weighted Citation Impact): 1.99
Refs: 17
Citation Normalized Percentile: 0.87

Topics

Advanced Statistical Methods and Models
Physical Sciences →  Mathematics →  Statistics and Probability
Statistical Methods and Inference
Physical Sciences →  Mathematics →  Statistics and Probability
Advanced Statistical Process Monitoring
Social Sciences →  Decision Sciences →  Statistics, Probability and Uncertainty

Related Documents

JOURNAL ARTICLE

Variable Selection in Nonparametric Regression with Continuous Covariates

Ping Zhang

Journal: The Annals of Statistics Year: 1991 Vol: 19 (4)
JOURNAL ARTICLE

Uniform convergence rates and automatic variable selection in nonparametric regression with functional and categorical covariates

Leonie Selk

Journal: Journal of Nonparametric Statistics Year: 2023 Vol: 36 (1) Pages: 264-286
JOURNAL ARTICLE

Nonparametric regression and classification with functional, categorical, and mixed covariates

Leonie Selk, Jan Gertheiss

Journal: Advances in Data Analysis and Classification Year: 2022 Vol: 17 (2) Pages: 519-543