Abstract This paper addresses variance estimation in high-dimensional truncated linear models. In ultra-high-dimensional regression, variance estimation is crucial for statistical inference and model selection, yet traditional variance estimation techniques break down when the sample size is smaller than the number of variables. We propose an improved variance estimation method that combines Refitted Cross-Validation (RCV) with three semiparametric estimators for truncated linear regression: the Symmetric Trimmed Least Squares (STLS), Quadratic Mode Estimation (QME), and Left Truncation (LT) estimators. Theoretical analysis and simulation studies show that the combined methods both mitigate the impact of high spurious correlations from irrelevant variables and improve the accuracy and robustness of variance estimation. Detailed numerical simulations provide concrete evidence of their effectiveness.
Thomas Kalscheuer, Laust B. Pedersen