Logo
Munich Personal RePEc Archive

Concentration Based Inference in High Dimensional Generalized Regression Models (I: Statistical Guarantees)

Zhu, Ying (2018): Concentration Based Inference in High Dimensional Generalized Regression Models (I: Statistical Guarantees).

Warning
There is a more recent version of this item available.
[thumbnail of MPRA_paper_88502.pdf]
Preview
PDF
MPRA_paper_88502.pdf

Download (549kB) | Preview

Abstract

We develop simple and non-asymptotically justified methods for hypothesis testing about the coefficients ($\theta^{*}\in\mathbb{R}^{p}$) in the high dimensional generalized regression models where $p$ can exceed the sample size. Given a function $h:\,\mathbb{R}^{p}\mapsto\mathbb{R}^{m}$, we consider $H_{0}:\,h(\theta^{*})=\mathbf{0}_{m}$ against $H_{1}:\,h(\theta^{*})\neq\mathbf{0}_{m}$, where $m$ can be any integer in $\left[1,\,p\right]$ and $h$ can be nonlinear in $\theta^{*}$. Our test statistics is based on the sample ``quasi score'' vector evaluated at an estimate $\hat{\theta}_{\alpha}$ that satisfies $h(\hat{\theta}_{\alpha})=\mathbf{0}_{m}$, where $\alpha$ is the prespecified Type I error. By exploiting the concentration phenomenon in Lipschitz functions, the key component reflecting the dimension complexity in our non-asymptotic thresholds uses a Monte-Carlo approximation to mimic the expectation that is concentrated around and automatically captures the dependencies between the coordinates. We provide probabilistic guarantees in terms of the Type I and Type II errors for the quasi score test. Confidence regions are also constructed for the population quasi-score vector evaluated at $\theta^{*}$. The first set of our results are specific to the standard Gaussian linear regression models; the second set allow for reasonably flexible forms of non-Gaussian responses, heteroscedastic noise, and nonlinearity in the regression coefficients, while only requiring the correct specification of $\mathbb{E}\left(Y_{i}|X_{i}\right)$s. The novelty of our methods is that their validity does not rely on good behavior of $\left\Vert \hat{\theta}_{\alpha}-\theta^{*}\right\Vert _{2}$ (or even $n^{-1/2}\left\Vert X\left(\hat{\theta}_{\alpha}-\theta^{*}\right)\right\Vert _{2}$ in the linear regression case) nonasymptotically or asymptotically.

Available Versions of this Item

Atom RSS 1.0 RSS 2.0

Contact us: mpra@ub.uni-muenchen.de

This repository has been built using EPrints software.

MPRA is a RePEc service hosted by Logo of the University Library LMU Munich.