Zhu, Ying (2018): Concentration Based Inference in High Dimensional Generalized Regression Models (I: Statistical Guarantees).
PDF: MPRA_paper_88502.pdf (549kB)
Abstract
We develop simple and non-asymptotically justified methods for testing hypotheses about the coefficients $\theta^{*}\in\mathbb{R}^{p}$ in high-dimensional generalized regression models, where $p$ can exceed the sample size $n$. Given a function $h:\,\mathbb{R}^{p}\to\mathbb{R}^{m}$, we consider $H_{0}:\,h(\theta^{*})=\mathbf{0}_{m}$ against $H_{1}:\,h(\theta^{*})\neq\mathbf{0}_{m}$, where $m$ can be any integer in $\left[1,\,p\right]$ and $h$ can be nonlinear in $\theta^{*}$. Our test statistic is based on the sample "quasi-score" vector evaluated at an estimate $\hat{\theta}_{\alpha}$ that satisfies $h(\hat{\theta}_{\alpha})=\mathbf{0}_{m}$, where $\alpha$ is the prespecified Type I error. By exploiting the concentration of Lipschitz functions around their expectations, the key component of our non-asymptotic thresholds, namely the one reflecting the dimension complexity, uses a Monte Carlo approximation to mimic the expectation around which the relevant Lipschitz function concentrates; this approximation automatically captures the dependencies between the coordinates. We provide probabilistic guarantees on the Type I and Type II errors of the quasi-score test. We also construct confidence regions for the population quasi-score vector evaluated at $\theta^{*}$. The first set of our results is specific to standard Gaussian linear regression models; the second set allows for reasonably flexible forms of non-Gaussian responses, heteroscedastic noise, and nonlinearity in the regression coefficients, while requiring only the correct specification of the conditional means $\mathbb{E}\left(Y_{i}|X_{i}\right)$. The novelty of our methods is that their validity does not rely on good behavior of $\left\Vert \hat{\theta}_{\alpha}-\theta^{*}\right\Vert _{2}$ (or even of $n^{-1/2}\left\Vert X\left(\hat{\theta}_{\alpha}-\theta^{*}\right)\right\Vert _{2}$ in the linear regression case), either nonasymptotically or asymptotically.
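The thresholding idea described in the abstract can be illustrated in its simplest setting. What follows is a minimal sketch, not the paper's procedure, built on these assumptions: a Gaussian linear model $y = X\theta^{*} + \sigma\varepsilon$ with $\varepsilon\sim N(0, I_n)$ and known $\sigma$; the least-squares quasi-score $X^{\top}(y - X\theta)/n$ summarized by its sup-norm; and the simple null $h(\theta)=\theta-\theta_0$ (so $m=p$ and the only estimate satisfying the restriction is $\theta_0$ itself). At $\theta^{*}$ the score equals $\sigma f(\varepsilon)$ with $f(\varepsilon)=\Vert X^{\top}\varepsilon\Vert_\infty/n$, which is $L$-Lipschitz with $L=\max_j\Vert X_j\Vert_2/n$. The Gaussian concentration inequality $P(f(\varepsilon)\ge \mathbb{E}f(\varepsilon)+t)\le e^{-t^{2}/(2L^{2})}$ then yields the threshold $\sigma\{\mathbb{E}f(\varepsilon)+L\sqrt{2\log(1/\alpha)}\}$, with $\mathbb{E}f(\varepsilon)$ replaced by a Monte Carlo average. The names `quasi_score`, `mc_threshold`, and `score_test` are hypothetical, and Monte Carlo error is ignored.

```python
# Hypothetical sketch: Gaussian linear model, known sigma, sup-norm score
# statistic, simple null theta* = theta0. Not the paper's exact method.
import numpy as np


def quasi_score(X, y, theta):
    """Sample quasi-score of least squares: X^T (y - X theta) / n."""
    n = X.shape[0]
    return X.T @ (y - X @ theta) / n


def mc_threshold(X, sigma, alpha, n_mc=2000, seed=None):
    """Threshold for max_j |score_j| at theta* (Gaussian noise, known sigma).

    At theta*, the score equals sigma * f(eps) with f(eps) = ||X^T eps||_inf / n,
    which is Lipschitz in eps with constant L = max_j ||X_j||_2 / n. Gaussian
    concentration gives P(f(eps) >= E f(eps) + L * sqrt(2 log(1/alpha))) <= alpha.
    E f(eps) is replaced by a Monte Carlo average computed from the actual X,
    so it reflects correlation between columns (Monte Carlo error is ignored).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    eps = rng.standard_normal((n_mc, n))
    draws = np.max(np.abs(eps @ X), axis=1) / n  # one draw of f(eps) per row
    lip = np.max(np.linalg.norm(X, axis=0)) / n  # Lipschitz constant of f
    return sigma * (draws.mean() + lip * np.sqrt(2.0 * np.log(1.0 / alpha)))


def score_test(X, y, theta_restricted, sigma, alpha, n_mc=2000, seed=None):
    """Reject H0 when the sup-norm of the score at the restricted estimate is large."""
    stat = np.max(np.abs(quasi_score(X, y, theta_restricted)))
    thr = mc_threshold(X, sigma, alpha, n_mc, seed)
    return stat, thr, bool(stat > thr)


# Example with p > n. H0: theta* = 0, i.e. h(theta) = theta and m = p,
# so the only theta satisfying h(theta) = 0 is the zero vector.
rng = np.random.default_rng(0)
n, p, sigma, alpha = 100, 200, 1.0, 0.05
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)

y_null = sigma * rng.standard_normal(n)  # data generated under H0
theta_alt = np.zeros(p)
theta_alt[0] = 2.0
y_alt = X @ theta_alt + sigma * rng.standard_normal(n)  # data violating H0

stat_null, thr, reject_null = score_test(X, y_null, theta0, sigma, alpha, seed=1)
stat_alt, _, reject_alt = score_test(X, y_alt, theta0, sigma, alpha, seed=1)
print(reject_null, reject_alt)
```

Because the Monte Carlo average is computed from the observed design $X$, it adapts to the dependence between coordinates instead of relying on a worst-case union bound over all $p$ coordinates. A composite or nonlinear $h$ would additionally require computing a restricted estimate $\hat{\theta}_{\alpha}$, which this sketch does not attempt.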
Field | Value |
---|---|
Item Type: | MPRA Paper |
Original Title: | Concentration Based Inference in High Dimensional Generalized Regression Models (I: Statistical Guarantees) |
Language: | English |
Keywords: | Nonasymptotic inference, concentration inequalities, high dimensional inference, hypothesis testing, confidence sets |
Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C12 - Hypothesis Testing: General · C - Mathematical and Quantitative Methods > C2 - Single Equation Models; Single Variables > C21 - Cross-Sectional Models; Spatial Models; Treatment Effect Models; Quantile Regressions |
Item ID: | 88502 |
Depositing User: | Ms Ying Zhu |
Date Deposited: | 21 Aug 2018 01:27 |
Last Modified: | 26 Sep 2019 08:15 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/88502 |
Available Versions of this Item
- Concentration Based Inference in High Dimensional Generalized Regression Models (I: Statistical Guarantees). (deposited 21 Aug 2018 01:27)