Zhu, Ying (2018): Concentration Based Inference for High Dimensional (Generalized) Regression Models: New Phenomena in Hypothesis Testing.
Abstract
We develop simple and nonasymptotically justified methods for hypothesis testing about the coefficients ($\theta^{*}\in\mathbb{R}^{p}$) in high dimensional (generalized) regression models where $p$ can exceed the sample size $n$. Given a function $h:\,\mathbb{R}^{p}\mapsto\mathbb{R}^{m}$, we consider $H_{0}:\,h(\theta^{*})=\mathbf{0}_{m}$ against the alternative hypothesis $H_{1}:\,h(\theta^{*})\neq\mathbf{0}_{m}$, where $m$ can be as large as $p$ and $h$ can be nonlinear in $\theta^{*}$. Our test statistic is based on the sample score vector evaluated at an estimate $\hat{\theta}_{\alpha}$ that satisfies $h(\hat{\theta}_{\alpha})=\mathbf{0}_{m}$, where $\alpha$ is the prespecified Type I error. We provide nonasymptotic control on the Type I and Type II errors for the score test. In addition, confidence regions are constructed in terms of the score vectors. By exploiting the concentration phenomenon in Lipschitz functions, the key component reflecting the ``dimension complexity'' in our nonasymptotic thresholds uses a Monte Carlo approximation to ``mimic'' the expectation around which the statistic concentrates, and it automatically captures the dependencies between the coordinates. The novelty of our methods is that their validity does not rely on good behavior of $\left\Vert \hat{\theta}_{\alpha}-\theta^{*}\right\Vert _{2}$ or even $n^{-1/2}\left\Vert X\left(\hat{\theta}_{\alpha}-\theta^{*}\right)\right\Vert _{2}$, either nonasymptotically or asymptotically. Most interestingly, we discover phenomena that are opposite to those in the existing literature: (1) more restrictions (larger $m$) in $H_{0}$ make our procedures more powerful; (2) whether $\theta^{*}$ is sparse or not, it is possible for our procedures to detect alternatives with probability at least $1-\textrm{Type II error}$ when $p\geq n$ and $m>p-n$; (3) the coverage probability of our procedures is not affected by how sparse $\theta^{*}$ is.
The proposed procedures are evaluated with simulation studies, where the empirical evidence supports our key insights.
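The idea of a score test with a concentration-based, Monte Carlo threshold can be illustrated with a minimal sketch. It is not the paper's exact procedure and rests on simplifying assumptions that are not taken from the abstract: a linear model with Gaussian noise of known standard deviation `sigma`, a simple null $H_{0}:\,\theta^{*}=\theta_{0}$ (so the constrained estimate is $\theta_{0}$ itself and $m=p$), the sup-norm of the score as the statistic, and no correction for Monte Carlo error. The function name `score_test_sup_norm` and all parameter names are hypothetical.

```python
import numpy as np


def score_test_sup_norm(X, y, theta0, sigma, alpha=0.05, n_draws=2000, seed=0):
    """Illustrative score test of H0: theta* = theta0 in y = X theta* + eps.

    Assumes eps ~ N(0, sigma^2 I_n) with sigma known (an assumption of this
    sketch, not a statement about the paper's setting).
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape

    # Sample score vector evaluated at the null value, and its sup-norm.
    score = X.T @ (y - X @ theta0) / n
    stat = np.max(np.abs(score))

    # Monte Carlo approximation of E || X^T eps / n ||_inf under H0.
    # Each row of G is one Gaussian draw; G @ X stacks the vectors X^T g.
    G = rng.standard_normal((n_draws, n))
    mc_mean = sigma * np.mean(np.max(np.abs(G @ X / n), axis=1))

    # g -> sigma * || X^T g / n ||_inf is Lipschitz in g with this constant,
    # so Gaussian concentration bounds the deviation above its mean.
    lipschitz = sigma * np.max(np.linalg.norm(X, axis=0)) / n
    threshold = mc_mean + lipschitz * np.sqrt(2.0 * np.log(1.0 / alpha))

    return stat, threshold, bool(stat > threshold)


# Hypothetical usage with p > n: one data set under H0, one under an alternative.
rng = np.random.default_rng(123)
n, p = 50, 100
X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)
theta0 = np.zeros(p)

theta_alt = np.zeros(p)
theta_alt[0] = 5.0

print(score_test_sup_norm(X, eps, theta0, sigma=1.0))
print(score_test_sup_norm(X, X @ theta_alt + eps, theta0, sigma=1.0))
```

In this sketch the threshold depends only on `X`, `sigma`, and `alpha`, never on how close an estimate is to $\theta^{*}$. The Monte Carlo draws use the actual design matrix, which is how the threshold picks up the dependence between the score coordinates.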
Item Type: | MPRA Paper |
---|---|
Original Title: | Concentration Based Inference for High Dimensional (Generalized) Regression Models: New Phenomena in Hypothesis Testing |
Language: | English |
Keywords: | Nonasymptotic inference, concentration, high dimensional inference, hypothesis testing, confidence sets |
Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General<br>C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C12 - Hypothesis Testing: General<br>C - Mathematical and Quantitative Methods > C2 - Single Equation Models ; Single Variables<br>C - Mathematical and Quantitative Methods > C2 - Single Equation Models ; Single Variables > C21 - Cross-Sectional Models ; Spatial Models ; Treatment Effect Models ; Quantile Regressions |
Item ID: | 89281 |
Depositing User: | Ms Ying Zhu |
Date Deposited: | 02 Oct 2018 03:23 |
Last Modified: | 01 Oct 2019 01:49 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/89281 |
Available Versions of this Item
- Concentration Based Inference in High Dimensional Generalized Regression Models (I: Statistical Guarantees). (deposited 21 Aug 2018 01:27)
- Concentration Based Inference for High Dimensional (Generalized) Regression Models: New Phenomena in Hypothesis Testing. (deposited 02 Oct 2018 03:23) [Currently Displayed]