The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications

Hansen, Christian and Liao, Yuan (2016): The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications.

Abstract

We consider inference about coefficients on a small number of variables of interest in a linear panel data model with additive unobserved individual- and time-specific effects and a large number of additional time-varying confounding variables. We allow the number of these confounding variables to be larger than the sample size and suppose that, in addition to unrestricted time- and individual-specific effects, they are generated by a small number of common factors and high-dimensional weakly dependent disturbances. We allow both the factors and the disturbances to be related to the outcome variable and to the other variables of interest. To make informative inference feasible, we impose that the contribution of the part of the confounding variables not captured by time-specific effects, individual-specific effects, or the common factors can be captured by a relatively small number of terms whose identities are unknown. Within this framework, we provide a convenient computational algorithm, based on factor extraction followed by lasso regression, for inference about the parameters of interest and show that the resulting procedure has good asymptotic properties. We also provide a simple k-step bootstrap procedure that may be used to construct inferential statements about the parameters of interest and prove its asymptotic validity. The proposed bootstrap may be of independent interest outside the present context: it can readily be adapted to other settings involving inference after lasso variable selection, and the proof of its validity requires some new technical arguments. We also provide simulation evidence on the performance of our procedure and illustrate its use in two empirical applications.
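
The abstract describes the estimator only in outline (factor extraction, then lasso regression). As a rough illustration, and not the authors' algorithm, the sketch below assumes a single cross-section with no individual or time effects, one variable of interest `d`, a known number of factors, and a user-chosen penalty `lam`. All function names (`extract_factors`, `lasso_cd`, `factor_lasso`) are hypothetical. Pooling the controls selected for the outcome and for `d` before the final regression is an assumption borrowed from the double-selection idea, not something the abstract states.

```python
import numpy as np

def extract_factors(X, n_factors):
    """Principal-components factor extraction (illustrative).

    Returns estimated factors F (n x r, scaled so F'F/n = I) and the
    residual matrix U = Xc - F @ Lam', where Xc is column-centered X.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    left, _, _ = np.linalg.svd(Xc, full_matrices=False)
    F = left[:, :n_factors] * np.sqrt(n)
    Lam = Xc.T @ F / n                      # estimated loadings (p x r)
    return F, Xc - F @ Lam.T

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(A, b, lam, n_iter=100):
    """Coordinate descent for (1/2n)||b - A beta||^2 + lam * ||beta||_1."""
    n, p = A.shape
    beta = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0) / n
    resid = b.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += A[:, j] * beta[j]      # remove coordinate j's contribution
            rho = A[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= A[:, j] * beta[j]      # add the updated contribution back
    return beta

def ols_residual(Z, v):
    """Residual from least-squares projection of v on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def factor_lasso(y, d, X, n_factors, lam):
    """Assumed three-step sketch: factors, lasso selection, post-selection OLS."""
    F, U = extract_factors(X, n_factors)
    Z = np.column_stack([np.ones(len(y)), F])
    y_t, d_t = ols_residual(Z, y), ols_residual(Z, d)   # partial out factors
    sel_y = np.flatnonzero(lasso_cd(U, y_t, lam))
    sel_d = np.flatnonzero(lasso_cd(U, d_t, lam))
    sel = np.union1d(sel_y, sel_d)                      # pooled selected controls
    if sel.size:
        y_t, d_t = ols_residual(U[:, sel], y_t), ols_residual(U[:, sel], d_t)
    alpha_hat = (d_t @ y_t) / (d_t @ d_t)
    return alpha_hat, sel
```

A call such as `factor_lasso(y, d, X, n_factors=2, lam=0.3)` returns the coefficient estimate and the indices of the selected residual controls. The penalty value and the number of factors are placeholders here; in practice both would need a data-driven choice, and the sketch does not implement the k-step bootstrap.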

Item Type:	MPRA Paper
Original Title:	The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications
English Title:	The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications
Language:	English
Keywords:	panel data, treatment effects
Subjects:	C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables > C33 - Panel Data Models ; Spatio-temporal Models
Item ID:	75313
Depositing User:	Yuan Liao
Date Deposited:	08 Dec 2016 11:00
Last Modified:	26 Sep 2019 21:42
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/75313
