Fan, Jianqing and Liao, Yuan (2012): Endogeneity in ultrahigh dimension.

PDF:  MPRA_paper_38698.pdf (556Kb) 
Abstract
Most papers on high-dimensional statistics are based on the assumption that none of the regressors are correlated with the regression error, namely, that they are exogenous. Yet endogeneity arises easily in high-dimensional regression because of the large pool of regressors, and it causes the inconsistency of penalized least-squares methods and possible false scientific discoveries. A necessary condition for model selection consistency of a very general class of penalized regression methods is given, which allows us to prove the inconsistency claim formally. To cope with possible endogeneity, we construct a novel penalized focused generalized method of moments (FGMM) criterion function and offer a new optimization algorithm. The FGMM is not a smooth function. To establish its asymptotic properties, we first study the model selection consistency and an oracle property for a general class of penalized regression methods. These results are then used to show that the FGMM possesses an oracle property even in the presence of endogenous predictors, and that the solution is also a near global minimum under the over-identification assumption. Finally, we show how semiparametric efficiency of estimation can be achieved via a two-step approach.
Item Type:  MPRA Paper 

Original Title:  Endogeneity in ultrahigh dimension 
Language:  English 
Keywords:  Focused GMM, Sparsity recovery, Endogenous variables, Oracle property, Conditional moment restriction, Estimating equation, Over-identification, Global minimization, Semiparametric efficiency 
Subjects:  C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C13 - Estimation: General; C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C52 - Model Evaluation, Validation, and Selection; C - Mathematical and Quantitative Methods > C0 - General > C01 - Econometrics 
Item ID:  38698 
Depositing User:  Yuan Liao 
Date Deposited:  10. May 2012 01:43 
Last Modified:  12. Feb 2013 05:32 
URI:  http://mpra.ub.uni-muenchen.de/id/eprint/38698 