Xu, Ning and Hong, Jian and Fisher, Timothy (2016): Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso.

Abstract
Model selection is difficult to analyse yet theoretically and empirically important, especially for high-dimensional data analysis. Recently the least absolute shrinkage and selection operator (Lasso) has been applied in the statistical and econometric literature. Consistency of Lasso has been established under various conditions, some of which are difficult to verify in practice. In this paper, we study model selection from the perspective of generalization ability, under the framework of structural risk minimization (SRM) and Vapnik-Chervonenkis (VC) theory. The approach emphasizes the balance between the in-sample and out-of-sample fit, which can be achieved by using cross-validation to select a penalty on model complexity. We show that an exact relationship exists between the generalization ability of a model and model selection consistency. By implementing SRM and the VC inequality, we show that Lasso is L2-consistent for model selection under assumptions similar to those imposed on OLS. Furthermore, we derive a probabilistic bound for the distance between the penalized extremum estimator and the extremum estimator without penalty, which is dominated by overfitting. We also propose a new measurement of overfitting, GR2, based on generalization ability, that converges to zero if model selection is consistent. Using simulations, we demonstrate that the proposed CV-Lasso algorithm performs well in terms of model selection and overfitting control.
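To illustrate the idea of using cross-validation to select a penalty on model complexity, the following is a minimal sketch of a generic K-fold cross-validated Lasso. It is not the paper's CV-Lasso algorithm: the coordinate-descent solver, the interleaved fold layout, the penalty grid, the simulated data, and all function names (`lasso_cd`, `cv_lasso`, and so on) are assumptions made for illustration. The sketch also assumes centred data, so no intercept is fitted.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b

def mse(X, y, b):
    """Mean squared prediction error of coefficients b on (X, y)."""
    return sum((yi - sum(x * c for x, c in zip(xi, b))) ** 2
               for xi, yi in zip(X, y)) / len(y)

def cv_lasso(X, y, lambdas, k=5):
    """Pick the penalty with the lowest average out-of-sample MSE over k folds."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved folds
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for test_idx in folds:
            held = set(test_idx)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            X_te = [X[i] for i in test_idx]
            y_te = [y[i] for i in test_idx]
            err += mse(X_te, y_te, lasso_cd(X_tr, y_tr, lam))
        err /= k
        if err < best_err:
            best_lam, best_err = lam, err
    # Refit on the full sample at the selected penalty.
    return best_lam, lasso_cd(X, y, best_lam)

# Simulated sparse design (an assumption): only the first two regressors matter.
rng = random.Random(0)
n, p = 100, 8
true_beta = [3.0, -2.0] + [0.0] * (p - 2)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(xi, true_beta)) + rng.gauss(0, 0.5) for xi in X]

lambdas = [0.01, 0.05, 0.1, 0.5, 1.0]
best_lam, beta_hat = cv_lasso(X, y, lambdas)
selected = [j for j, b in enumerate(beta_hat) if abs(b) > 1e-8]
print("selected penalty:", best_lam)
print("selected variables:", selected)
```

The penalty is chosen purely on held-out error, which is the in-sample versus out-of-sample balance the abstract refers to. Whether the selected set matches the true support depends on the grid and the data.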
Item Type:  MPRA Paper 

Original Title:  Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso 
Language:  English 
Keywords:  Model selection, VC theory, generalization ability, Lasso, high-dimensional data, structural risk minimization, cross-validation. 
Subjects:  C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C13 - Estimation: General; C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C52 - Model Evaluation, Validation, and Selection; C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C55 - Large Data Sets: Modeling and Analysis 
Item ID:  71670 
Depositing User:  Mr Ning Xu 
Date Deposited:  01 Jun 2016 13:17 
Last Modified:  01 Jun 2016 13:18 
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/71670 