Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso

Xu, Ning and Hong, Jian and Fisher, Timothy (2016): Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso.

Abstract

Model selection is difficult to analyse yet theoretically and empirically important, especially for high-dimensional data analysis. Recently, the least absolute shrinkage and selection operator (Lasso) has been applied in the statistical and econometric literature. Consistency of Lasso has been established under various conditions, some of which are difficult to verify in practice. In this paper, we study model selection from the perspective of generalization ability, under the framework of structural risk minimization (SRM) and Vapnik-Chervonenkis (VC) theory. The approach emphasizes the balance between the in-sample and out-of-sample fit, which can be achieved by using cross-validation to select a penalty on model complexity. We show that an exact relationship exists between the generalization ability of a model and model selection consistency. By implementing SRM and the VC inequality, we show that Lasso is L2-consistent for model selection under assumptions similar to those imposed on OLS. Furthermore, we derive a probabilistic bound for the distance between the penalized extremum estimator and the extremum estimator without penalty, which is dominated by overfitting. We also propose a new measurement of overfitting, GR2, based on generalization ability, that converges to zero if model selection is consistent. Using simulations, we demonstrate that the proposed CV-Lasso algorithm performs well in terms of model selection and overfitting control.
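
The idea of using cross-validation to select a penalty on model complexity can be illustrated with a small sketch. The code below is not the authors' CV-Lasso implementation; it is a minimal illustration under stated assumptions: a standard Lasso objective (1/2n)||y - Xb||^2 + lam * ||b||_1 fitted by coordinate descent, no intercept, a hypothetical simulated design with two relevant regressors out of eight, and an arbitrary penalty grid. All function names (`lasso_cd`, `cv_lasso`, and so on) are made up for this example.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form one-dimensional Lasso solution."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb; equals y because b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new_bj
            max_delta = max(max_delta, abs(delta))
        if max_delta < tol:  # stop once a full sweep changes nothing material
            break
    return b

def mse(X, y, b):
    """Mean squared prediction error of coefficients b on the sample (X, y)."""
    return sum((yi - sum(xij * bj for xij, bj in zip(xi, b))) ** 2
               for xi, yi in zip(X, y)) / len(y)

def cv_lasso(X, y, lambdas, k=5):
    """Pick the penalty with the lowest average out-of-sample MSE over k folds."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for held_out in folds:
            held = set(held_out)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            X_te = [X[i] for i in held_out]
            y_te = [y[i] for i in held_out]
            err += mse(X_te, y_te, lasso_cd(X_tr, y_tr, lam)) / k
        if err < best_err:
            best_lam, best_err = lam, err
    # Refit on the full sample at the selected penalty
    return best_lam, lasso_cd(X, y, best_lam)

# Hypothetical simulated data: only the first two of eight regressors matter.
random.seed(0)
n, p = 80, 8
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * x[0] - 1.5 * x[1] + random.gauss(0.0, 0.5) for x in X]

lambdas = [0.01, 0.05, 0.1, 0.2, 0.5]  # arbitrary illustrative grid
best_lam, b_hat = cv_lasso(X, y, lambdas)
selected = [j for j, bj in enumerate(b_hat) if bj != 0.0]
print("selected penalty:", best_lam)
print("selected regressors:", selected)
```

Comparing the selected regressors with the two that generate `y` gives a rough sense of how the cross-validated penalty trades in-sample fit against out-of-sample fit; it does not reproduce the paper's simulations or its GR2 measure.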

Item Type:	MPRA Paper
Original Title:	Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso
Language:	English
Keywords:	Model selection, VC theory, generalization ability, Lasso, high-dimensional data, structural risk minimization, cross validation.
Subjects:	C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C13 - Estimation: General; C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C52 - Model Evaluation, Validation, and Selection; C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C55 - Large Data Sets: Modeling and Analysis
Item ID:	71670
Depositing User:	Mr Ning Xu
Date Deposited:	01 Jun 2016 13:17
Last Modified:	28 Sep 2019 23:29
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/71670

