Large covariance estimation by thresholding principal orthogonal complements

Fan, Jianqing and Liao, Yuan and Mincheva, Martina (2011): Large covariance estimation by thresholding principal orthogonal complements.

PDF: MPRA_paper_38697.pdf (620kB)

Abstract

This paper deals with the estimation of high-dimensional covariance matrices with a conditional sparsity structure, which is the sum of a low-rank matrix and a sparse matrix. By assuming a sparse error covariance matrix in a multi-factor model, we allow for cross-sectional correlation even after the common but unobservable factors are taken out. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insight into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms, including the spectral norm. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies.
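The abstract describes POET in two steps: take the low-rank part from the leading principal components of the sample covariance, then threshold what remains (the "principal orthogonal complement"). The NumPy sketch below illustrates that idea under stated assumptions. The number of factors K is taken as known. Soft thresholding is applied. The threshold is correlation-based, tau_ij = C * omega * sqrt(r_ii * r_jj), with an assumed rate omega = 1/sqrt(p) + sqrt(log p / T). The function name `poet`, the default `C`, and this particular threshold rule are illustrative choices, not the paper's exact tuning or its adaptive rule.

```python
import numpy as np


def poet(Y, K, C=0.5):
    """Sketch of a POET-style estimator: low-rank part + thresholded complement.

    Y : (p, T) array; rows are variables, columns are observations.
    K : number of factors, assumed known here (not estimated).
    C : threshold constant (hypothetical default; tune in practice).
    """
    p, T = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)
    S = Yc @ Yc.T / T  # sample covariance with divisor T

    # Spectral decomposition; eigh returns ascending eigenvalues, so reverse.
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[::-1], evecs[:, ::-1]

    # Low-rank part: sum of the K leading lambda_i * xi_i xi_i^T.
    low_rank = (evecs[:, :K] * evals[:K]) @ evecs[:, :K].T

    # Principal orthogonal complement (residual covariance).
    R = S - low_rank

    # Assumed rate and correlation-based threshold tau_ij = C * omega * sqrt(r_ii r_jj).
    omega = 1.0 / np.sqrt(p) + np.sqrt(np.log(p) / T)
    d = np.sqrt(np.clip(np.diag(R), 0.0, None))
    tau = C * omega * np.outer(d, d)

    # Soft-threshold the off-diagonal entries; keep the diagonal unchanged.
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr


# Usage on simulated factor data (all sizes are arbitrary choices).
rng = np.random.default_rng(0)
p, T, K = 50, 200, 3
B = rng.normal(size=(p, K))  # factor loadings
F = rng.normal(size=(K, T))  # unobserved factors
U = rng.normal(size=(p, T))  # idiosyncratic errors
Y = B @ F + U
Sigma_hat = poet(Y, K)
```

In this sketch, setting `C=0` disables thresholding and returns the sample covariance. Setting `K=0` reduces the procedure to thresholding the sample covariance directly. These mirror two of the special cases listed in the abstract.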

Item Type:	MPRA Paper
Original Title:	Large covariance estimation by thresholding principal orthogonal complements
Language:	English
Keywords:	High dimensionality, approximate factor model, unknown factors, principal components, sparse matrix, low-rank matrix, thresholding, cross-sectional correlation
Subjects:	C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C13 - Estimation: General; C - Mathematical and Quantitative Methods > C0 - General > C01 - Econometrics
Item ID:	38697
Depositing User:	Yuan Liao
Date Deposited:	10 May 2012 01:41
Last Modified:	26 Sep 2019 16:34
References:
Ahn, S., Lee, Y. and Schmidt, P. (2001). GMM estimation of linear panel data models with time-varying individual effects. J. Econometrics 101, 219-255.
Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37, 2877-2921.
Antoniadis, A. and Fan, J. (2001). Regularized wavelet approximations. J. Amer. Statist. Assoc. 96, 939-967.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135-171.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191-221.
Bai, J. and Ng, S. (2008). Large dimensional factor analysis. Foundations and Trends in Econometrics 3, 89-163.
Bai, J. and Shi, S. (2011). Estimating high dimensional covariance matrices and its applications. Annals of Economics and Finance 12, 199-215.
Bickel, P. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, "naive Bayes", and some alternatives when there are many more variables than observations. Bernoulli 10, 989-1010.
Bickel, P. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36, 2577-2604.
Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106, 672-684.
Cai, T. and Zhou, H. (2010). Optimal rates of convergence for sparse covariance matrix estimation. Manuscript, University of Pennsylvania.
Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure and mean-variance analysis in large asset markets. Econometrica 51, 1305-1324.
d'Aspremont, A., Bach, F. and El Ghaoui, L. (2008). Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research 9, 1269-1294.
Davis, C. and Kahan, W. (1970). The rotation of eigenvectors by a perturbation III. SIAM Journal on Numerical Analysis 7, 1-46.
Doz, C., Giannone, D. and Reichlin, L. (2006). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Manuscript, Universite de Cergy-Pontoise.
Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102, 93-103.
Efron, B. (2010). Correlated z-values and the accuracy of large-scale statistical estimates. J. Amer. Statist. Assoc. 105, 1042-1055.
Fama, E. and French, K. (1992). The cross-section of expected stock returns. Journal of Finance 47, 427-465.
Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics.
Fan, J., Han, X. and Gu, W. (2012). Control of the false discovery rate under arbitrary covariance dependence (with discussion). J. Amer. Statist. Assoc., to appear.
Fan, J., Liao, Y. and Mincheva, M. (2011). High dimensional covariance matrix estimation in approximate factor models. Ann. Statist., to appear.
Fan, J., Zhang, J. and Yu, K. (2008). Asset allocation and risk assessment with gross exposure constraints for vast portfolios. Manuscript.
Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic factor model: identification and estimation. Review of Economics and Statistics 82, 540-554.
Hallin, M. and Liska, R. (2007). Determining the number of factors in the general dynamic factor model. J. Amer. Statist. Assoc. 102, 603-617.
Harding, M. (2009). Structural estimation of high-dimensional factor models. Manuscript, Stanford University.
Hastie, T. J., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer, New York.
James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1, 361-379. Univ. California Press, Berkeley.
Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104, 682-693.
Jung, S. and Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. Ann. Statist. 37, 4104-4130.
Kapetanios, G. (2010). A testing procedure for determining the number of factors in approximate factor models with large datasets. Journal of Business and Economic Statistics 28, 397-409.
Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37, 4254-4278.
Leek, J. T. and Storey, J. D. (2008). A general framework for multiple testing dependence. Proc. Natl. Acad. Sci. 105, 18718-18723.
Luo, X. (2011). High dimensional low rank and sparse covariance matrix estimation via convex minimization. Manuscript.
Ma, Z. (2011). Sparse principal components analysis and iterative thresholding. Manuscript.
Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the Lasso. Ann. Statist. 34, 1436-1462.
Merlevede, F., Peligrad, M. and Rio, E. (2009). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Manuscript, Universite Paris Est.
Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics 92, 1004-1016.
Pesaran, M. H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967-1012.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341-360.
Rothman, A., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104, 177-186.
Sentana, E. (2009). The econometrics of mean-variance efficiency tests: a survey. Econometrics Journal 12, C65-C101.
Sharpe, W. (1964). Capital asset prices: a theory of market equilibrium under conditions of risk. Journal of Finance 19, 425-442.
Shen, H. and Huang, J. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99, 1015-1034.
Stock, J. and Watson, M. (2002). Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97, 1167-1179.
Wang, P. (2010). Large dimensional factor models with a multi-level factor structure: identification, estimation and inference. Manuscript, Hong Kong University of Science and Technology.
Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10, 515-534.
Xiong, H., Goulding, E. H., Carlson, E. J., Tecott, L. H., McCulloch, C. E. and Sen, S. (2011). A flexible estimating equations approach for mapping function-valued traits. Genetics 189, 305-316.
Yap, J. S., Fan, J. and Wu, R. (2009). Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci. Biometrics 65, 1068-1077.
Zhang, Y. and El Ghaoui, L. (2011). Large-scale sparse principal component analysis with application to text data. NIPS.
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/38697

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.
