Fan, Jianqing and Liao, Yuan and Mincheva, Martina (2011): Large covariance estimation by thresholding principal orthogonal complements.

Abstract
This paper deals with estimation of high-dimensional covariance with a conditional sparsity structure, which is the composition of a low-rank matrix plus a sparse matrix. By assuming sparse error covariance matrix in a multi-factor model, we allow the presence of the cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms, including the spectral norm. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies.
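The low-rank-plus-sparse idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `poet_sketch`, its parameters, and the use of hard thresholding with a single constant `threshold` are assumptions made for illustration (the paper itself refers to adaptive, entry-dependent thresholding). The sketch keeps the leading principal components of the sample covariance as the low-rank part and thresholds the off-diagonal entries of what remains, the principal orthogonal complement.

```python
import numpy as np

def poet_sketch(X, n_factors, threshold):
    """Illustrative low-rank-plus-sparse covariance estimate.

    X          : (T, p) array, one observation per row (hypothetical layout)
    n_factors  : number of leading principal components to keep
    threshold  : constant hard-threshold level for off-diagonal residuals
                 (a simplifying assumption, not the paper's adaptive rule)
    """
    sample_cov = np.cov(X, rowvar=False)            # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(sample_cov)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]      # indices of the largest ones

    # Low-rank part: sum of lambda_k * v_k v_k' over the leading components.
    low_rank = (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T

    # Principal orthogonal complement: what the leading components leave behind.
    residual = sample_cov - low_rank

    # Hard-threshold the residual, then restore its diagonal untouched.
    sparse = np.where(np.abs(residual) >= threshold, residual, 0.0)
    np.fill_diagonal(sparse, np.diag(residual))

    return low_rank + sparse, low_rank, sparse
```

Under these assumptions the sketch reproduces the special cases listed in the abstract: `threshold=0` returns the sample covariance matrix, `n_factors=0` reduces to a plain thresholding estimator, and a very large `threshold` leaves a diagonal residual, as in a strict factor-based covariance estimate.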
Item Type:  MPRA Paper 

Original Title:  Large covariance estimation by thresholding principal orthogonal complements 
Language:  English 
Keywords:  High dimensionality, approximate factor model, unknown factors, principal components, sparse matrix, low-rank matrix, thresholding, cross-sectional correlation 
Subjects:  C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C13 - Estimation: General; C - Mathematical and Quantitative Methods > C0 - General > C01 - Econometrics 
Item ID:  38697 
Depositing User:  Yuan Liao 
Date Deposited:  10. May 2012 01:41 
Last Modified:  20. Feb 2013 13:51 
References:
Ahn, S., Lee, Y. and Schmidt, P. (2001). GMM estimation of linear panel data models with time-varying individual effects. J. Econometrics. 101, 219-255.
Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Annals of Statistics, 37, 2877-2921.
Antoniadis, A. and Fan, J. (2001). Regularized wavelet approximations. J. Amer. Statist. Assoc. 96, 939-967.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica. 71 135-171.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica. 70 191-221.
Bai, J. and Ng, S. (2008). Large dimensional factor analysis. Foundations and Trends in Econometrics. 3 89-163.
Bai, J. and Shi, S. (2011). Estimating high dimensional covariance matrices and its applications. Annals of Economics and Finance. 12 199-215.
Bickel, P. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, "naive Bayes", and some alternatives when there are many more variables than observations. Bernoulli. 10 989-1010.
Bickel, P. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577-2604.
Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106, 672-684.
Cai, T. and Zhou, H. (2010). Optimal rates of convergence for sparse covariance matrix estimation. Manuscript. University of Pennsylvania.
Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure and mean variance analysis in large asset markets. Econometrica. 51 1305-1324.
Doz, C., Giannone, D. and Reichlin, L. (2006). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Manuscript. Université de Cergy-Pontoise.
d'Aspremont, A., Bach, F. and El Ghaoui, L. (2008). Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9, 1269-1294.
Davis, C. and Kahan, W. (1970). The rotation of eigenvectors by a perturbation III. SIAM Journal on Numerical Analysis, 7, 1-46.
Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc., 102, 93-103.
Efron, B. (2010). Correlated z-values and the accuracy of large-scale statistical estimates. J. Amer. Statist. Assoc., 105, 1042-1055.
Fama, E. and French, K. (1992). The cross-section of expected stock returns. Journal of Finance. 47 427-465.
Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics.
Fan, J., Han, X., and Gu, W. (2012). Control of the false discovery rate under arbitrary covariance dependence (with discussion). Journal of American Statistical Association, to appear.
Fan, J., Liao, Y. and Mincheva, M. (2011). High dimensional covariance matrix estimation in approximate factor models. Ann. Statist. To appear.
Fan, J., Zhang, J., and Yu, K. (2008). Asset Allocation and Risk Assessment with Gross Exposure Constraints for Vast Portfolios. Manuscript.
Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic factor model: identification and estimation. Review of Economics and Statistics. 82 540-554.
Hallin, M. and Liska, R. (2007). Determining the number of factors in the general dynamic factor model. J. Amer. Statist. Assoc. 102, 603-617.
Harding, M. (2009). Structural estimation of high-dimensional factor models. Manuscript. Stanford University.
Hastie, T.J., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed). Springer, New York.
James, W. and Stein, C. (1961). Estimation with quadratic loss, in Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1 361-379. Univ. California Press. Berkeley.
Johnstone, I.M. and Lu, A.Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc., 104, 682-693.
Jung, S. and Marron, J.S. (2009). PCA consistency in high dimension, low sample size context. Ann. Statist., 37, 4104-4130.
Kapetanios, G. (2010). A testing procedure for determining the number of factors in approximate factor models with large datasets. Journal of Business and Economic Statistics. 28, 397-409.
Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254-4278.
Luo, X. (2011). High dimensional low rank and sparse covariance matrix estimation via convex minimization. Manuscript.
Ma, Z. (2011). Sparse principal components analysis and iterative thresholding. Manuscript.
Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 34, 1436-1462.
Merlevède, F., Peligrad, M. and Rio, E. (2009). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Manuscript. Université Paris Est.
Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics. 92, 1004-1016.
Pesaran, M.H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica. 74, 967-1012.
Ross, S.A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13, 341-360.
Rothman, A., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104 177-186.
Sentana, E. (2009). The econometrics of mean-variance efficiency tests: a survey. Econometrics Jour., 12, C65-C101.
Shen, H. and Huang, J. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Analysis 99, 1015-1034.
Leek, J.T. and Storey, J.D. (2008). A general framework for multiple testing dependence. Proc. Natl. Acad. Sci., 105, 18718-19723.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risks. Journal of Finance, 19, 425-442.
Stock, J. and Watson, M. (2002). Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97, 1167-1179.
Wang, P. (2010). Large dimensional factor models with a multi-level factor structure: identification, estimation and inference. Manuscript. Hong Kong University of Science and Technology.
Witten, D.M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515-534.
Xiong, H., Goulding, E.H., Carlson, E.J., Tecott, L.H., McCulloch, C.E. and Sen, S. (2011). A Flexible Estimating Equations Approach for Mapping Function-Valued Traits. Genetics, 189, 305-316.
Yap, J.S., Fan, J., and Wu, R. (2009). Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci. Biometrics, 65, 1068-1077.
Zhang, Y. and El Ghaoui, L. (2011). Large-scale sparse principal component analysis with application to text data. NIPS.
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/38697 