Li, Jun and Chen, Songxi (2012): Two Sample Tests for High Dimensional Covariance Matrices. Published in:
This is the latest version of this item.

Abstract
We propose two tests for the equality of covariance matrices between two high-dimensional populations. One test is on the whole variance-covariance matrices, and the other is on off-diagonal sub-matrices which define the covariance between two non-overlapping segments of the high-dimensional random vectors. The tests are applicable (i) when the data dimension is much larger than the sample sizes, namely the “large p, small n” situations and (ii) without assuming parametric distributions for the two populations. These two aspects surpass the capability of the conventional likelihood ratio test. The proposed tests can be used to test on covariances associated with gene ontology terms.
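To make the hypothesis in the abstract concrete, here is a minimal sketch of a two-sample comparison of covariance matrices in a "large p, small n" setting. It is not the authors' test: it swaps in a plain permutation test on the squared Frobenius norm of the difference between the two sample covariance matrices. That plug-in statistic is biased when p is large relative to the sample sizes, so this only illustrates the problem setup. The function names, the simulated data, and the choice of 200 permutations are all assumptions made for the example.

```python
import numpy as np


def frobenius_cov_diff(x, y):
    """Squared Frobenius norm of the difference of two sample covariance matrices.

    Rows are observations, columns are variables.
    """
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    return float(np.sum((s1 - s2) ** 2))


def permutation_cov_test(x, y, n_perm=200, seed=0):
    """Illustrative permutation test of H0: equal covariance matrices.

    Assumption: after centring each sample, observations are exchangeable
    under H0. This is a stand-in technique, not the test proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    # Centre each sample so a mean difference is not mistaken for a
    # covariance difference.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    observed = frobenius_cov_diff(x, y)

    pooled = np.vstack([x, y])
    n1 = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = frobenius_cov_diff(pooled[idx[:n1]], pooled[idx[n1:]])
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Simulated example with dimension p larger than both sample sizes.
data_rng = np.random.default_rng(1)
p, n1, n2 = 100, 20, 25
x = data_rng.standard_normal((n1, p))        # covariance I
y = 3.0 * data_rng.standard_normal((n2, p))  # covariance 9 * I
stat, pval = permutation_cov_test(x, y)
print(f"statistic = {stat:.1f}, permutation p-value = {pval:.3f}")
```

Restricting `x` and `y` to two non-overlapping column blocks and comparing cross-covariances would give a rough analogue of the off-diagonal sub-matrix comparison, again only as an illustration.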
Item Type:  MPRA Paper 

Original Title:  Two Sample Tests for High Dimensional Covariance Matrices 
Language:  English 
Keywords:  High dimensional covariance; Large p small n; Likelihood ratio test; Testing for Gene-sets. 
Subjects:  C - Mathematical and Quantitative Methods > C0 - General
C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General
C - Mathematical and Quantitative Methods > C2 - Single Equation Models ; Single Variables
C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables
C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics
C - Mathematical and Quantitative Methods > C5 - Econometric Modeling
C - Mathematical and Quantitative Methods > C6 - Mathematical Methods ; Programming Models ; Mathematical and Simulation Modeling
C - Mathematical and Quantitative Methods > C7 - Game Theory and Bargaining Theory
C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology ; Computer Programs
C - Mathematical and Quantitative Methods > C9 - Design of Experiments
G - Financial Economics > G0 - General
G - Financial Economics > G1 - General Financial Markets
G - Financial Economics > G2 - Financial Institutions and Services
G - Financial Economics > G3 - Corporate Finance and Governance
Item ID:  46278 
Depositing User:  Professor Songxi Chen 
Date Deposited:  17 Apr 2013 10:06 
Last Modified:  28 Sep 2019 16:56 
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/46278 
Available Versions of this Item

Two Sample Tests for High Dimensional Covariance Matrices. (deposited 11 Apr 2013 07:28)
 Two Sample Tests for High Dimensional Covariance Matrices. (deposited 17 Apr 2013 10:06) [Currently Displayed]