Chen, Songxi (2012): Two Sample Tests for High Dimensional Covariance Matrices.
PDF: MPRA_paper_46026.pdf (190kB)
Abstract
We propose two tests for the equality of covariance matrices between two high-dimensional populations. One test concerns the whole variance-covariance matrices; the other concerns off-diagonal sub-matrices, which define the covariance between two non-overlapping segments of the high-dimensional random vectors. The tests are applicable (i) when the data dimension is much larger than the sample sizes, namely the “large p, small n” situation, and (ii) without assuming parametric distributions for the two populations. In both respects they go beyond the conventional likelihood ratio test, which requires nonsingular sample covariance matrices (not the case when p exceeds the sample size) and is derived under a parametric model. The proposed tests can be used to test covariances associated with gene ontology terms.
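The abstract does not spell out the test statistics. Purely as an illustration, and not as the paper's procedure, the Python sketch below estimates the squared Frobenius distance between two covariance matrices, or between their off-diagonal blocks for two coordinate segments `a` and `b`, using a U-statistic built from Gram matrices. It assumes (hypothetically, for simplicity) that both populations have mean zero, in which case each term is unbiased; handling unknown means would need extra correction terms that this sketch omits. Calibration is by permutation, a swapped-in choice. The function names `u_stat_distance` and `permutation_pvalue` and all their parameters are hypothetical.

```python
import numpy as np


def _gram_product(u, v, a, b):
    """Entry (i, j) is (u_i[a] . v_j[a]) * (u_i[b] . v_j[b])."""
    return (u[:, a] @ v[:, a].T) * (u[:, b] @ v[:, b].T)


def u_stat_distance(x, y, a=None, b=None):
    """U-statistic estimate of ||Sigma1[a, b] - Sigma2[a, b]||_F^2.

    x: (n1, p) and y: (n2, p) samples, one observation per row, both ASSUMED
    to have mean zero (hypothetical simplification). With a = b = all
    coordinates this targets tr{(Sigma1 - Sigma2)^2}; with disjoint index
    sets a and b it targets the off-diagonal block between the two segments.
    """
    p = x.shape[1]
    a = np.arange(p) if a is None else np.asarray(a)
    b = a if b is None else np.asarray(b)
    n1, n2 = x.shape[0], y.shape[0]
    gx = _gram_product(x, x, a, b)
    gy = _gram_product(y, y, a, b)
    gxy = _gram_product(x, y, a, b)
    # Average over pairs i != j only (drop the diagonal) so each term is
    # unbiased for ||Sigma1[a, b]||_F^2 and ||Sigma2[a, b]||_F^2.
    a1 = (gx.sum() - np.trace(gx)) / (n1 * (n1 - 1))
    a2 = (gy.sum() - np.trace(gy)) / (n2 * (n2 - 1))
    # Cross-sample pairs are independent, so every pair is used; this term
    # is unbiased for the inner product <Sigma1[a, b], Sigma2[a, b]>_F.
    c = gxy.mean()
    return float(a1 + a2 - 2.0 * c)


def permutation_pvalue(x, y, a=None, b=None, n_perm=200, seed=0):
    """Permutation p-value for u_stat_distance (assumes exchangeability under H0)."""
    rng = np.random.default_rng(seed)
    observed = u_stat_distance(x, y, a, b)
    pooled = np.vstack([x, y])
    n1 = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if u_stat_distance(pooled[idx[:n1]], pooled[idx[n1:]], a, b) >= observed:
            exceed += 1
    # Count the observed labelling itself so the p-value is never zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Usage: a "large p, small n" example with p = 100 well above n1 = 20, n2 = 25.
rng = np.random.default_rng(1)
x = rng.standard_normal((20, 100))        # Sigma1 = I
y = 2.0 * rng.standard_normal((25, 100))  # Sigma2 = 4I (hypothetical alternative)
stat, pval = permutation_pvalue(x, y)
```

Passing disjoint index arrays, for example `a=np.arange(50)` and `b=np.arange(50, 100)`, restricts the comparison to the off-diagonal block between those two segments, in the spirit of the second test described above. Note that a permutation test is exact only when the two samples are exchangeable under the null (for instance, identically distributed), which is a stronger condition than equal covariance matrices.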
Item Type: | MPRA Paper |
---|---|
Original Title: | Two Sample Tests for High Dimensional Covariance Matrices |
Language: | English |
Keywords: | High dimensional covariance; Large p small n; Likelihood ratio test; Testing for Gene-sets. |
Subjects: | C - Mathematical and Quantitative Methods > C0 - General, C1 - Econometric and Statistical Methods and Methodology: General, C2 - Single Equation Models; Single Variables, C3 - Multiple or Simultaneous Equation Models; Multiple Variables, C4 - Econometric and Statistical Methods: Special Topics, C5 - Econometric Modeling, C6 - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling, C7 - Game Theory and Bargaining Theory, C8 - Data Collection and Data Estimation Methodology; Computer Programs, C9 - Design of Experiments. G - Financial Economics > G0 - General, G1 - General Financial Markets, G2 - Financial Institutions and Services, G3 - Corporate Finance and Governance |
Item ID: | 46026 |
Depositing User: | Professor Songxi Chen |
Date Deposited: | 11 Apr 2013 07:28 |
Last Modified: | 01 Oct 2019 12:11 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/46026 |
Available Versions of this Item
- Two Sample Tests for High Dimensional Covariance Matrices. (deposited 11 Apr 2013 07:28)