Chen, Song Xi and Li, Jun and Zhong, Pingshou (2014): Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation.

Abstract
We study two tests, both based on thresholding, for the equality of two population mean vectors under high dimensionality and column-wise dependence. They are designed for better power when the mean vectors of the two populations differ only in sparsely populated coordinates. The first test applies thresholding to remove the non-signal-bearing dimensions. The second test combines data transformation and thresholding: the data are first transformed with the precision matrix and then thresholded. The benefits of the thresholding and the data transformation are demonstrated in terms of reduced variance of the test statistics and improved power of the tests. Numerical analyses and an empirical study are performed to confirm the theoretical findings and to demonstrate the practical implementations.
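The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' statistic: the names `thresholded_statistic`, `transform`, and `simulate`, the threshold level `2 * s * log(p)`, and the assumption that the precision matrix is known rather than estimated are all choices made here for illustration only.

```python
import math
import random
from statistics import mean, variance


def thresholded_statistic(x, y, s=0.5):
    """Sum the squared two-sample t-statistics that exceed 2*s*log(p).

    x, y: lists of observations, each observation a list of p floats.
    Returns (statistic, indices of the coordinates that were kept).
    The threshold form is an illustrative assumption.
    """
    p = len(x[0])
    n1, n2 = len(x), len(y)
    lam = 2.0 * s * math.log(p)
    total, kept = 0.0, []
    for j in range(p):
        xj = [row[j] for row in x]
        yj = [row[j] for row in y]
        # Squared standardized mean difference for coordinate j.
        se2 = variance(xj) / n1 + variance(yj) / n2
        t2 = (mean(xj) - mean(yj)) ** 2 / se2
        if t2 > lam:  # drop coordinates that look like pure noise
            total += t2
            kept.append(j)
    return total, kept


def transform(rows, omega):
    """Map each observation v to omega @ v for a given precision matrix.

    omega is assumed known here; in practice it would be estimated.
    """
    return [[sum(w * v for w, v in zip(om_row, row)) for om_row in omega]
            for row in rows]


def simulate(n, p, shift, signal_coords, seed):
    """Hypothetical helper: n independent N(0, 1) rows with a mean shift
    on the coordinates listed in signal_coords."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) + (shift if j in signal_coords else 0.0)
             for j in range(p)] for _ in range(n)]


# Sparse alternative: only 3 of 200 coordinates differ between samples.
x = simulate(50, 200, 5.0, {0, 1, 2}, seed=1)
y = simulate(50, 200, 0.0, set(), seed=2)

# Test 1 (sketch): threshold the raw coordinate-wise statistics.
stat_raw, kept_raw = thresholded_statistic(x, y)

# Test 2 (sketch): transform with a precision matrix first, then threshold.
# The identity is used because the simulated coordinates are independent.
omega = [[1.0 if i == j else 0.0 for j in range(200)] for i in range(200)]
stat_tr, kept_tr = thresholded_statistic(transform(x, omega),
                                         transform(y, omega))
print(len(kept_raw), len(kept_tr))
```

Only the coordinates whose standardized difference clears the threshold contribute to the sum, which is how thresholding reduces the variance contributed by dimensions that carry no signal. A calibrated test would still need a null distribution or critical value, which this sketch does not provide.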
Item Type:  MPRA Paper 

Original Title:  Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation 
English Title:  Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation 
Language:  English 
Keywords:  Data Transformation; Large deviation; Large p small n; Sparse signals; Thresholding. 
Subjects:  C - Mathematical and Quantitative Methods > C0 - General; C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General; C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C12 - Hypothesis Testing: General 
Item ID:  59815 
Depositing User:  Professor Song Xi Chen 
Date Deposited:  11 Nov 2014 15:07 
Last Modified:  27 Sep 2019 15:05 
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/59815 