Chen, Song Xi and Li, Jun and Zhong, Pingshou
(2014):
*Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation.*

## Abstract

We study two thresholding-based tests for the equality of two population mean vectors under high dimensionality and column-wise dependence. They are designed to improve power when the mean vectors of the two populations differ only in a sparse set of coordinates. The first test applies thresholding to remove the dimensions that carry no signal. The second test combines data transformation and thresholding: the data are first transformed with the precision matrix and then thresholded. The benefits of thresholding and data transformation are demonstrated in terms of the reduced variance of the test statistics and the improved power of the tests. Numerical analyses and an empirical study are performed to confirm the theoretical findings and to demonstrate practical implementation.
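To make the thresholding idea concrete, here is a minimal stdlib-Python sketch. It is a simplified, hypothetical illustration, not the authors' statistics. For each coordinate it forms the squared difference of the two sample means, standardized by its estimated variance. It keeps only the coordinates whose value exceeds a threshold of the form 2 s log p, where s is an assumed tuning parameter, and it sums the retained terms after subtracting 1. The function names `thresholded_statistic` and `transform`, the centring by 1, and the threshold form are all assumptions. `transform` mimics the first step of the second test by multiplying each observation by a precision matrix `omega`. Here `omega` is taken as known, although in practice it has to be estimated.

```python
import math
from statistics import mean, variance


def thresholded_statistic(xs, ys, s=0.5):
    """Hypothetical sketch of a thresholded two-sample statistic.

    xs, ys: lists of observations (each a list of p floats) from the two samples.
    s: assumed tuning parameter; the threshold is taken to be 2 * s * log(p).
    Returns the sum of retained centred terms and the retained coordinate indices.
    """
    n1, n2 = len(xs), len(ys)
    p = len(xs[0])
    lam = 2.0 * s * math.log(p)  # assumed threshold level
    total, kept = 0.0, []
    for k in range(p):
        xk = [row[k] for row in xs]
        yk = [row[k] for row in ys]
        # Estimated variance of the difference of the two sample means.
        se2 = variance(xk) / n1 + variance(yk) / n2
        z2 = (mean(xk) - mean(yk)) ** 2 / se2
        if z2 > lam:  # keep only coordinates that look signal-bearing
            # Subtract 1, the approximate null mean of a chi-square(1) term (assumption).
            total += z2 - 1.0
            kept.append(k)
    return total, kept


def transform(rows, omega):
    """Multiply each observation by the p x p matrix omega (assumed known precision matrix)."""
    p = len(omega)
    return [[sum(omega[i][j] * row[j] for j in range(p)) for i in range(p)] for row in rows]


# Toy data (made up for illustration): the samples differ only in coordinate 1.
xs = [[0.1, 2.0, -0.2], [-0.1, 2.2, 0.1], [0.0, 1.8, 0.0], [0.2, 2.1, -0.1]]
ys = [[0.0, 0.0, 0.1], [0.1, 0.2, -0.1], [-0.1, -0.1, 0.0], [0.05, 0.1, 0.05]]

total, kept = thresholded_statistic(xs, ys)
print(kept)  # → [1]

# Flavour of the second test: transform first, then threshold (omega = 2 * identity here).
omega = [[2.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(thresholded_statistic(transform(xs, omega), transform(ys, omega))[1])  # → [1]
```

Because each standardized term is scale-invariant, transforming with a multiple of the identity leaves the selected coordinates unchanged. A non-diagonal `omega` mixes coordinates, and that mixing is how a transformation can change which signals survive the threshold.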

Item Type: | MPRA Paper |
---|---|
Original Title: | Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation |
English Title: | Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation |
Language: | English |
Keywords: | Data Transformation; Large deviation; Large p small n; Sparse signals; Thresholding. |
Subjects: | C - Mathematical and Quantitative Methods > C0 - General; C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General; C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C12 - Hypothesis Testing: General |
Item ID: | 59815 |
Depositing User: | Professor Song Xi Chen |
Date Deposited: | 11 Nov 2014 15:07 |
Last Modified: | 27 Sep 2019 15:05 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/59815 |