Chen, Song Xi and Guo, Bin (2014): Tests for High Dimensional Generalized Linear Models.
Abstract
We consider testing regression coefficients in high-dimensional generalized linear models. By modifying a test statistic proposed by Goeman et al. (2011) for large but fixed-dimensional settings, we propose a new test that is applicable when the dimension diverges and is robust across a wide range of link functions. The power properties of the tests are evaluated under both local and fixed alternatives. A test in the presence of nuisance parameters is also proposed. The proposed tests can provide p-values for testing the significance of multiple gene-sets, and their usefulness is demonstrated in a case study on an acute lymphoblastic leukemia dataset.
Item Type: | MPRA Paper |
---|---|
Original Title: | Tests for High Dimensional Generalized Linear Models |
English Title: | Tests for High Dimensional Generalized Linear Models |
Language: | English |
Keywords: | Generalized Linear Model; Gene-Sets; High Dimensional Covariate; Nuisance Parameter; U-statistics. |
Subjects: | C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables |
 | C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables > C30 - General |
 | C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics |
 | C - Mathematical and Quantitative Methods > C5 - Econometric Modeling |
Item ID: | 59816 |
Depositing User: | Professor Song Xi Chen |
Date Deposited: | 11 Nov 2014 15:26 |
Last Modified: | 27 Sep 2019 03:16 |
References: | Auer, P. L. and Doerge, R. W. (2010). Statistical design and analysis of RNA sequencing data. Genetics, 185, 405-416. Bai, Z. D. and Saranadasa, H. (1996). Effect of high dimension: by an example of two sample problem. Statistica Sinica, 6, 311-329. Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289-300. Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41, 802-837. Chang, J., Tang, C. Y. and Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. The Annals of Statistics, 41, 2123-2148. Chen, S. X. and Cui, H. J. (2003). An extended empirical likelihood for generalized linear models. Statistica Sinica, 13, 69-81. Chen, S. X. and Guo, B. (2014). Tests for high dimensional generalized linear models. Technical report, Guanghua School of Management, Peking University. Chen, S. X., Peng, L. and Qin, Y. L. (2009). Effects of data dimension on empirical likelihood. Biometrika, 96, 711-722. Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38, 808-835. Chen, S. X., Zhang, L. X. and Zhong, P. S. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association, 105, 810-819. Chiaretti, S., Li, X., Gentleman, R., Vitale, A., Vignetti, M., Mandelli, F., Ritz, J. and Foa, R. (2004). Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood, 103, 2771-2778. Dudoit, S., Keles, S. and van der Laan, M. J. (2008). Multiple tests of association with biological annotation metadata. Institute of Mathematical Statistics Collections, 2, 153-218. Efron, B. and Tibshirani, R. (2007). On testing the significance of sets of genes. The Annals of Applied Statistics, 1, 107-129. Fahrmeir, L. and Tutz, G. (1994). Multivariate statistical modelling based on generalized linear models (2nd edition). Springer, New York. Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics, 38, 3567-3604. Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B, 70, 849-911. Gentleman, R., Irizarry, R. A., Carey, V. J., Dudoit, S. and Huber, W. (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, New York. Goeman, J. J., Van De Geer, S. A. and Van Houwelingen, H. C. (2006). Testing against a high dimensional alternative. Journal of the Royal Statistical Society: Series B, 68, 477-493. Goeman, J. J., Van Houwelingen, H. C. and Finos, L. (2011). Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control. Biometrika, 98, 381-390. Hall, P. and Heyde, C. C. (1980). Martingale limit theory and its application. Academic Press. Lan, W., Wang, H. and Tsai, C. L. (2014). Testing covariates in high-dimensional regression. Annals of the Institute of Statistical Mathematics, to appear. Le Cessie, S. and Van Houwelingen, J. C. (1991). A goodness-of-fit test for binary regression models, based on smoothing methods. Biometrics, 47, 1267-1282. Lee, J. D., Sun, D. L., Sun, Y. and Taylor, J. E. (2014). Exact inference after model selection via the Lasso. arXiv:1311.6238. Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40, 908-940. Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the lasso (with discussion). The Annals of Statistics, to appear. Lund, S., Nettleton, D., McCarthy, D. and Smyth, G. (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 11, 8. McCullagh, P. (1983). Quasi-likelihood functions. The Annals of Statistics, 11, 59-67. McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd edition). Chapman and Hall. Pan, W. (2009). Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic Epidemiology, 33, 497-507. Rahmatallah, Y., Emmert-Streib, F. and Glazko, G. (2012). Gene set analysis for self-contained tests: complex null and specific alternative hypotheses. Bioinformatics, 28, 3073-3080. Seber, G. A. (2008). A matrix handbook for statisticians. Wiley, New York. Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Wiley, New York. Taylor, J., Lockhart, R., Tibshirani, R. J. and Tibshirani, R. (2014). Post-selection adaptive inference for Least Angle Regression and the Lasso. arXiv:1401.3889. van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. The Annals of Statistics, 36, 614-645. van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. arXiv:1303.0518. van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge University Press. Voorman, A., Shojaie, A. and Witten, D. (2014). Inference in high dimensions with the penalized score test. arXiv:1401.2678. Wedderburn, R. W. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439-447. Zhong, P. S. and Chen, S. X. (2011). Tests for high dimensional regression coefficients with factorial designs. Journal of the American Statistical Association, 106, 260-274. Zhang, C.-H. and Zhang, S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 217-242. |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/59816 |