Chen, Song Xi and Guo, Bin (2014): Tests for High Dimensional Generalized Linear Models.

Abstract
We consider testing regression coefficients in high dimensional generalized linear models. By modifying a test statistic proposed by Goeman et al. (2011) for large but fixed dimensional settings, we propose a new test which is applicable when the dimension diverges and is robust to a wide range of link functions. The power properties of the tests are evaluated under both local and fixed alternatives. A test in the presence of nuisance parameters is also proposed. The proposed tests can provide p-values for testing the significance of multiple gene-sets, whose usefulness is demonstrated in a case study on an acute lymphoblastic leukemia dataset.
Item Type:  MPRA Paper 

Original Title:  Tests for High Dimensional Generalized Linear Models 
English Title:  Tests for High Dimensional Generalized Linear Models 
Language:  English 
Keywords:  Generalized Linear Model; Gene-Sets; High Dimensional Covariate; Nuisance Parameter; U-statistics. 
Subjects:
C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables
C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables > C30 - General
C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics
C - Mathematical and Quantitative Methods > C5 - Econometric Modeling
Item ID:  59816 
Depositing User:  Professor Song Xi Chen 
Date Deposited:  11. Nov 2014 15:26 
Last Modified:  11. Nov 2014 15:52 
References:
Auer, P. L. and Doerge, R. W. (2010). Statistical design and analysis of RNA sequencing data. Genetics, 185, 405-416.
Bai, Z. D. and Saranadasa, H. (1996). Effect of high dimension: by an example of two sample problem. Statistica Sinica, 6, 311-329.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289-300.
Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41, 802-837.
Chang, J., Tang, C. Y. and Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. The Annals of Statistics, 41, 2123-2148.
Chen, S. X. and Cui, H. J. (2003). An extended empirical likelihood for generalized linear models. Statistica Sinica, 13, 69-81.
Chen, S. X. and Guo, B. (2014). Tests for high dimensional generalized linear models. Technical report, Guanghua School of Management, Peking University.
Chen, S. X., Peng, L. and Qin, Y. L. (2009). Effects of data dimension on empirical likelihood. Biometrika, 96, 711-722.
Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38, 808-835.
Chen, S. X., Zhang, L. X. and Zhong, P. S. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association, 105, 810-819.
Chiaretti, S., Li, X., Gentleman, R., Vitale, A., Vignetti, M., Mandelli, F., Ritz, J. and Foa, R. (2004). Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood, 103, 2771-2778.
Dudoit, S., Keles, S. and van der Laan, M. J. (2008). Multiple tests of association with biological annotation metadata. Institute of Mathematical Statistics Collections, 2, 153-218.
Efron, B. and Tibshirani, R. (2007). On testing the significance of sets of genes. The Annals of Applied Statistics, 1, 107-129.
Fahrmeir, L. and Tutz, G. (1994). Multivariate statistical modelling based on generalized linear models (2nd edition). Springer, New York.
Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics, 38, 3567-3604.
Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B, 70, 849-911.
Gentleman, R., Irizarry, R. A., Carey, V. J., Dudoit, S. and Huber, W. (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, New York.
Goeman, J. J., Van De Geer, S. A. and Van Houwelingen, H. C. (2006). Testing against a high dimensional alternative. Journal of the Royal Statistical Society: Series B, 68, 477-493.
Goeman, J. J., Van Houwelingen, H. C. and Finos, L. (2011). Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control. Biometrika, 98, 381-390.
Hall, P. and Heyde, C. C. (1980). Martingale limit theory and its application. Academic Press.
Lan, W., Wang, H. and Tsai, C. L. (2014). Testing covariates in high-dimensional regression. Annals of the Institute of Statistical Mathematics, to appear.
Le Cessie, S. and Van Houwelingen, J. C. (1991). A goodness-of-fit test for binary regression models, based on smoothing methods. Biometrics, 47, 1267-1282.
Lee, J. D., Sun, D. L., Sun, Y. and Taylor, J. E. (2014). Exact inference after model selection via the Lasso. arXiv:1311.6238.
Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40, 908-940.
Lockhart, R., Taylor, J., Tibshirani, J. R. and Tibshirani, R. (2014). A significance test for the lasso (with discussion). The Annals of Statistics, to appear.
Lund, S., Nettleton, D., McCarthy, D. and Smyth, G. (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 11, 8.
McCullagh, P. (1983). Quasi-likelihood functions. The Annals of Statistics, 11, 59-67.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd edition). Chapman and Hall.
Pan, W. (2009). Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic Epidemiology, 33, 497-507.
Rahmatallah, Y., Emmert-Streib, F. and Glazko, G. (2012). Gene set analysis for self-contained tests: complex null and specific alternative hypotheses. Bioinformatics, 28, 3073-3080.
Seber, G. A. (2008). A matrix handbook for statisticians. Wiley, New York.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Wiley, New York.
Taylor, J., Lockhart, R., Tibshirani, J. R. and Tibshirani, R. (2014). Post-selection adaptive inference for Least Angle Regression and the Lasso. arXiv:1401.3889.
van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. The Annals of Statistics, 36, 614-645.
van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. arXiv:1303.0518.
van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge University Press.
Voorman, A., Shojaie, A. and Witten, D. (2014). Inference in high dimensions with the penalized score test. arXiv:1401.2678.
Wedderburn, R. W. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439-447.
Zhong, P. S. and Chen, S. X. (2011). Tests for high dimensional regression coefficients with factorial designs. Journal of the American Statistical Association, 106, 260-274.
Zhang, C.-H. and Zhang, S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 217-242.
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/59816 