Novel Methods for Multivariate Ordinal Data applied to Genetic Diplotypes, Genomic Pathways, Risk Profiles, and Pattern Similarity

Wittkowski, Knut M. (2003): Novel Methods for Multivariate Ordinal Data applied to Genetic Diplotypes, Genomic Pathways, Risk Profiles, and Pattern Similarity. Published in: Computational Science and Statistics, Vol. 35, (2003): pp. 626-646.

Abstract

Introduction: Conventional statistical methods for multivariate data (e.g., discriminant/regression) are based on the (generalized) linear model, i.e., the data are interpreted as points in a Euclidean space of independent dimensions. The dimensionality of the data is then reduced by assuming the components to be related by a specific function of known type (linear, exponential, etc.), which allows the distance of each point from a hyperspace to be determined. While mathematically elegant, these approaches may have shortcomings when applied to real-world applications where the relative importance, the functional relationship, and the correlation among the variables tend to be unknown. Still, in many applications, each variable can be assumed to have at least an “orientation”, i.e., it can reasonably be assumed that, if all other conditions are held constant, an increase in this variable is either “good” or “bad”. The direction of this orientation can be known or unknown. In genetics, for instance, having more “abnormal” alleles may increase the risk (or magnitude) of a disease phenotype. In genomics, the expression of several related genes may indicate disease activity. When screening for security risks, more indicators of atypical behavior may raise more concern; in face or voice recognition, more similar indicators may increase the likelihood of a person being identified.

Methods: In 1998, we developed a nonparametric method for analyzing multivariate ordinal data to assess the overall risk of HIV infection based on different types of behavior or the overall protective effect of barrier methods against HIV infection. By using U-statistics, rather than the marginal likelihood, we were able to increase the computational efficiency of this approach by several orders of magnitude.
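
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of one common way to score multivariate ordinal profiles with U-statistics. It assumes that all variables share the same orientation (higher means "more risk") and that two profiles are ordered only when one is at least as high in every variable and strictly higher in at least one. The function names dominates and u_scores and the example data are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is >= b in every variable and > b in at least one.

    Assumption: every variable is oriented so that higher is 'worse'.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def u_scores(profiles: Sequence[Sequence[float]]) -> List[int]:
    """Score each profile as (# profiles it dominates) - (# profiles dominating it).

    Pairs that cannot be ordered (one variable higher, another lower)
    contribute nothing to either profile's score.
    """
    scores = []
    for p in profiles:
        below = sum(dominates(p, q) for q in profiles)
        above = sum(dominates(q, p) for q in profiles)
        scores.append(below - above)
    return scores


# Hypothetical example: five subjects, two ordinal risk indicators each.
profiles = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
print(u_scores(profiles))  # [-4, -1, -1, 2, 4]
```

In this sketch the profiles (1, 0) and (0, 1) are not ordered relative to each other, so they receive the same score without any weights or functional form being chosen for the two variables. The pairwise loop is quadratic in the number of profiles.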

Results: We applied this approach to assessing immunogenicity of a vaccination strategy in cancer patients. While discussing the pitfalls of the conventional methods for linking quantitative traits to haplotypes, we realized that this approach could be easily modified into a statistically valid alternative to previously proposed approaches. We have now begun to use the same methodology to correlate activity of anti-inflammatory drugs along genomic pathways with disease severity of psoriasis based on several clinical and histological characteristics.

Conclusion: Multivariate ordinal data are frequently observed when assessing semiquantitative characteristics, such as risk profiles (genetic, genomic, or security) or similarity of pattern (faces, voices, behaviors). The conventional methods require empirical validation, because the functions and weights chosen cannot be justified on theoretical grounds. The proposed statistical method for analyzing profiles of ordinal variables is intrinsically valid. Since no additional assumptions need to be made, the often time-consuming empirical validation can be skipped.

Item Type:	MPRA Paper
Institution:	The Rockefeller University
Original Title:	Novel Methods for Multivariate Ordinal Data applied to Genetic Diplotypes, Genomic Pathways, Risk Profiles, and Pattern Similarity
Language:	English
Keywords:	ranking; nonparametric; robust; scoring; multivariate
Subjects:	C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables > C35 - Discrete Regression and Qualitative Choice Models ; Discrete Regressors ; Proportions
	C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics > C44 - Operations Research ; Statistical Decision Theory
	C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C14 - Semiparametric and Nonparametric Methods: General
Item ID:	4570
Depositing User:	Knut M. Wittkowski
Date Deposited:	22 Aug 2007
Last Modified:	28 Sep 2019 23:13
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/4570

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.
