Colignatus, Thomas (2007): Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants.
This is the latest version of this item.
Abstract
Nominal data in contingency tables currently lack a correlation coefficient of the kind that has already been defined for real (cardinal) data. A measure can be designed using the determinant, with the useful interpretation that the determinant gives the ratio between volumes. A contingency table by itself gives all connections between the variables. The only operations required are normalization and aggregation by means of that determinant, so that, in fact, a contingency table is its own correlation matrix. The idea behind the normalization is that the conditional probabilities given the row and column sums can also be seen as regression coefficients, which hence depend upon correlations. With M an m × n contingency table and n ≤ m, the suggested measure is r = Sqrt[det[A'A]] with A = Normalized[M]. The sign can be recovered from a generalization of the determinant to non-square matrices. With M an n1 × n2 × ... × nk contingency matrix, we can construct a matrix R of pairwise correlations. A matrix of such pairwise correlations is called an association matrix; if it is also positive semi-definite (PSD), then it is a proper correlation matrix. The overall correlation is then a scalar R = f[R], with f applied to the matrix R, where f can be chosen to impose PSD-ness. One option is f[R] = Sqrt[1 - det[R]]. However, for both nominal and cardinal data the advisable choice is to take the maximal multiple correlation within R. The resulting measure of “nominal correlation” measures the distance between a main diagonal and the off-diagonal elements, and thus is a measure of strong correlation. Cramér’s V measure for pairwise correlation can be generalized in the same manner. It measures the distance between all diagonals (including cross-diagonals and subdiagonals) and statistical independence, and thus is a measure of weaker correlation. Finally, when variances are also defined, regression coefficients can be determined from the variance-covariance matrix.
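The remark that conditional probabilities given the row and column sums can be read as regression coefficients is easiest to check in the 2 × 2 case. The following is a minimal Python sketch, not the author's own code, and all function names are hypothetical: the determinant of the row-conditional matrix equals the slope of the column indicator regressed on the row indicator, the determinant of the column-conditional matrix is the reverse slope, and the signed square root of their product reproduces the classical phi coefficient. The paper's Normalized[M] for general m × n tables is not specified in this abstract and is not reproduced here; the sketch assumes all row and column sums are positive.

```python
import math

def conditional_rows(table):
    """P(column | row): divide each row of a 2 x 2 table by its row sum."""
    return [[x / sum(row) for x in row] for row in table]

def conditional_cols(table):
    """P(row | column): divide each column of a 2 x 2 table by its column sum."""
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    return [[table[i][j] / col_sums[j] for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def phi_from_regressions(table):
    # det of the row-conditional matrix = a/(a+b) - c/(c+d):
    # slope of the column-1 indicator regressed on the row-1 indicator
    b_col_on_row = det2(conditional_rows(table))
    # det of the column-conditional matrix = a/(a+c) - b/(b+d):
    # slope of the row-1 indicator regressed on the column-1 indicator
    b_row_on_col = det2(conditional_cols(table))
    # both slopes carry the sign of ad - bc, so their product is non-negative
    return math.copysign(math.sqrt(b_col_on_row * b_row_on_col), b_col_on_row)

def phi_direct(table):
    """Classical phi coefficient: (ad - bc) / sqrt of the product of the margins."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

table = [[30, 10], [20, 40]]
print(phi_from_regressions(table), phi_direct(table))
```

Both functions agree on any 2 × 2 table with positive margins, including negatively associated ones such as [[10, 30], [40, 20]], which illustrates why the product of the two normalized determinants behaves like a squared correlation.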
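The volume interpretation behind r = Sqrt[det[A'A]] can be illustrated with the Gram determinant: for an m × n matrix A with n ≤ m, Sqrt[det[A'A]] is the n-dimensional volume of the parallelotope spanned by the columns of A, and it reduces to |det[A]| when A is square. The sketch below assumes A has already been normalized, since the abstract does not define Normalized[M]; the helper names are hypothetical and only the standard library is used.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in m]
    n = len(a)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result  # a row swap flips the sign
        result *= a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return result

def gram_volume(a):
    """Sqrt[det[A'A]]: volume spanned by the n columns of an m x n matrix A, n <= m."""
    g = det(matmul(transpose(a), a))
    return math.sqrt(max(g, 0.0))  # clip tiny negative round-off

# Columns (1, 0, 0) and (1, 1, 0) span a parallelogram with base 1 and height 1.
print(gram_volume([[1, 1], [0, 1], [0, 0]]))  # prints 1.0
```

Linearly dependent columns give a volume of zero, which is the degenerate case the determinant-based measure relies on.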
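The two candidate aggregators for a correlation matrix R, f[R] = Sqrt[1 - det[R]] and the maximal multiple correlation within R, can be compared with a short sketch. It assumes R is a nonsingular correlation matrix and uses the standard identity that the squared multiple correlation of variable i on all other variables is 1 - 1/(R^-1)_ii. The function names are hypothetical, and this illustrates the two choices of f named in the abstract rather than the paper's implementation.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in m]
    n = len(a)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result
        result *= a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return result

def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination on [M | I]."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        p = a[k][k]
        a[k] = [x / p for x in a[k]]
        for i in range(n):
            if i != k:
                factor = a[i][k]
                a[i] = [x - factor * y for x, y in zip(a[i], a[k])]
    return [row[n:] for row in a]

def f_sqrt_one_minus_det(r):
    """Option f[R] = Sqrt[1 - det[R]]."""
    return math.sqrt(max(1.0 - det(r), 0.0))

def max_multiple_correlation(r):
    """Largest multiple correlation of one variable on all the others."""
    inv = inverse(r)
    return max(math.sqrt(max(1.0 - 1.0 / inv[i][i], 0.0)) for i in range(len(r)))

R = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.2], [0.3, 0.2, 1.0]]
print(f_sqrt_one_minus_det(R), max_multiple_correlation(R))
```

For a 2 × 2 matrix with off-diagonal element r both functions return |r|; they diverge once three or more variables are involved, as in the 3 × 3 example.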
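For reference, the standard pairwise Cramér’s V, V = Sqrt[χ² / (N (min(m, n) - 1))], can be computed directly from an m × n table of counts. This sketch covers only the familiar pairwise statistic; the generalization to n1 × n2 × ... × nk tables proposed in the paper is not reproduced. The function name is hypothetical, and all row and column sums are assumed to be positive.

```python
import math

def cramers_v(table):
    """Pairwise Cramér's V for an m x n table of counts."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    # Pearson chi-square against the independence model E_ij = r_i * c_j / N
    chi2 = 0.0
    for i, r in enumerate(row_sums):
        for j, c in enumerate(col_sums):
            expected = r * c / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    k = min(len(row_sums), len(col_sums))
    return math.sqrt(chi2 / (n * (k - 1)))

print(cramers_v([[30, 10], [20, 40]]))
print(cramers_v([[5, 0, 0], [0, 7, 0], [0, 0, 4]]))
```

For a 2 × 2 table V equals the absolute value of phi, a table with all mass on the main diagonal gives V = 1, and a table with proportional rows (statistical independence) gives V = 0.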
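The closing remark relies on the standard identity that, for a dependent variable y and regressors x, the slope coefficients are β = S_xx^-1 s_xy, where S_xx is the covariance matrix of the regressors and s_xy holds their covariances with y. A minimal sketch, assuming ordinary numeric data so that the variances are defined (how the paper assigns variances to nominal categories is not covered here); the function names are hypothetical and the dependent variable is assumed to be the last one.

```python
def covariance_matrix(columns):
    """Sample covariance matrix of a list of equally long data columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((x - mx) * (y - my) for x, y in zip(cx, cy)) / (n - 1)
             for cy, my in zip(columns, means)]
            for cx, mx in zip(columns, means)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bv)] for row, bv in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[pivot][k] == 0:
            raise ValueError("singular covariance matrix")
        m[k], m[pivot] = m[pivot], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            m[i] = [x - factor * y for x, y in zip(m[i], m[k])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def regression_from_covariance(cov):
    """Slopes of the last variable on all the others: beta = S_xx^-1 s_xy."""
    k = len(cov) - 1
    s_xx = [row[:k] for row in cov[:k]]
    s_xy = [cov[i][k] for i in range(k)]
    return solve(s_xx, s_xy)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 * a - b + 1 for a, b in zip(x1, x2)]  # y = 2 x1 - x2 + 1
print(regression_from_covariance(covariance_matrix([x1, x2, y])))
```

Because the intercept drops out of all covariances, only the slopes are recovered; here they come back as 2 and -1 up to floating-point rounding.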
Item Type: | MPRA Paper |
---|---|
Institution: | Thomas Cool Consultancy & Econometrics |
Original Title: | Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants |
Language: | English |
Keywords: | association; correlation; contingency table; volume ratio; determinant; nonparametric methods; nominal data; nominal scale; categorical data; Fisher’s exact test; odds ratio; tetrachoric correlation coefficient; phi; Cramer’s V; Pearson; contingency coefficient; uncertainty coefficient; Theil’s U; eta; meta-analysis; Simpson’s paradox; causality; statistical independence; regression |
Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C10 - General |
Item ID: | 3660 |
Depositing User: | Thomas Colignatus |
Date Deposited: | 21 Jun 2007 |
Last Modified: | 27 Sep 2019 13:26 |
Note: | Colignatus is the name of Thomas Cool in science. |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/3660 |
Available Versions of this Item
- Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants. (deposited 05 Jun 2007)
- Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants. (deposited 21 Jun 2007) [Currently Displayed]