Munich Personal RePEc Archive

Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants

Colignatus, Thomas (2007): Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants. Unpublished.

Warning: There is a more recent version of this item available.

Abstract

Nominal data currently lack a correlation coefficient such as has already been defined for real data. A measure is possible using the determinant, with the useful interpretation that the determinant gives the ratio between volumes. With M an m × n contingency table and n ≤ m, the suggested measure is r = Sqrt[det[A'A]] with A = Normalized[M]. With M an n1 × n2 × ... × nk contingency matrix, we can construct a matrix of pairwise correlations R. A matrix of such pairwise correlations is called an association matrix. If that matrix is also positive semi-definite (PSD), then it is a proper correlation matrix. The overall correlation then is R = f[R], where f can be chosen to impose PSD-ness. An option is to use f[R] = Sqrt[1 - det[R]]. However, for both nominal and cardinal data the advisable choice is to take the maximal multiple correlation within R. The resulting measure of “nominal correlation” measures the distance between a main diagonal and the off-diagonal elements, and thus is a measure of strong correlation. Cramer’s V measure for pairwise correlation can be generalized in this manner too. It measures the distance between all diagonals (including cross-diagonals and subdiagonals) and statistical independence, and thus is a measure of weaker correlation. Finally, when variances are also defined, regression coefficients can be determined from the variance-covariance matrix. The volume ratio measure can be related to the regression coefficients, not of the variables, but of the categories in the contingency matrix, using the conditional probabilities given the row and column sums.
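
The formulas named in the abstract can be sketched numerically as follows. This is a minimal illustration in Python with NumPy, not the author's implementation. All function names are hypothetical. In particular, the abstract does not define Normalized[M]; the sketch assumes it means scaling each column to unit Euclidean length, so that Sqrt[det[A'A]] is the volume spanned by unit vectors. The multiple correlation uses the standard identity R_i^2 = 1 - 1/(R^-1)_ii, and Cramer's V uses its usual chi-square definition.

```python
import numpy as np


def volume_ratio(table):
    """r = sqrt(det(A'A)) for a two-way contingency table.

    ASSUMPTION: 'Normalized' is taken to mean scaling every column to unit
    Euclidean length. The abstract does not spell out the normalization.
    """
    m = np.asarray(table, dtype=float)
    if m.shape[0] < m.shape[1]:
        m = m.T  # make sure columns <= rows (n <= m in the abstract)
    norms = np.linalg.norm(m, axis=0)
    if np.any(norms == 0):
        raise ValueError("a category with zero counts cannot be normalized")
    a = m / norms
    # Guard against tiny negative determinants caused by rounding.
    return float(np.sqrt(max(np.linalg.det(a.T @ a), 0.0)))


def overall_correlation(r):
    """f[R] = Sqrt[1 - det[R]] for a correlation matrix R."""
    r = np.asarray(r, dtype=float)
    return float(np.sqrt(max(1.0 - np.linalg.det(r), 0.0)))


def max_multiple_correlation(r):
    """Largest multiple correlation of one variable on all the others.

    Uses R_i^2 = 1 - 1/(R^-1)_ii, so R must be nonsingular.
    """
    r = np.asarray(r, dtype=float)
    inv_diag = np.diag(np.linalg.inv(r))
    r_squared = np.clip(1.0 - 1.0 / inv_diag, 0.0, 1.0)
    return float(np.sqrt(r_squared.max()))


def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (N * (min(rows, cols) - 1)))."""
    m = np.asarray(table, dtype=float)
    n = m.sum()
    expected = np.outer(m.sum(axis=1), m.sum(axis=0)) / n
    chi2 = ((m - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(m.shape) - 1))))


diagonal = [[10, 0], [0, 10]]        # all mass on the main diagonal
independent = [[10, 20], [30, 60]]   # rows proportional: independence
pairwise = [[1.0, 0.6], [0.6, 1.0]]  # a 2 x 2 correlation matrix

print(volume_ratio(diagonal), cramers_v(diagonal))
print(volume_ratio(independent), cramers_v(independent))
print(overall_correlation(pairwise), max_multiple_correlation(pairwise))
```

Under the assumed normalization, the diagonal table gives a value of 1 on both measures and the independent table gives a value of (numerically) 0 on both. For a single pair of variables, Sqrt[1 - det[R]] and the maximal multiple correlation both reduce to the absolute pairwise correlation.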

Item Type: MPRA Paper
Additional Information: The March 27 version corrects a formula and introduces the term NominalCorrelation. The April 10 version gives the correct f[R] = Sqrt[1 - det[R]] and explains this measure also for real data. The advisable choice, however, is the maximal multiple correlation. The May 1 version resolves the issue of positive semi-definiteness and extends the approach to Cramer's V. The May 14 version adds regression. The June 5 version corrects the variance and links up the volume ratio interpretation with the conditional probabilities within the contingency table.
Institution: Thomas Cool Consultancy & Econometrics
Language: English
Keywords: association; correlation; contingency table; volume ratio; determinant; nonparametric methods; nominal data; nominal scale; categorical data; Fisher’s exact test; odds ratio; tetrachoric correlation coefficient; phi; Cramer’s V; Pearson; contingency coefficient; uncertainty coefficient; Theil’s U; eta; meta-analysis; Simpson’s paradox; causality; statistical independence; regression
Subjects: C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods: General > C10 - General
ID Code: 3394
Deposited By: Thomas Colignatus
Deposited On: 05. Jun 2007
Last Modified: 07. Nov 2007 03:10
References:

Colignatus is the name of Thomas Cool in science.

Becker, L.A. (1999), “Measures of Effect Size (Strength of Association)”, http://web.uccs.edu/lbecker/SPSS/glm_effectsize.htm, Retrieved from source

Cool, Th. (1999, 2001), “The Economics Pack, Applications for Mathematica”, http://www.dataweb.nl/~cool, ISBN 90-804774-1-9, JEL-99-0820

Colignatus, Th. (2006), “On the sample distribution of the adjusted coefficient of determination (R2Adj) in OLS”, http://library.wolfram.com/infocenter/MathSource/6269/

Colignatus, Th. (2007a), “A logic of exceptions”, http://www.dataweb.nl/~cool, ISBN 978-90-804774-4-5

Colignatus, Th. (2007b), “Voting theory for democracy”, 2nd edition, http://www.dataweb.nl/~cool, ISBN 978-90-804774-5-2

Colignatus, Th. (2007c), “A measure of association (correlation) in nominal data (contingency tables), using determinants”, an earlier version of this paper (3rd publishable draft), http://ideas.repec.org/p/pra/mprapa/2662.html

Colignatus, Th. (2007d), “Correlation in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants”, this paper, to be put at MPRA as well, as the improved version of Colignatus, Th. (2007c), but useful to mention in this list of references if only an abridged version of this paper is eventually published, http://mpra.ub.uni-muenchen.de/3044/

Colignatus, Th. (2007e), “Elementary statistics and causality”, work in progress, http://www.dataweb.nl/~cool

Colignatus, Th. (2007f), “The 2 × 2 × 2 case in causality, of an effect, a cause and a confounder”, http://mpra.ub.uni-muenchen.de/3351/, Retrieved from source

Friendly, M. (2007), “Categorical Data Analysis with Graphics”, Retrieved from http://www.math.yorku.ca/SCS/Courses/grcat/grc6.html (citing the data from Koch & Stokes (1991))

Garson, D. (2007), “Nominal Association: Phi, Contingency Coefficient, Tschuprow's T, Cramer's V, Lambda, Uncertainty Coefficient”, http://www2.chass.ncsu.edu/garson/pa765/assocnominal.htm, Retrieved from source

Higham, N. J. (1989), “Matrix nearness problems and applications”. In M. J. C. Gover and S. Barnett (eds), “Applications of Matrix Theory”, pages 1–27. Oxford University Press

Johnston, J. (1972), “Econometric methods”, McGraw-Hill

Kleinbaum, D.G., K.M. Sullivan and N.D. Barker (2003), “ActivEpi Companion textbook”, Springer

Linacre, J. M. (2005), “Correlation Coefficients: Describing Relationships”, Rasch Measurement Transactions, 19:3 p. 1028-9, retrieved from http://www.rasch.org/rmt/rmt193c.htm (citing the data from Uebersax (2000))

Losh, S.C. (2004), “Guide 5: Bivariate Associations and Correlation Coefficient Properties”, http://edf5400-01.fa04.fsu.edu/Guide5.html, Retrieved from Source

Losh, S.C. (2004a), “Guide 6: Multivariate Crosstabulations and Causal Issues”, http://edf5400-01.fa04.fsu.edu/Guide6.html, Retrieved from Source

Mood, A.M. and F.A. Graybill (1963), “Introduction to the theory of statistics”, McGraw-Hill

Pearl, J. (2000), “Causality. Models, reasoning and inference”, Cambridge

Simon, R. (2007), “Lecture Notes and Exercises 2006/07”, http://www.maths.lse.ac.uk/Courses/MA201/, Retrieved from source

Takayama, A. (1974), “Mathematical economics”, The Dryden Press

Theil, H. (1971), “Principles of econometrics”, North-Holland

UCLA ATS (2007), “SAS Textbook Examples. Econometric Analysis, Fourth Edition by Greene. Chapter 16: Simultaneous Equations Models”, http://www.ats.ucla.edu/stat/SAS/examples/greene/chapter16.htm, Retrieved from source

(Other) websites

http://en.wikipedia.org/wiki/Contingency_table
http://post.queensu.ca:8080/SASDoc/getDoc/en/procstat.hlp/corr_sect26.htm
http://en.wikipedia.org/wiki/Fisher%27s_exact_test

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.

MPRA is a RePEc service hosted by
the Munich University Library in Germany.