Tsagris, Michail (2014): The kNN algorithm for compositional data: a revised approach with and without zero values present. Published in: Journal of Data Science, Vol. 3, No. 12 (July 2014): pp. 519-534.

Abstract
In compositional data, an observation is a vector with nonnegative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science among others. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics are to be used in the k-nearest neighbours algorithm regardless of the presence of zeros. Examples with real data are exhibited.
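As an illustration of the idea in the abstract, the following is a minimal Python sketch of a k-nearest neighbours classifier that uses a power-transformed taxicab distance. The abstract does not give the exact form of the metrics, so the definition here is an assumption: each composition is raised component-wise to a power `alpha > 0`, rescaled to sum to 1, and compared with the L1 (taxicab) distance. The function names `power_transform`, `taxicab_alpha` and `knn_predict`, and the toy data, are hypothetical and not taken from the paper.

```python
import numpy as np
from collections import Counter


def power_transform(x, alpha):
    """Raise components to the power alpha, then rescale so they sum to 1.

    Assumed form of the power transformation; for alpha > 0 a zero
    component stays zero, so no zero replacement is needed.
    """
    x = np.asarray(x, dtype=float)
    powered = x ** alpha
    return powered / powered.sum(axis=-1, keepdims=True)


def taxicab_alpha(x, y, alpha):
    """L1 distance between the power-transformed compositions (assumed form)."""
    diff = power_transform(x, alpha) - power_transform(y, alpha)
    return np.abs(diff).sum(axis=-1)


def knn_predict(train_x, train_y, query, k=3, alpha=0.5):
    """Majority vote among the k training compositions closest to `query`."""
    dists = taxicab_alpha(train_x, query, alpha)      # one distance per training row
    nearest = np.argsort(dists, kind="stable")[:k]    # indices of the k smallest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration); the last training row contains a zero.
train_x = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.0, 0.3, 0.7],
])
train_y = ["a", "a", "b", "b"]

print(knn_predict(train_x, train_y, [0.65, 0.25, 0.10], k=3, alpha=0.5))
print(knn_predict(train_x, train_y, [0.0, 0.2, 0.8], k=3, alpha=0.5))
```

With `alpha = 1` the sketch reduces to the ordinary taxicab distance between compositions. In practice `alpha` and `k` would presumably be chosen by cross-validation, which is not shown here.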
Item Type:  MPRA Paper 

Original Title:  The kNN algorithm for compositional data: a revised approach with and without zero values present 
English Title:  The kNN algorithm for compositional data: a revised approach with and without zero values present 
Language:  English 
Keywords:  compositional data, entropy, kNN algorithm, metric, supervised classification 
Subjects:  C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C18 - Methodological Issues: General 
Item ID:  65866 
Depositing User:  Mr Michail Tsagris 
Date Deposited:  31 Jul 2015 14:02 
Last Modified:  19 Oct 2019 09:12 
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/65866 