Tsagris, Michail (2014): *The k-NN algorithm for compositional data: a revised approach with and without zero values present.* Published in: Journal of Data Science, Vol. 3, No. 12 (July 2014): pp. 519-534.


## Abstract

In compositional data, an observation is a vector with non-negative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science among others. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics are to be used in the k-nearest neighbours algorithm regardless of the presence of zeros. Examples with real data are exhibited.
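As a rough illustration of the idea in the abstract, the sketch below applies a power transformation to each composition, measures taxicab (L1) distance between the transformed vectors, and classifies by k-nearest neighbours. It is not the paper's implementation: the transformation is assumed here to be the common form x_i^α / Σ_j x_j^α, the paper's exact definitions and scaling may differ, and the second ("newly suggested") metric is not shown. The function names, the toy data, and the values of `k` and `alpha` are all hypothetical.

```python
from collections import Counter


def power_transform(x, alpha):
    """Raise each component to `alpha` and rescale so the parts sum to 1.

    Assumed form of the power transformation; with alpha > 0 a zero
    component simply stays zero, so no zero replacement is needed.
    """
    powered = [xi ** alpha for xi in x]
    total = sum(powered)
    return [p / total for p in powered]


def taxicab(u, v):
    """Taxicab (L1) distance between two vectors of equal length."""
    return sum(abs(ui - vi) for ui, vi in zip(u, v))


def knn_predict(train_x, train_y, query, k=3, alpha=0.5):
    """Classify `query` by majority vote among its k nearest training points."""
    q = power_transform(query, alpha)
    dists = sorted(
        (taxicab(power_transform(x, alpha), q), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy three-part compositions (made up for illustration); note the zeros.
train_x = [
    [0.70, 0.20, 0.10],
    [0.80, 0.20, 0.00],
    [0.65, 0.25, 0.10],
    [0.10, 0.30, 0.60],
    [0.00, 0.40, 0.60],
    [0.15, 0.25, 0.60],
]
train_y = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(train_x, train_y, [0.75, 0.25, 0.00], k=3, alpha=0.5)
print(prediction)
```

In practice `alpha` and `k` would be tuned, for example by cross-validation. Because log-ratio transformations are undefined at zero, a positive `alpha` is what allows compositions with zero parts to be used directly in this sketch.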

Item Type: | MPRA Paper |
---|---|

Original Title: | The k-NN algorithm for compositional data: a revised approach with and without zero values present |

English Title: | The k-NN algorithm for compositional data: a revised approach with and without zero values present |

Language: | English |

Keywords: | compositional data, entropy, k-NN algorithm, metric, supervised classification |

Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C18 - Methodological Issues: General |

Item ID: | 65866 |

Depositing User: | Mr Michail Tsagris |

Date Deposited: | 31 Jul 2015 14:02 |

Last Modified: | 19 Oct 2019 09:12 |


URI: | https://mpra.ub.uni-muenchen.de/id/eprint/65866 |