Cluster Evolution Analytics

Morales-Oñate, Víctor and Morales-Oñate, Bolívar (2024): Cluster Evolution Analytics.

PDF: MPRA_paper_120220.pdf (1MB)

Abstract

In this paper we propose Cluster Evolution Analytics (CEA), a framework that can be placed within the realm of Advanced Exploratory Data Analysis or unsupervised learning. CEA leverages the temporal component of panel data and is based on combining two techniques that are not usually related: leave-one-out and the plug-in principle. This allows us to pose exploratory "what if" questions, in the sense that the present information of an object is plugged into the dataset of a previous time frame, so that we can explore its evolution (and that of its neighbors) up to the present. We illustrate our results on a real dataset by applying CEA to different clustering algorithms, and we developed a Shiny App with a particular configuration. Finally, we also provide an R package so that this framework can be used in different applications.
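The abstract describes the leave-one-out/plug-in step only at a high level. The following is a minimal Python sketch of one possible reading of it, not the authors' implementation. It assumes that a panel is a dict mapping a period to {object id: feature vector}, and it uses plain k-means (Lloyd's algorithm with a naive deterministic initialization) as a stand-in for "a clustering algorithm". The names `kmeans`, `cluster_mates` and `what_if`, as well as the toy data, are hypothetical and do not come from the authors' R package or Shiny App.

```python
from math import dist
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Snapshot = Dict[str, Vector]   # object id -> feature vector at one period
Panel = Dict[int, Snapshot]    # period (e.g. year) -> cross-section


def kmeans(points: Snapshot, k: int, iters: int = 100) -> Dict[str, int]:
    """Lloyd's k-means; centres start at the first k ids in sorted order (naive but deterministic)."""
    ids = sorted(points)
    centres: List[Vector] = [points[i] for i in ids[:k]]
    labels: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each object goes to its nearest centre.
        new_labels = {i: min(range(k), key=lambda c: dist(points[i], centres[c])) for i in ids}
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: move each non-empty cluster's centre to the mean of its members.
        for c in range(k):
            members = [points[i] for i in ids if labels[i] == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_mates(labels: Dict[str, int], obj: str) -> List[str]:
    """Objects sharing obj's cluster; label-invariant, unlike raw cluster numbers."""
    return sorted(i for i, c in labels.items() if c == labels[obj] and i != obj)


def what_if(panel: Panel, obj: str, past: int, present: int, k: int) -> List[str]:
    """Leave obj's own past row out, plug in its present row, and re-cluster the past cross-section."""
    snapshot = dict(panel[past])         # shallow copy: the panel itself is not modified
    snapshot[obj] = panel[present][obj]  # leave-one-out + plug-in
    return cluster_mates(kmeans(snapshot, k), obj)


# Hypothetical toy panel: two features, four objects, two periods.
panel: Panel = {
    2000: {"A": (1.0, 1.0), "B": (1.2, 0.9), "C": (5.0, 5.0), "D": (5.1, 4.8)},
    2020: {"A": (4.9, 5.2), "B": (1.3, 1.0), "C": (5.5, 5.4), "D": (5.2, 5.0)},
}

print(cluster_mates(kmeans(panel[2000], k=2), "A"))       # ['B']
print(what_if(panel, "A", past=2000, present=2020, k=2))  # ['C', 'D']
```

In this toy example, plugging object A's 2020 values into the 2000 cross-section moves it from B's group into the group of C and D. Comparing cluster mates rather than raw labels sidesteps the fact that cluster numbering is arbitrary between runs. Trying a different clustering algorithm, as the abstract describes, would only require replacing `kmeans`.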

Item Type:	MPRA Paper
Original Title:	Cluster Evolution Analytics
English Title:	Cluster Evolution Analytics
Language:	English
Keywords:	clustering, temporal clustering, statistical profiles
Subjects:	C - Mathematical and Quantitative Methods > C0 - General > C02 - Mathematical Methods
	C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models; Multiple Variables > C38 - Classification Methods; Cluster Analysis; Principal Components; Factor Models
	C - Mathematical and Quantitative Methods > C6 - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling > C63 - Computational Techniques; Simulation Modeling
Item ID:	120220
Depositing User:	Victor Morales-Oñate
Date Deposited:	21 Feb 2024 10:29
Last Modified:	21 Feb 2024 10:29
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/120220

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.
