Andres, Antonio Rodriguez and Otero, Abraham and Amavilah, Voxi Heinrich (2021): Evaluation of technology clubs by clustering: A cautionary note. Forthcoming in: Applied Economics (2021)
Abstract
Applications of machine learning techniques to economic problems are increasing. These are powerful techniques with great potential to extract insights from economic data. However, care must be taken to apply them correctly, or the wrong conclusions may be drawn. In the technology clubs literature, after applying a clustering algorithm, some authors train a supervised machine learning technique, such as a decision tree or a neural network, to predict the cluster labels. They then use some performance metric of that prediction (typically, accuracy) as a measure of the quality of the clustering configuration they have found. This is an error with potentially negative implications for policy, because obtaining a high accuracy in such a prediction does not mean that the clustering configuration found is correct. This paper explains in detail why this modus operandi is not sound from a theoretical point of view and uses computer simulations to demonstrate it. We caution policymakers and indicate directions for future investigation.
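The point made in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation; it is a pure-Python illustration under stated assumptions: the data are uniform random points in the unit square (so there is no real cluster structure), the clustering is a plain Lloyd's k-means with k = 3, and a 1-nearest-neighbour classifier stands in for the decision tree or neural network mentioned above. All function names, seeds, and sample sizes are hypothetical choices made for the example.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels


def predict_1nn(train_x, train_y, x):
    """Label of the training point closest to x (1-nearest neighbour)."""
    j = min(range(len(train_x)), key=lambda i: dist2(x, train_x[i]))
    return train_y[j]


# Assumption: structureless data, so any "clubs" found are artefacts.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(600)]

# Step 1: cluster anyway. k-means always returns k groups.
labels = kmeans(points, k=3)

# Step 2: train a classifier to predict the cluster labels and score it
# on held-out points, as in the practice the abstract criticises.
train_x, train_y = points[:400], labels[:400]
test_x, test_y = points[400:], labels[400:]
pred = [predict_1nn(train_x, train_y, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)

print(f"accuracy predicting cluster labels: {accuracy:.3f}")
```

The accuracy printed here measures only how well the classifier reproduces the partition that k-means drew, not whether that partition corresponds to real groups. Because the labels are a deterministic function of the same features the classifier sees, a high score should be expected even on data with no clusters at all.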
Item Type: | MPRA Paper |
---|---|
Original Title: | Evaluation of technology clubs by clustering: A cautionary note |
Language: | English |
Keywords: | Machine learning; clustering; technological change; technology clubs; knowledge economy; cross-country |
Subjects: | C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics > C45 - Neural Networks and Related Topics C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C53 - Forecasting and Prediction Methods ; Simulation Methods O - Economic Development, Innovation, Technological Change, and Growth > O3 - Innovation ; Research and Development ; Technological Change ; Intellectual Property Rights > O38 - Government Policy O - Economic Development, Innovation, Technological Change, and Growth > O5 - Economywide Country Studies > O57 - Comparative Studies of Countries P - Economic Systems > P4 - Other Economic Systems > P41 - Planning, Coordination, and Reform |
Item ID: | 109138 |
Depositing User: | Voxi Heinrich Amavilah |
Date Deposited: | 21 Aug 2021 11:34 |
Last Modified: | 21 Aug 2021 11:34 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/109138 |