Muratova, Anna and Sushko, Pavel and Espy, Thomas H. (2017): Black-Box Classification Techniques for Demographic Sequences : from Customised SVM to RNN. Published in: CEUR Workshop Proceedings, Vol. 1968, No. Experimental Economics and Machine Learning (28 October 2017): pp. 31-40.
Abstract
Large amounts of demographic data are now available for analysis and interpretation, and modern data mining methods can extract more useful information from them. This study compares methods for classifying demographic sequences, using SVMs whose kernels are customised with various similarity measures. Because demographers are interested in sequences without discontinuities, formulas for similarity measures over such sequences were derived and then used as SVM kernels, which is the novelty of this study. Recurrent neural network algorithms (Simple RNN, GRU and LSTM) are also compared. Among the SVM variants, the best classification result is obtained with a special kernel function that transforms sequences into features, but the recurrent neural networks outperform SVM.
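As a rough illustration of the idea of using a sequence similarity measure as an SVM kernel, the sketch below builds a Gram matrix from a similarity that counts shared contiguous subsequences (subsequences without gaps). The similarity formula, the function names, and the life-event labels are all assumptions made for this example; they are not the formulas derived in the paper.

```python
from math import sqrt


def contiguous_subsequences(seq):
    """Return the set of all non-empty contiguous subsequences of seq, as tuples."""
    n = len(seq)
    return {tuple(seq[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def similarity(s, t):
    """Hypothetical similarity: shared contiguous subsequences, cosine-normalised.

    The count of shared subsequences is a dot product of binary indicator
    vectors, so the resulting matrix is positive semi-definite.
    """
    a, b = contiguous_subsequences(s), contiguous_subsequences(t)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def gram_matrix(rows, cols):
    """Pairwise similarities between two lists of sequences."""
    return [[similarity(r, c) for c in cols] for r in rows]


# Hypothetical demographic sequences (ordered life events).
sequences = [
    ["education", "work", "marriage", "child"],
    ["education", "work", "child"],
    ["work", "separation"],
]

K = gram_matrix(sequences, sequences)
for row in K:
    print(["%.3f" % value for value in row])
```

A matrix built this way could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with `gram_matrix(test, train)` supplied at prediction time.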
Item Type: | MPRA Paper |
---|---|
Original Title: | Black-Box Classification Techniques for Demographic Sequences : from Customised SVM to RNN |
English Title: | Black-Box Classification Techniques for Demographic Sequences : from Customised SVM to RNN |
Language: | English |
Keywords: | data mining, demographics, support vector machines, neural networks, classification, sequences similarity |
Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C14 - Semiparametric and Nonparametric Methods: General; J - Labor and Demographic Economics > J1 - Demographic Economics > J11 - Demographic Trends, Macroeconomic Effects, and Forecasts |
Item ID: | 82799 |
Depositing User: | Dr. Rustam Tagiew |
Date Deposited: | 24 Nov 2017 10:19 |
Last Modified: | 28 Sep 2019 07:09 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/82799 |