Muratova, Anna and Islam, Robiul and Mitrofanova, Ekaterina S. and Ignatov, Dmitry I. (2019): Searching for Interpretable Demographic Patterns. Published in: CEUR Workshop Proceedings, Vol. 2479 (26 September 2019): pp. 18-31.
Abstract
A large amount of demographic data is now available for analysis and interpretation, and modern data mining methods can extract more useful information from it. Two kinds of experiments are considered in this work: 1) generation of additional secondary features from events and evaluation of their influence on accuracy; 2) exploration of the influence of features on the classification result using SHAP (SHapley Additive exPlanations). An algorithm for creating secondary features is proposed and applied to the dataset. Classification was performed with two methods, SVM and neural networks, and the results were evaluated. The impact of events and features on the classification results was evaluated using SHAP, and it is demonstrated how to tune the model to improve accuracy based on the obtained values. Applying a convolutional neural network to sequences of events improved classification accuracy and surpassed the previous best result on the studied demographic dataset.
Item Type: | MPRA Paper |
---|---|
Original Title: | Searching for Interpretable Demographic Patterns |
English Title: | Searching for Interpretable Demographic Patterns |
Language: | English |
Keywords: | data mining; demographics; neural networks; classification; SHAP; interpretation |
Subjects: | C - Mathematical and Quantitative Methods > C0 - General > C02 - Mathematical Methods C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C15 - Statistical Simulation Methods: General I - Health, Education, and Welfare > I0 - General > I00 - General J - Labor and Demographic Economics > J1 - Demographic Economics > J13 - Fertility ; Family Planning ; Child Care ; Children ; Youth |
Item ID: | 97305 |
Depositing User: | Dr. Rustam Tagiew |
Date Deposited: | 02 Dec 2019 09:38 |
Last Modified: | 02 Dec 2019 09:38 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/97305 |