Bekena, Sisay Menji (2017): Using decision tree classifier to predict income levels.
Abstract
In this study, a random forest classifier is applied to predict the income levels of individuals from attributes including education, marital status, gender, occupation, country and others. Income level is defined as a binary variable: 0 for income <=50K/year and 1 for higher income. The data come from the UCI Machine Learning Repository and cover 32,561 individuals and 13 attributes, drawn from the 1994 census database. The random forest classifier is used because it gave better accuracy than the decision tree and naïve Bayes classifiers. The predictive accuracy of the model on test data is 85%. Feature importance analysis shows that marital status, capital gain, education, age and hours per week are the top features, accounting for the largest shares of the model's accuracy. A decision tree classifier also ranks these variables as the top 5 features in importance.
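The workflow summarised in the abstract (compare a random forest, a decision tree and a naïve Bayes classifier on held-out data, then rank features by importance) can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the paper's code: the data are synthetic stand-ins generated with NumPy rather than the UCI census data, and the column names, label rule, split ratio and hyperparameters are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for census attributes (hypothetical names, NOT the UCI data).
feature_names = ["age", "education_num", "hours_per_week", "capital_gain", "married"]
age = rng.integers(18, 80, n)
education_num = rng.integers(1, 17, n)
hours_per_week = rng.integers(10, 80, n)
capital_gain = rng.exponential(1000, n)
married = rng.integers(0, 2, n)
X = np.column_stack([age, education_num, hours_per_week, capital_gain, married])

# Binary target in the abstract's encoding: 0 for the lower income class,
# 1 for the higher one. The rule below is invented purely to give the
# classifiers some signal to learn.
score = (0.03 * age + 0.2 * education_num + 0.02 * hours_per_week
         + 0.0005 * capital_gain + 1.0 * married + rng.normal(0, 1, n))
y = (score > np.median(score)).astype(int)

# Hold out a test set so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three classifier families named in the abstract (hyperparameters assumed).
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)  # mean accuracy on test data

# Rank features by the forest's impurity-based importances (they sum to 1).
forest = models["random_forest"]
importances = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, acc in accuracy.items():
    print(f"{name}: test accuracy = {acc:.3f}")
for name, weight in importances:
    print(f"{name}: importance = {weight:.3f}")
```

On the real data, categorical attributes such as marital status or occupation would first need to be encoded numerically, and the accuracies and rankings printed here say nothing about the figures reported in the paper.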
Item Type: | MPRA Paper |
---|---|
Original Title: | Using decision tree classifier to predict income levels |
Language: | English |
Keywords: | random-forest classifier, data science |
Subjects: | A - General Economics and Teaching > A1 - General Economics > A10 - General; D - Microeconomics > D1 - Household Behavior and Family Economics; D - Microeconomics > D1 - Household Behavior and Family Economics > D10 - General |
Item ID: | 83406 |
Depositing User: | Sisay Menji Bekena |
Date Deposited: | 22 Dec 2017 05:10 |
Last Modified: | 26 Sep 2019 09:23 |
References: | Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. Scikit-learn documentation, http://scikit-learn.org/stable/documentation.html. |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/83406 |