Munich Personal RePEc Archive

Using decision tree classifier to predict income levels

Bekena, Sisay Menji (2017): Using decision tree classifier to predict income levels.

Abstract

In this study, a random forest classifier is applied to predict the income levels of individuals from attributes including education, marital status, gender, occupation, country and others. Income level is defined as a binary variable: 0 for income <=50K/year and 1 for higher incomes. The data are taken from the UCI Machine Learning Repository and cover 32,561 individuals and 13 attributes drawn from the 1994 census database. The random forest classifier is used because it gave better accuracy than the decision tree and naïve Bayes classifiers. The predictive accuracy of the model on test data is 85%. Feature importance analysis shows that marital status, capital gain, education, age and hours per week are the top features, accounting for the largest shares of the model's accuracy. A decision tree classifier ranks the same five variables as the most important.
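The workflow summarized in the abstract (fit a random forest, compare it with a decision tree and a naïve Bayes classifier on held-out data, then rank features by importance) can be sketched roughly as follows. This is only an illustration, not the paper's code: it assumes scikit-learn and NumPy, and it runs on a small synthetic stand-in rather than the UCI census data. The feature names, the label-generating rule, the 75/25 split and all hyperparameters are hypothetical choices made for the sketch, so the resulting accuracies say nothing about the 85% reported in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names, loosely echoing attributes named in the abstract.
FEATURES = ["age", "education_num", "married", "capital_gain",
            "hours_per_week", "noise"]


def make_synthetic_data(n=2000, seed=0):
    """Build a synthetic stand-in dataset (NOT the UCI census data)."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 80, n)
    education_num = rng.integers(1, 17, n)
    married = rng.integers(0, 2, n)
    capital_gain = rng.exponential(1000.0, n)
    hours = rng.integers(10, 70, n)
    noise = rng.normal(size=n)
    X = np.column_stack(
        [age, education_num, married, capital_gain, hours, noise]
    ).astype(float)
    # Arbitrary scoring rule, invented only to produce labels.
    score = (0.03 * age + 0.25 * education_num + 1.5 * married
             + 0.0005 * capital_gain + 0.03 * hours
             + rng.normal(scale=1.0, size=n))
    # Binary target as in the abstract: 0 for "<=50K", 1 for higher income.
    y = (score > np.median(score)).astype(int)
    return X, y


def compare_models(X, y, seed=0):
    """Fit the three classifier types and return test-set accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100,
                                                random_state=seed),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "naive_bayes": GaussianNB(),
    }
    fitted, accuracy = {}, {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        accuracy[name] = model.score(X_test, y_test)  # mean accuracy
        fitted[name] = model
    return fitted, accuracy


def top_features(model, names, k=5):
    """Return the k features with the largest impurity-based importance."""
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1][:k]
    return [(names[i], float(importances[i])) for i in order]


X, y = make_synthetic_data()
fitted, accuracy = compare_models(X, y)
rf_top = top_features(fitted["random_forest"], FEATURES)
dt_top = top_features(fitted["decision_tree"], FEATURES)

for name, acc in accuracy.items():
    print(f"{name}: test accuracy = {acc:.3f}")
print("random forest top features:", [f for f, _ in rf_top])
print("decision tree top features:", [f for f, _ in dt_top])
```

With real data, the categorical attributes (occupation, country and so on) would first need to be encoded numerically; that step is left out here because the abstract does not say how it was done.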

MPRA is a RePEc service hosted by
the Munich University Library in Germany.