Munich Personal RePEc Archive

Applying CHAID for logistic regression diagnostics and classification accuracy improvement

Antipov, Evgeny and Pokryshevskaya, Elena (2009): Applying CHAID for logistic regression diagnostics and classification accuracy improvement.

Abstract

In this study, a CHAID-based approach is proposed for detecting heterogeneity in classification accuracy across segments of observations. The approach helps to solve two important problems facing a model builder: (1) how to automatically detect segments in which the model significantly underperforms, and (2) how to use knowledge of this heterogeneity to partition the observations so as to achieve better predictive accuracy. The approach was applied to the churn dataset from the UCI Repository of Machine Learning Databases. By splitting the dataset into four segments derived from the decision tree and building a separate logistic regression scoring model for each segment, we increased accuracy on the test sample by more than 7 percentage points. A significant increase in recall and precision was also observed. The results show that different segments may have entirely different churn predictors, so such a partitioning gives better insight into the factors that influence customer behavior.
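The diagnostic step described above can be sketched as follows. This is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' implementation: it assumes each observation is a dict of categorical attributes and that class predictions from an already-fitted logistic regression are available. It performs a single CHAID-style step: for each attribute it cross-tabulates the categories against whether the model classified each observation correctly, runs a Pearson chi-square test, and reports the attribute with the smallest adjusted p-value. Unlike full CHAID (Kass, 1980), it does not merge categories, does not grow a tree recursively, and uses a plain Bonferroni correction over the number of attributes tested rather than CHAID's own multipliers. The function names (`find_underperforming_split`, `partition`, `chi2_sf`, `pearson_chi2`) and the toy attributes `plan` and `region` are hypothetical.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function, a = df / 2 and y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    else:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    while a < df / 2.0:
        if y > 0:
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return min(q, 1.0)


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table is a list of rows; all-zero rows and columns are dropped first.
    Returns (0.0, 0) when fewer than two non-empty rows or columns remain.
    """
    rows = [r for r in table if sum(r) > 0]
    if len(rows) < 2:
        return 0.0, 0
    col_tot = [sum(r[j] for r in rows) for j in range(len(rows[0]))]
    cols = [j for j, t in enumerate(col_tot) if t > 0]
    if len(cols) < 2:
        return 0.0, 0
    n = sum(col_tot)
    stat = 0.0
    for r in rows:
        r_tot = sum(r)
        for j in cols:
            expected = r_tot * col_tot[j] / n
            stat += (r[j] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def find_underperforming_split(records, y_true, y_pred, attributes, alpha=0.05):
    """One CHAID-style step on the 'classified correctly' indicator.

    records: list of dicts mapping attribute name -> category.
    Returns (attribute, adjusted p-value, {category: accuracy}) for the attribute
    whose categories differ most in accuracy, or None if none is significant.
    """
    correct = [t == p for t, p in zip(y_true, y_pred)]
    best = None
    for attr in attributes:
        counts = defaultdict(lambda: [0, 0])  # category -> [n_correct, n_wrong]
        for rec, ok in zip(records, correct):
            counts[rec[attr]][0 if ok else 1] += 1
        stat, df = pearson_chi2(list(counts.values()))
        if df == 0:
            continue  # one category only, or all correct / all wrong: nothing to test
        # Plain Bonferroni over the attributes tested (not CHAID's own multiplier)
        p_adj = min(1.0, chi2_sf(stat, df) * len(attributes))
        if p_adj < alpha and (best is None or p_adj < best[1]):
            acc = {k: v[0] / (v[0] + v[1]) for k, v in counts.items()}
            best = (attr, p_adj, acc)
    return best


def partition(records, attr):
    """Row indices grouped by the category of attr, one group per segment."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[rec[attr]].append(i)
    return dict(groups)


# Toy data (hypothetical): the model misclassifies half of the "intl" customers
records = ([{"plan": "intl", "region": "A" if (i // 2) % 2 else "B"} for i in range(40)]
           + [{"plan": "basic", "region": "A" if (i // 2) % 2 else "B"} for i in range(60)])
y_true = [1] * 100
y_pred = [1 if i % 2 == 0 else 0 for i in range(40)] + [1] * 60

best = find_underperforming_split(records, y_true, y_pred, ["plan", "region"])
print(best[0], best[2])  # plan {'intl': 0.5, 'basic': 1.0}
segments = partition(records, best[0])
print({k: len(v) for k, v in segments.items()})  # {'intl': 40, 'basic': 60}
```

In the toy data, accuracy is identical across `region` (chi-square statistic 0) but differs sharply across `plan`, so `plan` is selected. Each index group returned by `partition` would then receive its own logistic regression, which corresponds to the segment-wise modelling step in the abstract; that fitting step is omitted from this sketch.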

Item Type:	MPRA Paper
Original Title:	Applying CHAID for logistic regression diagnostics and classification accuracy improvement
English Title:	Applying CHAID for logistic regression diagnostics and classification accuracy improvement
Language:	English
Keywords:	CHAID; logistic regression; churn prediction; performance improvement; segmentwise prediction; decision tree; classification tree
Subjects:	M - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics > M3 - Marketing and Advertising > M31 - Marketing
	C - Mathematical and Quantitative Methods > C0 - General
Item ID:	21499
Depositing User:	Evgeny Antipov
Date Deposited:	29 Mar 2010 07:30
Last Modified:	26 Sep 2019 13:41
References:
Deodhar, M. and Ghosh, J. (2007) A framework for simultaneous co-clustering and learning from complex data. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 12-15 August 2007, San Jose, California, USA.
Magidson, J. (1982) Some common pitfalls in causal analysis of categorical data. Journal of Marketing Research 19(4), Special Issue on Causal Modeling (Nov. 1982): 461-471.
Ratner, B. (2003) Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data. Chapman & Hall/CRC.
Hill, T. and Lewicki, P. (2007) Statistics: Methods and Applications. Tulsa, OK: StatSoft.
Hosmer, D. W. and Lemeshow, S. (2000) Applied Logistic Regression, 2nd ed. New York; Chichester: Wiley.
Kleinbaum, D. G. (1994) Logistic Regression: A Self-Learning Text. New York: Springer-Verlag.
Neslin, S., Gupta, S., Kamakura, W., Lu, J. and Mason, C. (2006) Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research 43(2): 204-211.
Levin, N. and Zahavi, J. (1998) Continuous predictive modeling, a comparative analysis. Journal of Interactive Marketing 12: 5-22.
Kass, G. V. (1980) An exploratory technique for investigating large quantities of categorical data. Applied Statistics 29(2): 119-127.
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Morgan, J. N. and Messenger, R. C. (1973) THAID: A sequential analysis program for the analysis of nominal scale dependent variables. Technical report. Institute of Social Research, University of Michigan, Ann Arbor.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984) Classification and Regression Trees. New York: Chapman & Hall/CRC.
Blake, C. L. and Merz, C. J. (1998) Churn Data Set, UCI Repository of Machine Learning Databases, http://www.sgi.com/tech/mlc/db/. University of California, Department of Information and Computer Science, Irvine, CA.
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/21499

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.
