Martin, Eisele and Zhu, Junyi (2013): Multiple imputation in a complex household survey - the German Panel on Household Finances (PHF): challenges and solutions.

Abstract
In this paper, we present a case study of imputation in a complex household survey - the first wave of the German Panel on Household Finances (PHF). A household wealth survey has to be built on a questionnaire with a rather complex logical structure, mainly because many wealth items have to be probed on both the extensive and the intensive margin. Hence the number of potential predictors for each imputation model grows, and standard modelling can be confronted with more irregularities due to, e.g., irregular missing patterns, interdependent logical constraints and data anomalies. Our model selection procedure borrows techniques from out-of-sample prediction to handle the overfitting often associated with the introduction of a large number of predictors. We also take measures to produce an ex ante evaluation of the modelling, which can be more efficient than the common practice of running diagnostics after imputation. Solutions for the difficulties in the real data and questionnaire structures are also presented. In addition, we use the rich flagging information to develop various measures of item-nonresponse in order to assess this complication arising from the logical structure. We find that information loss due to the contagion of item-nonresponse between variables is not serious in our imputed data.
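The idea of selecting predictors for an imputation model by out-of-sample prediction can be illustrated with a minimal sketch. It is not the authors' procedure: the greedy forward search, the OLS imputation model, the simulated data and all function and variable names (`cv_mse`, `forward_select`, `X_obs`, `y_obs`) are assumptions made for illustration only.

```python
import numpy as np

def cv_mse(X, y, k=5, seed=0):
    """Out-of-sample mean squared error of an OLS fit, estimated by k-fold CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Add an intercept column; X may have zero columns (intercept-only model).
        X_tr = np.hstack([np.ones((len(train), 1)), X[train]])
        X_te = np.hstack([np.ones((len(fold), 1)), X[fold]])
        beta, *_ = np.linalg.lstsq(X_tr, y[train], rcond=None)
        errors.append(np.mean((y[fold] - X_te @ beta) ** 2))
    return float(np.mean(errors))

def forward_select(X, y, k=5, seed=0):
    """Greedily add the predictor that lowers CV error most; stop when none does."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = cv_mse(X[:, :0], y, k, seed)  # start from the intercept-only model
    while remaining:
        scores = {j: cv_mse(X[:, selected + [j]], y, k, seed) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # an extra predictor no longer improves out-of-sample fit
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best

# Simulated example (hypothetical data): only columns 0 and 2 drive y.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=400)
observed = rng.random(400) > 0.3          # roughly 30% item-nonresponse in y
X_obs, y_obs = X[observed], y[observed]   # fit only on respondents

selected, best = forward_select(X_obs, y_obs)
print("selected predictors:", selected, "CV MSE:", round(best, 3))
```

Scoring candidate models on held-out respondents rather than on in-sample fit penalises predictors that only capture noise, which is the overfitting concern raised above when the pool of potential predictors is large.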
Item Type:  MPRA Paper 

Original Title:  Multiple imputation in a complex household survey - the German Panel on Household Finances (PHF): challenges and solutions 
Language:  English 
Keywords:  Multiple imputation, Model selection, Panel on household finance, item-nonresponse evaluation 
Subjects:  C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C15 - Statistical Simulation Methods: General; C42; C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C52 - Model Evaluation, Validation, and Selection 
Item ID:  57666 
Depositing User:  Junyi Zhu 
Date Deposited:  31. Jul 2014 13:52 
Last Modified:  31. Jul 2014 14:10 
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/57666 