Sarracino, Francesco and Mikucka, Malgorzata (2016): Estimation bias due to duplicated observations: a Monte Carlo simulation.
Abstract
This paper assesses how duplicate records affect the results of regression analysis of survey data and compares the effectiveness of five solutions for minimizing the risk of obtaining biased estimates. The results show that duplicate records create a considerable risk of obtaining biased estimates. The probability of obtaining unbiased estimates when the data contain a single sextuplet of identical observations is 41.6%. If about 10% of the observations in the dataset are duplicates, the probability of obtaining unbiased estimates drops to nearly 11%. Weighting the duplicate cases by the inverse of their multiplicity minimizes the bias when multiple doublets are present in the data. Our results demonstrate the risks of using data that contain non-unique observations and call for further research on strategies to analyze affected data.
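The inverse-multiplicity weighting mentioned in the abstract can be illustrated with a small sketch. It is not the authors' simulation code: it assumes a simple linear model y = 1 + 2x + noise, assumes that duplicates overwrite existing records (so the sample size stays fixed), and uses hypothetical helper names (`weighted_ols`, `inverse_multiplicity_weights`, `simulate`). It shows that giving each of k identical copies a weight of 1/k reproduces the estimate obtained from the de-duplicated data, while the naive estimate treats the copies as independent information.

```python
import random
from collections import Counter


def weighted_ols(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


def inverse_multiplicity_weights(records):
    """Weight each record by 1 / (number of identical copies in the data)."""
    counts = Counter(records)
    return [1.0 / counts[r] for r in records]


def simulate(n=200, n_copies=6, seed=1):
    """One replication: compare slopes with and without a duplicated record.

    Assumption (hypothetical design): records 1..n_copies-1 are overwritten
    with copies of record 0, producing a single n_copies-tuplet (a sextuplet
    by default) while keeping the sample size at n.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    clean = [(x, 1.0 + 2.0 * x + rng.gauss(0, 1)) for x in xs]
    dup = [clean[0]] * n_copies + clean[n_copies:]
    unique = list(dict.fromkeys(dup))  # drop extra copies, keep order

    def slope(rows, ws=None):
        x_vals, y_vals = zip(*rows)
        ws = ws or [1.0] * len(rows)
        return weighted_ols(x_vals, y_vals, ws)[1]

    return {
        "clean": slope(clean),                    # before duplication
        "naive": slope(dup),                      # copies counted as distinct
        "weighted": slope(dup, inverse_multiplicity_weights(dup)),
        "deduplicated": slope(unique),            # extra copies dropped
    }


if __name__ == "__main__":
    print(simulate())
```

Because the k copies of a record carry a total weight of 1, the weighted normal equations coincide with those of the de-duplicated sample, so `weighted` matches `deduplicated` up to floating-point rounding. The weighting cannot restore the records that the duplicates overwrote, so it need not match `clean`. Repeating `simulate` over many seeds and comparing `naive` with `clean` gives a rough feel for a Monte Carlo exercise of this kind; the criterion the paper uses to call an estimate "unbiased" is not stated in the abstract and is not reproduced here.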
Item Type: | MPRA Paper |
---|---|
Original Title: | Estimation bias due to duplicated observations: a Monte Carlo simulation |
Language: | English |
Keywords: | duplicated observations, estimation bias, Monte Carlo simulation, inference |
Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C13 - Estimation: General<br>C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C18 - Methodological Issues: General<br>C - Mathematical and Quantitative Methods > C2 - Single Equation Models; Single Variables > C21 - Cross-Sectional Models; Spatial Models; Treatment Effect Models; Quantile Regressions<br>C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology; Computer Programs > C81 - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access |
Item ID: | 69064 |
Depositing User: | Francesco Sarracino |
Date Deposited: | 28 Jan 2016 06:28 |
Last Modified: | 26 Sep 2019 12:40 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/69064 |