Munich Personal RePEc Archive

Estimation bias due to duplicated observations: a Monte Carlo simulation

Sarracino, Francesco and Mikucka, Malgorzata (2016): Estimation bias due to duplicated observations: a Monte Carlo simulation.

Abstract

This paper assesses how duplicate records affect the results of regression analysis of survey data and compares the effectiveness of five solutions for minimizing the risk of obtaining biased estimates. The results show that duplicate records create a considerable risk of obtaining biased estimates. The chance of obtaining unbiased estimates in the presence of a single sextuplet of identical observations is 41.6%. If about 10% of the observations in the dataset are duplicates, the probability of obtaining unbiased estimates falls to nearly 11%. Weighting the duplicate cases by the inverse of their multiplicity minimizes the bias when multiple doublets are present in the data. Our results demonstrate the risks of using data that contain non-unique observations and call for further research on strategies for analyzing affected data.
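
As an illustration of weighting by the inverse of multiplicity, the following is a minimal sketch and not the authors' code. It assumes that duplicates are records identical on every variable; the function names and the toy data are hypothetical. Each record that appears k times receives a weight of 1/k, so that a group of identical records jointly counts as one observation.

```python
from collections import Counter

def inverse_multiplicity_weights(records):
    """Return one weight per record: 1 / (number of identical copies)."""
    keys = [tuple(r) for r in records]
    counts = Counter(keys)  # multiplicity of each distinct record
    return [1.0 / counts[k] for k in keys]

def weighted_mean(values, weights):
    """Weighted average of values, normalized by the sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy data (made up for illustration): the record (30, 5.0) appears three times.
records = [(25, 1.0), (30, 5.0), (30, 5.0), (30, 5.0), (40, 3.0)]
weights = inverse_multiplicity_weights(records)
ys = [r[1] for r in records]

unweighted = sum(ys) / len(ys)        # the triplet counts three times
weighted = weighted_mean(ys, weights)  # the triplet counts once in total
print(weights)
print(unweighted, weighted)
```

In a regression setting, the same weights could be passed as case weights to a weighted least squares routine; which routine to use is left open here.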

MPRA is a RePEc service hosted by the Munich University Library in Germany.