Erdem, Erkan and Prada, Sergio I (2011): Creation of public use files: lessons learned from the comparative effectiveness research public use files data pilot project. Published in: The 2011 JSM Proceedings (19. December 2011): pp. 4095-4109.
In this paper we describe lessons learned from the creation of Basic Stand Alone (BSA) Public Use Files (PUFs) for the Comparative Effectiveness Research Public Use Files Data Pilot Project (CER-PUF). CER-PUF aims to increase access to the Centers for Medicare and Medicaid Services (CMS) Medicare claims datasets through PUFs that do not require user fees or data use agreements, have been de-identified to assure the confidentiality of beneficiaries and providers, and still provide substantial analytic utility to researchers. For this paper we define PUFs as datasets that any user can access freely and without restriction. We derive lessons learned from five major project activities: (i) a review of the statistical and computer science literature on best practices in PUF creation, (ii) interviews with comparative effectiveness researchers to assess their data needs, (iii) case studies of PUF initiatives in the United States, (iv) interviews with stakeholders to identify the most salient issues in making microdata publicly available, and (v) the actual process of creating the Medicare claims data BSA PUFs.
|Item Type:||MPRA Paper|
|Original Title:||Creation of public use files: lessons learned from the comparative effectiveness research public use files data pilot project|
|Keywords:||Public use files, PUFs, re-identification, de-identification, Medicare claims, comparative effectiveness research, confidentiality, data utility|
|Subjects:||H - Public Economics > H1 - Structure and Scope of Government > H11 - Structure, Scope, and Performance of Government; H - Public Economics > H5 - National Government Expenditures and Related Policies > H51 - Government Expenditures and Health; C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics|
|Depositing User:||Sergio Prada|
|Date Deposited:||19. Dec 2011 22:54|
|Last Modified:||13. Feb 2013 23:55|