Braaksma, Barteld and Zeelenberg, Kees (2015): “Re-make/Re-model”: Should big data change the modelling paradigm in official statistics? Published in: Statistical Journal of the IAOS, Vol. 31, No. 2 (2015): pp. 193–202.
PDF: MPRA_paper_87741.pdf (721 kB)
Abstract
Big data offers many opportunities for official statistics, for example increased resolution, better timeliness, and new statistical outputs. But there are also many challenges: uncontrolled changes in sources that threaten continuity, a lack of identifiers that impedes linking to population frames, and data that refer only indirectly to phenomena of statistical interest. We discuss two approaches to dealing with these challenges and opportunities.
First, we may accept big data for what they are: an imperfect, yet timely, indicator of phenomena in society. These data exist, and that is why they are interesting. Second, we may extend this approach by explicit modelling. New methods, such as machine-learning techniques, can be considered alongside more traditional methods, such as Bayesian techniques.
National statistical institutes (NSIs) have always been reluctant to use models, apart from specific cases such as small-area estimates. Based on the experience at Statistics Netherlands, we argue that NSIs should not be afraid to use models, provided that their use is documented and made transparent to users. Moreover, the primary purpose of an NSI is to describe society; we should refrain from making forecasts. The models used should therefore rely on actually observed data, and they should be validated extensively.
Item Type: | MPRA Paper |
---|---|
Original Title: | “Re-make/Re-model”: Should big data change the modelling paradigm in official statistics? |
English Title: | “Re-make/Re-model”: Should big data change the modelling paradigm in official statistics? |
Language: | English |
Keywords: | Big data, model-based statistics |
Subjects: | (1) C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C55 - Large Data Sets: Modeling and Analysis; (2) C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology; Computer Programs > C81 - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access; (3) C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology; Computer Programs > C83 - Survey Methods; Sampling Methods |
Item ID: | 87741 |
Depositing User: | Kees Zeelenberg |
Date Deposited: | 26 Jul 2018 12:22 |
Last Modified: | 26 Sep 2019 13:04 |
References:

- A. Belloni, V. Chernozhukov and C. Hansen, High-dimensional methods and inference on structural and treatment effects, Journal of Economic Perspectives 28(2) (2014), 29–50, doi: 10.1257/jep.28.2.29.
- B. Buelens, P.-P. de Wolf and K. Zeelenberg, Model-based estimation at Statistics Netherlands, Discussion Paper, Statistics Netherlands, The Hague, 2015.
- H. Choi and H.R. Varian, Predicting the present with Google Trends, 2011, http://people.ischool.berkeley.edu/~hal/Papers/2011/ptp.pdf.
- P.J.H. Daas and M.J.H. Puts, Social media sentiment and consumer confidence, Paper presented at the Workshop on using Big Data for Forecasting and Statistics, Frankfurt, 2014, https://www.ecb.europa.eu/events/pdf/conferences/140407/Daas_Puts_Sociale_media_cons_conf_Stat_Neth.pdf?409d61b733fc259971ee5beec7cedc61.
- P.J.H. Daas, M.J. Puts, B. Buelens and P.A.M. van den Hurk, Big Data and Official Statistics, Paper presented at the Conference on New Techniques and Technologies for Statistics, Brussels, 5–7 March 2013, http://www.cros-portal.eu/sites/default/files/NTTS2013fullPaper_76.pdf.
- European Union, Regulation on European statistics, Official Journal of the European Union L 87 (31 March 2009), 164–173, http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32009R0223:EN:NOT.
- European Union, Code of Practice for European Statistics, revised edition, Eurostat, Luxembourg, 2005/2011, http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/code_of_practice.
- J. Fan, F. Han and H. Liu, Challenges of big data analysis, National Science Review 1(2) (2014), 293–314, doi: 10.1093/nsr/nwt032.
- A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari and D.B. Rubin, Bayesian Data Analysis, 3rd edition, Chapman and Hall/CRC, 2013.
- R.M. Groves, Three eras of survey research, Public Opinion Quarterly 75 (2011), 861–871, doi: 10.1093/poq/nfr057.
- N.M. Heerschap, S.A. Ortega Azurduy, A.H. Priem and M.P.W. Offermans, Innovation of tourism statistics through the use of new Big Data sources, Paper prepared for the Global Forum on Tourism Statistics, Prague, 2014, http://www.tsf2014prague.cz/assets/downloads/Paper%201.2_Nicolaes%20Heerschap_NL.pdf.
- International Statistical Institute, Declaration on Professional Ethics, revised edition, 1985/2010, http://www.isi-web.org/about-isi/professional-ethics.
- E. de Jonge, M. van Pelt and M. Roos, Time patterns, geospatial clustering and mobility statistics based on mobile phone network data, Discussion paper 2012/14, Statistics Netherlands, 2012, http://www.cbs.nl/NR/rdonlyres/010F11EC-AF2F-4138-8201-2583D461D2B6/0/201214x10pub.pdf.
- D. Lazer, R. Kennedy, G. King and A. Vespignani, The parable of Google Flu: traps in big data analysis, Science 343(14) (2014), 1203–1205, doi: 10.1126/science.1248506.
- D.W. Nickerson and T. Rogers, Political campaigns and big data, Journal of Economic Perspectives 28(2) (2014), 51–74, doi: 10.1257/jep.28.2.51.
- C. Reimsbach-Kounatze, The proliferation of “big data” and implications for official statistics and statistical agencies: A preliminary analysis, OECD Digital Economy Papers, No. 245, OECD Publishing, Paris, 2015, doi: 10.1787/5js7t9wqzvg8-en.
- F.J. van Ruth, Traffic intensity as indicator of regional economic activity, Discussion paper 2014/21, Statistics Netherlands, 2014.
- Statistical Commission of the United Nations, Fundamental Principles of Official Statistics, 1991/2014, http://unstats.un.org/unsd/dnss/gp/fundprinciples.aspx.
- P. Struijs, B. Braaksma and P.J.H. Daas, Official statistics and Big Data, Big Data & Society (April–June 2014), 1–6, doi: 10.1177/2053951714538417.
- M. Tennekes and M.P.W. Offermans, Daytime population estimations based on mobile phone metadata, Paper prepared for the Joint Statistical Meetings, Boston, 2014, http://www.amstat.org/meetings/jsm/2014/onlineprogram/AbstractDetails.cfm?abstractid=311959.
- UN-ECE High-Level Group for the Modernisation of Statistical Production and Services, What does “big data” mean for official statistics?, 2013, http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=77170622.
- C. Vaccari, Big Data in Official Statistics, PhD thesis, University of Camerino, 2014.
- H.R. Varian, Big data: New tricks for econometrics, Journal of Economic Perspectives 28(2) (2014), 3–28, doi: 10.1257/jep.28.2.3.

URI: https://mpra.ub.uni-muenchen.de/id/eprint/87741