Tierney, Heather L.R. and Kim, Jiyoon (June) and Nazarov, Zafar (2018): The Effects of Temporal Aggregation on Search Engine Data.
Abstract
Using structured machine learning, this paper examines the effect of temporal aggregation on big data from Google Analytics and Google Trends. Specifically, it examines daily and weekly Google Analytics data from the Charleston Area Convention and Visitors Bureau (CACVB) website from January 2008 to March 2009, and weekly, monthly, and quarterly Google Trends data for seven economic variables from 2004 to 2011. The CDFs and the estimated regression results are compared across the different levels of aggregation. The Kolmogorov-Smirnov test rejects the null of equivalent data distributions in the vast majority of cases for the CACVB data, but not for the economic variables. Through data mining, this paper also finds that aggregation can affect the level of integration and the regression results for both the CACVB data and the seven economic variables.
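The comparison described in the abstract (aggregate a series to a coarser frequency, then compare the two empirical CDFs with a two-sample Kolmogorov-Smirnov test) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic daily series, aggregation by block averaging, the max-equals-100 rescaling used as a stand-in for index-style scaling, and the helper names `aggregate`, `rescale_0_100`, `ks_statistic`, and `ks_critical_value`. The critical value is the standard large-sample approximation, which presumes independent samples; autocorrelated series and an aggregate derived from the same data do not satisfy that assumption, so the comparison here is only indicative.

```python
import math
import random


def aggregate(series, window):
    """Average consecutive non-overlapping blocks of `window` observations.

    A trailing partial block is dropped (hypothetical helper).
    """
    usable = len(series) - len(series) % window
    return [sum(series[i:i + window]) / window for i in range(0, usable, window)]


def rescale_0_100(series):
    """Scale a positive series so that its maximum equals 100.

    Assumed stand-in for index-style scaling, not the paper's procedure.
    """
    peak = max(series)
    return [100.0 * x / peak for x in series]


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Synthetic daily series with a weekly cycle (illustrative data only).
random.seed(0)
daily = [
    max(1.0, 100.0 + 30.0 * math.sin(2.0 * math.pi * t / 7.0) + random.gauss(0.0, 10.0))
    for t in range(455)
]
weekly = aggregate(daily, 7)

d = ks_statistic(rescale_0_100(daily), rescale_0_100(weekly))
crit = ks_critical_value(len(daily), len(weekly))
print(f"D = {d:.3f}, 5% critical value = {crit:.3f}, reject = {d > crit}")
```

Averaging within each week removes the within-week variation, so the weekly series is typically more tightly distributed than the daily one; this is one way in which aggregation can change the empirical distribution.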
Item Type: | MPRA Paper |
---|---|
Original Title: | The Effects of Temporal Aggregation on Search Engine Data |
Language: | English |
Keywords: | Big Data, Machine Learning, Data Mining, Aggregation, Unit roots, Scaling Effects, Normalization Effects |
Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C19 - Other; C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics > C43 - Index Numbers and Aggregation; C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C55 - Large Data Sets: Modeling and Analysis |
Item ID: | 84474 |
Depositing User: | Prof. Heather L.R. Tierney |
Date Deposited: | 14 Feb 2018 23:06 |
Last Modified: | 01 Oct 2019 15:59 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/84474 |