Armstrong, J. Scott (2007): Significance Tests Harm Progress in Forecasting. Published in: International Journal of Forecasting, 23 (2007), pp. 321-336.
Abstract
Based on a summary of prior literature, I conclude that tests of statistical significance harm scientific progress. Efforts to find exceptions to this conclusion have, to date, turned up none. Even when done correctly, significance tests are dangerous. I show that summaries of scientific research do not require tests of statistical significance. I illustrate the dangers of significance tests by examining an application to the M3-Competition. Although the authors of that reanalysis conducted a proper series of statistical tests, they suggested that the original M3-Competition was not justified in concluding that combined forecasts reduce errors and that the selection of the best method depends on the choice of a proper error measure. I show that the original conclusions were justified and that they are correct. Authors should try to avoid tests of statistical significance, journals should discourage them, and readers should ignore them. Instead, to analyze and communicate findings from empirical studies, one should use effect sizes, confidence intervals, replications/extensions, and meta-analyses.
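The abstract's closing recommendation — report effect sizes and confidence intervals rather than significance tests — can be illustrated with a minimal sketch. The forecast-error numbers below are invented for demonstration (they are not from the M3-Competition), and the standardized mean difference (Cohen's d) plus a normal-approximation interval are one common way, not the paper's prescribed way, to express the recommendation:

```python
# Hypothetical illustration: report an effect size and a confidence
# interval for a comparison of forecast errors, instead of a p-value.
# All data below are invented for demonstration purposes.
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference (Cohen's d) using a pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for the difference in means (normal approximation)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

# Invented error samples for two forecasting approaches
# (e.g., combined forecasts vs. a single method):
combined = [2.1, 1.8, 2.4, 1.9, 2.0, 2.2, 1.7, 2.3]
single = [2.9, 2.5, 3.1, 2.7, 2.8, 3.0, 2.6, 3.2]

d = cohens_d(combined, single)
lo, hi = mean_diff_ci(combined, single)
print(f"Cohen's d = {d:.2f}; 95% CI for mean difference: [{lo:.2f}, {hi:.2f}]")
```

The point of this style of reporting is that both the magnitude of the improvement and its uncertainty are communicated directly, which a bare "p < .05" verdict does not convey.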
Item Type: | MPRA Paper |
---|---|
Original Title: | Significance Tests Harm Progress in Forecasting |
Language: | English |
Keywords: | accuracy measures, combining forecasts, confidence intervals, effect size, M-competition, meta-analysis, null hypothesis, practical significance, replications |
Subjects: | C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General; C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C18 - Methodological Issues: General |
Item ID: | 81664 |
Depositing User: | J Armstrong |
Date Deposited: | 12 Nov 2017 19:58 |
Last Modified: | 27 Sep 2019 06:48 |
References: | Armstrong, J. S. (2006). Findings from evidence-based forecasting: Methods for reducing forecast error. International Journal of Forecasting (forthcoming; full text under Working Papers at forecastingprinciples.com). Armstrong, J. S. (2001a). Principles of Forecasting: A Handbook for Researchers and Practitioners. Boston: Kluwer Academic Publishers. Armstrong, J. S. (2001b). Combining forecasts. In J. S. Armstrong (Ed.), Principles of Forecasting. Boston: Kluwer Academic Publishers. Armstrong, J. S. & Collopy, F. (1992). Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting, 8, 69-80. Atkinson, D. R., Furlong, M. J. & Wampold, B. E. (1982). Statistical significance, reviewer evaluations, and the scientific process: Is there a statistically significant relationship? Journal of Counseling Psychology, 29 (2), 189-194. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003. Goodwin, P. & Lawton, R. (2003). Debiasing forecasts: How useful is the unbiasedness test? International Journal of Forecasting, 19, 467-475. Hubbard, R. & Armstrong, J. S. (1992). Are null results becoming an endangered species in marketing? Marketing Letters, 3 (April), 127-136. Hubbard, R. & Bayarri, M. J. (2003). Confusion over measures of evidence (p's) versus errors (α's) in classical statistical testing (with comments). The American Statistician, 57 (August), 171-182. Hubbard, R. & Ryan, P. A. (2000). The historical growth of statistical significance testing in psychology, and its future prospects. Educational and Psychological Measurement, 60, 661-681. Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science, 8 (1), 3-7. Koning, A. J., Franses, P. H., Hibon, M. & Stekler, H. O. (2005). The M3 competition: Statistical tests of the results. International Journal of Forecasting, 21, 397-409. Makridakis, S. & Hibon, M. (2000). The M3-Competition: Results, conclusions and implications. International Journal of Forecasting, 16, 451-476. McCloskey, D. N. & Ziliak, S. T. (1996). The standard error of regressions. Journal of Economic Literature, 34, 97-114. Schmidt, F. L. & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik & J. H. Steiger (Eds.), What If There Were No Significance Tests? London: Lawrence Erlbaum. Shea, C. (1996). Psychologists debate accuracy of significance test. The Chronicle of Higher Education, 42 (August 16), A12 & A17. Shrout, P. E. (1997). Should significance tests be banned? Introduction to a special section exploring the pros and cons. Psychological Science, 8 (1), 1-2 (special section follows on pages 3-20). Wright, M. & Armstrong, J. S. (2006). Verification of citations: Fawlty towers of knowledge? Working paper (full text at jscottarmstrong.com). |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/81664 |