Ferman, Bruno and Lima, Lycia and Riva, Flavio (2020): Experimental Evidence on Artificial Intelligence in the Classroom.
PDF: MPRA_paper_103934.pdf (640kB)
Abstract
This paper investigates how technologies that use different combinations of artificial and human intelligence are incorporated into classroom instruction, and how they ultimately affect students' outcomes. We conducted a field experiment to study two technologies that allow teachers to outsource grading and feedback tasks on writing practice. The first technology is a fully automated evaluation system that provides instantaneous scores and feedback. The second uses human graders as an additional resource to enhance grading and feedback quality in aspects in which the automated system arguably falls short. Both technologies significantly improved students' essay scores, and the additional inputs from human graders did not improve effectiveness. Furthermore, the technologies similarly helped teachers engage more frequently in nonroutine tasks that supported the individualization of pedagogy. Our results are informative about the potential of artificial intelligence to expand the set of tasks that can be automated, and about how advances in artificial intelligence may reallocate human labor to tasks that remain out of reach of automation.
Item Type: | MPRA Paper |
---|---|
Original Title: | Experimental Evidence on Artificial Intelligence in the Classroom |
Language: | English |
Keywords: | artificial intelligence; technological change; automated writing evaluation; routine and nonroutine tasks |
Subjects: | I - Health, Education, and Welfare > I2 - Education and Research Institutions > I21 - Analysis of Education; I22 - Educational Finance, Financial Aid; I25 - Education and Economic Development |
Item ID: | 103934 |
Depositing User: | Bruno Ferman |
Date Deposited: | 05 Nov 2020 14:25 |
Last Modified: | 05 Nov 2020 14:25 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/103934 |