Akimov, Dmitry and Makarov, Ilya (2019): Deep Reinforcement Learning with VizDoom First-Person Shooter. Published in: CEUR Workshop Proceedings, Vol. 2479 (2019), pp. 3–17.
Abstract
In this work, we study deep reinforcement learning algorithms for partially observable Markov decision processes (POMDP) combined with Deep Q-Networks. To our knowledge, we are the first to apply standard Markov decision process architectures to POMDP scenarios. We propose an extension of DQN with Dueling Networks and several other model-free policies for training an agent via deep reinforcement learning in the VizDoom environment, a replication of the Doom first-person shooter. We develop agents for the following VizDoom first-person shooter (FPS) scenarios: Basic, Defend The Center, and Health Gathering. We compare our agent against a Recurrent DQN agent with Prioritized Experience Replay and Snapshot Ensembling, and obtain approximately a threefold increase in per-episode reward. It is important to note that the POMDP setting closes the gap between human and computer players, thus providing a more meaningful justification of Deep RL agent performance.
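The core technique named in the abstract, extending DQN with a dueling head, can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical illustration, not the authors' implementation: it assumes 84x84 grayscale frame stacks and the standard convolutional trunk of Mnih et al. (2015), and implements the dueling aggregation Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a') from Wang et al. (2015); the paper's exact architecture and hyperparameters are given in the text itself.

```python
import torch
import torch.nn as nn


class DuelingDQN(nn.Module):
    """Dueling Q-network sketch: shared conv trunk, separate value and
    advantage streams, combined as Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""

    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        # Atari-style trunk (Mnih et al., 2015), assuming 84x84 inputs.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * 7 * 7  # spatial size after the three convs on 84x84
        self.value = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 1))
        self.advantage = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        v = self.value(h)       # (batch, 1)
        a = self.advantage(h)   # (batch, n_actions)
        # Subtract the mean advantage for identifiability (Wang et al., 2015).
        return v + a - a.mean(dim=1, keepdim=True)


if __name__ == "__main__":
    # Hypothetical usage: 4 stacked frames, 3 actions (e.g. the Basic scenario
    # exposes a small set of button combinations).
    net = DuelingDQN(in_channels=4, n_actions=3)
    q = net(torch.zeros(1, 4, 84, 84))
    print(q.shape)  # torch.Size([1, 3])
```

In a POMDP setting like VizDoom, a single frame does not reveal the full state, which is why frame stacking (assumed here) or a recurrent layer, as in the compared DRQN baseline, is used to approximate state from recent observations.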
Item Type: | MPRA Paper |
---|---|
Original Title: | Deep Reinforcement Learning with VizDoom First-Person Shooter |
English Title: | Deep Reinforcement Learning with VizDoom First-Person Shooter |
Language: | English |
Keywords: | Deep Reinforcement Learning; VizDoom; First-Person Shooter; DQN; Double Q-learning; Dueling |
Subjects: | C - Mathematical and Quantitative Methods > C0 - General > C02 - Mathematical Methods
C - Mathematical and Quantitative Methods > C6 - Mathematical Methods, Programming Models, Mathematical and Simulation Modeling > C63 - Computational Techniques, Simulation Modeling
C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology, Computer Programs > C88 - Other Computer Software |
Item ID: | 97307 |
Depositing User: | Dr. Rustam Tagiew |
Date Deposited: | 09 Dec 2019 15:55 |
Last Modified: | 09 Dec 2019 15:55 |
References: | Akimov, D., Makarov, I.: Deep reinforcement learning in VizDoom first-person shooter for health gathering scenario. In: MMEDIA. pp. 1–6 (2019)
Bellemare, M.G., Dabney, W., Munos, R.: A distributional perspective on reinforcement learning. arXiv preprint arXiv:1707.06887 (2017)
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. In: Proceedings of the 24th International Conference on Artificial Intelligence. pp. 4148–4152. IJCAI'15, AAAI Press (2015), http://dl.acm.org/citation.cfm?id=2832747.2832830
Hausknecht, M., Stone, P.: Deep recurrent Q-learning for partially observable MDPs. CoRR abs/1507.06527 (2015)
Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., Silver, D.: Rainbow: Combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298 (2017)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: VizDoom: A Doom-based AI research platform for visual reinforcement learning. In: CIG'16. pp. 1–8. IEEE (2016)
Lample, G., Chaplot, D.S.: Playing FPS games with deep reinforcement learning. In: AAAI. pp. 2140–2146 (2017)
Makarov, I., Kashin, A., Korinevskaya, A.: Learning to play Pong video game via deep reinforcement learning. CEUR WP pp. 1–6 (2017)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)
Schulze, C., Schulze, M.: VizDoom: DRQN with prioritized experience replay, double-Q learning, & snapshot ensembling. arXiv preprint arXiv:1801.01000 (2018)
Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3(1), 9–44 (1988)
Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction, vol. 1. MIT Press, Cambridge (1998)
Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: AAAI. vol. 16, pp. 2094–2100 (2016)
Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8(3), 279–292 (May 1992)
Zhu, P., Li, X., Poupart, P., Miao, G.: On improving deep reinforcement learning for POMDPs. arXiv preprint arXiv:1804.06309 (2018) |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/97307 |