%0 Conference Proceedings %T Learning How to Play Bomberman with Deep Reinforcement and Imitation Learning %+ Universidade Federal Fluminense [Rio de Janeiro] (UFF) %A Goulart, Ícaro %A Paes, Aline %A Clua, Esteban %Z Part 3: Entertainment Algorithms %< avec comité de lecture %( Lecture Notes in Computer Science %B 1st Joint International Conference on Entertainment Computing and Serious Games (ICEC-JCSG) %C Arequipa, Peru %Y Erik van der Spek %Y Stefan Göbel %Y Ellen Yi-Luen Do %Y Esteban Clua %Y Jannicke Baalsrud Hauge %I Springer International Publishing %3 Entertainment Computing and Serious Games %V LNCS-11863 %P 121-133 %8 2019-11-11 %D 2019 %R 10.1007/978-3-030-34644-7_10 %K Bomberman %K Proximal Policy Optimization %K Reinforcement Learning %K LSTM %K Imitation Learning %Z Computer Science [cs] Conference papers %X Making artificial agents that learn how to play is a long-standing goal in the area of Game AI. Recently, several successful cases have emerged, driven by Reinforcement Learning (RL) and neural network-based approaches. However, in most of these cases, the results have been achieved by training directly from pixel frames, at the cost of considerable computational resources. In this paper, we devise agents that learn how to play the popular game of Bomberman by relying on state representations and RL-based algorithms without looking at the pixel level. To that end, we designed five vector-based state representations and implemented Bomberman on top of the Unity game engine through the ML-Agents toolkit. We enhance the ML-Agents algorithms by developing an Imitation-based Learner (IL) that improves its model with the Actor-Critic Proximal Policy Optimization (PPO) method. We compared this approach with a PPO-only learner that uses either a Multi-Layer Perceptron or a Long Short-Term Memory network (LSTM). We conducted several training and tournament experiments by making the agents play against each other.
The hybrid state representation, combined with our IL-followed-by-PPO learning algorithm, achieves the best overall quantitative results, and we also observed that its agents learn correct Bomberman behavior. %G English %Z TC 14 %2 https://inria.hal.science/hal-03652029/document %2 https://inria.hal.science/hal-03652029/file/491829_1_En_10_Chapter.pdf %L hal-03652029 %U https://inria.hal.science/hal-03652029 %~ IFIP-LNCS %~ IFIP %~ IFIP-ICEC %~ IFIP-TC14 %~ IFIP-LNCS-11863 %~ IFIP-JCSG