Learning to Play Computer Games with Deep Learning and Reinforcement Learning

A COMP4801 Final Year Project in 2017-2018

Student: Mak Jeffrey Kelvin

Supervisor: Dr. Dirk Schneiders

About



The project's main objective is to use reinforcement learning and deep learning in order to play computer games, specifically Pacman. The project will be broadly divided into two stages. Specifically, the first stage involves exploration of various reinforcement algorithms and neural network architectures by using Pacman as a benchmark. The second stage would then involve improving the algorithm and subsequently testing it on Pacman maps.

Methodology



A modified version of Rainbow DQN is used to play four maps of varying difficulty, as shown in the image above. Agents are trained until their policies converge, and each trained agent's performance is recorded over 500 test games on both the original and modified maps. Note that sprite positions in 506Pacman are initialized randomly in each game.
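The testing protocol above can be sketched as a short evaluation loop. This is a minimal illustration only: the environment interface (`reset`/`step`) and the agent's `greedy_action` method are hypothetical placeholders, not the project's actual code, and the toy environment exists solely to make the sketch self-contained.

```python
def evaluate(agent, env, n_games=500):
    """Test a trained agent for n_games and return (win rate %, average score)."""
    wins, total_score = 0, 0.0
    for _ in range(n_games):
        state = env.reset()          # sprite positions may be re-randomized here
        done, won, score = False, False, 0.0
        while not done:
            action = agent.greedy_action(state)   # no exploration at test time
            state, reward, done, won = env.step(action)
            score += reward
        wins += int(won)
        total_score += score
    return 100.0 * wins / n_games, total_score / n_games


class _DemoEnv:
    """Toy stand-in environment: every game ends after one step with a win."""
    def reset(self):
        return 0
    def step(self, action):
        return 0, 10.0, True, True   # next_state, reward, done, won


class _DemoAgent:
    """Toy stand-in agent with a fixed action."""
    def greedy_action(self, state):
        return 0


win_rate, avg_score = evaluate(_DemoAgent(), _DemoEnv(), n_games=500)
print(win_rate, avg_score)   # -> 100.0 10.0 for this trivial demo
```

With a real trained agent and map, the two returned numbers correspond directly to the "Win Rate (%)" and "Average Score" columns reported below.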

Results and Discussion


Map          | No. of training games | No. of training steps | Win Rate (%) | Average Score
506Pacman    | 10000                 | 30000                 | 100.0        | 507.67
smallGrid    | 5000                  | 60000                 | 48.0         | -21.35
mediumGrid   | 2500                  | 50000                 | 95.2         | 478.34
smallClassic | 1000                  | 40000                 | 0.0          | -26.84
The modified Rainbow DQN achieves high performance on simple maps; in particular, harder maps require more training steps for the agent to learn. The right figure shows Q-values for four states in smallGrid after prolonged training, where the learned Q-values are sensible with respect to the agent's policy, i.e. the trained agent learned to plan its path towards dots.
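To make the "Q-values are sensible" claim concrete: the agent acts greedily, choosing in each state the action with the largest Q-value, so sensible Q-values are ones whose argmax points toward the dots. The sketch below illustrates this; the action set and the numbers are invented for illustration, not taken from the figure.

```python
ACTIONS = ["North", "South", "East", "West"]

def greedy_action(q_values):
    """Return the action with the largest Q-value (the agent's policy)."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return ACTIONS[best]

# Hypothetical state where the nearest dot lies to the East: a well-trained
# agent's Q-values should be highest for the East action.
q = [1.2, -0.4, 3.1, 0.7]
print(greedy_action(q))   # -> East
```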

No. of dots in 506Pacman | Win Rate (%) | Average Score
2                        | 93.4         | 448.18
3                        | 85.6         | 375.96
4                        | 82.4         | 351.06
5                        | 68.8         | 217.56
6                        | 60.2         | 135.32
Regarding the agent's ability to generalize its skills to unfamiliar situations, the trained 506Pacman agent generalizes well to map variants, whereas the trained smallGrid agent does not.

Conclusion

Though data-inefficient and limited in generalization, the modified Rainbow DQN can attain high performance in Pacman. The algorithm could be further improved by reducing training time, or by introducing more human-like characteristics, such as intrinsic motivation and imagination.

Project Plan

Interim Report

Final Report

Software deliverables

Github link to be posted soon.


Project Progress

Date Task Status
Early October 2017 Preliminary Research
  • Perform preliminary literature research on existing games that use reinforcement learning and deep learning techniques.
  • Experiment with various classic RL methods, including policy iteration, value iteration and tabular Q-Learning
Completed
October 1, 2017 Phase 1 Deliverables (Inception): Project Scheme, Detailed Project Plan Completed
Mid October 2017 Stage 1: Algorithmic Exploration
  • Implement and evaluate the performance of DQN in Pacman
  • Read up on variants of DQN algorithm
Completed
October 31, 2017
  • Read up on variants of DQN algorithm
  • Implement DDQN, Duel QN and DQN with proportional-based PER
Completed
Mid November 2017
  • Implement the 506Pacman map
  • Experiment with DQN algorithms in different Pacman maps
Completed
December 2017
  • Train and test DQN and its variants on 506Pacman and establish results
Completed
January 22, 2018 First Presentation Completed
January 21, 2018 Phase 2 Deliverables (Elaboration): Pacman RL Implementations, Interim Report Completed
January - April 2018 Stage 2: RL implementation and Optimization
  • Improve upon existing reinforcement learning algorithm
  • Play on larger Pacman maps
Completed
April 15, 2018 Phase 3 (Construction): Game RL Implementation, Final Report Completed
April 19, 2018 Final Presentation Completed
May 2, 2018 Project Exhibition Completed

Acknowledgement

Many thanks to my FYP supervisor for his support during the project, and to HKXF for providing financial support through the HKXF FYP+ supporting scheme.