| Map | No. of training games | No. of training steps | Win Rate (%) | Average Score |
|---|---|---|---|---|
| 506Pacman | 10000 | 30000 | 100.0 | 507.67 |
| smallGrid | 5000 | 60000 | 48.0 | -21.35 |
| mediumGrid | 2500 | 50000 | 95.2 | 478.34 |
| smallClassic | 1000 | 40000 | 0.0 | -26.84 |
Modified Rainbow DQN obtains high performance on the simpler maps, while harder maps require more training steps for the agent to learn. The right figure shows Q-values for four states in smallGrid after prolonged training; the learned Q-values are sensible with respect to the agent's policy, i.e., the trained agent learned to plan its path toward dots.
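As context for reading such a Q-value plot: a greedy DQN policy simply picks, in each state, the action with the highest estimated Q-value, so "sensible" Q-values are ones whose argmax points toward the nearest dot. A minimal sketch with made-up numbers (the Q-values and state descriptions below are illustrative, not taken from the trained agent):

```python
import numpy as np

# Hypothetical Q-values for four states (rows) over the four Pacman
# movement actions (columns): North, South, East, West.
ACTIONS = ["North", "South", "East", "West"]
q_values = np.array([
    [ 1.2, -0.4,  3.1,  0.0],   # state 0: dot lies to the East
    [-0.9,  2.7,  0.3, -1.1],   # state 1: dot lies to the South
    [ 0.5,  0.5,  0.5,  4.0],   # state 2: dot lies to the West
    [ 3.3,  0.1, -0.2,  0.4],   # state 3: dot lies to the North
])

def greedy_action(q_row):
    """Return the action with the highest estimated Q-value."""
    return ACTIONS[int(np.argmax(q_row))]

# A sensible policy heads toward the nearest dot in every state.
plan = [greedy_action(row) for row in q_values]
print(plan)  # -> ['East', 'South', 'West', 'North']
```

If the argmax in each plotted state points toward a dot, as in this toy example, the learned value function is consistent with a path-planning policy.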
| No. of dots in 506Pacman | Win Rate (%) | Average Score |
|---|---|---|
| 2 | 93.4 | 448.18 |
| 3 | 85.6 | 375.96 |
| 4 | 82.4 | 351.06 |
| 5 | 68.8 | 217.56 |
| 6 | 60.2 | 135.32 |
In terms of the agent's ability to generalize its skills to unfamiliar situations, the trained 506Pacman agent transfers well to map variants, whereas the trained smallGrid agent does not.
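The generalization numbers above come from evaluating a fixed, already-trained policy on map variants with no further training. A sketch of such an evaluation loop, using a stub environment in place of the real Pacman game (the `StubGame` class, its win-probability model, and all constants are hypothetical stand-ins, not the actual evaluation setup):

```python
import random

class StubGame:
    """Toy stand-in for a map variant: the fixed policy's chance of
    winning drops as the number of dots grows (hypothetical model)."""
    def __init__(self, n_dots, rng):
        self.n_dots, self.rng = n_dots, rng

    def rollout(self, policy):
        # One full game under the (frozen) policy; returns (score, won).
        p_win = max(0.0, 1.0 - 0.08 * self.n_dots)
        won = self.rng.random() < p_win
        score = 500.0 - 60.0 * self.n_dots if won else -100.0
        return score, won

def evaluate(make_game, n_games=500, seed=0):
    """Estimate win rate (%) and average score over many games,
    with no learning updates between games."""
    rng = random.Random(seed)
    wins, total = 0, 0.0
    for _ in range(n_games):
        score, won = make_game(rng).rollout(policy=None)
        wins += int(won)
        total += score
    return 100.0 * wins / n_games, total / n_games

results = {}
for n_dots in range(2, 7):
    win_rate, avg = evaluate(lambda rng: StubGame(n_dots, rng))
    results[n_dots] = (win_rate, avg)
    print(f"{n_dots} dots: win rate {win_rate:.1f}%, avg score {avg:.1f}")
```

With this structure, a graceful decline in win rate as dots are added (as in the 506Pacman table) indicates good generalization, while a collapse to near-zero would indicate overfitting to the training map.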