| Map | No. of training games | No. of training steps | Win Rate (%) | Average Score |
|---|---|---|---|---|
| 506Pacman | 10000 | 30000 | 100.0 | 507.67 |
| smallGrid | 5000 | 60000 | 48.0 | -21.35 |
| mediumGrid | 2500 | 50000 | 95.2 | 478.34 |
| smallClassic | 1000 | 40000 | 0.0 | -26.84 |
Modified Rainbow DQN obtains high performance on the simpler maps, while harder maps require more training steps for the agent to learn. The right figure shows Q-values for four states in smallGrid after prolonged training; the learned Q-values are sensible with respect to the agent's policy, i.e., the trained agent learned to plan its path toward dots.
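As context for reading such a Q-value plot: a greedy DQN policy simply picks, in each state, the action with the highest estimated Q-value, so "sensible" Q-values are ones whose argmax points toward the nearest dot. A minimal sketch with made-up numbers (the Q-values and state descriptions below are illustrative, not taken from the trained agent):

```python
import numpy as np

# Hypothetical Q-values for four states (rows) over the four Pacman
# movement actions (columns): North, South, East, West.
ACTIONS = ["North", "South", "East", "West"]
q_values = np.array([
    [ 1.2, -0.4,  3.1,  0.0],   # state 0: dot lies to the East
    [-0.9,  2.7,  0.3, -1.1],   # state 1: dot lies to the South
    [ 0.5,  0.5,  0.5,  4.0],   # state 2: dot lies to the West
    [ 3.3,  0.1, -0.2,  0.4],   # state 3: dot lies to the North
])

def greedy_action(q_row):
    """Return the action with the highest estimated Q-value."""
    return ACTIONS[int(np.argmax(q_row))]

# A sensible policy heads toward the nearest dot in every state.
plan = [greedy_action(row) for row in q_values]
print(plan)  # -> ['East', 'South', 'West', 'North']
```

If the argmax in each plotted state points toward a dot, as in this toy example, the learned value function is consistent with a path-planning policy.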
| No. of dots in 506Pacman | Win Rate (%) | Average Score |
|---|---|---|
| 2 | 93.4 | 448.18 |
| 3 | 85.6 | 375.96 |
| 4 | 82.4 | 351.06 |
| 5 | 68.8 | 217.56 |
| 6 | 60.2 | 135.32 |
In terms of the agent's ability to generalize its skills to unfamiliar situations, the trained 506Pacman agent transfers well to map variants, whereas the trained smallGrid agent does not.
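The generalization numbers above come from evaluating a fixed, already-trained policy on map variants with no further training. A sketch of such an evaluation loop, using a stub environment in place of the real Pacman game (the `StubGame` class, its win-probability model, and all constants are hypothetical stand-ins, not the actual evaluation setup):

```python
import random

class StubGame:
    """Toy stand-in for a map variant: the fixed policy's chance of
    winning drops as the number of dots grows (hypothetical model)."""
    def __init__(self, n_dots, rng):
        self.n_dots, self.rng = n_dots, rng

    def rollout(self, policy):
        # One full game under the (frozen) policy; returns (score, won).
        p_win = max(0.0, 1.0 - 0.08 * self.n_dots)
        won = self.rng.random() < p_win
        score = 500.0 - 60.0 * self.n_dots if won else -100.0
        return score, won

def evaluate(make_game, n_games=500, seed=0):
    """Estimate win rate (%) and average score over many games,
    with no learning updates between games."""
    rng = random.Random(seed)
    wins, total = 0, 0.0
    for _ in range(n_games):
        score, won = make_game(rng).rollout(policy=None)
        wins += int(won)
        total += score
    return 100.0 * wins / n_games, total / n_games

results = {}
for n_dots in range(2, 7):
    win_rate, avg = evaluate(lambda rng: StubGame(n_dots, rng))
    results[n_dots] = (win_rate, avg)
    print(f"{n_dots} dots: win rate {win_rate:.1f}%, avg score {avg:.1f}")
```

With this structure, a graceful decline in win rate as dots are added (as in the 506Pacman table) indicates good generalization, while a collapse to near-zero would indicate overfitting to the training map.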