Playing Othello by Deep Learning Neural Network

Introduction

This project is the Final Year Project of Argens NG at The University of Hong Kong, under the supervision of Dr. K P Chan.

Background

Go has long been considered the pinnacle of artificial intelligence in gaming. With up to 361 possible moves at each turn, it was believed to be a game of instinct, a proof that we humans possess something unique to us, a proof that we creators bear true intelligence while the machines do not, a proof that held true until AlphaGo defeated the Korean professional Lee Sedol 4 to 1 in a five-game series. This remarkable feat revealed the vast possibilities of deep learning neural networks, where work previously thought achievable only by humans may finally be handed to our machine friends.

Abstract

In this project, we will attempt to replicate the success of AlphaGo in the game of Othello (also known as Reversi). We will implement as many components of AlphaGo as possible within the permitted timeframe, and we hope to achieve performance that is better or more efficient than currently available software.

What is Deep Learning

In the traditional approach to Artificial Intelligence, we had to tell the computer which specific details to look at and exactly how to value each of them. At some point, programmers grew tired of this. Why not let computers do their own learning? Hence comes the field of machine learning, where we only need to give computers training data, from which they learn and improve in specific fields. Deep learning advances this further by attempting to teach them abstract concepts using multiple layers of "neurons". This approach has proven especially successful in gaming.
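
As a rough illustration of the idea, the sketch below stacks several layers of artificial "neurons" and pushes a toy board encoding through them. The layer sizes, the 64-value representation of an 8x8 Othello board, and the use of NumPy are assumptions chosen purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def dense(n_in, n_out):
        # Random (untrained) weights and biases for one layer of neurons.
        return rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)

    def forward(x, layers):
        # Pass the input through each layer in turn: weighted sum + ReLU nonlinearity.
        for w, b in layers:
            x = np.maximum(0.0, x @ w + b)
        return x

    layers = [dense(64, 128), dense(128, 128), dense(128, 64)]  # three stacked layers
    board = rng.choice([-1.0, 0.0, 1.0], size=(1, 64))          # toy 8x8 board encoding
    print(forward(board, layers).shape)                         # (1, 64) move scores

Each layer transforms the output of the one before it, which is what allows the deeper layers to capture progressively more abstract features of a position.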

Deliverables

  • Main Program
    • It contains a GUI that interacts with the user, as well as the skeleton of the Artificial Intelligence program.
  • Wrapper
    • It will be responsible for communication between the program and other readily available software, as well as earlier versions of itself.
  • Training Program
    • It will be responsible for training the neural network of the main program. Upon completion of training, the main program should have an instinctive grasp of the game (a rough sketch of such a training loop follows this list).
  • Report
    • A compilation of all acquired data and training history will be presented in the form of a report.
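
As a rough sketch of what such a training loop might look like, the snippet below fits a small policy-style network on synthetic positions. The 64-value board encoding, the network shape, and the use of PyTorch are assumptions made for illustration only; the real Training Program would feed recorded or self-play Othello positions instead of random data.

    import torch
    import torch.nn as nn

    # A small network mapping a flattened 8x8 board to a score for each of the
    # 64 squares. Sizes are placeholders, not project settings.
    model = nn.Sequential(
        nn.Linear(64, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 64),
    )
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        boards = torch.randint(-1, 2, (32, 64)).float()  # placeholder positions
        moves = torch.randint(0, 64, (32,))              # placeholder expert moves
        optimiser.zero_grad()
        loss = loss_fn(model(boards), moves)             # predict the expert move
        loss.backward()
        optimiser.step()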

Order of Importance

  1. Monte Carlo Tree Search (sketched below)
  2. Value Network
  3. Policy Network
  4. Rollout Policy
  5. Evolution Algorithm
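
Since Monte Carlo Tree Search tops this list, the sketch below outlines its four steps (selection, expansion, simulation, backpropagation) with plain random rollouts. The game-state interface (legal_moves, play, winner, player) is an assumed one, and the tiny stand-in game at the end exists only to keep the example self-contained; a real Othello implementation would take its place.

    import math
    import random

    class Node:
        def __init__(self, state, parent=None, move=None):
            self.state, self.parent, self.move = state, parent, move
            self.children, self.untried = [], state.legal_moves()
            self.wins, self.visits = 0.0, 0

        def ucb_child(self, c=1.4):
            # Balance exploitation (win rate) against exploration (rarely visited).
            return max(self.children, key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(self.visits) / ch.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend through fully expanded nodes by UCB score.
            while not node.untried and node.children:
                node = node.ucb_child()
            # 2. Expansion: add one child for an untried move.
            if node.untried:
                move = node.untried.pop()
                child = Node(node.state.play(move), parent=node, move=move)
                node.children.append(child)
                node = child
            # 3. Simulation: play random moves until the game ends.
            state = node.state
            while state.legal_moves():
                state = state.play(random.choice(state.legal_moves()))
            winner = state.winner()
            # 4. Backpropagation: credit the player who moved into each node.
            while node is not None:
                node.visits += 1
                if winner is not None and winner != node.state.player:
                    node.wins += 1
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).move

    class TakeAwayState:
        # Tiny stand-in game: remove 1 or 2 stones; whoever takes the last stone wins.
        def __init__(self, stones=10, player=1):
            self.stones, self.player = stones, player
        def legal_moves(self):
            return [m for m in (1, 2) if m <= self.stones]
        def play(self, move):
            return TakeAwayState(self.stones - move, -self.player)
        def winner(self):
            return -self.player if self.stones == 0 else None

    print(mcts(TakeAwayState()))

In the full AlphaGo-style design, the Policy Network would bias which moves are expanded, and the Value Network or the fast Rollout Policy would replace the purely random playout in step 3.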