On March 15th, 2016, the Go-playing computer program AlphaGo took the world by surprise by winning the fifth and final game of its match against international Go champion Lee Sedol, taking the series 4–1. This project aims to create a board-game-playing artificial intelligence (called IagoBot) using approaches similar to AlphaGo's, but applied to the much simpler game of Othello rather than Go, combining tree-search algorithms with neural networks. The program will be implemented in Python and will offer a simple graphical user interface for play against human challengers.
This project will comprise two phases: the construction phase, in which I create IagoBot and its neural networks in Python, and the training phase, in which I feed the program sets of training data over a period of several days to allow it to optimize its neural network weight parameters. For the construction phase, I have chosen Python for several reasons. Firstly, it is a dynamically typed language, which supports rapid prototyping. Secondly, it provides a rich and diverse set of library functions and APIs developed by the open-source community, making it possible for me to plug rudimentary or generic functions into my code, which in turn cuts down my overall development time. Such libraries include, but are not limited to: numpy for easy matrix manipulation, matplotlib for quickly visualizing the output of neural networks, pandas for data analysis, and scikit-learn, a popular machine learning library. Moreover, Python is widely used for machine learning, and there is an active online community specializing in Python machine learning that I can turn to should I require help during the process.
Figure: An illustration of the library functions available in Python. Here, using scikit-learn and matplotlib, a logistic regression model is imported, trained, and used to classify new test data in just a few lines of code.
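A minimal sketch of the kind of workflow the figure illustrates (the toy data here is invented for illustration, and the plotting step is omitted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: two well-separated clusters of 2-D points
# with binary class labels.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

# Import, train, and classify in just a few lines.
model = LogisticRegression()
model.fit(X_train, y_train)

X_test = np.array([[0.1, 0.0], [1.0, 0.9]])
predictions = model.predict(X_test)
print(predictions)  # one predicted class label per test point
```

A call to `matplotlib.pyplot.scatter` on the training points, colored by label, would reproduce the kind of visualization shown in the figure.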
IagoBot will utilize a form of tree-search algorithm to help it decide which move to make at each turn. Ideally, the tree-search algorithm would combine both the value and policy networks in a single heuristic that selects a move via lookahead search. I have yet to decide which algorithm will be most suitable for Othello, but it will likely be a variation of the Monte-Carlo tree search used by AlphaGo.

For the training phase, I will begin to train IagoBot by means of supervised learning. To be sure, there are other methods of training, for instance reinforcement learning, which is popularly used to train AI programs to play classic Atari games such as Breakout. However, I have decided that a reinforcement learning approach is unnecessarily complex for a relatively simple game (simple meaning computationally inexpensive) like Othello. A supervised learning approach involves feeding the program sets of training data consisting of sample games of Othello together with an analysis (target outcomes) of each player's strength/likelihood to win within each of those games. IagoBot would take these values as the "golden standard", compare them with its actual output at the end of each iteration, and adjust its weight matrix accordingly. The policy network would be trained in a similar manner. During this phase, I would be responsible for feeding IagoBot a series of sample games as training data. These sample games will be stored as plain text documents in a unified format, so that every group currently taking on this project can produce around 50 games worth of data to be shared with every other group. I plan to construct all my training data by playing against an Othello program currently installed on my phone.
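The "compare with the golden standard and adjust the weight matrix" loop described above can be sketched as follows. This is a hypothetical illustration, not IagoBot's actual architecture: a single sigmoid unit over a flattened 8x8 board (+1 black, -1 white, 0 empty) stands in for the value network, and the boards and target win likelihoods are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(W, b, boards):
    """Estimated win likelihood in (0, 1) for each board."""
    return 1.0 / (1.0 + np.exp(-(boards @ W + b)))

def train_step(W, b, boards, targets, lr=0.1):
    """One gradient-descent step on the mean squared error."""
    preds = predict(W, b, boards)
    err = preds - targets                   # actual output vs. "golden standard"
    grad = err * preds * (1.0 - preds)      # chain rule through the sigmoid
    W = W - lr * boards.T @ grad / len(boards)   # adjust the weight matrix
    b = b - lr * grad.mean()
    return W, b

# Tiny synthetic "game states" with made-up target win likelihoods.
boards = rng.choice([-1.0, 0.0, 1.0], size=(32, 64))
targets = rng.random(32)

W, b = np.zeros(64), 0.0
for _ in range(200):
    W, b = train_step(W, b, boards, targets)

print(np.mean((predict(W, b, boards) - targets) ** 2))  # loss after training
```

A real value network would replace the single sigmoid unit with multiple layers, but the supervised loop (predict, compare with targets, adjust weights) has the same shape.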
| # | Target date | Milestone | Completion |
|---|-------------|-----------|------------|
| 1 | September 30th, 2016 | Familiarize with Monte-Carlo tree search algorithms. Familiarize with linear regression and how it can be used to minimize the loss function of a neural network via supervised learning. | 100% |
| 2 | October 8th, 2016 | Record 50 sets of sample games to be shared with other groups. | 100% |
| 3 | October 31st, 2016 | Create a program to parse game data and record every game state within a game. Dynamically update the win likelihood of each game state that is duplicated in the game set. | 100% |
| 4 | November 30th, 2016 | Design and implement value network. | 100% |
| 5 | December 31st, 2016 | Familiarize with minimax algorithms with alpha-beta pruning. Finish Deep Learning Tutorial (by LISA lab, University of Montreal). Continuously modify value network. | |
| 6 | January 15th, 2017 | Train value network using existing sample data. Display error graphically. | |
| 6 | January 22nd, 2017 | Interim report. Design and implement policy network. | 50% |
| 7 | February 28th, 2017 | Begin training stage: run continuous training data to optimize the neural network loss function. Optimize via hyper-parameter tuning. | 20% |
| 7 | March 10th, 2017 | Design and implement minimax algorithm based on value and policy networks. Implement main game controller. Implement GUI via tkinter. | 90% |
| 8 | March 31st, 2017 | Begin final round of human testing; collaborate with other groups to produce a standardized API for an inter-group tournament if time allows. | 0% |
| 9 | April 21st, 2017 | Deliver finalized implementation. Submit final report for review. | 0% |
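The win-likelihood bookkeeping from milestone 3 can be sketched as a tally over recorded game states, so that a state appearing in several games accumulates an empirical win likelihood. The `(state, black_won)` record format below is invented for illustration; the unified inter-group file format may differ.

```python
from collections import defaultdict

def win_likelihoods(records):
    """records: iterable of (state_string, black_won) pairs.

    Returns {state: fraction of recorded games containing that
    state which Black went on to win}.
    """
    wins = defaultdict(int)   # games won from this state
    seen = defaultdict(int)   # games containing this state
    for state, black_won in records:
        seen[state] += 1
        wins[state] += int(black_won)
    return {s: wins[s] / seen[s] for s in seen}

# A duplicated state ('...BW...') averages over both of its games.
sample = [("...BW...", True), ("...BW...", False), ("..BBW...", True)]
print(win_likelihoods(sample))  # {'...BW...': 0.5, '..BBW...': 1.0}
```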
Please see my Interim Report for a detailed account of my progress with IagoBot to date.