Machine Learning Final Project: Othello
Improving the accuracy of Othello by using new primitives and intelligent training systems
By Janak J Parekh and Scott A Susser
 
To navigate through this document quickly, click on one of the following keywords: 
Abstract * Introduction * Training * Implementation * Results * Other Works * Future Research * Conclusion * Miscellaneous

Abstract

Project Othello showed us the importance of the training system in the development of the genetic program.  As in "real life", the GP needs an opportunity to work with randomness and then build through higher, successive levels of players until it is "fit" enough to compete against strong players like Edgar.  Therefore, we decided to develop an intelligent training system encapsulating several models rather than one.  As was true in the original Project Othello, time was our largest constraint, and it prohibited further research into modifying the trainer to work around the weaknesses in our generated players.

Introduction (So Ya Wanna Play A Game . . .)

As anyone who has ever played Othello (or Reversi, if you are employed by Microsoft) can tell you, Othello is a very challenging game to play.  As it says on the box of the Milton Bradley edition, it takes "A Minute to Learn . . . A Lifetime to Master."  The biggest problem we faced in developing a good player was determining which primitives to give Edgar to form his new evaluation function.  This is rather difficult because, unlike in most standard board games, the classic measure of the value of a board, namely the number of your pieces versus those of your opponent, does not count for much in Othello.  This requires that we find primitives that express the qualities of a game board that are, at least in part, the true measure of its value.  These primitives were chosen by our resident Othello expert, Scott, for the original Project Othello assignment, and they are all described here.  (These descriptions come from the original Project Othello assignment.)

Training (Looking For Some Competition?)

We hoped that these primitives would enable our genetic player to gain a greater insight into what makes an Othello board a winner.  To help him along, we designed the following ten trainers.  Note that the player is white in all of these cases.

Implementation details

The above was implemented by creating a new kind of trainer that extended OthelloPlayer and itself encapsulated the ten training players described above, as well as Othello.  The OthelloTrainer, as it came to be called, determined which player to pit against the new GP based on the generation number: every three generations, the next player in the sequence was employed.  After the first 30 generations, Edgar was in turn pitted against the GP for the final three generations.  (We chose three generations as a compromise between completeness and time.)
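To make the schedule concrete, the following is a minimal Java sketch of the generation-to-opponent mapping described above.  The class and method names (TrainerSchedule, opponentFor) are illustrative stand-ins, not the actual interface of our OthelloTrainer.

    // Minimal sketch of the generation-to-opponent schedule described above.
    // TrainerSchedule and opponentFor are illustrative names, not our real code.
    public class TrainerSchedule {
        private static final int TRAINERS = 10;        // the ten hand-built training players
        private static final int GENS_PER_TRAINER = 3; // three generations per opponent

        /** Returns the opponent index for a generation: 0-9 selects one of the
         *  hand-built trainers, 10 indicates Edgar (generation 30 onward). */
        public static int opponentFor(int generation) {
            int index = generation / GENS_PER_TRAINER;
            return Math.min(index, TRAINERS);          // generations 30+ all map to Edgar
        }

        public static void main(String[] args) {
            for (int g = 0; g < 33; g++) {
                System.out.println("generation " + g + " -> opponent " + opponentFor(g));
            }
        }
    }

Clamping every generation index of 30 or more to Edgar keeps the schedule well-defined even if a run is extended beyond 33 generations.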

The resulting OthelloTrainer player was pitted against the GP learning algorithm in the GPOthello class and training was done with a population size of 200 and, as previously mentioned, 33 generations.  (One game was played per player per generation, since the players were deterministic).  Several different runs were made to see if mutation would work better in certain instances.
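As a rough illustration of why one game per pairing suffices, here is a hedged sketch of a fitness routine under the assumption that fitness is derived from the disc differential of a single deterministic game.  GPIndividual, playOneGame, and the scoring formula are placeholders, not the project's actual classes or fitness measure.

    // Hedged sketch of per-individual fitness evaluation; all names are placeholders.
    public class FitnessSketch {
        interface GPIndividual { int chooseMove(long board); }

        // One game is enough because neither side is randomized, so a rematch
        // would repeat move for move.
        static double evaluate(GPIndividual candidate, GPIndividual trainer) {
            int discDifference = playOneGame(candidate, trainer); // hypothetical game driver
            return Math.max(0, 32 + discDifference);              // placeholder scoring only
        }

        static int playOneGame(GPIndividual white, GPIndividual black) {
            return 0; // placeholder: a real driver would step through a full Othello game
        }

        public static void main(String[] args) {
            System.out.println("population 200 x 33 generations = " + (200 * 33) + " games per run");
        }
    }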

In order to evaluate the results generated by the above, in addition to using the fitness values reported by the system, we created an OthelloPlayGP class that essentially allows Scott to play against our generated GP players.  The Results section discusses some of the findings and conclusions he made about the generated players.  (In fact, look below if you would like to challenge our player yourself!)

Results

Our results were mixed -- the players generated through this system were in general better than those produced by our original project, which naively trained against Edgar, but they still did not fare terribly well against Edgar.  Since the final-generation players were not terribly successful, we investigated them ourselves to determine whether our training players had done an adequate job.  We picked one of the better ones and presented it to Scott for evaluation.

Other works (the competition -- related Othello AI research)

In general, Edgar's approach to learning how to play Othello is not heavily optimized.  He essentially provides evaluation functions to be applied to any given game board at any given time.  One way to improve on this would be to implement a search algorithm that looks a few moves ahead, choosing the best move in light of the moves that would follow it.  One way of doing this, using an alpha-beta search, is similar to what was done by Robert E. Smith and Brian Gray in their paper Co-Adaptive Genetic Algorithms: An Example in Othello Strategy.  In that paper, Smith and Gray describe a strategy of training their Othello player with a co-adaptive fitness function: each individual in the population is evaluated on how well it performs in games against other members of the population.  As the paper notes, this method has a major downside, one which is not a significant problem for our method: co-adaptive training tends to depress the measured fitness of the better players in the population when they compete against each other.  Their results exhibit this, as there is no apparent improvement in the quality of players over time.  Nevertheless, the resulting player was fairly competitive, partly due to the use of an alpha-beta search to evaluate moves several levels down in the game.
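Since alpha-beta search comes up here and below, a generic sketch may help.  It is written in Java against an assumed State interface for illustration, not against Smith and Gray's system or our own Othello classes.

    // Generic alpha-beta lookahead sketch; the State interface is an assumption.
    import java.util.List;

    public class AlphaBetaSketch {
        interface State {
            List<State> successors();   // positions reachable by the side to play
            boolean isTerminal();
            double evaluate();          // static evaluation from the maximizing side's view
        }

        /** Returns the value of `state` searching `depth` plies ahead. */
        static double alphaBeta(State state, int depth, double alpha, double beta, boolean maximizing) {
            if (depth == 0 || state.isTerminal()) {
                return state.evaluate();
            }
            if (maximizing) {
                double best = Double.NEGATIVE_INFINITY;
                for (State child : state.successors()) {
                    best = Math.max(best, alphaBeta(child, depth - 1, alpha, beta, false));
                    alpha = Math.max(alpha, best);
                    if (beta <= alpha) break;   // prune: the opponent will never allow this line
                }
                return best;
            } else {
                double best = Double.POSITIVE_INFINITY;
                for (State child : state.successors()) {
                    best = Math.min(best, alphaBeta(child, depth - 1, alpha, beta, true));
                    beta = Math.min(beta, best);
                    if (beta <= alpha) break;
                }
                return best;
            }
        }
    }

Note that in Othello a side with no legal moves must pass, so a real implementation would generate a "pass" successor rather than treating an empty move list as a terminal position.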

The use of alpha-beta search to evaluate an Othello board is by no means a new idea.  In Sections 18.6 - 18.12 of his book Paradigms of Artificial Intelligence Programming, Peter Norvig outlines a Lisp program that plays Othello using an alpha-beta evaluation function.  He mentions two high-caliber Othello-playing programs: Iago, created by Paul Rosenbloom, and Bill, created by Kai-Fu Lee.  The latter program, Bill, defeated the highest-rated American Othello player (Brian Rose) by a score of 56-8 in 1989.  As we did, Norvig chooses to focus on certain aspects of the Othello board in his evaluation function (his code actually draws on both Iago and Bill).  The two features he chooses are mobility and edge stability.  Norvig defines two types of mobility, current and potential: current mobility is the number of moves available to the player, while potential mobility is the number of empty squares next to an opponent's piece.  Edge stability is determined by taking the roughly 60,000 possible combinations of edge contents and evaluating them to see which are optimal.  Norvig combines these factors in a linear function whose coefficients depend on the move number.  While this is apparently an effective strategy, we feel that having Edgar set his own weights will be just as effective, assuming he can be trained sufficiently.
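The sketch below shows the general shape of such a move-number-weighted linear evaluation.  The feature interface and the weight schedule are our own illustrative assumptions, not the coefficients actually used in Iago or Bill.

    // Sketch of a linear evaluation in the style Norvig describes; weights are illustrative.
    public class LinearEvalSketch {
        interface Features {
            int currentMobility();    // legal moves available to the player
            int potentialMobility();  // empty squares adjacent to an opponent's piece
            double edgeStability();   // precomputed value of the edge configuration
        }

        static double evaluate(Features f, int moveNumber) {
            // Hypothetical schedule: mobility matters more early, edges matter more later.
            double mobilityWeight = Math.max(0.0, 1.0 - moveNumber / 60.0);
            double edgeWeight = moveNumber / 60.0;
            return mobilityWeight * (f.currentMobility() + 0.5 * f.potentialMobility())
                    + edgeWeight * f.edgeStability();
        }
    }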

In his paper Games Computers Play: Simulating Characteristic Function Game Playing Agents with Classifier Systems, Garett Dworman discusses the ability of his generated players to achieve their optimal performance level.  The details of his "game" are not so important, as his methods are very general and could easily be applied to many types of learning.  Dworman's paper concluded that his players were, in fact, able to reach their optimal levels in competition with each other.  Each player started with a simple evaluation function, which it adjusted through training.  Dworman trained his players to the point where they had reached the optimal playing level and compared them to his own evaluation of what the player should do.  They matched.

Future Research

There are several ways we could continue our research.  One way would be to integrate training against random players, i.e., incorporate nondeterminism.  This would help ensure that the GA progresses toward the desired target of playing well against humans, who are inherently nondeterministic (most of the time).  Additionally, as was mentioned before, time was a major constraint in this project; the time required to train is substantial.  Given more time, we could have tried to improve the training process so that it works around weaknesses it finds in the trainers.  We might even apply meta-learning, using a learning method to evaluate better training functions.  One method of developing a good player would be to build up to it like a pyramid: start training against random players and develop a GA.  Make that GA the first training player and create a second GA.  Repeat until there are enough GAs to train a very good player.
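A rough sketch of the pyramid idea follows; randomPlayer and trainAgainst are hypothetical hooks standing in for a full GP run, not functions from our code base.

    // Sketch of "pyramid" bootstrapping: each round trains against the previous champion.
    import java.util.ArrayList;
    import java.util.List;

    public class PyramidTrainingSketch {
        interface Player { int chooseMove(long board); }

        static Player randomPlayer() { return board -> 0; }              // stand-in: picks arbitrarily
        static Player trainAgainst(Player opponent) { return opponent; } // stand-in for a full GP run

        public static void main(String[] args) {
            List<Player> levels = new ArrayList<>();
            Player current = randomPlayer();
            for (int level = 0; level < 5; level++) {   // the number of levels is a tunable choice
                current = trainAgainst(current);        // evolve a player that beats the last one
                levels.add(current);
            }
            System.out.println("built " + levels.size() + " successively stronger trainers");
        }
    }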

Conclusion

Although the results were not the best, the system we built to produce them can be adapted for future experimentation.  The most important consideration in building a trainer is making sure the training players guide the GA along appropriately.  The players must be of different levels, so that the GA does not flounder but instead faces successively harder competition and can improve.  Spending substantial time on the player-creation process seems to be important.  We also discovered that the GP has a tendency to evolve simple players, so an important starting point is to select a few high-quality primitives to use.  With some additional programming, it should be feasible to develop a practical, commercial Othello player.

Miscellaneous

We're sure this document has just whetted your appetite to learn or master Othello.  Here's a start: if you would like to play our generated intermediate player, click here!

Scott's expert status is a scientific fact, not a product of boasting (although that doesn't hurt either).  If you need convincing, click here to see his record against Edgar.


Last updated Dec 17 1997 by Janak J Parekh and Scott A Susser.