Machine Learning Final Project: Othello
Improving the accuracy of Othello by using new primitives and intelligent training systems
By Janak J Parekh and Scott A Susser
 
To navigate through this document quickly, click on one of the following keywords: 
Abstract * Introduction * Training * Implementation * Results * Other Works * Future Research * Conclusion * Miscellaneous

Abstract

Project Othello showed us the importance of the training system in the development of the genetic program.  As in "real life", the GP needs an opportunity to work with randomness and then build through higher, successive levels of players until it is "fit" enough to compete against strong players like Edgar.  Therefore, we decided to develop an intelligent training system encapsulating several models rather than one.  As was true in the original Project Othello, time was our largest constraint, and it prohibited further research into modifying the trainer to work around the weaknesses in our generated players.

Introduction (So Ya Wanna Play A Game . . .)

As anyone who has ever played Othello (or Reversi, if you are employed by Microsoft) can tell you, Othello is a very challenging game to play.  As it says on the box of the Milton Bradley edition, it takes "A Minute to Learn . . . A Lifetime to Master."  The biggest problem we faced in developing a good player was determining which primitives to give Edgar to form his new evaluation function.  This is rather difficult because, unlike in most standard board games, the classic measure of the value of a board, namely the number of your pieces versus those of your opponent, does not count for much in Othello.  This requires that we find primitives that express the qualities of a game board that are, at least in part, the true measure of its value.  These primitives were chosen by our resident Othello expert, Scott, for the original Project Othello assignment, and they are all described here.  (These descriptions come from the original Project Othello assignment.)

Training (Looking For Some Competition?)

We hoped that these primitives would enable our genetic player to gain a greater insight into what makes an Othello board a winner.  To help him along, we designed the following ten trainers.  Note that the player is white in all of these cases.

Implementation details

The above was implemented by creating a new kind of trainer that extended OthelloPlayer and itself encapsulated the ten training players described above, as well as Othello.  The OthelloTrainer, as it came to be called, determined which player to pit against the new GP based on the generation number: every three generations, the next player in the sequence was employed.  After the first 30 generations, Edgar was in turn pitted against the GP for the final three generations.  (We chose three generations as a compromise between completeness and time.)
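To make the schedule concrete, the following is a minimal Java sketch of the generation-to-opponent mapping described above.  The class and method names (TrainerSchedule, opponentFor) are illustrative stand-ins, not the actual interface of our OthelloTrainer.

    // Minimal sketch of the generation-to-opponent schedule described above.
    // TrainerSchedule and opponentFor are illustrative names, not our real code.
    public class TrainerSchedule {
        private static final int TRAINERS = 10;        // the ten hand-built training players
        private static final int GENS_PER_TRAINER = 3; // three generations per opponent

        /** Returns the opponent index for a generation: 0-9 selects one of the
         *  hand-built trainers, 10 indicates Edgar (generation 30 onward). */
        public static int opponentFor(int generation) {
            int index = generation / GENS_PER_TRAINER;
            return Math.min(index, TRAINERS);          // generations 30+ all map to Edgar
        }

        public static void main(String[] args) {
            for (int g = 0; g < 33; g++) {
                System.out.println("generation " + g + " -> opponent " + opponentFor(g));
            }
        }
    }

Clamping every generation index of 30 or more to Edgar keeps the schedule well-defined even if a run is extended beyond 33 generations.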

The resulting OthelloTrainer player was pitted against the GP learning algorithm in the GPOthello class and training was done with a population size of 200 and, as previously mentioned, 33 generations.  (One game was played per player per generation, since the players were deterministic).  Several different runs were made to see if mutation would work better in certain instances.
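As a rough illustration of why one game per pairing suffices, here is a hedged sketch of a fitness routine under the assumption that fitness is derived from the disc differential of a single deterministic game.  GPIndividual, playOneGame, and the scoring formula are placeholders, not the project's actual classes or fitness measure.

    // Hedged sketch of per-individual fitness evaluation; all names are placeholders.
    public class FitnessSketch {
        interface GPIndividual { int chooseMove(long board); }

        // One game is enough because neither side is randomized, so a rematch
        // would repeat move for move.
        static double evaluate(GPIndividual candidate, GPIndividual trainer) {
            int discDifference = playOneGame(candidate, trainer); // hypothetical game driver
            return Math.max(0, 32 + discDifference);              // placeholder scoring only
        }

        static int playOneGame(GPIndividual white, GPIndividual black) {
            return 0; // placeholder: a real driver would step through a full Othello game
        }

        public static void main(String[] args) {
            System.out.println("population 200 x 33 generations = " + (200 * 33) + " games per run");
        }
    }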

In order to evaluate the results generated by the above, in addition to using the fitness values reported by the system, we created an OthelloPlayGP class that essentially allows Scott to play against our generated GP players.  The Results section discusses some of the findings and conclusions he made about the generated players.  (In fact, look below if you would like to challenge our player yourself!)

Results

Our results were mixed -- the players generated through this system were in general better than those produced by our original project, which naively trained against Edgar, but they still did not fare terribly well against Edgar.  Since the final-generation players were not terribly successful, we investigated them ourselves to determine whether our training players had done an adequate job.  We picked one of the better ones and presented it to Scott for evaluation.

Other works (the competition -- related Othello AI research)

In general, Edgar's approach to learning how to play Othello is not heavily optimized.  He essentially provides evaluation functions to be applied to any given game board at any given time.  One way to improve on this would be to implement a search algorithm that looks a few moves ahead, choosing the best move in light of the moves that would follow it.  One way of doing this, using an alpha-beta search, is similar to what was done by Robert E. Smith and Brian Gray in their paper Co-Adaptive Genetic Algorithms: An Example in Othello Strategy.  In that paper, Smith and Gray describe a strategy of training their Othello player with a co-adaptive fitness function: each individual in the population is evaluated on how well it performs in games against other members of the population.  As the paper notes, this method has a major downside, one which is not a significant problem for our method: co-adaptive training tends to depress the measured fitness of the better players in the population when they compete against each other.  Their results exhibit this, as there is no apparent improvement in the quality of players over time.  Nevertheless, the resulting player was fairly competitive, partly due to the use of an alpha-beta search to evaluate moves several levels down in the game.
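Since alpha-beta search comes up here and below, a generic sketch may help.  It is written in Java against an assumed State interface for illustration, not against Smith and Gray's system or our own Othello classes.

    // Generic alpha-beta lookahead sketch; the State interface is an assumption.
    import java.util.List;

    public class AlphaBetaSketch {
        interface State {
            List<State> successors();   // positions reachable by the side to play
            boolean isTerminal();
            double evaluate();          // static evaluation from the maximizing side's view
        }

        /** Returns the value of `state` searching `depth` plies ahead. */
        static double alphaBeta(State state, int depth, double alpha, double beta, boolean maximizing) {
            if (depth == 0 || state.isTerminal()) {
                return state.evaluate();
            }
            if (maximizing) {
                double best = Double.NEGATIVE_INFINITY;
                for (State child : state.successors()) {
                    best = Math.max(best, alphaBeta(child, depth - 1, alpha, beta, false));
                    alpha = Math.max(alpha, best);
                    if (beta <= alpha) break;   // prune: the opponent will never allow this line
                }
                return best;
            } else {
                double best = Double.POSITIVE_INFINITY;
                for (State child : state.successors()) {
                    best = Math.min(best, alphaBeta(child, depth - 1, alpha, beta, true));
                    beta = Math.min(beta, best);
                    if (beta <= alpha) break;
                }
                return best;
            }
        }
    }

Note that in Othello a side with no legal moves must pass, so a real implementation would generate a "pass" successor rather than treating an empty move list as a terminal position.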

The use of alpha-beta search to evaluate an Othello board is by no means a new idea.  In Sections 18.6 - 18.12 of his book Paradigms of Artificial Intelligence Programming, Peter Norvig outlines a Lisp program that plays Othello using an alpha-beta evaluation function.  He mentions two high-caliber Othello-playing programs: Iago, created by Paul Rosenbloom, and Bill, created by Kai-Fu Lee.  The latter program, Bill, defeated the highest-rated American Othello player (Brian Rose) by a score of 56-8 in 1989.  As we did, Norvig chooses to focus on certain aspects of the Othello board in his evaluation function (his code actually draws on both Iago and Bill).  The two features he chooses are mobility and edge stability.  Norvig defines two types of mobility, current and potential: current mobility is the number of moves available to the player, while potential mobility is the number of empty squares next to an opponent's piece.  Edge stability is determined by taking the roughly 60,000 possible combinations of edge contents and evaluating them to see which are optimal.  Norvig combines these factors in a linear function whose coefficients depend on the move number.  While this is apparently an effective strategy, we feel that having Edgar set his own weights will be just as effective, assuming he can be trained sufficiently.
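The sketch below shows the general shape of such a move-number-weighted linear evaluation.  The feature interface and the weight schedule are our own illustrative assumptions, not the coefficients actually used in Iago or Bill.

    // Sketch of a linear evaluation in the style Norvig describes; weights are illustrative.
    public class LinearEvalSketch {
        interface Features {
            int currentMobility();    // legal moves available to the player
            int potentialMobility();  // empty squares adjacent to an opponent's piece
            double edgeStability();   // precomputed value of the edge configuration
        }

        static double evaluate(Features f, int moveNumber) {
            // Hypothetical schedule: mobility matters more early, edges matter more later.
            double mobilityWeight = Math.max(0.0, 1.0 - moveNumber / 60.0);
            double edgeWeight = moveNumber / 60.0;
            return mobilityWeight * (f.currentMobility() + 0.5 * f.potentialMobility())
                    + edgeWeight * f.edgeStability();
        }
    }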

In his paper Games Computers Play: Simulating Characteristic Function Game Playing Agents with Classifier Systems, Garett Dworman discusses the ability of his generated players to achieve their optimal performance level.  The details of his "game" are not so important, as his methods are very general and could easily be applied to many types of learning.  Dworman's paper concluded that his players were, in fact, able to reach their optimal levels in competition with each other.  Each player started with a simple evaluation function, which it adjusted through training.  Dworman trained his players to the point where they had reached the optimal playing level and compared them to his own evaluation of what the player should do.  They matched.

Future Research

There are several ways we could continue our research.  One way would be to integrate training against random players, i.e., incorporate nondeterminism.  This would help ensure that the GA progresses toward the desired target of playing well against humans, who are inherently nondeterministic (most of the time).  Additionally, as was mentioned before, time was a major constraint in this project; the time required to train is substantial.  Given more time, we could have tried to improve the training process so that it works around weaknesses it finds in the trainers.  We might even apply meta-learning, using a learning method to evaluate better training functions.  One method of developing a good player would be to build up to it like a pyramid: start training against random players and develop a GA.  Make that GA the first training player and create a second GA.  Repeat until there are enough GAs to train a very good player.
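A rough sketch of the pyramid idea follows; randomPlayer and trainAgainst are hypothetical hooks standing in for a full GP run, not functions from our code base.

    // Sketch of "pyramid" bootstrapping: each round trains against the previous champion.
    import java.util.ArrayList;
    import java.util.List;

    public class PyramidTrainingSketch {
        interface Player { int chooseMove(long board); }

        static Player randomPlayer() { return board -> 0; }              // stand-in: picks arbitrarily
        static Player trainAgainst(Player opponent) { return opponent; } // stand-in for a full GP run

        public static void main(String[] args) {
            List<Player> levels = new ArrayList<>();
            Player current = randomPlayer();
            for (int level = 0; level < 5; level++) {   // the number of levels is a tunable choice
                current = trainAgainst(current);        // evolve a player that beats the last one
                levels.add(current);
            }
            System.out.println("built " + levels.size() + " successively stronger trainers");
        }
    }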

Conclusion

Although the results were not the best, the system we built to produce them can be adapted for future experimentation.  The most important consideration in building a trainer is making sure the training players guide the GA along appropriately.  The players must be of different levels, so that the GA does not flounder but instead faces successively harder competition and can improve.  Spending substantial time on the player-creation process seems to be important.  We also discovered that the GP has a tendency to evolve simple players, so an important starting point is to select a few high-quality primitives to use.  With some additional programming, it should be feasible to develop a practical, commercial Othello player.

Miscellaneous

We're sure this document has just whetted your appetite to learn or master Othello.  Here's a start: if you would like to play our generated intermediate player, click here!

Scott's expert status is a scientific fact, not a product of boasting (although that doesn't hurt either).  If you need convincing, click here to see his record against Edgar.


Last updated Dec 17 1997 by Janak J Parekh and Scott A Susser.