Voting Population

Modification of the Learning Method

Machine Learning (W4771)
Homework #4: Genetic Programming
Project: Othello

Corey Tripp


In order to create a "good" Othello player I was interested in the learning method of the Othello Genetic Programming system. The learning method controls the overall learning of the individuals in the evolving population. If you can improve this learning method, then you should be able to create superior Othello players, or at least players somewhat better than random ones.

I decided to try one of the learning method variations suggested in the homework assignment: having the population vote on each move. My take on this was that each member of the population would suggest a move, and the move with the most votes would be the one played. This method is very different from the standard evaluation method, in which fitness is measured by each individual's ability to play an Othello game by itself, without any information from the rest of the population: if an individual plays badly, it gets a bad fitness, and so forth.

The Genetic Program Othello package that was given for this assignment has the following properties of interest to my experiments. The function tree representation uses the functions +, -, *, and /, and the terminals white, black, white_edges, black_edges, white_corners, black_corners, white_near_corners, black_near_corners, and 10. There are also three different Othello players: OthelloPlayer, OthelloRandomPlayer, and OthelloEdgar. Each plays in a different style: OthelloPlayer picks the first available move, OthelloRandomPlayer rates each move with a random number and picks the lowest-ranked one, and OthelloEdgar plays like Edgar.
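
The evolved individuals are prefix expressions over these functions and terminals. As an illustration only (this is my own sketch, not code from the package, and the feature counts are made up), such an expression could be evaluated against board-feature counts like this:

```python
# Hypothetical evaluator for a GP individual such as "/ white_corners black_edges".
# The terminal names mirror the package's; everything else is assumed.

def eval_prefix(tokens, features):
    """Evaluate a prefix expression; returns (value, remaining_tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in "+-*/":
        left, rest = eval_prefix(rest, features)
        right, rest = eval_prefix(rest, features)
        if head == "+": return left + right, rest
        if head == "-": return left - right, rest
        if head == "*": return left * right, rest
        # protected division: GP systems conventionally return 1 on divide-by-zero
        return (left / right if right != 0 else 1), rest
    if head == "10":
        return 10, rest
    return features[head], rest

# Example board-feature counts (invented for illustration):
features = {"white": 20, "black": 12, "white_corners": 2, "black_edges": 5}
value, _ = eval_prefix("/ white_corners black_edges".split(), features)
# value is 2 / 5 = 0.4 for this board
```

The board with the higher (or lower, depending on convention) score determines which move an individual prefers.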


The approach that I use is a very simple one. To get the fitness of an individual, the population plays a series of five Othello games against one of the three types of opponents. During the course of a game, each member of the population indicates which move it would make given the current Othello board. These votes are recorded along with the individuals that cast them. At the end of the voting process, the move suggested most often is played. The individuals that suggested that move are then given a vote credit, which is used in the fitness measure at the end of the game.
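
A minimal sketch of this voting step (the names and signatures here are my own assumptions, not the package's API):

```python
from collections import Counter

def population_vote(population, board, legal_moves):
    """Each individual suggests one legal move; the plurality move is chosen,
    and its supporters are returned so they can be given vote credits."""
    # assumed: each individual is callable and returns one of legal_moves
    votes = {i: individual(board, legal_moves)
             for i, individual in enumerate(population)}
    tally = Counter(votes.values())
    chosen_move = tally.most_common(1)[0][0]
    supporters = [i for i, move in votes.items() if move == chosen_move]
    return chosen_move, supporters
```

At the end of a game, each supporter's vote-credit count would have been incremented once for every played move it backed; those counts feed the fitness measure described next.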

The fitness measure is based on how many vote credits an individual accumulates during the course of one game. If the population won the game, then the more vote credits an individual has, the better its fitness. However, if the game was lost, the more vote credits an individual has, the worse its fitness. The idea is that if many individuals' votes made the population lose the game, those individuals should get the worst fitness rating.

The fitness function that I created is a scale factor based on the vote credits the individual received. The most vote credits that anyone can get is 32, one for each move in the game. If the game was won, the function has a range of 1 to 5 inclusive, where 1 is the best fitness you can get; as the vote credits fall away from 32, the fitness increases linearly. If the game was lost, the same expression runs from 5 down to 1 and is then scaled by five, so that after scaling 25 is the worst fitness you can get and 5 is the best.

The first few variations that I had of this function included the score of the opponent (White in this case). The longer I thought about it, the more I decided that just winning the game was good enough. If an individual can win the game, it does not matter whether it won by 1 or by 32; it simply won. So the final function that I used in my experiments was the following:

  if( GameWon )
     fitness = 1 + (32 - voteCredits)/8
  else
     fitness = (5 - (32 - voteCredits)/8) * 5

The fitness was summed over five games, so the very best total fitness that any member of the population could get is 5 (a fitness of 1 in each of the five games).
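
The function above transcribes directly into Python, which makes it easy to check the ranges at their endpoints:

```python
def fitness(game_won, vote_credits):
    """Per-game fitness; lower is better. 32 vote credits is the
    maximum (one credit per move of the game)."""
    if game_won:
        # winning range: 1 (all 32 credits, best) up to 5 (no credits)
        return 1 + (32 - vote_credits) / 8
    # inner term runs from 5 down to 1, then scaled by 5:
    # losing range: 25 (all 32 credits, worst) down to 5 (no credits)
    return (5 - (32 - vote_credits) / 8) * 5
```

So fitness(True, 32) gives the best value, 1, while fitness(False, 32) gives the worst, 25.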


Since I eventually wanted to use Edgar as an opponent, all of my experimental populations played as the Black player in the Othello games. Also, the population size was 500 for every experiment, with at most 15 generations per run. For contrast, I ran similar experiments using the unmodified evaluation function.

The first Othello opponent that the population played against was the standard OthelloPlayer. This player always picks the first move that the Othello board gives it.

Here is a list of the best five players, one from each run:

   / white_corners black_near_corners
   / white_corners black_edges
Out of 500 games these players beat the OthelloPlayer every time.

The unmodified method's best players were all exactly the same: black_edges. This player can also beat the White player 500 times out of 500 games.

Since it was relatively easy to create a player that could beat the standard OthelloPlayer within a few generations, I moved on to the random player. This player ranks each move with a random number and makes the move with the smallest rank. I thought that playing against this Othello player would result in somewhat more complicated players.
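
The random player's move selection can be sketched in a few lines (my own illustration, not the package's code):

```python
import random

def random_player_move(legal_moves, rng):
    """Rank each legal move with a random number and play the
    lowest-ranked one, as OthelloRandomPlayer does."""
    ranks = {move: rng.random() for move in legal_moves}
    return min(ranks, key=ranks.get)
```

Because the ranking is uniform, this is equivalent to picking a legal move uniformly at random.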

However, as you can see below, each of these players is as simple as before:

   / white black_near_corners
   * white_edges white
These players, however, did not win every game; they won only about 95% of the 500 games.

Again, the unmodified version came up with only one player, white, which won about 93% of the 500 games played against the random player.

These results were again unsatisfying, especially since I wanted a really good player to pit against Edgar. So I decided next to play against Edgar himself in the evaluation. That way I would definitely get some players that could beat him.

After some time (about a day of computation) and a lot of unsuccessful runs, these are the five players that could beat Edgar:

   * white_near_corners white
   * white_near_corners black_edges
   * black_edges white_near_corners
   * white white_near_corners
   * white_near_corners white
As you can see, there are really only two distinct players: * white_near_corners white and * white_near_corners black_edges. Their scores against Edgar are 35 and 39, respectively. However, if you play these players against the random player, they lose 99% of the time. This is not very good, since the players cannot generalize to beat other opponents.

The unmodified evaluation also created players that can beat Edgar:

   / white * black black_edges
   * -  / black_near_corners black_edges * white black_corners / white - black white_corners
   / white / + black_edges white_edges white_edges
   - + - 10 white - white_near_corners black * / white_corners white_near_corners / black black_edges
   - * 10 black_edges * black black_edges
Some of these players (1 and 4) behave like the ones before and cannot generalize. However, some of them (2, 3, and 5) could quite easily beat the random player.


Thus it would seem that my modification to the learning method did not really improve much over the old way. Against the three different opponents, both methods could create players that beat OthelloPlayer (picks the first move), OthelloRandomPlayer (ranks moves randomly), and OthelloEdgar. However, the voting method's players that can beat Edgar cannot beat the other two Othello players, while the unmodified method produced players that could beat all of them.

During the different genetic runs, the population was less diverse under the new learning method than under the old one. This makes sense: the move with the most votes wins, so after some generations most of the population's votes become the same. This is especially true when the population is winning its Othello games.
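
One simple way to quantify this convergence (my own measure, not part of the assignment) is the fraction of distinct individuals in the population:

```python
def diversity(population):
    """Fraction of distinct individuals, comparing their expression
    strings; 1.0 means every individual is unique."""
    return len(set(population)) / len(population)
```

Under the voting method this fraction would be expected to drop quickly across generations, since individuals that vote with the winning plurality are the ones rewarded.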