Genetic Program: Othello

Peng Xu

px7@columbia.edu

 

Project objective

Using the provided genetic programming package gpjpp, and the Othello game program built on it, experiment with methods for evolving a good Othello player with GP.
 

1. Introduction

The primary motivation here is to use multiple trees to do the evaluation, based on the basic idea that "the wisdom of the masses exceeds that of the wisest individual". I can obtain multiple evaluation trees through the ADF functionality provided by gpjpp. Although the original purpose of ADFs is to provide modular functions for the RPB to call, I can simply treat each ADF as an independent evaluation tree, so one GP individual can carry multiple evaluation trees. There are several ways to use the evaluations from the multiple trees: use the smallest value, randomly choose one tree, or use different trees at different points of the game. I tried these different ways, and finally focused on how to use the multiple trees to integrate time information into the evaluation.
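As a rough, purely illustrative Java sketch of this idea (none of the identifiers below are the actual gpjpp or OthelloGPPlayer API), suppose 'values' holds the score that each of the player's trees assigns to one candidate move (values[0] from the RPB, values[1] from ADF0, and so on); the player then combines those scores, for example:

    import java.util.Random;

    // Illustration only, not the actual gpjpp/OthelloGPPlayer code.
    class MultiTreeEvalSketch {

        // Use the smallest of the tree values (the strategy of Exp 2 below).
        static double smallest(double[] values) {
            double best = values[0];
            for (double v : values) best = Math.min(best, v);
            return best;
        }

        // Randomly choose one tree's value (the strategy of Exp 3 below).
        static double randomOne(double[] values, Random rnd) {
            return values[rnd.nextInt(values.length)];
        }

        public static void main(String[] args) {
            double[] values = { 3.0, 7.0 };                       // pretend RPB and ADF0 scores
            System.out.println(smallest(values));                 // 3.0
            System.out.println(randomOne(values, new Random()));  // 3.0 or 7.0
        }
    }

The third option, switching trees according to the stage of the game, is the subject of Section 3.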

2. Preliminary experiments

I trained the GPPlayer against the RandomPlayer in all the preliminary experiments, and after that I realized that the RandomPlayer is just too simple to be useful to train against.

Exp 1:  Each OthelloGPPlayer has two trees to evaluate moves: one for the first part of the game, the other for the second part.

Experiment Result:

RPB:  ( +   black   black_edges )
ADF0: ( /   black_edges   white )

GPPlayer : RandomPlayer = 46 : 4 (50 games)

So it can play very well against RandomPlayer. During the experiment, however, I found that the behavior of the RandomPlayer is not random enough, so I improved it. I added one static Random variable to the class OthelloRandomPlayer, initialized with a seed determined when the first instance of RandomPlayer is created. Together with the local random variable in the function evalMove(), the two random variables give the value of a random move (a rough sketch of this change follows the results table below). After the modification, the RandomPlayer's skill improves considerably, and in the later experiments I use the improved RandomPlayer.
 
GPPlayer    Improved RandomPlayer
   29                21
   30                20
   37                13
   27                23
   32                18
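The RandomPlayer modification described above can be sketched roughly as follows. Only the class name OthelloRandomPlayer and the method name evalMove() come from the project code; the signature, the seeding details, and the scoring are my own illustration.

    import java.util.Random;

    // Rough sketch of the improved RandomPlayer, not the actual course code.
    public class OthelloRandomPlayer {

        // One static Random shared by all instances, seeded when the first
        // instance of the class is created.
        private static Random shared;

        public OthelloRandomPlayer() {
            if (shared == null) {
                shared = new Random(System.currentTimeMillis());
            }
        }

        // Give a candidate move a random score; the shared and the local random
        // variables are combined so that different players and different games
        // do not reproduce the same sequence of moves.
        public double evalMove(int moveIndex) {
            Random local = new Random();
            return shared.nextDouble() + local.nextDouble();
        }
    }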
 
Exp 2:  Each OthelloGPPlayer has two trees to evaluate a move; the evaluation is given by the smaller of the two values.

Experiment Result:

RPB:  ( *   black_near_corners  ( -   10   white_edges ))
ADF0: ( /   black_edges   white )
 
GPPlayer    Improved RandomPlayer
   25                25
   30                20
   32                17
   34                16
   36                14
 
When playing against Edgar, it is defeated every time, and the result is always the same. I think all the games follow a single pattern: since the move evaluations of both GPPlayer and Edgar are fixed, they simply replay the same game over and over.

Exp 3: The GPPlayer randomly chooses one of its two trees to evaluate each move.

GPPlayer : Edgar = 9 : 41 (50 games).

3. Integrating time information into the evaluation

Usually in a game, players need to use different strategies at different stages in order to win at the end. For example, in the game of Go, players try to get 'shi' (important positions on the board) at the beginning of the game, and try to capture more pieces towards the end. It is similar in Othello. So we need a dynamic evaluation method that can change depending on the stage of the game. Here I want to find ways to integrate this time information into the evaluation.

In Othello, the time information is very straightforward: it is the total number of pieces on the board. So I will try to use this information.

Approach 1:

Through playing games, I found that at different points of the game we should rely more on different terminals, and it is hard for a single evaluation tree to cover all the different situations. Since I can now have multiple trees, it may be better to evaluate moves with different trees at different points of the game. In this experiment I divided the game evenly into five sections according to the total number of pieces on the board, and the GPPlayer uses a different tree to evaluate moves in each section: RPB for section 1, ADF0 for section 2, and so on.
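A minimal sketch of the section-to-tree mapping, in the same illustrative spirit as above; the class and method names, and the exact 12-piece section boundaries, are my own assumptions rather than the actual code.

    // Illustration only: choose which of the five trees (0 = RPB, 1 = ADF0, ...,
    // 4 = ADF3) evaluates a move, based on the total number of pieces on the board.
    class StageToTreeSketch {

        static int treeForStage(int totalPieces) {
            int section = (totalPieces - 4) / 12;   // 60 post-opening pieces / 5 sections
            return Math.min(section, 4);            // clamp the 64-piece end of the game
        }

        public static void main(String[] args) {
            System.out.println(treeForStage(10));   // 0 -> RPB   (opening)
            System.out.println(treeForStage(30));   // 2 -> ADF1  (middle game)
            System.out.println(treeForStage(60));   // 4 -> ADF3  (end game)
        }
    }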

I trained the GPPlayer against Edgar. The fitness is obtained from the result of one game against Edgar, adjusted for the complexity of the evaluation trees. The basic parameters I changed are: PopulationSize 100, NumberofGeneration 100, TerminationFitness 25, GoodRuns 5. To keep the evaluation trees simple, I incorporated my own observations: in the first section all the pieces are near the center of the board, so it is useless to use primitives such as black_corners, white_corners, etc., and I therefore used only four primitives there: +, -, white, black. For the same reason, I chose the primitives for the other sections according to my experience.

Experiment result:

Player 1: (fitness: 23.80)
RPB:  ( +  ( +   black   black )  white )
ADF0: ( +   white  ( /   black   black ))
ADF1:  white_near_corners
ADF2: ( -   white_edges  ( +   white_near_corners   black_edges ))
ADF3: ( /   10   10 )

GPPlayer : Edgar = 44 : 20 (one game)
GPPlayer : RandomPlayer = 27 : 23 (50 games)

Player 2: (fitness: 24.80)
RPB:  ( -   black   white )
ADF0: ( +  ( +  ( *   white   white_near_corners ) ( -   white_near_corners   black )) white )
ADF1:  10
ADF2: ( *   white_near_corners   white_corners )
ADF3: ( -   black_corners   10 )

GPPlayer : Edgar = 43 : 21 (one game)
GPPlayer : RandomPlayer = 29 : 21 (50 games)

The results show that the GPPlayer can defeat Edgar, but its play against RandomPlayer is not satisfactory: although it wins more games than RandomPlayer, it also loses many. The reason is that the GPPlayer is trained only against Edgar, which is too specific; the resulting player is biased toward doing well against Edgar rather than against players with other strategies. This is somewhat like real nature: one species can be very dominant and powerful in one environment, but become weak or even extinct when the environment changes. Jurassic Park! So a better way is to train the GPPlayer against more general players, and against many different players. In the next experiment, I trained the GPPlayer against both Edgar and RandomPlayer, with the fitness of the GPPlayer given by the results from both opponents.
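The exact combined fitness formula is not spelled out above beyond "the results from both opponents" plus a complexity adjustment, so the sketch below only guesses at its shape: average the opponents' final piece counts from one game against each opponent and add a small complexity term (lower fitness is better, consistent with the termination fitness of 25). The weight on complexity is entirely made up.

    // Speculative sketch of a combined fitness; the real formula is not given above.
    class CombinedFitnessSketch {

        static double fitness(int edgarPieces, int randomPlayerPieces, int treeComplexity) {
            double gameScore = (edgarPieces + randomPlayerPieces) / 2.0;  // opponents' pieces
            double complexityPenalty = 0.01 * treeComplexity;             // assumed weight
            return gameScore + complexityPenalty;
        }

        public static void main(String[] args) {
            // e.g. Edgar ends with 25 pieces, RandomPlayer with 20, trees have 60 nodes in total
            System.out.println(fitness(25, 20, 60));   // 23.1
        }
    }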

Experiment Result:

GPPlayer : (fitness: 29.60)
RPB:  ( +   white   black )
ADF0: ( *   black_edges  ( *   black_edges   white ))
ADF1: ( +  ( *   black_corners  ( +   black_edges   white_near_corners ))  white_near_corners )
ADF2: ( *  ( +   white_corners   black_edges )  black_corners )
ADF3: ( *   black_corners   black_near_corners )

GPPlayer : Edgar = 39 : 25 (one game)
GPPlayer : RandomPlayer = 36 : 14 (50 games)

So the GPPlayer can still defeat Edgar, and its result against RandomPlayer is also improved. During training, the fitness evaluation plays only one game against Edgar and one game against RandomPlayer, which has some bad effect on the resulting GPPlayer; I believe that if more games were used for the evaluation, the results would be better. In any case, this at least shows that training against different players can definitely improve the skill of the final GPPlayer.

After that, I tried another experiment with different parameters.

AddBestToNewPopulation    = true
SteadyState               = true

This makes the fitness of the generations reach the termination fitness faster.

Experiment Result:

GPPlayer: (fitness = 30.40)
RPB:  ( +  ( +   black   white )  black )
ADF0: ( -   black_edges   black )
ADF1: ( -   10  ( +   white_corners   10 ))
ADF2: ( +   white_near_corners  ( *   black_near_corners   black_corners ))
ADF3: ( +  ( -  ( +   10   white ) ( +   white_near_corners   black_edges ))  white_near_corners )

GPPlayer : Edgar = 41 : 23 (one game)
GPPlayer : RandomPlayer = 29 : 21 (50 games)

But the performance of the GPPlayer against RandomPlayer seems to be lower. I think this is because the new parameters make the population less diverse.

Approach 2:

Use one evaluation tree with new primitives. Which branch of the subtree rooted at a new primitive is used depends on the current point of the game. For example, the primitive "ifend" is a function with two operands: it returns the first operand (left branch) if the total number of pieces is less than 48, and the second operand (right branch) otherwise. In my experiment I added two new primitives, "ifstart" and "ifend", which divide the game into three sections: start (4-24 pieces), middle (25-48 pieces), and end (49-64 pieces). The player was trained against Edgar with the same parameters as in Approach 1.
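As an illustration of the intended semantics only (the real primitives are gpjpp function nodes operating on GP subtrees, not plain doubles), the two new primitives select between their two already-evaluated operands according to the piece count:

    // Illustrative semantics of "ifstart" and "ifend"; not the gpjpp node classes.
    class TimePrimitivesSketch {

        // "ifstart": left branch during the start section (4-24 pieces),
        // right branch afterwards.
        static double ifstart(double left, double right, int totalPieces) {
            return (totalPieces <= 24) ? left : right;
        }

        // "ifend": left branch while the total is less than 48 pieces,
        // right branch in the end game, as described above.
        static double ifend(double left, double right, int totalPieces) {
            return (totalPieces < 48) ? left : right;
        }

        public static void main(String[] args) {
            // A tree such as ( ifstart black black_edges ) would evaluate 'black'
            // early in the game and 'black_edges' later on.
            System.out.println(ifstart(5.0, 2.0, 10));  // 5.0 (start of game)
            System.out.println(ifstart(5.0, 2.0, 40));  // 2.0 (later in the game)
        }
    }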

Experiment Result

After a whole day of training, it could not reach a fitness below 25; the smallest value it reached was 36.

Then I tried training against both Edgar and RandomPlayer, and also set the parameters AddBestToNewPopulation and SteadyState to true.

Experiment Result:

GPPlayer: (fitness = 34)
RPB: ( ifstart black black_edges )
GPPlayer : Edgar = 33 : 31 (one game)
GPPlayer : RandomPlayer = 34 : 16

I also tried a termination fitness of 30, but at least within my limited experiment time it could not reach that termination fitness and stopped around 36. The same happened even when I increased the population size to 200. Since I now use only one tree, with different subtrees doing the evaluation at different points of the game, the tree probably has to be quite large, and the population quite large as well, so that the two special primitives responsible for partitioning the tree have more chances to appear in the evaluation tree and play their role.
 
 

4. Conclusion

In the natural world, each individual has different genes that are expressed gradually over time, and even a single trait can be governed by multiple genes expressed in different situations. In this project, I tried to give each individual multiple genes, so that after enough evolution the individuals with good fitness would hopefully be stronger. The results show that, with all other parameters the same, the GPPlayer with multiple evaluation trees has better performance.
Another issue in this project is what kind of environment we should provide for the population to evolve in; more specifically, how to evaluate the fitness of an individual so that the evolved individual is strong and flexible. The basic idea is to expose the individuals to a more diverse environment. In Othello, we should train our GPPlayer against players with different strategies; then the GPPlayer can acquire a more generic, meaningful and powerful strategy for playing the game.
For now, when I look into the training results, it is hard to find fixed patterns in the evaluation trees; some of them are even nonsense. I think that with enough evolution, populations evolved from different initial populations should show similar patterns.

Last modified 4/4/2000