Othello Project: generalizing terminals and fitness measure

Othello Project

Ilya Mayzus

Introduction

An important issue in Genetic Programming is the choice of terminals used to create the tree. My initial motivation for this project was to explore how this choice will affect the performance of trees that were used to evaluate the moves in a game Othello. In particular, I wanted to start with terminals that provide the best heuristics and proceed towards more general ones. In the process, I also wanted to analyze the evolved players to see whether their use of terminals matched my assumptions.

As I made progress with creating successful players, I became more interested in maximizing the potential of existing terminals by altering the training. This led me to experiment with different combinations of players used in the fitness function (RandomPlayer and Edgar). I was especially interested in finding a way to influence strategy of the evolved players, while keeping the training time to a modest value. It turned out that changing a fitness function can be at least as effective in evolving players as is changing the input terminals.

Approach

I Terminals

As a starting point, I took the observations made by Juno Suk on the limitations of the current terminals, in particular the tendency of white and black terminals to dominate at the end of the game as the number of pieces increases. I decided that an easy way to fix this is to provide the GP with only the number of pieces that will be flipped as a result of the move, rather than total number of pieces that will exist on the resulting board. The downside was that to do this for opponent's pieces requires performing a shallow minimax search to select the best possible gain (for opponent) the opponent can make. I performed a similar substitution with the number of corners, leaving only the number of corners that will be gained by a particular move (either 0 or 1).

I also introduced the terminals reflecting the position of the new move relative to the center of the board as col_from_center and row_from_center (for example, columns of the board are assigned values as 4 3 2 1 1 2 3 4 and similar for rows). In addition, I renamed all provided terminals so that they refer to me/opponent rather than black/white, to avoid restriction of who the current player is being trained on.

Here's a summary of the terminals that I used:

me_change		number of extra pieces the player will gain with
			this move (this means number flipped + 1)
me_corners_gain		number of corners (0 or 1) that will be gained with
			this move

opponent_change		similar to the above; to use these terminals,
opponent_corners_gain	the OthelloGPTester needs to perform a shallow
			minimax search to calculate the maximum possible
			value that the opponent can potentially gain 
			with the next move (choose worst scenario)

col_from_center		for 8 by 8 board, this gives possible values of
			4 3 2 1 1 2 3 4, where the central two fields are 1
row_from_center		

me_near_corners		these terminals were originally provided as 
opponent_near_corners	white_corners, black_corners, 

me_edges		white_edges, black_edges
opponent_edges

-------------------------------------------------------------------------

II Evolving players

1) using 5 (4 variable, 1 static) terminals:

me_change, me_corners_gain, opponent_change, opponent_corners_gain, and 10.

The two opponent's terminals required shallow minimax search and were computationally expensive. Training was performed by playing 1 game against Edgar
a) population of 200, ComplexityAffectsFitness off, 10 generations.
b) population of 200, ComplexityAffectsFitness on, 20 generations.
c) polulation of 400, ComplexityAffectsFitness off, 20 generations.

2) using 7 (4 variable, 3 static) terminals:
me_change, me_corners_gain, col_from_center, row_from_center, 0, 5, 10

All opponent's terminals were removed. Training was performed by playing 10 games against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 50 generations.

3) using 11 (8 variable, 3 static) terminals:
me_change, me_corners_gain, col_from_center, row_from_center, me_near_conners, opponent_near_corners, me_edges, opponent_edges, 0, 5, 10

Training was performed by playing 10 games against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 50 generations.

4) using 9 terminals, 8 variable, 1 static: [same as in 3rd, but without 0 and 5]

me_change, me_corners_gain, col_from_center, row_from_center, me_near_conners, opponent_near_corners, me_edges, opponent_edges, 10

Training performed by playing 1 game against Edgar and 2 against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 19 generations (interrupted run for 50 generations).

5) same terminals as in 4th

Training performed by playing 1 game against Edgar and 9 against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 50 generations.

Results

1) Not surprisingly, the successful players were trying to maximize the positive values, while minimizing the opponent's, such as:

( +  ( -   me_change   opponent_change ) ( *   me_change   me_corners_gain ))
( -  ( +   me_corners_gain   me_change )  opponent_change )

When ComplexityAffectsFitness was turned on, practically all best players in each generation became:

 ( -   me_change   opponent_change )
 ( /   me_change   opponent_corners_gain )

The trees became much more convoluted with bigger population (400) and ComplexityAffectsFitness turned off. However, the evolved players, while capable of beating Edgar, were nothing spectacular, considering the time penalty that was paid to compute the minimax values for opponent. The best player beat Edgar with a score of 53 to 10, and also did well against RandomPlayer, winning 449/500. Of course, this tree had an advantage against both Edgar and RandomPlayer in that it could look ahead what kind of move the opponent could make.

2) Without information about opponent, but with additional information about position of the move on the board, the GP still was able to evolve an effective player (with fitness 90):

( +  ( +  ( *  ( *   me_change   10 ) ( +   me_corners_gain   10 )) 
( +   me_corners_gain   col_from_center )) ( +  ( +   me_change   
col_from_center )  me_corners_gain ))

This player beat Edgar with a score of 29 (35-29), even though it was trained only against RandomPlayer.

Against RandomPlayer, it had a record of 393/500. For comparison, Edgar's record is 477/500.

However, no other evolved player approached the same success, casts some doubts on whether this approach is consistently successful.

3) The results were very similar to those in 2nd step, perhaps even less glamorous.

The best player (with fitness 70) lost to Edgar with a score of 36 (28-36) and posted a record against RandomPlayer 355/500.

( /  ( -  ( +  ( +   me_edges   me_change ) ( /   me_corners_gain   
row_from_center )) ( /   0  ( /   col_from_center   5 ))) ( *  ( -  ( +  
( /  ( *   me_change   5 ) ( /   opponent_near_corners   me_change )) 
( +  ( *  ( *   me_edges   me_edges ) ( +   row_from_center   me_change )) 
( *  ( +   0   me_edges ) ( /   opponent_edges   me_change )))) ( +  ( *  
( *  ( +   me_change   0 ) ( /   me_near_corners   me_edges )) ( *  
( *  ( *   me_edges   opponent_near_corners ) ( -  ( -   row_from_center
   col_from_center )  5 )) ( /   opponent_edges   row_from_center ))) 
( /  ( /   0  ( /   col_from_center   5 )) ( -   opponent_edges   
me_edges )))) ( /   opponent_edges   row_from_center )))

One intriguing feature of the best players produced in this run is that they tend to play very strong at the beginning of the game and often suffocate the opponent early on (notice proliferation of scores where opponent had a 0). However, if the opponent survives until the middle, they lose more easily. Also, the strategy seemed to be geared towards the center of the board. I would speculate that this is a side effect of training against RandomPlayers, where the fitness is measured by least number of pieces allowed. If trained against a player with a distinct strategy (such as Edgar) or when measuring fitness differently (which I have not tried), the resulting strategy might be different.

4) With Edgar's score playing a large role in the total fitness, many best players were also the ones that beat Edgar. Also, many players were evolved in the early generations that were able to beat Edgar with a large margin (3, 14, and even 0).

The best player shut down Edgar (37-0) and had a fitness of 16:

( +  ( +   row_from_center   opponent_edges ) ( *  ( +   me_change   
me_edges ) ( *   me_edges  ( +   me_edges   me_corners_gain ))))

However, this performance against Edgar did not generalize against RandomPlayer, posting 361/500, which is similar to results of 3rd step.

5) One surprising result of increasing number of games played against RandomPlayer in the evaluation function is that players capable of beating Edgar with a wide margin did not appear until later generations, with a score of 0 appearing only somewhere 84% into the run of 50-generations (compare to 19 total generations in step 4).

The player with best fitness among those able to beat Edgar (with a score of 19; total fitness of 172) posted 423/500 when playing against RandomPlayer:

( -  ( -   me_corners_gain  ( /  ( /   10   me_edges ) ( +   10   
opponent_near_corners ))) ( -   me_near_corners   me_edges ))

The player with absolute best fitness of 126 lost to Edgar with a score of 38 and posted a similar result of 424/500:

 ( +  ( -  ( -   me_corners_gain  ( /  ( /   10   me_edges ) ( +   10   
opponent_near_corners ))) ( -   me_near_corners   me_edges ))  me_change )

Interestingly, these trees are very similar, which points to the fact that the fitness measure was quite noisy even with 10 games.

This lack of correlation between fitness and quality of the player poised some difficulty of finding the really good players. By trial and error, I found one that had a fitness of 183 (which is worse than either of the above two) and was able to narrowly beat Edgar with a score of 30 (34-30), but it posted an impressive 450/500 when tested against RandomPlayer.

( /   me_near_corners  ( /  ( *   me_near_corners  ( *  ( /  me_near_corners
me_edges )  me_change )) ( -   me_edges  ( *  ( -   row_from_center  ( -  
me_corners_gain   me_edges )) ( -   10  ( *   me_corners_gain   
me_change ))))))

In comparison, Edgar's record is 477/500.

Interestingly, this player practically never completely shuts down the RandomPlayer (with a score of 0), but does very well overall. This is in sharp contrast to players developed in step 3. The main difference seems to be inclusion of playing against Edgar in the fitness function.

Summary of results

'

Run1, a) .det file .ini file .stc file GP file

Run1, b) .det file .ini file .stc file GP file best player

Run1, c) .det file .ini file .stc file GP file best player

Run2 .det file .ini file .stc file GP file best player log

Run3 .det file .ini file .stc file GP file best player log

Run4 .det file .ini file .stc file GP file best player log runs summary

Run5 .det file .ini file .stc file GP file best player log runs summary

# Run Player string representation Player gif representation Fitness (in the run) Record against Edgar Record/log against Random Watch it play against Edgar Watch it play against Random Play against it

0 Edgar 477/500 [---]

1 1 [---] [---] 10 won 53-10 449/500 [---] [---] [---]

2 2 [---] [---] 90 won 35-29 393/500 [---] [---] [---]

3 3 [---] [---] 70 lost 28-36 355/500 [---] [---] [---]

4 4 [---] [---] 16 won 37-0 361/500 [---] [---] [---]

5a 5 [---] [---] 172 won 45-19 423/500 [---] [---] [---]

5b 5 [---] [---] 126 lost 26-38 424/500 [---] [---] [---]

5c 5 [---] [---] 182 won 34-30 450/500 [---] [---] [---]

#	Run	Player string representation	Player gif representation	Fitness (in the run)	Record against Edgar	Record/log against Random	Watch it play against Edgar	Watch it play against Random	Play against it
0		Edgar				477/500			[---]
1	1	[---]	[---]	10	won 53-10	449/500	[---]	[---]	[---]
2	2	[---]	[---]	90	won 35-29	393/500	[---]	[---]	[---]
3	3	[---]	[---]	70	lost 28-36	355/500	[---]	[---]	[---]
4	4	[---]	[---]	16	won 37-0	361/500	[---]	[---]	[---]
5a	5	[---]	[---]	172	won 45-19	423/500	[---]	[---]	[---]
5b	5	[---]	[---]	126	lost 26-38	424/500	[---]	[---]	[---]
5c	5	[---]	[---]	182	won 34-30	450/500	[---]	[---]	[---]

View the source code

Problems Encounterd

One of the biggest problems encountered during this assignment was inconsistent performance by RandomPlayers. The problem had to do with creating a new Random object in the evalMove() function, with the hope that it will give a different random number because it is by default seeded with the current time. In practice, not enough time elapsed between calls to change the seed and affect the Random sufficiently.

A quick fix was to use Math.random() (OthelloRandomPlayer). Another possibility was to make Random a class variable, instantiate it in the constructor and then use it to get a new value (OthelloConsistentRandomPlayer). The problem with the latter is that when many games are played in succession (such as during testing against RandomPlayer), there was a significant probability that a Random will be constructed with the same seed, so that even though the moves themselves will be random, the actual sequence will be the same. Unfortunately, even Math.random() does not solve the problem sufficiently. For example, in the logs of playing 500 games against RandomPlayer, there are sometimes the same scores in succession. In fact, the experience of watching evolved players play against RandomPlayer may be different from what would be expected based on their performance during batch testing.

Another problem was that OthelloEdgar() needed another constructor when called from inside an applet from Netscape in order to successfully read in FinalWeights. This problem was solved by writing another constructor. Alternative solution was to always use appletviewer.

There is a minor bug in the implementation of tokenizing sequence of original OthelloGPPlayer (it couldn't handle )( in succession). This problem was solved in MyPlayer.

Finally, the average fitness in calculateStatistics in GP/GPPopulation.java (from original gpjpp package) is calculated incorrectly (it always increases) because sumFitness variable is not reset to 0. This is easy enough to fix.

Conclusion

During this project, I proceeded from using strong heuristics, such as looking two moves ahead, to more general but less computationally expensive, such as position of the move. As it turned out, it is possible to compensate the loss of available information with careful selection of fitness measure. The evolved players from the latest stage were able to successfully defeat Edgar and also have respectable performance against Random player.

It appeared that the choice of the fitness function may be more influential to the strategy of evolved player than the initial set of available terminals. When training against a combination of RandomPlayers and Edgar, the evolved players avoided over-fitting against Edgar, at the same time seeme to have learned from him, adopting the strategy of to trying to capture corners and maximize edges. This was not possible when training just against RandomPlayers, since the fitness dictated trying to minimize number of pieces lost to opponent, implicitly leading to over-focusing on opponent's pieces and consequently to the center of the board (since this is where the game starts). When training only against Edgar, there was little correlation between ability to defeat Edgar and ability to perform well against RandomPlayers.

It would be interesting to go a step further and purposefully evolve players with distinctly different strategies and then using the fitness function to evolve players that would be able to combine them.

Run1, a)	.det file	.ini file	.stc file	GP file
Run1, b)	.det file	.ini file	.stc file	GP file	best player
Run1, c)	.det file	.ini file	.stc file	GP file	best player
Run2	.det file	.ini file	.stc file	GP file	best player	log
Run3	.det file	.ini file	.stc file	GP file	best player	log
Run4	.det file	.ini file	.stc file	GP file	best player	log	runs summary
Run5	.det file	.ini file	.stc file	GP file	best player	log	runs summary