Exp 1: Each OthelloGPPlayer has two tree to evaluate the move, one is for the first part of the game, the other is for the second part.
RPB: ( + black black_edges )
ADF0: ( / black_edges white )
GPPlayer : RandomPlayer = 46 : 4 (50 games)
So it can play very well against RandomPlayer. During the experiment,
I find the behavior of the RandomPlayer is not random enough. So I improved
RandomPlayer. I add one static Random variable to the class OthelloRandomPlayer,
it is initialized by the seed determined at the first instance of
RandomPlayer. Together with the local random variable in function
evalMove(), the two random variables give the value of the random move.
After the modification, the RandomPlayer's skill impoves much. And in the
later experiments, I use the improved RandomPlayer.
GPPlayer | Improved RandomPlayer |
29 | 21 |
30 | 20 |
37 | 13 |
27 | 23 |
32 | 18 |
RPB: ( * black_near_corners ( -
10 white_edges ))
ADF0: ( / black_edges white )
GPPlayer | Improved RandomPlayer |
25 | 25 |
30 | 20 |
32 | 17 |
34 | 16 |
36 | 14 |
Exp 3: GPPlayer randomly choose one of the two trees to evaluate.
GPPlayer : Edgar = 9 : 41(50 games).
In Othello, the time information is very straightforward, it is the total number of the pieces on the board. So I will try to use this information.
I trained the GPPlayer against Edgar. The fitness is got by one game
result against Edgar with adjustment of the complexity of the evaluation
trees. The basic parameters I
changed are: PopulationSize 100, NumberofGeneration 100, TerminationFitness
25, GoodRuns 5. To simplify the evaluate tree, I incorporate my observation.
At the first section, all the pieces are at the center of the board, it
is useless to use primitives such as black_corners, white_corners etc.
So I only use four primitives: +, -, white, black.
For the same reason, for other sections I choose primitives according
to my experience.
Player 1: (fitness: 23.80)
RPB: ( + ( + black black )
white )
ADF0: ( + white ( / black
black ))
ADF1: white_near_corners
ADF2: ( - white_edges ( + white_near_corners
black_edges ))
ADF3: ( / 10 10 )
GPPlayer : Edger = 44 : 20 (one game)
GPPlayer : RandomPlayer : 27 : 23 (50 games)
Player 2: (fitness: 24.80)
RPB: ( - black white )
ADF0: ( + ( + ( * white white_near_corners
) ( - white_near_corners black )) white )
ADF1: 10
ADF2: ( * white_near_corners white_corners
)
ADF3: ( - black_corners 10 )
GPPlayer : Edger = 43 : 21 (one game)
GPPlayer : RandomPlayer : 29 : 21 (50 games)
The result shows the GPPlayer can defeat Edgar, but it is not satisfactory
against RandomPlayer. Although it wins more games than RandomPlayer, it
also loses many.
The reason is because the GPPlayer is trained against Edgar, which
is more specific,
the result player is biased to the players with good fitness compared
with Edgar, but
not players with other strategy. This is kind of like the real nature.
One species can
be very dominant and powerful in one environment, but will be weak
and even extinct
when the environment changes. Jurassic Park! So the good way
is to train the GPPlayer using more general players, and many different
players. In the further experiment, I trained GPPlayer against both Edgar
and RandomPlayer. The fitness of the GPPlayer is given by the result from
both players.
GPPlayer : (fitness: 29.60)
RPB: ( + white black )
ADF0: ( * black_edges ( * black_edges
white ))
ADF1: ( + ( * black_corners ( +
black_edges white_near_corners )) white_near_corners
)
ADF2: ( * ( + white_corners black_edges
) black_corners )
ADF3: ( * black_corners black_near_corners
)
GPPlayer : Edgar = 39:25 (one game)
GPPlayer : RandomPlayer = 36 : 14 (50 games)
So GPPlayer can still defeat Edgar, and the result against RandomPlayer is also improved. During the training, for the evaluation of the fitness of GP, I only play one game against Edgar and one game against RandomPlayer, this has some bad effect on the result GPPlayer. I believe if more games are used to do the evaluation, the result can be better. Anyway, this at least shows training against different players can definately improve the skill of the final GPPlayer.
After that, I tried another experiment with different parameters.
AddBestToNewPopulation = true
SteadyState
= true
This makes fast for the fitness of generation to arrive the termination fitness.
GPPlayer: (fitness = 30.40)
RPB: ( + ( + black white )
black )
ADF0: ( - black_edges black )
ADF1: ( - 10 ( + white_corners
10 ))
ADF2: ( + white_near_corners ( * black_near_corners
black_corners ))
ADF3: ( + ( - ( + 10 white ) (
+ white_near_corners black_edges )) white_near_corners
)
GPPlayer : Edgar = 41 : 23 (one game)
GPPlayer : RandomPlayer = 29 : 21 (50 games)
But it seems that performance of the GPPlayer against RandomPlayer is lowered. I think it is because the new parameters make the population less diverse.
After a whole day training, it can not get the fitness less than 25, and it ended up the smallest value 36.
Then I tried the training against both Edgar and RandomPlayer, and also changed the parameters AddBestToNewPopulation, SteadyState to true.
GPPlayer: (fitness = 34)
RPB: ( ifstart black black_edges )
GPPlayer : Edgar = 33 : 31(one game)
GPPlayer : RandomPlayer = 34 : 16
I also tried termination fitness 30, but at least in my limited experiment
time, it can not arrive the termination fitness, and it stops around 36.
It is same even I increased the population size to 200. Since now
I use only one tree and use different subtrees to evaluate at different
point of the game, it must require the size of the tree is pretty big,
and the population size is pretty big, so the only two special primitives
that are responsible to part the tree, can have more chance to be in the
evaluation tree to work their role.
Last modified 4/4/2000