Chihiro Ishii

Machine Learning W4771

Homework 4: Genetic Programming

Introduction

The crossover step in genetic programming plays the role of "information sharing": the information contained in more than one potential solution can be pooled to generate another (hopefully better) potential solution. To evaluate the importance of crossover characteristics in genetic programming, different combinations of the parameters involved in crossover were used to train players, which were then played against a random player to see how performance changes with the parameter settings.
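The "information sharing" step above can be sketched as subtree crossover: graft a random subtree of one parent onto a random point of the other, rejecting children that exceed the depth limit. This is a minimal illustration assuming expression trees stored as nested lists such as ["+", "white_corners", "white"]; the helper names are hypothetical, not the ones from the Othello packet.

```python
import random

def all_subtrees(tree, path=()):
    """Yield (path, subtree) for every node in the tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_subtrees(child, path + (i,))

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    copy = list(tree)
    copy[path[0]] = replace_at(copy[path[0]], path[1:], new_subtree)
    return copy

def depth(tree):
    """Depth of a tree; a lone terminal counts as depth 1."""
    if not isinstance(tree, list):
        return 1
    return 1 + max(depth(child) for child in tree[1:])

def crossover(parent_a, parent_b, max_depth=17, rng=random):
    """Graft a random subtree of parent_b onto a random point of parent_a,
    rejecting offspring deeper than max_depth (MaximumDepthForCrossover)."""
    for _ in range(50):                      # bounded number of retries
        path_a, _ = rng.choice(list(all_subtrees(parent_a)))
        _, sub_b = rng.choice(list(all_subtrees(parent_b)))
        child = replace_at(parent_a, path_a, sub_b)
        if depth(child) <= max_depth:
            return child
    return parent_a                          # give up: keep the parent
```

Because `replace_at` copies rather than mutates, the parents survive unchanged and can be reused for further crossovers within a generation.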

Experiment

Performance comparison is done by running the trained genetic player against random players 50 times and counting how many times the genetic player wins.
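The comparison loop is straightforward to sketch. Here `play_game` is a hypothetical callable that returns the winning player; only the win-counting logic comes from the description above.

```python
def count_wins(trained_player, random_player, play_game, n_games=50):
    """Return how many of n_games the trained player wins against
    a random opponent (the performance measure used in the tables)."""
    return sum(
        1 for _ in range(n_games)
        if play_game(trained_player, random_player) == trained_player
    )
```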

Unchanged parameter settings:

PopulationSize = 150
NumberOfGenerations = 35
MaximumDepthForCreation = 6

Changed parameters:

MaximumDepthForCrossover: ranging from 6 to 30
MaximumComplexity: ranging from 50 to 250
CrossoverProbability: ranging from 0 to 100

This experiment evaluates how Othello-playing performance is affected by changes made to the crossover parameters.
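The runs behind tables (1)-(3) can be generated by varying one crossover parameter at a time while holding the others at the fixed values quoted with each table. A sketch under that assumption (the training harness itself is not shown):

```python
FIXED = {
    "PopulationSize": 150,
    "NumberOfGenerations": 35,
    "MaximumDepthForCreation": 6,
}

BASELINE = {            # values held fixed in tables (1)-(3)
    "CrossoverProbability": 70,
    "MaximumDepthForCrossover": 17,
    "MaximumComplexity": 100,
}

SWEEPS = {
    "CrossoverProbability": [0, 30, 50, 70, 90, 100],
    "MaximumDepthForCrossover": [6, 12, 18, 24, 30],
    "MaximumComplexity": [50, 100, 150, 200, 250],
}

def configurations():
    """One run per swept value; the other crossover parameters stay
    at their baseline, and the global settings stay fixed throughout."""
    runs = []
    for name, values in SWEEPS.items():
        for value in values:
            runs.append({**FIXED, **BASELINE, name: value})
    return runs
```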

The defined genetic player is represented as given in the Othello packet.

Result

(1) Fixed: MaximumDepthForCrossover=17, MaximumComplexity=100

CrossoverProbability    0        30       50       70       90       100
Games Won               36       (*)      36       35       34       33
Tree Depth              2(3.6)   3(3.6)   2(3.6)   3(3.8)   2(3.7)   2(3.6)
Complexity              3(15.4)  7(14.5)  3(15.5)  5(15.9)  3(15.4)  3(15.3)

(2) Fixed: CrossoverProbability=70, MaximumComplexity=100

MaximumDepth   6        12       18       24       30
Games Won      36       37       42       38       48
Tree Depth     2(3.6)   2(3.5)   2(3.7)   2(3.7)   2(3.6)
Complexity     3(15.2)  3(15.0)  3(15.7)  3(15.4)  3(15.6)

(3) Fixed: CrossoverProbability=70, MaximumDepthForCrossover=17

MaximumComplexity   50       100      150      200      250
Games Won           49       35       37       (*)      (*)
Tree Depth          2(3.5)   3(3.8)   2(3.7)   3(3.6)   4(3.6)
Complexity          3(12.0)  5(15.9)  3(15.3)  7(15.4)  7(15.1)

(*) … the genetic representation was not in simple form (parse error), so performance could not be measured

(4) Combination of the best and worst values (from the above 3 tables)
(ordered <CrossoverProbability,MaximumDepth,MaximumComplexity>)

Best:

            <50,18,50>   <50,30,50>
Games Won   41           (*)
Tree Depth  2(3.4)       3(3.6)
Complexity  3(11.9)      7(12.4)

Worst:

            <100,6,100>
Games Won   (*)
Tree Depth  3(3.8)
Complexity  7(16.2)

Note: for tree depth and complexity, the best case's value is given, with the average in parentheses.

Discussion

The first observation is that, when playing against a random player, the defined genetic player won more than half of the 50 games it played, no matter how the parameters were set.  For some particular parameter settings it won more than 45 out of 50 games, which is remarkable considering that the population size was set to a small number (150) for a fairly complex problem.  This shows that the information within the potential solutions was shared in some useful manner.  None of the cases covered in this experiment performed terribly, though one can see some exceptionally good performance.

The best and average tree depth and complexity stay similar across most cases. The only drastic change occurs when the MaximumComplexity parameter is altered. This implies that the CrossoverProbability and MaximumDepthForCrossover parameters do not affect the shape of the tree that results from applying genetic programming.

An interesting observation from the experiment (not discussed in detail here) concerns the genetic representation of the defined genetic player.  The player with the best outcome had the simplest form of genetic representation.  For example, the genetic representation of the best result in table (3) is expressed simply as
( +   white_corners   white ).    Throughout the experiment, the best RPBs were always simpler and less complex than the respective worst RPBs.
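An S-expression like the one above acts as a board-evaluation function. A small sketch of evaluating such an RPB against board features; the feature values below are made up for illustration, and only `+` and `-` operators are handled.

```python
def eval_rpb(expr, features):
    """Recursively evaluate a nested-list S-expression: a string is a
    terminal looked up in the feature dict, a list is (op, left, right)."""
    if isinstance(expr, str):
        return features[expr]
    op, left, right = expr
    a, b = eval_rpb(left, features), eval_rpb(right, features)
    return a + b if op == "+" else a - b

# the best RPB from table (3)
best_rpb = ["+", "white_corners", "white"]
# hypothetical board: white holds 2 corners and 20 discs
score = eval_rpb(best_rpb, {"white_corners": 2, "white": 20})
```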

In table (3), the non-simple expressions occur for the two largest values of MaximumComplexity.   Performance could not be measured for these two parameter settings, but given what was said about non-simple expressions in the previous paragraph, the assumption is that their performance probably would not have been any better than the previous three cases.  Since the population size for this experiment is 150, setting MaximumComplexity to 200 or 250 seems excessive: the population can become too diverse, and the problem therefore very complex.  Intuitively, this assumption seems to make sense, but it is specific to this particular problem.  To verify it, or to see whether it applies to a more general case, more intensive testing is required.

Table (4) was included to show that simply combining the best value for each parameter will not necessarily result in the best performance.

One puzzling result of the experiment appears in table (1).
The performance when the crossover probability was set to 0 (no "information sharing") was not much different from the performance when it was set to some positive number.  This might again stem from setting the population size to a small number, and population size may also be why most cases produced a low tree depth of 2 or 3.  Not much information will be shared if the population is small.  This is similar to how people group together to solve a project: more information is exchanged when more people participate, and vice versa.  It would be interesting to see how the numbers differ with a larger population size, in which case the characteristics of information sharing should play a more important role in achieving good, stable results.  From this comes the conclusion that the crossover operation can affect game performance, but the degree of its effect seems to depend on other factors such as population size.