Spring 2000 Machine Learning Homework 4: Genetic Programming

Project: Othello
By Brian Whitman and Grace Junxin Zhang

Motivation

During testing and example play, we found that the provided Othello GP packet, using the default settings (GPOthello.ini) to train the GP player, is not very effective at generating good Othello GP players: it tends to produce a best individual such as (- white black) and then stop. After some experimentation and analysis of the results, we found several things that could be improved, besides changing the settings to raise the standard for a good player:

Overview of Solution

1. Fitness measurement during the learning process

Let scoreEdgar be the score that Edgar gets in a game against the GP player
Let scoreRandom_N be the sum of the scores that the Random Player gets over N games against the GP player
Let scoreLength be the complexity of the hypothesis, measured in terms of its length

The original fitness of a GP player is measured as scoreRandom_5 (the random player's total score over 5 games against the GP player), together with a penalty on scoreLength that favors shorter hypotheses.

Playing only five games against the random player is not enough to train a good GP player, because most of the time the random player plays poorly. But playing many more than 5 games would increase the training time significantly.

Therefore we changed our fitness measurement to (formula 1, referenced in the Results table):

    fitness = scoreEdgar + 10*scoreRandom_4 + 0.2*scoreLength

Our reasoning for using this formula is as follows:
If we assume Edgar is a "good" Othello player, we can use it to distinguish between good and poor GP players. Since both Edgar and the GP player play Othello deterministically, they need to play only one game (any further games would return the same result). But playing only one game against Edgar results in a player whose performance is rather unstable (a hypothesis that is not general enough, shaped by only one game). So we let the GP player play four more games against the random player. According to our experiments, the score that a random player gets against a good GP player can be as low as 1, while Edgar usually gets at least 20.

So we increase the weight of scoreRandom to make it more effective. The original measurement also favors shorter hypotheses too strongly; intuitively, since Othello is a complex game, a good hypothesis for evaluating a board should not be too simple. We therefore decrease the weight of the complexity penalty, which is expressed by scoreLength.
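
As a rough illustration, the modified fitness can be computed along the following lines. This is only a sketch of the change we made in GPOthelloGP.evaluate(); the playGameVsEdgar(), playGameVsRandom() and treeLength() methods are hypothetical stand-ins for game-playing code that already exists in the packet.

    // Sketch of the modified fitness (formula 1); lower fitness is better.
    public class FitnessSketch {

        // Pieces Edgar holds at the end of one game against the GP player.
        // Both players are deterministic, so a single game is enough.
        double playGameVsEdgar() { return 0.0; /* stub */ }

        // Pieces the random player holds at the end of one game against the GP player.
        double playGameVsRandom() { return 0.0; /* stub */ }

        // Length (node count) of the GP player's expression tree.
        double treeLength() { return 0.0; /* stub */ }

        double evaluate() {
            double scoreEdgar = playGameVsEdgar();

            double scoreRandom = 0.0;            // scoreRandom_4: 4 games vs the random player
            for (int i = 0; i < 4; i++)
                scoreRandom += playGameVsRandom();

            double scoreLength = treeLength();   // small complexity penalty

            return scoreEdgar + 10.0 * scoreRandom + 0.2 * scoreLength;
        }
    }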

2. Variations on the Hypothesis

We add the following primitives to the GP tree (see the "Code modified" section for where they are implemented):

* num_op_pos_moves - the number of possible moves available to the opponent
* white_clusters, black_clusters - the number of clusters of white and black pieces on the board

Approach

Since Edgar is fixed to play WHITE, we modified the program so that the GP player plays BLACK and the random player plays WHITE. This way the GP player can play against both Edgar and the Random player with minimal further modification.

We conducted the following experiments to verify our solutions. To save time, we changed GoodRuns to 2 in the default settings.

For details of the variables, please refer to the Results section.

  1. Use the original packet and the default settings as the baseline.

    Train the GP player
    java GPOthello exp1
    Test the GP player
    java OthelloCache scoreonly "<function>"
    GP player played BLACK, Edgar played WHITE, they played 1 game
    GP player played BLACK, Random player played WHITE, they played 49 games against each other

  2. Modify the fitness measurement and settings
    Train the GP player
    java GPOthello exp2
    Test the GP player
    java OthelloCache scoreonly "<function>"
    GP player played BLACK, Edgar played WHITE, they played 1 game
    GP player played BLACK, Random player played WHITE, they played 49 games against each other

  3. Add num_op_pos_moves to the original primitives; use the same fitness measurement and settings as in experiment 2.
    Train the GP player
    java GPOthello exp3
    Test the GP player
    java OthelloCache scoreonly "<function>"
    GP player played BLACK, Edgar played WHITE, they played 1 game
    GP player played BLACK, Random player played WHITE, they played 49 games against each other

  4. Add white_clusters and black_clusters to the original primitives (do NOT include num_op_pos_moves); use the same fitness measurement and settings as in experiment 2.
    Train the GP player
    java GPOthello exp4
    Test the GP player
    java OthelloCache scoreonly "<function>"
    GP player played BLACK, Edgar played WHITE, they played 1 game
    GP player played BLACK, Random player played WHITE, they played 49 games against each other

Code modified:

For the fitness measurement:
* GPOthello.java
GPOthelloGP.evaluate()
fitness = scoreEdgar + 10*scoreRandom + 0.2*scoreLength

For testing against both Edgar and the Random player:
* OthelloCache.java
OthelloCache()
play 1 game against Edgar, 49 games against the random player
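
A minimal sketch of that testing loop follows; the two helper methods are hypothetical stand-ins for the game-playing code already in OthelloCache.

    // Sketch only: plays 1 game against Edgar and 49 against the random
    // player, then prints the a/b result used in the Results table.
    public class TestLoopSketch {

        // true if the GP player (BLACK) wins a game against Edgar (WHITE).
        boolean gpWinsAgainstEdgar()  { return false; /* stub */ }

        // true if the GP player (BLACK) wins a game against the random player (WHITE).
        boolean gpWinsAgainstRandom() { return false; /* stub */ }

        void testGPPlayer() {
            int winsVsEdgar = gpWinsAgainstEdgar() ? 1 : 0;

            int winsVsRandom = 0;
            for (int i = 0; i < 49; i++)
                if (gpWinsAgainstRandom())
                    winsVsRandom++;

            // a/b as reported in the Results section.
            System.out.println(winsVsEdgar + "/" + winsVsRandom);
        }
    }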

For adding num_op_pos_moves:
* OthelloBoard.java
Add two statistics data members:
OthelloBoard.num_white_clusters and OthelloBoard.num_black_clusters
OthelloBoard.updateBoardStatistics()

* GPOthello.java
GPOthello.createNodeSet()
GPOthelloGene.evaluate()
places that call OthelloGPTester

* OthelloGPPlayer.java
OthelloGPPlayer.evalFunction()
OthelloGPPlayer.evalSubFunction()
OthelloGPPlayer.evalMove()

* OthelloGPTester.java (similar to OthelloGPPlayer.java)
OthelloGPTester.evalFunction()
OthelloGPTester.evalMove()

* OthelloCache.java
places that call OthelloGPPlayer
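
To illustrate what this primitive measures, the following sketch counts the legal moves available to the opponent (WHITE, since the GP player plays BLACK). The char[][] board representation is an assumption made for the sketch, not the packet's actual OthelloBoard internals.

    // Sketch: num_op_pos_moves = number of squares on which the opponent
    // could legally move.
    public class OpponentMobilitySketch {

        static final int SIZE = 8;
        static final char EMPTY = '.', BLACK = 'b', WHITE = 'w';

        // Number of legal moves available to 'player'.
        static int numPossibleMoves(char[][] board, char player) {
            char opponent = (player == WHITE) ? BLACK : WHITE;
            int count = 0;
            for (int r = 0; r < SIZE; r++)
                for (int c = 0; c < SIZE; c++)
                    if (board[r][c] == EMPTY && flipsSomething(board, r, c, player, opponent))
                        count++;
            return count;
        }

        // True if placing 'player' at (r, c) would flip at least one opponent piece,
        // i.e. some direction holds a run of opponent pieces ended by a player piece.
        static boolean flipsSomething(char[][] board, int r, int c, char player, char opponent) {
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++) {
                    if (dr == 0 && dc == 0) continue;
                    int rr = r + dr, cc = c + dc, seen = 0;
                    while (rr >= 0 && rr < SIZE && cc >= 0 && cc < SIZE && board[rr][cc] == opponent) {
                        rr += dr; cc += dc; seen++;
                    }
                    if (seen > 0 && rr >= 0 && rr < SIZE && cc >= 0 && cc < SIZE && board[rr][cc] == player)
                        return true;
                }
            return false;
        }
    }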

For adding white_clusters and black_clusters:
* GPOthello.java
GPOthello.createNodeSet()
GPOthelloGene.evaluate()

* OthelloGPPlayer.java
OthelloGPPlayer.evalFunction()
OthelloGPPlayer.evalSubFunction()
OthelloGPPlayer.evalMove()

* OthelloGPTester.java (similar to OthelloGPPlayer.java)
OthelloGPTester.evalFunction()
OthelloGPTester.evalMove()
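
To illustrate what the cluster statistics measure, here is one possible way to count connected groups of same-coloured pieces with a flood fill, in the spirit of OthelloBoard.updateBoardStatistics(). The char[][] representation and the 8-way adjacency are assumptions of this sketch, not necessarily the packet's actual code.

    // Sketch: num_white_clusters / num_black_clusters = number of connected
    // groups of pieces of one colour on the board.
    public class ClusterCountSketch {

        static final int SIZE = 8;

        // Number of connected clusters of pieces of the given colour.
        static int countClusters(char[][] board, char colour) {
            boolean[][] seen = new boolean[SIZE][SIZE];
            int clusters = 0;
            for (int r = 0; r < SIZE; r++)
                for (int c = 0; c < SIZE; c++)
                    if (board[r][c] == colour && !seen[r][c]) {
                        clusters++;
                        flood(board, seen, r, c, colour);
                    }
            return clusters;
        }

        // Marks every piece of 'colour' reachable from (r, c) via 8-way adjacency.
        static void flood(char[][] board, boolean[][] seen, int r, int c, char colour) {
            if (r < 0 || r >= SIZE || c < 0 || c >= SIZE) return;
            if (seen[r][c] || board[r][c] != colour) return;
            seen[r][c] = true;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                    if (dr != 0 || dc != 0)
                        flood(board, seen, r + dr, c + dc, colour);
        }
    }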


Results

Settings                exp1               exp2        exp3        exp4
PopulationSize          200                64          64          64
NumberOfGenerations     31                 50          50          50
TerminationFitness      5.0                25.0        25.0        25.0
GoodRuns                2                  2           2           2
Fitness measurement     scoreRandom_5      formula 1   formula 1   formula 1

Good Runs
     goodRun #1
          Run number    1                  14(**)      2           12
          best fitness  1.00               28.4        14.8        23.6
          RPB           black              RPB_1       RPB_3       RPB_5
          test result(*) 0/47              1/47        1/49        1/49
     goodRun #2
          Run number    2                  3(**)       13          12(***)
          best fitness  3.00               28.8        22.60       30.4
          RPB           ( - black white )  RPB_2       RPB_4       RPB_6
          test result(*) 0/48              1/47        1/49        1/49

RPB_1: ( * ( - ( - ( * black_corners black_corners ) white_near_corners ) white_near_corners ) ( * ( * ( - ( * black_corners black_edges ) white_near_corners ) black_edges ) black_edges ))

RPB_2: ( + ( * ( * white_near_corners ( / black_edges white_near_corners )) white_corners ) black_edges )

RPB_3: ( * black ( / ( * black ( - black num_op_pos_moves )) white_edges ))

RPB_4: ( / ( + ( + ( + black_corners 10 ) ( - black white )) ( - ( - ( - black white ) ( + num_op_pos_moves white_near_corners )) ( - white white_corners ))) ( - white white_corners ))

RPB_5: ( + ( - ( / ( + ( + ( - white black_edges ) ( * black_edges black_clusters )) ( + ( - white_clusters 10 ) ( - white black_edges ))) ( * ( + ( - ( + ( / ( + black black_near_corners ) ( * ( / black black ) ( * 10 white_near_corners ))) ( * black_edges black_clusters )) ( / ( * ( * white_corners white_edges ) ( / white_edges white )) ( + ( / white_edges white ) ( * black_edges black_clusters )))) ( / black_clusters black_corners )) ( * 10 white_near_corners ))) ( / ( - white_clusters 10 ) ( - white_clusters 10 ))) ( + black_corners ( * ( * ( * 10 white_near_corners ) ( / ( + black black_near_corners ) ( + black black_near_corners ))) ( / ( / ( - white_clusters 10 ) ( - white_clusters 10 )) ( + ( / white_edges white ) ( * black_edges black_clusters ))))))

RPB_6: ( + ( - ( / ( + ( - white black_edges ) ( * black_edges black_clusters )) ( * ( + black black_near_corners ) ( * 10 white_near_corners ))) ( / ( - white_clusters 10 ) ( - white_clusters 10 ))) ( + black_corners ( * ( * 10 white_near_corners ) ( / ( * white_corners white_edges ) ( + black_corners white_clusters )))))

Footnotes

(*):
The GP player plays a total of 1 game against Edgar and 49 games against the Random player. a/b means that the GP player won a (0 or 1) games against Edgar and b games against the random player.

(**):
After 26 runs it still could not get a good run. We chose the 2 individuals with the top fitness out of all the generations of all the runs.

(***):
At run 16, the program was killed unexpectedly, so we only got one good run before that.

Conclusions

Using the original fitness measurement can't generate a good GP player most of the time. We show this in experiment 1.

A modified fitness measurement can result in a GP player that is capable of defeating Edgar. We show this in experiments 2 and 3.

The effect of adding the two new primitives, num_op_pos_moves and white_clusters/black_clusters, can be seen by contrasting experiments 3 and 4 with experiment 2.


Junxin Zhang, Brian Whitman, 4/4/00