In this project we apply a Genetic Programming algorithm to learn a program that plays the game of Othello. The main motivation and focus is to investigate whether a good player can be produced by Genetic Programming without strong supervision, that is, without the need for a human player or a sophisticated automatic player like Edgar. In most real-life problems to which we would want to apply Genetic Programming, there is no known good solution like Edgar, or the cost of evaluating each hypothesis (by playing against a human, for instance) is prohibitive. Genetic Programming addresses this issue with the standard method of a Competitive Fitness Measure, where the members of the hypothesis population evolve by competing among themselves. In this project, we show the results of some runs where hypotheses compete via the method of Tournament Selection.
Every hypothesis in the Genetic Programming algorithm is represented by a functional tree. This tree computes a real-valued function of the state of the playing board, which can be used to distinguish favorable moves from unfavorable ones. The fitness of a hypothesis is evaluated by playing games against other hypotheses. In Tournament Selection, whenever two hypotheses are required for crossover: first, two hypotheses are selected from the current population at random; second, a tournament group of hypotheses is selected from the current population at random; third, the first two hypotheses and the tournament group play against each other in a single-elimination tournament; finally, the winner and the second-place hypothesis are selected for crossover, producing the children that will fill the next population. The size of the tournament group is a parameter of the algorithm.
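The selection steps above can be sketched as follows. This is a minimal illustration, not the actual gpjpp code: the names Hypothesis, playGame, and selectParents are hypothetical, and a numeric "strength" stands in for a real Othello game.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of parent selection by single-elimination tournament.
// All names here are illustrative stand-ins for the actual gpjpp API.
public class TournamentSketch {
    static Random rng = new Random(42);

    // Stand-in for a GP hypothesis; 'strength' replaces a real game result.
    static class Hypothesis {
        final double strength;
        Hypothesis(double s) { strength = s; }
    }

    // Stand-in for playing one Othello game: returns the winner.
    static Hypothesis playGame(Hypothesis a, Hypothesis b) {
        return a.strength >= b.strength ? a : b;
    }

    // Select two parents: draw a tournament group at random, play a
    // single-elimination tournament, return winner and runner-up.
    // Assumes tournamentSize >= 2.
    static Hypothesis[] selectParents(List<Hypothesis> pop, int tournamentSize) {
        List<Hypothesis> group = new ArrayList<>();
        for (int i = 0; i < tournamentSize; i++)
            group.add(pop.get(rng.nextInt(pop.size())));
        Hypothesis runnerUp = null;
        while (group.size() > 1) {
            List<Hypothesis> next = new ArrayList<>();
            for (int i = 0; i + 1 < group.size(); i += 2) {
                Hypothesis w = playGame(group.get(i), group.get(i + 1));
                Hypothesis l = (w == group.get(i)) ? group.get(i + 1) : group.get(i);
                next.add(w);
                if (group.size() == 2) runnerUp = l; // loser of the final round
            }
            if (group.size() % 2 == 1)              // odd group: last one gets a bye
                next.add(group.get(group.size() - 1));
            group = next;
        }
        return new Hypothesis[] { group.get(0), runnerUp };
    }
}
```

In the real run, playGame would be a full Othello game decided through betterThan(GP gp), and the returned pair would be handed to the crossover operator.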
Code changes required for this project include overriding the method betterThan(GP gp) in the class GPOthelloGP. This method is called from the Tournament Selection function, and it plays the called GPOthelloGP object against the given GP object (which is assumed to also be an instance of GPOthelloGP). The result is true if the called object wins the game and false otherwise; in case of a draw, the complexity of the hypotheses is used to decide in favor of the simplest (shortest) one. In addition, every hypothesis plays 5 games against a random player, and the average score obtained against the random player is added to its score in every tournament game. Thus a hypothesis must do well, on average, against both the random player and the competing hypotheses to win the tournament.
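The decision logic just described can be sketched in isolation. The method below is a simplified stand-in for the actual betterThan(GP gp) override, with the game and random-player scores passed in directly; note that scores here are fitness-style, so lower is better.

```java
// Sketch of the betterThan(GP gp) decision described above.
// Scores are fitness-style: LOWER is better. The signature is a
// simplification, not the actual GPOthelloGP method.
public class BetterThanSketch {

    // myGameScore / otherGameScore: results of the head-to-head game.
    // myRandomAvg / otherRandomAvg: average score over 5 games against
    // the random player, added to each side's tournament score.
    // myComplexity / otherComplexity: tree sizes, used to break draws.
    static boolean betterThan(double myGameScore, double myRandomAvg, int myComplexity,
                              double otherGameScore, double otherRandomAvg, int otherComplexity) {
        double mine = myGameScore + myRandomAvg;
        double theirs = otherGameScore + otherRandomAvg;
        if (mine != theirs) return mine < theirs;  // lower combined score wins
        return myComplexity <= otherComplexity;    // draw: prefer the shorter tree
    }
}
```

The addition of the random-player average means a hypothesis cannot win a tournament purely by exploiting one opponent's weaknesses.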
A minor difficulty arises when implementing the betterThan(GP gp) method: all hypotheses are trained to play Othello as the black player, but when hypotheses play against each other, one of them must play white, a color for which they are not trained. The difficulty is solved by inverting the board colors and scores before whichever hypothesis is playing white makes its move; after the move is made, the board colors and scores are inverted back.
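The inversion trick can be sketched as follows. The board encoding (0 = empty, 1 = black, 2 = white) is an assumption for illustration, not the actual representation used by the Othello package; the key property is that applying the inversion twice restores the board.

```java
// Sketch of the color-inversion trick described above: before a
// black-trained hypothesis moves as white, swap the colors on the
// board so it "sees" itself as black; swap back after the move.
// The cell encoding is an assumption for this sketch.
public class BoardInversion {
    static final int EMPTY = 0, BLACK = 1, WHITE = 2;

    // Invert colors in place; invert(invert(board)) == board.
    static void invert(int[][] board) {
        for (int r = 0; r < board.length; r++)
            for (int c = 0; c < board[r].length; c++) {
                if (board[r][c] == BLACK) board[r][c] = WHITE;
                else if (board[r][c] == WHITE) board[r][c] = BLACK;
            }
    }
}
```

In the actual implementation the scores must be swapped along with the colors, since the score counters are also tied to black and white.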
The Java package we are given includes a fairly good player, Edgar, which could be used to evaluate hypotheses by playing against it. But given the aim of this project, we cannot use it for training: we want to investigate whether Genetic Programming can produce a good player from little supervised information, and we do not want to produce hypotheses that are overfitted to Edgar. Hence Edgar is used here only for testing. A method evaluateTest() was added to the class GPOthelloGP to evaluate the best hypothesis produced by the GP algorithm by playing it against Edgar. The best hypothesis of each generation is played against Edgar to track the progress of the GP algorithm.
Population evolution by Tournament Selection is more computationally expensive than standard Probabilistic Selection, because for each individual selected for the next population, a number of evaluations (games) roughly equal to the tournament size is required, whereas Probabilistic Selection requires only one evaluation per hypothesis. Here are the results of the largest population run performed for this project:
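To make the cost difference concrete, a rough order-of-magnitude count for the parameters used in this run (population 1024, tournament size 10) can be computed as below; these are estimates following the "roughly tournament-size games per selected individual" argument above, not exact game counts.

```java
// Rough per-generation evaluation (game) counts for the two selection
// schemes, following the order-of-magnitude argument in the text.
public class SelectionCost {
    // roughly tournamentSize games for each individual selected
    static long tournamentEvals(int populationSize, int tournamentSize) {
        return (long) populationSize * tournamentSize;
    }

    // one evaluation per hypothesis
    static long probabilisticEvals(int populationSize) {
        return populationSize;
    }
}
```

For this run that is on the order of 10240 games per generation versus 1024 evaluations, about a factor of the tournament size, which is consistent with the roughly 1500 seconds per generation reported below.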
PopulationSize = 1024
NumberOfGenerations = 25
CrossoverProbability = 90.0
CreationProbability = 0.0
CreationType = RampedHalf
MaximumDepthForCreation = 6
MaximumDepthForCrossover = 17
MaximumComplexity = 100
SelectionType = Tournament
TournamentSize = 10
DemeSize = 100
DemeticMigProbability = 100.0
SwapMutationProbability = 0.0
ShrinkMutationProbability = 0.0
TerminationFitness = 0.0
GoodRuns = 1
DemeticGrouping = false
AddBestToNewPopulation = false
SteadyState = false
PrintDetails = true
PrintPopulation = false
PrintExpression = true
PrintTree = true
TreeFontSize = 12
UseADFs = false
TestDiversity = true
ComplexityAffectsFitness = true
CheckpointGens = 0
RPB: + (2) - (2) * (2) / (2) 10 black_near_corners white_near_corners black_corners white_corners black_edges white_edges black white
Run number 1 (good runs 0)
Too complex 0
Duplicate 241
Gen| Fitness | Complexity | Depth |Variety
| Best Average Worst| B A W| B A W|
0| 3.00 141.14 373.00| 3 15.7 63| 2 3.7 6| 1.000
1| 1.00 106.48 334.00| 1 12.9 63| 1 3.7 6| 0.882
2| 1.00 94.38 334.00| 1 12.2 39| 1 3.8 6| 0.867
3| 1.00 89.85 327.00| 1 11.8 27| 1 3.9 11| 0.865
4| 1.00 88.30 301.00| 1 13.4 11| 1 4.2 4| 0.884
5| 1.00 83.57 306.00| 1 13.7 21| 1 4.4 6| 0.902
6| 1.00 84.21 309.00| 1 14.6 19| 1 4.6 7| 0.907
7| 1.00 85.20 304.00| 1 15.4 79| 1 4.9 9| 0.924
8| 1.00 81.57 293.00| 1 15.7 3| 1 5.0 2| 0.900
9| 1.00 85.14 301.00| 1 17.1 17| 1 5.3 5| 0.921
10| 1.00 83.49 295.00| 1 18.1 31| 1 5.5 8| 0.932
11| 3.00 89.05 312.00| 3 20.1 35| 2 5.8 8| 0.936
12| 1.00 86.93 335.00| 1 22.0 87| 1 6.1 12| 0.926
13| 1.00 88.73 304.00| 1 23.2 45| 1 6.3 8| 0.940
14| 1.00 91.01 359.00| 1 25.3 85| 1 6.5 10| 0.961
15| 3.00 93.84 306.00| 3 27.3 31| 2 6.8 7| 0.966
16| 3.00 91.56 307.00| 3 28.1 59| 2 7.0 15| 0.961
17| 3.00 91.38 319.00| 3 29.5 39| 2 7.2 10| 0.964
18| 1.00 95.87 328.00| 1 31.8 33| 1 7.5 7| 0.971
19| 3.00 98.06 325.00| 3 33.0 57| 2 7.7 10| 0.970
20| 3.00 91.56 324.00| 3 34.5 59| 2 7.9 10| 0.972
21| 3.00 91.77 344.00| 3 35.6 97| 2 8.1 13| 0.966
22| 1.00 92.43 347.00| 1 36.7 27| 1 8.3 10| 0.963
23| 3.00 91.33 366.00| 3 38.5 81| 2 8.5 13| 0.965
24| 3.00 94.70 311.00| 3 40.9 59| 2 9.0 12| 0.979
25| 3.00 95.29 342.00| 3 43.7 91| 2 9.2 11| 0.976
Fitness values shown are obtained from playing against the random player only. The following hypothesis has the best fitness against the random player. This fitness value is only part of the competitive fitness measure and is added to each tournament result, as explained above.
========================================================================
Generation:25 best:118 worst:920
GP# dad mum oper mut shrk mut fitness len dep
===== ============= ============= ======== ======== ========== ==== ====
B 118 52:RPB:1 1014:RPB:46 3.00 3 2
RPB: ( * black_edges white )
GPOthello.evaluatetest(), GP: 53.0, Edgar: 11.0
These are the best four hypotheses from the last population. They were selected by a single-elimination tournament over the whole population, and each was played against Edgar; the resulting scores are shown (the winner shows the lowest score, since we are thinking in terms consistent with fitness):
RPB: ( + ( * ( + black ( *
black_corners black_edges )) ( + black_corners 10 )) ( / ( / ( -
( / ( * black_corners black_edges ) ( * black_edges white )) ( /
( * black_near_corners black_edges ) ( / ( + ( + black
black_corners ) ( + ( / white black_edges ) ( - black_edges ( -
white ( * black_corners black_edges ))))) ( * black_edges white
)))) ( * black_edges white )) ( * black_edges white )))
GPOthello.evaluatetest(), GP: 39.0, Edgar: 25.0
RPB: ( - ( * white ( + black
( - black_edges ( + black_corners 10 )))) ( / ( + 10 white ) ( *
( * black_corners black_edges ) ( * ( + black_corners ( *
black_edges black_corners )) ( + black_corners 10 )))))
GPOthello.evaluatetest(), GP: 20.0, Edgar: 44.0, Edgar loses here!
RPB: ( + ( * ( * ( *
black_corners black_edges ) ( + 10 white )) ( + black_corners ( *
black_edges black_corners ))) ( / ( / ( + black ( - 10
white_near_corners )) ( + white ( / ( * black_edges white ) ( * (
* black_near_corners white_corners ) ( + black_corners 10 ))))) (
* black_edges white )))
GPOthello.evaluatetest(), GP: 38.0, Edgar: 26.0
RPB: ( + ( * ( - 10
white_near_corners ) ( - ( * black_corners black_edges )
white_edges )) ( + 10 ( * black_edges black_corners )))
GPOthello.evaluatetest(), GP: 44.0, Edgar: 20.0
Run time 38283.89 seconds, 1507.80 seconds/generation
We are encouraged to find at least one hypothesis that can beat Edgar, generated without involving Edgar during training at all. The winning hypothesis looks rather complex and should be tested against other players, such as humans. Although this hypothesis beats Edgar, it may not be general enough to beat other players.
Interesting future work includes extending the function node set and/or using ADFs; for this project, only the given function nodes were used. More generations and larger populations, run on a fast machine, may also produce better results.