Homework 4: Genetic Programming Project Othello
by Terrence J. Truta
The use of search in game trees is a powerful technique that can be used to create challenging opponents. For search algorithm such as Alpha-Beta and Min-Max, an effective evaluation function is key to its success. However, effective evaluation functions for non-trivial problems are usually difficult to derive.
The purpose of this experiment is to see if an effective evaluation function for the game Othello can be developed using genetic programming. When combined with a search algorithm a good heuristic should produce a worthy Othello opponent.
In this experiment, an Alpha-Beta search tree applied a genetically evolved function at its end points both during training and testing. The fitness measure during evolution was obtained by playing one game against Edgar. Variations on training parameters included population size, number of generations, crossover percentage, and whether the complexity of the function affected the fitness measure. A static "greedy" player and myself tested the performance of the best individuals.
In order to perform this experiment I had to add a few classes to the Othello implementation that was given for the assignment. OthelloSearchTester and OthelloSearchTester are two similar classes that are derived from the OthelloPlayer class. Both override the non-searching bestMove evaluation function and implement an Alpha-Beta search in its place. The depth of the search during both training and testing was held constant at two. OthelloGreedyPlayer also is derived from OthelloPlayer, which is used during testing. Its evaluation function simply returns the total amount of its colors pieces resulting from a move.
I also changed some parts of the existing code. I made a change so that during fitness evaluation, each individual plays Edgar one time instead of five. Edgar and the individual make the same moves against each other every game so every game played will have the same outcome. This reduced the execution time of the algorithm and helped to offset the increased training time introduced by Alpha-Beta searching. In addition, I modified OthelloDisplay so I could play a visual game against all the evolved players.
I methodically performed a series of experiments varying the parameters mentioned in the introduction section. The different population sizes that I used were 200, 400, and 600. Each population size was tested for 5, 10 and 15 generations. Each population size and generation was run with crossover at 50% and then again at 90%. The ComplexityAffectsFitness parameter was also varied. Unfortunately, only one run of each experiment was performed because training time was too long to repeat all the experiments. However multiple runs were simulated since all the parameters were repeated for each value of the generation parameter (5, 10, and 15). For example, when a particular variation of the parameters ran for 10 generations it would need to repeat the evolution that was done when it ran for 5 generations.
As previously mentioned, performance of the best individual for each experiment was measured by playing against a greedy Othello player as well as myself. In both cases only one game was played. While the results against the greedy player are fairly objective, the results against myself may be questionable. For example, I saw the evaluation function of each evolved player so there was an opportunity to exploit this knowledge. In addition, I played so much Othello during testing, I feel that I improved and became harder to beat. So although I tried my best to play fair with all the evolved players, the results probably aren't completely objective. To get an idea of my playing ability Table1 shows the results of 5 games with me vs. Edgar.
Table 1: Me vs. Edgar
Some rough baselines for the experiments are presented in Table2. The greedy search player did very well against the random player. Naturally, the searching greedy player defeated its non-searching greedy counterpart. However, the greedy searching player lost to Edgar and to myself. In order to conclude that an evolved player actually learned the concept of Othello more than a simple greedy player it should not only beat a greedy player but it should be able to defeat me as well.
Table 2: Searching Greedy Player versus other players
|Searching Greedy Player:
|Searching Greedy Player:
Non-searching Greedy Player:
|Searching Greedy Player:
|Searching Greedy Player:
The ComplexityAffectsFitness parameter had a substantial impact on the performance of the evolved individual. Table 3 shows the results when the parameter was set to false. All the experiments evolved an individual that achieved a terminal 0 fitness measure in the second generation. This means the evolved player completely shut out Edgar. However they performed poorly during testing. Three of them beat the greedy player but none of them were able to defeat me. The resulting function trees were large. The trees have 13 to 63 nodes. By comparison, the trees grown with ComplexityAffectsFitness set to true have 3 to 11 nodes. Clearly overlearning took place. The large function trees that were grown were able to play a perfect game against Edgar by exploiting its specific strategy. However they failed to generalize the concept of Othello.
Crossover percentage seemed to play a small part when the ComplexityAffectsFitness parameter is turned on. The three times in which the evolved individual defeated the greedy player seems to corresponded to the cases when the crossover percentage was set to 50%.
Table 3: Results of experiments with ComplexityAffectsFitness set to false*
|Population||Generations / Terminated||Crossover %||Tree Length||Best Fitness (vs. Edgar)||Beat Greedy Player||Beat Terry (1st game)|
|200||5 / 1||50||63||0.00||Yes||No|
|200||5 / 1||90||27||0.00||No||No|
|400||5 / 1||50||13||0.00||Yes||No|
|400||5 / 1||90||21||0.00||No||No|
|600||5 / 1||50||27||0.00||Yes||No|
|600||5 / 1||90||59||0.00||Yes||No|
* The results of the experiments with generation set to 10 and 15 are omitted from Table 3 since they also reached their terminal fitness at generation 1 and therefore their results are comparable to the ones given in the table.
As a group, the experiments run with the value of ComplexityAffectsFitness set to true did much better learning the concept of a good Othello player. Of the 18 different experiments I ran (shown in Table 4), 10 of them were able to beat the greedy player. In addition, 4 of them were able to beat me in the first game I played against them. This indicates that penalizing an individual for complexity is an effective way to reduce overlearning.
Table 4: Results of experiments with ComplexityAffectsFitness set to true
|Population||Generations||Crossover %||Tree Length||Best Fitness (vs. Edgar)||Beat Greedy Player||Beat Terry (1st game)|
Population size seemed to play a part in the successfulness of the resulting players. Table 5 shows clearly that in the majority of cases where the evolved player beat the greedy player or myself the population size was set at 400. Table 5 also shows that players evolved with populations of 600 performed worse than players do with a population of 200 or 400. Its puzzling why a population size of 400 produced the most successful players and a population of 600 seemed to produce sub par players. It might simply be that 400 is closer to the optimum population for the current configuration of the Othello project.
Table 5: Victories for evolved players based on their population size
|Population Size||Beat Greedy Player||Beat Terry||Total victories|
Intuitively one might think that with a population of 600, more generations would be needed for the effective hypothesis parts to come together form something useful. However if this were the case, then one might expect the performance of the best individuals to improve as the generations increased from 5 to 15. The results dont support this. It may be the case that a population of 600 individuals need more than 15 generations to evolve.
It is interesting to point out that all of the evolved programs listed in Table 4 probably beat Edgar during the fitness evaluation. All of the fitness measures are less than 32 which is the number needed to tie an opponent in Othello. The fitness measure is the number of Edgars pieces on the board at the end of the game plus the complexity penalty. Anything less than 32 and Edgar probably lost since he probably had fewer pieces than the evolved function. Alternatively, Edgar could have shut out the evolved player and therefore could have won with a final piece count less than 32. However this scenario is less probable so for now lets go with the assumption that the evolved player won. Since Edgar beat the searching greedy player and the best evolved player which beat Edgar didnt beat the non-searching greedy player during performance testing, it seems probable that overlearning took place. This is somewhat disconcerting because some trees that apparently overlearned in Table 4 only have 3 nodes. This means than invoking a penalty for complexity cant stop overlearning by itself.
The other parameters that were varied during the experiments didnt have much of an impact on the performance. Tables 6 and 7 present the victories by generation and crossover percentage respectively. As you can see by this data, it is not as easy to draw a correlation between number of generations or crossover percentage and how successfully the evolved players perform against unseen opponents.
Table 6: Victories for evolved players based on the number of generations
|Generations||Beat Greedy Player||Beat Terry||Total against Greedy and Terry|
Table 7: Victories for evolved players based on the percent of crossover
|Crossover %||Beat Greedy Player||Beat Terry||Total against Greedy and Terry|
Table 8 shows some of the evolved function trees.
Table 8: Examples of evolved functions.
|Population||Generations||Crossover %||Fitness Measure (vs. Edgar)||Beat Greedy Player||Beat Terry (1st Game)||Tree|
|200||5||90||20||No||Yes||( - ( / black_edges black_near_corners ) black_near_corners )|
|400||5||90||18||Yes||Yes||( - 10 ( - ( * white_near_corners ( * white_corners white )) white ))|
|400||15||90||9||Yes||No||( - ( * white_near_corners ( * ( - black_edges 10 ) black )) white )|
There were a number of problems and anomalies I encountered during the experiments. Most of the time I managed to work around them so they didnt have an impact on the accuracy of the results. For example, I originally planned to use the RandomPlayer for fitness testing. However, the RandomPlayer behaved too randomly for me to trust it for the experiments. Playing the random player against the greedy player had different outcomes on different Java VMs. The random player won consistently on IBMs VM and lost consistently on Suns VM. I worked around this by using Edgar for fitness testing.
One existing anomaly leads me to question the accuracy of the results. Recall that the fitness measure during evolution is the amount of Edgars pieces on the board at the end of the game (plus a penalty if the function is large with the ComplexityAffectsPerformance flag turned on). Recall also that the outcome of a game between Edgar and an evolved player is always the same because both players will make the same moves during the game against each other. I tried to verify this by playing Edgar against an evolved player. The score didnt always match. For example, in all of the experiments where I turned off the ComplexityAffectsPerformance flag, the best individuals had fitness measures of 0 indicating a decisive defeat of Edgar. However when I used one of these evolved trees to play a separate game with Edgar, Edgar actually won. I did a little debugging and found that when a tree is relatively small (3 to 5 nodes) the fitness measure matched the result of a game with the tree and Edgar. However for larger trees the fitness measure didnt always match the result of a game with Edgar. I suspect that there might be a bug within the tree creation or tree parsing routine within the Othello implementation. This behavior would be expected if the internal representation of an evolved tree in a GPOthelloGP object is different than the generated String representation. This anomaly causes me not to have less confidence in the accuracy of the performance results of the larger trees. In particular, I question the performance results of the whole set of experiments run with the ComplexityAffectsFitness flag set to false (shown in Table 3).
Assuming that the results are correct the following conclusions can be made: