Using Diagonals, Time, and Complexity for a Better Othello Player
Written by Jesse Schechter

Introduction

The effectiveness of an Othello player evolved by genetic programming depends directly on the usefulness of the non-terminals to which it has access. Without meaningful non-terminals, such as the number of pieces in the corners, the program is blind to important aspects of the game that cannot be ascertained from the number of pieces on the board or other simple concepts.

Another consideration for a genetic programming task is the selection of learning parameters, including how large the programs are allowed to grow. A tree that grows too big will be difficult to understand and will take longer to test. Following Occam's razor, which holds that the simplest solution with a given performance is probably the best, genetic programming systems often penalize programs that grow too complex.

In this experiment I added three new non-terminals to the basic Othello player: the number of white pieces on the diagonals, the number of black pieces on the diagonals, and the number of moves made so far. The motivation for these changes was based on my own experience playing Othello and watching other programs, like EDGAR, play. The number of pieces on the edges seemed less important to me early in the game and became more important near the end, when those pieces could not be changed as easily. Because of this difference I decided to see whether the evolved programs would benefit from having access to information about how far into the game they were. I added the diagonals because they act as a variation on an edge: if an entire diagonal is owned by one color, it gives that player an advantage toward earning the corner pieces.

How does time help the program learn? By itself, time cannot distinguish one candidate board from another, but it can be used to scale other factors over the course of the game. Even though the time factor has the same value for every board being compared at a given move, it lets some factors carry more weight at the end of the game than at the beginning, allowing the hypothesis to compare board states according to different attributes at different times.

For example, one tree I obtained used (time * white), which conceptually might mean that having a lot of pieces on the board is relatively unimportant at the beginning of the game compared to later. This makes sense: pieces can change hands much more easily at the beginning than at the end, so the time factor keeps the hypothesis from getting too greedy early in the game but puts a heavy emphasis on piece count at the end, which also makes sense. Other factors are allowed to have more relative influence early in the game.
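
As a rough numerical illustration of how such a term shifts emphasis over a game (the piece counts below are made up purely for illustration):

# Made-up piece counts to illustrate how (time * white) shifts emphasis.
# Here "time" is the total number of pieces on the board, white + black.
early = 12 * 6    # early game: 12 pieces played, 6 white -> term contributes 72
late = 58 * 30    # late game: 58 pieces played, 30 white -> term contributes 1740
print(early, late)
# A similar white-piece advantage is weighted far more heavily late in the
# game, leaving other attributes to dominate the evaluation early on.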

A similar hypothesis that used just "white" without the time scaling might make decisions early in the game that look good but lead to local maxima that are poor in the long term, and it might not pay enough attention to the number of pieces on the board late in the game, since the number of white pieces would count the same at the beginning as at the end.

In short, time by itself is not useful for evaluating a board state, but it seems to help the board evaluation by adding a scaling factor to other terms. Since the scale is the same for every board being compared, it may simply replicate a constant factor at any one moment, but over the course of a game it lets the tree emphasize some ideas at the beginning and others at the end by changing the relative importance of the attributes it scales, like the number of white pieces in the example above. Even though all candidate boards share the same time value, a board with the wrong mix of attributes for that point in the game is penalized more, so the evaluation applies a dynamic weighting to the different attributes of the board.

I changed some of the learning parameters from the basic design with which we started. Instead of 200 hypotheses per generation I used only 100, and I trained over 31 generations instead of 51, mostly to save time. The biggest change was removing the complexity penalty on individual hypotheses. I was not as concerned with understanding the resulting player as with getting the best one I could, and testing was not slowed down unreasonably. This change allowed important parts of a tree to be used more than once, removing the need for numeric constants and giving effective subtrees a better chance of being crossed into new hypotheses.

I tested each resulting program 50 times against a random player. Since my purpose was to compare trees that used the above changes against trees that did not, relative results were easier to obtain against a random player, where the chance of winning was higher. Against EDGAR or a human I might have been comparing one win to two (if a player got lucky), so I decided the random player would give better numbers for comparison.


Approach

The non-terminal "time" was implemented by adding the number of white pieces to the number of black pieces on the board, which gives a measure of how many moves have been made. The non-terminals concerning the diagonals were implemented by checking whether a piece's x and y coordinates were equal or added up to nine (for example, [1, 8], the first row and last column, or [1, 1], the first row and first column). Unfortunately I did not exclude the corners from the definition of the diagonals, so the two concepts might have interfered with each other.
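
For illustration, a minimal Python sketch of these computations, assuming a board stored as an 8x8 list of lists containing 'white', 'black', or None (the names and representation here are hypothetical, not the framework's actual code):

def num_pieces(board, color):
    # Count the pieces of one color anywhere on the board.
    return sum(1 for row in board for square in row if square == color)

def num_time(board):
    # "time" = total pieces played so far, white plus black.
    return num_pieces(board, 'white') + num_pieces(board, 'black')

def num_diagonals(board, color):
    # Count the given color's pieces on the two main diagonals.  With
    # 1-indexed coordinates (x, y), a square is on a diagonal when x == y
    # or x + y == 9.  As noted above, the corners are not excluded.
    total = 0
    for x in range(1, 9):
        for y in range(1, 9):
            if (x == y or x + y == 9) and board[x - 1][y - 1] == color:
                total += 1
    return total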

Since all of the learning-method parameters were defined in a single file, the parameter changes were implemented simply by editing the relevant numbers.
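
For concreteness, the three changes amount to something like the following (these names are hypothetical stand-ins, not the actual keys in the parameter file):

# Hypothetical stand-ins for the three parameter changes described above.
params = {
    "population_size": 100,     # reduced from 200 hypotheses per generation
    "num_generations": 31,      # reduced from 51
    "complexity_penalty": 0.0,  # penalty for large trees removed entirely
}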


Results

Of the five best trees that resulted from my experiments with the new primitives, two used the concept of time, and two were complex enough that they would not have survived a run in which complexity was penalized. Two of the other programs used only the concept "white", and the last used "white" and "white_corners". The best program, which won 44 of 50 games against the random player, used the non-terminal "time" but not the diagonals. It was relatively complex, using 21 non-terminals compared with only two for the second-best program, which won 42 games. The program that did use the diagonals had 15 non-terminals and won 36 games. The last two programs in the top five each had two non-terminals and won 37 and 39 games, respectively; neither used time or the diagonals.

The best program contains several repeated subtrees, including (- black_near_corners black_corners), which was used four times, (* time (- black_near_corners black_corners)), which was used twice, and (- white black_corners), which was also used twice. This repetition replaces automatically defined functions (ADFs) in a sense, by allowing subtrees to grow and reproduce regardless of how big the resulting tree becomes; a particularly useful subtree can be replicated within a tree as if it were an ADF. The advantage of avoiding ADFs this way is that we do not have to know beforehand how many ADFs to allow, and the important parts of these pseudo-ADFs can be crossed into other trees as smaller subtrees, as with the (* time (- black_near_corners black_corners)) subtree just mentioned, whose own subtree (- black_near_corners black_corners) was also used on its own. Although this process might have happened in the other direction, this paring down of important subtrees and their recombination is easier without ADFs and is made possible by allowing complex trees.
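
To make the idea of pseudo-ADFs concrete, here is a small sketch of how repeated subtrees can be detected in a nested-expression tree; the example expression is made up, and only the named subtrees are taken from the actual best program:

from collections import Counter

def subtrees(expr):
    # Yield every subtree of an expression represented as nested tuples.
    yield expr
    if isinstance(expr, tuple):
        for child in expr[1:]:
            yield from subtrees(child)

useful = ('-', 'black_near_corners', 'black_corners')
# Made-up tree that reuses the subtree twice, once inside a time-scaled term.
example = ('*', ('*', 'time', useful),
                ('-', useful, ('-', 'white', 'black_corners')))

counts = Counter(subtrees(example))
print(counts[useful])  # prints 2: the shared subtree acts like a reusable ADF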

The program that used both the "time" and "diagonals" concepts had a different type of repetition. This tree is best described in prefix notation:

* * * * - * - - * * * - - * time white white_diagonals white_diagonals white white white white_diagonals white_diagonals white white_diagonals white white white white.
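
Reading this listing with * and - as binary operators gives the fully parenthesized form:

(* (* (* (* (- (* (- (- (* (* (* (- (- (* time white) white_diagonals) white_diagonals) white) white) white) white_diagonals) white_diagonals) white) white_diagonals) white) white) white) white)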

Starting from the most deeply embedded subtree, (* time white), the number of white pieces is multiplied in again each time it appears, showing that it is beneficial, while the number of white pieces on the diagonals is subtracted, showing that it is undesirable to the player. The fact that these attributes are used more than once shows that a complex tree can replicate the effect of numerical constants by repeating the relevant subtrees. Here again, allowing a complex tree saves the designer from having to preset whatever constants might be needed, just as the number of ADFs does not need to be set.


Conclusions

Of the two concepts introduced as new non-terminals, both seemed somewhat useful and were used in different cases by the evolved programs. Between them, the time element appeared more often in the successful trees, including the most successful one, indicating that it may be a more important concept than the diagonals, although this might be due to the overlap between the corners and the diagonals. Future Othello-playing programs may wish to consider these concepts along with edges and corners.

The most surprising result was the emergence of pseudo-ADFs and the simulation of constants by repetition. These results imply that although simple solutions are often desirable for understanding and speed, allowing complex solutions can let useful properties emerge without forcing the designer to make assumptions about their form. Instead of penalizing hypotheses for being complex, and thus losing the benefits larger trees may offer, it might be better to restrict how complex a solution can be in an absolute sense but show no bias toward the simpler of two hypotheses that are both under the maximum complexity. The more complex solution might be able to contribute more in future generations.