# Othello Unbowed

## Barry Schiffman

We are given a genetic programming shell written in Java and extended classes needed to develop a machine-learned Othello program. The extensions include facilities to play the game, to evolve and test players, and an expert player called Edgar.
Our long-term goal is to produce a program -- a simple mathematical function to evaluate potential board states with only addition, subtraction, multiplication and division -- that will best Edgar in a 50-game contest.

# Overview

We are going to expand the hypothesis representation in the hope of evolving a player that can consider moves by looking ahead at the opportunities inherent in a particular move. Using observations about the parity of the row and column indices as well as the quadrant of the move, we hope the player will learn to position itself for a favorable endgame.

In learning methods, we are going to try two methods. We are going to train directly against a modified Edgar and we are going to try to evolve against several opponents, beginning with a random player, and then training against the player that results. The modified Edgar is a version of Edgar that plays randomly for a specified number of turns, and then plays as Edgar does. To facilitate these goals, we have extended the code to allow us to alter the player color we are training as and the player type that we train against (Random, Heuristic, RandomHeuristic, Edgar, RandomEdgar) from the .ini file.

For evaluation, we will run various tests against Edgar and against some players using some common-sense heuristics. Again, in order to play a variety of games, we will use the extended code to make a small number of random moves before the players' intelligence kicks in.

# Alteration to the Hypothesis

In view of the short development time, we decided to rework the description of the hypothesis although the basic concept is similar.

We decided to stay with the four arithmetical operators and to expand the concepts inherent in the given primitives. The thinking is that some perception of board location will be sufficient to allow the evolutionary operations of the underlying GP program to find an effective evaluation formula for moves by our automated player. We believe that this approach is reasonable for a board game played on a finite space with a finite number of moves.
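To make the hypothesis space concrete, here is a minimal sketch, in Java like the GP shell, of how a prefix-notation formula over the four operators and board-feature terminals can be evaluated. The class and method names are our own for illustration; the shell's actual expression-tree classes differ, and the protected division (returning 1 when the denominator is 0) is a common GP convention we assume rather than the shell's documented behavior.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Map;

// Illustrative evaluator for prefix formulas such as "(- black_edge black)".
// Terminals are looked up in a map of counts; bare numbers (e.g. 10) are constants.
public class FormulaEval {

    public static double eval(String formula, Map<String, Double> terminals) {
        // Parentheses are only grouping sugar in these listings; strip them.
        String cleaned = formula.replace("(", " ").replace(")", " ").trim();
        Deque<String> tokens = new ArrayDeque<>(Arrays.asList(cleaned.split("\\s+")));
        return evalTokens(tokens, terminals);
    }

    private static double evalTokens(Deque<String> tokens, Map<String, Double> t) {
        String tok = tokens.removeFirst();
        switch (tok) {
            case "+": return evalTokens(tokens, t) + evalTokens(tokens, t);
            case "-": return evalTokens(tokens, t) - evalTokens(tokens, t);
            case "*": return evalTokens(tokens, t) * evalTokens(tokens, t);
            case "/": {
                double num = evalTokens(tokens, t);
                double den = evalTokens(tokens, t);
                return den == 0 ? 1.0 : num / den; // protected division (assumed convention)
            }
            default:
                if (t.containsKey(tok)) return t.get(tok); // board-feature terminal
                return Double.parseDouble(tok);            // numeric constant
        }
    }
}
```

The evolved player would compute such a value for the board resulting from each legal move and pick the move with the best score.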

Our opponent, Edgar, maintains a substantial database of learned weights for various locations and evaluates moves according to those tables.

As a starting point, the GPOthello player we are given is able to perceive the exact location of all squares on the board and keeps count of specific positions considered important, including the corners, which are advantageous, and the near corners, which are dangerous. This player also tracks the total number of pieces.

We expanded this notion and examine all locations with respect to future moves, employing a calculation of parity that depends on which quadrant of the board we are considering. We observe that when we make a move, the next open square in that direction (whether along a row, column or diagonal) may be taken by the opponent. We are looking to set up moves into the corners and away from the near corners.

The key observation is that a contiguous set of marked squares exists starting from the center of the board, and each move consists of marking a square on the fringe -- in the direction of one edge. The row, column or diagonal may then be flipped to the color of the maker of the move. The board is 1..8 by 1..8. As a heuristic to guide the choice between moves, we can easily compute the potential of a move. If the player extends the set of marked squares in the direction of row 1, it would be detrimental to be on an even-rowed square, since that will not lead to the capture of an edge piece. The only way to recover from a "bad" move, i.e. a move heading toward a near corner, would be to capture a row or column that had been flipped by the opponent, assuming the opponent chooses a sequence of moves that lands in a near corner.
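The parity test itself is simple. Below is a hypothetical sketch of it: with indices running 1..8, a square on an odd index is favorably placed for extending toward row 1 or column 1 (row 1 is odd), and a square on an even index is favorably placed for extending toward row 8 or column 8 (row 8 is even). The class, the `isLo`/`isHi`/`quadrant` names, and the quadrant numbering are our own for illustration; the real terminals must additionally handle the diagonal cases per quadrant.

```java
// Illustrative parity/quadrant helpers for an 8x8 board with indices 1..8.
public class Parity {

    // Favorable position for extending toward row 1 or col 1 ("lo" squares):
    // odd indices share the parity of the target edge (row/col 1).
    public static boolean isLo(int index) {
        return index % 2 == 1;
    }

    // Favorable position for extending toward row 8 or col 8 ("hi" squares):
    // even indices share the parity of the target edge (row/col 8).
    public static boolean isHi(int index) {
        return index % 2 == 0;
    }

    // Quadrants numbered 1..4 (NW, NE, SW, SE); the diagonal parity cases
    // depend on which quadrant the square lies in.
    public static int quadrant(int row, int col) {
        return (row <= 4 ? 0 : 2) + (col <= 4 ? 1 : 2);
    }
}
```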

All eight adjacent squares have to be checked to see which are open, and the parity computation has to be made in all open directions -- possibly as many as seven but more likely three -- for each potential move, including diagonals. Since the quadrants of the board have to be taken into account for the diagonals, there are a lot of cases. Counts are maintained for:

• Corner squares
• Edges (not including squares adjacent to corners)
• Total squares
• Hi squares, those in favorable position to move toward Row 8 or Col 8.
• Lo squares, those in favorable position to move toward Row 1 or Col 1.

All counts are kept for both black and white pieces in the hope that the subtraction operator will be used to adjust for an opponent's relative strength. Moves along the diagonals are counted twice.
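A sketch of the simpler counting terminals, assuming a board represented as an 8x8 char array with 'b', 'w', or ' ' per square (the GP shell's own board class differs, and the hi/lo and diagonal counts described above are omitted here). "Edge" squares are border squares in positions 3 through 6, which excludes both corners and near-corners:

```java
// Illustrative counting terminals over an assumed char[8][8] board.
public class TerminalCounts {

    // Number of the four corner squares held by `color`.
    public static int corners(char[][] board, char color) {
        int n = 0;
        int[][] cs = {{0, 0}, {0, 7}, {7, 0}, {7, 7}};
        for (int[] c : cs) if (board[c[0]][c[1]] == color) n++;
        return n;
    }

    // Border squares in positions 3..6 (zero-indexed 2..5) held by `color`,
    // deliberately excluding corners and near-corners.
    public static int edges(char[][] board, char color) {
        int n = 0;
        for (int i = 2; i <= 5; i++) {
            if (board[0][i] == color) n++; // top edge
            if (board[7][i] == color) n++; // bottom edge
            if (board[i][0] == color) n++; // left edge
            if (board[i][7] == color) n++; // right edge
        }
        return n;
    }

    // Total pieces of `color` anywhere on the board.
    public static int total(char[][] board, char color) {
        int n = 0;
        for (char[] row : board)
            for (char sq : row)
                if (sq == color) n++;
        return n;
    }
}
```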

The given GP Othello player marks 16 specific locations for counting -- a corner and three near-corners for all four corners of the board -- as well as total pieces. While our arrangement adds work to this scheme, note that the computational effort per move would be the same for a 100 by 100 board as for an 8 by 8 board.

# Learning Methods

We will try two learning methods to evolve a player. One involves training directly against a modified Edgar opponent and one involves successive evolutions, beginning with training against a random player and once we can beat the random player, training against that player.

The motivation for training against Edgar is clear. Edgar is an expert player who seems able to beat most people, although some in the class say they have taken his measure. In order to train effectively against Edgar, though, we want some variability in the games, since each individual plays five games against the opponent to measure its fitness. Since the "programs" we are developing are completely deterministic, and Edgar is as well, they would play the exact same game five times. We made a modified Edgar that plays as a Random player for a user-specified number of turns, and then begins to play like the normal Edgar. This gives us different results across the set of five games played between our individuals and the modified Edgar. It is not clear at this time, however, what the threshold should be for Edgar to switch from Random play to Edgar play.
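The modified Edgar can be pictured as a thin wrapper: a minimal sketch follows, in which the `Strategy` interface, the integer move encoding, and the class names are assumptions of ours rather than the shell's actual player API.

```java
import java.util.List;
import java.util.Random;

// Assumed minimal strategy interface: pick one of the legal moves.
interface Strategy {
    int chooseMove(List<Integer> legalMoves);
}

// Plays uniformly at random for the first `randomTurns` of its own moves,
// then delegates every later move to the wrapped expert (e.g. Edgar).
public class RandomThenExpert implements Strategy {
    private final Strategy expert;
    private final int randomTurns;
    private final Random rng = new Random();
    private int turnsPlayed = 0;

    public RandomThenExpert(Strategy expert, int randomTurns) {
        this.expert = expert;
        this.randomTurns = randomTurns;
    }

    @Override
    public int chooseMove(List<Integer> legalMoves) {
        turnsPlayed++;
        if (turnsPlayed <= randomTurns) {
            return legalMoves.get(rng.nextInt(legalMoves.size()));
        }
        return expert.chooseMove(legalMoves); // deterministic expert play from here on
    }
}
```

Because the random prefix perturbs the board Edgar inherits, each of the five fitness games can unfold differently even though both sides are deterministic afterward.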

Initially one might think that by lengthening the amount of time that Edgar plays randomly we will be decreasing his intelligence, thus creating an opponent that our programs are more likely to be able to beat. It is possible, however, that the game of Othello is often won or lost by the "end game" or that the importance of the last few moves is disproportionate to the importance of the moves in the rest of the game. Under this framework our individuals will still have a tough time because Edgar will make the last few moves of the game correctly, negating whatever poor initial performance resulted from earlier Random play.

The motivation for evolving against successive evolutions of individuals is basically an elaboration based on a Darwinian metaphor. A species will develop gradually under a certain set of circumstances: a stable climate and ecology. Then some cataclysmic event changes the world, a collision with a comet, an ice age or a new species, and a species must make drastic adaptations or die out.

Indeed, we found that in the first evolution, a player that reliably defeated the random player evolved. It had a relatively simple formula that often reduced to counting the number of its pieces. The formulas grew increasingly sophisticated and defeated the random player in a much higher percentage of games and with much higher scores.

To make experimentation feasible in such a short time, we extended the GP Othello code to let us change, from the .ini files, the color we train as and the type of player we train against (random, heuristic, Edgar) without recompiling, and to create a larger variety of games by choosing a threshold at which players begin to use their "intelligent" behavior.

Using this facility, we can try to avoid the danger of overfitting to the particular opening played by a heuristic player of our own making or by Edgar, since we can set the first few moves on one or both sides to be random.

We noticed that we could train against Edgar and occasionally come up with a successful formula for that specific match, with the exact same sequence of moves on both sides. When tested in a more complete way later, this player proved to be incompetent. With the slightest variation on the opening moves, Edgar prevailed.

# Evaluation

In the evaluation phase of this experiment, we will use similar extensions of the code to allow different test runs, switching color and type of player on the command line of the OthelloCache class.

Our main evaluation was against Edgar modified to play an arbitrary number of moves at random before switching to Edgar strategy. We also report the strength of the various players against the Random player. We devoted considerable time over the two weeks to exploring why Edgar was so tough to beat and to getting closer to a solution.

What we did was rework the code to make it possible to test our players and Edgar thoroughly, by allowing play to start off at random and then switch to the deterministic strategies of our formulas and Edgar's tables.

Establishing a baseline was thus difficult: the random player turned out to be too easy to beat to be a telling benchmark, and Edgar too hard. We tried to create a heuristic by hand that would embody some qualities of a good player and to use it in a test against our evolved functions. While these were able to beat a random player (after all, random against random should produce a 25-25 split), none made progress against Edgar.

# Results

Overall, we were not happy with the results. We were never able to make a strong challenge to Edgar.

## Evolving Against Edgar

When evolving against Edgar, or some random version of him, we were not able to develop consistently strong Othello players. All of the best functions that we evolved leveled off at a fitness of about 60-70, and could not eclipse this level of performance. This seemed to be a barrier that we could not break regardless of how long we told Edgar to be random for -- I tried two main evolutions at 10 random turns and 15 random turns, as well as some side evolutions at 5 random turns. All of the results yielded similarly performing functions, which leads me to believe that Edgar is able to make up for any initial random behavior with his good end game. In a situation like this, it would be ideal to give feedback to a GP as the game progresses, instead of just at the end of the game, since Othello seems to be split into two distinct temporal units.

The best function from an evolution against an Edgar that plays randomly for 15 moves is:

```
( + ( - black_edge ( + ( + black_hi black_edge ) black_hi )) black_corners )
```

This player generally loses to Edgar, but not always. Playing against the same Edgar that this function trained against -- an Edgar that plays randomly for the first 15 moves -- this player won 12 games to Edgar's 38. Of these 12 wins, 9 were unique, while the other 3 were games that played along the same path. The average margin of victory for Edgar was 26 pieces, while our player's average margin of victory was 17.1 pieces.

Against a random (white) opponent,

White won 37; black won 13

Despite having some success against Edgar, this player does poorly against a random opponent. I believe this is because the player has "overlearned" how to play Othello against this particular Edgar; in general, these are not good strategies to apply against a random opponent. Looking at the games that we won against Edgar, one of the common winning sequences was a shutout of Edgar at 13-0, which tended to lower the average margin of victory. Since most of the GPs in the population will lose to Edgar during evolution by a pretty high margin, any algorithm that can sometimes win will receive a marked increase in its fitness relative to the rest of the population. Random play for a certain number of turns has the effect of giving Edgar a different initial board, from which he then plays deterministically. This player took advantage of Edgar's determinism to develop a strategy that could shut out Edgar early in the game from a certain initial board, thus specializing itself to one of Edgar's deterministic paths.

The best function from the second evolution was:

```
( + black_corners black_lo )
```

This function does not win at all against Edgar; it lost to Edgar by an average margin of 24.2 pieces. Still, this seems an improvement over a random player against Edgar, which in 50 games lost them all by an average of 34.4 pieces.

Against a random opponent,

White won 9; black won 41

This function does do better against a random opponent, although not as well as our best functions that trained on a Random as an opponent. (The best player from those evolutions beat Random players 47 times out of 50 games.) In this case, the player did not do as well against Edgar, but was able to learn enough through training against him to beat a Random opponent with some regularity.

The best function from the third evolution was:

```
( / ( + ( + 10 ( * black_edge ( / 10 white_lo ))) ( - white_hi white )) white )
```

Against a RandomEdgar who is random for 15 rounds, this function lost 45 games by an average margin of 20.6 pieces, and won 5 games by an average margin of 19.4 pieces.

Against a random opponent,

White won 1; black won 49

This is even better than previous attempts against a random player; however, it is still not good enough to best Edgar consistently! This was the last real evolution that I attempted against some form of Edgar, because I ran out of time at this point. I am not sure that we would have been able to evolve a formula that could beat Edgar with the terminals we used to express our evaluation functions; I believe that a more expressive set of terminals is necessary.

Curiously, a function that evolved against a random Edgar found a way to beat him. The function was:

```
* black_hi / black white_lo
```
and shut Edgar out 33-0. Against a random player, however, this function only won 43 games and lost 7. That is not terrible performance, but we were able to achieve better performance against random by evolving directly against random. Against a randomized Edgar, however, this function loses the majority of its games. It will still sometimes win at 33-0, but that is the only "game" it can win; the other games, which Edgar takes over after a few moves, are wins for Edgar. In fact, the more randomly the Edgar opponent plays, the better he does; with two rounds of random play, Edgar wins 26 of the games. When he plays randomly for three rounds, he wins 38, and at four rounds he wins 45. This percentage continues to increase the longer Edgar plays randomly, up to a certain point. This particular player is able to beat Edgar only when he takes the moves that he would take if he played deterministically the entire game. The more rounds that Edgar plays randomly, the less likely he is to follow the pattern that this function is able to beat. In this case the function has really overlearned for Edgar, but unlike the first function in this section, this function had some success against a Random player as well.

## Evolving Against Random

We found early on that it was easy to evolve a player that beat a random player. In some cases, the formula that evolved was simply to count the number of white pieces.

We did get some more interesting formulas, like:

```
(- black_edge black)
```

But the tally for 50 games against a random white player with this heuristic was:

White won 42; black won 8
Total White Pcs: 1512; total black pcs: 572

Since none of the early formulas were very interesting, or seemed to come close to beating Edgar, we decided to evolve against the best-looking function from the first evolution. To do this we reversed the colors, so that the formula (- black_edge black), which evolved from the first training against a black random player, became

`(- white_edge white)`
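The color reversal is mechanical: every black_* terminal becomes the corresponding white_* terminal and vice versa. A one-method sketch of this rewrite (the class name and the sentinel-based swap are ours; we applied the reversal by hand):

```java
// Swap black_* and white_* terminals in a formula string so that a
// formula evolved for one color can seed an evolution for the other.
public class SwapColors {
    public static String swap(String formula) {
        // Use a sentinel so the two replacements do not clobber each other.
        return formula.replace("black", "\u0000")
                      .replace("white", "black")
                      .replace("\u0000", "white");
    }
}
```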

The second evolution produced some more interesting results:

```
- + / white_edge white_edge - white white_lo white_hi
```

White won 47; black won 3
Total White Pcs: 1971; total black pcs: 1229

and

```
/ + / + white_lo white_corners + white_lo
* / white_edge black_lo - + white / 10
black white_edge white_edge white_hi
```

White won 46; black won 4
Total White Pcs: 2246; total black pcs: 945

In comparison with our second evolution results, we trained against Eleazar's parameters and evolved a player at roughly the same level (most of the formulas from these runs did not use his explicit count of near corners):

`     - * white white + white black_corners`

White won 45; black won 5
Total White Pcs: 1947; total black pcs: 1231

The obvious move was to try a third evolution, training against one of the formulas obtained in the second evolution, and we obtained a very odd result: the formula came out to be a constant, 10.

We had no time to repeat this experiment, and the results are suspect, but it may be that our formula was vulnerable to a player that worked toward the NW corner of the board first, as a constant formula would favor the choice of moves at the head of the list, which would be the square closest to (1,1).

We began to suspect that the white player seems to have an inherent advantage. After we took a look at the original white players, we decided that they were doing very poorly, and once we had this suspicion, we tested several simple heuristics applied to both colors, as our version of the OthelloCache class allowed us to do.

When we ran a white player that simply counted white pieces against a black player that counted black pieces, both sides playing at random for the first two moves, we got:

White won 38; black won 12
Total White Pcs: 1691; total black pcs: 1503

And we repeated the experiment.

White won 40; black won 10
Total White Pcs: 1812; total black pcs: 1386

At this stage, we began training a black player against a white random player.

We were encouraged when we tested the black formula evolved against the white random player, and there was an almost Edgar-like performance:

```
- / * white_hi black_corners white_corners black_corners
```

White won 1; black won 49
Total White Pcs: 347; total black pcs: 2812

But Edgar beats the Black player trained against a random white:

```
- / * white_hi black_corners white_corners black_corners
```
White won 50; black won 0
Total White Pcs: 2364; total black pcs: 836

We also trained a black player against our more highly evolved white player and came up with this one formula twice, after 36 hours of evolution (just two runs):

`+ black_edge black_corners`

When we tested it against a random white player, the results were not impressive:

White won 36; black won 14
Total White Pcs: 1819; total black pcs: 1375

Against Edgar, it did much worse.

White won 50; black won 0
Total White Pcs: 2014; total black pcs: 1186

Needless to say, we ran out of time trying to explore this direction.

Edgar seems to have both a strong opening and a very strong endgame.

When we trained a black player against the original Edgar, we evolved several formulas which could all be more or less reduced to:

`black_hi`

However, this player had no luck against Edgar, when the opening moves were varied, and he failed to win any of the 50 games in the trial.

White won 50; black won 0
Total White Pcs: 2211; total black pcs: 989

But after our expanded code was completed, and we tested our player against Edgar playing the first 10 moves at Random, we had a glimmer of success:

```
black_hi
```

White won 45; black won 5
Total White Pcs: 2052; total black pcs: 1059

Still it must be noted that two of the games won by our player seem to have been won in the very early stages by bottling up one corner of the board, and shutting out Edgar with scores like 25 to 0.

# Conclusions:

This is the frustrating part of the assignment because we have no shining success to examine.

Most of our observations tend to involve thinking about the world of Othello and how one could steer a learning program of any kind toward a successful conclusion.

The thrust of Genetic Programming is to discover methods that are domain independent, but the problems and shortcomings we found concern the domain of board games in general and Othello in particular.

For our purposes here, we can see that in many ways the experiment depended too much on world knowledge (the world of Othello) and yet recognized too little of the game.

For example, the fitness measure, which is a count of the opponent's pieces, assumes that piece count is a key element of the game. Oddly, while the count of the pieces at the end of the game determines the winner, it may be irrelevant at any one stage of the game. If your player captures an entire row to pick up a gain of 6 or 7 pieces and lets a corner slip away, your player is on the road to defeat.

The terminals we chose were an effort to capture just that notion, or rather to allow the GP to capture it for us. The idea was to emphasize locations that would be safe places to work toward the edges and corners, as expressed in the _hi and _lo terminals. The practice, we think, fell short in failing to provide a way to express defensive play. Looking at traces of games makes it clear that Edgar emphasized corners and stayed away from near corners. We kept the terminal for the corners, but instead of near corners we provided a terminal called edge, meaning border squares 3 through 6, so as not to include the near corners.

The new _hi and _lo terminals did improve performance against Edgar as compared to Eleazar's baseline implementation. Using that implementation we were not able to evolve any players that beat Edgar. All of the functions we did evolve that had some form of success against Edgar included the _hi or _lo terminals for either black or white, which I feel helped them to capture a little more information about the Othello game and improve their performance against Edgar.

Against random players, the new terminals did not seem to help, as Eleazar's baseline players could play as well as our players, winning in the 45-49 game range against Random players. It is difficult to say if the addition of new terminals helped in this case because the baseline implementation was sufficient to beat Random players.

Looking back, it seems that the better way to approach this problem in GP terms would be to get rid of the complex geography and replace it with row and col terminals and if then-else statements and relational operators.

We encountered a further problem in finding the proper training regimen: Edgar is too strong, and random is too weak. Our efforts to evolve increasingly powerful species deserve some further exploration, but we were not able to develop the notion well enough.

In the future, it would be interesting to see whether Genetic Programming operations would be able to evolve a workable method with more primitive primitives, and whether they could establish notions of parity and quadrants. But these would have to include comparison operators, as well as odd and even operators, and to consider separate row and column indices rather than the higher level of abstraction obtained from using the two-dimensional index.

Last modified: Fri Dec 12 18:20:18 1997