Adaptability of Genetic Programming Hypotheses |
|
|
Motivation |
The motivation behind this project is the search for the
meta-trait of adaptability. In ordinary populations, we see the population
slowly converging towards a relative notion of "fitness". For example, in
a population where all the food is six feet above the ground, the
individuals who are six feet tall or have the ability to jump are
considered the "most fit". In modern America, the notion of "fitness" is
loosely based on the ability to generate capital. In a situation like
America, traits such as strength and physical robustness are not critical
components of the fitness metric; rather, traits such as intelligence and
economic understanding emerge as the key fitness criteria.
Populations slowly converge towards these relative notions of
fitness until a major change in the environment, like a physical
catastrophe, completely changes what makes an individual fit. For example,
imagine if a cataclysmic event destroyed all technology and set human
civilization back to a medieval living style. In this situation, the
investment bankers, computer scientists and lawyers would no longer be
"fit", and the homeless, because of their survival skills, would probably
emerge as "the fittest".
The point of this argument is not to make a social argument, but
rather to demonstrate that after many generations have been subjected to
catastrophic events and the fitness metric constantly changed, there is an
emergent quality in the remaining members of the population: adaptability
and robustness.
Each of the genotypes in the remaining population was able to
adapt to each new "fitness metric" and emerge as a species capable of
adapting to new situations. The inductive generalization is that if an
individual is fit under a series of different fitness measures, the
individual should also be fit in other situations and hopefully possess
'adaptability'.
In Othello, we see the convergence towards a "local relative
fitness metric" in a hypothesis' ability to adapt to an individual player's
strategy but fail to generalize over many players. Lacking many players
to train with, we decided to vary the fitness metric and change it from
generation to generation (in a cyclical manner) in the hope that our player
would display the emergent property of "adaptability" in general and
hopefully be able to beat a GP player trained with the regular genetic
algorithm.
|
|
Approach |
Fitness Metrics
We developed six different versions of what it meant
to be a "good Othello player." Each metric was created to
value different features of an Othello player's playing style.
Note: In the actual implementation, the fitness
metrics are reversed so that 0 represents the best possible fitness.
- Standard Fitness Metric
The first metric used is the standard Othello fitness metric, i.e.,
points. This metric is used as a baseline which should correlate to
general fitness.
- Speedy Games Favored
This metric favors "fast" games. The metric is based around
dividing the score by the number of moves in a game. A player who wins a
game very quickly is valued much higher than a player who plays a very
slow, defensive game. The metric hopefully encourages an offensive strategy.
- Winning Streaks Favored
This metric is based around "squaring the number of games won."
Since individual players are evaluated over five games, this "winning
streak" fitness metric values a player who wins 3/5 games disproportionately
more than a player that wins 2/5 games. The hope behind this fitness
metric is that it will make it easier for highly fit individuals to
differentiate themselves in the population.
- Slow Games Favored
In this situation, the player's score is multiplied by the number
of moves in the game. The opposite of #2, this metric values slow, defensive
games which take longer, and hopefully encourages "defensive playing."
- Openness of the Game Space
This metric is based on the observation that there might be a
correlation between the "number of moves you can make at any point in the
game" and "how fit you are as an Othello player". This method sums the
possible number of moves for each player over the game (summed at each
round) and then derives a fitness metric by dividing the two. For
example, if over the whole game white had a total of 30 possible moves
and black had a total of ten possible moves, white's fitness would be
30/10 = 3.
- Straight Linear
This straightforward metric is a linear sum of the number of games
won (or lost, depending on how you look at it).
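The six metrics above can be sketched as simple scoring functions. The
following is a minimal illustrative sketch, not the actual implementation:
the per-game record fields (score, num_moves, my_move_sum, opp_move_sum)
and all function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical per-game record; the real code presumably tracks these
# quantities during play. Field names are assumptions, not the real API.
Game = namedtuple("Game", "score num_moves my_move_sum opp_move_sum")

def standard(games):
    """Metric 1: raw point total over all evaluation games."""
    return sum(g.score for g in games)

def speedy(games):
    """Metric 2: score divided by game length favors fast, offensive wins."""
    return sum(g.score / g.num_moves for g in games)

def winning_streak(games_won):
    """Metric 3: squaring the win count rewards streaks disproportionately
    (3/5 wins scores 9, well over the 4 scored by 2/5 wins)."""
    return games_won ** 2

def slow(games):
    """Metric 4: score multiplied by game length favors defensive play."""
    return sum(g.score * g.num_moves for g in games)

def openness(games):
    """Metric 5: ratio of the player's summed move options to the
    opponent's, e.g. 30 options vs. 10 gives a fitness of 3."""
    return sum(g.my_move_sum / g.opp_move_sum for g in games)

def linear(games_won):
    """Metric 6: straight linear count of games won."""
    return games_won
```

Note that metrics 3 and 6 only need the win count, while the others need
per-game statistics; in the actual implementation all six are then reversed
and rescaled so that lower values are better.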
Distribution and Normalization
The main problem with having so many different fitness metrics is
that each time you change the metric (i.e., once per generation), you
have to reevaluate the entire population. Given the extreme slowness of
the regular training process, this constant reevaluation isn't practical.
The alternative to constant reevaluation is to normalize all fitness
metrics into a single range. Each metric produces a different range of
values. For example, metric #1 produces fitnesses from 0 to roughly 300,
#6 produces fitnesses from 0 to 5, and so on. In order to normalize each
measure, we had to gather a priori information about it. To do this, we
ran the GA for each fitness measure and determined the possible range of
values for that measure. Then, during the real run, we used this
previously collected data to normalize each fitness into a value
between 0 and 1.
The assumption behind this process is that the distribution of
values between min and max for each fitness measure is the same. This is
not necessarily a valid assumption, but it is definitely a useful one for
implementing this experiment.
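The normalization and per-generation cycling described above can be
sketched as follows. This is an illustrative sketch only: the (lo, hi)
ranges shown are placeholders standing in for the values gathered from the
preliminary runs, and the metric names and function names are hypothetical.

```python
# A priori (lo, hi) range per metric, gathered by running the GA once per
# metric beforehand. The numbers below are illustrative placeholders.
RANGES = {
    "standard": (0.0, 300.0),   # metric #1: roughly 0-300
    "linear":   (0.0, 5.0),     # metric #6: 0-5 games won
    # ... one (lo, hi) pair per metric
}

# The metric changes once per generation, in a cyclical manner.
METRIC_CYCLE = ["standard", "speedy", "streak", "slow", "openness", "linear"]

def normalize(metric_name, raw, ranges):
    """Map a raw fitness into [0, 1] using the metric's observed range,
    clamping values that fall outside the a priori range."""
    lo, hi = ranges[metric_name]
    clamped = min(max(raw, lo), hi)
    return (clamped - lo) / (hi - lo)

def metric_for_generation(gen):
    """Select the fitness metric for a given generation number."""
    return METRIC_CYCLE[gen % len(METRIC_CYCLE)]
```

The min-max mapping is what carries the assumption stated above: it treats
every metric's values as comparably distributed between its min and max.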
|
|
Testing |
The goal of our testing was to discover if a more diverse
hypothesis, obtained by training against multiple fitness
attributes, was more successful against the two
players that we have, Edgar and the Random Player. In order
to show this, we trained four different Othello players:
[The standard genetic program evaluates fitness
based on the number of pieces the opponent possesses, whereas
the modified code varies between 4 different fitness
evaluation routines.]
- Standard Genetic Programming against Random Player
- Standard Genetic Programming against Edgar
- Modified Code against Random Player
- Modified Code against Edgar
The goal of our test was not to evaluate the different
fitness evaluation mechanisms individually but rather to evaluate
their combined effect in a changing
environment. Therefore, we deliberately did not test the
individual fitness evaluation routines.
For control values against which we could compare, we
trained the initial code to play against the random player
and Edgar. We trained with a population of 100 for 100
generations. We kept the termination fitness value (the
point at which the run would print out the best tree) as low
as possible so we could get the most generational results.
We did this because our modifications depend on the
existence and development of many generations, and we thought
that the comparison to the results of the standard
code would be fairer if we used a comparable number of generations.
Training took a lot of time. Due to the extra time
spent running the genetic programs prior to the actual
testing (see the information on normalization), and the
extra computational power required by the modified
algorithm, we were unable to train for as many generations
with the modified code.
We tried to compensate for this by changing the function by
which we evaluate fitness every generation, so that each
generation is forced to adapt much more quickly. However,
more time would have been better, because our algorithm would
theoretically take longer to arrive at any sort of
equilibrium.
|
|
Results |
Our first sets of results represent our control
experiments, which simply trained the code as written so we
would have something to compare to. Then we show how our newly
formed hypotheses performed in the same situations.
The results are separated out by different hypotheses.
We chose to test 3 hypotheses from each set of training
data, taking one hypothesis from different points in the
runs (different generation numbers). There were three
trials, each consisting of 50 games. Because Edgar plays the same
game against every hypothesis, his scores are all or
nothing, so as a point of comparison we include the margin
that describes the difference between the winner's score and
the loser's score.
|
Regular GP vs. Random Player |
These are the results from a handful of hypotheses that were
derived when the Othello player was trained against a random
player, as in the standard code. Notice the poor
performance when played against Edgar.
|
Hypothesis: white
|
[hypothesis is white player] |
Trial | White Wins | Black Wins |
1 | 40 | 10 |
2 | 43 | 7 |
3 | 42 | 8 |
Same hypothesis against Edgar
(Reverse colors so hypothesis is now a black player) |
Trial | White Wins | Black Wins |
1 | 50 | 0 |
2 | 50 | 0 |
3 | 50 | 0 |
Margin: 34 |
|
Hypothesis: ( + white white_corners ) |
[hypothesis is white player] |
Trial | White Wins | Black Wins |
1 | 43 | 7 |
2 | 41 | 9 |
3 | 45 | 5 |
Same hypothesis against Edgar
(Reverse colors so hypothesis is now a black player) |
Trial | White Wins | Black Wins |
1 | 50 | 0 |
2 | 50 | 0 |
3 | 50 | 0 |
Margin: 20 |
|
Hypothesis: ( - ( - ( - ( - ( - ( - ( / 10 black ) black ) black ) black ) black ) black ) black ) |
[hypothesis is white player] |
Trial | White Wins | Black Wins |
1 | 38 | 12 |
2 | 42 | 8 |
3 | 40 | 10 |
Same hypothesis against Edgar
(Reverse colors so hypothesis is now a black player) |
Trial | White Wins | Black Wins |
1 | 50 | 0 |
2 | 50 | 0 |
3 | 50 | 0 |
Margin: 34 |
|
|
Regular GP vs. Edgar |
The first set of results comes from earlier
populations, around generation 30 or so. The second set of
data is from a later generation, and the final set is
from the last generation before we stopped the program.
|
Hypothesis: ( / black_near_corners ( / ( / black black_edges ) ( - white 10 ))) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 50 | 0 |
2 | 50 | 0 |
3 | 50 | 0 |
Margin: 18 pts |
Played against a random player |
Trial | White wins | Black wins |
1 | 3 | 47 |
2 | 8 | 42 |
3 | 1 | 49 |
|
Hypothesis: ( + ( * ( - white_corners 10 ) ( + ( - white_corners 10 ) ( / black_edges black_edges ))) black_corners ) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 0 | 50 |
3 | 0 | 50 |
Margin: 32 |
Played against a random player |
Trial | White wins | Black wins |
1 | 1 | 49 |
2 | 2 | 48 |
3 | 5 | 45 |
|
Hypothesis: ( * ( + ( - ( / black_edges black_edges ) 10 ) black_corners ) ( / black_edges black_edges ))
|
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 0 | 50 |
3 | 0 | 50 |
Margin: 32 |
Played against a random player |
Trial | White wins | Black wins |
1 | 2 | 48 |
2 | 2 | 48 |
3 | 0 | 50 |
|
|
Modified GP vs. Random Player |
The following data are the results from
running our hypotheses trained for adaptability against a Random
Player. The
first hypothesis is taken from the end of our first
trial, which only lasted 30 generations. The second hypothesis
is taken from the second trial, right before the 30th
generation. The third hypothesis is from the end of the
second trial, which ended after 80 generations.
All hypotheses were also pitted against Edgar to
measure their true adaptability.
|
Hypothesis: ( + ( + ( + ( / white_corners black ) ( * ( - 10 black_edges ) ( + black_corners white ))) ( / 10 black_edges )) ( + ( * ( * 10 ( - ( * white white ) ( * ( * black_edges white_edges ) white ))) ( * ( - white black_edges ) ( - ( + white_edges black_corners ) ( * ( / black_corners white_near_corners ) ( - 10 black_edges ))))) ( / 10 black_edges ))) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 10 | 40 |
2 | 13 | 37 |
3 | 8 | 42 |
Results against Edgar |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 0 | 50 |
3 | 0 | 50 |
Margin: 32 |
|
Hypothesis: ( / white ( * black_edges ( - ( / black black_corners ) ( / ( - white_near_corners white_corners ) ( + white_corners 10 ))))) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 30 | 20 |
2 | 34 | 16 |
3 | 31 | 19 |
Results against Edgar |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 0 | 50 |
3 | 0 | 50 |
Margin: 34 |
|
Hypothesis: ( / white ( - ( / white ( / ( / white ( * black_edges ( + ( / black_corners ( + black_near_corners black )) ( + white_edges black_near_corners )))) black )) ( + ( - white_near_corners white_corners ) ( - white_corners black )))) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 41 | 9 |
2 | 37 | 13 |
3 | 39 | 11 |
Results against Edgar |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 0 | 50 |
3 | 0 | 50 |
Margin: 16 |
|
|
Modified GP vs. Edgar |
These hypotheses were trained using the
modified training method against Edgar specifically. The
first hypothesis is from early in the trial (the 15th
generation), the second is from about halfway through (the
30th generation) and the third hypothesis
comes from the end of the trial. |
Hypothesis: ( + ( * white_edges white_near_corners ) ( - 10 white_corners )) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 0 | 50 |
3 | 0 | 50 |
Margin: 4 |
Results Against Random Player |
Trial | White wins | Black wins |
1 | 41 | 9 |
2 | 44 | 6 |
3 | 45 | 5 |
|
Hypothesis: ( / white_near_corners black ) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 0 | 50 |
3 | 0 | 50 |
Margin: 6 |
Results Against Random Player |
Trial | White wins | Black wins |
1 | 0 | 50 |
2 | 2 | 48 |
3 | 0 | 50 |
|
Hypothesis: ( * ( + black_near_corners white_edges ) ( * white_near_corners black_near_corners )) |
[hypothesis is a black player] |
Trial | White wins | Black wins |
1 | 50 | 0 |
2 | 50 | 0 |
3 | 50 | 0 |
Margin: 4 |
Results Against Random Player |
Trial | White wins | Black wins |
1 | 16 | 34 |
2 | 13 | 37 |
3 | 16 | 34 |
|
|
|
Analysis |
These tests showed very interesting results
overall. We will discuss them in terms of their
different training methods.
Training against a Random Player
Training against a Random player with the original
code proves to be successful in beating another random
player, though when the hypothesis is reversed to play
against Edgar, it fails miserably. This is due to the
level of fitness to which the algorithm is held. The
random player is relatively easy to beat, and as a
result the hypotheses never become very complex. Edgar
is more difficult to beat.
Training against a Random player using our modified
fitness evaluation mechanism yielded different
results. First of all, when pitted against another
random player, our hypotheses did not stand up as well,
with only one hypothesis being able to withstand
playing against another random player. Confusingly
enough, however, these algorithms were able to beat Edgar
by fairly large margins. The algorithm that performed
the worst against the random player beat Edgar by the
smallest margin.
Explaining these results is more difficult; however,
we think that the poor performance against random players can
be attributed to the significant difference in training
time. The player in the original code was
trained the longest in our experiments and thus had
full opportunity to develop against solely random
players, hence its improved performance against random
players. However, its training became so specific that
it was unable to beat Edgar. Our modified player
also trained against a Random player, and due to the many
different evaluations of fitness, it was able to
produce a more adaptable algorithm that could easily
adapt to play Edgar.
Training Against Edgar Himself
When we trained the original code against Edgar, our
results were fairly successful. Although one
hypothesis failed against Edgar, this hypothesis was
developed first and thus had not evolved as much. All
hypotheses performed well against the random player,
probably because the random player is easier to beat.
When we trained our modified code against Edgar, we
got similar results, except this time the only
hypothesis that failed was the final one. We trained
our modified code against Edgar less than we trained
the original code. Notice that the hypotheses trained
to be adaptable won by a much smaller margin than other
hypotheses against Edgar. Even our losing hypothesis
lost by smaller values. This could be attributed to our
focus on adaptability across multiple fitness
attributes instead of just focusing on one. We cover
more ground in terms of a variety of players instead of
just trying to get the maximum gain over one specific
player. When these hypotheses were played against a
Random Player, the results were good, although one
hypothesis that beat Edgar lost to the random player.
Overall
In the end, our results indicate that training
against Edgar with the original code will yield the
better player overall. In this setup, only one set of
trials was lost, and it was early on in the generational
development.
The most impressive results come from training against a
less skilled player such as the random player and then
pitting the resulting hypothesis against Edgar. This is
where we proved to be the most successful.
Our experimentation could have been improved if we
had tested our 'adaptable' hypotheses against
many different
players to see how they each perform. With only two
players to test against, it is difficult to show
exactly how well
something can perform in a variety of situations.
We believe that testing the same
hypotheses against a variety of different players (not
just Edgar and random) would show their superiority in the end.
|
|
Conclusion |
Our experiment, although it has not yielded incredibly
successful results, shows how changing the way in which the
fitness of a population is calculated can change the
resulting hypothesis. In our experiments training against a
skilled player such as Edgar, it proved more useful to train
with one fitness measurement. But in our experiments
training against a random player, this modification
yielded a more diverse hypothesis that could beat Edgar
more easily. Such a process could
prove useful in real-life applications where the most
difficult problem is not available for training: by
training against an easier problem (in this case the random
player) and modifying the fitness evaluation, you can produce
a more powerful hypothesis.
|
|
|