HW4: Mitosis and Genetic Programming Population Voting

by Matt Bogosian, November 11, 1997

Index

Introduction

Genetic programming and genetic algorithms have historically been used in efforts to create "good" individual hypotheses to solve real-world problems. Using genetics to create groups of hypotheses which work well together is an approach often used in metalearning.

One good example of this application is using genetic programming to create neural networks (where groups of nodes work together to create a single hypothesis). However, nets are complex and time-consuming in their evaluation and require great deals of data to train them once they are created.

One might like to use genetic programming to create a group of individual hypotheses that work together to form a single hypothesis without the complexity and time necessary for creating neural networks. One way of doing this is GP population voting. In this approach, GP populations are used as teams working towards the goal of a good hypothesis. Each member of a population casts a vote during hypothesis evalution. The votes are tallied and the most common result is used as the result. The approach works well for problems for which there are large populations relative to the number of possible outcomes from hypothesis evalution.

The Problem

The theory of Majority Rules does not necessarily insure the best outcome (see Hobbes for details). In order to increase the population's combined effectiveness in voting we must first have a method of evaluating the effectiveness for a given population, then we must provide a means of improving that effectiveness.

One method is to use standard genetic programming evolution in the hopes that somehow after many generations the population will have developed enough strong individuals to vote well together. The evaluation is based on the individual members of a population, and the means for improving effectiveness is evolution. However, as life demonstrates, a team of strong individuals does not necessarily imply a strong team, and we are right back where we started.

A Proposed Solution

In efforts to insure not only a good players, but a good team, I propose another method of improving the combined effectiveness of a population is what I am tentatively calling Mitosis (the metaphor is flawed and incomplete, but it has enough similarity to justify the name). Here, we are trying to encourage populations which work well together, not just whose members work well individually.

GP Population Mitosis

Mitosis is performed on populations of individuals of size 2n. The performance of a population must be measurable by successes and failures. Here, "success" and "failure" are loosely defined. They may be exact requirements, thresholds or any number of things. This is determined by the problem representation.

There need not necessarily be a 1-to-1 mapping of elections to sucesses or failures. In other words, one may have to vote several times before arriving at a success or failure (e.g., Othello: voting occurs at every move, where as success or failure is determined at the end of the game).

The Basic Mitosis Algorithm

Like biological mitosis, the Mitosis Algorithm has four parts roughly coinciding with the four phases of mitosis. They are prophase, metaphase, anaphase and telophase.

Prophase

The population is duplicated. The two populations are the parent population and the child population.

Metaphase

The child population is partitioned into two randomly-selected subpopulations of n (see Genetic Population Mitosis above for the meaning of n).

Anaphase

The two partitions are split into two populations. The populations are then evaluated individually for a number of rounds (usually roughly from 10 to 20). The one with the most successes is considered the winner or winning half, the one with the least successes is considered the loser or losing half.

In the event of a tie, the partitions are rejoined and Mitosis begins again at metaphase. The notion of a tie is not limited to the exact number of successes. There may be a threshold or percentage of successes which determines a tie; if, after 20 evaluations, partition 1 has 13 successes, and partition 2 has 12, this may not be enough of a difference to justify declaring a winner and loser. One may choose to return to metaphase at this point.

Telophase

The child population is then somehow modified (see Mitosis Variants below) in efforts to increase its effectiveness over its parent. Optionally, one may evaluate the performance of the child versus the performance of its parent (much in the same way that the two halves of the child were evaluated). This may be useful in evaluating the performance of the Mitosis algorithm in general.

Mitosis Variants

The following variants fill in the modification occuring in telophase (see Telophase above).

Mitosis I (Mate the Loser)

This version of Mitosis is loosely based on the philosophy that, "If it ain't broke, don't fix it," (or as George Forman once said, "If it ain't fixed, it ain't broke"). In this version, the winning half is retained without modification for the next generation. An even number (ranging from 0 to n; see Genetic Population Mitosis above for the meaning of n) of randomly selected individuals from the losing half are crossed over with each other. Note: if 0 losers are mated, the next generation will be exactly like its parent. Here, we are trying to "fix" the "broken" individuals of the population by mating them.

Mitosis II (Mutate the Loser)

If Mitosis I follows the, "If it ain't broke, don't fix it," philosophy, Mitosis II follows the, "If it ain't broke, don't replace it," method of thinking. Here, the winning half is again retained without modification. A number (again, ranging from 0 to n; see Genetic Population Mitosis above for the meaning of n) of randomly selected individuals from the losing half are mutated. Note: if 0 losers are mutated, the next generation will be exactly like its parent. Here, we are trying to "replace" the "broken" individuals.

Mitosis III (General Mate and Mutate the Loser)

Mitosis III is a general hybrid of Mitosis I and Mitosis II. Again, the winning half is kept unmodified for the next generation. The losing half is partitioned into two groups of k and l individuals (where k + l = n; see Genetic Population Mitosis above for the meaning of n). An even number (ranging from 0 to k) of randomly selected individuals from the first partition are mated with each other. A possibly different number (ranging from 0 to l) of randomly selected individuals from the second partition are mutated.

Note that Mitosis I and Mitosis II are really special cases of Mitosis III. Mitosis I has k = n and l = 0. Mitosis II has k = 0 and l = n. Further variants of Mitosis III may include a gradient transition from Mitosis II to Mitosis I (or vice versa), as well as the number of individuals modified or replaced in each losing half's partition. This follows the philosophy that it is better to replace early on and refine later. There is much potential for dynamic adjustment and expirimentation with Mitosis III.

Mitosis IV (Duplicate and Mutate the Winner)

This version of Mitosis should converge on a more homogenous population rather quickly. Here the losing half is discarded. The winning half is duplicated, and a number (ranging from 0 to 2n; see Genetic Population Mitosis above for the meaning of n) of the duplicates are mutated. I'm not sure of the contexts where it would be appropriate to use this version, though experimentation should indicate its usefulness in increasing the effectiveness of a population. This will not be my focus for this project.

Performance Evaluation

Because this approach is based not in theory and logic, but in common sense and observation of the real world, a good deal of data collection is required to validate its effectiveness. Specifically, we will be testing Mitosis as it is applied to the Othello problem.

Mitosis is complex in that it has many parameters controlling its progression. These parameters must be tested to find good values. The first problem we encounter is population size and its role in determining the winner and loser in anaphase. We want our population to be large enough to sufficiently cover the hypothesis space, but small enough to allow for a clear winner and loser. We want our threshold of successes to be useful in determining a winner and loser.

For example, during anaphase for two populations of size 250, after 20 iterations, we find that group 1 has 12 successes and group 2 has 13 successes. Are they sufficiently different to call group 2 the winner and group 1 the loser? As population sizes grow, the probability of partition the population into groups which are sufficiently different gets smaller.

The second problem we face is what values to give to k and l in Mitosis III, and when to vary them. Also, we must choose good candidates for the number of individuals selected for modification in k and l.

We must first collect data on population size and winner/loser thresholds. To do this we will generate random populations the size of which will vary from 50 to 500. For each population, we will run Mitosis stopping after anaphase, and keep track of the scores of successes and failes of each child partition.

Once good values are found for population size and winner/loser threshold, we will continue evaluating Mitosis for varying values of k and l.

Mitosis with a Ladder Tournament

Although this approach remains (as of yet) unimplemented, one method of evolution among populations or teams is something we've all been exposed to in high school physical education: the ladder tournament. Here, the idea is to have many populations compete against each other. Each population goes through a training phase, in which the population is refined using Mitosis. Then each population is entered into the tournament. The "best" population will rise to the top of the ladder.

Results

For the first set of runs, Anaphase was evaluated after 20 rounds, and the threshold for the winner was at least two more successes than the loser. After each 25 iterations of Mitosis, the resulting population was pit against a random player (playing white) for fifty games and the results were recorded (see Table 1 below for the first run results and Table 2 below for the second run results).

Table 1: Mitosis Iterations and Performance from Run 1
Number of Mitosis Iterations Games Won By Mitosis I vs. Random Player Games Won By Mitosis II vs. Random Player
0 52% 48%
25 58% 56%
50 66% 56%
75 70% 54%
100 68% 52%

For the first run we see that Mitosis II produces only slightly better results than the random player, where Mitosis I seems to be better at creating strong populations (see Table 1 above for details).

Table 2: Mitosis Iterations and Performance from Run 2
Number of Mitosis Iterations Games Won By Mitosis I vs. Random Player Games Won By Mitosis II vs. Random Player
0 48% 56%
25 58% 58%
50 66% 48%
75 64% 56%
100 64% 52%

For the second run we see that Mitosis II continues to produce only slightly better results than the random player, and Mitosis I seems to behave similarly to the first run, where the performance is slightly less after 100 iterations than it is after 75 (see Table 2 above for details).

For the second set of runs, Anaphase was evaluated similarly to the first set. Again, after each 25 iterations of Mitosis, the resulting population was pit against a random player (playing white) for fifty games and the results were recorded (see Table 3 below for the third run results and Table 4 below for the fourth run results). Here, however, Mitosis III was used, where k and l were treated as follows:

k = n - l
l = iteration * n / 100

So we see that here, there is a simple linear gradient shift between Mitosis II and Mitosis I as the number of iterations increases.

Table 3: Mitosis Iterations and Performance from Run 3
Number of Mitosis Iterations Games Won By Mitosis III vs. Random Player
0 48%
25 60%
50 68%
75 74%
100 76%

For the third run we see that Mitosis III does better in finding and maintaining a population better than a random player (see Table 3 above for details).

Table 4: Mitosis Iterations and Performance from Run 4
Number of Mitosis Iterations Games Won By Mitosis III vs. Random Player
0 54%
25 62%
50 64%
75 74%
100 74%

Our observation in the third run is confirmed by the fourth (see Table 4 above for details).

Conclusion

As we can see, the three Mitosis variants produce slightly better players than a random one. Mitosis III with a linear gradient shift of the values of k and l seemed to do the best for reasons which make sense: it's better to replace early and refine later. Further experimentation is required, but this presents a good (if not slightly disappointing) introduction to the performance of this method of population refinement. Perhaps the addition of a ladder tournament and more iterations are required for better performance.