By Araik Grigoryan ag272@columbia.edu

Dave Kalina dak40@columbia.edu

Jon Spiegel jhs49@columbia.edu

Genetic programming can be implemented to provide solutions to problems over a myriad of different domains. Regardless of the domain's implementation, genetic programming will almost always follow the same strict but basic framework. It follows that it would serve programmers well to provide the basic genetic programming skeleton in a simple and straightforward manner, so that solving problems within this paradigm can be accomplished with relative ease.

Existing systems designed to provide programmers with a framework for solving generic problems with genetic programming have generally been overly complex and difficult to work with. Two such systems are gpc++ and its Java port, gpj++. These systems require the programmer to fit the general requirements of a genetic programming system to their own strict framework. For example, they assume that the programmer will represent the programs or functions being bred in a very specific format, so that the system can perform the genetic recombination itself. While this is also a useful tool, it is equally useful to leave certain implementation decisions such as these entirely up to the programmer, since they often depend heavily on the domain in question.

Identifying the components necessary to fully specify a genetic program was the essential part of building our system. These components are the parts of a program common to all GPs. They are functions that perform selection, mutation, crossover, and copy; without them no GP can operate. Additional features of the abstraction may include methods to check for the termination condition, test the fitness of an individual, print results, and others.

The above considerations led us to construct our system such that the programmer is only concerned with the implementation of his or her specific problem, and not with the way the genetic program searches through the space of individuals. That is, given a template which he or she has to follow, the programmer must only implement the essential parts of a GP (mutate, crossover, etc.), delving only into the specifics of his or her problem. Essentially, the template is the programmer's guideline of what methods must be implemented. Their successful implementation will, in theory, allow the programmer to run the genetic algorithm and produce an optimal (or near-optimal) solution to his or her particular problem.

Additional templates may exist for such abstract components of a GP as an individual, with its own attributes and methods, the individual's chromosome, and others. Again, they are a product of an attempt on our part to make our system as general as possible. It must also be noted that the interface that we devised is by no means final, and can be enhanced as more general features of a GP are needed.

At the level of concrete implementation, we used Java's interfaces to design a template for each of the components already mentioned. In particular, we defined an interface Problem with methods such as crossover, mutate, select, and others. Any specific problem (e.g. a Traveling Salesman Problem) must implement that interface before it can successfully use the GP algorithm to breed populations of individuals. The same is true for the Individual and Chromosome interfaces. Finally, the Solver class - the main driver class - relies heavily on all the components of a problem; once they are fully implemented, the problem can instantiate a Solver and ask it to breed populations of individuals. As was already noted, it is up to the particular problem to keep track of the progress made by the GP and output the results in a desired format.
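A minimal sketch of what these templates and the driver loop might look like (the method signatures and loop details below are our illustrative assumptions; only the type names and operation names come from the design described above):

```java
// Sketch only: signatures are illustrative assumptions, not the final interface.
interface Chromosome { }

interface Individual {
    Chromosome getChromosome();
    double getFitness();
}

interface Problem {
    Individual[] initialPopulation(int size);          // random individuals
    Individual select(Individual[] population);        // e.g. fitness-proportional selection
    Individual[] crossover(Individual a, Individual b);
    Individual mutate(Individual individual);
    boolean terminate(Individual[] population);        // termination condition
}

// The Solver drives the breeding loop; the problem supplies only the operations above.
class Solver {
    private final Problem problem;

    Solver(Problem problem) { this.problem = problem; }

    Individual[] breed(int populationSize, int maxGenerations) {
        Individual[] pop = problem.initialPopulation(populationSize);
        for (int g = 0; g < maxGenerations && !problem.terminate(pop); g++) {
            Individual[] next = new Individual[pop.length];
            for (int i = 0; i < next.length; i += 2) {
                // breed two children from two selected parents, then mutate them
                Individual[] kids = problem.crossover(problem.select(pop), problem.select(pop));
                next[i] = problem.mutate(kids[0]);
                if (i + 1 < next.length) next[i + 1] = problem.mutate(kids[1]);
            }
            pop = next;
        }
        return pop;
    }
}
```

A problem such as the TSP then implements Problem (with tours as chromosomes), constructs a Solver, and asks it to breed.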

This problem was originally explored by John Koza in his paper on evolving emergent, cooperative behavior amongst independent agents. It was chosen for implementation because of its interesting visual results. Fitting this domain to the generic problem solver was relatively simple.

We have 10 ants on a 10x10 toroidal grid, along with 10 grains of sand in each of three different colors. The ants are only able to sense information about their immediate surroundings (their location in the world, whether or not there is sand below them, etc.). We want to evolve a program that, when executed concurrently by all the ants, gets the ants to re-arrange the colored sand so that the leftmost three columns are filled with only sand of a pre-specified color. This toy problem demonstrates the ability of genetic programming to evolve emergent, cooperative behavior.

Desert.java implements Problem

AntProgram.java implements Individual

Node.java implements Chromosome

The Desert class defines all top-level problem-specific parameters. Its main execution initializes a random n x m grid (set to 10 x 10 here) to be used for testing throughout an execution of the Solver. The goal of the Painted Desert problem is to evolve a program that guides the behavior of a series of virtual ants. These ants are only privy to the information regarding their current square - this includes their x,y position on the grid and the color of the sand beneath them. They have the ability to pick up sand and drop it in another location. The goal is to run a program on a number of ants (here 10 over a 10 x 10 board) and get the ants to fill the left-most three columns of the grid with colored sand (one specific color for each column).

Following the Problem interface, initializing a population of Individuals involves the generation of n random AntPrograms (where n is 1000 in our tests). An AntProgram contains a tree that represents the program's behavior. This tree is composed of Functions and Terminals. The sets of functions and terminals are as follows:

T = { x, y, carrying, color, (move-n), (move-s), (move-w), (move-e), (move-rand), (pick-up) }
F = { if-less-than-equal-to, if-less-than-zero, if-drop }

See Koza's paper for an in-depth discussion of these program components. The Desert program then instantiates 10 Ants for each board (an Ant is simply an instance of AntProgram, containing the program and the ant's information about its position on the board). It runs each Ant's program in series on the grid and, after some pre-specified number of time-steps, records the fitness and number of hits. Fitness is based on the final positioning of the sand relative to its desired position - the lower the fitness score the better, with 100 being a perfect score. Every grain of sand in its appropriate column at the end of a program's execution counts as a hit.
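Koza's paper defines the exact fitness measure; purely as an illustrative sketch, the following assumes fitness is 100 plus the sum of each grain's horizontal distance from its target column (so a perfect arrangement scores exactly 100, consistent with the numbers reported below), and a hit is a grain already sitting in its target column. The grid encoding (0 = empty, colors 1-3 with target columns 0-2) is our own assumption:

```java
public class DesertFitness {
    // Illustrative only: fitness = 100 + sum of horizontal distances of grains
    // from their target column (color c belongs in column c - 1).
    static double fitness(int[][] grid) {
        double penalty = 0.0;
        for (int y = 0; y < grid.length; y++)
            for (int x = 0; x < grid[y].length; x++)
                if (grid[y][x] != 0)
                    penalty += Math.abs(x - (grid[y][x] - 1));
        return 100.0 + penalty;
    }

    // A hit is a grain already in its target column.
    static int hits(int[][] grid) {
        int h = 0;
        for (int[] row : grid)
            for (int x = 0; x < row.length; x++)
                if (row[x] != 0 && x == row[x] - 1) h++;
        return h;
    }

    public static void main(String[] args) {
        int[][] perfect = { {1, 2, 3, 0}, {1, 2, 3, 0} };
        System.out.println(fitness(perfect) + " " + hits(perfect)); // 100.0 6
    }
}
```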

I ran a series of 20 runs over 20 generations and got the following results:

Generation    Avg. Fitness    Avg. Hits
 1            328.369         2.86
 2            318.828         2.80
 3            300.978         3.025
 4            276.287         3.731
 5            261.470         4.352
 6            246.091         5.104
 7            230.764         5.903
 8            219.405         6.671
 9            211.356         6.809
10            201.619         7.45
11            189.458         8.04
12            178.112         8.894
13            172.279         9.098
14            170.974         8.112
15            166.383         8.199
16            162.646         8.367
17            157.467         8.581
18            156.114         8.383
19            151.582         8.696
20            149.890         8.835

Below is one evolved (and relatively simple) solution discovered in the 13th generation:

Best of run:

Individual #397 fitness = 100.0 hits = 30 points = 31

Program: iflte ( x carrying ifdrop ( go_w go_e ) ifltz ( go_n iflte ( x
carrying ifdrop ( go_w go_e ) ifltz ( pick_up ifltz ( iflte ( go_w x ifltz
( ifdrop ( go_e pick_up ) y pick_up ) pick_up ) pick_up x ) go_w )
) pick_up ) )

Re-testing program execution on initial test board:

%== ||| \\\ ___ ___ ||| ___ ___ \\\ /// \\\ \\\ ||| /// \\\ ___ /// ___ ||| ___ \\\ %== %== ___ ___ ___ ___ \\\ ___ ___ ___ ___ ___ ___ \\\ ___ ___ ||| \\\ /// ___ /// ___ %== %== ___ ___ ___ /// ___ ___ ___ ___ /// ||| %== ___ ___ ___ ___ ___ \\\ ___ ___ ___ ___ ||| ___ ___ ___ ___ /// ___ ___ ___ ___ ___ ||| ___ ||| ___ ___ ___ ___ ___ ___ %== ___ ___ ___ ___ ___ ___ ___ ___ ___ /// %== %== /// ___ ___ ___ ||| ___ ___ ___ ___ ___

After 155 executions of ant program:

/// ||| \\\ ___ ___ ___ ___ ___ ___ ___ /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ %== %== /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ %== %== %== /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ %== %== /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ %== /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ %== /// ||| \\\ ___ ___ ___ ___ ___ ___ ___ /// ||| \\\ ___ ___ ___ ___ ___ ___ ___

So in the span of twenty runs of twenty generations each, only 6 correct programs were produced, although given more time, more working programs would certainly have emerged.

One aspect of this problem not explicitly dealt with by Koza was retesting: making sure a successful program has not simply evolved the ability to solve one particular board. In this implementation, programs found to be successful were re-tested on five new random boards and given a small amount of extra time to solve each one (double the original number of program executions). The assumption is that if an AntProgram is able to solve five other completely random boards, it should be robust enough to handle any random board. Of course, more extensive testing would be preferable.

- Koza, John R. "Evolution of Emergent Cooperative Behavior Using Genetic Programming" Computer Science Department, Stanford University.

Implementation of this problem using our generic framework for solving problems with genetic programming proceeded smoothly and intuitively. An advantage of our system made obvious by these tests is that changing parameters is as simple as changing static variables within the Desert.java code. For example, testing the problem on a 5x5 board with 12 ants over 100 generations would require only simple modifications. Furthermore, replacing Koza's fitness measurement with one that might lead to faster convergence on a solution can be done easily, without modifying any of the main code.

The Traveling Salesman Problem (TSP) consists of finding the shortest tour through N cities. A salesman starts in one city, follows a path through all the others, visiting each exactly once, and comes back home. In order to conserve money and time (alas, one is the other), the salesman would like to know the shortest path between the cities. Many optimization problems reduce to the TSP. It essentially poses the question of finding a minimal Hamiltonian cycle through N nodes (cities). An attempt to iterate through all the possible orderings of cities fails as the number of cities increases, since the search space is on the order of N factorial.

Due to the complexity of the problem, it is sometimes impossible to find the exact solution - we must settle for an approximation. Thus a technique must be used to intelligently select parts of the graph based on their ability to connect a subset of cities with the shortest distance possible. In this work we concentrate on planar (2-D), symmetric graphs (the distance from i to j is the same as the distance from j to i).

There are two main ways to approach the TSP: devising an intelligent heuristic that approaches the optimal solution, or randomly mutating individuals' chromosomes (city tours) with the use of genetic programming, again hoping to arrive at the best-fit answer. I chose the latter approach. It allowed me to breed the individuals by selecting their most successful representatives and allowing them to procreate, thus improving the fitness of the population in the next generation. By allowing such a process to take place for many generations, it is feasible to arrive at a near-optimal or optimal solution to the TSP.

The TSP algorithm starts by initializing a fully-connected symmetric graph of cities. The graph can be produced in two ways: either generated randomly within the program, with the number of cities depending on the specified default value (currently 12), or read from a file in TSPLIB format. The TSPLIB format is a well-defined file format for specifying city coordinates. The coordinates can be represented in many ways: Euclidean distance, Manhattan distance, Geographical distance, and others. The current version of the TSP program understands only Euclidean 2-dimensional distances (type EUC_2D) and Geographical distances (type GEO). Given a TSPLIB file as a command-line argument, the program generates an equivalent matrix representation of the graph and proceeds with the evolution. For a full description of all the types, see the official document.

After initialization, the genetic forces take over and breed new populations. The genetic operation methods, mutate and crossover, are the main features of the TSP program. As is the case with many genetic programs, the hardest part of the problem is to perform the genetic operations while preserving the constraints of the TSP. For example, if we represent the city tour (i.e. the chromosome) as a list enumerating each city visited in order - (3,4,1,5,7,6,2,0) - then each genetic operation must preserve the constraint that no city may appear twice in the tour. The case of mutation is simple: pick two random points and swap the cities (i.e. genes). The case of crossover is not as simple; if we randomly pick two pieces of the chromosome from two different chromosomes, then we are bound to introduce duplicates in both of the resulting chromosomes. Therefore care must be taken in performing the crossover. The method we use here is: (1) randomly pick the same two cut points on both parent chromosomes; the piece of the chromosome between the two cut points remains unchanged; (2) in the rest of the places, copy genes from one parent to the other, taking care not to copy duplicate cities into the chromosome. This crossover procedure is described in Michalewicz and is demonstrated below:

p1 = (1 2 3 | 4 5 6 7 | 8 9)
p2 = (4 5 2 | 1 8 7 6 | 9 3)

c1 = (x x x | 4 5 6 7 | x x)
c2 = (x x x | 1 8 7 6 | x x)

c1 = (2 1 8 | 4 5 6 7 | 9 3)
c2 = (2 3 4 | 1 8 7 6 | 5 9)
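This crossover can be sketched in Java as follows (our own illustrative code, not the program's actual source): the segment between the cut points is kept from one parent, and the remaining positions are filled left to right from the other parent, skipping cities already present in the kept segment.

```java
import java.util.Arrays;

public class TourCrossover {
    // Produce one child: keep 'keep's segment [lo, hi), fill the rest from 'donor'
    // left to right, skipping cities that already appear in the kept segment.
    static int[] crossover(int[] keep, int[] donor, int lo, int hi) {
        int[] child = new int[keep.length];
        for (int i = lo; i < hi; i++) child[i] = keep[i];   // middle segment unchanged
        int d = 0;                                          // next candidate gene in donor
        for (int i = 0; i < child.length; i++) {
            if (i >= lo && i < hi) continue;                // skip the kept segment
            while (inSegment(child, lo, hi, donor[d])) d++; // no duplicate cities
            child[i] = donor[d++];
        }
        return child;
    }

    static boolean inSegment(int[] a, int lo, int hi, int v) {
        for (int i = lo; i < hi; i++) if (a[i] == v) return true;
        return false;
    }

    public static void main(String[] args) {
        int[] p1 = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] p2 = {4, 5, 2, 1, 8, 7, 6, 9, 3};
        // Cut points after positions 3 and 7, as in the example above.
        System.out.println(Arrays.toString(crossover(p1, p2, 3, 7))); // c1: [2, 1, 8, 4, 5, 6, 7, 9, 3]
        System.out.println(Arrays.toString(crossover(p2, p1, 3, 7))); // c2: [2, 3, 4, 1, 8, 7, 6, 5, 9]
    }
}
```

The mutation described earlier is simpler still: pick two random indices of the tour and swap the cities at those positions, which trivially preserves the no-duplicates constraint.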

The fitness of each individual is tested by first traversing the graph and adding all the costs on the arcs between the cities, and then inverting the cost to produce a measure of fitness. The smaller the total cost of the arcs, the greater the fitness. Note that one must be careful to complete the entire Hamiltonian cycle before the fitness is calculated.
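In code, this fitness computation amounts to the following sketch (our own illustration; the modulo on the index supplies the return arc that completes the Hamiltonian cycle):

```java
public class TourFitness {
    // Total cost of the closed tour, inverted so shorter tours score higher.
    static double fitness(int[] tour, double[][] dist) {
        double cost = 0.0;
        for (int i = 0; i < tour.length; i++) {
            // wrap-around index closes the cycle back to the starting city
            cost += dist[tour[i]][tour[(i + 1) % tour.length]];
        }
        return 1.0 / cost;
    }

    public static void main(String[] args) {
        double[][] dist = {
            {0, 1, 2},
            {1, 0, 1},
            {2, 1, 0},
        };
        System.out.println(fitness(new int[]{0, 1, 2}, dist)); // cost 1 + 1 + 2 = 4, prints 0.25
    }
}
```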

The program must be invoked at the command prompt with the following command-line arguments:

$ java TSP population_size max_generations max_runs [file.tsp]

population_size - The size of the population
max_generations - Maximum number of generations
max_runs        - Maximum number of runs
file.tsp        - Optional file (type EUC_2D or GEO) in TSPLIB format

The TSP program was successful on test cases containing fewer than 20 cities. Due to the enormous inherent computational complexity of the search, the algorithm fails to find an optimal solution for bigger graphs, although it always reports the best tour found at the time of termination. For example, for a 10-city graph (araik10.tsp), the algorithm consistently produces correct optimal tours: any tour that enumerates the numbers zero through nine consecutively, starting at any number and wrapping around in either direction. So (3,2,1,0,9,8,7,6,5,4) is one such tour. This example was specifically contrived to test the TSP on a small data set.

Attempts were also made to run the program on the files available from the TSPLIB. The smallest such file (ulysses16.tsp) contains 16 cities. On this set, the algorithm (population_size=500, max_generations=60000, max_runs=3) failed to find the optimal solution (known in advance), but came within 1.7% of it. For the second-smallest set (ulysses22.tsp), the algorithm (population_size=500, max_generations=60000, max_runs=3) came within 11% of the optimal solution. City graphs of bigger dimensions gave worse results. The curious reader is encouraged to browse the TSPLIB site and look at the complete distribution of files, many of which come with known optimal answers. It is clear that the GP is doing intelligent breeding, since the results are far better than random guessing.

As a side note (perhaps an obvious one), the number of generations largely determined the success of the genetic reproduction. The reason is that the population gradually grew more fit with each generation due to the intelligent selection and crossover processes. In contrast, the number of individuals in the population made little difference in the final result.

Finally, we must remark on the success of the above-mentioned generic interface as it applied to the TSP. We used the interface as it was originally defined, and achieved good performance.

- Langdon, W.B. "Quick Intro to simple-gp.c." Department of Computer Science, University College, London.

- Michalewicz, Zbigniew. Genetic Algorithms + Data Structures = Evolution Programs. Third Edition. Springer-Verlag, Berlin, 1996.

- The Traveling Salesman Problem Library (TSPLIB)

The Bayesian Belief Network (BN) is an expressive tool for representing probabilistic relationships between discrete variables. For a vector of discrete variables, we say that an instantiation of these variables is a vector of values in which each element corresponds to one of the variables, taking on one of the possible values ("states") for that variable. A Joint Probability Distribution (JPD) is a function that maps points in the space of possible instantiations to real-valued probabilities. If one is interested in the likelihood of a certain outcome for a set of related variables, it is inefficient to store the entire JPD, as the domain of instantiations grows exponentially with the number of variables. A BN provides for the calculation of the probability of any point in the JPD in time linear in the number of variables it contains. In addition, the representation a BN uses to abstract the dependencies between variables is more intuitive and convenient than searching through the JPD.

A BN consists of a directed acyclic graph (DAG) and a set of probabilities. If there exists an edge from node i to node j, we say that i is a parent of j. At every node in the DAG there is a corresponding table, denoting the conditional probability distribution (CPD) of the variable at the node, given its parents. A node that has no parents has a set of prior probabilities. In either case, these tables are used during analysis to determine the relative likelihoods of the variable taking on any one of its states, given the rest of the BN. In such a fashion, the probability P(X|Y) may be computed for any two disjoint sets of variables, X and Y. For more on BNs, see (Jensen 1996).
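The linear-time claim above follows from the chain rule: the probability of a full instantiation is the product, over all nodes, of one CPD lookup. A sketch of this evaluation (the data layout and names here are our own, purely illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class JointProb {
    // inst[i] is variable i's state, parents[i] lists i's parent indices, and
    // cpts[i] maps an encoding of the parents' states to a distribution over
    // variable i's states. One table lookup per node: linear in the variable count.
    static double jointProbability(int[] inst, int[][] parents, Map<String, double[]>[] cpts) {
        double p = 1.0;
        for (int i = 0; i < inst.length; i++) {
            StringBuilder key = new StringBuilder();
            for (int par : parents[i]) key.append(inst[par]).append(',');
            p *= cpts[i].get(key.toString())[inst[i]];
        }
        return p;
    }

    public static void main(String[] args) {
        // Two binary variables: A (no parents) and B with parent A.
        int[][] parents = { {}, {0} };
        @SuppressWarnings("unchecked")
        Map<String, double[]>[] cpts = new Map[2];
        cpts[0] = new HashMap<>();
        cpts[0].put("", new double[]{0.7, 0.3});       // prior on A
        cpts[1] = new HashMap<>();
        cpts[1].put("0,", new double[]{0.8, 0.2});     // P(B | A=0)
        cpts[1].put("1,", new double[]{0.1, 0.9});     // P(B | A=1)
        System.out.println(jointProbability(new int[]{1, 1}, parents, cpts)); // P(A=1,B=1) = 0.3 * 0.9
    }
}
```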

As useful as the Bayesian Network is, traditionally one is constructed by an expert in the particular field of application, and then used to predict outcomes of future events. The accuracy of the BN thus becomes subject to human error. Ideally we could pass a database D to a computer program which would output the optimal BN for the underlying data-generating process. The central difficulty in designing such an algorithm is finding the optimal network structure. Given a structure, determining the optimal probabilities is relatively straightforward.

Metrics currently exist for rating the structure of a BN given a database. Two of the traditional methods are the Bayesian method (Cooper 1992) and a method inspired by the Minimum Description Length (MDL) principle, which is more efficient to compute (Lam 1998). The critical concern, then, is searching the space of structures for the optimal solution. Given a set of N variables, the number of possible structures connecting them is combinatorial in size, so a naive search would be futile. Cooper and Herskovits presented K2, a greedy algorithm using the Bayesian metric, in 1992 (Cooper 1992). Unfortunately, K2 requires an a-priori ordering of the variables, such that no variable can be a parent of its successor. Lam introduced an algorithm for refining networks using MDL, which takes a prior network as input (Lam 1998). In 1999 Wong developed an approach called MDLEP, incorporating both evolutionary programming and the MDL principle (Wong 1999). While the evolutionary programming guides the search through the space of network structures, the MDL metric rates the networks along the way, providing an intriguing blend of computer science methods.

Into our software system we have built an interface with a program inspired by MDLEP. Thus, we have implemented a tool to induce BNs from a database, utilizing both evolutionary programming and the MDL principle. This approach suffers no prior constraints, except for knowing the number of variables in the database and the number of possible states each one may take.

The MDL principle comes out of information theory, developed by J. Rissanen in 1989. Applied to BNs, the concept simply states that the best BN is ultimately the one that is most expressive with regard to the data. More technically, the structure of a BN is ultimately information, and may be encoded as a string of bits. Thus, the network itself has a description length (DL). This description length comprises the number of edges between nodes and the number of possible states that connected nodes may take on. We will call this the network description length (NDL). However, this is not enough: minimizing the NDL alone would just result in the thinnest network possible. We must also consider the description length of the database, given the network. The network structure contains some information, however abstract, about the database; obviously it does not contain all of the data in the database. Thus, we may ask how much additional information is needed (in bits) to restore the original database, given a BN structure. We call this the Data Description Length (DDL). The DDL is based on the mutual information between variables and their parents. Ultimately we want to find the network structure that minimizes the sum of the NDL and the DDL - we are essentially looking for the structure that expresses the most information in the most concise manner. It is important to note that both the NDL and the DDL may be decomposed as sums over all nodes.

Before the search begins, the algorithm computes and stores the DL for each potential edge in the network. Then it generates an initial population consisting of randomly generated BN individuals, with all cycles removed. For each successive population, every individual is cloned and copied into an intermediate population. Then each individual undergoes a sequence of one to six mutations and is again copied into the intermediate population, which is now twice the size of the current population. Each mutation takes one of four types:

1) Simple Mutation: An edge is randomly added to or removed from the network.

2) Reversion: The direction of a random edge is reversed.

3) Move: A parent is randomly deleted from a node, and a new parent is added.

4) Knowledge-Guided Mutation: A random decision is made to add an edge or delete an edge. If it is a deletion, the existing edge with the largest DL (as previously computed) is deleted. Otherwise, the non-existing edge with the smallest DL is added.

After an individual mutates, cycles are removed by randomly deleting edges involved in them. There is no crossover in MDLEP. Finally, the individuals in the intermediate population "compete" with respect to MDL, and the top-scoring half moves on to the next population.
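The cycle-repair step can be sketched as follows (our own implementation of the idea; MDLEP's actual bookkeeping may differ): find a directed cycle by depth-first search and delete a random edge on it, repeating until the network is acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

public class CycleRepair {
    // Repeatedly find a directed cycle and delete one random edge on it,
    // until the network (adjacency matrix adj[from][to]) is a DAG.
    static void removeCycles(boolean[][] adj, Random rnd) {
        int[] cycle;
        while ((cycle = findCycle(adj)) != null) {
            int k = rnd.nextInt(cycle.length);               // random edge on the cycle
            adj[cycle[k]][cycle[(k + 1) % cycle.length]] = false;
        }
    }

    // Returns the nodes of some directed cycle in order, or null if acyclic.
    static int[] findCycle(boolean[][] adj) {
        int[] state = new int[adj.length];                   // 0 unseen, 1 on stack, 2 done
        Deque<Integer> stack = new ArrayDeque<>();
        for (int s = 0; s < adj.length; s++) {
            int[] c = dfs(s, adj, state, stack);
            if (c != null) return c;
        }
        return null;
    }

    private static int[] dfs(int u, boolean[][] adj, int[] state, Deque<Integer> stack) {
        if (state[u] == 2) return null;
        state[u] = 1;
        stack.push(u);
        for (int v = 0; v < adj.length; v++) {
            if (!adj[u][v]) continue;
            if (state[v] == 1) {                             // back edge closes cycle v ... u
                List<Integer> cyc = new ArrayList<>();
                for (int w : stack) { cyc.add(0, w); if (w == v) break; }
                return cyc.stream().mapToInt(Integer::intValue).toArray();
            }
            int[] c = dfs(v, adj, state, stack);
            if (c != null) return c;
        }
        stack.pop();
        state[u] = 2;
        return null;
    }

    public static void main(String[] args) {
        boolean[][] adj = new boolean[3][3];
        adj[0][1] = adj[1][2] = adj[2][0] = adj[0][2] = true; // contains cycles
        removeCycles(adj, new Random(42));
        System.out.println(findCycle(adj) == null);           // true: now acyclic
    }
}
```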

We started by testing MDLEP on data generated by a network of three nodes, connected by two arcs, with no node having more than one parent. In all cases MDLEP converged on an individual during the first population. Interestingly, however, the prior probabilities of the node with no parents affected the outcome. When the variable with no parents had a relatively uniform distribution between its states, MDLEP identified the correct structure. However, when the prior probabilities were weighted heavily, it drew one arc in the wrong direction. Of course, when the affected variables had uniform conditional distributions given their parents, MDLEP could not find the correct structure. Intuitively, this makes sense, because the MDL metric looks for frequent instantiations, which are much more likely when variables have weighted probabilities. When two correlated variables have similar weights, however, it becomes harder for the algorithm to decide which one is the "cause" and which is the "effect."

For a graph with five nodes, in which four outer nodes are connected through a central node, the number of records in the database was important. With 1000 records, the resulting network had two wrong edges, but 5000 training instances yielded the correct structure after 34 generations of 100 individuals each. For the largest graph we tested, with eight multiply-connected nodes, the algorithm missed by two edges: it added two relationships that were not in the original network. This was still significantly better than the results over 1000 training instances, so perhaps even more data would enable us to do better.

- Cooper, G.F., Herskovits, E. "A Bayesian Method for the Induction of Probabilistic Networks from Data." Machine Learning 9: 309-347 (1992).
- Jensen, F.V. An Introduction to Bayesian Networks. UCL Press, 1996.
- Lam, W. "Bayesian Network Refinement Via Machine Learning Approach." IEEE Transactions on Pattern Analysis and Machine Intelligence. 20: (3) March, 1998.
- Wong, M.L. Lam, W. Leung, K. "Using Evolutionary Programming and Minimum Description Length Principle for Data Mining of Bayesian Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence. 21: (2) Feb, 1999.

We set out to define a generic solver that could be used by a genetic program regardless of its specific implementation. The goal was achieved by defining templates (Java interfaces) that serve as guidelines for the programmer, who uses the generic solver and its main driving algorithm to breed populations of individuals. Two of the three programs implemented using the templates accomplished their tasks without modification. The third required minor changes within the problem implementation to bypass constraints set by the driver program. Although additional implemented problems will be the real test of our generic solver, it has already proved useful for the implementation of simple GPs.