FINAL PROJECT WRITE UP
One shortcoming of neural networks is their tendency to prematurely converge on sub-optimal solutions. Correcting this shortcoming usually involves increased computational cost and human interference. Ideally, we would like to have a neural net agent that converges on local minima in an error surface while exploring alternative hypotheses. Genetic programming can provide an equivalent agent by evolving a diverse population of hypotheses which, independent of one another, converge on local minima.
Neural nets are ideal candidates for genetic programming, because they can easily be represented in a manner conducive to genetic operators. For example, it is trivial to perform a one-point crossover operation on two neural nets by exchanging their hidden layer neurons. Testing their fitness or generating random mutations is similarly simple.
We will introduce a method for genetic evolution of neural nets that, in theory, fosters diversity in the population and avoids the premature convergence found in both neural nets and conventional genetic methods. This is accomplished by evolving not only a population of neural networks, but also the neurons which comprise the networks. In practice, this duel-level process introduces greater randomness and exploration in the population without hindering the speed of convergence or reducing the accuracy of the best networks.
The application of this method to problems was limited to simple nonlinear functions. Although we did not apply the method to a specific domain, and were unable to compare it against conventional learning in neural networks, our results indicated that the method had some merit.
The key to evolving the neural networks was a suitable representation. As the input and output layer for the networks were identical from network to network (for a given problem), by limiting ourselves to a single-layer feed-forward topology, we needed only to consider the hidden layer of each network. Since each network was required to have the same number of hidden nodes and each neuron the same number of total connections to other layers, the representation problem became trivial.
The original idea was to evolve only the neural nets, each of which would contain representations of its neurons. Once we decided to try evolving the neurons separately, the nets were modified to contain a list of its hidden layer neurons. Each neuron contains a list of weights, each of which can be connected to any input or output neuron (which, of course, are actually weighted sums, not neurons, in our representation). The link is encoded as an integer; if its value lies in the upper half of the range of an integer, it is connected to the output layer, otherwise to the input layer; the index of the neuron to which it is connected is equal to its integer value mod the number of neurons in the target layer. The weights are stored as floating point values. In this way, the links and weights can be gently perturbed by mutation.
The reproduction of the top networks and neurons is performed, in a simplified manner, according to the classic genetic algorithm. The networks are tested several times on the given problem; fitness is assessed according to average error. Each of the neurons in the tested network is assigned equal credit or blame for the network’s overall performance. The network and neuron populations are ranked by their fitness, and a certain number are chosen to breed. To make room for new offspring, members of each population are randomly chosen to be replaced. Members in the bottom 25% of the population will never reproduce, and none in the top 25% will be chosen for replacement. Higher rank means greater chance for reproduction, and lower rank means greater chance for replacement.
Offspring are created using a one-point crossover. If two mating networks have N neurons each, they will produce two children: one will inherit neurons 1 through M from the first parent and neurons M+1 through N of the second, and the other will inherit neurons 1 through M from the second parent and neurons M+1 through N of the first (where M is a randomly chosen number between 2 and N-1). The offspring of neurons are similarly generated, with links to the input and output neurons substituted for neurons.
Finally, all offspring have a small chance of mutation. Mutated neurons are regenerated with random weights and links. (The weights of every neuron, whether or not it is mutated, are perturbed very slightly to introduce randomness.) Networks have a small, independent chance for each of their neurons that a different, randomly chosen neuron will be used instead. There is also a greater chance that the network will replace any number of its neurons with those neurons’ children, if they had any this generation.
The possibility exists that any of a network’s neurons will be eliminated from the population, replaced by the children of fitter neurons--creating a sort of nepotistic influence in the population. When this occurs, the network will use a randomly chosen replacement. This helps insure that all neurons are given equal opportunity to be used by networks.
The source code can be found at: http://www.columbia.edu/~djc28/4721/finalprj.tar.
(Near the end, we intended to explore the so-called Baldwin Effect, by allowing the networks to use backpropogation of error to learn while they were being evolved. However, we did not finalize the implementation. Perhaps this could be examined in the future.)
We were able to evolve neural networks to compute a variety of nonlinear functions with very little error. When the method was successful, it was very quick and converged after few generations (usually ten to twenty). When the population seemed to hopelessly converge on sub-optimal solutions, allowing it to continue through additional generations often got it out of its rut and it continued to find new solutions. Here is a list of several classes of functions it attempted to learn, with results:
Arithmetic expressions: Found solutions to simple functions such as identity, summation, and polynomials with varying success. Identity was extremely accurate for many inputs. Summation was accurate for two inputs to thirty inputs, sometimes finding a near-perfect solution. Multivariable polynomial computation was less accurate, often converging on a solution with a small average error but horrible worst-case performance. Sometimes the population failed to converge with low number of inputs; it apparently performed better, albeit slower, with more data.
Boolean expressions: Extremely successful, even with high complexity and number of inputs. For example, the method was able to develop solutions to expressions like [(A+B)(C+D)o(A+E)(B+F)] with less than 0.0001% worst-case error.
Constant-value functions: Success varied with the number of inputs. With one to five inputs the fittest networks were nearly perfect, but as the number of inputs increased accuracy degraded and convergence time increased.
Clearly, evolving neural nets to perform tasks can be successful in certain domains. The purpose of this project was not to compare the performance of our method to conventional methods, but to assess its feasibility. Indeed, we were initially unsure whether it would work at all. The obvious next step is to apply the method to a more difficult domain, such as game playing, and refine the evolutionary operations. There is a great deal of guesswork involved in the process of evolving the networks, in choosing them for mating, in mutating them, and so on.
Genetic programming also has the potential to correct another shortcoming of neural nets, by evolving nets with diverse topologies. In this way, the best network topology could theoretically be found without resorting to user interference or intuition.