Active Learning for GP



In active learning, the real value of a fitness case is the information gain it brings. {Gregory Seront}


David Cohn visited the GP symposium from his concurrent symposium on Active Learning. Here is his contribution to this archive:
As promised, a few notes on those slides that I didn't make it to in my "Active Learning for GP" talk on Sunday:

I see three basic areas for applying some of the recent results in active learning to genetic programming. The general question I (naively) looked at was "What do you (or your algorithm) have control of, and how do you control it?" The opportunities that suggested themselves to me were:

1) selection of parameters for simulation (e.g. length of epochs, number of "best" individuals that survive to the next epoch, etc.)

2) selection of mutation operators. This would seem to be along the lines of "smart crossover": learning online which mutations appear to be most beneficial, and concentrating on those (which is, of course, a time-varying function).

3) selection of the environment to throw at the population. This is the area where I see the most direct applicability of active learning results, so I'll spend a few paragraphs on it:

Selecting the environment

The trick is figuring out what you consider to be your "learner"; I'm most comfortable framing it as the population. I don't know how standard that is, though.

One thing you could do here that counts as active learning is to design the environment to optimize development: monitor the population and automatically provide the tests and stimuli that will guide development most quickly and efficiently.

Example 1: present stimuli that will optimally distinguish the candidates. I've got individual 1 and individual 2 -- which is more fit? Techniques like selective sampling (Cohn), uncertainty sampling (Lewis and Catlett), and query by committee (Seung et al.) filter examples, applying only those that are expected to distinguish hypotheses from each other.
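
To make that concrete, here is a minimal committee-style sketch for a GP population, assuming a hypothetical evaluate(individual, case) routine and a pool of candidate fitness cases (the names population, candidate_cases, and evaluate are placeholders, not anything from the original notes): score each candidate case by how much the current individuals disagree on it, and present the most contentious one. In practice you would use a small random committee rather than the whole population to keep the filtering cheap.

    # Sketch of committee-style case selection for a GP population.
    # evaluate(individual, case), population, and candidate_cases are
    # assumed placeholders for whatever the GP system provides.

    def disagreement(population, case, evaluate):
        """Fraction of pairs of individuals whose outputs differ on this case."""
        outputs = [evaluate(ind, case) for ind in population]
        pairs = [(a, b) for i, a in enumerate(outputs) for b in outputs[i + 1:]]
        return sum(a != b for a, b in pairs) / len(pairs) if pairs else 0.0

    def select_case(population, candidate_cases, evaluate):
        """Pick the candidate case the current individuals disagree on most."""
        return max(candidate_cases,
                   key=lambda case: disagreement(population, case, evaluate))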

But you don't really want to know how each individual rates with respect to the others -- you're only interested in identifying the best ones. For that, you want to concentrate your resources on the fittest individuals as soon as you can.

Example 2: present stimuli that will optimally distinguish the best candidates from all others. Leslie Kaelbling explored this in her thesis with "interval estimation" for control: she maintained confidence intervals on the value of certain actions in certain states, and took actions based on the upper bound of their interval. Oded Maron and Andrew Moore wrote this approach up in a nice formalization that draws on something called "Hoeffding races".
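
A rough sketch of the racing idea in a GP setting, assuming a hypothetical sample_fitness(individual) routine that returns the individual's score on one randomly drawn fitness case in [0, 1]: keep a confidence interval on each survivor's mean fitness, and drop anyone whose upper bound falls below the best lower bound. One would typically race a promising subset rather than the whole population.

    import math

    # Sketch of a Hoeffding-race style elimination loop over a GP population.
    # sample_fitness(individual) is an assumed placeholder returning the score
    # on one randomly drawn fitness case in [0, 1].

    def hoeffding_radius(n, delta=0.05, value_range=1.0):
        """Half-width of a Hoeffding confidence interval after n samples."""
        return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

    def hoeffding_race(population, sample_fitness, max_rounds=200, delta=0.05):
        sums = [0.0] * len(population)
        counts = [0] * len(population)
        survivors = list(range(len(population)))          # indices still racing
        for _ in range(max_rounds):
            for i in survivors:
                sums[i] += sample_fitness(population[i])  # one more fitness case
                counts[i] += 1
            radius = {i: hoeffding_radius(counts[i], delta) for i in survivors}
            best_lower = max(sums[i] / counts[i] - radius[i] for i in survivors)
            survivors = [i for i in survivors
                         if sums[i] / counts[i] + radius[i] >= best_lower]
            if len(survivors) == 1:
                break
        return [population[i] for i in survivors]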

In fact, this approach is still heuristic. Ideally, we want a resource allocation *policy*, telling us which fitness case to apply to which individual. This suggests a reinforcement learning problem in which the value function optimizes something like "rate of improvement in population fitness". A couple of good examples of using reinforcement learning for resource allocation are Zhang & Dietterich (job-shop scheduling) and Crites & Barto (elevator scheduling). In general, this should converge on an optimal schedule, but it could take a looooong time to do so, and is probably not worth the extra bang for the buck over Hoeffding-type policies.
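
As a very stripped-down stand-in for such a policy (an epsilon-greedy bandit rather than the full reinforcement-learning formulation above), one could learn which individual is worth the next fitness-case evaluation, using the change in the best estimated fitness as the reward. Again, sample_fitness and population are assumed placeholders.

    import random

    # Simplified bandit stand-in for a learned resource-allocation policy:
    # epsilon-greedy choice of which individual receives the next fitness-case
    # evaluation, rewarded by the change in the best estimated fitness.

    def allocate_evaluations(population, sample_fitness, budget=1000,
                             eps=0.1, lr=0.1):
        n = len(population)
        sums, counts = [0.0] * n, [0] * n
        values = [0.0] * n      # learned payoff of evaluating each individual
        best_estimate = 0.0
        for _ in range(budget):
            if random.random() < eps or max(counts) == 0:
                i = random.randrange(n)                     # explore
            else:
                i = max(range(n), key=lambda j: values[j])  # exploit
            sums[i] += sample_fitness(population[i])
            counts[i] += 1
            new_best = max(sums[j] / counts[j] for j in range(n) if counts[j])
            values[i] += lr * ((new_best - best_estimate) - values[i])
            best_estimate = new_best
        best = max(range(n),
                   key=lambda j: sums[j] / counts[j] if counts[j] else 0.0)
        return population[best]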

A Philosophical Note

As I mentioned to some folks in the hall, there's actually an odd juxtaposition between the domains most people apply active learning to and that of GP. AL is usually most interesting on problems where data are very expensive and computation is cheap. One trades computation for data, to make sure every piece of data is as informative as possible. In effect, one "considers everything before trying anything." Active learning in GP, on the other hand, would trade computation for computation, which requires more restraint. Since the philosophy behind the field is "try everything, let the fitness function sort 'em out," there's a bit of an odd match there.

Anyhow, these are the thoughts I intended to toss out before I lost track of time during my talk. Enjoy, and feel free to ask questions.


References

Cohn, Atlas and Ladner. Improving generalization with active learning. Machine Learning, 15(2):201-221, 1994. ftp://psyche.mit.edu/pub/cohn/active_learning.ps.Z

Crites and Barto. Improving elevator performance using reinforcement learning. To appear in NIPS 8. ftp://ftp.cs.umass.edu/pub/anw/pub/crites/nips8.ps.Z

Kaelbling. Learning in embedded systems. Ph.D. thesis. TR-90-04, Dept. of Computer Science, Stanford University. 1990.

Lewis and Catlett. Heterogeneous uncertainty sampling for supervised learning. International Conference on Machine Learning, 1994, Morgan Kaufmann.

Maron and Moore. ftp://ftp.ai.mit.edu/pub/users/oded/papers/NIPS6.ps.Z (a longer version, which includes more racing applications, is at ftp://ftp.ai.mit.edu/pub/users/oded/papers/AIReview95.ps.Z)

Seung, Opper and Sompolinsky. Query by committee. Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, 1992, ACM.

Zhang and Dietterich. A reinforcement learning approach to job-shop scheduling. IJCAI-95. ftp://ftp.cs.orst.edu/users/t/tgd/papers/tr

{David Cohn}