How to Write a GP Paper

by Eric V. Siegel

I originally posted this message to the GP mailing list December, 1995. By popular request (one person:), here it is for your convencience.

Dear GP-ers,

Since the submission deadline for GP96 is approaching (Jan 10), there are two serious paper-writing topics we need to address:

1. The introduction section of your paper. This must (A.) motivate your work by pinpointing the problem you are addressing and then (B.) give an overview of your approach and/or contributions (and perhaps even a general description of your results). In this way, the intro sets up my expectations for the rest of your paper -- it provides the context, and a preview.

I have been on several GA/GP paper review committees, and I sit on GP96's committee. About HALF of the GP papers I've reviewed had bad introduction sections. This is understandable: GP is a young field, so many GP researchers are new to writing research papers.

More than often a paper with an unclear introduction is filled with good ideas. In this case, the lack of organization makes it REALLY hard to sort out what the specific research contributions are.

Your paper must have one main contribution that it focuses on. Your paper should not be an autobiography of the work you did over the last 7 months.

EXAMPLE BAD INTRODUCTION:

Here at the institute for computer research, me and my colleagues have created the SUPERGP system and have applied it to several toy problems. We had previously fumbled with earlier versions of SUPERGPSYSTEM for a while. This system allows the programmer to easily try lots of parameters, and problems, but incorporates a special constraint system for parameter settings and LISP S-expression parenthesis counting.

The search space of GP is large and many things we are thinking about putting into the supergpsystem will make this space much more colorful.

EXAMPLE GOOD INTRODUCTION: (Siegel/Chaffee chapter from upcoming AdvancesII)

(THE PROBLEM:)

Many new domains for genetic programming require evolved programs to be executed for longer amounts of time. For example, it is beneficial to give evolved programs direct access to low-level data arrays, as in some approaches to signal processing \cite{teller5}, and protein segment classification \cite{handley,koza6}. This type of system automatically performs more problem-specific engineering than a system that accesses highly preprocessed data. However, evolved programs may require more time to execute, since they are solving a harder task.

(PREVIOUS OR OBVIOUS APPROACH: (Note that you can also have a related work section that gives more details about previous work.))

One way to control the execution time of evolved programs is to impose an absolute time limit. However, this is too constraining if some test cases require more processing time than others. To use computation time efficiently, evolved programs must take extra time when it is necessary to perform well, but also spend less time whenever possible.

(APPROACH/SOLUTION/CONTRIBUTION. THE *FIRST* SENTENCE OF A PARAGRAPH LIKE THIS SHOULD SAY WHAT THE CONTRIBUTION IS:)

In this chapter, we introduce a method that gives evolved programs the incentive to strategically allocate computation time among fitness cases. Specifically, with an {\it aggregate computation time ceiling} imposed over a series of fitness cases, evolved programs dynamically choose when to stop processing each fitness case. We present experiments that show that programs evolved using this form of fitness take less time per test case on average, with minimal damage to domain performance. We also discuss the implications of such a time constraint, as well as its differences from other approaches to {\it multiobjective problems}. The dynamic use of resources other than computation time, e.g. memory or fuel, may also result from placing an aggregate limit over a series of fitness cases.

(CHAPTER OVERVIEW:)

The following section surveys related work in both optimizing the execution time of evolved programs and evolution over Turing-complete representations. Next we introduce the game Tetris as a test problem. This is followed by a description of the aggregate computation time ceiling, and its application to Tetris in particular. We then present experimental results, discuss other current efforts with Tetris, and end with conclusions and future work.

------------------------------------------- -------------------------------------------

2. For problems with an infinite number of instances, you must evaluate/validate your "best" individual over new, unseen data. Otherwise NOBODY HAS ANY IDEA how good the resulting individual is!!!

FURTHER, your paper must be very explicit about which numbers indicate performance over which data set. Since GP has a history of not validating results over unseen data, the reader will not assume you did the right thing unless you are very explicit.

----------------------------------------- -----------------------------------------

Any additional comments or responses? If you have something helpful to add to this, PLEASE DO! (post to gp-list)

-Eric