Using Genetic Programming to Evolve a Connect-4 game player

                             Using Genetic Programming to Evolve a Connect-4 game player
                                                                             Mohamed F. Abdelsadek
                                                                          Department of Computer Science
                                                                                  Columbia University
                                                                                mfa11@columbia..edu

ABSTRACT

In this paper, we employ genetic programming to evolve Connect-4 players. A players skill level (fitness) is based on how the player
performs when placed in competition with other randomly chosen members of the population. Different emerging game strategies are
observed as the players evolve and their skill improves.

1. Introduction

Connect-4 is a popular board game that many played as children. The game is played on an eight by eight board. A legal move is placing
a piece in an empty square so that the piece is either at the bottom row of the playing board, or directly on top of the highest piece in any
column. The first player capable of placing four their pieces adjacent to one another either vertically, horizontally or diagonally wins the game.
A game may be composed of many moves. Because of the difficulty of attributing the effect of a single move on the final result of the game,
fitness evaluation is based on the final result of a game. This indicates that the evolved player has to find some general concept behind playing
Connect-4 in order to play competitively.

Since we do not have an optimal solutions to the Connect-4 board game there is no absolute measure that the evolved players can be measured
against. In order to determine the quality of the evolved genetic programs, they are tested against each other, random players, and human players.
General trends in good players are observed and are used to speculate to how an optimal player would behave. We feel that the purpose of genetic
programming should be to discover these “optimal” solutions or patterns.

This paper is divided into four other sections. An brief description of genetic programming is first given in section 2. A description of how genetic
programming was utilized to evolve the connect-4 programs is explained in section 3. An analysis of the experiments and the results is described
in section 4. A final conclusion consolidates the experiments the results and possible ways have improved the process.

2. Genetic Programming

Genetic programming is a machine learning method that was motivated by an analogy to biological evolution. The underlying principal is similar
to that of the Darwinian theory of evolution. The goal of genetic programming is to evolve amongst a population of hypothesis (or programs)
the most fit. The process of evolution occurs though a sequence of generations in which the fittest survive and the weak die out. The initial
population is typically chosen at random and certain rules are employed in order to ensure that the members are correctly behaved within the
domain of the problem. Each member of the population is then evaluated for their fitness based on some evaluation criteria. Members of the
population are then chosen at random based on their fitness, and are moved to the next generation. New members are also formed by mutation
of select members or by crossover between two existing members. Other operations may also be employed. As generations die out, new
generations will emerge inheriting the good attributes of their predecessors. Members with good genetic structure will tend to be selected and
their genetic material passed on. A typical member of a genetic program will have several input points known as terminals and several evaluation
functions known as nodes see Koza [1]. A program will be composed of some combination of select nodes and functions. Some of the challenges
in formulating a genetic programming models is defining these nodes and terminal to fit the given problem.

There has been extensive research done in the area of genetic programming. This paper utilizes ideas used in some of these research experiments.
Craig W. Reynolds [2] performed a study on the pursuit and evasion game of Tag. In the study competitive fitness was used to evaluate fitness
of members of the population. Players fitness was rated on how well they performed when placed in competition with six other random members
of the population. The purser would chase the evader and if caught role reversal would take place; the evader now becomes the pursuer and the
purser becomes the evader. A game was composed of twenty five moves where each move the purser attempts to make a capture and the evader
attempts to escape. The winner is calculated based on the player who was the evader for a greater number of moves. The idea of competitive fitness
was used by in the research paper to develop an optimal connect-4 player.

Another research example was the study done by Gabriel J Ferrer and W. N. Martin [3]. The study analyzed the genetic programming paradigm in the
context of Senet, a strategy game popular in ancient Egypt. The hypothesis representation used by the genetic programs in the population used conditional
and logical operators. In particular the “if” function and the logical operators and, or and, not were used. The simple arithmetic operators +, -, and
* were also utilized. The fitness of an individual in the population was determined by pairing two members and having them compete in three games.
Individuals who won two games were declared winners and moved on to the next round of the tournament. All individuals who lost at the same round
were given the same fitness. The reason three games were chosen as opposed to a single game of competition was the non-deterministic nature of Senet.
The next plausible move is determined based on a combination of a probabilistic drawing factor in conjunction with the genetic programs selection of
the next best move. The research was able to produce effective players at the game of Senet.

3. Genetic Programming implementation

A pre-built Java based genetic programming package GPJPP written by Adam Fraser and Thomas Weinbrenner was used to implement the Connect-4
problem. The package is flexible enough to allow the user to define the problem and the primitives to use for the genetic program representation.
The following terminals and nodes were used in the formulation of the problem domain.

Function Nodes

                Node

        Number of Input Terminals

Description

                    +                             2 Addition Operator

                    -                             2 Subtracion Operator

                    *                             2 Multiplication Operator

                   %                             2 Safe Division Operator

Terminals

Terminal Value Description

w number of white pieces currently on the board

b number of black pieces currently on the board

W number of white pieces on the edges (row 1 or row 8 or column 1 or column 8)

B number of black pieces on the edges (row 1 or row 8 or column 1 or column 8)

c number of white pieces on the corners (there are four corners on the board)

C number of black pieces on the corners (there are four corners on the board)

n number of white pieces near the corner (the three squares around the corner)

N number of black pieces near the corner (the three squares around the corner)

e number of white pieces near the edges

E number of black pieces near the edges

j number of adjacent white pieces (total number of pieces that are directly adjacent)

J number of adjacent black pieces (total number of pieces that are directly adjacent)

10 A constant used to adjust weights

The re is a total of thirteen terminals. The more terminals that a problem is given the more time and generation which are required to evolve a player
with the required fitness measure. Many of the primitives used here were also used in a previous research effort for the Othello board game (see http://www.cs.columbia.edu/~eeskin/othello/). An additional terminal was added “j” and “J” to allow for an understanding of pieces being adjacent
to one another. Notice that the selection of these nodes is very critical to the resulting player. If the selected nodes/terminals do not conform to the
problem at hand, genetic programming will only be able to discover mediocre players.

After each move statistics on the board are collected to determine the new value of the terminals. The evolved player is presented with the value of these
terminals and all the currently possible legal moves. Based on the players evolved program a decision is made as to what the next move is. An example
of an evaluation function is “(+ 10 E).” Which adds 10 to the number of black pieces near the edges to select the next move.

One of the limitations that the genetic package introduced was a natural way to allow competitive coevolution of the players with other players in the population.
The package evaluates the fitness of a player as soon as it is created without consideration that other members of the population have not been created.
To overcome this problem we used a previous generation to evaluate the fitness of the current generation. The very first generation has its fitness measured
against random players. The fitness of players is evaluated by competition against six other players. Based on the number of games that the player wins
and the duration of each game, the player is assigned a fitness measure. Shorter games are assigned higher fitness (an assumption is made that better
players will tend to win the game faster).

The learning cycle ends either when a fixed number of players with fitness measure higher than a pre-specified value are evolved or when the maximum number
of generations is reached.

4. Results

Several evolution runs were conducted using the design above with minor variations between the runs. The population size and the number of generations
were fixed through out the runs. Three runs will be discussed here and the different learned concepts as the player evolves will be analyzed.

4.1 Run A

In this run the genetic program was evaluated for fitness by playing against a random player. Several interesting patterns were observed amongst the
evolved players. Random players determine the next move by randomly choosing a move among the set of current legal moves. The evolved players
for this scenario had a tendency of lining up their pieces directly on top of each other as illustrated in the diagram below.

Genetic Program Playing White, Random Player
Competing as Black.

Most players would start at the right column and stack the pieces on top of each other. Although this strategy is simple it was very effective when
used against the random player. This technique resulted in the computer defeating the random player in 49 out of 50 games (or a 98 % win ratio).
Because of this ratio the proceeding generations did not have to evolve dramatically in order to defeat the random player, as long as they used
the same basic strategy they were almost guaranteed victory. The above strategy is so simple that when we competed directly with the evolved
player we found that defeating it next to trivial. A sample genetic program that used this technique was a one terminal function

RPB: white_edges

Among all the evolved functions none used the terminals “white” and “black.” The reason for that becomes obvious when the rules of Connect-4
are re-observed. Since the players alternate moves the number of black pieces will always be equal to or one piece greater than the number of
black pieces on the board

4.2 Run II

In this run we employed some modification to the fitness measure. In particular so that players compete against other members of the population.
We felt that this way would produce better players than competition against random players. The fitness measure depended on two factors: The
complexity of the Genetic Program tree, and whether the game was a win, a lose or a draw.

To our surprise the resulting strategies were similar in many ways to those observed in Run I. There were however enough relevant differences.
The major difference was the use of white_adj_pieces as a terminal to some of the evolved players. This produced a strategy in which the players
not only place the pieces vertically on top of one another filling one column at a time, but also attempting to maximize the number of adjacent pieces.
When we competed against the player we found it attempting to place its pieces close to one another horizontally or diagonally portraying this
simple strategic approach. This approach was again very effective against the random player, yet can be easily defeated when we competed
against it.

Below are samples of some of the evolved players:

RPB: ( - black_adj_pieces white_near_corners)
RPB: ( - black_corners white_near_corners)

4.3 Run III

Most of the genetic player produced by Run I and Run II were short trees. We felt that these short trees are incapable of capturing the general
concept behind playing the Connect-4 game, and that more complex trees would have to be developed. Since our fitness measure penalized more
complex solutions, this fitness measure was removed. In order to reward efficient players, the number of moves required untill victory were used
for the fitness measure, longer game resulted in worst player fitness.

Below is a sample of one of the evolved players

RPB: ( -   black_edges ( + ( - ( *   white_edges   white_near_corners ) ( -   10   white_corners )) ( / ( +   white_corners   white_near_corners )
( / ( + ( / ( - ( +   white   black_adj_pieces ) ( / ( -   white_near_corners   white_edges ) ( / ( -   black_adj_pieces ( -   white_near_edges   10 ))
( /   10   white_near_edges )))) ( /   black_near_corners ( / ( + ( - ( + ( +   white_near_corners   white_near_edges ) ( * ( *   white_near_corners
black_near_edges ) ( -   black_adj_pieces   white_corners ))) ( -   white_near_edges   black_near_corners )) ( * ( +   white_corners   white_near_corners )
( -   black_adj_pieces   white_corners ))) ( /   white_near_corners   black_near_edges )))) ( / ( * ( * ( /   10   black_near_edges ) ( +   white_corners
black_corners )) ( - ( -   white_near_corners   white_edges ) ( /   white_near_corners   black_near_edges ))) ( -   black_adj_pieces   white_corners )))
( *   white_edges   white_near_corners )))))

There was a tendency for all of the developed players to be as complex as the above sample player. This player was evolved in generation 21 and was
able to survive for several more generations. When we competed against this player we found a dramatic improvement in performance in playing tactics
over Run II and Run I players. Although the player was much more competitive and made some surprisingly good moves, it was still at a high enough
level of competition to be able to always defeat us (although on occasion it did). When compared to the random player, this player won all of the fifty
played games.

5. Conclusion

Evolution of a good board game player is highly dependent on the interplay between the fitness evaluation measure, the selected terminals and nodes,
and the training partners used for training competition. Different variations on these elements can have dramatic effects on the resulting players. In
order to produce a highly skilled player the terminals and nodes have to be general enough to allow formulation of the game concept. However,
one must be careful not allow for too many terminals and nodes otherwise it will take longer times to evolve the players.

In this paper we were able to evolve good players that were easily able to defeat the Random player. The player’s skill level was not enough to
be able to compete effectively against a human competitor. We feel however, that with more variations on the genetic program model a very good
player can be evolved. This was observed as the player continuously improved over time with these variations.

References

[1] Koza, J. R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press.

[2] Craig W. Reynolds. Competition, Coevolution and the Game of Tag. Proceedings of the Fourth International Workshop on the Synthesis
and Simulation of Living Systems, pp. 59-69, MIT Press, 6-8 July 1994.

[3] Gabriel J. Ferrer and Worthy N. Martin. Using Genetic Programming to Evolve Board Evaluation Functions for a Board Game. 1995 IEEE
Conference on Evolutionary Computation, Vol. 2, p. 747, IEEE Press, 29 November - 1 dec 1995.

[4] Frank W. Moore and Oscar N. Garcia. A Genetic Programming Approach to Strategy Optimization in the Extended Two-Dimensional
Pursuer/Evader Problem. Genetic Programming 1997: Proceedings of the Second Annual Conference, pp. 249-254, Morgan Kaufmann,
13-16 July 1997.

Node	Number of Input Terminals	Description
+	2	Addition Operator
-	2	Subtracion Operator
*	2	Multiplication Operator
%	2	Safe Division Operator

Terminal Value	Description
w	number of white pieces currently on the board
b	number of black pieces currently on the board
W	number of white pieces on the edges (row 1 or row 8 or column 1 or column 8)
B	number of black pieces on the edges (row 1 or row 8 or column 1 or column 8)
c	number of white pieces on the corners (there are four corners on the board)
C	number of black pieces on the corners (there are four corners on the board)
n	number of white pieces near the corner (the three squares around the corner)
N	number of black pieces near the corner (the three squares around the corner)
e	number of white pieces near the edges
E	number of black pieces near the edges
j	number of adjacent white pieces (total number of pieces that are directly adjacent)
J	number of adjacent black pieces (total number of pieces that are directly adjacent)
10	A constant used to adjust weights