Final Project: PuyoPuyo
Juno Suk, firstname.lastname@example.org
What is PuyoPuyo? PuyoPuyo is an addictive game in the same falling blocks
genre as Tetris. You are given pieces, each consisting of two colored balls.
The goal is to shift, rotate, and drop these pieces into a 6-column, 14-row
pit in such a way that balls of the same color group together. Once four or
more same-colored balls are grouped together horizontally, vertically, or in
a combination of the two, they disappear, and any balls above them drop
straight down to fill their place. The object of the game is to prevent the
pit from filling to the top (as in Tetris).
A simple game so far, but aspiring to higher scores requires a little more
intelligence: greater points are given for groups of balls that disappear in
a chain-reaction sequence.
For example, when a group of like-colored balls disappear, the pieces above
drop down to fill their place. If, upon dropping, these balls find
themselves in a group of four or more like-colored balls, then they too will
disappear. Bonus points are given for this. If, after these balls disappear,
another group is formed upon dropping, then they too disappear and give you
even more bonus points. In this way, the game becomes more exciting and
requires a bit of intelligent foresight.
When you add head-to-head two player competition to the game, it becomes
an even greater challenge since bonus points translate into sending
obstacles (rocks) into your opponent's pit. This game
is best learned by playing it.
The objective of this project is to genetically program AI players who
are capable of playing this game at a level comparable to experienced human
players. The Java programming language is used to implement every aspect of
this game, and the gpjpp package is used for the genetic programming itself.
Breaking the problem down, it was determined that there are two
goals to focus on when creating the PuyoPuyo AI player. These two goals
are closely tied to how the fitness of the players would be measured.
The first goal would be to create a player capable of merely surviving
for an extended period of time. That is, create a player that knows
better than to just stack pieces haphazardly up to the top.
The more difficult second goal would be to create a player capable of
setting up the pieces in positions conducive to chain reactions to get
those bonus points (or, in a two-player game, to send rocks over to the
opponent). To satisfy these two goals, fitness was measured in two parts.
For every piece that is dropped into the pit, the survival fitness
(s-fitness) is incremented by 1 point. That way, the longer the player
survives, the higher the s-fitness, and therefore the higher the chance that
this player's GP function knows how to keep it alive.
The question then arose as to how the two would work together in creating
the AI player. There needed to be some sort of balance between the two
fitnesses. Horn et al. (1994) say, "Historically, multiple objectives
have been combined ad hoc to form a scalar objective function, usually
through a linear combination (weighted sum) of the multiple attributes..."
Though Horn goes beyond this linear combination in his work, the linear
combination approach would suffice for this project. Finding the
Pareto optimal front in this project was relatively simple, and actually
unnecessary, as was found out later. But it was decided that populations
would be created at five different points along the front, represented by
the fitness equations given below.
The second fitness measure, the position fitness (p-fitness), is measured
directly by how many rocks would be sent over from a group or groups
disappearing had the player been playing with an opponent.
This translates to:
So, if the player were to make one group of 4 and one group of 6 disappear,
it would send over 8 rocks, since that is a chain sequence of 2 groups plus
2 extra balls (6 + 2 rocks). Its p-fitness would therefore be incremented
by 8 points.
Chain result            Rocks sent
1 group                 1 rock
1 group + n balls       1 + n rocks
2 groups                1 row (6 rocks)
2 groups + n balls      1 row + n balls (6 + n rocks)
3 groups                2 rows (12 rocks)
4 groups                3 rows (18 rocks)
5 groups                4 rows (24 rocks)
6 groups                5 rows (30 rocks)
At the extreme edges of this front were the percentages 5% and 95%. At
first, I had tried 0% and 100%, but this caused too many ties in
best fitness, especially when s-fitness was weighted 100%, since many of the
AI players were able to survive indefinitely and would max out the s-fitness
score at 144.
p05: s-fitness * 0.05 + p-fitness * 0.95
p25: s-fitness * 0.25 + p-fitness * 0.75
p50: s-fitness * 0.50 + p-fitness * 0.50
p75: s-fitness * 0.75 + p-fitness * 0.25
p95: s-fitness * 0.95 + p-fitness * 0.05
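The five weighted sums above are straightforward to compute; a minimal sketch, with names of my own choosing rather than the project's:

```java
// Sketch of the five weighted-sum fitness settings above (p05 .. p95);
// all names here are illustrative, not from the project code.
public class ParetoFitness {
    // Weight placed on s-fitness at each of the five points along the front.
    static final double[] S_WEIGHTS = {0.05, 0.25, 0.50, 0.75, 0.95};

    // Combined fitness for front point i (0 = p05 .. 4 = p95).
    public static double combined(int i, double sFitness, double pFitness) {
        double w = S_WEIGHTS[i];
        return sFitness * w + pFitness * (1.0 - w);
    }
}
```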
When creating the primitives to implement the genetic programming for
the AI player, a decision had to be made: retain complete domain
independence and follow the standard model of GP, or try to enhance the
playing algorithm by hand, which might yield better results, albeit with
some human intervention. This dilemma is addressed by Janikow (1993):
"On one side of the spectrum lies a method that
only uses the classical operators of mutation and crossover. This approach is
conceptually easy... On the other side of the spectrum lies a
knowledge-intensive method that completely abandons the traditional domain
independent operators, and, instead, fully implements the specific
problem-solving methodology."
It was decided to give the GP free rein over how my AI players would play.
I would only provide the primitives for it to work with:
|Primitive||Args||Factor||Description|
|+||2||N/A||Sums the two arguments|
|-||2||N/A||Subtracts the two arguments|
|*||2||N/A||Multiplies the two arguments|
|/||2||N/A||Safe divides the two arguments|
|height||0||special (see mapping)|
height is a measure of how tall a column is. But this primitive is special
in that it doesn't return the raw height; it maps the height through a
decreasing function, so the higher columns give extremely low numbers to
prevent pieces from being placed there for survival's sake. Here is the
mapping:
height of 0 -> 10
height of 1 -> 9
height of 2 -> 9
height of 3 -> 9
height of 4 -> 9
height of 5 -> 8
height of 6 -> 8
height of 7 -> 8
height of 8 -> 7
height of 9 -> 7
height of 10 -> 6
height of 11 -> 5
height of 12 -> 4
height of 13 -> 2
height of 14 -> 0
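The mapping above is just a fixed lookup; a sketch (the project's actual representation is not shown in the writeup):

```java
// The height mapping above as a fixed lookup table (an illustrative sketch;
// the project's actual representation is not shown in the writeup).
public class HeightPrimitive {
    private static final int[] VALUE =
        {10, 9, 9, 9, 9, 8, 8, 8, 7, 7, 6, 5, 4, 2, 0};  // heights 0..14

    public static int value(int height) {
        return VALUE[height];
    }
}
```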
|sides||0||3 x 2.0|
Counts the same-colored pieces to its left, right, and bottom.
This terminal, above all, is probably the most important, since it is
responsible for causing same-colored balls to group together.
|diagonals||0||5 x 1.2|
Counts the same-colored pieces to its upper-left, upper-right,
lower-left, lower-right, and two pieces below. This terminal should help
facilitate chain reactions: a group can form when its column, or a
neighboring column holding a diagonally adjacent same-colored ball, shifts
down by one.
|diagonals2||0||4 x 1.5|
Counts the same-colored pieces two rows up and to the left and right, and
two rows down and to the left and right. This terminal has the same idea as
the diagonals terminal, except that the chain reactions come when
its column or a neighboring column shifts down by two.
|column||0||9 x 0.67|
Counts all the same-colored balls in the same column.
|adjcolumn||0||18 x 0.33|
Counts all the same-colored balls in the neighboring column(s).
The Factor column shows the highest possible value of each primitive along
with the multiplier used to normalize its output, so that no single
primitive would dominate and cause a "can't see the forest for the trees"
problem. As you can see, all factors multiply their primitive's maximum to a
product of 6.0, with the exception of the constant 4 and the terminal height.
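The safe divide listed among the primitives is presumably GP's usual protected division. A sketch; the zero-divisor result shown here (1.0, Koza's common convention) is an assumption, since the writeup does not say which value the project uses:

```java
// A common form of GP protected division: ordinary division except that a
// zero divisor returns a fixed value. Returning 1.0 follows Koza's usual
// convention; whether this project does the same is an assumption.
public class SafeDivide {
    public static double div(double a, double b) {
        return (b == 0.0) ? 1.0 : a / b;
    }
}
```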
So how are the moves evaluated in the game of PuyoPuyo? For any given
piece, there are 4 different rotation positions, two vertical and two
horizontal. In a horizontal position, the piece has five possible column
positions; in a vertical position, it has six. So, all together, there is a
total of 5 + 5 + 6 + 6 = 22 moves
available to any given piece. Before running through all the moves, to
avoid repeating calculations, the evaluation function calculates the
values of sides, diagonals, and diagonals2 for each column four times:
for each of the two ball colors in the current piece, it calculates
the primitive twice, once with the ball directly above the highest ball in
the column, and once with the ball two spaces above the highest (for the
vertical piece orientation). So a maximum of 3 primitives * 24 = 72
values is obtained for these three primitives. For the height,
column, and adjcolumn primitives, it finds
two values for each column, one for each color in the piece. This may seem
like a lot of values to obtain, but surprisingly, the time spent calculating
them is unnoticeable. Fewer values are usually needed anyway, since the
evaluation function
will not calculate primitives not found in the current GP function tree.
Further reductions are made when both balls in the current piece are the
same color.
Once all the primitives have been calculated for each column, color, and
position, the function runs through the 6 * 2 possible locations each ball
in the current piece could go, and evaluates the GP tree at each location.
Finally, the evaluation runs through each of the 22 moves and determines
which one gives the maximal output. After determining the optimal move,
it sends the right sequence of moves (shift left, shift right, rotate,
and down) to a queue, which places the piece in the correct position. If it
finds that there is more than one optimal move, it randomly chooses one,
with each such move given the same probability of being selected.
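The search just described can be sketched as follows. The Evaluator callback stands in for the evolved GP tree applied to a candidate placement; all names are illustrative, and the real code scores precomputed primitive values rather than calling back per move:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the move search described above: enumerate the 22 placements
// (5 + 5 horizontal, 6 + 6 vertical), score each with the evolved GP tree,
// and break ties among the maxima at random.
public class MoveSearch {
    interface Evaluator { double score(int column, int rotation); }

    static final Random RNG = new Random();

    // Returns {column, rotation} of a best-scoring move.
    public static int[] bestMove(Evaluator gpTree) {
        List<int[]> best = new ArrayList<>();
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int rot = 0; rot < 4; rot++) {
            int columns = (rot % 2 == 0) ? 5 : 6;  // horizontal: 5, vertical: 6
            for (int col = 0; col < columns; col++) {
                double s = gpTree.score(col, rot);
                if (s > bestScore) { bestScore = s; best.clear(); }
                if (s == bestScore) best.add(new int[]{col, rot});
            }
        }
        return best.get(RNG.nextInt(best.size()));
    }
}
```

Note that the two loops together visit exactly 5 + 6 + 5 + 6 = 22 placements, matching the move count above.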
Training and Testing
For the training aspect of this project, first and foremost, it had to
be decided whom each player would train against. Would it train against
humans, other AI players, or itself? And what would be the criteria for
terminating a player's training session?
Playing against humans was immediately ruled out since there weren't enough
humans around who had the time to play the thousands of AI hopefuls
in the GP population. Playing against other AI players in the population in
a tournament model was a possibility, but it posed some problems,
such as the termination criteria: what if both players play equally well
and neither ever loses?
Should there be a time limit on each game? It was finally concluded that
the best way to train each player was against itself. It would play
solo until it either reached the top of the pit or the pieces ran
out, at which point the s-fitness and the p-fitness would be added with
the appropriate weights. The resulting sum was then subtracted from an
arbitrary constant so that better play gives a lower score, consistent with
the gpjpp Java package, in which lower fitness values are better.
Termination criteria for each player can be satisfied in two ways. The first
is when the pieces reach the top of the pit. The second is the limit on the
total number of pieces per game: 144. That number was chosen because it takes
36 pieces to fill up a pit, and it was arbitrarily decided that if a player
could last long enough to fill up the pit four times over (4 * 36 = 144), then
its survival capability was good enough to go on indefinitely.
For the population training phase, each player would go through three games
with itself, each time with a different set of pieces. The 3 piece sets were
pregenerated using a program that generates 288-digit strings, each
digit in the range 1-6, representing the 6 possible ball colors (144
two-ball pieces per game, hence 288 digits). Though the game can be played
with anywhere from 4 to 6 colors, the players were trained under the most
challenging setting, 6 colors, since a player that can play under all
settings is desired: a player that can play with 6 colors is more likely to
be able to handle 4 colors than a player trained with only 4 colors is to
handle 6.
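Pregenerating a piece set amounts to emitting 288 random digits in 1-6; a sketch (the seed parameter is my own addition, for reproducibility, and is not described in the writeup):

```java
import java.util.Random;

// Sketch of the piece-set pregeneration: a 288-digit string of colors 1-6,
// read two digits at a time as the 144 two-ball pieces of one game. The
// seed parameter is my own addition, for reproducibility.
public class PieceSetGenerator {
    public static String generate(long seed) {
        Random rng = new Random(seed);
        StringBuilder sb = new StringBuilder(288);
        for (int i = 0; i < 288; i++) {
            sb.append(1 + rng.nextInt(6));  // one ball color per digit, in 1..6
        }
        return sb.toString();
    }
}
```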
In the player evolution process, fitness was not penalized for tree
complexity. Though in most real-time systems tree complexity would be an
issue, it was discovered that the game engine used here is fast enough that
tree complexity causes no noticeable difference in move-to-move response
time.
There was a total of 5 different populations, each running under a different
section of the Pareto optimal front, and each population underwent 21
generations. The best of each generation was then run through the
validation set, consisting of 10 games with preset 288-digit piece sets.
After determining the best generation from each population, the 5 AI
players, one from each population's best generation, were taken and run
through a final testing set, consisting once again of 10 games with preset
288-digit piece sets. None of the 23 preset piece sets used from training
through testing were identical, since each was randomly generated
individually. This was to ensure that there would be no bias.
Bias, though, is really neither a concern nor a threat in this project,
since all cases were randomly generated and since every game has an element
of randomness. In other words, the same player playing with the same set of
pieces a second time will not play it exactly as it did the first time. On
many occasions, the GP tree evaluation yields more than one possible move,
at which point the program randomly chooses which move to take.
Both the validation set and the testing set were run using the p05 Pareto
setting, where s-fitness is weighted 0.05 and p-fitness 0.95. This was
chosen because it was found that most players that had evolved this far
could max out their survival fitness anyway. Would choosing the p05 front
for validation and testing bias the results? Possibly, but evidently not in
this case, since the AI player evolved from the p05 front did the worst of
all 5 final players.
The 5 GP representations of the remaining AI players were programmed into
the PuyoPuyo game configuration and are available as players AI #1 through
AI #5, AI #1 being the easiest and AI #5 the hardest, as determined by the
testing run.
Here are the resulting 5 AI players, their functions, their fitness scores
from their testing phase, the section of the Pareto optimal front they're from,
and the delay in milliseconds between piece moves.
(The delay value is arbitrary, not genetically evolved. The smaller the
delay, the faster the player.)
Function: ( + ( / diagonals2 ( * sides column ))
( + ( / sides ( * 4 ( * sides height )))
( + sides diagonals )))
s-fitness: 1382 out of 1440
delay: 400 ms
Function: ( + ( + ( + height sides ) sides ) adjcolumn )
s-fitness: 1440 out of 1440
delay: 300 ms
Function: ( / ( + diagonals2 ( + sides ( + sides
( / sides diagonals2 ))))
s-fitness: 1418 out of 1440
delay: 250 ms
Function: ( + sides ( + height adjcolumn ))
s-fitness: 1440 out of 1440
delay: 225 ms
Function: ( + ( + diagonals2 ( / ( + ( + sides ( - 4 sides ))
( * ( / ( / sides sides )
( * ( + sides sides ) sides ))
s-fitness: 1440 out of 1440
delay: 200 ms
Looking at the results, it can be said that this project was a success.
Five AI players were evolved that could play decently, even defeating
experienced human players (i.e. me) more than half of the time. But that's
not to say that the results couldn't have been better.
First of all, it was found that the Pareto optimal front was not necessary
in this project. Looking at the finally evolved players, it can be seen that
most of them could survive indefinitely (indefinitely for at least 1440
pieces, at any rate). But this is not the main reason the front wasn't
needed; in fact, two cases did not max out at 1440, but that's beside the
point. The point is that the p-fitness inherently reflects the s-fitness by
its value: the larger the s-fitness, the larger the p-fitness will tend to
be. Therefore, the s-fitness could have been excluded from the fitness
calculations completely, and the Pareto front done away with.
But there is a second, better way to calculate fitness, one that would have
required keeping the front. If the p-fitness were not a simple sum of all
the rocks sent over but a ratio of rocks sent per piece, then survival would
cease to be inherently part of the p-fitness value. This method of
calculating fitness is brought up because another problem noticed in the
final results may require that a front remain in place should modifications
be made.
The problem is that the final AI players that were generated do not
take into account, at least not to the level they should be, the height
of a column. If you play against the AI players, you will see that they
often stack pieces all the way to the top, even with open spaces available
lower in the pit. I can only attribute this shortcoming to the fact that
these players were trained playing by themselves. They were not trained in a
hostile head-to-head environment where the opponent constantly sends rocks.
These AI players were good at keeping their pieces low throughout their solo
play, so they never experienced the high-pressure situation where pit space
grows scarce and height must be taken into account to survive.
As a solution to this problem, a rock-sending mechanism could be placed in the
training sessions, forcing the players to learn how to avoid high columns.
The mechanism could send a couple of rocks down in random locations with
every piece, or it could be more responsive to how the AI is playing. The
danger
to avoid when using a more dynamic rock-sending mechanism is that it may
cause the GP players to evolve certain unwanted behaviors. For example, if it
sends rocks down in abundance only when pieces in the pit are at a low level,
then the players may evolve to avoid chain reaction strategies and only seek
to group pieces quickly to get them out of the way quickly.
There are many other improvements, and even totally different approaches,
that could lead to better PuyoPuyo players. But for a start, the results
from this project are good, not only for creating competent players but also
for opening new ideas to incorporate into the following "generations".
Horn, J., Nafpliotis, N., and Goldberg, D. E. (1994). A Niched Pareto Genetic Algorithm for Multiobjective Optimization. Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, Vol. 1, pp. 82-87.
Janikow, C. Z. (1993). A Knowledge-Intensive Genetic Algorithm for Supervised Learning. Machine Learning, 13, pp. 189-228.