Alpha-Beta Pruning and Checkers

By Dave Evans and Carl Sable


We have decided to demonstrate Alpha-Beta pruning by example. Alpha-Beta is a pruning method used in conjunction with a minimax search, and it is best suited for two-player, zero-sum games. We have implemented the game of Checkers with a nice graphical user interface and several options for players. We use a strict minimax search with alpha-beta pruning. There are no pre-processed openings or specific endgame strategies. Since the game space of Checkers is far too large for our program to exhaust in its search, we have also come up with a heuristic to evaluate a board position, weighing several factors in decreasing order of importance.
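As an illustration of what such an evaluation function might look like, here is a minimal material-counting sketch in Python. The factors, weights, and board interface (pieces(), player_to_move()) shown here are assumptions for demonstration, not our actual heuristic or implementation:

    KING_WEIGHT = 3   # assumption: a king is worth more than an ordinary man
    MAN_WEIGHT = 1

    def evaluate(board):
        """Score a position from the point of view of the side to move;
        positive means the side to move is ahead."""
        mover = board.player_to_move()        # hypothetical board interface
        score = 0
        for piece in board.pieces():          # hypothetical board interface
            value = KING_WEIGHT if piece.is_king else MAN_WEIGHT
            score += value if piece.owner == mover else -value
        return score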

When a human plays the computer (or the computer plays itself), each computer player has two possible methods of alpha-beta search. The first is to search down to a specified depth. We consider depths of 4 or 5 to be beginner level, depths of 6 to 8 to be intermediate, and depths of 9 or 10 to be advanced. The second method is to allow the computer to search for a specified time limit. We consider this the better method, since, depending on the state of the game and the number of legal moves in each position, a fixed depth can lead to moves whose durations vary widely. A low depth cutoff is also sometimes quite unnecessary, especially when there are several forced moves in a row and the search could afford to look much deeper.

When searching for a specified time limit, the computer starts off with an alpha-beta search of depth 4, and then uses iterative deepening, increasing the depth by 1 for each search. If the time limit runs out mid-search, the search to the current depth is halted, and the move given by the previous search is used. In almost all cases, the search to each depth takes more time than the searches to all previous (smaller) depths combined, so at most one depth (and often none) is lost to the extra time spent on previous searches during iterative deepening. Also, if more than half of the time limit has expired after a search to a given depth, the best move from that depth is returned without starting the next level of search, since the new search would be highly unlikely to complete anyway. We find that giving the computer 30 seconds per move leads to a challenging game, although that sometimes needs to be increased in certain endgame situations for the computer to correctly force pieces out of corners.
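The following Python sketch shows how such a time-limited, iterative-deepening alpha-beta search might be structured. The board interface (legal_moves(), apply(), is_terminal()) is hypothetical, and evaluate() is the kind of static heuristic sketched above; this is an outline of the idea, not our exact code:

    import math
    import time

    class SearchTimeout(Exception):
        """Raised when the clock runs out in the middle of a search."""

    def alphabeta(board, depth, alpha, beta, deadline):
        """Minimax with alpha-beta pruning, in negamax form: scores are
        always from the point of view of the side to move."""
        if time.time() >= deadline:
            raise SearchTimeout()
        if depth == 0 or board.is_terminal():      # hypothetical interface
            return evaluate(board)                 # static heuristic at the horizon
        best = -math.inf
        for move in board.legal_moves():           # hypothetical interface
            score = -alphabeta(board.apply(move), depth - 1,
                               -beta, -alpha, deadline)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:                      # prune: this line is already refuted
                break
        return best

    def choose_move(board, time_limit):
        """Search to depth 4, then 5, 6, ... until time runs out; play
        the move from the deepest fully completed search."""
        start = time.time()
        deadline = start + time_limit
        best_move, depth = None, 4                 # depth 4 completes almost instantly
        while True:
            try:
                alpha, move_at_depth = -math.inf, None
                for move in board.legal_moves():
                    score = -alphabeta(board.apply(move), depth - 1,
                                       -math.inf, -alpha, deadline)
                    if move_at_depth is None or score > alpha:
                        alpha, move_at_depth = score, move
                best_move = move_at_depth          # this depth completed in time
            except SearchTimeout:
                return best_move                   # fall back to the previous depth
            if time.time() - start > time_limit / 2:
                return best_move                   # a deeper search would not finish
            depth += 1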

In our opinion, the best way to gain a feel for the power (and drawbacks) of alpha-beta pruning and minimax search is to experience the effects of playing against it! We allow the user to choose whether the computer searches to a specific depth or for a given time limit, and the depth or time can also be specified. Giving the computer more time (or a higher depth) clearly leads to better moves and a more challenging game!

Another interesting thing about playing the computer with this strategy is that specific drawbacks become more apparent. One commonly touted drawback of minimax search, with or without alpha-beta pruning, is the horizon effect. We have found, however, that this is insignificant (so long as the computer is given a decent time limit or specified depth per move). Even if the computer makes a non-ideal move due to the horizon effect on one turn, on its next turn it will see further and probably be able to compensate.

However, there is one drawback that has been significant, and we have thought of no way to fix it without partially redefining minimax search with alpha-beta pruning (which we have not done). The problem is that all evaluation is done at the same depth of the tree (unless paths to definite wins or losses are found), and no preference is given to moves that reach equally good situations sooner or stave off equally bad situations longer. So, let's say the computer searches to depth 10 and sees that the opponent, through some complicated series of moves, can force a win (i.e., all moves lead to a loss if the opponent plays perfectly, as assumed). Then all of the computer's possible moves are considered equally bad, and one is randomly selected, which might allow the opponent to win quickly and obviously! If the opponent had not seen the complex win, it appears that the computer has just "thrown the game". Similarly, let's say that the computer can force a win in just 2 moves, whereas some other move will allow it to force a win in 10. These two moves are considered equally good, and it will randomly choose between them. Say it chooses the longer path; 6 moves later, it will have a way to force a win in 4 moves, but it may again see another way to win in 10. This could go on indefinitely, although the random component will eventually choose a quick win. Without the random component, the first move leading to a win would be chosen, and in testing the algorithm this way, we have seen situations where the computer would move back and forth forever, even though it sees a forced win on every turn!
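To make the random selection concrete: at the root, all moves tied for the best minimax score look identical, and something like the following picks among them (a sketch; the (move, score) pairs from the root search are an assumed input format):

    import random

    def pick_move(scored_moves):
        """scored_moves: (move, score) pairs from the root search. All
        moves tied for the top score are indistinguishable to minimax,
        so break the tie at random; over several turns this randomness
        eventually nudges play onto the quicker of two 'equal' wins."""
        best = max(score for _, score in scored_moves)
        return random.choice([move for move, score in scored_moves
                              if score == best])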

There are two possible "fixes" for this problem. One is to lock in specific paths, e.g. paths leading to wins, once they are found. This is complicated, and requires complete deviation from the minimax, alpha-beta paradigm in specific situations. The other is to prefer "closer" game states if they are good, and "further" game states if they are bad, as sketched below. This involves writing an evaluation function not just for a given state, but for a given state compared to an initial, or current, state. This is also a bit complicated, and is a slight deviation from the exact specification of minimax search with alpha-beta pruning. We have decided not to implement either of these fixes. The goal of this project was to explain and demonstrate minimax search with alpha-beta pruning, and these drawbacks are inherent to the procedure. Also, we have found that they are usually not very apparent. The random component of the score generally causes the quickest win to be chosen within a few moves, and in mid-game situations, paths that reach good positions sooner are generally preferred over longer such paths anyway: once the computer is ahead, if a benefit is achieved quickly, it can usually gain at least slight further improvements to its score with the extra moves that follow.
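For the curious, here is roughly what the second fix would look like if implemented (we have not done so). The only changes to the search would be threading a ply counter through it and scoring terminal positions by their distance from the root; WIN is an assumed sentinel value larger than any heuristic score:

    WIN = 10_000   # assumed sentinel, larger than any heuristic score

    def terminal_score(ply_from_root):
        """Score for a position in which the side to move has lost (in
        Checkers, a player with no legal moves loses). Subtracting the
        ply count makes nearer wins score higher for the winner and
        pushes unavoidable losses as far into the future as possible."""
        return -(WIN - ply_from_root)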

So now have fun and try your wits against our program! Or have the computer play itself and watch the game. Experiment with the parameters and see how alpha-beta's play is affected by the search depth or time limit.



Send comments or suggestions to sable@cs.columbia.edu or devans@cs.columbia.edu.