Minimax and Alpha-beta Algorithms
CSW4701 Artificial Intelligence

We are concerned with two-player, perfect-knowledge games, where one of the players is the computer. The following two algorithms are called MINIMAX and ALPHA-BETA. In order to use these algorithms, one must design and implement the state descriptions for the game board, the heuristic static evaluation functions that rate the boards, the move generators that implement the rules of the game, and a predicate that determines if a board is a winning position. We present the algorithms in a mixed style incorporating features of LISP and typical block-structured languages like C to provide a flavor of how one might implement them in a variety of languages.

The MINIMAX procedure is embodied by the function BESTMOVE. BESTMOVE is called with the current board position as its first argument, and a depth bound as its second. The algorithm will expand the search space (an AND/OR tree or graph) to the depth specified by the second argument, and return the best move for the computer to make given the current board position. The move generators and heuristic evaluation function are considered global and are directly called by the algorithm.

The heuristic static evaluation function is named EVALUATE. EVALUATE returns high positive numbers for board positions that are GOOD for the computer (+infinity means it wins), while conversely low negative numbers are BAD for the computer (-infinity is a loss). The move generator, called POSSIBLE-MOVES, returns a list of possible moves according to the rules of the game. Another function, called NEWPOSITION, generates a new board position given a legal move. (Certainly, you can modify this pseudo-code to pass these functions as functional arguments.)

Question: Where are the AND and OR nodes? Why does it appear that we have ignored all that? In fact, we haven't. Can you see how AND (min) and OR (max) nodes are implicitly defined through the recursion?
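Before reading the pseudo-code, it may help to see what the four game-specific pieces look like concretely. The sketch below uses a hypothetical Nim-style game (a single pile; each move removes 1 or 2 stones; whoever takes the last stone wins). The game, the function bodies, and the score 1000 standing in for infinity are illustrative assumptions and not part of these notes; only the roles of EVALUATE, POSSIBLE-MOVES, NEWPOSITION, and the winning-position predicate come from the text.

```python
# Hypothetical Nim-style game: one pile of stones, each move removes
# 1 or 2 stones, and the player who takes the last stone wins.
# A position is (stones, to_move), with to_move = +1 or -1.

def POSSIBLE_MOVES(posn):
    """Move generator: the rules of the game."""
    stones, _ = posn
    return [m for m in (1, 2) if m <= stones]

def NEWPOSITION(posn, move):
    """Generate a new board position given a legal move."""
    stones, to_move = posn
    return (stones - move, -to_move)

def WINNING_POSITION(posn):
    """Predicate: the pile is empty, so the player who just moved won."""
    stones, _ = posn
    return stones == 0

def EVALUATE(posn):
    """Static evaluation from the side to move: an empty pile means the
    opponent took the last stone, a loss (the -infinity case)."""
    return -1000 if WINNING_POSITION(posn) else 0

print(POSSIBLE_MOVES((5, 1)))   # [1, 2]
print(NEWPOSITION((5, 1), 2))   # (3, -1)
```

With these four pieces in hand, the algorithms below need to know nothing else about the particular game.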
Remember: -(+n) = -n and -(-n) = +n.

BTW: Do you have a two-agent game on your computer (something like Checkers or Chess on your Windows95 PC)? Well then, this code is already on your computer!

BESTMOVE = proc (posn, depth)
begin (Movelist, Bestscore, Bestm, Try, Tryscore)   %local variables

  % posn:      is the current BOARD CONFIGURATION FROM WHICH A MOVE
  %            MUST BE CHOSEN BY OUR INTELLIGENT AGENT COMPUTER
  % depth:     MAXIMUM NUMBER OF PLIES TO LOOK AHEAD. THIS IS DETERMINED
  %            BY SPEED CONSTRAINTS AND COMPLEXITY OF THE GAME
  %            (i.e. branching factor b)
  % Movelist:  the list of all possible MOVES from the current posn
  % Bestscore: the best "backed up" score found so far as we iterate
  %            thru the Movelist
  % Bestm:     the best move found so far that results in the Bestscore
  % Try:       just a tmp to hold returned values from recursive calls
  % Tryscore:  just a tmp to hold returned scores from recursive calls

  if depth = 0 then return list(EVALUATE(posn), nil) fi;
    %This ends the recursion at the bottom of the search space.
    %Note a two element list of values is returned.

  Movelist = POSSIBLE-MOVES(posn);
    %According to the rules of the game, we generate all possible
    %moves from the given board position.

  Bestscore = -infinity;   %initializations
  Bestm = nil;
    %We are now ready to scan the Movelist and select the
    %best move. Note how we initialized Bestscore and Bestm.

  while ( Movelist <> nil ) repeat,

    Try = BESTMOVE( NEWPOSITION(posn, first(Movelist)), depth-1);
      %Here is the main recursive call that expands and searches
      %the state space from the selected move.

    Tryscore = - first(Try);   %recall BESTMOVE returns two values

    %Now we determine how well this current move did and whether
    %it should be selected as our best move found so far.
    if Tryscore > Bestscore then do,
      Bestscore = Tryscore;
      Bestm = first(Movelist);
    od;

    %Now we continue scanning down the list of moves to see
    %if we can find a better move than found so far.
    Movelist = rest(Movelist);
  taeper;
    %After scanning the entire Movelist, we have our best move so:

  return( list(Bestscore, Bestm) );
nigeb;   %End of procedure min-max.

Now we provide the alpha-beta variant of mini-max. READ CAREFULLY.

BESTMOVE = proc (Posn, Depth, Mybest, Herbest)
  %Note we added two additional parameters. Mybest should be
  %initialized to -infinity, while Herbest is initialized to
  %+infinity when calling this version of BESTMOVE.
begin (Movelist, Bestscore, Bestm, Try, Tryscore)   %local variables

  if Depth = 0 then return list(EVALUATE(Posn), nil) fi;
    %This ends the recursion at the bottom of the search space.
    %Note a two element list of values is returned.

  Movelist = POSSIBLE-MOVES(Posn);
    %According to the rules of the game, we generate all possible
    %moves from the given board position.

  Bestscore = Mybest;   %note the change to initializations here
  Bestm = nil;
    %We are now ready to scan the Movelist and select the
    %best move. Note how we initialized Bestscore and Bestm.

  while ( Movelist <> nil ) repeat,

    Try = BESTMOVE( NEWPOSITION(Posn, first(Movelist)), Depth-1,
                    -Herbest, -Bestscore );
      %Here is the main recursive call that expands and searches
      %the state space from the selected move. But notice we
      %added the two additional parameters here. This makes
      %sense when we view the following conditional expressions.

    Tryscore = - first(Try);   %recall BESTMOVE returns two values

    %Now we determine how well this current move did and whether
    %it should be selected as our best move found so far.
    if Tryscore > Bestscore then do,
      Bestscore = Tryscore;
      Bestm = first(Movelist);
    od;

    %Ok, here is where the cutoffs occur in alpha-beta. We
    %terminate the search if we can't possibly find a good move.
    %Recall our discussion in class.
    %The "return" function breaks the loop, terminates this
    %execution and returns to the previous recursive level.
    if Bestscore > Herbest then return( list(Bestscore, Bestm) );

    %Now we continue scanning down the list of moves to see
    %if we can find a better move than found so far.
    Movelist = rest(Movelist);
  taeper;
    %After scanning the entire Movelist, we have our best move so:

  return( list(Bestscore, Bestm) );
nigeb;   %End of procedure alpha-beta.

One last important item to think about. The depth bounds specified in the algorithms have to be chosen carefully NOT ONLY SO THAT WE DON'T RUN OUT OF TIME IN RESPONDING WITH A MOVE, BUT ALSO SO THAT WHEN WE BOTTOM OUT AT THE FRONTIER, THE CORRECT SENSE OF THE EVALUATION FUNCTION IS USED AT THAT LEVEL. Thus, we have to make sure we descend down to an EVEN number of levels, counting the starting position as level 1. Since depth is decremented to 0, this implies the depth bound should be set to an ODD number. Alternatively, you can set it to an EVEN number if you check that depth = 1, rather than 0. This minor little point can wreak havoc with the algorithm if you are not absolutely careful.
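To check your understanding of the sign-flipping and the cutoff, here is a runnable Python transcription of both procedures, tried on a hypothetical Nim-style game (one pile; remove 1 or 2 stones; taking the last stone wins). The game, the finite score 1000 standing in for infinity, and the added terminal test for positions with no legal moves are illustrative assumptions; the control flow otherwise follows the pseudo-code above step by step.

```python
INF = 1000   # finite stand-in for +/- infinity (an assumption, not in the notes)

def POSSIBLE_MOVES(stones):          # rules of the game: remove 1 or 2 stones
    return [m for m in (1, 2) if m <= stones]

def NEWPOSITION(stones, move):
    return stones - move

def EVALUATE(stones):
    # Scored for the player to move: an empty pile means the opponent
    # just took the last stone, so the side to move has lost.
    return -INF if stones == 0 else 0

def bestmove(posn, depth):
    """Plain MINIMAX in the notes' negamax style: returns (score, move)."""
    # Terminal test added here: the pseudo-code assumes depth bottoms out first.
    if depth == 0 or not POSSIBLE_MOVES(posn):
        return (EVALUATE(posn), None)
    bestscore, bestm = -INF, None
    for m in POSSIBLE_MOVES(posn):
        # Negate the child's score: her best is our worst.
        tryscore = -bestmove(NEWPOSITION(posn, m), depth - 1)[0]
        if tryscore > bestscore:
            bestscore, bestm = tryscore, m
    return (bestscore, bestm)

def bestmove_ab(posn, depth, mybest=-INF, herbest=INF):
    """ALPHA-BETA variant: same answer, fewer positions examined."""
    if depth == 0 or not POSSIBLE_MOVES(posn):
        return (EVALUATE(posn), None)
    bestscore, bestm = mybest, None       # note the changed initialization
    for m in POSSIBLE_MOVES(posn):
        tryscore = -bestmove_ab(NEWPOSITION(posn, m), depth - 1,
                                -herbest, -bestscore)[0]
        if tryscore > bestscore:
            bestscore, bestm = tryscore, m
        if bestscore > herbest:           # cutoff: opponent never allows this line
            return (bestscore, bestm)
    return (bestscore, bestm)

print(bestmove(5, 5))     # (1000, 2): take 2, leaving a losing 3-stone pile
print(bestmove_ab(5, 5))  # same answer, reached with cutoffs
```

As the notes say, the top-level call of the alpha-beta version must start with Mybest = -infinity and Herbest = +infinity; the default arguments above take care of that.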