COMS W4115
Programming Languages and Translators
Lecture 8: Predictive Top-Down Parsers
February 16, 2015

Lecture Outline

Top-down parsing
FIRST
FOLLOW
How to construct a predictive parsing table
LL(1) grammars
Transformations on grammars

1. Top-Down Parsing

Throughout this lecture, we assume we have been given a grammar G and an input string w of nonterminals (token names) that we want to parse.
Top-down parsing consists of trying to construct a parse tree for w starting from the root of the tree and creating the nodes of the parse tree in preorder.
Equivalently, top-down parsing consists of trying to find a leftmost derivation for w from the start symbol of G.
Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input string with a procedure associated with each nonterminal of the grammar. See Fig. 4.13, p. 219.
A nonrecursive predictive parser uses an explicit stack and a parsing table to do deterministic top-down parsing.
In this class we will develop an algorithm to construct predictive parsing tables for a large class of useful grammars called LL(1) grammars.
To implement this algorithm we need to define two functions on grammar symbols, FIRST and FOLLOW.

2. FIRST

FIRST(α), where α is a string of terminal and nonterminal symbols, is the set of terminal symbols that are prefixes of the strings derivable from α. If α can derive ε, then we add ε to FIRST(α).
Algorithm to compute FIRST(X):

If X is a terminal, then FIRST(X) = { X }.
If X → ε is a production, then add ε to FIRST(X).
If X → Y₁ Y₂ ... Y_k is a production for k ≥ 1, and
for some i ≤ k, Y₁Y₂ ... Y_i_-1 derives the empty string, and a is in FIRST(Y_i), then add a to FIRST(X).
If Y₁Y₂ ... Y_k derives the empty string, then add ε to FIRST(X).

Example. Consider the grammar G:

S → ( S ) S | ε

For G, FIRST(S) = {(, ε}.

3. FOLLOW

FOLLOW(A), where A is a nonterminal of G, is the set of terminals that can appear immediately to the right of A in some sentential form of G. Let us assume the string to be parsed is terminated by an end-of-string endmarker $. If A can be the rightmost symbol in some sentential form, then we add the right endmarker $ to FOLLOW(A).
Algorithm to compute FOLLOW(A) for all nonterminals A of a grammar:

Place $ in FOLLOW(S) where S is the start symbol of the grammar.
If A → αBβ is a production, then add every terminal symbol a in FIRST(β) to FOLLOW(B).
If there is a production A → αB, or a production A → αBβ, where FIRST(β) contains ε,
then add every symbol in FOLLOW(A) to FOLLOW(B).

Example. For G above, FOLLOW(S) = {), $}.

4. How to Construct a Predictive Parsing Table

Input: Grammar G.
Output: Predictive parsing table M.
Method:


  for (each production A → α in G) {
     for (each terminal a in FIRST(α))
        add A → α to M[A, a];
     if (ε is in FIRST(α))
        for (each symbol b in FOLLOW(A))
           add A → α to M[A, b];
  }
  make each undefined entry of M be error;

Example 1. Predictive parsing table for the grammar:

S → +SS | *SS | a;

FIRST(S) = {+, *, a}

FOLLOW(S) = {+, *, a, $}


                        Input Symbol
   Nonterminal      a        +          *        $

       S          S → a    S → +SS    S → *SS  error

Example 2. Predictive parsing table for the grammar:

S → ( S ) S | ε

FIRST(S) = {(, ε}

FOLLOW(S) = {), $}


                        Input Symbol
   Nonterminal      (         )        $

       S        S → (S)S    S → ε    S → ε

Example 3. Predictive parsing table for the grammar:

S → S ( S ) | ε

FIRST(S) = {(, ε}

FOLLOW(S) = {(, ), $}


                        Input Symbol
   Nonterminal      (         )        $

       S        S → S(S)    S → ε    S → ε
                S → ε

5. LL(1) Grammars

A grammar is LL(1) iff whenever A → α | β are two distinct productions, the following three conditions hold:

For no terminal a do both α and β derive strings beginning with a.
At most one of α and β can derive the empty string.
If β derives the empty string, then α does not derive any string beginning with a terminal in FOLLOW(A).
Likewise, if α derives the empty string, then β does not derive any string beginning with a terminal in FOLLOW(A).

We can use the algorithm above to construct a predictive parsing table with uniquely defined entries for any LL(1) grammar.
The first "L" in LL(1) means scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one symbol of lookahead to make each parsing action decision.

6. Transformations on Grammars

Two common language-preserving transformations are often applied to grammars to try to make them parsable by top-down methods. These are eliminating left recursion and left factoring.
Eliminating left recursion:

Replace


      expr →  expr + term
           |  term


      expr  →  term expr'

      expr' →  + term expr'
            |  ε

Left factoring:

Replace


      stmt → if ( expr ) stmt else stmt
           | if (expr) stmt
           | other


      stmt  → if ( expr ) stmt stmt'
            | other

      stmt' → else stmt
            | ε

7. Practice Problems

Consider the following grammar G for Boolean expressions:


     B → B or T | T
     T → T and F | F
     F → not B | ( B ) | true | false

What precedence and associativity does this grammar give to the operators and, or, not?
Compute FIRST and FOLLOW for each nonterminal in G.
Transform G into an equivalent LL(1) grammar G'.
Construct a predictive parsing table for G'.
Show how your predictive parser processes the input string

true and not false or true

Draw the parse tree traced out by your parser.

8. Reading

ALSU, Section 4.4.

aho@cs.columbia.edu

COMS W4115 Programming Languages and Translators Lecture 8: Predictive Top-Down Parsers February 16, 2015