# COMS W4115 Programming Languages and Translators Lecture 8: Predictive Top-Down Parsers February 16, 2015

## Lecture Outline

1. Top-down parsing
2. FIRST
3. FOLLOW
4. How to construct a predictive parsing table
5. LL(1) grammars
6. Transformations on grammars

## 1. Top-Down Parsing

• Throughout this lecture, we assume we have been given a grammar G and an input string w of nonterminals (token names) that we want to parse.
• Top-down parsing consists of trying to construct a parse tree for w starting from the root of the tree and creating the nodes of the parse tree in preorder.
• Equivalently, top-down parsing consists of trying to find a leftmost derivation for w from the start symbol of G.
• Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input string with a procedure associated with each nonterminal of the grammar. See Fig. 4.13, p. 219.
• A nonrecursive predictive parser uses an explicit stack and a parsing table to do deterministic top-down parsing.
• In this class we will develop an algorithm to construct predictive parsing tables for a large class of useful grammars called LL(1) grammars.
• To implement this algorithm we need to define two functions on grammar symbols, FIRST and FOLLOW.

## 2. FIRST

• FIRST(α), where α is a string of terminal and nonterminal symbols, is the set of terminal symbols that are prefixes of the strings derivable from α. If α can derive ε, then we add ε to FIRST(α).
• Algorithm to compute FIRST(X):
1. If X is a terminal, then FIRST(X) = { X }.
2. If X → ε is a production, then add ε to FIRST(X).
3. If XY1 Y2 ... Yk is a production for k ≥ 1, and
for some ik, Y1Y2 ... Yi-1 derives the empty string, and `a` is in FIRST(Yi), then add `a` to FIRST(X).
If Y1Y2 ... Yk derives the empty string, then add ε to FIRST(X).
• Example. Consider the grammar G:
• ``S → ( S ) S | ε``
For G, FIRST(`S`) = {`(, ε`}.

## 3. FOLLOW

• FOLLOW(A), where A is a nonterminal of G, is the set of terminals that can appear immediately to the right of A in some sentential form of G. Let us assume the string to be parsed is terminated by an end-of-string endmarker \$. If A can be the rightmost symbol in some sentential form, then we add the right endmarker \$ to FOLLOW(A).
• Algorithm to compute FOLLOW(A) for all nonterminals A of a grammar:
1. Place \$ in FOLLOW(S) where S is the start symbol of the grammar.
2. If A → αBβ is a production, then add every terminal symbol `a` in FIRST(β) to FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ, where FIRST(β) contains ε,
then add every symbol in FOLLOW(A) to FOLLOW(B).
• Example. For G above, FOLLOW(`S`) = {`)`, \$}.

## 4. How to Construct a Predictive Parsing Table

• Input: Grammar G.
• Output: Predictive parsing table M.
• Method:
• ``````
for (each production A → α in G) {
for (each terminal a in FIRST(α))
add A → α to M[A, a];
if (ε is in FIRST(α))
for (each symbol b in FOLLOW(A))
add A → α to M[A, b];
}
make each undefined entry of M be error;
``````
• Example 1. Predictive parsing table for the grammar:
• ``S → +SS | *SS | a;``
FIRST(`S`) = {`+, *, a`}
FOLLOW(`S`) = {`+, *, a, \$`}
``````
Input Symbol
Nonterminal      a        +          *        \$

S          S → a    S → +SS    S → *SS  error
``````
• Example 2. Predictive parsing table for the grammar:
• ``S → ( S ) S | ε``
FIRST(`S`) = {`(, ε`}
FOLLOW(`S`) = {`), \$`}
``````
Input Symbol
Nonterminal      (         )        \$

S        S → (S)S    S → ε    S → ε
``````
• Example 3. Predictive parsing table for the grammar:
• ``S → S ( S ) | ε``
FIRST(`S`) = {`(, ε`}
FOLLOW(`S`) = {`(, ), \$`}
``````
Input Symbol
Nonterminal      (         )        \$

S        S → S(S)    S → ε    S → ε
S → ε
``````

## 5. LL(1) Grammars

• A grammar is LL(1) iff whenever `A → α | β` are two distinct productions, the following three conditions hold:
1. For no terminal `a` do both α and β derive strings beginning with `a`.
2. At most one of α and β can derive the empty string.
3. If β derives the empty string, then α does not derive any string beginning with a terminal in FOLLOW(`A`).
Likewise, if α derives the empty string, then β does not derive any string beginning with a terminal in FOLLOW(`A`).
• We can use the algorithm above to construct a predictive parsing table with uniquely defined entries for any LL(1) grammar.
• The first "L" in LL(1) means scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one symbol of lookahead to make each parsing action decision.

## 6. Transformations on Grammars

• Two common language-preserving transformations are often applied to grammars to try to make them parsable by top-down methods. These are eliminating left recursion and left factoring.
• Eliminating left recursion:
• Replace
• ``````
expr →  expr + term
|  term
``````
by
``````
expr  →  term expr'

expr' →  + term expr'
|  ε
``````
• Left factoring:
• Replace
• ``````
stmt → if ( expr ) stmt else stmt
| if (expr) stmt
| other
``````
by
``````
stmt  → if ( expr ) stmt stmt'
| other

stmt' → else stmt
| ε
``````

## 7. Practice Problems

Consider the following grammar G for Boolean expressions:
``````
B → B or T | T
T → T and F | F
F → not B | ( B ) | true | false
``````
1. What precedence and associativity does this grammar give to the operators `and, or, not`?
2. Compute FIRST and FOLLOW for each nonterminal in G.
3. Transform G into an equivalent LL(1) grammar G'.
4. Construct a predictive parsing table for G'.
5. Show how your predictive parser processes the input string
• `true and not false or true`
Draw the parse tree traced out by your parser.