Programming Languages and Translators

Lecture 10: Predictive Top-Down Parsers

February 24, 2014

- Top-down parsing
- FIRST
- FOLLOW
- How to construct a predictive parsing table
- LL(1) grammars
- Transformations on grammars

- Throughout this lecture, we assume we have been given a grammar
*G*and an input string*w*of nonterminals (token names) that we want to parse. - Top-down parsing consists of trying to construct a parse tree
for
*w*starting from the root of the tree and creating the nodes of the parse tree in preorder. - Equivalently, top-down parsing consists of trying to find a
leftmost derivation for
*w*from the start symbol of*G*. - Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input string with a procedure associated with each nonterminal of the grammar. See Fig. 4.13, p. 219.
- A nonrecursive predictive parser uses an explicit stack and a parsing table to do deterministic top-down parsing.
- In this class we will develop an algorithm to construct predictive parsing tables for a large class of useful grammars called LL(1) grammars.
- To implement this algorithm we need to define two functions on grammar symbols, FIRST and FOLLOW.

- FIRST(α), where α is a string of terminal and nonterminal symbols, is the set of terminal symbols that are prefixes of the strings derivable from α. If α can derive ε, then we add ε to FIRST(α).
- Algorithm to compute FIRST(
*X*): - If
*X*is a terminal, then FIRST(*X*) = {*X*}. - If
*X*→ ε is a production, then add ε to FIRST(*X*). - If
*X*→*Y*_{1}*Y*_{2}...*Y*is a production for_{k}*k*≥ 1, and

for some*i*≤*k*,*Y*_{1}*Y*_{2}...*Y*_{i}_{-1}derives the empty string, and`a`

is in FIRST(*Y*), then add_{i}`a`

to FIRST(*X*).

If*Y*_{1}*Y*_{2}...*Y*derives the empty string, then add ε to FIRST(_{k}*X*). **Example.**Consider the grammar*G*:`S → ( S ) S | ε`

- For
*G*, FIRST(`S`

) = {`(, ε`

}.

- FOLLOW(
*A*), where*A*is a nonterminal of*G*, is the set of terminals that can appear immediately to the right of*A*in some sentential form of*G*. Let us assume the string to be parsed is terminated by an end-of-string endmarker $. If*A*can be the rightmost symbol in some sentential form, then we add the right endmarker $ to FOLLOW(*A*). - Algorithm to compute FOLLOW(
*A*) for all nonterminals*A*of a grammar: - Place $ in FOLLOW(
*S*) where*S*is the start symbol of the grammar. - If
*A*→ α*B*β is a production, then add every terminal symbol`a`

in FIRST(β) to FOLLOW(*B*). - If there is a production
*A*→ α*B*, or a production*A*→ α*B*β, where FIRST(β) contains ε,

then add every symbol in FOLLOW(*A*) to FOLLOW(*B*). **Example.**For*G*above, FOLLOW(`S`

) = {`)`

, $}.

- Input: Grammar
*G*. - Output: Predictive parsing table
*M*. - Method:

```
for (each production
```*A* → α in *G*) {
for (each terminal *a* in FIRST(α))
add *A* → α to *M*[*A, a*];
if (ε is in FIRST(α))
for (each symbol *b* in FOLLOW(*A*))
add *A* → α to *M*[*A, b*];
}
make each undefined entry of *M* be **error**;

`S → +SS | *SS | a;`

`S`

) = {`+, *, a`

}`S`

) = {`+, *, a, $`

}```
Input Symbol
Nonterminal a + * $
S S → a S → +SS S → *SS
```**error**

`S → ( S ) S | ε`

`S`

) = {`(, ε`

}`S`

) = {`), $`

}```
Input Symbol
Nonterminal ( ) $
S S → (S)S S → ε S → ε
```

`S → S ( S ) | ε`

`S`

) = {`(, ε`

}`S`

) = {`(, ), $`

}```
Input Symbol
Nonterminal ( ) $
S S → S(S) S → ε S → ε
S → ε
```

- A grammar is LL(1) iff whenever
`A → α | β`

are two distinct productions, the following three conditions hold: - For no terminal
`a`

do both α and β derive strings beginning with`a`

. - At most one of α and β can derive the empty string.
- If β derives the empty string, then α does not derive any string
beginning with a terminal in FOLLOW(
`A`

).

Likewise, if α derives the empty string, then β does not derive any string beginning with a terminal in FOLLOW(`A`

). - We can use the algorithm above to construct a predictive parsing table with uniquely defined entries for any LL(1) grammar.
- The first "L" in LL(1) means scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one symbol of lookahead to make each parsing action decision.

- Two common language-preserving transformations are often applied to grammars to try to make them parsable by top-down methods. These are eliminating left recursion and left factoring.
- Eliminating left recursion:
- Replace

```
expr → expr + term
| term
```

by
```
expr → term expr'
expr' → + term expr'
| ε
```

- Replace

```
stmt → if ( expr ) stmt else stmt
| if (expr) stmt
| other
```

by
```
stmt → if ( expr ) stmt stmt'
| other
stmt' → else stmt
| ε
```

- Consider the following grammar G for Boolean expressions:
- What precedence and associativity does this grammar give to the operators
`and, or, not`

? - Compute FIRST and FOLLOW for each nonterminal in G.
- Transform G into an equivalent LL(1) grammar G'.
- Construct a predictive parsing table for G'.
- Show how your predictive parser processes the input string
- Draw the parse tree traced out by your parser.

```
B → B or T | T
T → T and F | F
F → not B | ( B ) | true | false
```

`true and not false or true`

- ALSU, Section 4.4.

aho@cs.columbia.edu