Programming Languages and Translators

Lecture 9: Bottom-Up Parsing

February 18, 2015

- Bottom-up parsing
- LR(1) parsing
- Constructing a simple LR(1) parser
- DFA for viable prefixes

- Bottom-up parsing can be viewed as trying to find a rightmost derivation in reverse for an input string.
- A
*handle*is a rightmost substring in right-sentential form that matches the body of a production such that replacing that right-hand side by the nonterminal on the left-hand side of that production represents one step in the reverse of the rightmost derivation. - Consider the grammar
*G*:

```
(1) S → S ( S )
(2) S → ε
```

`( ) ( )`

:```
S ⇒ S ( S ) // handle is S ( S )
⇒ S ( ) // handle is the empty string between ( )
⇒ S ( S ) ( ) // handle is S ( S )
⇒ S ( ) ( ) // handle is the empty string between first ( )
⇒ ( ) ( ) // handle is the empty string prefix
```

- Model of an LR(1) parser (Fig. 4.35).
- "L" means left-to-right scanning of the input, the "R" means constructing a rightmost derivation in reverse, and the "1" means one symbol of lookahead in making parsing decisions.
- LR parsing table for
*G*: - The entry
`action[1,(] = s2`

means when in state 1 scanning the token '(', shift the token '(' on top of the stack and then push state 2 on top of the stack. The input pointer is advanced to the next token. - The entry
`action[0,(] = r2`

means when in state 0 scanning the token '(', reduce the handle consisting of the empty string on top of the stack according to production (2)`S → ε`

. Put the nonterminal`S`

on the left side of the production on top of the stack and then put state 1 =`goto[0,S]`

on top of the stack. The input pointer is not advanced. - The entry
`action[4,(] = r1`

means when in state 4 scanning the token '(', reduce the handle consisting of the string`S(S)`

on top of the stack according to production (1)`S → S(S)`

. The handle and the intervening states are removed. The nonterminal`S`

on the left side of the production is then put on top of the stack and then state 1 =`goto[0,S]`

is put on top of the stack. The input pointer is not advanced. `acc`

means accept and stop parsing.- A blank entry means report a syntax error and stop parsing.
- Moves made by an LR(1) parser on input
`( ) ( )`

[Alg. 4.44].

` state `

`action`

`goto`

` ( `

` ) `

` $ `

` S `

`0`

`r2`

`r2`

`r2`

`1`

`1`

`s2`

` `

`acc`

` `

`2`

`r2`

`r2`

`r2`

`3`

`3`

`s2`

`s4`

` `

` `

`4`

`r1`

`r1`

`r1`

` `

```
```**Stack Input Action**
0 ()()$ reduce by (2) S → ε and push state 1 on stack
0S1 ()()$ shift ( on stack; push state 2 on stack
0S1(2 )()$ reduce by (2) S → ε and push state 3
0S1(2S3 )()$ shift ) and push state 4
0S1(2S3)4 ()$ reduce by (1) S → S(S) and push state 1
0S1 ()$ shift ( and push state 2
0S1(2 )$ reduce by (2) S → ε and push state 3
0S1(2S3 )$ shift ) and push state 4
0S1(2S3)4 $ reduce by (1) S → S(S) and push state 1
0S1 $ accept

- An
*LR*(0)*item*of a grammar is a production of the grammar with a dot at some position of the right side. E.g.,`S → ·S(S)`

,`S → S·(S)`

, or`S → S(S)·`

. - We will use two functions to construct the sets of items for a grammar:
*closure*(*I*), where*I*is a set of items, is the set of items constructed by the following two rules:- Initially, put every item in
*I*into*closure*(*I*). - If
*A*→ α·*B*β is in*closure*(*I*) and*B → γ*is a production, then add the item*B*→ ·γ to*closure*(*I*) if it is not already there. Keep repeating this step until no more new items can be added to*I*. *goto*(*I*,*X*), where*I*is a set of items and*X*is a grammar symbol, is the closure of the set of all items*A*→ α*X*·β where*A*→ α·*X*β is in*I*.- An
*augmented*grammar*G'*is one to which we have added a new starting production*S'*→*S*where*S*is the start symbol of the given grammar*G*. Reducing by the new starting production signals acceptance of the input string being parsed. We will always augment a grammar when we construct an SLR parsing table for it. - The sets-of-items construction
- Input: An augmented grammar
*G'*. - Output:
*C*, the canonical collection of sets of LR(0) items for*G'*. - Method:

`I`

_{0}= closure({[S' → ·S]}); C = {I_{0}}; repeat for each set of items I in C and grammar symbol X such that goto(I,X) is not empty and not in C do add goto(I,X) to C; until no more sets of items can be added to C; - Input: An augmented grammar

```
S' → S
S → S(S)
S → ε
```

```
I
```_{0}: S' → ·S
S → ·S(S)
S → ·
I_{1}: S' → S·
S → S·(S)
I_{2}: S → S(·S)
S → ·S(S)
S → ·
I_{3}: S → S(S·)
S → S·(S)
I_{4}: S' → S(S)·

`C`

,
the canonical collection of sets of LR(0) items for an augmented grammar
- Input:
`C = {I`

}._{0}, I_{1}, ... , I_{n} - Output: The SLR parsing table functions
`action`

and`goto`

. - Method:
- State
`i`

and its`action`

and`goto`

functions are constructed from`I`

as follows:_{i} - If item [
`A→ α·aβ`

] is in`I`

and_{i}`goto(I`

, then add "_{i}, a) = I_{j}`shift j`

" to`action[i, a]`

. Here`a`

is a terminal. - If item [
`A → α·`

] is in`I`

, then add "_{i}`reduce A → α`

" to`action[i, a]`

for all`a`

in FOLLOW(`A`

). Here`A`

cannot be`S'`

. - If item [
`S' → S·`

] is in`I`

, then add "_{i}`accept`

" to`action[i, $]`

. - If
`goto(I`

, then in the parsing table set_{i}, A) = I_{j}`goto[i, A] = j`

. - The initial state of the parser is constructed from the set of items
containing [
`S' → ·S`

]. - Notes:
- If each parsing table entry has at most one action, then the grammar
is said to be
*SLR(1)*. If any entry has more than one action, then the algorithm fails to produce a parser. - All undefined entries are made
`error`

. - Example: the LR parsing table above is an SLR(1) parsing table for the balanced-parentheses grammar.

- A
*viable prefix*is a prefix of a right sentential form that does not continue past the right end of the rightmost handle of that sentential form. - The shift and goto functions of the canonical collection of sets
of LR(0) items for a grammar
*G*define a DFA that recognizes the viable prefixes of*G*. - An item [
`A → β·γ`

] is*valid*for a viable prefix`αβ`

if there is a rightmost derivation from`S'`

to`αAw`

to`αβγw`

.

- Consider the following grammar G:
- Construct a rightmost derivation and parse tree for the input string
`aaa*+$`

. - Show the handle in each sentential form in the derivation.
- Construct the canonical collection of sets of LR(0) items for the augmented grammar.
- Construct an SLR(1) parsing table for G.
- Show how your SLR(1) parser processes the input string
`aaa*+$`

.

```
(1) S → S S +
(2) S → S S *
(3) S → a
```

- ALSU, Sects. 4.5, 4.6.

aho@cs.columbia.edu