Programming Languages and Translators

Lecture 7: Parsing Context-Free Grammars

February 11, 2015

- Proving a language is not context free
- Closure and decidability results for context-free languages
- The parsing problem for context-free grammars
- Top-down parsing
- Transformations on grammars

- In Lecture 5, Section 5, we stated the pumping lemma for regular languages. This lemma can be used to prove certain languages are not regular.
- There is an analogous result for context-free languages, called the pumping lemma for context-free languages, that can be used to show certain languages are not context free.
- The pumping lemma for context-free languages says: If L is a context-free language,
then there exists a
constant
*n*such that if z is any string in L of length*n*or more, then z can be written as uvwxy subject to the following conditions: - The length of vwx is less than or equal to
*n*. - The length of vx is one or more. (That is, not both of v and x can be empty.)
- For all
*i*≥ 0, uv^{i}wx^{i}y is in L. - A typical proof using the pumping lemma to show a language L is not context free proceeds by assuming L is context free, and then finding a long string in L which, when pumped, yields a string not in L, thereby deriving a contradiction.
- Examples of non-context-free languages:
- {
`a`

|^{n}b^{n}c^{n}`n`

≥ 0 } - {
`ww`

|`w`

is in (a|b)* } - {
`a`

|^{m}b^{n}a^{m}b^{n}`n`

≥ 0 }

- Like the regular languages, the context-free languages are closed under the following operations:
- union
- reversal
- Kleene star
- homomorphism
- inverse homomorphism
- Unlike the regular languages, the context-free languages are not closed under intersection or complement.
- Decidability results
- Given a CFG G and a string w, it is decidable whether w is in
*L*(G). - Given a CFG G, it is decidable whether
*L*(G) is empty. - Given a CFG G, it is undecidable whether G is ambiguous.
- Given two CFGs G
_{1}and G_{2}, it is undecidable whether*L*(G_{1}) =*L*(G_{2}).

- The parsing problem for context-free grammars is given a CFG G and an input string w to construct all parse trees for w according to G, if w is in L(G).
- The Cocke-Younger-Kasami algorithm is a dynamic programming algorithm
that given a Chomsky Normal Form grammar G and an input string w will
create in O(|w|
^{3}) time a table from which all parse trees for w according to G can be constructed. - For compiler applications two styles of parsing algorithms are common: top-down parsing and bottom-up parsing.

- Top-down parsing consists of trying to construct a parse tree for an input string starting from the root and creating the nodes of the parse tree in preorder.
- Equivalently, top-down parsing consists of trying to find a leftmost derivation for the input string.
- Consider grammar
*G*:

```
S → + S S | * S S | a
```

```
S ⇒ + S S
⇒ + a S
⇒ + a * S S
⇒ + a * a S
⇒ + a * a a
```

- Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input string.
- One procedure is associated with each nonterminal of the grammar. See Fig. 4.13, p. 219.
- The sequence of successful procedure calls defines the parse tree.

- A nonrecursive predictive parser uses an explicit stack.
- See Fig. 4.19, p. 227, for a model of table-driven predictive parser.
- Parsing table for
*G*:

```
Input Symbol
Nonterminal a + * $
S S → a S → +SS S → *SS
```

`+a*aa`

.
(The top of the stack is to the left.)```
Stack Input Output
S$ +a*aa$
+SS$ +a*aa$ S → +SS
SS$ a*aa$
aS$ a*aa$ S → a
S$ *aa$
*SS$ *aa$ S → *SS
SS$ aa$
aS$ aa$ S → a
S$ a$
a$ a$ S → a
$ $
```

- Two common language-preserving transformations are often applied to grammars to try to make them parsable by top-down methods. These are eliminating left recursion and left factoring.
- Eliminating left recursion:
- Replace

```
expr → expr + term
| term
```

by
```
expr → term expr'
expr' → + term expr'
| ε
```

- Replace

```
stmt → if ( expr ) stmt else stmt
| if (expr) stmt
| other
```

by
```
stmt → if ( expr ) stmt stmt'
| other
stmt' → else stmt
| ε
```

- Write down a CFG for regular expressions over the alphabet
{
`a`

,`b`

}. Show a parse tree for the regular expression`a | b*a`

. - Using the nonterminals
`stmt`

and`expr`

, design context-free grammar productions to model - C while-statements
- C for-statements
- C do-while statements
- Consider grammar
*G*: - What language does this grammar generate?
- Eliminate the left recursion from this grammar.
- Use the pumping lemma to show that
{
`a`

|^{n}b^{n}c^{n}`n`

≥ 0 } is not context free.

```
S → S S + | S S * | a
```

- ALSU, Sects. 2.4, 4.3, 4.4

aho@cs.columbia.edu