# COMS W4115 Programming Languages and Translators Lecture 7: Parsing Context-Free Grammars February 11, 2015

## Outline

1. Proving a language is not context free
2. Closure and decidability results for context-free languages
3. The parsing problem for context-free grammars
4. Top-down parsing
5. Transformations on grammars

## 1. Proving a Language is Not Context Free

• In Lecture 5, Section 5, we stated the pumping lemma for regular languages. This lemma can be used to prove certain languages are not regular.
• There is an analogous result for context-free languages, called the pumping lemma for context-free languages, that can be used to show certain languages are not context free.
• The pumping lemma for context-free languages says: If L is a context-free language, then there exists a constant n such that if z is any string in L of length n or more, then z can be written as uvwxy subject to the following conditions:
1. The length of vwx is less than or equal to n.
2. The length of vx is one or more. (That is, not both of v and x can be empty.)
3. For all i ≥ 0, uviwxiy is in L.
• A typical proof using the pumping lemma to show a language L is not context free proceeds by assuming L is context free, and then finding a long string in L which, when pumped, yields a string not in L, thereby deriving a contradiction.
• Examples of non-context-free languages:
• {`anbncn` | `n` ≥ 0 }
• {`ww` | `w` is in (a|b)* }
• {`ambnambn` | `n` ≥ 0 }

## 2. Closure and Decidability Results for Context-Free Languages

• Like the regular languages, the context-free languages are closed under the following operations:
• union
• reversal
• Kleene star
• homomorphism
• inverse homomorphism
• Unlike the regular languages, the context-free languages are not closed under intersection or complement.
• Decidability results
• Given a CFG G and a string w, it is decidable whether w is in L(G).
• Given a CFG G, it is decidable whether L(G) is empty.
• Given a CFG G, it is undecidable whether G is ambiguous.
• Given two CFGs G1 and G2, it is undecidable whether L(G1) = L(G2).

## 3. The Parsing Problem for Context-Free Grammars

• The parsing problem for context-free grammars is given a CFG G and an input string w to construct all parse trees for w according to G, if w is in L(G).
• The Cocke-Younger-Kasami algorithm is a dynamic programming algorithm that given a Chomsky Normal Form grammar G and an input string w will create in O(|w|3) time a table from which all parse trees for w according to G can be constructed.
• For compiler applications two styles of parsing algorithms are common: top-down parsing and bottom-up parsing.

## 4. Top-Down Parsing

• Top-down parsing consists of trying to construct a parse tree for an input string starting from the root and creating the nodes of the parse tree in preorder.
• Equivalently, top-down parsing consists of trying to find a leftmost derivation for the input string.
• Consider grammar G:
• ``````
S → + S S | * S S | a
``````
• Leftmost derivation for + a * a a:
• ``````
S ⇒ + S S
⇒ + a S
⇒ + a * S S
⇒ + a * a S
⇒ + a * a a
``````
• Recursive-descent parsing
• Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input string.
• One procedure is associated with each nonterminal of the grammar. See Fig. 4.13, p. 219.
• The sequence of successful procedure calls defines the parse tree.
• Nonrecursive predictive parsing
• A nonrecursive predictive parser uses an explicit stack.
• See Fig. 4.19, p. 227, for a model of table-driven predictive parser.
• Parsing table for G:
• ``````
Input Symbol
Nonterminal    a        +          *        \$

S        S → a    S → +SS    S → *SS
``````
• Moves made by this predictive parser on input `+a*aa`. (The top of the stack is to the left.)
• ``````
Stack        Input   Output
S\$         +a*aa\$
+SS\$       +a*aa\$   S → +SS
SS\$         a*aa\$
aS\$         a*aa\$   S → a
S\$           *aa\$
*SS\$         *aa\$   S → *SS
SS\$           aa\$
aS\$           aa\$   S → a
S\$             a\$
a\$             a\$   S → a
\$               \$
``````
• Note that these moves trace out a leftmost derivation for the input.

## 5. Transformations on Grammars

• Two common language-preserving transformations are often applied to grammars to try to make them parsable by top-down methods. These are eliminating left recursion and left factoring.
• Eliminating left recursion:
• Replace
• ``````
expr →  expr + term
|  term
``````
by
``````
expr  →  term expr'

expr' →  + term expr'
|  ε
``````
• Left factoring:
• Replace
• ``````
stmt → if ( expr ) stmt else stmt
| if (expr) stmt
| other
``````
by
``````
stmt  → if ( expr ) stmt stmt'
| other

stmt' → else stmt
| ε
``````

## 6. Practice Problems

1. Write down a CFG for regular expressions over the alphabet {`a`, `b`}. Show a parse tree for the regular expression `a | b*a`.
2. Using the nonterminals `stmt` and `expr`, design context-free grammar productions to model
1. C while-statements
2. C for-statements
3. C do-while statements
3. Consider grammar G:
4. ``````
S →  S S + | S S * | a
``````
1. What language does this grammar generate?
2. Eliminate the left recursion from this grammar.
5. Use the pumping lemma to show that {`anbncn` | `n` ≥ 0 } is not context free.