COMS W4115
Programming Languages and Translators
Lecture 7: Parsing Context-Free Grammars
February 11, 2015

Outline

Proving a language is not context free
Closure and decidability results for context-free languages
The parsing problem for context-free grammars
Top-down parsing
Transformations on grammars

1. Proving a Language is Not Context Free

In Lecture 5, Section 5, we stated the pumping lemma for regular languages. This lemma can be used to prove certain languages are not regular.
There is an analogous result for context-free languages, called the pumping lemma for context-free languages, that can be used to show certain languages are not context free.
The pumping lemma for context-free languages says: If L is a context-free language, then there exists a constant n such that if z is any string in L of length n or more, then z can be written as uvwxy subject to the following conditions:

The length of vwx is less than or equal to n.
The length of vx is one or more. (That is, not both of v and x can be empty.)
For all i ≥ 0, uvⁱwxⁱy is in L.

A typical proof using the pumping lemma to show a language L is not context free proceeds by assuming L is context free, and then finding a long string in L which, when pumped, yields a string not in L, thereby deriving a contradiction.
Examples of non-context-free languages:

{aⁿbⁿcⁿ | n ≥ 0 }
{ww | w is in (a|b)* }
{a^mbⁿa^mbⁿ | n ≥ 0 }

2. Closure and Decidability Results for Context-Free Languages

Like the regular languages, the context-free languages are closed under the following operations:

union
reversal
Kleene star
homomorphism
inverse homomorphism

Unlike the regular languages, the context-free languages are not closed under intersection or complement.
Decidability results

Given a CFG G and a string w, it is decidable whether w is in L(G).
Given a CFG G, it is decidable whether L(G) is empty.
Given a CFG G, it is undecidable whether G is ambiguous.
Given two CFGs G₁ and G₂, it is undecidable whether L(G₁) = L(G₂).

3. The Parsing Problem for Context-Free Grammars

The parsing problem for context-free grammars is given a CFG G and an input string w to construct all parse trees for w according to G, if w is in L(G).
The Cocke-Younger-Kasami algorithm is a dynamic programming algorithm that given a Chomsky Normal Form grammar G and an input string w will create in O(|w|³) time a table from which all parse trees for w according to G can be constructed.
For compiler applications two styles of parsing algorithms are common: top-down parsing and bottom-up parsing.

4. Top-Down Parsing

Top-down parsing consists of trying to construct a parse tree for an input string starting from the root and creating the nodes of the parse tree in preorder.
Equivalently, top-down parsing consists of trying to find a leftmost derivation for the input string.
Consider grammar G:


      S → + S S | * S S | a

Leftmost derivation for + a * a a:


      S ⇒ + S S
        ⇒ + a S
        ⇒ + a * S S
        ⇒ + a * a S
        ⇒ + a * a a

Recursive-descent parsing

Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input string.
One procedure is associated with each nonterminal of the grammar. See Fig. 4.13, p. 219.
The sequence of successful procedure calls defines the parse tree.

Nonrecursive predictive parsing

A nonrecursive predictive parser uses an explicit stack.
See Fig. 4.19, p. 227, for a model of table-driven predictive parser.
Parsing table for G:


                        Input Symbol
   Nonterminal    a        +          *        $

       S        S → a    S → +SS    S → *SS

Moves made by this predictive parser on input +a*aa. (The top of the stack is to the left.)


       Stack        Input   Output
        S$         +a*aa$
        +SS$       +a*aa$   S → +SS
        SS$         a*aa$
        aS$         a*aa$   S → a
        S$           *aa$
        *SS$         *aa$   S → *SS
        SS$           aa$
        aS$           aa$   S → a
        S$             a$
        a$             a$   S → a
        $               $

Note that these moves trace out a leftmost derivation for the input.

5. Transformations on Grammars

Two common language-preserving transformations are often applied to grammars to try to make them parsable by top-down methods. These are eliminating left recursion and left factoring.
Eliminating left recursion:

Replace


      expr →  expr + term
           |  term


      expr  →  term expr'

      expr' →  + term expr'
            |  ε

Left factoring:

Replace


      stmt → if ( expr ) stmt else stmt
           | if (expr) stmt
           | other


      stmt  → if ( expr ) stmt stmt'
            | other

      stmt' → else stmt
            | ε

6. Practice Problems

Write down a CFG for regular expressions over the alphabet {a, b}. Show a parse tree for the regular expression a | b*a.
Using the nonterminals stmt and expr, design context-free grammar productions to model

C while-statements
C for-statements
C do-while statements

Consider grammar G:


      S →  S S + | S S * | a

What language does this grammar generate?
Eliminate the left recursion from this grammar.

Use the pumping lemma to show that {aⁿbⁿcⁿ | n ≥ 0 } is not context free.

7. Reading

ALSU, Sects. 2.4, 4.3, 4.4

aho@cs.columbia.edu

COMS W4115 Programming Languages and Translators Lecture 7: Parsing Context-Free Grammars February 11, 2015

Outline

1. Proving a Language is Not Context Free

2. Closure and Decidability Results for Context-Free Languages

3. The Parsing Problem for Context-Free Grammars

4. Top-Down Parsing

5. Transformations on Grammars

6. Practice Problems

7. Reading

COMS W4115
Programming Languages and Translators
Lecture 7: Parsing Context-Free Grammars
February 11, 2015