# COMS W3261 CS Theory Lecture 10: Pumping Lemma for CFL's

## 1. The Pumping Lemma for CFL's

• Analogous to the pumping lemma for regular languages, there is a pumping lemma for context-free languages. The pumping lemma for CFL's can be used to show certain languages are not context free.

• The pumping lemma for CFL's states that for every infinite context-free language L, there exists a constant n that depends on L such that for all sentences z in L of length n or more, we can write z as uvwxy where
1. |vwx| ≤ n,
2. vx ≠ ε (that is, v and x cannot both be empty), and
3. for all i ≥ 0, the string uviwxiy is in L.

• Outline of proof:
• The starting point is a Chomsky Normal Form grammar for L.
• An important property of a parse tree in a CNF grammar is that if the length of a longest path in a parse tree for a sentence in L is p, then the length of the sentence is at most 2p-1. This can be easily proven by induction on p. See HMU, Sect. 7.2.1, p. 280.
• Suppose a CNF grammar for L has m variables. Let n = 2m and consider a sentence z in L such that |z| ≥ n-1. From the observation above, a parse tree for z must have a path longer than m. Because the grammar has only m variables, this means there must be at least two identical variables on a longest path in that parse tree. Let
A0, A1, …, Ai, …, Aj, …, Ak
be the sequence of variables on this path where A0 = S and Ai = Aj is the last repeated variable on this path.
• This means the parse tree represents a derivation of the form
A0 ⇒* uAiy ⇒* uvAjxy ⇒* uvwxy
1. Since Ai is the last repeated variable along this path, the length of vwx must be less than or equal to n.
2. Since the grammar is in CNF, at least one of v and x must be nonempty.
3. Since Ai = Aj, the portion of the derivation Ai ⇒* vAjx can be repeated in a derivation zero or more times.
• For more details, see HMU, Sect. 7.2.2, pp. 281-282.

• One important use of the pumping lemma is to prove certain languages are not context free.
• Example: Let us use the pumping lemma to show that the language L = { anbncn | n ≥ 0 } is not context free.
• The proof will be by contradiction. Assume L is context free. Then by the pumping lemma there is a constant n associated with L such that for all z in L with |z| ≥ n, z can be written as uvwxy such that
1. |vwx| ≤ n,
2. vx ≠ ε, and
3. for all i ≥ 0, the string uviwxiy is in L.
• Consider the string z = anbncn.
• From condition (1), vwx cannot contain both a's and c's.
• Two cases arise:
1. vwx has no c's. But then uwy cannot be in L since at least one of v or x is nonempty.
2. vwx has no a's. Again similarly, uwy cannot be in L.
• In both cases we have a contradiction, so we must conclude L cannot be context free. The details of the proof can be found in HMU, p. 284.

• Note that the pumping lemma can be treated as an "adversarial game." See HMU, Sect. 7.2.3, p. 283.

## 2. Cocke-Younger-Kasami Algorithm for Testing Membership in a CFL

• The Cocke-Younger-Kasami algorithm can be used to determine whether a given input string w is generated by a given CFG G. It does so by determining whether a parse tree exists for w in G.
• Input: a Chomsky Normal Form CFG G = (V, T, P, S) and a string w = a1a2an in T*.
• Output: "yes" if w is in L(G), "no" otherwise.
• Method: The CYK algorithm is a dynamic programming algorithm that fills in the cells `Xij` of a triangular parsing table with nonterminals A iff A ⇒* aiai+1 ... aj.
• ``````
for i = 1 to n do
if A → ai is in P then
add A to Xii
fill in the table, row-by-row, from row 2 to row n
fill in the cells in each row from left-to-right
if (A → BC is in P) and for some i ≤ k < j
(B is in Xik) and (C is in Xk+1,j) then
add A to Xij
if S is in X1n then
output "yes"
else
output "no"
``````
• The algorithm starts by adding a nonterminal A to the cell `Xii` in the bottom row of the table if there is a production Aa in P where a = ai for 1 ≤ `i``n`.
• Then going up the table a row at a time, the algorithm adds the nonterminal A to the cell `Xij` iff there is a production ABC in P and for some `k` between `i``k` < `j`, B is in `Xik` and C is in `Xk+1,j`. Since B is in `Xik`, we already know B ⇒* aiai+1ak, and since C is in `Xk+1,j`, we already know C ⇒* ak+1ak+2aj. This means we now know there is a derivation A ⇒* aiai+1aj.
• Note that to compute entry `Xij`, the algorithm examines at most n pairs of entries: (`Xii`, `Xi+1,j`), (`Xi,i+1`, `Xi+2,j`), and so on up to pair (`Xi,j-1`, `Xj,j`).
• When the parsing table has been filled in, the algorithm outputs "yes" if S is in cell `X1n`; "no" otherwise.
• The running time of the algorithm is O(n3).
• See HMU, p. 306, for a worked-out example of the CYK algorithm.
• The Cocke-Younger-Kasami algorithm can be extended to construct from the parsing table all parse trees for the input string w if w is in L(G).

## 3. Practice Problems

1. Show that the language { ww | ww is a string of a's and b's } is not context free.
2. Show that the language { anbnci | n ≥ 0 and in } is not context free.
3. Show that the language { wwRw | w is a string of a's and b's } is not context free.
4. Construct the CYK parsing table for the input string `baaba` using the following CNF grammar:
• ``````S → AB | BC
A → BA | a
B → CC | b
C → AB | a
``````
5. Construct a parse tree for the input string `baaba` from the parsing table created in problem (4).

## 4. Reading

• HMU: Ch. 7

aho@cs.columbia.edu
verma@cs.columbia.edu