Lecture 10: Pumping Lemma for CFL's

- Analogous to the pumping lemma for regular languages, there is a pumping lemma for context-free languages. The pumping lemma for CFL's can be used to show certain languages are not context free.
- The pumping lemma for CFL's states that for every infinite context-free language *L*, there exists a constant *n* that depends on *L* such that for all sentences *z* in *L* of length *n* or more, we can write *z* as *uvwxy* where
  1. |*vwx*| ≤ *n*,
  2. *vx* ≠ ε (that is, *v* and *x* cannot both be empty), and
  3. for all *i* ≥ 0, the string *uv*^{i}*wx*^{i}*y* is in *L*.
- Outline of proof:
  - The starting point is a Chomsky Normal Form grammar for *L*.
  - An important property of a parse tree in a CNF grammar is that if the length of a longest path in a parse tree for a sentence in *L* is *p*, then the length of the sentence is at most 2^{p-1}. This can be easily proven by induction on *p*. See HMU, Sect. 7.2.1, p. 280.
  - Suppose a CNF grammar for *L* has *m* variables. Let *n* = 2^{m} and consider a sentence *z* in *L* such that |*z*| ≥ *n*. By the observation above, a parse tree whose longest path has length at most *m* yields a sentence of length at most 2^{m-1} = *n*/2 < *n*, so a parse tree for *z* must have a path longer than *m*. Because the grammar has only *m* variables, this means there must be at least two identical variables on a longest path in that parse tree. Let *A*_{0}, *A*_{1}, …, *A*_{i}, …, *A*_{j}, …, *A*_{k} be the sequence of variables on this path, where *A*_{0} = *S* and *A*_{i} = *A*_{j} is the last repeated variable on this path.
  - This means the parse tree represents a derivation of the form

    *A*_{0} ⇒\* *uA*_{i}*y* ⇒\* *uvA*_{j}*xy* ⇒\* *uvwxy*

  - Since *A*_{i} is the last repeated variable along this path, the length of *vwx* must be less than or equal to *n*.
  - Since the grammar is in CNF, at least one of *v* and *x* must be nonempty.
  - Since *A*_{i} = *A*_{j}, the portion of the derivation *A*_{i} ⇒\* *vA*_{j}*x* can be repeated in a derivation zero or more times.

- For more details, see HMU, Sect. 7.2.2, pp. 281-282.
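To make condition (3) concrete, the following minimal Python sketch pumps one particular decomposition of a sentence of the context-free language { *a*^{k}*b*^{k} | *k* ≥ 0 }. The language, the helper names `pump` and `in_anbn`, and the chosen split are illustrative assumptions; they do not come from HMU or from the proof outline above.

```python
def pump(u: str, v: str, w: str, x: str, y: str, i: int) -> str:
    """Return the string u v^i w x^i y."""
    return u + v * i + w + x * i + y


def in_anbn(s: str) -> bool:
    """Membership test for the context-free language { a^k b^k | k >= 0 }."""
    k = len(s) // 2
    return s == "a" * k + "b" * k


# Hypothetical split of z = aaabbb: u = "aa", v = "a", w = "", x = "b", y = "bb".
# It satisfies vx != "" and keeps vwx short, as conditions (1) and (2) require.
parts = ("aa", "a", "", "b", "bb")
pumped = [pump(*parts, i) for i in range(4)]

print(pumped)                             # ['aabb', 'aaabbb', 'aaaabbbb', 'aaaaabbbbb']
print(all(in_anbn(s) for s in pumped))    # True
```

Here *v* and *x* play the role of the repeated portion of the derivation: each extra copy of *v* adds one *a* and each extra copy of *x* adds one *b*, so every pumped string stays in the language.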
- One important use of the pumping lemma is to prove certain languages are not context free.
- Example: Let us use the pumping lemma to show that the language *L* = { *a*^{n}*b*^{n}*c*^{n} | *n* ≥ 0 } is not context free.
  - The proof will be by contradiction. Assume *L* is context free. Then by the pumping lemma there is a constant *n* associated with *L* such that for all *z* in *L* with |*z*| ≥ *n*, *z* can be written as *uvwxy* such that
    1. |*vwx*| ≤ *n*,
    2. *vx* ≠ ε, and
    3. for all *i* ≥ 0, the string *uv*^{i}*wx*^{i}*y* is in *L*.
  - Consider the string *z* = *a*^{n}*b*^{n}*c*^{n}.
  - From condition (1), *vwx* cannot contain both *a*'s and *c*'s, because the *a*'s and the *c*'s in *z* are separated by *n* *b*'s.
  - Two cases arise:
    1. *vwx* has no *c*'s. But then *uwy* cannot be in *L* since at least one of *v* or *x* is nonempty: *uwy* still has *n* *c*'s but fewer than *n* *a*'s or fewer than *n* *b*'s.
    2. *vwx* has no *a*'s. Similarly, *uwy* cannot be in *L*.
  - In both cases we have a contradiction, so we must conclude *L* cannot be context free. The details of the proof can be found in HMU, p. 284.
- Note that the pumping lemma can be treated as an "adversarial game." See HMU, Sect. 7.2.3, p. 283.
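The case analysis for *a*^{n}*b*^{n}*c*^{n} can be checked mechanically for small *n* by trying every move the adversary could make. The sketch below is only an illustration under stated assumptions: the function names `in_L` and `pumpable_split` are hypothetical, and a finite search of this kind supports the argument for the values tried but does not replace the proof for arbitrary *n*.

```python
def in_L(s: str) -> bool:
    """Membership test for L = { a^k b^k c^k | k >= 0 }."""
    k = len(s) // 3
    return s == "a" * k + "b" * k + "c" * k


def pumpable_split(z: str, n: int):
    """Search every split z = uvwxy with |vwx| <= n and vx nonempty.

    Return the first split for which u v^i w x^i y is in L for i = 0 and i = 2
    (a necessary condition for satisfying the lemma), or None if no split passes.
    """
    for start in range(len(z) + 1):                       # vwx = z[start:end]
        for end in range(start + 1, min(start + n, len(z)) + 1):
            for a in range(start, end + 1):               # v = z[start:a]
                for b in range(a, end + 1):               # w = z[a:b], x = z[b:end]
                    u, v, w, x, y = z[:start], z[start:a], z[a:b], z[b:end], z[end:]
                    if not v and not x:                   # condition (2) violated
                        continue
                    if all(in_L(u + v * i + w + x * i + y) for i in (0, 2)):
                        return (u, v, w, x, y)
    return None


for n in range(1, 6):
    z = "a" * n + "b" * n + "c" * n
    assert pumpable_split(z, n) is None   # every admissible split already fails

print("no admissible split survives pumping for n = 1..5")
```

Every split is rejected at *i* = 0, which matches the two cases in the proof: removing *v* and *x* leaves one letter with *n* occurrences and another with fewer.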

- The Cocke-Younger-Kasami (CYK) algorithm can be used to determine whether a given input string *w* is generated by a given CFG *G*. It does so by determining whether a parse tree exists for *w* in *G*.
- Input: a Chomsky Normal Form CFG *G* = (*V*, *T*, *P*, *S*) and a string *w* = *a*_{1}*a*_{2}…*a*_{n} in *T*\*.
- Output: "yes" if *w* is in L(*G*), "no" otherwise.
- Method: The CYK algorithm is a dynamic programming algorithm that fills in the cells `X`_{ij} of a triangular parsing table so that a nonterminal *A* is in `X`_{ij} iff *A* ⇒\* *a*_{i}*a*_{i+1}…*a*_{j}.

```
for i = 1 to n do
    if A → a_{i} is in P then
        add A to X_{ii}
fill in the table, row-by-row, from row 2 to row n
    fill in the cells in each row from left-to-right
        if (A → BC is in P) and for some i ≤ k < j
           (B is in X_{ik}) and (C is in X_{k+1,j}) then
            add A to X_{ij}
if S is in X_{1n} then
    output "yes"
else
    output "no"
```
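The pseudocode above can be sketched in Python as follows. This is one possible rendering, not a reference implementation: the function name `cyk`, the 0-based indexing, and the representation of the productions as two dictionaries (`unit_rules` for *A* → *a* and `pair_rules` for *A* → *BC*) are assumptions made for the example. The sample grammar is the CNF grammar from the practice problems in these notes.

```python
from itertools import product


def cyk(unit_rules, pair_rules, start, w):
    """Return True iff the CNF grammar derives w.

    unit_rules maps a terminal a to the set of variables A with A -> a.
    pair_rules maps a pair (B, C) to the set of variables A with A -> BC.
    """
    n = len(w)
    if n == 0:
        return False  # assumes the grammar has no production S -> ε

    # X[i][j] holds the variables that derive w[i..j] (0-based, inclusive).
    X = [[set() for _ in range(n)] for _ in range(n)]

    # Bottom row: substrings of length 1.
    for i, a in enumerate(w):
        X[i][i] = set(unit_rules.get(a, ()))

    # Remaining rows: substrings of length 2, 3, ..., n.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                       # split point, i <= k < j
                for B, C in product(X[i][k], X[k + 1][j]):
                    X[i][j] |= pair_rules.get((B, C), set())

    return start in X[0][n - 1]


# S -> AB | BC,  A -> BA | a,  B -> CC | b,  C -> AB | a
unit_rules = {"a": {"A", "C"}, "b": {"B"}}
pair_rules = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}

print(cyk(unit_rules, pair_rules, "S", "baaba"))   # True
print(cyk(unit_rules, pair_rules, "S", "aa"))      # False
```

Indexing the binary productions by their right-hand side (*B*, *C*) lets the inner loop look up all matching left-hand sides directly. With three nested loops over positions, the running time is O(*n*^{3}) for a fixed grammar.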

- The algorithm adds nonterminal *A* to `X`_{ii} in the bottom row of the table if there is a production *A* → *a*_{i}, for 1 ≤ `i` ≤ `n`.
- The algorithm adds nonterminal *A* to `X`_{ij} iff there is a production *A* → *BC* and a `k` with `i` ≤ `k` < `j` such that *B* is in `X`_{ik} and *C* is in `X`_{k+1,j}.
- Since the table is filled in row by row, we already know `X`_{ik} and `X`_{k+1,j} when computing `X`_{ij}. To compute `X`_{ij}, the algorithm examines at most `n` pairs of entries: (`X`_{ii}, `X`_{i+1,j}), (`X`_{i,i+1}, `X`_{i+2,j}), and so on up to pair (`X`_{i,j-1}, `X`_{j,j}).
- The algorithm outputs "yes" if *S* is in `X`_{1n}; "no" otherwise.

- Practice problems:
  1. Show that the language { *ww* | *w* is a string of *a*'s and *b*'s } is not context free.
  2. Show that the language { *a*^{n}*b*^{n}*c*^{i} | *n* ≥ 0 and *i* ≤ *n* } is not context free.
  3. Show that the language { *ww*^{R}*w* | *w* is a string of *a*'s and *b*'s } is not context free.
  4. Construct the CYK parsing table for the input string `baaba` using the following CNF grammar:

     ```
     S → AB | BC
     A → BA | a
     B → CC | b
     C → AB | a
     ```

  5. Construct a parse tree for the input string `baaba` from the parsing table created in problem (4).

- HMU: Ch. 7

aho@cs.columbia.edu verma@cs.columbia.edu