# COMS W3261 Computer Science Theory Lecture 10: October 8, 2012 Properties of Context-Free Languages

## Outline

• Eliminating useless symbols
• Eliminating ε-productions
• Eliminating unit productions
• Chomsky normal form
• Pumping lemma for CFL's
• Cocke-Younger-Kasami algorithm

## 1. Eliminating Useless Symbols from a CFG

• A symbol X is useful for a CFG if there is a derivation of the form S ⇒* αXβ ⇒* w for some string of terminals w.
• If X is not useful, then we say X is useless.
• To be useful, a symbol X needs to be
1. generating; that is, X needs to be able to derive some string of terminals.
2. reachable; that is, there needs to be a derivation of the form S ⇒* αXβ where α and β are strings of nonterminals and terminals.
• To eliminate useless symbols from a grammar, we
1. identify the nongenerating symbols and eliminate all productions containing one or more of these symbols, and then
2. eliminate all productions containing symbols that are not reachable from the start symbol.
• In the grammar
• ``````S → AB | a
A → b``````
`S`, `A`, `a`, and `b` are generating. `B` is not generating.
Eliminating the productions containing the nongenerating symbols we get
``````S → a
A → b``````
Now we see `A` is not reachable from `S`, so we can eliminate the second production to get
``S → a``
• The generating symbols can be computed inductively bottom-up from the set of terminal symbols.
• The reachable symbols can be computed inductively starting from `S`.

## 2. Eliminating ε-productions from a CFG

• If a language L has a CFG, then L - { ε } has a CFG without any ε-productions.
• A nonterminal A in a grammar is nullable if A ⇒* ε.
• The nullable nonterminals can be determined iteratively.
• We can eliminate all ε-productions in a grammar as follows:
• Eliminate all productions with ε bodies.
• Suppose A → X1X2 ... Xk is a production and m of the k Xi's are nullable. Then add the 2m versions of this production where the nullable Xi's are present or absent. (But if all symbols are nullable, do not add an ε-production.)
• Let us eliminate the ε-productions from the grammar G
• ``````S → AB
A → aAA | ε
B → bBB | ε``````
S, A and B are nullable.
For the production `S → AB` we add the productions `S → A | B`
For the production `A → aAA` we add the productions `A → aA | a`
For the production `B → bBB` we add the productions `B → bB | b`
The resulting grammar H with no ε-productions is
``````S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b``````
We can prove that L(H) = L(G) - { ε }.

## 3. Eliminating Unit Productions from a CFG

• A unit production is one of the form `A → B` where both `A` and `B` are nonterminals.
• Let us assume we are given a grammar G with no ε-productions.
• From G we can create an equivalent grammar H with no unit productions as follows.
• Define (A, B) to be a unit pair if A ⇒* B in G.
• We can inductively construct all unit pairs for G.
• For each unit pair (A, B) in G, we add to H the productions A → α where B → α is a nonunit production of G.
• Consider the standard grammar G for arithmetic expressions:
• ``````E → E + T | T
T → T * F | F
F → ( E ) | a``````
The unit pairs are `(E,E), (E,T), (E,F), (T,T), (T,F), (F,F)`.
The equivalent grammar H with no unit productions is:
``````E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a``````

## 4. Putting a CFG into Chomsky Normal Form

• A grammar G is in Chomsky Normal Form if each production in G is one of two forms:
1. A → BC where A, B, and C are nonterminals, or
2. A → a where a is a terminal.
• Every context-free language without ε can be generated by a Chomsky Normal Form grammar.
• Let us assume we have a CFG G with no useless symbols, ε-productions, or unit productions. We can transform G into an equivalent Chomsky Normal Form grammar as follows:
• Arrange that all bodies of length two or more consist only of nonterminals.
• Replace bodies of length three or more with a cascade of productions, each with a body of two nonterminals.
• Applying these two transformations to the grammar H above, we get:
• ``````E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a``````

## 5. Pumping Lemma for CFL's

• For every nonfinite context-free language L, there exists a constant n that depends on L such that for all z in L with |z| ≥ n, we can write z as uvwxy where
1. vx ≠ ε,
2. |vwx| ≤ n, and
3. for all i ≥ 0, the string uviwxiy is in L.
• Proof: See HMU, pp. 281-282.
• One important use of the pumping lemma is to prove certain languages are not context free.
• Example: The language L = { anbncn | n ≥ 0 } is not context free.
• The proof will be by contradiction. Assume L is context free. Then by the pumping lemma there is a constant n associated with L such that for all z in L with |z| ≥ n, z can be written as uvwxy such that
1. vx ≠ ε,
2. |vwx| ≤ n, and
3. for all i ≥ 0, the string uviwxiy is in L.
• Consider the string z = anbncn.
• From condition (2), vwx cannot contain both a's and c's.
• Two cases arise:
1. vwx has no c's. But then uwy cannot be in L since at least one of v or x is nonempty.
2. vwx has no a's. Again, uwy cannot be in L.
• In both cases we have a contradiction, so we must conclude L cannot be context free. The details of the proof can be found in HMU, p. 284.

## 6. Cocke-Younger-Kasami Algorithm for Testing Membership in a CFL

• Input: a Chomsky normal form CFG G = (V, T, P, S) and a string w = a1a2 ... an in T*.
• Output: "yes" if w is in L(G), "no" otherwise.
• Method: The CYK algorithm is a dynamic programming algorithm that fills in a triangular table `Xij` with nonterminals A such that A ⇒* aiai+1 ... aj.
• ``````
for i = 1 to n do
if A → ai is in P then
fill in the table, row-by-row, from row 2 to row n
fill in the cells in each row from left-to-right
if (A → BC is in P) and for some i ≤ k < j
(B is in Xik) and (C is in Xk+1,j) then
if S is in X1n then
output "yes"
else
output "no"
``````
• The algorithm adds nonterminal A to `Xij` iff there is a production A → BC in P where B ⇒* aiai+1 ... ak and C ⇒* ak+1ak+2 ... aj.
• To compute entry `Xij`, we examine at most n pairs of entries: (`Xii`, `Xi+1,j`), (`Xi,i+1`, `Xi+2,j`), and so on until (`Xi,j-1`, `Xj,j`).
• The running time of the CYK algorithm is O(n3).

## 7. Practice Problems

1. Eliminate useless symbols from the following grammar:
2. ``````S → AB | CA
A → a
B → BC | AB
C → aB | b
``````
3. Put the following grammar into Chomsky Normal Form:
4. ``````S → ASB | ε
A → aAS | a
B → BbS | A | bb
C → aB | b
``````
5. Show that { anbncn | n ≥ 0 } is not context free.
6. Show that { anbnci | in } is not context free.
7. Show that { ssRs | s is a string of a's and b's } is not context free.
8. (Hard) Show that the complement of { ss | ss is a string of a's and b's } is context free.