# COMS W3261 Computer Science Theory Lecture 9: October 3, 2012 CFGs and PDAs

## Outline

• From a CFG to a PDA
• From a PDA to a CFG
• Eliminating useless symbols
• Eliminating ε-productions
• Eliminating unit productions
• Chomsky normal form

## 1. From a CFG to an equivalent PDA

• Given a CFG G, we can construct a PDA P such that N(P) = L(G).
• The PDA will simulate leftmost derivations of G.
• Algorithm to construct a PDA for a CFG
• Input: a CFG G = (V, T, Q, S).
• Output: a PDA P such that N(P) = L(G).
• Method: Let P = ({q}, T, V ∪ T, δ, q, S) where
1. δ(q, ε, A) = {(q, β) | A → β is in Q } for each nonterminal A in V.
2. δ(q, a, a) = {(q, ε)} for each terminal a in T.
• For a given input string w, the PDA simulates a leftmost derivation for w in G.
• We can prove that N(P) = L(G) by showing that w is in N(P) iff w is in L(G):
• If part: If w is in L(G), then there is a leftmost derivation
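```
S = γ₁ ⇒ γ₂ ⇒ ... ⇒ γₙ = w
```
We show by induction on i that P simulates this leftmost derivation by a sequence of moves
(q, w, S) |–* (q, yᵢ, αᵢ)
such that if γᵢ = xᵢαᵢ, where xᵢ is a string of terminals and αᵢ is either empty or begins with a nonterminal, then xᵢyᵢ = w. For i = n this gives (q, w, S) |–* (q, ε, ε), so w is in N(P).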
• Only-if part: If (q, x, A) |–* (q, ε, ε), then A ⇒* x.
• We can prove this statement by induction on the number of moves made by P. Taking A = S shows that every w in N(P) is in L(G).
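The one-state PDA can be tried out by simulating it directly. The Python sketch below is an illustration under stated assumptions, not part of the construction: the grammar is encoded (hypothetically) as a dict from each nonterminal to a list of bodies written as strings of one-character symbols, and the PDA's nondeterministic choices are explored breadth-first. It also assumes the grammar has no ε-productions, so every stack symbol must consume at least one input symbol; any configuration whose stack is longer than the remaining input can then be pruned, which keeps the search finite even for left-recursive grammars.

```python
from collections import deque


def pda_accepts(productions, start, w):
    """Simulate the one-state PDA built from a CFG, accepting by empty stack.

    productions: dict mapping each nonterminal to a list of bodies, each body
    a string of one-character symbols (a hypothetical encoding).
    Assumes the grammar has no ε-productions, so configurations whose stack
    is longer than the remaining input can never lead to acceptance.
    """
    # A configuration is (position in w, stack), with the stack top at index 0.
    initial = (0, (start,))
    seen = {initial}
    queue = deque([initial])
    while queue:
        i, stack = queue.popleft()
        if not stack:
            if i == len(w):
                return True  # all input read and stack empty: accept
            continue
        top, rest = stack[0], stack[1:]
        if top in productions:
            # Rule 1: δ(q, ε, A) contains (q, β) for every production A → β.
            successors = [(i, tuple(body) + rest) for body in productions[top]]
        elif i < len(w) and w[i] == top:
            # Rule 2: δ(q, a, a) = {(q, ε)}: match the input symbol and pop it.
            successors = [(i + 1, rest)]
        else:
            successors = []
        for j, new_stack in successors:
            config = (j, new_stack)
            if len(new_stack) <= len(w) - j and config not in seen:
                seen.add(config)
                queue.append(config)
    return False


expr = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "a"]}
print(pda_accepts(expr, "E", "a+a*a"))  # True
print(pda_accepts(expr, "E", "a+"))     # False
```

Each step popped from the queue corresponds to one step of a leftmost derivation (Rule 1) or to matching one terminal of w (Rule 2), which is exactly the simulation described in the "if" part of the proof.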

## 2. From a PDA to an equivalent CFG

• Given a PDA P, we can construct a CFG G such that L(G) = N(P).
• The key idea is to let a nonterminal of the form [qXp] generate exactly those input strings that cause P to go from state q to state p while the net effect on the stack is to pop the symbol X.
• Algorithm to construct a CFG for a PDA
• Input: a PDA P = (Q, Σ, Γ, δ, q₀, Z₀, F).
• Output: a CFG G = (V, Σ, R, S) such that L(G) = N(P).
• Method:
1. Let the nonterminal S be the start symbol of G. The other nonterminals in V will be symbols of the form [pXq] where p and q are states in Q, and X is a stack symbol in Γ.
2. The set of productions R is constructed as follows:
• For all states p, R has the production S → [q₀Z₀p].
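• If δ(q, a, X) contains (r, Y₁Y₂…Yₖ), where a is an input symbol or ε, then R has the productions
[qXrₖ] → a[rY₁r₁][r₁Y₂r₂]…[rₖ₋₁Yₖrₖ]
for all lists of states r₁, r₂, …, rₖ. In particular, if k = 0, that is, if δ(q, a, X) contains (r, ε), then R has the production [qXr] → a.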
• We can prove that [qXp] ⇒* w iff (q, w, X) |–* (p, ε, ε).
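• From this, we have [q₀Z₀p] ⇒* w iff (q₀, w, Z₀) |–* (p, ε, ε). Since S → [q₀Z₀p] is a production for every state p, S ⇒* w iff P empties its stack on input w in some state p, so we can conclude L(G) = N(P).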
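To make the production rules concrete, here is a minimal Python sketch that lists the productions of G from a PDA's transition function. The encoding is an assumption made for illustration: δ is a dict mapping (q, a, X) to a set of (r, push) pairs, with a = "" standing for ε and push a tuple Y₁…Yₖ whose first symbol ends up on top of the stack; a nonterminal [pXq] is represented by the tuple (p, X, q). The sketch does not remove useless productions.

```python
from itertools import product


def pda_to_cfg(states, delta, q0, Z0):
    """List the productions of the [pXq] construction.

    delta: dict mapping (q, a, X) to a set of (r, push) pairs, where a == ""
    stands for ε and push is a tuple (Y1, ..., Yk) with Y1 on top.
    A nonterminal [pXq] is represented by the tuple (p, X, q).
    Returns a list of (head, body) pairs; each body is a tuple of symbols.
    """
    rules = [("S", ((q0, Z0, p),)) for p in states]  # S → [q0 Z0 p] for all p
    for (q, a, X), moves in delta.items():
        prefix = (a,) if a else ()  # drop a from the body when it is ε
        for r, push in moves:
            if not push:
                # (r, ε) in δ(q, a, X) gives [qXr] → a.
                rules.append(((q, X, r), prefix))
                continue
            # One production for every list of states r1, ..., rk.
            for rs in product(states, repeat=len(push)):
                body, prev = [], r
                for Y, ri in zip(push, rs):
                    body.append((prev, Y, ri))  # [r_{i-1} Y_i r_i], with r_0 = r
                    prev = ri
                rules.append(((q, X, rs[-1]), prefix + tuple(body)))
    return rules


# A one-state PDA: on 'a' replace Z by ZZ, on 'b' pop Z.
delta = {("q", "a", "Z"): {("q", ("Z", "Z"))}, ("q", "b", "Z"): {("q", ())}}
for head, body in pda_to_cfg(["q"], delta, "q", "Z"):
    print(head, "→", body)
```

A move that pushes k symbols yields |Q|ᵏ productions, one per list r₁, …, rₖ, so the grammar grows quickly with the number of states; many of these productions typically involve useless symbols, which the method of the next section can remove.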

## 3. Eliminating Useless Symbols from a CFG

• A symbol X is useful for a CFG if there is a derivation of the form S ⇒* αXβ ⇒* w for some string of terminals w.
• If X is not useful, then we say X is useless.
• To be useful, a symbol X needs to be
1. generating; that is, X needs to be able to derive some string of terminals.
2. reachable; that is, there needs to be a derivation of the form S ⇒* αXβ where α and β are strings of nonterminals and terminals.
• To eliminate useless symbols from a grammar, we
1. identify the nongenerating symbols and eliminate all productions containing one or more of these symbols, and then
2. eliminate all productions containing symbols that are not reachable from the start symbol.
• In the grammar
```
S → AB | a
A → b
```
`S`, `A`, `a`, and `b` are generating. `B` is not generating.
Eliminating the productions containing the nongenerating symbols, we get
```
S → a
A → b
```
Now we see `A` is not reachable from `S`, so we can eliminate the second production to get
```
S → a
```
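• The generating symbols can be computed inductively, bottom-up. Basis: every terminal is generating. Induction: if A → α is a production and every symbol of α is generating, then A is generating.
• The reachable symbols can be computed inductively, starting from `S`. Basis: `S` is reachable. Induction: if A is reachable and A → α is a production, then every symbol of α is reachable.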
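The two fixed-point computations, and the order in which they are applied, can be sketched in Python as follows. The representation is an assumption made for illustration: a grammar is a list of (head, body) pairs, each body a string of one-character symbols, with uppercase letters as nonterminals and every other character as a terminal.

```python
def generating(prods):
    """Symbols that derive some terminal string (computed bottom-up)."""
    # Basis: every terminal is generating.
    gen = {s for _, body in prods for s in body if not s.isupper()}
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            # Induction: A → α with every symbol of α generating.
            if head not in gen and all(s in gen for s in body):
                gen.add(head)
                changed = True
    return gen


def reachable(prods, start):
    """Symbols that appear in some sentential form derived from start."""
    reach = {start}  # Basis: the start symbol is reachable.
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            # Induction: if A is reachable and A → α, all of α is reachable.
            if head in reach and not set(body) <= reach:
                reach |= set(body)
                changed = True
    return reach


def eliminate_useless(prods, start):
    gen = generating(prods)
    # Step 1: drop productions that mention a nongenerating symbol.
    kept = [(h, b) for h, b in prods if h in gen and all(s in gen for s in b)]
    # Step 2: drop productions whose head is unreachable in what remains.
    reach = reachable(kept, start)
    return [(h, b) for h, b in kept if h in reach]


g = [("S", "AB"), ("S", "a"), ("A", "b")]
print(eliminate_useless(g, "S"))  # [('S', 'a')]
```

Computing reachability only after the nongenerating productions are gone matters: in the example, `A` is reachable in the original grammar (through `S → AB`) and becomes unreachable only once that production is removed.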

## 4. Eliminating ε-productions from a CFG

• If a language L has a CFG, then L - { ε } has a CFG without any ε-productions.
• A nonterminal A in a grammar is nullable if A ⇒* ε.
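• The nullable nonterminals can be determined iteratively. Basis: if A → ε is a production, then A is nullable. Induction: if A → X₁X₂…Xₖ is a production and every Xᵢ is nullable, then A is nullable.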
• We can eliminate all ε-productions in a grammar as follows:
• Eliminate all productions with ε bodies.
• Suppose A → X₁X₂…Xₖ is a production and m of the k symbols Xᵢ are nullable. Then replace it with the 2ᵐ versions of this production obtained by keeping or dropping each nullable Xᵢ. (But if m = k, omit the version in which every Xᵢ is dropped, since it would be an ε-production.)
• Let us eliminate the ε-productions from the grammar G
```
S → AB
A → aAA | ε
B → bBB | ε
```
S, A, and B are nullable.
For the production `S → AB`, we add the productions `S → A | B`.
For the production `A → aAA`, we add the productions `A → aA | a`.
For the production `B → bBB`, we add the productions `B → bB | b`.
The resulting grammar H with no ε-productions is
```
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
```
We can prove that L(H) = L(G) - { ε }.
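Both the nullable computation and the 2ᵐ expansion can be sketched in Python. As an assumption for illustration, a grammar is a list of (head, body) pairs, each body a string of one-character symbols, and an ε body is the empty string. The expansion uses `itertools.product` to enumerate every keep/drop choice for the nullable symbols.

```python
from itertools import product


def nullable(prods):
    """Nonterminals A with A ⇒* ε; an ε body is the empty string."""
    null = set()
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            # A → ε (all([]) is True) or A → X1...Xk with every Xi nullable.
            if head not in null and all(s in null for s in body):
                null.add(head)
                changed = True
    return null


def eliminate_epsilon(prods):
    null = nullable(prods)
    result = []
    for head, body in prods:
        # Each nullable symbol may be present or absent; others always stay.
        options = [(s, "") if s in null else (s,) for s in body]
        for choice in product(*options):
            new_body = "".join(choice)
            # Never emit an ε-production (this also drops the original ones).
            if new_body and (head, new_body) not in result:
                result.append((head, new_body))
    return result


g = [("S", "AB"), ("A", "aAA"), ("A", ""), ("B", "bBB"), ("B", "")]
print(sorted(nullable(g)))  # ['A', 'B', 'S']
print(eliminate_epsilon(g))
```

Duplicate versions, such as the two ways of obtaining `aA` from `aAA`, are kept only once, so the result is the grammar H above.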

## 5. Eliminating Unit Productions from a CFG

• A unit production is one of the form `A → B` where both `A` and `B` are nonterminals.
• Let us assume we are given a grammar G with no ε-productions.
• From G we can create an equivalent grammar H with no unit productions as follows.
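• Define (A, B) to be a unit pair if A ⇒* B in G. Because G has no ε-productions, such a derivation uses only unit productions.
• We can construct all unit pairs for G inductively. Basis: (A, A) is a unit pair for every nonterminal A. Induction: if (A, B) is a unit pair and B → C is a production where C is a nonterminal, then (A, C) is a unit pair.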
• For each unit pair (A, B) in G, we add to H the productions A → α where B → α is a nonunit production of G.
• Consider the standard grammar G for arithmetic expressions:
```
E → E + T | T
T → T * F | F
F → ( E ) | a
```
The unit pairs are `(E,E), (E,T), (E,F), (T,T), (T,F), (F,F)`.
The equivalent grammar H with no unit productions is:
```
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a
```
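The unit-pair computation and the construction of H can be sketched in Python. The representation is again an assumption: a list of (head, body) pairs with bodies as strings of one-character symbols, uppercase letters being nonterminals, so spaces in the grammar above are dropped (for example `E+T`).

```python
def is_unit(body):
    # A unit body is a single nonterminal (an uppercase letter, by assumption).
    return len(body) == 1 and body.isupper()


def unit_pairs(prods):
    """All (A, B) with A ⇒* B using only unit productions."""
    pairs = {(h, h) for h, _ in prods}  # Basis: (A, A) for every nonterminal.
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in prods:
                # Induction: (A, B) is a unit pair and B → C is a production.
                if head == b and is_unit(body) and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    return pairs


def eliminate_unit(prods):
    result = []
    for a, b in sorted(unit_pairs(prods)):
        for head, body in prods:
            # Add A → α for every nonunit production B → α.
            if head == b and not is_unit(body) and (a, body) not in result:
                result.append((a, body))
    return result


g = [("E", "E+T"), ("E", "T"), ("T", "T*F"), ("T", "F"),
     ("F", "(E)"), ("F", "a")]
print(sorted(unit_pairs(g)))
# [('E', 'E'), ('E', 'F'), ('E', 'T'), ('F', 'F'), ('T', 'F'), ('T', 'T')]
for head, body in eliminate_unit(g):
    print(head, "→", body)
```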

## 6. Putting a CFG into Chomsky Normal Form

• A grammar G is in Chomsky Normal Form if each production in G is one of two forms:
1. A → BC where A, B, and C are nonterminals, or
2. A → a where a is a terminal.
• We will further assume G has no useless symbols.
• Every context-free language that does not contain ε can be generated by a Chomsky Normal Form grammar.
• Let us assume we have a CFG G with no useless symbols, ε-productions, or unit productions. We can transform G into an equivalent Chomsky Normal Form grammar as follows:
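• First, arrange that all bodies of length two or more consist only of nonterminals: for each terminal a that appears in such a body, introduce a new nonterminal Xₐ with the single production Xₐ → a, and replace a by Xₐ in those bodies.
• Then replace each production A → B₁B₂…Bₖ with k ≥ 3 by the cascade A → B₁C₁, C₁ → B₂C₂, …, Cₖ₋₂ → Bₖ₋₁Bₖ, where C₁, …, Cₖ₋₂ are new nonterminals, so that each body consists of two nonterminals.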
• Applying these two transformations to the grammar H above, we get:
```
E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a
```
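The two transformations can be sketched in Python for a grammar that already has no ε-productions, unit productions, or useless symbols. Assumptions for illustration: input bodies are strings of one-character symbols with uppercase letters as nonterminals, output bodies are tuples of symbol names, and the new names `<a>` (for the nonterminal deriving terminal a) and `C1`, `C2`, … (for cascade nonterminals) are hypothetical and assumed not to clash with existing ones.

```python
def to_cnf(prods):
    """Put a grammar with no ε-productions, unit productions, or useless
    symbols into Chomsky Normal Form.

    Input bodies are strings of one-character symbols (uppercase letters are
    nonterminals); output bodies are tuples of symbol names. The new names
    "<a>" and "C1", "C2", ... are illustrative and assumed to be fresh.
    """
    result = []
    term_var = {}  # terminal a -> nonterminal "<a>" with "<a>" → a
    tail_var = {}  # tuple of ≥ 2 nonterminals -> nonterminal deriving exactly it

    def wrap(a):
        if a not in term_var:
            term_var[a] = f"<{a}>"
            result.append((term_var[a], (a,)))
        return term_var[a]

    def name_for(tail):
        # Reuse the same cascade nonterminal for identical suffixes.
        if tail not in tail_var:
            new = tail_var[tail] = f"C{len(tail_var) + 1}"
            rest = tail if len(tail) == 2 else (tail[0], name_for(tail[1:]))
            result.append((new, rest))
        return tail_var[tail]

    for head, body in prods:
        if len(body) == 1:
            result.append((head, (body,)))  # A → a is already in CNF
            continue
        # Step 1: bodies of length ≥ 2 get only nonterminals.
        symbols = tuple(s if s.isupper() else wrap(s) for s in body)
        # Step 2: a body B1 B2 ... Bk with k ≥ 3 becomes a cascade of pairs.
        if len(symbols) > 2:
            symbols = (symbols[0], name_for(symbols[1:]))
        result.append((head, symbols))
    return result


h = [("E", "E+T"), ("E", "T*F"), ("E", "(E)"), ("E", "a"),
     ("T", "T*F"), ("T", "(E)"), ("T", "a"), ("F", "(E)"), ("F", "a")]
for head, body in to_cnf(h):
    print(head, "→", " ".join(body))
```

Because cascade nonterminals are keyed by the suffix they derive, `T → T*F` reuses the nonterminal created for `E → T*F`, mirroring how the grammar above shares B and C among E, T, and F. Up to renaming (`C1` for A, `<+>` for P, `C2` for B, and so on) the output has the same productions.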

## 7. Practice Problems

1. Eliminate useless symbols from the following grammar:
   ```
   S → AB | CA
   A → a
   B → BC | AB
   C → aB | b
   ```
2. Put the following grammar into Chomsky Normal Form:
   ```
   S → ASB | ε
   A → aAS | a
   B → BbS | A | bb
   C → aB | b
   ```