COMS W3261
Computer Science Theory
Lecture 9: October 3, 2012
CFGs and PDAs
Outline
- From a CFG to a PDA
- From a PDA to a CFG
- Eliminating useless symbols
- Eliminating ε-productions
- Eliminating unit productions
- Chomsky normal form
1. From a CFG to an equivalent PDA
- Given a CFG G, we can construct a PDA P such that
N(P) = L(G).
- The PDA will simulate leftmost derivations of G.
- Algorithm to construct a PDA for a CFG
- Input: a CFG G = (V, T, Q, S).
- Output: a PDA P such that N(P) = L(G).
- Method: Let P = ({q}, T, V ∪ T, δ, q, S) where
- δ(q, ε, A) =
{(q, β) | A → β is in Q }
for each nonterminal A in V.
- δ(q, a, a) = {(q, ε)}
for each terminal a in T.
- For a given input string w, the PDA simulates a leftmost derivation
for w in G.
- We can prove that N(P) = L(G) by showing that
w is in N(P) iff w is in L(G):
2. From a PDA to an equivalent CFG
- Given a PDA P, we can construct a CFG G such that
L(G) = N(P).
- The basic idea of the proof is to generate the strings that cause
P to go from state q to state p,
popping a symbol X off the
stack, by a nonterminal of the form [qXp].
- Algorithm to construct a CFG for a PDA
- Input: a PDA P =
(Q, Σ, Γ, δ, q0, Z0, F).
- Output: a CFG G = (V, Σ, R, S)
such that L(G) = N(P).
- Method:
- Let the nonterminal S be the start symbol of G.
The other nonterminals in V will be symbols of the form
[pXq] where p and q are states in Q,
and X is a stack symbol in Γ.
- The set of productions R is constructed as follows:
- For all states p, R has the production
S → [q0Z0p].
- If δ(q, a, X) contains
(r, Y1Y2 … Yk),
then R has the productions
- [qXrk] →
a[rY1r1]
[r1Y2r2] …
[rk-1Ykrk]
- for all lists of states r1, r2, … ,
rk.
- We can prove that [qXp] ⇒* w iff
(q, w, X) |–* (p, ε, ε).
- From this, we have
[q0Z0p] ⇒* w iff
(q0, w, Z0) |–*
(p, ε, ε), so we can conclude
L(G) = N(P).
3. Eliminating Useless Symbols from a CFG
- A symbol X is useful for a CFG if there is a derivation of the
form S ⇒* αXβ ⇒* w
for some string of terminals w.
- If X is not useful, then we say X is useless.
- To be useful, a symbol X needs to be
- generating; that is, X needs to be able to derive some string of terminals.
- reachable; that is, there needs to be a derivation of the form
S ⇒* αXβ where α and β
are strings of nonterminals and terminals.
- To eliminate useless symbols from a grammar, we
- identify
the nongenerating symbols and eliminate all productions containing
one or more of these symbols, and then
- eliminate all productions containing symbols that are not reachable
from the start symbol.
- In the grammar
S → AB | a
A → b
S, A, a, and
b are generating. B is not generating.
Eliminating the productions containing the nongenerating symbols we get
S → a
A → b
Now we see A is not reachable from S, so
we can eliminate the second production to get
S → a
The generating symbols can be computed inductively bottom-up from the
set of terminal symbols.
The reachable symbols can be computed inductively starting from S.
4. Eliminating ε-productions from a CFG
- If a language L has a CFG, then L - { ε } has a CFG without
any ε-productions.
- A nonterminal A in a grammar is nullable if
A ⇒* ε.
- The nullable nonterminals can be determined iteratively.
- We can eliminate all ε-productions in a grammar as follows:
- Eliminate all productions with ε bodies.
- Suppose A → X1X2 ... Xk
is a production and m of the k Xi's
are nullable. Then add the 2m versions of this
production where the nullable Xi's are present
or absent. (But if all symbols are nullable, do not add an
ε-production.)
- Let us eliminate the ε-productions from the grammar G
S → AB
A → aAA | ε
B → bBB | ε
S, A and B are nullable.
For the production S → AB
we add the productions S → A | B
For the production A → aAA
we add the productions A → aA | a
For the production B → bBB
we add the productions B → bB | b
The resulting grammar H with no ε-productions is
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
We can prove that L(H) = L(G) - { ε }.
5. Eliminating Unit Productions from a CFG
- A unit production is one of the form
A → B
where both A and B are nonterminals.
- Let us assume we are given a grammar G with no ε-productions.
- From G we can create an equivalent grammar H with no unit productions
as follows.
- Define (A, B) to be a unit pair if A ⇒* B in G.
- We can inductively construct all unit pairs for G.
- For each unit pair (A, B) in G, we add to H the productions
A → α where B → α is a nonunit production of G.
- Consider the standard grammar G for arithmetic expressions:
E → E + T | T
T → T * F | F
F → ( E ) | a
The unit pairs are (E,E), (E,T), (E,F), (T,T), (T,F), (F,F).
The equivalent grammar H with no unit productions is:
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a
6. Putting a CFG into Chomsky Normal Form
- A grammar G is in Chomsky Normal Form if each production in G is
one of two forms:
- A → BC where A, B, and C are nonterminals, or
- A → a where a is a terminal.
- We will further assume G has no useless symbols.
- Every context-free language without ε can be generated by
a Chomsky Normal Form grammar.
- Let us assume we have a CFG G with no useless symbols, ε-productions,
or unit productions. We can transform G into an equivalent Chomsky Normal
Form grammar as follows:
- Arrange that all bodies of length two or more consist only of nonterminals.
- Replace bodies of length three or more with a cascade of productions, each with
a body of two nonterminals.
- Applying these two transformations to the grammar H above, we get:
E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a
7. Practice Problems
- Eliminate useless symbols from the following grammar:
S → AB | CA
A → a
B → BC | AB
C → aB | b
- Put the following grammar into Chomsky Normal Form:
S → ASB | ε
A → aAS | a
B → BbS | A | bb
C → aB | b
8. Reading Assignment
aho@cs.columbia.edu