Lecture 9: CFG's and PDA's

- Given a CFG *G*, we can construct a PDA *P* such that N(*P*) = L(*G*).
- The PDA will simulate leftmost derivations of *G*.
- Algorithm to construct a PDA for a CFG
  - Input: a CFG *G* = (*V*, *T*, *Q*, *S*).
  - Method: Let *P* = ({*q*}, *T*, *V* ∪ *T*, δ, *q*, *S*) where
    - δ(*q*, ε, *A*) = {(*q*, β) | *A* → β is in *Q*} for each nonterminal *A* in *V*.
    - δ(*q*, *a*, *a*) = {(*q*, ε)} for each terminal *a* in *T*.
- For a given input string *w*, the PDA can simulate a leftmost derivation for *w* in *G*.
- We can prove that N(*P*) = L(*G*) by showing that *w* is in N(*P*) iff *w* is in L(*G*):
  - If part: If *w* is in L(*G*), then there is a leftmost derivation

    *S* = γ_{1} ⇒ γ_{2} ⇒ … ⇒ γ_{i} ⇒ … ⇒ γ_{n} = *w*

  - (*q*, *w*, *S*) |–* (*q*, *y*_{i}, α_{i}),
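The construction of δ from a grammar can be sketched in Python as follows. This is a minimal illustration, not part of the lecture: the function names, the dictionary encoding of δ, and the use of the empty string `""` for ε are assumptions made here. The small search routine that checks acceptance by empty stack additionally assumes the grammar has no ε-productions, so that it can discard any configuration whose stack is longer than the remaining input.

```python
def cfg_to_pda(productions, terminals):
    """Build delta for the one-state PDA P = ({q}, T, V ∪ T, delta, q, S).

    productions: dict mapping each nonterminal to a list of bodies (tuples).
    Returns a dict keyed by (state, input symbol or "" for ε, stack top).
    """
    delta = {}
    for head, bodies in productions.items():
        # delta(q, ε, A) = {(q, β) | A → β is a production}
        delta[("q", "", head)] = {("q", tuple(body)) for body in bodies}
    for a in terminals:
        # delta(q, a, a) = {(q, ε)}: match a terminal on the stack with the input
        delta[("q", a, a)] = {("q", ())}
    return delta


def accepts_by_empty_stack(delta, start_symbol, w):
    """Search configurations (input position, stack) of the one-state PDA."""
    start = (0, (start_symbol,))
    seen = {start}
    frontier = [start]
    while frontier:
        pos, stack = frontier.pop()
        if not stack:
            if pos == len(w):
                return True          # input consumed and stack empty
            continue
        top, rest = stack[0], stack[1:]
        moves = [(pos, delta.get(("q", "", top), set()))]
        if pos < len(w):
            moves.append((pos + 1, delta.get(("q", w[pos], top), set())))
        for new_pos, targets in moves:
            for _state, pushed in targets:
                new_stack = pushed + rest
                config = (new_pos, new_stack)
                # Pruning assumption: no ε-productions, so every stack symbol
                # must still consume at least one input symbol.
                if len(new_stack) <= len(w) - new_pos and config not in seen:
                    seen.add(config)
                    frontier.append(config)
    return False


grammar = {"E": [("+", "E", "E"), ("*", "E", "E"), ("a",)]}
delta = cfg_to_pda(grammar, {"+", "*", "a"})
print(accepts_by_empty_stack(delta, "E", "+a*aa"))
```

Each ε-move on a nonterminal corresponds to one step of a leftmost derivation, and each move on a terminal matches one input symbol against the stack.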

- We now show that every language recognized by a PDA can be generated by a context-free grammar.
- Given a PDA *P*, we can construct a CFG *G* such that L(*G*) = N(*P*).
- The basic idea of the proof is to generate the strings that cause *P* to go from state *q* to state *p*, popping a symbol *X* off the stack, using a nonterminal of the form [*qXp*].
- Algorithm to construct a CFG for a PDA
  - Input: a PDA *P* = (*Q*, Σ, Γ, δ, *q*_{0}, *Z*_{0}, *F*).
  - Output: a CFG *G* = (*V*, Σ, *R*, *S*) such that L(*G*) = N(*P*).
  - Method:
    - Let the nonterminal *S* be the start symbol of *G*. The other nonterminals in *V* will be symbols of the form [*pXq*] where *p* and *q* are states in *Q*, and *X* is a stack symbol in Γ.
    - The set of productions *R* is constructed as follows:
      - For all states *p*, *R* has the production *S* → [*q*_{0}*Z*_{0}*p*].
      - If δ(*q*, *a*, *X*) contains (*r*, *Y*_{1}*Y*_{2}…*Y*_{k}), then *R* has the productions

        [*qXr*_{k}] → *a* [*rY*_{1}*r*_{1}] [*r*_{1}*Y*_{2}*r*_{2}] … [*r*_{k-1}*Y*_{k}*r*_{k}]

        for all lists of states *r*_{1}, *r*_{2}, …, *r*_{k}.
- We can prove that [*qXp*] ⇒^{*} *w* iff (*q*, *w*, *X*) |–* (*p*, ε, ε).
- From this, we have [*q*_{0}*Z*_{0}*p*] ⇒^{*} *w* iff (*q*_{0}, *w*, *Z*_{0}) |–* (*p*, ε, ε), so we can conclude L(*G*) = N(*P*).
- Sections 6 and 7 allow us to conclude that the family of languages generated by context-free grammars is the same as the family of languages recognized by pushdown automata.
- In summary, the regular languages are a proper subset of the deterministic CFL’s, which are a proper subset of all CFL’s.
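The production-building step of the PDA-to-CFG algorithm can be sketched in Python as follows. The encoding is an assumption made for illustration: a nonterminal [*qXp*] is represented as the tuple `(q, X, p)`, ε input is the empty string `""`, and the function name `pda_to_cfg` is hypothetical. For *k* = 0 the sketch reads the production scheme with *r*_{k} = *r*, which yields [*qXr*] → *a*.

```python
from itertools import product


def pda_to_cfg(states, delta, q0, z0):
    """Return the set of productions (head, body) of a CFG G with L(G) = N(P).

    delta maps (q, a, X) to a set of (r, (Y1, ..., Yk)); a == "" encodes ε.
    """
    rules = set()
    # S → [q0 Z0 p] for every state p
    for p in states:
        rules.add(("S", ((q0, z0, p),)))
    for (q, a, X), targets in delta.items():
        for r, ys in targets:
            k = len(ys)
            prefix = (a,) if a else ()
            # One production for every list of states r1, ..., rk
            for rs in product(states, repeat=k):
                chain = (r,) + rs      # r, r1, ..., rk
                body = prefix + tuple(
                    (chain[i], ys[i], chain[i + 1]) for i in range(k)
                )
                rules.add(((q, X, chain[k]), body))
    return rules


# A small PDA (an illustrative assumption) accepting a^n b^n, n >= 1, by empty stack
delta = {
    ("p", "a", "Z"): {("p", ("A",))},
    ("p", "a", "A"): {("p", ("A", "A"))},
    ("p", "b", "A"): {("q", ())},
    ("q", "b", "A"): {("q", ())},
}
rules = pda_to_cfg(["p", "q"], delta, "p", "Z")
print(len(rules))
```

The number of productions grows as |*Q*|^{k} for a move that pushes *k* symbols, so many of the generated nonterminals are typically useless and can be removed afterwards.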

- A symbol *X* is *useful* for a CFG if there is a derivation of the form *S* ⇒^{*} α*X*β ⇒^{*} *w* for some string of terminals *w*.
- If *X* is not useful, then we say *X* is *useless*.
- To be useful, a symbol *X* needs to be
  - *generating*; that is, *X* needs to be able to derive some string of terminals.
  - *reachable*; that is, there needs to be a derivation of the form *S* ⇒^{*} α*X*β where α and β are strings of nonterminals and terminals.
- To eliminate useless symbols from a grammar, we
  - identify the nongenerating symbols and eliminate all productions containing one or more of these symbols, and then
  - eliminate all productions containing symbols that are not reachable from the start symbol.
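The two elimination steps can be sketched in Python as follows. This is an illustrative sketch under assumed conventions: a grammar is a dictionary from nonterminals to lists of bodies (tuples of symbols), and the function names are hypothetical. The order matters: nongenerating symbols are removed first, then unreachable ones.

```python
def generating_symbols(productions, terminals):
    """Iteratively collect the symbols that derive some terminal string."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in gen and any(all(s in gen for s in b) for b in bodies):
                gen.add(head)
                changed = True
    return gen


def reachable_symbols(productions, start):
    """Collect the symbols that appear in some derivation from the start symbol."""
    reach = {start}
    work = [start]
    while work:
        head = work.pop()
        for body in productions.get(head, []):
            for s in body:
                if s not in reach:
                    reach.add(s)
                    work.append(s)
    return reach


def remove_useless(productions, terminals, start):
    gen = generating_symbols(productions, terminals)
    # Step 1: drop every production that mentions a nongenerating symbol
    step1 = {
        h: [b for b in bodies if all(s in gen for s in b)]
        for h, bodies in productions.items()
        if h in gen
    }
    # Step 2: drop every production whose head is unreachable from the start
    reach = reachable_symbols(step1, start)
    return {h: bodies for h, bodies in step1.items() if h in reach}


grammar = {"S": [("A", "B"), ("a",)], "A": [("b",)]}
print(remove_useless(grammar, {"a", "b"}, "S"))  # {'S': [('a',)]}
```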

- In the grammar

```
S → AB | a
A → b
```

- `S`, `A`, `a`, and `b` are generating. `B` is not generating.
- Eliminating the productions that contain `B`, we get

```
S → a
A → b
```

- `A` is not reachable from `S`, so we can eliminate the second production to get `S → a`.
- If a language *L* has a CFG, then *L* - { ε } has a CFG without any ε-productions.
- A nonterminal *A* in a grammar is *nullable* if *A* ⇒^{*} ε.
- The nullable nonterminals can be determined iteratively.
- We can eliminate all ε-productions in a grammar as follows:
  - Eliminate all productions with ε bodies.
  - Suppose *A* → *X*_{1}*X*_{2}…*X*_{k} is a production and *m* of the *k* *X*_{i}'s are nullable. Then add the 2^{m} versions of this production where the nullable *X*_{i}'s are present or absent. (But if all symbols are nullable, do not add an ε-production.)
- Let us eliminate the ε-productions from the grammar *G*

```
S → AB
A → aAA | ε
B → bBB | ε
```

- `S`, `A`, and `B` are nullable.
- For `S → AB` we add the productions `S → A | B`.
- For `A → aAA` we add the productions `A → aA | a`.
- For `B → bBB` we add the productions `B → bB | b`.
- The resulting grammar with no ε-productions is

```
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
```
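The iterative computation of the nullable nonterminals and the 2^{m} expansion can be sketched in Python as follows. The representation (a dictionary from nonterminals to lists of tuple bodies, with the empty tuple `()` standing for an ε body) and the function names are assumptions made for illustration.

```python
from itertools import product


def nullable_nonterminals(productions):
    """A is nullable if some body of A consists only of nullable symbols."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                all(s in nullable for s in b) for b in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    nullable = nullable_nonterminals(productions)
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may be present or absent; others stay.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                # Skip empty bodies so that no ε-production is added
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


G = {
    "S": [("A", "B")],
    "A": [("a", "A", "A"), ()],
    "B": [("b", "B", "B"), ()],
}
for head, bodies in eliminate_epsilon(G).items():
    print(head, "→", " | ".join("".join(b) for b in bodies))
```

On the grammar *G* of this example, the sketch yields the same three productions as the grammar shown above.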

- A *unit* production is one of the form *A* → *B* where both *A* and *B* are nonterminals.
- Let us assume we are given a grammar *G* with no ε-productions.
- From *G* we can create an equivalent grammar *H* with no unit productions as follows.
  - Define (*A*, *B*) to be a unit pair if *A* ⇒^{*} *B* in *G*.
  - We can inductively construct all unit pairs for *G*.
  - For each unit pair (*A*, *B*) in *G*, we add to *H* the productions *A* → α where *B* → α is a nonunit production of *G*.
- Consider our standard grammar *G* for arithmetic expressions:

```
E → E + T | T
T → T * F | F
F → ( E ) | a
```

- The unit pairs are `(E,E), (E,T), (E,F), (T,T), (T,F), (F,F)`.
- The equivalent grammar *H* with no unit productions is:

```
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a
```
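The inductive construction of the unit pairs and the resulting grammar can be sketched in Python as follows. The sketch assumes every nonterminal appears as a key of the grammar dictionary and that there are no ε-productions; the function names are hypothetical.

```python
def unit_pairs(productions):
    """Basis: (A, A) for every A. Induction: (A, B) and B → C give (A, C)."""
    nonterminals = set(productions)
    pairs = {(a, a) for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in productions[b]:
                if len(body) == 1 and body[0] in nonterminals:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def eliminate_unit(productions):
    nonterminals = set(productions)
    pairs = unit_pairs(productions)
    result = {a: [] for a in productions}
    for a in productions:
        for b in productions:
            if (a, b) not in pairs:
                continue
            # Copy every nonunit production B → α to A → α
            for body in productions[b]:
                is_unit = len(body) == 1 and body[0] in nonterminals
                if not is_unit and body not in result[a]:
                    result[a].append(body)
    return result


G = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}
for head, bodies in eliminate_unit(G).items():
    print(head, "→", " | ".join(" ".join(b) for b in bodies))
```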

- A grammar *G* is in *Chomsky Normal Form* if each production in *G* is one of the following forms:
  - *A* → *BC* where *A*, *B*, and *C* are nonterminals, except that *B* and *C* may not be the start symbol, or
  - *A* → *a* where *a* is a terminal.
- If ε is in L(*G*), then *G* contains the production *S* → ε.
- We will further assume *G* has no useless symbols.
- This is a slight generalization of the definition of Chomsky Normal Form in HMU to permit a CNF grammar to generate the empty string.
- Every context-free language can be generated by a Chomsky Normal Form grammar.
- Let us assume we have a CFG *G* with no useless symbols. We can transform *G* into an equivalent Chomsky Normal Form grammar as follows:
  1. If L(*G*) contains ε, add the new starting production *S'* → *S*, where *S'* is a new start symbol and *S* is the old start symbol, and the new ε-production *S'* → ε.
  2. Eliminate all ε-productions except *S'* → ε if it was added in step (1).
  3. Eliminate all unit productions.
  4. Arrange that all bodies of length two or more consist only of nonterminals by replacing each terminal *a* in a body of length two or more with a new nonterminal *A'* and adding the new production *A'* → *a*.
  5. Replace bodies of length three or more with a cascade of productions, each with a body of two nonterminals. For example, we can replace the production *A* → *BCDE* with the cascade of productions
     - *A* → *BC'*
     - *C'* → *CD'*
     - *D'* → *DE*

     where *C'* and *D'* are new nonterminals.
- We can put the grammar *H* above into Chomsky Normal Form to get:

```
E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a
```
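Steps (4) and (5) of the transformation can be sketched in Python as follows. The sketch assumes ε-productions and unit productions have already been eliminated. The naming scheme for new nonterminals (`<a>` for the nonterminal that derives terminal `a`, and `X1`, `X2`, … for cascade nonterminals) is an assumption made here and must not clash with existing symbols. Cascade nonterminals are shared between productions with the same tail, as in the grammar above.

```python
def to_cnf_bodies(productions, terminals):
    rules = {}      # head -> list of bodies
    term_nt = {}    # terminal -> new nonterminal deriving it
    tail_nt = {}    # tuple of nonterminals -> nonterminal deriving that tuple

    def add(head, body):
        rules.setdefault(head, [])
        if body not in rules[head]:
            rules[head].append(body)

    def lift(a):
        # Step 4: replace terminal a by a new nonterminal with production → a
        if a not in term_nt:
            term_nt[a] = f"<{a}>"
            add(term_nt[a], (a,))
        return term_nt[a]

    def cascade(symbols):
        # Step 5: return a nonterminal deriving `symbols` (length >= 2)
        if symbols not in tail_nt:
            name = f"X{len(tail_nt) + 1}"
            tail_nt[symbols] = name
            rest = symbols[1:]
            second = rest[0] if len(rest) == 1 else cascade(rest)
            add(name, (symbols[0], second))
        return tail_nt[symbols]

    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 1:
                add(head, body)      # A → a is already in CNF
                continue
            syms = tuple(lift(s) if s in terminals else s for s in body)
            if len(syms) == 2:
                add(head, syms)
            else:
                add(head, (syms[0], cascade(syms[1:])))
    return rules


H = {
    "E": [("E", "+", "T"), ("T", "*", "F"), ("(", "E", ")"), ("a",)],
    "T": [("T", "*", "F"), ("(", "E", ")"), ("a",)],
    "F": [("(", "E", ")"), ("a",)],
}
for head, bodies in to_cnf_bodies(H, set("+*()a")).items():
    print(head, "→", " | ".join(" ".join(b) for b in bodies))
```

Up to the names of the new nonterminals, the output has the same shape as the Chomsky Normal Form grammar shown above.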

1. Convert the following grammar `E → + E E | * E E | a` to an equivalent PDA that accepts by empty stack.
2. Convert your PDA from problem (1) to an equivalent PDA that accepts by final state.
3. Convert your PDA from problem (1) to an equivalent CFG.
4. Eliminate useless symbols from the following grammar:

```
S → AB | CA
A → a
B → BC | AB
C → aB | b
```

5. Put the following grammar into Chomsky Normal Form:

```
S → ASB | ε
A → aAS | a
B → BbS | A | bb
C → aB | b
```

- HMU: Sections 6.3, 7.1

aho@cs.columbia.edu verma@cs.columbia.edu