Computer Science Theory

Lecture 7: September 26, 2012

Context-Free Grammars

1. Review
2. Definition of a context-free grammar
3. Derivations
4. Leftmost and rightmost derivations
5. Parse trees
6. Ambiguity
7. Practice problems
8. Reading assignment

1. Review

- Closure properties of regular languages
- Decision problems for regular languages
- Testing equivalence of states
- Testing equivalence of DFA's
- Minimizing the number of states in a DFA

2. Definition of a context-free grammar

- A context-free grammar (CFG) is a formalism for defining a language.
- A CFG has four components (V, T, P, S):
  - V is a finite set of variables, also called nonterminals or syntactic categories. Each variable represents a language.
  - T is a finite set of symbols called terminals. The set of terminals is the alphabet of the language defined by the grammar.
  - P is a finite set of productions, which are rewrite rules of the form A → α, where A is a nonterminal and α is a string (possibly empty) of nonterminals and terminals.
  - S is a nonterminal, called the start symbol.
- Example grammar G1:
  - V = { S }
  - T = { (, ) }
  - P is the set with the two productions S → S ( S ) and S → ε.
  - S is the start symbol.
- G1 generates the language consisting of all strings of balanced parentheses.
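As a quick sanity check of the claim about G1 (evidence for short strings, not a proof), the sketch below writes G1 = (V, T, P, S) as plain Python data, enumerates every terminal string of length at most 8 that G1 derives, and compares the result with a brute-force balance test. The tuple-based representation of productions and the function names are assumptions made for illustration.

```python
from collections import deque
from itertools import product

# G1 = (V, T, P, S) as plain Python data. Assumption for this sketch: each
# symbol is a one-character string and each production body is a tuple of
# symbols, with the empty tuple standing for ε.
V = {"S"}
T = {"(", ")"}
P = {"S": [("S", "(", "S", ")"), ()]}
S = "S"


def generated_strings(max_len):
    """Terminal strings of length <= max_len derivable from S in G1.

    Breadth-first search over sentential forms, always expanding the leftmost
    nonterminal. A derivation step never removes terminals, so forms with more
    than max_len terminals can be discarded. In G1 every surviving form has at
    most max_len // 2 + 1 nonterminals, so the search terminates.
    """
    found, seen = set(), set()
    queue = deque([(S,)])
    while queue:
        form = queue.popleft()
        if form in seen or sum(x in T for x in form) > max_len:
            continue
        seen.add(form)
        # Index of the leftmost nonterminal, or None if form is all terminals.
        i = next((k for k, x in enumerate(form) if x in V), None)
        if i is None:
            found.add("".join(form))
        else:
            for body in P[form[i]]:
                queue.append(form[:i] + body + form[i + 1:])
    return found


def is_balanced(w):
    """Brute-force check: no prefix has more ')' than '(' and the counts match."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0


n = 8
balanced = {"".join(p) for k in range(n + 1)
            for p in product("()", repeat=k) if is_balanced("".join(p))}
print(generated_strings(n) == balanced)  # True
```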

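3. Derivations

- A grammar defines a language by means of derivations. Starting from the start symbol, each derivation step replaces one nonterminal A in the current string by the body α of a production A → α. The language L(G) generated by a grammar G is the set of terminal strings that can be derived from its start symbol.
- Example of a derivation of `( )( )` from `S` in G1: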

S ⇒ S ( S ) ⇒ S ( S ) ( S ) ⇒ ( S ) ( S ) ⇒ ( ) ( S ) ⇒ ( ) ( )

`( )( )` is therefore a string in the language defined by G1.

4. Leftmost and rightmost derivations

- A derivation in which at each step we replace the leftmost nonterminal by one of its production bodies is called a leftmost derivation.
- The derivation above is a leftmost derivation of `( )( )` from `S` in G1.
- A rightmost derivation is one in which at each step we replace the rightmost nonterminal by one of its production bodies.
- Here is a rightmost derivation of `( )( )` from `S` in G1:

S ⇒ S ( S ) ⇒ S ( ) ⇒ S ( S ) ( ) ⇒ S ( ) ( ) ⇒ ( ) ( )
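Whether a sequence of sentential forms is a derivation, and whether it is leftmost or rightmost, can be checked mechanically from the definitions. The sketch below assumes every symbol is a single character and writes sentential forms without spaces; the function names are hypothetical. For each step it finds which nonterminal occurrences could have been rewritten and tests whether the leftmost (or rightmost) one is among them.

```python
# Sketch assumptions: every grammar symbol is a single character, spaces are
# dropped from sentential forms, and the nonterminals are exactly the keys of
# the productions dict. A body of "" stands for ε.
G1 = {"S": ["S(S)", ""]}


def rewrite_positions(before, after, productions):
    """Indices i such that replacing the nonterminal before[i] by one of its
    production bodies turns `before` into `after` in a single step."""
    return {
        i
        for i, sym in enumerate(before)
        for body in productions.get(sym, [])
        if before[:i] + body + before[i + 1:] == after
    }


def classify(derivation, productions):
    """Return (is_derivation, is_leftmost, is_rightmost) for a list of forms."""
    leftmost = rightmost = True
    for before, after in zip(derivation, derivation[1:]):
        positions = rewrite_positions(before, after, productions)
        if not positions:
            return (False, False, False)
        nonterminals = [i for i, sym in enumerate(before) if sym in productions]
        leftmost = leftmost and nonterminals[0] in positions
        rightmost = rightmost and nonterminals[-1] in positions
    return (True, leftmost, rightmost)


left = ["S", "S(S)", "S(S)(S)", "(S)(S)", "()(S)", "()()"]
right = ["S", "S(S)", "S()", "S(S)()", "S()()", "()()"]
print(classify(left, G1))   # (True, True, False)
print(classify(right, G1))  # (True, False, True)
```

A step whose sentential form contains only one nonterminal, such as S ⇒ S ( S ), counts as both leftmost and rightmost.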

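5. Parse trees

- A derivation can be represented by a parse tree.
- Let G = (V, T, P, S) be a CFG. A parse tree for G is a tree in which:
  - Each interior node is labeled by a nonterminal in V.
  - Each leaf is labeled by a nonterminal, a terminal, or ε; a leaf labeled ε must be the only child of its parent.
  - If an interior node is labeled by a nonterminal A and its children are labeled X₁, X₂, ..., Xₖ, then A → X₁X₂...Xₖ is a production in P. If its only child is a leaf labeled ε, then A → ε is a production in P.
- The *yield* of a parse tree is the string obtained by concatenating the labels of its leaves from left to right.
- Derivations, leftmost derivations, rightmost derivations, parse trees, and recursive inference are equivalent. For a nonterminal A and a terminal string w, the following are either all true or all false: w has a derivation from A; w has a leftmost derivation from A; w has a rightmost derivation from A; some parse tree with root A has yield w; recursive inference concludes that w is in the language of A.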
- A parser for a grammar G is a program that takes as input a string and produces as output a parse tree for the string or a message saying that the string cannot be generated by G.
- A parser generator is a program that takes as input a grammar G and produces as output a parser for G. YACC is a widely used parser generator.
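A hand-written parser for G1 illustrates both the parser and the yield definitions. Because S → S ( S ) is left-recursive, a naive recursive-descent parser that calls itself for the leading S would never consume input; this sketch instead starts each S with S → ε and, for every "( S )" it reads, wraps the tree built so far in a new S → S ( S ) node. The Node class, the function names, and the convention of writing input without spaces are assumptions of this sketch; it is not how YACC works (YACC generates table-driven LALR(1) parsers).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

EPSILON = "ε"


@dataclass
class Node:
    """Interior node of a parse tree: a nonterminal label and its children.
    Leaves are plain strings (a terminal, or EPSILON)."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def tree_yield(t: Union[Node, str]) -> str:
    """Concatenate leaf labels from left to right; an ε leaf contributes nothing."""
    if isinstance(t, str):
        return "" if t == EPSILON else t
    return "".join(tree_yield(c) for c in t.children)


def parse_g1(w: str) -> Optional[Node]:
    """Parse tree for w under G1 (S → S ( S ) | ε), or None if w is not in L(G1)."""
    pos = 0

    def parse_s() -> Node:
        nonlocal pos
        # The bottom of the left spine of every S subtree uses S → ε.
        tree = Node("S", [EPSILON])
        # Each "( S )" read wraps the tree so far in a node for S → S ( S ).
        while pos < len(w) and w[pos] == "(":
            pos += 1
            inner = parse_s()
            if pos >= len(w) or w[pos] != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1
            tree = Node("S", [tree, "(", inner, ")"])
        return tree

    try:
        tree = parse_s()
    except SyntaxError:
        return None
    return tree if pos == len(w) else None


t = parse_g1("()()")
print(tree_yield(t))     # ()()
print(parse_g1(")("))    # None
```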

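6. Ambiguity

- A grammar G is ambiguous if there is a sentence in L(G) that has two or more distinct parse trees.
- The following grammar G2 for arithmetic expressions is ambiguous because `a + a * a` has two parse trees: one groups `a * a` as a subexpression and the other groups `a + a` as a subexpression.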

E → E + E | E * E | ( E ) | a

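- The following grammar generates the same arithmetic expressions over `+` and `*` as G2, but it is unambiguous. It gives `*` higher precedence than `+` and makes both `*` and `+` left associative.

E → E + T | T
T → T * F | F
F → ( E ) | a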
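To see the ambiguity of G2 and the effect of the rewritten grammar concretely, the sketch below (a) enumerates every G2 parse tree for a string by trying each production on every substring, a memoized brute-force search rather than an efficient parsing algorithm, and (b) parses the same string with the unambiguous grammar by recursive descent, coding the left-recursive rules E → E + T and T → T * F as loops that fold to the left. The function names and the bracketed tree notation are assumptions of this sketch.

```python
from functools import lru_cache

# Trees are rendered as bracketed strings: [x+y] or [x*y] marks a node built
# by a binary production, (x) a node built by a parenthesis production, and
# "a" a leaf. Chain steps such as E → T and T → F are not shown.


def g2_trees(w):
    """All parse trees of the ambiguous G2: E → E + E | E * E | ( E ) | a."""
    s = w.replace(" ", "")

    @lru_cache(maxsize=None)
    def trees(i, j):
        # Every tree rooted at E whose yield is s[i:j].
        out = []
        if s[i:j] == "a":                                   # E → a
            out.append("a")
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":  # E → ( E )
            out += ["(" + t + ")" for t in trees(i + 1, j - 1)]
        for k in range(i + 1, j - 1):                       # E → E op E
            if s[k] in "+*":
                out += ["[" + left + s[k] + right + "]"
                        for left in trees(i, k) for right in trees(k + 1, j)]
        return tuple(out)

    return list(trees(0, len(s))) if s else []


def g3_tree(w):
    """The unique tree under E → E + T | T, T → T * F | F, F → ( E ) | a.
    Raises SyntaxError if w is not generated by the grammar."""
    s = w.replace(" ", "")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(c):
        nonlocal pos
        if peek() != c:
            raise SyntaxError(f"expected {c!r} at position {pos}")
        pos += 1

    def expr():              # E → E + T | T, read as T (+ T)* folded left
        tree = term()
        while peek() == "+":
            expect("+")
            tree = "[" + tree + "+" + term() + "]"
        return tree

    def term():              # T → T * F | F, read as F (* F)* folded left
        tree = factor()
        while peek() == "*":
            expect("*")
            tree = "[" + tree + "*" + factor() + "]"
        return tree

    def factor():            # F → ( E ) | a
        if peek() == "(":
            expect("(")
            inner = expr()
            expect(")")
            return "(" + inner + ")"
        expect("a")
        return "a"

    tree = expr()
    if pos != len(s):
        raise SyntaxError(f"unexpected {s[pos]!r} at position {pos}")
    return tree


print(g2_trees("a + a * a"))  # ['[a+[a*a]]', '[[a+a]*a]']
print(g3_tree("a + a * a"))   # [a+[a*a]]
print(g3_tree("a + a + a"))   # [[a+a]+a]
```

G2 yields both groupings of `a + a * a`; the unambiguous grammar yields only the one in which `*` binds tighter, and it groups `a + a + a` to the left.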

7. Practice problems

- Construct a CFG that generates the language { aⁿbⁿ | n ≥ 0 }.
- Prove that the language generated by the grammar G1 in section 2 consists of all and only the strings of balanced parentheses.
- Construct a CFG that generates ELP = { wwᴿ | w is any string of a's and b's }, where wᴿ is the reversal of w. This is the language of even-length palindromes over the alphabet {a, b}. A palindrome is a string that reads the same forward and backward.
- Prove that ELP is not a regular language.
- Construct a CFG for all regular expressions over the alphabet {a, b}.

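8. Reading assignment

- HMU (Hopcroft, Motwani, and Ullman, Introduction to Automata Theory, Languages, and Computation): Ch. 5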

aho@cs.columbia.edu