Programming Languages and Translators

Lecture 8: Context-Free Grammars

February 17, 2014

- Context-free grammars
- Derivations and parse trees
- Ambiguity
- Examples of context-free grammars
- Yacc: a language for specifying syntax-directed translators

- CFGs are very useful for representing the syntactic structure of programming languages.
- A CFG is sometimes called Backus-Naur Form (BNF); strictly speaking, BNF is a notation for writing down CFGs.
- A context-free grammar consists of
  - A finite set of terminal symbols,
  - A finite nonempty set of nonterminal symbols,
  - One distinguished nonterminal called the start symbol, and
  - A finite set of rewrite rules, called productions, each of the form A → α where A is a nonterminal and α is a string (possibly empty) of terminals and nonterminals.
- Consider the context-free grammar G with the productions

```
E → E + T | T
T → T * F | F
F → ( E ) | id
```

- The terminal symbols are the alphabet from which strings are formed. In this grammar the set of terminal symbols is { id, +, *, (, ) }. The terminal symbols are the token names.
- The nonterminal symbols are syntactic variables that denote sets of strings of terminal symbols. In this grammar the set of nonterminal symbols is { `E`, `T`, `F` }.
- The start symbol is `E`.

- *L*(G), the language generated by a grammar G, consists of all strings of terminal symbols that can be derived from the start symbol of G.
- A leftmost derivation expands the leftmost nonterminal in each sentential form (a string of terminals and nonterminals derivable from the start symbol):

```
E ⇒ E + T
⇒ T + T
⇒ F + T
⇒ id + T
⇒ id + T * F
⇒ id + F * F
⇒ id + id * F
⇒ id + id * id
```
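The leftmost derivation above can be replayed mechanically. The sketch below is a minimal C illustration, assuming a one-character-per-symbol encoding (uppercase letters are nonterminals, `i` stands for the token `id`); the names `leftmost_step` and `replay_leftmost` are hypothetical, not part of any library. Each step rewrites the leftmost nonterminal and fails if the production's left side does not match it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One character per grammar symbol (an encoding chosen for this sketch):
   uppercase letters are nonterminals, anything else is a terminal,
   and 'i' stands for the token id. */

/* Rewrite the leftmost nonterminal of form using the production lhs -> rhs.
   Returns 1 on success; returns 0 (leaving form unchanged) if the leftmost
   nonterminal is not lhs, if form has no nonterminal, or if the result
   would not fit in cap bytes. */
int leftmost_step(char *form, size_t cap, char lhs, const char *rhs) {
    char *p = form;
    while (*p != '\0' && !isupper((unsigned char)*p))
        p++;                                  /* find leftmost nonterminal */
    if (*p != lhs)
        return 0;
    size_t rlen = strlen(rhs), tail = strlen(p + 1);
    if ((size_t)(p - form) + rlen + tail + 1 > cap)
        return 0;
    memmove(p + rlen, p + 1, tail + 1);       /* shift the rest, including '\0' */
    memcpy(p, rhs, rlen);                     /* splice in the right side */
    return 1;
}

/* Replay n leftmost steps from the start symbol E; step k uses the
   production lhs[k] -> rhs[k]. Leaves the final sentential form in out. */
int replay_leftmost(const char lhs[], const char *rhs[], int n,
                    char *out, size_t cap) {
    if (cap < 2)
        return 0;
    strcpy(out, "E");
    for (int k = 0; k < n; k++)
        if (!leftmost_step(out, cap, lhs[k], rhs[k]))
            return 0;
    return 1;
}
```

For example, replaying the eight productions E → E + T, E → T, T → F, F → id, T → T * F, T → F, F → id, F → id leaves `i+i*i` in the buffer, mirroring the derivation above.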

- A rightmost derivation expands the rightmost nonterminal in each sentential form:

```
E ⇒ E + T
⇒ E + T * F
⇒ E + T * id
⇒ E + F * id
⇒ E + id * id
⇒ T + id * id
⇒ F + id * id
⇒ id + id * id
```
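To connect derivations with *L*(G), here is a hedged C sketch that counts every leftmost derivation of G ending in a terminal string of at most a given length. It uses the same hypothetical one-character encoding (`i` for `id`); `count_sentences` and `MAXLEN` are illustrative names. Since this G is unambiguous (each of its sentences has exactly one leftmost derivation), the count is also the number of sentences within the length bound.

```c
#include <ctype.h>
#include <string.h>

#define MAXLEN 32   /* largest length bound this sketch accepts (an assumption) */

/* Productions of G, one symbol per character; 'i' stands for the token id. */
static const char LHS[] = {'E', 'E', 'T', 'T', 'F', 'F'};
static const char *RHS[] = {"E+T", "T", "T*F", "F", "(E)", "i"};
enum { NPROD = 6 };

/* Number of leftmost derivations from form that end in a terminal string of
   length <= maxlen. G has no epsilon-productions, so sentential forms never
   get shorter and any form longer than maxlen can be discarded. */
static long leftmost_count(const char *form, size_t maxlen) {
    const char *p = form;
    while (*p != '\0' && !isupper((unsigned char)*p))
        p++;                                  /* leftmost nonterminal */
    if (*p == '\0')
        return 1;                             /* form is a terminal string */
    size_t pre = (size_t)(p - form), tail = strlen(p + 1);
    long total = 0;
    for (int k = 0; k < NPROD; k++) {
        size_t rl = strlen(RHS[k]);
        if (LHS[k] != *p || pre + rl + tail > maxlen)
            continue;                         /* wrong nonterminal, or too long */
        char next[MAXLEN + 1];
        memcpy(next, form, pre);
        memcpy(next + pre, RHS[k], rl);
        memcpy(next + pre + rl, p + 1, tail + 1);   /* includes the '\0' */
        total += leftmost_count(next, maxlen);
    }
    return total;
}

/* Count the strings of L(G) of length <= maxlen, or -1 if maxlen is too big. */
long count_sentences(size_t maxlen) {
    if (maxlen > MAXLEN)
        return -1;
    return leftmost_count("E", maxlen);
}
```

For instance, `count_sentences(3)` returns 4, one each for `i`, `i+i`, `i*i`, and `(i)`.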

- Consider the context-free grammar G with the productions

```
E → E + E | E * E | ( E ) | id
```

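- This grammar is ambiguous. A grammar is ambiguous if some sentence has more than one parse tree, or equivalently more than one leftmost derivation. For example, `id + id * id` has the leftmost derivation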

```
E ⇒ E + E
⇒ id + E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
```

- `id + id * id` also has this second, different leftmost derivation:

```
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
```
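The ambiguity can be checked by brute force. The following C sketch counts the parse trees of a token string in the grammar E → E + E | E * E | ( E ) | id by trying every production on every substring, memoizing the results (a CYK-style dynamic program, not a parsing method from these notes). Tokens are single characters with `i` standing for `id`; `count_parse_trees` and `MAXTOK` are hypothetical names.

```c
#include <string.h>

#define MAXTOK 64   /* token-count limit of this sketch (an assumption) */

static const char *src;                     /* the token string */
static long memo[MAXTOK + 1][MAXTOK + 1];   /* -1 = not yet computed */

/* Number of parse trees deriving src[i..j) from E. */
static long trees(int i, int j) {
    if (memo[i][j] >= 0)
        return memo[i][j];
    long n = 0;
    if (j - i == 1 && src[i] == 'i')
        n = 1;                                    /* E -> id */
    else if (j - i >= 3) {
        if (src[i] == '(' && src[j - 1] == ')')
            n += trees(i + 1, j - 1);             /* E -> ( E ) */
        for (int k = i + 1; k < j - 1; k++)       /* E -> E + E | E * E */
            if (src[k] == '+' || src[k] == '*')
                n += trees(i, k) * trees(k + 1, j);
    }
    return memo[i][j] = n;
}

/* Parse-tree count for a token string, or -1 if it is too long. */
long count_parse_trees(const char *tokens) {
    int n = (int)strlen(tokens);
    if (n > MAXTOK)
        return -1;
    for (int a = 0; a <= n; a++)
        for (int b = 0; b <= n; b++)
            memo[a][b] = -1;
    src = tokens;
    return trees(0, n);
}
```

`count_parse_trees("i+i*i")` returns 2, matching the two leftmost derivations above, while `count_parse_trees("(i+i)*i")` returns 1 because the parentheses leave only one way to group.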

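- To pick one parse tree per sentence, an ambiguous grammar like this one can be supplemented with disambiguating rules that specify
  - the precedence of the + and * operators, or
  - the associativity of the + and * operators.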
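As an illustration of such rules, here is a C sketch of precedence climbing, a parsing technique not covered in these notes, that resolves the ambiguity by giving `*` higher precedence than `+` and making both left associative. Operands are single lowercase letters (an assumption), and `parenthesize` is a hypothetical helper that writes the chosen parse tree as a fully parenthesized string.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BUF 256   /* size of the temporary string buffers in this sketch */

/* Input: single lowercase-letter operands, '+', '*', '(' and ')', no blanks. */
static const char *in;   /* next unread input character */

/* Disambiguating rule: '*' binds tighter than '+' (0 = not an operator). */
static int prec(char op) { return op == '+' ? 1 : op == '*' ? 2 : 0; }

static int parse_expr(int min_prec, char *dst, size_t cap);

static int parse_primary(char *dst, size_t cap) {
    if (*in == '(') {                              /* ( E ) */
        in++;
        if (!parse_expr(1, dst, cap) || *in != ')')
            return 0;
        in++;
        return 1;
    }
    if (islower((unsigned char)*in) && cap >= 2) { /* an operand */
        dst[0] = *in++;
        dst[1] = '\0';
        return 1;
    }
    return 0;
}

/* Parse operators with precedence >= min_prec, writing the result
   fully parenthesized into dst. Returns 0 on a syntax error or overflow. */
static int parse_expr(int min_prec, char *dst, size_t cap) {
    if (!parse_primary(dst, cap))
        return 0;
    while (prec(*in) >= min_prec) {
        char op = *in++;
        char rhs[BUF], tmp[BUF];
        /* Left associativity: the right operand may contain only operators
           that bind strictly tighter than op. */
        if (!parse_expr(prec(op) + 1, rhs, sizeof rhs))
            return 0;
        int n = snprintf(tmp, sizeof tmp, "(%s%c%s)", dst, op, rhs);
        if (n < 0 || (size_t)n >= sizeof tmp || (size_t)n >= cap)
            return 0;
        strcpy(dst, tmp);
    }
    return 1;
}

/* Fully parenthesize expr according to the rules above; 1 on success. */
int parenthesize(const char *expr, char *dst, size_t cap) {
    in = expr;
    return parse_expr(1, dst, cap) && *in == '\0';
}
```

With these rules `a+b*c` becomes `(a+(b*c))` and `a+b+c` becomes `((a+b)+c)`: the first result reflects precedence, the second left associativity.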

$a^m b^m a^n b^n$ | $a^m b^n a^n b^m$

- Nonempty palindromes of `a`'s and `b`'s. (A palindrome is a string that reads the same forwards as backwards; e.g., `abba`.)
  - CFG: `S → a S a | b S b | a a | b b | a | b`
  - Note that the language generated by this grammar is not regular. Can you prove this using the pumping lemma for regular languages?
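A recognizer can follow the palindrome grammar production by production. In this C sketch (the name `is_palindrome_sentence` is hypothetical), a substring derives from `S` if it is a single `a` or `b`, or if its two end symbols are equal `a`'s or `b`'s that enclose either nothing or another string derived from `S`.

```c
#include <stdbool.h>
#include <string.h>

/* Can s[i..j) be derived from S -> a S a | b S b | a a | b b | a | b ? */
static bool derives_S(const char *s, size_t i, size_t j) {
    size_t len = j - i;
    if (len == 0)
        return false;                      /* S never derives the empty string */
    if (s[i] != 'a' && s[i] != 'b')
        return false;
    if (len == 1)
        return true;                       /* S -> a | b */
    if (s[j - 1] != s[i])
        return false;                      /* the two ends must match */
    if (len == 2)
        return true;                       /* S -> a a | b b */
    return derives_S(s, i + 1, j - 1);     /* S -> a S a | b S b */
}

bool is_palindrome_sentence(const char *s) {
    return derives_S(s, 0, strlen(s));
}
```

So `abba` and `aba` are accepted, while `ab` and the empty string are rejected.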
- Strings with an equal number of `a`'s and `b`'s:
  - CFG: `S → a S b | b S a | S S | ε`
  - Note that this grammar is ambiguous. Can you find an equivalent unambiguous grammar?
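Membership in this language can be tested directly against the grammar. The C sketch below asks whether a substring derives from `S` in one of the three ways a nonempty string can: `a S b`, `b S a`, or a split into two nonempty `S` parts (an `S S` split with an empty half adds nothing new). It memoizes the answer for each substring; `in_language` and the length limit `MAXN` are assumptions of this sketch.

```c
#include <stdbool.h>
#include <string.h>

#define MAXN 64   /* input-length limit of this sketch (an assumption) */

static const char *w;                        /* the string being tested */
static signed char memo[MAXN + 1][MAXN + 1]; /* -1 unknown, 0 no, 1 yes */

/* Can w[i..j) be derived from S -> a S b | b S a | S S | epsilon ? */
static bool derives(int i, int j) {
    if (i == j)
        return true;                                   /* S -> epsilon */
    if (memo[i][j] >= 0)
        return memo[i][j];
    bool ok = false;
    if (j - i >= 2 &&
        ((w[i] == 'a' && w[j - 1] == 'b') || (w[i] == 'b' && w[j - 1] == 'a')))
        ok = derives(i + 1, j - 1);                    /* S -> a S b | b S a */
    for (int k = i + 1; !ok && k < j; k++)             /* S -> S S, both halves nonempty */
        ok = derives(i, k) && derives(k, j);
    memo[i][j] = ok;
    return ok;
}

/* Membership test for L(G); strings longer than MAXN are rejected. */
bool in_language(const char *s) {
    size_t n = strlen(s);
    if (n > MAXN)
        return false;
    memset(memo, -1, sizeof memo);
    w = s;
    return derives(0, (int)n);
}
```

On strings over { `a`, `b` } its answers agree with simply comparing the number of `a`'s and `b`'s; e.g. `in_language("abba")` is true and `in_language("aab")` is false.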
- If- and if-else statements:

```
stmt → if ( expr ) stmt else stmt
     | if ( expr ) stmt
     | other
```
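This grammar is ambiguous: in `if ( expr ) if ( expr ) other else other` the `else` can belong to either `if` (the "dangling-else" problem). The C sketch below counts parse trees by trying every production on every substring, with memoization; the token encoding (`i` for `if ( expr )`, `e` for `else`, `o` for `other`) and the name `count_stmt_trees` are hypothetical.

```c
#include <string.h>

#define MAXTOK 64   /* token-count limit of this sketch (an assumption) */

/* One character per token (a hypothetical encoding):
   'i' = "if ( expr )", 'e' = "else", 'o' = "other". */
static const char *src;
static long memo[MAXTOK + 1][MAXTOK + 1];   /* -1 = not yet computed */

/* Number of parse trees deriving src[i..j) from stmt. */
static long trees(int i, int j) {
    if (i >= j)
        return 0;                                   /* stmt is never empty */
    if (memo[i][j] >= 0)
        return memo[i][j];
    long n = 0;
    if (j - i == 1 && src[i] == 'o')
        n = 1;                                      /* stmt -> other */
    else if (src[i] == 'i') {
        n += trees(i + 1, j);                       /* stmt -> if ( expr ) stmt */
        for (int k = i + 2; k <= j - 2; k++)        /* ... stmt else stmt */
            if (src[k] == 'e')
                n += trees(i + 1, k) * trees(k + 1, j);
    }
    return memo[i][j] = n;
}

/* Parse-tree count for a token string, or -1 if it is too long. */
long count_stmt_trees(const char *tokens) {
    int n = (int)strlen(tokens);
    if (n > MAXTOK)
        return -1;
    for (int a = 0; a <= n; a++)
        for (int b = 0; b <= n; b++)
            memo[a][b] = -1;
    src = tokens;
    return trees(0, n);
}
```

`count_stmt_trees("iioeo")` returns 2, one tree for each `if` the `else` could attach to. C resolves this by associating an `else` with the lexically nearest preceding `if` allowed by the syntax.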

- Statements, including `for` loops in which each of the three expressions is optional:

```
stmt    → expr ;
        | if ( expr ) stmt
        | for ( optexpr ; optexpr ; optexpr ) stmt
        | other
optexpr → ε
        | expr
```
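Because each alternative for `stmt` begins with a different token, this grammar can be parsed by recursive descent with one token of lookahead (predictive parsing, a technique not otherwise described in these notes). The C sketch below has one function per nonterminal, and `optexpr` uses its ε-production whenever the lookahead is not an expression. The one-character token encoding (`x` for `expr`, `i` for `if`, `f` for `for`, `o` for `other`) and the names `match` and `parse_stmt` are assumptions for illustration.

```c
#include <stdbool.h>

/* One character per token (a hypothetical encoding):
   'x' = expr, 'i' = if, 'f' = for, 'o' = other, plus '(' ')' ';'. */
static const char *lookahead;   /* the current token */
static bool ok;                 /* cleared on the first syntax error */

static void match(char t) {
    if (ok && *lookahead == t)
        lookahead++;
    else
        ok = false;
}

static void optexpr(void) {
    if (*lookahead == 'x')
        match('x');             /* optexpr -> expr; otherwise optexpr -> epsilon */
}

static void stmt(void) {
    if (!ok)
        return;
    switch (*lookahead) {       /* the lookahead picks the production */
    case 'x': match('x'); match(';'); break;
    case 'i': match('i'); match('('); match('x'); match(')'); stmt(); break;
    case 'f': match('f'); match('(');
              optexpr(); match(';'); optexpr(); match(';'); optexpr();
              match(')'); stmt(); break;
    case 'o': match('o'); break;
    default:  ok = false;       /* no production for stmt starts with this token */
    }
}

/* Returns true if the whole token string is a single stmt. */
bool parse_stmt(const char *tokens) {
    lookahead = tokens;
    ok = true;
    stmt();
    return ok && *lookahead == '\0';
}
```

`parse_stmt("f(;x;)o")` accepts a `for` loop whose first and third expressions are omitted, while `parse_stmt("f(;;;)o")` is rejected because a `for` header contains exactly two semicolons.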

- Yacc is a popular language, created by Steve Johnson of Bell Labs, for specifying and implementing syntax-directed translators.
- Bison is the GNU version of Yacc, upward compatible with the original Yacc. It was originally written by Robert Corbett and made Yacc-compatible by Richard Stallman; its manual is by Charles Donnelly and Richard Stallman. Many other versions of Yacc are also available.
- The original Yacc used C for semantic actions. Yacc has been rewritten for many other languages including Java, ML, OCaml, and Python.
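- Yacc specifications
  - A Yacc program has three parts, separated by lines containing `%%`:
    - declarations
    - translation rules
    - supporting C-routines
  - The third part (the second `%%` followed by the supporting C-routines) may be omitted.
- Example: a Yacc program for a simple desk calculator. It reads one arithmetic expression per line, built from numbers, `+`, `*`, and parentheses, and prints its value:

```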
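%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double                /* semantic values are doubles */
int yylex(void);                      /* defined in the last part below */
int yyerror(char const *message);     /* supplied by -ly or by the user */
%}
%token NUMBER
%left '+'                             /* lower precedence */
%left '*'                             /* higher precedence */
%%
lines : lines expr '\n'  { printf("%g\n", $2); }
      | lines '\n'
      | /* empty */
      ;
expr  : expr '+' expr    { $$ = $1 + $3; }
      | expr '*' expr    { $$ = $1 * $3; }
      | '(' expr ')'     { $$ = $2; }
      | NUMBER           /* default action: $$ = $1 */
      ;
%%
/* the lexical analyzer; returns <token-name, yylval> */
int yylex(void) {
    int c;
    while ((c = getchar()) == ' ')
        ;                             /* skip blanks */
    if (c == EOF)
        return 0;                     /* 0 tells the parser the input has ended */
    if ((c == '.') || (isdigit(c))) {
        ungetc(c, stdin);
        scanf("%lf", &yylval);        /* yylval has type YYSTYPE, i.e. double */
        return NUMBER;
    }
    return c;                         /* any other character is its own token */
}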
```

- The declarations `%left '+'` and `%left '*'` make the operator `+` left associative and of lower precedence than the left-associative operator `*`. (Operators declared on later `%left` lines have higher precedence.)
- Put the yacc program in a file, say `desk.y`.
- Invoke `yacc desk.y` to create the yacc output file `y.tab.c`.
- Compile this output file with a C compiler by typing `gcc y.tab.c -ly` to get `a.out`. (The parser itself, the function `yyparse`, is in `y.tab.c`; the library `-ly` supplies default versions of `main` and `yyerror`.)
- `a.out` is the desk calculator. Try it!

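- If the `-ly` library is not available, you can define `yyerror`, and a `main` that calls `yyparse`, yourself in the supporting C-routines section of `desk.y`, and then compile with just `gcc y.tab.c`:

```
/* replacements for the default routines in -ly */
int yyerror(char const *message) {
    fputs(message, stderr);           /* e.g. "syntax error" */
    fputc('\n', stderr);
    return 0;                         /* the parser ignores this value */
}

int main(void) {
    return yyparse();                 /* 0 if the whole input parsed */
}
```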

- Let G be the grammar S → a S b S | b S a S | ε.
  - What language is generated by this grammar?
  - Draw all parse trees for the sentence `abab`.
  - Is this grammar ambiguous?
- Let G be the grammar S → a S b | ε. Prove that $L(G) = \{\, a^n b^n \mid n \geq 0 \,\}$.
- Consider a sentence of the form `id + id + ... + id` where there are *n* plus signs. Let G be the ambiguous grammar E → E + E | E * E | ( E ) | id given above. How many parse trees are there in G for this sentence when *n* equals
  - 1
  - 2
  - 3
  - 4
  - *m*?
- Write down a CFG for regular expressions over the alphabet { `a`, `b` }. Show a parse tree for the regular expression `a | b*a`.

- ALSU Sects. 4.1-4.2, 4.9
- A nice Lex & Yacc tutorial

aho@cs.columbia.edu