Programming Languages and Translators

Lecture 5: Implementing a Lexical Analyzer

February 4, 2015

- Finite automata
- Converting an NFA to a DFA
- Equivalence of regular expressions and finite automata
- Simulating an NFA
- The pumping lemma for regular languages
- Closure and decision properties of regular languages

- Variants of finite automata are commonly used to match regular expression patterns. This lecture summarizes the relevant results linking finite automata and regular expressions usually covered in computer science theory courses. Of particular relevance to the implementation of lexers are the algorithms that use deterministic and nondeterministic finite automata to match regular expression patterns.
- A nondeterministic finite automaton (NFA) consists of
- A finite set of states
*S*. - An input alphabet consisting of a finite set of symbols Σ.
- A transition function δ that maps
*S*× (Σ ∪ {ε}) to subsets of*S*. This transition function can be represented by a transition graph in which the nodes are labeled by states and there is a directed edge labeled*a*from node*w*to node*v*if δ(*w*,*a*) contains*v*. - An initial state
*s*_{0}in*S*. *F*, a subset of*S*, called the final (or accepting) states.- An NFA accepts an input string
*x*iff there is a path in the transition graph from the initial state to a final state that spells out*x*. - The language defined by an NFA is the set of strings accepted by the NFA.
- A deterministic finite automaton (DFA) is an NFA in which
- There are no ε moves, and
- For each state
*s*and input symbol*a*there is exactly one transition out of*s*labeled*a*.

- Every NFA can be converted to an equivalent DFA using the subset construction (Algorithm 3.20, ALSU, pp. 153-154).
- Every DFA can be converted into an equivalent minimum-state DFA Using Algorithm 3.39, ALSU, pp. 181-183. All equivalent minimum-state DFAs are isomorphic up to state renaming.

- Regular expressions and finite automata define the same class of languages, namely the regular sets.
- Every regular expression can be converted into an equivalent NFA using the McNaughton-Yamada-Thompson algorithm (Algorithm 3.23, ALSU, pp. 159-161).
- Every finite automaton can be converted into a regular expression using Kleene's algorithm.

- Two-stack simulation of an NFA: Algorithm 3.22, ALSU, pp. 156-159.

- The pumping lemma allows us to prove certain languages, like
{
`a`

^{n}`b`

^{n}|*n*≥ 0 }, are not regular. **The pumping lemma.**If L is a regular language, then there exists a constant*n*associated with L such that for every string*w*in L where |*w*| ≥*n*, we can partition*w*into three strings*xyz*(i.e.,*w*=*xyz*) such that*y*is not the empty string,- the length of
*xy*is less than or equal to*n*, and - for all
*k*≥ 0, the string*xy*is in L.^{k}z

- The regular languages are closed under the following operations:
- union
- intersection
- complement
- reversal
- Kleene star
- homomorphism
- inverse homomorphism
- Decision properties
- Given a regular expression
*r*and a string*w*, it is decidable whether*r*matches*w*. - Give a finite automaton A, it is decidable whether L(A) is empty.
- Given two finite automata A and B, it is decidable whether L(A) = L(B).

- Write down deterministic finite automata for the following regular expressions:
`(a*b*)*`

`(aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*`

`a(ba|a)*`

`ab(a|b*c)*bb*a`

- Construct a deterministic finite automaton that will recognize all strings of 0's and 1's representing integers that are divisible by 3. Assume the empty string represents 0.
- Use the McNaughton-Yamada-Thompson algorithm to convert the regular
expression
`a(a|b)*a`

into a nondeterministic finite automaton. - Convert the NFA of (3) into a DFA.
- Minimize the number of states in the DFA of (4).

- ALSU Chapter 3, all sections except 3.9.
- Russ Cox's article Regular Expression Matching Can Be Simple and Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) has a good historical account on the evolution of regular expression matching programs.

aho@cs.columbia.edu