COMS W4115
Programming Languages and Translators
Lecture 5: Implementing a Lexical Analyzer
February 6, 2013
Outline
- Finite automata
- Converting an NFA to a DFA
- Equivalence of regular expressions and finite automata
- Simulating an NFA
- The pumping lemma for regular languages
- Closure and decision properties of regular languages
1. Finite Automata
- Variants of finite automata are commonly used to match regular expression patterns.
- A nondeterministic finite automaton (NFA) consists of
- A finite set of states S.
- An input alphabet consisting of a finite set of symbols Σ.
- A transition function δ that maps S × (Σ ∪ {ε})
to subsets of S. This transition function can be represented
by a transition graph in which the nodes are labeled by states
and there is a directed edge labeled a from node w to node v
if δ(w, a) contains v.
- An initial state s0 in S.
- F, a subset of S, called the final (or accepting) states.
- An NFA accepts an input string x iff there is a path in the transition
graph from the initial state to a final state that spells out x.
- The language defined by an NFA is the set of strings accepted by the NFA.
- A deterministic finite automaton (DFA) is an NFA in which
- There are no ε moves, and
- For each state s and input symbol a there is exactly one transition
out of s labeled a.
2. Converting an NFA to a DFA
- Every NFA can be converted to an equivalent DFA using the subset construction
(Algorithm 3.20, ALSU, pp. 153-154).
- Every DFA can be converted into an equivalent minimum-state DFA
Using Algorithm 3.39, ALSU, pp. 181-183.
All equivalent minimum-state DFAs are isomorphic up to state renaming.
3. Equivalence of Regular Expressions and Finite Automata
- Regular expressions and finite automata define the same
class of languages, namely the regular sets.
- Every regular expression can be converted into an equivalent
NFA using the McNaughton-Yamada-Thompson algorithm
(Algorithm 3.23, ALSU, pp. 159-161).
- Every finite automaton can be converted into a regular expression
using Kleene's algorithm.
4. Simulating an NFA
- Two-stack simulation of an NFA: Algorithm 3.22, ALSU, pp. 156-159.
5. The Pumping Lemma for Regular Languages
- The pumping lemma allows us to prove certain languages, like
{
anbn |
n ≥ 0 }, are not regular.
- The pumping lemma. If L is a regular language, then there exists a
constant n associated with L such that for every string w in L where
|w| ≥ n, we can partition w into three strings
xyz (i.e., w = xyz) such that
- y is not the empty string,
- the length of xy is less than or equal to n, and
- for all k ≥ 0, the string xykz is in L.
6. Closure and Decision Properties of Regular Languages
- The regular languages are closed under the following operations:
- union
- intersection
- complement
- reversal
- Kleene star
- homomorphism
- inverse homomorphism
- Decision properties
- Given a regular expression r and a string w, it is decidable
whether r matches w.
- Give a finite automaton A, it is decidable whether L(A) is empty.
- Given two finite automata A and B, it is decidable whether L(A) = L(B).
7. Practice Problems
- Write down deterministic finite automata for the following regular expressions:
(a*b*)*
(aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*
a(ba|a)*
ab(a|b*c)*bb*a
- Construct a deterministic finite automaton that will recognize all strings
of 0's and 1's representing integers that are divisible by 3.
Assume the empty string represents 0.
- Use the McNaughton-Yamada-Thompson algorithm to convert the regular
expression
a(a|b)*a into a nondeterministic finite automaton.
- Convert the NFA of (3) into a DFA.
- Minimize the number of states in the DFA of (4).
8. Reading Assignment
aho@cs.columbia.edu