COMS W3261
Computer Science Theory
Lecture 4: September 17, 2012
Equivalence of Regular Expressions and Finite Automata
Overview
- We say that a regular expression E and a finite automaton A
are equivalent if L(E) = L(A).
- We will prove that regular expressions and finite automata
are equivalent in definitional power by showing how to convert
- a regular expression into an equivalent ε-NFA,
- an ε-NFA into an equivalent DFA, and
- a DFA into an equivalent regular expression.
1. Review
- Deterministic finite automata
- Nondeterministic finite automata
- The subset construction
2. ε-NFA: NFA with Epsilon-Transitions
- An ε-NFA is an NFA (Q, Σ, δ, q0, F)
whose transition function δ is a mapping from
Q × (Σ ∪ {ε}) to P(Q), the set of subsets of Q.
- The language of an ε-NFA is the set of all strings w such that
the labels along some path from the start state to a final state spell out w.
There can be ε-transitions along this path; they contribute no symbols to w.
3. Epsilon-Closures
- We define ECLOSE(q), the ε-closure of a state q of an ε-NFA,
recursively as follows:
- State q is in ECLOSE(q).
- If state p is in ECLOSE(q), then all states in δ(p, ε)
are also in ECLOSE(q).
- We can compute the ε-closure of a set of states S by taking the
union of the ε-closures of all the individual states in S.
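The recursive definition above can be read as a graph search over ε-arcs. The following is a minimal Python sketch under assumed conventions: the name `eclose`, the dictionary `eps` (mapping a state p to δ(p, ε)), and the example automaton are illustrative and do not come from HMU.

```python
from collections import deque

def eclose(states, eps):
    """Return the ε-closure of a set of states.

    eps maps a state p to the set δ(p, ε); a missing key means that
    p has no ε-transitions. (Encoding assumed for this sketch.)
    """
    closure = set(states)        # basis: every state is in its own closure
    work = deque(closure)
    while work:
        p = work.popleft()
        for r in eps.get(p, ()): # induction: follow each arc in δ(p, ε)
            if r not in closure:
                closure.add(r)
                work.append(r)
    return frozenset(closure)

# Hypothetical ε-arcs: 1 -ε-> 2 -ε-> 3 and 4 -ε-> 1
eps = {1: {2}, 2: {3}, 4: {1}}
print(sorted(eclose({1}, eps)))
print(sorted(eclose({3, 4}, eps)))
```

Starting the search from every state of S at once gives the same result as taking the union of the individual closures.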
4. McNaughton-Yamada-Thompson Algorithm:
Converts an RE to an equivalent ε-NFA
- Theorem: If L = L(R) for some regular expression R, then there is
an ε-NFA N such that L = L(N).
- Proof: See HMU, Sect. 3.2.3, pp. 102-107.
- The proof is in the form of an algorithm that takes as input
a regular expression R of length n and recursively constructs from
it an equivalent ε-NFA that has
- exactly one start state and one final state,
- at most 2n states,
- no arcs coming into its start state,
- no arcs leaving its final state,
- at most two arcs leaving any nonfinal state.
- This algorithm was discovered by McNaughton and Yamada, and then
independently by Ken Thompson, who used it in the string-matching program
grep on Unix. On an input string of length m,
an n-state MYT ε-NFA can be simulated
in time O(mn) using a two-stack algorithm.
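To make the recursive construction concrete, here is a hedged Python sketch. The syntax-tree encoding, the function names `myt`, `eclose`, and `accepts`, and the choice to join the two parts of a concatenation with an ε-arc are assumptions of this sketch. The simulation tracks the current set of states with Python sets rather than the two stacks mentioned above.

```python
import itertools

def myt(regex):
    """Build an ε-NFA (start, final, delta) from a regex syntax tree.

    Assumed node encoding: ("sym", a), ("eps",), ("alt", r, s),
    ("cat", r, s), ("star", r).
    delta maps (state, label) -> set of states; label None stands for ε.
    """
    fresh = itertools.count()
    delta = {}

    def arc(p, label, q):
        delta.setdefault((p, label), set()).add(q)

    def build(node):
        kind = node[0]
        if kind == "cat":
            # Join the final state of r to the start state of s by an ε-arc.
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            arc(f1, None, s2)
            return s1, f2
        s, f = next(fresh), next(fresh)   # new start and final states
        if kind == "sym":
            arc(s, node[1], f)
        elif kind == "eps":
            arc(s, None, f)
        elif kind == "alt":
            for sub in node[1:]:
                ss, sf = build(sub)
                arc(s, None, ss)
                arc(sf, None, f)
        elif kind == "star":
            ss, sf = build(node[1])
            arc(s, None, ss)    # enter the loop body
            arc(s, None, f)     # or skip it
            arc(sf, None, ss)   # repeat the body
            arc(sf, None, f)    # or leave the loop
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return s, f

    start, final = build(regex)
    return start, final, delta

def eclose(states, delta):
    """ε-closure of a set of states under delta."""
    closure, work = set(states), list(states)
    while work:
        p = work.pop()
        for r in delta.get((p, None), ()):
            if r not in closure:
                closure.add(r)
                work.append(r)
    return closure

def accepts(nfa, w):
    """Simulate the ε-NFA on w by tracking the set of current states."""
    start, final, delta = nfa
    current = eclose({start}, delta)
    for a in w:
        moved = set()
        for p in current:
            moved |= delta.get((p, a), set())
        current = eclose(moved, delta)
    return final in current

# Illustrative regular expression (a+b)*ab, written as a syntax tree.
tree = ("cat",
        ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a")),
        ("sym", "b"))
nfa = myt(tree)
print(accepts(nfa, "babab"), accepts(nfa, "aba"))
```

Each input symbol costs one pass over the current states and their arcs. Since every state has at most two outgoing arcs, this is O(n) work per symbol and O(mn) overall.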
5. Converting an ε-NFA to an equivalent DFA
- We can eliminate all ε-transitions from an ε-NFA
by converting it into an equivalent DFA using the subset construction.
- Given an ε-NFA E =
(QE, Σ, δE, qE, FE),
we construct the DFA D =
(QD, Σ, δD, qD, FD)
as follows:
- QD = P(QE).
- δD is computed as follows for each state S in QD and each input symbol a in Σ:
- Let S = {p1, p2,..., pk} and let
{r1, r2,..., rm} be the union of
δE(pi, a) for
i = 1, 2, ..., k.
- Then, δD(S, a) =
ECLOSE({r1, r2,..., rm}).
- qD = ECLOSE(qE).
- FD = { S | S is in QD and S contains a state in
FE }.
- As with the subset construction, we can prove by induction that
L(D) = L(E).
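The construction above can be sketched in Python as follows. One deliberate difference, stated as an assumption: this sketch builds only the subsets reachable from qD instead of all of P(QE), which does not change the language accepted. The function names and the two-state example ε-NFA for a*b* are illustrative.

```python
def eclose(states, delta):
    """ε-closure of a set of states; label None stands for ε."""
    closure, work = set(states), list(states)
    while work:
        p = work.pop()
        for r in delta.get((p, None), ()):
            if r not in closure:
                closure.add(r)
                work.append(r)
    return frozenset(closure)

def epsilon_nfa_to_dfa(start, finals, delta, alphabet):
    """Subset construction with ε-closures, restricted to reachable subsets.

    delta maps (state, label) -> set of states of the ε-NFA.
    Returns (q_d, f_d, dfa_delta); DFA states are frozensets.
    """
    q_d = eclose({start}, delta)              # qD = ECLOSE(qE)
    dfa_delta, seen, work = {}, {q_d}, [q_d]
    while work:
        S = work.pop()
        for a in alphabet:
            moved = set()                     # union of δE(p, a) for p in S
            for p in S:
                moved |= delta.get((p, a), set())
            T = eclose(moved, delta)          # δD(S, a) = ECLOSE(moved)
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    f_d = {S for S in seen if S & set(finals)}  # subsets containing a final state
    return q_d, f_d, dfa_delta

# Hypothetical ε-NFA for a*b*: 0 -a-> 0, 0 -ε-> 1, 1 -b-> 1, final state 1.
delta = {(0, "a"): {0}, (0, None): {1}, (1, "b"): {1}}
q_d, f_d, d = epsilon_nfa_to_dfa(0, {1}, delta, "ab")

def run(w):
    """Run the resulting DFA on the string w."""
    S = q_d
    for a in w:
        S = d[(S, a)]
    return S in f_d

print(run("aabb"), run("ba"))
```

The empty subset appears as an ordinary dead state of the DFA, so `dfa_delta` is total on the reachable subsets.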
6. Kleene's Algorithm: Converting a DFA to an Equivalent Regular Expression
- Given a DFA A, Kleene's algorithm constructs a regular expression
R from A such that L(R) = L(A).
- Suppose the states of A are numbered 1, 2, ..., n.
- Kleene's algorithm is a dynamic programming algorithm that constructs
a regular expression R[i,j,k] denoting the set of strings that label
paths from state i to state j with no intermediate state numbered
higher than k, as follows:
for (i = 1; i <= n; i++)
    for (j = 1; j <= n; j++)
        if (i != j)
            if (there are transitions from state i to state j labeled a1, a2, ..., am)
                R[i,j,0] = a1 + a2 + ... + am;
            else
                R[i,j,0] = ∅;
        else
            if (there are transitions from state i to state i labeled a1, a2, ..., am)
                R[i,i,0] = ε + a1 + a2 + ... + am;
            else
                R[i,i,0] = ε;
for (k = 1; k <= n; k++)
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            R[i,j,k] = R[i,j,k-1] + R[i,k,k-1](R[k,k,k-1])*R[k,j,k-1];
Assuming the start state is 1, the regular expression for the DFA A is then
the sum (union) of all expressions R[1,j,n] where j is
a final state.
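Note that Kleene's algorithm for constructing a regular expression from
a DFA has the same triple-loop dynamic programming structure as
Warshall's transitive closure algorithm
and Floyd's all-pairs shortest paths algorithm
for directed graphs; the three differ only in the operations used to combine
the entries for k-1.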
Example 3.5, HMU, pp. 95-97.
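As a rough illustration of the dynamic program, the following Python sketch emits patterns in Python's `re` syntax (`|` for +, `(?:...)?` for a union with ε) so that the result can be checked by matching. The helper names, the light simplification rules for ∅ and ε, and the two-state example DFA (strings over {0, 1} containing at least one 0) are assumptions of this sketch. The output is far from simplified.

```python
import re

EMPTY = None   # stands for ∅
EPS = ""       # stands for ε (the empty pattern)

def union(r, s):
    if r is EMPTY: return s
    if s is EMPTY: return r
    if r == s: return r
    if r == EPS: return f"(?:{s})?"
    if s == EPS: return f"(?:{r})?"
    return f"(?:{r}|{s})"

def concat(r, s):
    if r is EMPTY or s is EMPTY: return EMPTY   # ∅R = R∅ = ∅
    return r + s

def star(r):
    if r is EMPTY or r == EPS: return EPS       # ∅* = ε* = ε
    return f"(?:{r})*"

def kleene(n, delta, start, finals):
    """delta maps (state, symbol) -> state; states are numbered 1..n."""
    # R[i][j] holds R[i,j,k] for the current k, beginning with k = 0.
    R = [[EMPTY] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        R[i][i] = EPS
    for (i, a), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(a))
    for k in range(1, n + 1):
        prev = R
        R = [row[:] for row in prev]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                via_k = concat(concat(prev[i][k], star(prev[k][k])), prev[k][j])
                R[i][j] = union(prev[i][j], via_k)
    result = EMPTY
    for j in finals:                            # union of R[start,j,n] over final j
        result = union(result, R[start][j])
    return result

# Hypothetical DFA: state 1 is the start state, state 2 is final.
delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
regex = kleene(2, delta, 1, {2})
print(regex)
print(bool(re.fullmatch(regex, "1101")), bool(re.fullmatch(regex, "111")))
```

Only the table for k-1 is kept while the table for k is filled, since the recurrence never looks further back.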
7. Converting a DFA to an equivalent RE by Eliminating States
- Kleene's algorithm gives us a mechanical way to construct
a regular expression from a DFA, or for that matter, from any NFA
or ε-NFA.
- Another approach that avoids duplicating work is to eliminate
states, one at a time, from the DFA using the procedure
outlined in Section 3.2.2 of HMU.
- Example 3.6, HMU, pp. 101-102.
- You can see an expanded treatment of state elimination
with more examples in pp. 583-588 of Chapter 10 of
Aho and Ullman, Foundations of Computer Science.
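A minimal Python sketch of state elimination follows. It uses a common variant that first adds a fresh start state and a fresh final state joined to the DFA by ε-arcs, which is not necessarily the exact procedure of HMU Section 3.2.2. All names and the example DFA (strings over {a, b} with an even number of a's) are illustrative, and the output uses Python `re` syntax without simplification.

```python
import re

def union(r, s):
    # None stands for ∅ and "" stands for ε.
    if r is None: return s
    if s is None: return r
    if r == s: return r
    if r == "": return f"(?:{s})?"
    if s == "": return f"(?:{r})?"
    return f"(?:{r}|{s})"

def concat(r, s):
    return None if r is None or s is None else r + s

def star(r):
    return "" if r is None or r == "" else f"(?:{r})*"

def eliminate_states(states, delta, start, finals):
    """delta maps (state, symbol) -> state. Returns a pattern for L(A), or None for ∅."""
    START, FINAL = object(), object()   # fresh states added by this sketch
    edge = {}                           # (p, q) -> regular expression on the arc p -> q

    def add(p, q, r):
        edge[(p, q)] = union(edge.get((p, q)), r)

    for (p, a), q in delta.items():
        add(p, q, re.escape(a))
    add(START, start, "")
    for f in finals:
        add(f, FINAL, "")

    for s in states:                    # eliminate the original states one at a time
        loop = star(edge.pop((s, s), None))
        ins = {p: r for (p, q), r in edge.items() if q == s}
        outs = {q: r for (p, q), r in edge.items() if p == s}
        for key in [k for k in edge if s in k]:
            del edge[key]
        for p, r_in in ins.items():     # reroute every path p -> s -> q around s
            for q, r_out in outs.items():
                add(p, q, concat(concat(r_in, loop), r_out))
    return edge.get((START, FINAL))

# Hypothetical DFA: "even" is both the start state and the only final state.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}
regex = eliminate_states(["even", "odd"], delta, "even", {"even"})
print(regex)
```

Each elimination step touches only the arcs into and out of the removed state, so entries for unrelated pairs of states are not recomputed.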
8. Practice Problems
- Use the MYT algorithm to construct an equivalent ε-NFA
for the regular expression a+b*a.
- Show the behavior of your ε-NFA on the input string bba.
- Use the subset construction to convert your ε-NFA into a DFA.
- Show the behavior of your DFA on the input string bba.
- Consider the DFA D with:
- Q = {1, 2, 3}
- Σ = {a, b}
- δ:
State | Input symbol a | Input symbol b
1 | 2 | 1
2 | 3 | 1
3 | 3 | 2
- Start state: 1
- F = {3}
- a) Use Kleene's algorithm to construct a regular expression for L(D).
Simplify your expressions as much as possible at each stage.
- b) Construct a regular expression for L(D) by eliminating state 2.
9. Reading Assignment
aho@cs.columbia.edu