# COMS W3261 CS Theory Lecture 4: Regular Expressions and Finite Automata

## 1. Equivalence of DFA's, NFA's, ε-NFA's, and Regular Expressions

• We say that two finite automata are equivalent if they define the same language.
• We have shown the subset construction can be used to convert either an NFA or an ε-NFA into an equivalent DFA.
• We say that a regular expression and a finite automaton are equivalent if they both define the same language.
• Using the McNaughton-Yamada-Thompson algorithm, we can construct an equivalent ε-NFA from a regular expression to show that regular expressions are no more powerful in defining languages than finite automata.
• We can also show how to construct an equivalent regular expression from a DFA.
• These algorithms allow us to conclude that DFA's, NFA's, ε-NFA's, and regular expressions are all equivalent in definitional power -- they all define precisely the regular languages.

## 2. McNaughton-Yamada-Thompson Algorithm: From an RE to an Equivalent ε-NFA

• Let R be a regular expression. Using the algorithm in Section 3.2.3 of HMU, pp. 102-107, we can construct an ε-NFA N such that L(N) = L(R).
• The algorithm takes as input a regular expression R of length n and recursively constructs from it an equivalent ε-NFA that has
• exactly one start state and one final state,
• at most 2n states,
• no arcs coming into its start state,
• no arcs leaving its final state,
• at most two arcs leaving any nonfinal state
• This algorithm was discovered by McNaughton and Yamada, and independently by Ken Thompson who used it in the string-matching program `grep` on Unix. On an input string of length m, an n-state MYT ε-NFA can be efficiently simulated in time O(mn) using a two-stack algorithm. (See pp. 157-159, A. Aho, M. Lam, R. Sethi, J. Ullman, Compilers: Principles, Techniques, and Tools, Second Edition, Addison Wesley, 2007, or Russ Cox's paper cited below for details.)

## 3. From a DFA to an Equivalent RE

• We will show that every regular language can be defined by a regular expression by constructing from a DFA an equivalent regular expression.
• To facilitate this construction we introduce another form of epsilon-NFA that we shall call a generalized NFA (GNFA). A GNFA is a just an epsilon-NFA in which the labels on the arcs are regular expressions rather than epsilons or symbols from the input alphabet. A GNFA accepts an input string w iff there is a path of transitions from its start state to one of its final states in which the language denoted by the concatenation of the regular expressions along that path contains the string w.
• Here is an algorithm (sometimes called the "ripping algorithm") for constructing an equivalent regular expression from a DFA D.
1. Transform D into a GNFA G by creating the new start state S and the new distinct accept state A.
2. In G, add an ε-transition from S to the old start state of D. There will be no transitions in G into S. There will be a transition out of S to every other state in G, labeled by ∅ if it has no other label.
3. In G, add an ε-transition from each old final state of D to A. There will be no transitions in G out of A.
4. If an arc has multiple labels, say a, b, c, replace these labels by the regular expression a+b+c.
5. If there is no arc from a state s to a state t, then add an arc from s to t in G with the label ∅. (We won't normally show these arcs.)
6. As long as old DFA states remain in G, repeatedly remove a DFA state r from G. For every pair of states p and q in G, let Rpr, Rrr, Rrp, and Rpq be the current regular expressions labeling the arcs from p to r, r to r, r to p, and p to q, respectively. Then add the regular expression
(Rpr)(Rrr)*(Rrp) + (Rpq)
as the new label on the arc from state p to state q. We do this even when p = q.
7. When only S, the new start state, and A, the new accept state, of G remain, the regular expression on the arc from S to A is the equivalent regular expression for D.
• We can show that as each DFA state is removed, the resulting GNFA is equivalent to the previous one and hence to the original DFA.
• This algorithm can be easily generalized to transform an epsilon-NFA into an equivalent regular expression.
• This result shows that DFAs, NFAs, epsilon-NFAs, GNFAs, and regular expressions are all equivalent in their defining power. They all define exactly the class of regular languages.

## 4. Practice Problems

1. Using structural induction prove that the MYT algorithm produces an ε-NFA with at most 2n states from a regular expression of length n.
2. Consider the regular expression R = `a+b*a`.
1. Use the MYT algorithm to construct an equivalent ε-NFA for the regular expression R.
2. Simulate the behavior of your ε-NFA on the input string `bba` using the two-stack algorithm.
3. Use the subset construction to convert your ε-NFA into a DFA.
4. Show the behavior of your DFA on the input string `bba`.
3. Convert the following DFA into an equivalent regular expression. `A` is the start state and {`B`} is the set of final states.

` State ` `Input Symbol`
` 0 ` ` 1 `
`A` `A` `B`
`B` `B` `B`

4. Convert the following DFA into an equivalent regular expression. `A` is the start state and the only final state.

• ` State ` `Input Symbol`
` 0 ` ` 1 `
`A` `B` `A`
`B` `A` `B`

5. Convert the following DFA into an equivalent regular expression. `A` is the start state and {`B`, `C`} is the set of final states.

• ` State ` `Input Symbol`
` 0 ` ` 1 `
`A` `B` `C`
`B` `A` `B`
`C` `B` `A`

6. Extend the ripping algorithm to apply to epsilon-NFAs.
7. Prove that the ripping algorithm constructs an equivalent regular expression for the DFA to which it is applied.