# COMS W3261 CS Theory Lecture 2: Finite Automata and Regular Languages

## 1. The Chomsky Hierarchy of Classes of Languages

• In this course we will study the following classes of languages in the Chomsky hierarchy:
• The regular languages
• The context-free languages
• The recursive languages
• The recursively enumerable languages
• We will show that this is a proper hierarchy of classes of languages.

## 2. Deterministic Finite Automata

• A deterministic finite automaton (DFA) is one of the simplest and most useful models of computation. A DFA is used to define or recognize a language, called a regular language.
• A DFA has five components:
1. A finite nonempty set of states Q.
2. An input alphabet consisting of a finite set of symbols Σ.
3. A transition function δ that maps Q × Σ to Q.
4. This transition function can be represented by a transition diagram in which the nodes are labeled by states and arcs by input symbols.
5. A start state which is one of the states in Q.
6. A set of final (or accepting) states F that is a subset (possibly empty) of Q.
• A DFA accepts an input string x if and only if there is a sequence of transitions from the initial state to a final state that spells out x. We will often say a DFA matches a string x if the DFA accepts x.
• L(D), the language defined by a DFA D, is the set of strings accepted by D.
• A language defined by a DFA is said to be regular.

## 3. Example 1: DFA for Binary Strings with an Even Number of 0's

• Consider the following DFA D with:
1. The finite set of states Q = {`A`, `B`}
2. The input alphabet Σ = {`0`, `1`}
3. The transition function δ as represented by the following transition table:

4. ` State ` `Input Symbol`
` 0 ` ` 1 `
`A` `B` `A`
`B` `A` `B`

5. Start state `A`
6. The set of final states F = {`A`}

• Given the input string `10101`, D makes the following sequence of transitions from its start state `A`:
• ```    1   0   1   0   1  A   A   B   B   A   A ```
This sequence of transitions takes D from its start state `A` to the final state `A`. Thus, the input `10101` is accepted by D.

• Given the input string `101`, D makes the following transitions from its start state `A`:
• ```    1   0   1  A   A   B   B ```
This sequence of transitions takes D from its start state `A` to the nonfinal state `B`. Therefore, the input `101` is not accepted by D.

• We can prove the language defined by D is the set of all strings of 0's and 1's having an even number of 0's.

## 4. Example 2: DFA for Binary Numbers Divisible by 3

• We can create a DFA to recognize all strings of 0's and 1's representing binary numbers divisible by three. We assume the binary string 0 represents the number 0, 1 represents 1, 00 represents 0, 01 represents 1, 10 represents 2, 11 represents 3, and so on. We will also assume the empty string represents the number 0.
• Consider the following DFA D with:
1. The finite set of states Q = {`A`, `B`, `C`}
2. The input alphabet Σ = {`0`, `1`}
3. The transition function δ as represented by the following transition table:

4. ` State ` `Input Symbol`
` 0 ` ` 1 `
`A` `A` `B`
`B` `C` `A`
`C` `B` `C`

5. Start state `A`
6. The set of final states F = {`A`}

• State `A` represents all prefixes of binary strings that are congruent to 0 mod 3, state `B` all prefixes congruent to 1 mod 3, and state `C` all prefixes congruent to 2 mod 3.
• Given the input string `1001`, D makes the following transitions from its start state `A`:
• ```    1   0   0   1  A   B   C   B   A ```
In this sequence of transitions D goes from its start state `A` to the final state `A`. Thus, the binary string `1001`, which represents the number 9, is accepted by D.
• We can prove the language defined by D is the set of all binary strings representing integers divisible by three.

## 5. Nondeterministic Finite Automata

• A nondeterministic finite automaton (NFA) consists of
• A finite nonempty set of states Q.
• An input alphabet consisting of a finite set of symbols Σ.
• A transition function δ that maps Q × Σ to P(Q), the set of subsets of Q.
• Like a DFA, this transition function can be represented by a transition diagram in which the nodes are labeled by states and arcs by input symbols. Unlike a DFA, an NFA may have transitions to zero or more states from a given state on a given input symbol.
• A start state that is one of the states in Q.
• A set of final (or accepting) states F that is a subset of Q.
• An NFA accepts an input string x if there is a path of transitions from the initial state to a final state that spells out x. Note that, unlike for a DFA, in an NFA there may be more than one path from the initial state to a final state that spells out x.
• Note the definition of an extended transition function for an NFA. See Sect 2.3.3 of HMU.
• L(N), the language defined by an NFA N, is the set of strings accepted by N.

## 6. Equivalence of DFA's and NFA's: the Subset Construction

• The subset construction transforms an NFA into an equivalent DFA (one that defines the same language). The subset construction is one of the fundamental algorithms in automata theory. It shows that NFAs are no more powerful than DFAs in their ability to define languages.
• From an NFA N = (QN, Σ, δN, q0, FN), the subset construction builds a DFA D = (QD, Σ, δD, {q0}, FD) that simulates in parallel all possible moves of N on any given input as follows:
• QD is the set of all subsets of QN; that is, QD is the power set of QN.
• The input alphabet of D is the same as that of N.
• For each state S in QN and each input symbol a in Σ,
• δD(S, a) is the union of all states in δN(p, a) for all p in S.
• The start state of D is the set of states {q0}.
• FD, the set of final states of D, is the set of subsets of states of QD that contain a state in FN.
• We can prove by induction on the length of an input string w that D can reach deterministic state S in QD on w iff D can reach each nondeterministic state p contained in S on w. See Sect. 2.3.5 of HMU.

## 7. A Bad Case for the Subset Construction

• Let L be the language (a+b)*a(a+b)n-1, that is, the set of all strings of a's and b's in which the nth character from the end is an a. We can prove that every smallest DFA, one that contains the fewest number of states, that accepts L must have at least 2n states. See Sect. 2.3.6 of HMU.

## 8. Practice Problems

1. Let L be the language { `w` | `w` is any string of `a`'s and `b`'s containing at least one `a` and at least one `b` }.
1. Construct a DFA for L.
2. Show the behavior of your DFA processing the input string `aabaa`.
2. Let L be the language { `ab`x`ba` | x is any string of `a`'s, `b`'s, and `c`'s not containing the substring `ba` } over the alphabet {`a`, `b`, `c`}. This language models comments in the programming language C.
1. Construct a DFA for L.
2. Show the behavior of your DFA processing the input string `abcbaba`.
3. Let L be the language consisting of all strings of `a`'s and `b`'s containing the substring `abba`. Construct a DFA for L.
4. Let L be the language consisting of all strings of `0`'s and `1`'s having an even number of `0`'s and an even number of `1`'s.
1. Construct a DFA for L.
2. Show the behavior of your DFA processing the input string `01100101`.
3. Prove that your DFA is correct.
5. Construct an NFA that accepts all strings of `a`'s and `b`'s ending in `abb`.
1. Show all sequences of moves that your NFA can make on the input string ababb.
2. Use the subset construction to convert your NFA into an equivalent DFA.
6. Construct an NFA that accepts all strings of `a`'s and `b`'s in which the third last character is an `a`.
1. Use the subset construction to convert your NFA into an equivalent DFA.
2. Prove that any DFA recognizing this language must have at least eight states.
7. Prove that the DFA in section 3 above accepts a string w iff w has an even number of 0's.
8. Prove that the DFA in section 4 above accepts a binary string w iff w represents an integer divisible by three.