# COMS W3261 CS Theory Lecture 5: Properties of Regular Languages - I

## 1. Closure Properties of Regular Languages

• Given a set, a closure property of the set is an operation that when applied to members of the set always returns as its answer a member of that set. For example, the set of integers is closed under addition.
• The set of regular languages is closed under the operations of
• union, intersection, complement, difference
• concatenation, Kleene closure
• reversal
• homomorphism, inverse homomorphism
• We can use finite automata or regular expressions to prove these closure properties about the set of regular languages.
• For example, given two DFA's D1 and D2 for regular languages L1 and L2, respectively, we can construct an ε-NFA for the concatenation L1L2 by adding epsilon-transitions from the final states of D1 to the start state of D2, making the start state of D1 the start state of the ε-NFA, and making the set of final states of D2 the set of final states of the ε-NFA.
• Alternatively, given two regular expressions R1 and R2 for L1 and L2, respectively, the regular expression R1R2 generates L1L2.
• Some closure properties are easier to prove using finite automata and others using regular expressions.
• Closure properties for regular languages are often useful in proving that a given language is regular.
• For example, we can show that the set of strings of a's and b's that do not contain the substring abb is regular by pointing out that this set is the complement of the language generated by the regular expression (a+b)*abb(a+b)*.

## 2. The Pumping Lemma for Regular Languages

• The pumping lemma gives a necessary (but not sufficient) condition for a language to be regular. It is useful for proving given languages are not regular, and hence cannot be recognized by a finite automaton or defined by a regular expression.
• The pumping lemma for regular languages states that for every nonfinite regular language L, there exists a constant n that depends on L such that for all w in L with |w| ≥ n, there exists a decomposition of w into xyz such that the following three conditions must hold:
1. y ≠ ε,
2. |xy| ≤ n, and
3. for all k ≥ 0, the string xykz is in L.

Proof.
Suppose D is a DFA such that L(D) = L. Consider a long input string w = a1a2am where mn and let q0q1q2qm be the sequence of states that D goes through in processing the input string w. By the pigeonhole principle, there must be at least one repeated state in this sequence of states. Let qi be the first repeated state in this sequence so there is a closest following state qj, i < jn, such that qj = qi.

We can now break w into three substrings x, y, z such that w = xyz and
1. x = a1a2ai takes D from state q0 to state qi
2. y = ai+1ai+2aj takes D from state qi to state qj
3. z = aj+1aj+2am takes D from state qj to state qm

The three conditions of the pumping lemma now follow:
1. y cannot be the empty string since j is strictly greater than i.
2. The total length of xy must be less than or equal to n since qi is the first repeated state and jn.
3. We can repeat the substring y within w zero or more times since it takes D from state qi to state qj
4. and qj = qi. So D must accept the inputs xz, xyz, xyyz, xyyyz, and so on.

• One important use of the pumping lemma is to prove some languages are not regular.
• Example: The language L consisting of all strings of a's and b's of the form aibi, i ≥ 0, is not regular.
• The proof will be by contradiction. Assume L is regular. Then by the pumping lemma there is a constant n associated with L such that for all w in L with |w| ≥ n, w can be written as xyz such that
1. y ≠ ε,
2. |xy| ≤ n, and
3. for all k ≥ 0, the string xykz is in L.
• Since |xy| ≤ n, xy = am for some 0 < mn.
• Setting k = 0, condition (3) of the pumping lemma says xz must also be in L.
• But xz is of the form apbn, where p < n.
• This violates the requirement that xz must be in L. We have now demonstrated a contradiction to the assertion that L is regular.
• We can capture the essence of this proof with the observation "finite automata can't count."

## 3. Practice Problems

1. Let FirstHalf(L) = { x | xy is in L and |x| = |y| }. Show that if L is regular, then FirstHalf(L) is also regular.
2. Show that the language consisting of all strings of balanced parentheses is not regular.
3. Show that the language consisting of all strings of a's and b's that read the same forwards as backwards is not regular.
4. Prove that the language L = { w | w = aibj where i is not equal to j } is not regular.
5. [Hard] Let FTLT(L) = { xz | xyz is in L and |x| = |y| = |z| }. (FTLT(L) concatenates the first-third and last-third of strings in L whose length is a multiple of 3.) Show that if L is regular, then FTLT(L) is not always regular.