Lecture 5: Properties of Regular Languages - I

- Given a set, a
*closure property*of the set is an operation that when applied to members of the set always returns as its answer a member of that set. For example, the set of integers is closed under addition. - The set of regular languages is closed under the operations of
- union, intersection, complement, difference
- concatenation, Kleene closure
- reversal
- homomorphism, inverse homomorphism
- We can use finite automata or regular expressions to prove these closure properties about the set of regular languages.
- For example, given two DFA's
*D*_{1}and*D*_{2}for regular languages*L*_{1}and*L*_{2}, respectively, we can construct an ε-NFA for the concatenation*L*_{1}*L*_{2}by adding epsilon-transitions from the final states of*D*_{1}to the start state of*D*_{2}, making the start state of*D*_{1}the start state of the ε-NFA, and making the set of final states of*D*_{2}the set of final states of the ε-NFA. - Alternatively, given two regular expressions
*R*_{1}and*R*_{2}for*L*_{1}and*L*_{2}, respectively, the regular expression*R*_{1}*R*_{2}generates*L*_{1}*L*_{2}. - Some closure properties are easier to prove using finite automata and others using regular expressions.
- Closure properties for regular languages are often useful in proving that a given language is regular.
- For example, we can show that the set of strings of
*a*'s and*b*'s that do not contain the substring*abb*is regular by pointing out that this set is the complement of the language generated by the regular expression (*a*+*b*)**abb*(*a*+*b*)*.

- The pumping lemma gives a necessary (but not sufficient) condition for a language to be regular. It is useful for proving given languages are not regular, and hence cannot be recognized by a finite automaton or defined by a regular expression.
- The pumping lemma for regular languages
states that for every nonfinite regular language
*L*, there exists a constant*n*that depends on*L*such that for all*w*in*L*with |*w*| ≥*n*, there exists a decomposition of*w*into*xyz*such that the following three conditions must hold: *y*≠ ε,- |
*xy*| ≤*n*, and - for all
*k*≥ 0, the string*xy*is in^{k}z*L*. - Proof.
- Suppose
*D*is a DFA such that*L*(*D*) =*L*. Consider a long input string*w*=*a*_{1}*a*_{2}…*a*where_{m}*m*≥*n*and let*q*_{0}*q*_{1}*q*_{2}…*q*be the sequence of states that_{m}*D*goes through in processing the input string*w*. By the pigeonhole principle, there must be at least one repeated state in this sequence of states. Let*q*be the first repeated state in this sequence so there is a closest following state_{i}*q*,_{j}*i*<*j*≤*n*, such that*q*=_{j}*q*._{i} - We can now break
*w*into three substrings*x*,*y*,*z*such that*w*=*xyz*and *x*=*a*_{1}*a*_{2}…*a*takes_{i}*D*from state*q*_{0}to state*q*_{i}*y*=*a*_{i+1}*a*_{i+2}…*a*takes_{j}*D*from state*q*to state_{i}*q*_{j}*z*=*a*_{j+1}*a*_{j+2}…*a*takes_{m}*D*from state*q*to state_{j}*q*_{m}- The three conditions of the pumping lemma now follow:
*y*cannot be the empty string since*j*is strictly greater than*i*.- The total length of
*xy*must be less than or equal to*n*since*q*is the first repeated state and_{i}*j*≤*n*. - We can repeat the substring
*y*within*w*zero or more times since it takes*D*from state*q*to state_{i}*q*_{j} and - One important use of the pumping lemma is to prove some languages are not regular.
- Example: The language
*L*consisting of all strings of*a*'s and*b*'s of the form*a*^{i}*b*^{i},*i*≥ 0, is not regular. - The proof will be by contradiction. Assume
*L*is regular. Then by the pumping lemma there is a constant*n*associated with*L*such that for all*w*in*L*with |*w*| ≥*n*,*w*can be written as*xyz*such that *y*≠ ε,- |
*xy*| ≤*n*, and - for all
*k*≥ 0, the string*xy*^{k}*z*is in*L*. - Since |
*xy*| ≤*n*,*xy*=*a*^{m}for some 0 <*m*≤*n*. - Setting
*k*= 0, condition (3) of the pumping lemma says*xz*must also be in*L*. - But
*xz*is of the form*a*^{p}*b*^{n}, where*p*<*n*. - This violates the requirement that
*xz*must be in*L*. We have now demonstrated a contradiction to the assertion that*L*is regular. - We can capture the essence of this proof with the observation "finite automata can't count."

- Let FirstHalf(
*L*) = {*x*|*xy*is in*L*and |*x*| = |*y*| }. Show that if*L*is regular, then FirstHalf(*L*) is also regular. - Show that the language consisting of all strings of balanced parentheses is not regular.
- Show that the language consisting of all strings of
*a*'s and*b*'s that read the same forwards as backwards is not regular. - Prove that the language
*L*= {*w*|*w*=*a*^{i}*b*^{j}where*i*is not equal to*j*} is not regular. - [Hard] Let FTLT(
*L*) = {*xz*|*xyz*is in*L*and |*x*| = |*y*| = |*z*| }. (FTLT(*L*) concatenates the first-third and last-third of strings in*L*whose length is a multiple of 3.) Show that if*L*is regular, then FTLT(*L*) is not always regular.

- HMU: Sections 4.1 - 4.2

aho@cs.columbia.edu verma@cs.columbia.edu