Determining Whether a Langauge is Regular or Not
Suppose, perish the thought, that we are confronted with a language and must determine whether it is regular or not. What tools do we have to help us in this effort? None really. At least not yet. All we have is the definition of a regular language: it is a language that is recognized by some FSA. Thus, we must rely on the definition of an FSA to prove that a language is or is not regular.
Let's copy the definition here again:
A finite state automaton is a mathematical object with five conponents, M = (Q, Σ, δ, q0, F), where
- Q is a finite set of states
- Σ is a finite set of input symbols
- δ: Q×Σ →Q is a total function (called the transition function)
- q0∈Σ is the start state
- F ⊆ Q is the set of accepting states. □
Let's try some examples to see what we mean.
Example
Is the language L = {w ⊆ {0,1}* | w contains a substring 101} regular?
The answer is yes. We know so, because we have already constructed an FSA for this language (click here for a review). Thus, by the definition of regular languages, L must be regular because it has an FSA that recognizes it.
Example
Is the language L = {ww | w ⊆ {0,1}*} regular?
(First, how do we read the definition of L? Whenever we see the same name used in a mathematical definition, the name has the same meaning in both usages. Here we see w repeated, as in ww, where w is a binary string. The fact that two w's appear together means that the binary string in question can be split in half, and the first half equals the second. We could write this same definition as L = {wx | w, x ⊆ {0,1}* and w = x}. Both definitions mean the same thing).
Intuition. We first use our intuition on this language. How could we remember the first half of a binary string in order to compare it with the second half of the string with an FSA? It would seem that we would need to somehow remember the first half, which could be arbitrarily long (certainly longer than the fixed number of states that any FSA recognizing this language would have) , in order to check the second half against it. No FSA can remember an arbitray number of different values.
This leads us to try to prove the following lemma.
Lemma
The language L = {ww | w ⊆ {0,1}*} is not regular.
Proof
We prove this lemma by contradiction.
Suppose that L is regular. Then there exists some FSA M = (Q, Σ, δ, q0, F) that recognizes L, where M has some fixed number of states k (i.e., |Q| = k).
Consider the string x = 0k10k1, which is in L (i.e., it has the form ww; that is, split in half, the first half equals the second). Since x is in L, M accepts x. That is,
(q0, x) = r ⊆ F
However, since M has k states, and x has the prefix 0k, we know that M will have to loop while processing this prefix 0k. That is, after zero or more, say i, of these 0's M will encounter some state p that it will encounter again after reading some more (one or more) of these 0's. From that point, M will continue processing until it reaches the end of the entire string 0k10k1 and winds up in state r ⊆ F. Visually, we can write this as:
q0 0i p 0j p 0k-i-j10k1 r
where the states arrived at after processing the shown substrings are highlighted in bold.
Oops. Now we see that if M is in state p, it will arrive at state r after processing 0k-i-j10k1 regardless of how it got to state p. So, we can create a new string by removing the 0j substring identified above that takes M from state p back to state p in a loop, giving us 0i 0k-i-j10k1. But M will accept this string as well, as we can see by writing the computation as
q0 0i p 0k-i-j10k1 r
So this new string, which has j > 0 fewer zeros in the prefix than the original string is also in L. This is a contradiction, since L was defined to not contain such strings that did not have the form ww for some binary string w.
The only thing that can be questioned about this line of reasoning is our assumption that L is regular. Thus, L is not regular, and we have proved our lemma.
However, we can, and should write this with more mathematical precision. We can do it this way.
Since
(q0, x) = r ⊆ F
and since M has k states, there is a way to factor the computation of on 0k10k1 that exhibits a loop in the processing of 0k10k1. That is, there must be some state p such that
(q0, 0i) = p,
(p, 0j) = p, and
(p, 0k-i-j10k1) = r ⊆ F.
From this it is clear that the identified 0j substring can be removed, creating a new binary string z = 0i 0k-i-j10k1 = 0k-j10k1 not in L that M will also accept, leading to the contradiction. □