Untitled Document

< Previous

Page 8 of 8

Determining Whether a Langauge is Regular or Not

Suppose, perish the thought, that we are confronted with a language and must determine whether it is regular or not. What tools do we have to help us in this effort? None really. At least not yet. All we have is the definition of a regular language: it is a language that is recognized by some FSA. Thus, we must rely on the definition of an FSA to prove that a language is or is not regular.

Let's copy the definition here again:

A finite state automaton is a mathematical object with five conponents, M = (Q, Σ, δ, q₀, F), where

Q is a finite set of states
Σ is a finite set of input symbols
δ: Q×Σ →Q is a total function (called the transition function)
q₀∈Σ is the start state
F ⊆ Q is the set of accepting states. □

Let's try some examples to see what we mean.

Example

Is the language L = {w ⊆ {0,1}* | w contains a substring 101} regular?

The answer is yes. We know so, because we have already constructed an FSA for this language (click here for a review). Thus, by the definition of regular languages, L must be regular because it has an FSA that recognizes it.

Example

Is the language L = {ww | w ⊆ {0,1}*} regular?

(First, how do we read the definition of L? Whenever we see the same name used in a mathematical definition, the name has the same meaning in both usages. Here we see w repeated, as in ww, where w is a binary string. The fact that two w's appear together means that the binary string in question can be split in half, and the first half equals the second. We could write this same definition as L = {wx | w, x ⊆ {0,1}* and w = x}. Both definitions mean the same thing).

Intuition. We first use our intuition on this language. How could we remember the first half of a binary string in order to compare it with the second half of the string with an FSA? It would seem that we would need to somehow remember the first half, which could be arbitrarily long (certainly longer than the fixed number of states that any FSA recognizing this language would have) , in order to check the second half against it. No FSA can remember an arbitray number of different values.

This leads us to try to prove the following lemma.

Lemma

The language L = {ww | w ⊆ {0,1}*} is not regular.

Proof

We prove this lemma by contradiction.

Suppose that L is regular. Then there exists some FSA M = (Q, Σ, δ, q₀, F) that recognizes L, where M has some fixed number of states k (i.e., |Q| = k).

Consider the string x = 0^k10^k1, which is in L (i.e., it has the form ww; that is, split in half, the first half equals the second). Since x is in L, M accepts x. That is,

$\hat{δ}$ (q₀, x) = r ⊆ F

However, since M has k states, and x has the prefix 0^k, we know that M will have to loop while processing this prefix 0^k. That is, after zero or more, say i, of these 0's M will encounter some state p that it will encounter again after reading some more (one or more) of these 0's. From that point, M will continue processing until it reaches the end of the entire string 0^k10^k1 and winds up in state r ⊆ F. Visually, we can write this as:

q₀0ⁱ p 0^j p 0^k-i-j10^k1 r

where the states arrived at after processing the shown substrings are highlighted in bold.

Oops. Now we see that if M is in state p, it will arrive at state r after processing 0^k-i-j10^k1 regardless of how it got to state p. So, we can create a new string by removing the 0^j substring identified above that takes M from state p back to state p in a loop, giving us 0ⁱ 0^k-i-j10^k1. But M will accept this string as well, as we can see by writing the computation as

q₀0ⁱ p 0^k-i-j10^k1 r

So this new string, which has j > 0 fewer zeros in the prefix than the original string is also in L. This is a contradiction, since L was defined to not contain such strings that did not have the form ww for some binary string w.

The only thing that can be questioned about this line of reasoning is our assumption that L is regular. Thus, L is not regular, and we have proved our lemma.

However, we can, and should write this with more mathematical precision. We can do it this way.

Since

$\hat{δ}$ (q₀, x) = r ⊆ F

and since M has k states, there is a way to factor the computation of $\hat{δ}$ on 0^k10^k1 that exhibits a loop in the processing of 0^k10^k1. That is, there must be some state p such that

$\hat{δ}$ (q₀, 0ⁱ) = p,

$\hat{δ}$ (p, 0^j) = p, and

$\hat{δ}$ (p, 0^k-i-j10^k1) = r ⊆ F.

From this it is clear that the identified 0^j substring can be removed, creating a new binary string z = 0ⁱ 0^k-i-j10^k1 = 0^k-j10^k1 not in L that M will also accept, leading to the contradiction. □