chapter002 Finite State Automata, Rebular Languages, and Regular Expressions
section002Finite State Automata
A Formal Definition of Regular Languages

A Formal Definition of Regular Languages

We just defined an FSA in formal terms on the previous page, but just to keep you from flipping back and forth, we'll copy the definition here:

A finite state automaton is a mathematical object with five conponents, M = (Q, Σ, δ, q0, F), where

  • Q is a finite set of states
  • Σ is a finite set of input symbols
  • δ: Q×Σ →Q is a total function (called the transition function)
  • q0Σ is the start state
  • FQ is the set of accepting states.   □

We know by now what an FSA computes. It simply computes whether a given string over the alphabet Σ is accepted or not. Some strings it accepts, others it rejects.  Yes or no, in or out, up or down to every string submitted to it.  The FSA thus solves a decision problem, namely whether a given input string is in the language recognized by the FSA or not.

Intuitively we know how an FSA works.  We supply a string to the FSA, let it run on that string, and check whether the FSA is in an accepting state or a non-accepting state at the end. Intuition is good, but now we need to specifiy this more formally.  We need some way to convey that a computation of an FSA is what happens when:

That is, we need to know what state the FSA winds up in when δ is applied to the current state and input symbol at each step.  If the input string, w, to be processed by the FSA is

w = a1 a2...an

where each ai is a symbol from Σ, we can write the computation of the FSA as

δ(...(δ(δ(q0, a1), a2)...),an)

This long expression is constructed by normal function composition.  Notice that the innermost function evaluation is

δ(q0, a1).

Evaluation of the function with these value results in some new state, p, so the next innermost function evaluation will be

δ(p, a2),

which will lead to another state r, and so forth, until the last (outermost) function evaluation is applied to the last input symbol, an, leading to the last state of the computation. If that state is an accepting state, then the original string w is accepted by the FSA, otherwise it is rejected.

This way of writing the computation, δ(..(δ(δ(q0, a1), a2)...),an), is just too cumbersome and not entirely clear, so we define a new function that captures the same meaning.

Definition

Let M = (Q, Σ, δ, q0, F), = be a finite state automaton. Define the function δ ̂ : Q× Σ * Q as follows:

δ ̂ (q, w) = δ ̂ (δ(q, a),x) for any string w Σ + , where a is the first symbol of  w; that is, w = ax, aΣ, and x Σ *

δ ̂ (q, ε) = q 

A string w Σ * is said to be accepted by M if δ ̂ (q0, w) ∈ F; otherwise the string w is said to be rejected.

The set

L(M) = {w | δ ̂ (q0,w) ∈ F}

is refered to as the set, or language, recognized by M. □

 

Notice that this is a recursive definition that simply states with mathematical precision that δ ̂ applied to any string over the alphabet of an FSA is computed by applying δ at each step of the computation, given the current state and curent input symbol, just as we would expect. The last line of the definition of δ ̂ shows how the recursion ends, and also states that δ ̂ applied to any state and the empty string is just that state (i.e., the FSA does not change states if there are no input symbols left to read). Also note that we say that an FSA accepts or rejects single strings, but that it recognizes an entire language of strings.

Definition

A set L of strings over a specified alphabet is said to be a regular language if and only if L is recognized by some FSA. □