Formally Defining Regular Grammars

Page 4 of 4

A Formal Definition of Regular Grammars

OK, it's time now to put on our theoretician's hat, buckle down, and come up with a formal definition for a regular grammar. We've seen what they look like and have a concept of their form. Now we need to capture all of that in with formal notation. We should be getting pretty good at this by now. In any case, we know we can do it!

Let's give it a try.

Definition: Regular Grammar

A regular grammar is a mathematical object, G, with four components, G = (N, Σ, P, S), where

N is a nonempty, finite set of nonterminal symbols,
Σ is a finite set of terminal symbols , or alphabet, symbols,
P is a set of grammar rules, each of one having one of the forms
- A → aB
- A → a
- A → ε, for A, B ∈ N, a ∈ Σ, and ε the empty string, and
S ∈ N is the start symbol. □

Notice that this definition captures all of the components of a regular grammar that we have heretofore identified (heretofor, how often does one get to use that word?).

This definition is pretty useless by itself. It tells us what a regular grammar is, but not what it does. Sort of like some people we know. The purpose of a regular grammar is to specify how to form grammatically correct strings in the language the grammar represents. All strings in Σ^* that can be produced from the start symbol by application of the rules of the grammar are in the language of this grammar, all other strings are not. Let's see how we might capture that in a definition.

First we need yet another definition. We have talked about how strings in a language can be derived by applying the rules of the grammar to nonterminals, starting with the start symbol S. We now need to make that concept formal. We do this by defining the operation "derives in one step " as we have used it in our previous examples and exercises by way of the ⇒ arrow. For example, we have said that

010110A derives in one step 0101101S

or simply

010110A ⇒ 0101101S

if there is a rule A → 1S in the grammar we are using. (Notice that because of the way regular grammars are defined, all intermediate strings on the way to a string in the language have the form "zero or more terminals followed by one nonterminal".)

Definition: Derivations in a Regular Grammar

Let G = (N, Σ, P, S) be a regular grammar. We will define three different notations that we use for derivations in G:

⇒      (single step derivations)

$\overset{k}{\Rightarrow}$       (k step derivations)

$\overset{*}{\Rightarrow}$       (derivations of 0 or more steps)

Part 1. The definition of ⇒. We define the operation ⇒, called "derives in one step" to be the mapping from strings of the form a₁...a_nA to strings of the form a₁... a_nα, or

a₁...a_nA ⇒ a₁... a_nα

if and only if there is a rule

A → α

in P, where each of the a₁ are terminal symbols in Σ and A is a nonterminal in N. (Of course, by the definition of regular grammars, α must have one of the forms bB, b, or ε since α is the right hand side of a rule or a regular grammar).

Part 2. The definition of $\overset{k.}{\Rightarrow}$ We further define the operation $\overset{k}{\Rightarrow}$ , called a k-step derivation, to be the extension of ⇒ to k consecutive single-step derivations. Thus, there is a derivation

a₁...a_nA $\overset{k}{\Rightarrow}$ a₁...a_n, a_n+1...a_n+k-1α

if and only if there are k single-step derivations in G

a₁...a_nA ⇒ a₁...a_na_n+1A₁ ⇒ ... ⇒ a₁...a_na_n+1...a_n+k-1A_k-1 ⇒ a₁...a_na_n+1...a_n+k-1α

for k ≥ 0. If k = 0 this just means that no rules are applied, so the string being derived from does not change:

a₁...a_nA $\overset{0}{\Rightarrow}$ a₁...a_nA

Part 3. The definition of $\overset{*}{\Rightarrow}$ . We define $\overset{*}{\Rightarrow}$ to stand for zero or more single step derivations in sequence. □

Finally, we are able to give a formal definition for the set of all strings generated by a grammar G. We call this set the language of G.

Definition: The Language Generated by a Regular Grammar

Let G = (N, Σ, P, S) be a regular grammar. We define the language generated by G to be L(G)

L(G) = {w | S $\overset{*}{\Rightarrow}$ w, where w ∈ Σ^*}

All this formal definition says is that the language of a regular grammar is the set of all strings over the alphabet Σ that can be derived from the start symbol by application of the grammar rules starting with S. □

Although this appears to be a lot of words and unnecessary mathematical notation, it turns out to be unbelievably convenient. From now on we can simply discuss the derivation of strings in a regular grammar by way of the production rules (→) and the three notations for derivations (⇒, $\overset{k}{\Rightarrow}$ , and $\overset{*}{\Rightarrow}$ ).