Regular Grammars: A Second Look
From the previous page we learned that a regular grammar specifies grammar rules that describe how valid strings in a language can be produced. Any string over the alphabet of the language that can be produced from the grammar rules is in the language. Any string that cannot be produced by application of the rules is not in the language.
Notation
Of course, there is some standard notation that is used for describing regular grammars.
- The Start Symbol. There is always a special start symbol from which the application of rules must start in order to derive a string in the language. In the only example we have seen so far, this symbol is S. Why S? Historical reasons. Grammars were invented for natural languages, like English. Since the issue with a natrual language is whether a sentence is properly formed according to the rules of the grammar, S was used to stand for "sentence." At least that's our best guess at this point. We can talk about strings that can be derived from other symbols, such as A, but these strings are not necessarily in the language. Only strings that can be derived by rule applications from the start symbol are grammatically correct strings for the grammar in question.
- Nonterminals. The capital letters we use on the left hand side of rules are usually called nonterminal symbols, or just nonterminals for short . This is becuase they themselves always must be replaced through application of a rule. That is, they are not the symbols of the language, but rather grammar symbols for rule applications. No string that contains a nonterminal symbol is in the language, because the derivation is not complete if there is still one of these symbols in a string.
- Terminals. The other symbols in the grammar are referred to as terminal symbols, (terminals for short) or alphabet symbols. They are called terminals because they represent the symbols of the alphabet that over which the grammar generates strings. No rule can be applied to them; they terminate right where they are. These are the symbols that make up strings in the language.
Standards
We also are beginning to see that there are apparently some formatting rules for regular grammars.
- The first rule is applied to the start symbol. By convention, the first rule in the grammar is one that applies to the start symbol. The start symbol does not need to be S, but it usually is, unless the language of the grammar is a real, practical language rather than just an abstract one.
- Rules are of one of the following forms:
- A → aB, where A and B are nonterminal symbols, and a is a terminal symbol
- A → a, where A is a nonterminal symbol and a is a terminal symbol
- A → ε, where A is a nonterminal symbol and ε is the empty string.
Let's look at aonther example.
Example: A Regular Grammar for Even Parity
Recall that even parity describes the set of all binary strings that contain an even number of 1's. Here is a set of grammar rules for this language.
- S → 0S
- S → ε
- S → 1A
- A → 0A
- A → 1S
Notice the production S → ε. This means that S can be replaced by the empty string. Since the empty string contains zero 1's and we count zero to be an even number, the empty string has even parity.
Now consider the string 00101101 that has even parity. The following derivation, in which at each step we apply the apprpriate rule to the capital letter on the left side of the double arrow, produces this string.
S ⇒ 0S ⇒ 00S ⇒ 001A ⇒ 0010A ⇒ 00101S ⇒ 001011A ⇒ 0010110A ⇒ 00101101S ⇒ 00101101
Notice that the last rule applied in the last step of the derivation is S → ε.
Similarly, a parse tree for this same string would look like
□
Try it Yourself
Again, if you would find it helpful, try this grammar yourself below.