A character set for a language defines the characters that can be used in the source code, or input and output by a program. Some characters have special meaning for a language and must be available in the character set.
A character set for C should contain both upper- and lowercase alphabetic characters, the digits, and most of the punctuation, formatting, and graphic characters. The ASCII collating sequence is used most often, but other sequences such as EBCDIC could be used.
In principle, a C compiler is matched to the character set available on the underlying computer, so an implementation may deviate from the above requirements. Note that C source code that relies on a character set with different characteristics may not be portable. The ASCII character set is used for the examples in this text. A few of the algorithms in the examples rely on the use of ASCII characters and may not work with another character set. These algorithms will be noted when they arise.
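As an illustration of what relying on ASCII can look like, the following sketch contrasts two ways of converting a letter to uppercase. The function names upper_ascii_only and upper_portable are hypothetical and chosen only for this example. The first function assumes the ASCII layout of the letters; the second delegates the decision to the standard library function toupper.

```c
#include <ctype.h>

/* Hypothetical example: assumes the lowercase letters occupy one
   contiguous run of codes, as they do in ASCII. In EBCDIC the letters
   are split into several runs, so this range test would also accept
   codes that are not letters. */
char upper_ascii_only(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - ('a' - 'A'));
    return c;
}

/* Hypothetical example: lets the library decide what counts as a
   lowercase letter in the character set actually in use. */
char upper_portable(char c)
{
    return (char)toupper((unsigned char)c);
}
```

On an ASCII system both functions give the same result for every letter; only the second makes no assumption about how the letters are arranged.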
C is a free-format language. That is, the programmer may format the source code in the way that makes it most readable. There are no requirements that code begin in a certain column, that statements must be contained on a single line, or that comments must be located in a special place.
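A small sketch may make the point concrete. The two functions below (hypothetical names sum_to_spread and sum_to_packed) consist of the same statements and differ only in layout. A compiler should accept both, although only the first is pleasant to read.

```c
/* Conventional layout: one statement per line, indented body. */
int sum_to_spread(int n)
{
    int total = 0;
    int i;

    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* The same statements packed onto one line: legal, but hard to read. */
int sum_to_packed(int n){int total=0;int i;for(i=1;i<=n;i++){total+=i;}return total;}
```

Both functions add the integers from 1 through n, so a call such as sum_to_spread(10) and the matching call sum_to_packed(10) return the same value.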
The space, line feed, horizontal tab, vertical tab, form feed, and carriage return are called whitespace characters. Whitespace characters separate identifiers or other elements (tokens) in the source code. Apart from that role, and outside string and character constants, the compiler ignores them. Whitespace should be used to enhance a program's readability.
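The standard library offers a test for whitespace characters, isspace, declared in ctype.h. The sketch below uses it to count the whitespace characters in a string. The function name count_whitespace is hypothetical and serves only this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters in s that the library
   function isspace classifies as whitespace. */
size_t count_whitespace(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast to unsigned char so negative char values are safe. */
        if (isspace((unsigned char)*s))
            n++;
    }
    return n;
}
```

For the string "int  x\t=\n1;" the function returns 4: two spaces, one horizontal tab, and one line feed.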
The idea of a token in a programming language is important in understanding how a compiler views a program. The compiler divides a C program into groups of characters that belong together. Each group is a token. The compiler then inspects the sequence of tokens to generate the object code. Each keyword in a language is a token; so is any identifier. Other examples of tokens include a left parenthesis, a right parenthesis, a left or right brace, and each operation symbol, like those for assignment or addition.
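To make the grouping concrete, here is a deliberately simplified sketch of how characters might be divided into tokens. It is not how a real compiler is written. The function name count_tokens is hypothetical, every operator or punctuation mark is assumed to be a single character, and string constants, character constants, and comments are not handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts tokens in a simplified C fragment.
   Simplifications: each operator or punctuation mark is treated as a
   one-character token (so "==" would be miscounted as two tokens), and
   string constants, character constants, and comments are not handled. */
size_t count_tokens(const char *src)
{
    size_t count = 0;

    while (*src != '\0') {
        unsigned char c = (unsigned char)*src;

        if (isspace(c)) {
            src++;              /* whitespace only separates tokens */
        } else if (isalnum(c) || c == '_') {
            /* identifier, keyword, or number: consume the whole run */
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
            count++;
        } else {
            src++;              /* one operator or punctuation character */
            count++;
        }
    }
    return count;
}
```

Under these assumptions, count_tokens("sum = a + 10;") and count_tokens("sum=a+10;") both return 6: the tokens are sum, =, a, +, 10, and the semicolon. The spacing changes nothing, because whitespace serves only to separate tokens.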