This manual describes the C language specified by the draft submitted to ANSI on 31 October, 1988, for approval as “American National Standard for Information Systems—Programming Language C, X3.159-1989.” The manual is an interpretation of the proposed standard, not the Standard itself, although care has been taken to make it a reliable guide to the language.
For the most part, this document follows the broad outline of the Standard, which in turn follows that of the first edition of this book, although the organization differs in detail. Except for renaming a few productions, and not formalizing the definitions of the lexical tokens or the preprocessor, the grammar given here for the language proper is equivalent to that of the Standard.
Throughout this manual, commentary material is indented and written in smaller type, as this is. Most often these comments highlight ways in which ANSI Standard C differs from the language defined by the first edition of this book, or from refinements subsequently introduced in various compilers.
A program consists of one or more translation units stored in files. It is translated in several phases, which are described in §A12. The first phases do low-level lexical transformations, carry out directives introduced by lines beginning with the # character, and perform macro definition and expansion. When the preprocessing of §A12 is complete, the program has been reduced to a sequence of tokens.
There are six classes of tokens: identifiers, keywords, constants, string literals, operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments as described below (collectively, “white space”) are ignored except as they separate tokens. Some white space is required to separate otherwise adjacent identifiers, keywords, and constants.
If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token.
The characters /* introduce a comment, which terminates with the characters */. Comments do not nest, and they do not occur within string or character literals.
An identifier is a sequence of letters and digits. The first character must be a letter; the underscore _ counts as a letter. Upper and lower case letters are different. Identifiers may have any length, and for internal identifiers, at least the first 31 characters are significant; some implementations may make more characters significant. Internal identifiers include preprocessor macro names and all other names that do not have external linkage (§A11.2). Identifiers with external linkage are more restricted: implementations may treat as few as the first six characters as significant, and may ignore case distinctions.
The following identifiers are reserved for use as keywords, and may not be used otherwise:
auto double int struct
break else long switch
case enum register typedef
char extern return union
const float short unsigned
continue for signed void
default goto sizeof volatile
do if static while
Some implementations also reserve the words fortran and asm.
The keywords const, signed, and volatile are new with the ANSI standard; enum and void are new since the first edition, but in common use; entry, formerly reserved but never used, is no longer reserved.
There are several kinds of constants. Each has a data type; §A4.2 discusses the basic types.
constant:
    integer-constant
    character-constant
    floating-constant
    enumeration-constant
An integer constant consisting of a sequence of digits is taken to be octal if it begins with 0 (digit zero), decimal otherwise. Octal constants do not contain the digits 8 or 9. A sequence of digits preceded by 0x or 0X (digit zero) is taken to be a hexadecimal integer. The hexadecimal digits include a or A through f or F with values 10 through 15.
An integer constant may be suffixed by the letter u or U, to specify that it is unsigned. It may also be suffixed by the letter l or L to specify that it is long.
The type of an integer constant depends on its form, value and suffix. (See §A4 for a discussion of types.) If it is unsuffixed and decimal, it has the first of these types in which its value can be represented: int, long int, unsigned long int. If it is unsuffixed octal or hexadecimal, it has the first possible of these types: int, unsigned int, long int, unsigned long int. If it is suffixed by u or U, then unsigned int, unsigned long int. If it is suffixed by l or L, then long int, unsigned long int.
The elaboration of the types of integer constants goes considerably beyond the first edition, which merely caused large integer constants to be long. The U suffixes are new.
A character constant is a sequence of one or more characters enclosed in single quotes, as in 'x'. The value of a character constant with only one character is the numeric value of the character in the machine's character set at execution time. The value of a multi-character constant is implementation-defined.
Character constants do not contain the ' character or newlines; in order to represent them, and certain other characters, the following escape sequences may be used.
newline | NL (LF) | \n
horizontal tab | HT | \t
vertical tab | VT | \v
backspace | BS | \b
carriage return | CR | \r
formfeed | FF | \f
audible alert | BEL | \a
backslash | \ | \\
question mark | ? | \?
single quote | ' | \'
double quote | " | \"
octal number | ooo | \ooo
hex number | hh | \xhh
The escape \ooo consists of the backslash followed by 1, 2, or 3 octal digits, which are taken to specify the value of the desired character. A common example of this construction is