This document is an annotated (by the last author) version of the original paper of the same title. It describes a set of coding standards and recommendations which are local standards for officially-supported Unix programs. The scope is coding style, not functional organization.
This document is a result of a committee formed at Indian Hill to establish a common set of coding standards and recommendations for the Indian Hill community. The scope of this work is the coding style, not the functional organization of programs. The standards in this document are not specific to ESS programming only (footnote 1). We have tried to combine previous work [1,6] on C style into a uniform set of standards that should be appropriate for any project using C (footnote 2).
A file consists of various sections that should be separated by several blank lines. Although there is no maximum length requirement for source files, files with more than about 1500 lines are cumbersome to deal with. The editor may not have enough temp space to edit the file, compilations will go slower, etc. Since most of us use 300 baud terminals, entire rows of asterisks, for example, should be discouraged (footnote 3). Also lines longer than 80 columns are not handled well by all terminals and should be avoided if possible (footnote 4).
The suggested order of sections for a file is as follows:
Unix requires certain suffix conventions for names of files to be processed by the cc command [5] (footnote 8). The following suffixes are required:
In addition the following conventions are universally followed:
Header files are files that are included in other files prior to compilation by the C preprocessor. Some are defined at the system level like stdio.h which must be included by any program using the standard I/O library. Header files are also used to contain data declarations and defines that are needed by more than one program (footnote 11). Header files should be functionally organized, i.e., declarations for separate subsystems should be in separate header files. Also, if a set of declarations is likely to change when code is ported from one machine to another, those declarations should be in a separate header file.
Header files should not be nested. Some objects like typedefs and initialized data definitions cannot be seen twice by the compiler in one compilation. On non-Unix systems this is also true of uninitialized declarations without the extern keyword (footnote 12). This can happen if include files are nested and will cause the compilation to fail.
External declarations should begin in column 1. Each declaration should be on a separate line. A comment describing the role of the object being declared should be included, with the exception that a list of defined constants does not need comments if the constant names are sufficient documentation. The comments should be tabbed so that they line up underneath each other (footnote 13). Use the tab character (CTRL I if your terminal doesn't have a separate key) rather than blanks. For structure and union template declarations, each element should be alone on a line with a comment describing it. The opening brace ( { ) should be on the same line as the structure tag, and the closing brace should be alone on a line in column 1, i.e.
struct boat {
	int	wllength;	/* water line length in feet */
	int	type;		/* see below */
	long	sarea;		/* sail area in square feet */
};

/*
 * defines for boat.type (footnote 14)
 */
#define KETCH 1
#define YAWL 2
#define SLOOP 3
#define SQRIG 4
#define MOTOR 5
If an external variable is initialized the equal sign should not be
omitted (footnote 15).
int x = 1;
char *msg = "message";
struct boat winner = {
	40,	/* water line length */
	YAWL,
	600l	/* sail area */
};
(footnote 16)
Comments that describe data structures, algorithms, etc., should be in block comment form with the opening /* in column one, a * in column 2 before each line of comment text (footnote 17), and the closing */ in columns 2-3.
/*
 * Here is a block comment.
 * The comment text should be tabbed over (footnote 18)
 * and the opening /* and closing star-slash
 * should be alone on a line.
 */
Note that grep ^.\* will catch all block comments in the file. In some cases, block comments inside a function are appropriate, and they should be tabbed over to the same tab setting as the code that they describe. Short comments may appear on a single line indented over to the tab setting of the code that follows.
if (argc > 1) {
	/* Get input file from command line. */
	if (freopen(argv[1], "r", stdin) == NULL)
		error("can't open %s\n", argv[1]);
}
Very short comments may appear on the same line as the code they describe, but should be tabbed over far enough to separate them from the statements. If more than one short comment appears in a block of code they should all be tabbed to the same tab setting.
if (a == 2)
	return(TRUE);		/* special case */
else
	return(isprime(a));	/* works only for odd a */
Each function should be preceded by a block comment prologue that gives the name and a short description of what the function does (footnote 19). If the function returns a value, the type of the value returned should be alone on a line in column 1 (do not default to int). If the function does not return a value then it should not be given a return type. If the value returned requires a long explanation, it should be given in the prologue; otherwise it can be on the same line as the return type, tabbed over. The function name and formal parameters should be alone on a line beginning in column 1. Each parameter should be declared (do not default to int), with a comment on a single line. The opening brace of the function body should also be alone on a line beginning in column 1. The function name, argument declaration list, and opening brace should be separated by a blank line (footnote 20). All local declarations and code within the function body should be tabbed over at least one tab.
If the function uses any external variables, these should have their own declarations in the function body using the extern keyword. If the external variable is an array, the array bounds must be repeated in the extern declaration. There should also be extern declarations for all functions called by a given function. This is particularly beneficial to someone picking up code written by another. If a function returns a value of a type other than int, the compiler requires that it be declared before it is used. Having the extern declaration in the calling function's declarations section avoids all such problems (footnote 21).
In general each variable declaration should be on a separate line with a comment describing the role played by the variable in the function. If the variable is external or a parameter of type pointer which is changed by the function, that should be noted in the comment. All such comments for parameters and local variables should be tabbed so that they line up underneath each other. The declarations should be separated from the function's statements by a blank line.
A local variable should not be redeclared in nested blocks (footnote 22). Even though this is valid C, the potential confusion is enough that lint will complain about it when given the -h option.
/*
 * skyblue()
 *
 * Determine if the sky is blue.
 */
int			/* TRUE or FALSE */
skyblue()
{
	extern int hour;

	if (hour < MORNING || hour > EVENING)
		return(FALSE);	/* black */
	else
		return(TRUE);	/* blue */
}
/*
 * tail(nodep)
 *
 * Find the last element in the linked list
 * pointed to by nodep and return a pointer to it.
 */
NODE *			/* pointer to tail of list */
tail(nodep)
NODE *nodep;		/* pointer to head of list */
{
	register NODE *np;	/* current pointer advances to NULL */
	register NODE *lp;	/* last pointer follows np */

	np = lp = nodep;
	while ((np = np->next) != NULL)
		lp = np;
	return(lp);
}
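As a further illustration of the function layout rules, here is a sketch of the isprime() routine that the short-comment example above calls. The paper never shows its body, so the trial-division implementation is our own assumption, and it uses a prototype-style parameter list instead of the old-style declaration so that current compilers accept it. The layout otherwise follows the rules given here: prologue, return type alone on a line with its comment, parameter comment, and declarations separated from statements by a blank line.

```c
#define TRUE	1
#define FALSE	0

/*
 * isprime(n)
 *
 * Determine whether n is prime, by trial division.
 * (Hypothetical body; the paper only calls this function.)
 */
int			/* TRUE or FALSE */
isprime(long n)		/* number to test */
{
	long d;		/* candidate divisor */

	if (n < 2)
		return(FALSE);
	for (d = 2; (d * d) <= n; d++)
		if ((n % d) == 0)
			return(FALSE);
	return(TRUE);
}
```

Unlike the version implied above, this sketch also handles even n, so the caller's special case for 2 is not strictly needed.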
Compound statements are statements that contain lists of statements enclosed in braces. The enclosed list should be tabbed over one more than the tab position of the compound statement itself. The opening left brace should be at the end of the line beginning the compound statement and the closing right brace should be alone on a line, tabbed under the beginning of the compound statement. Note that the left brace beginning a function body is the only occurrence of a left brace which is alone on a line.
if (expr) {
	statement;
	statement;
}

if (expr) {
	statement;
	statement;
} else {
	statement;
	statement;
}
Note that the right brace before the else and the right brace
before the while of a do-while statement (below) are the
only places where a right brace appears that is not alone on a line.
for (i = 0; i < MAX; i++) {
	statement;
	statement;
}

while (expr) {
	statement;
	statement;
}

do {
	statement;
	statement;
} while (expr);
switch (expr) {
case ABC:
case DEF:
	statement;
	break;
case XYZ:
	statement;
	break;
default:
	statement;
	break;	/* (footnote 23) */
}
Note that when multiple case labels are used, they are placed
on separate lines.
The fall through feature of the C switch statement should
rarely, if ever, be used when code is executed before falling through
to the next case.
If this is done, it must be commented for future maintenance.
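A commented fall through might look like the following sketch. The function nmasts() and its mast counts are hypothetical, reusing the boat.type defines from earlier; the /* FALLTHROUGH */ comment is the marker that lint recognizes (and that the summary sheet at the end uses), so both a reader and the checker can see the fall through is deliberate.

```c
/* boat.type defines, repeated so the sketch stands alone */
#define KETCH	1
#define YAWL	2
#define SLOOP	3
#define SQRIG	4
#define MOTOR	5

/*
 * nmasts(type)
 *
 * Return the number of masts for a boat type.
 * Hypothetical example of a commented fall through.
 */
int			/* number of masts, or -1 if not handled */
nmasts(int type)	/* one of the boat.type defines */
{
	int n = 0;	/* masts counted so far */

	switch (type) {
	case KETCH:
	case YAWL:
		n++;		/* the second mast */
		/* FALLTHROUGH */
	case SLOOP:
		n++;		/* the main mast */
		break;
	case MOTOR:
		break;
	default:		/* SQRIG varies; not handled here */
		n = -1;
		break;
	}
	return(n);
}
```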
if (strcmp(reply, "yes") == EQUAL) {
	statements for yes
	...
} else if (strcmp(reply, "no") == EQUAL) {
	statements for no
	...
} else if (strcmp(reply, "maybe") == EQUAL) {
	statements for maybe
	...
} else {
	statements for none of the above
	...
}
The last example is a generalized switch statement and the
tabbing reflects the switch between exactly one of several
alternatives rather than a nesting of statements.
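Filled in, the generalized switch above could look like this sketch. EQUAL is assumed to be defined as 0, the strcmp() result for identical strings; the ANS_ codes and the function answer() are hypothetical names chosen for illustration.

```c
#include <string.h>

#define EQUAL		0	/* strcmp() result for identical strings */

/* hypothetical answer codes */
#define ANS_YES		1
#define ANS_NO		2
#define ANS_MAYBE	3
#define ANS_OTHER	4

/*
 * answer(reply)
 *
 * Classify a reply as yes, no, maybe, or none of the above.
 */
int				/* one of the ANS_ codes */
answer(const char *reply)	/* reply string to classify */
{
	if (strcmp(reply, "yes") == EQUAL) {
		return(ANS_YES);
	} else if (strcmp(reply, "no") == EQUAL) {
		return(ANS_NO);
	} else if (strcmp(reply, "maybe") == EQUAL) {
		return(ANS_MAYBE);
	} else {
		return(ANS_OTHER);
	}
}
```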
The old versions of equal-ops =+, =-, =*, etc. should not be used. The preferred use is +=, -=, *=, etc. All binary operators except . and -> should be separated from their operands by blanks (footnote 24). In addition, keywords that are followed by expressions in parentheses should be separated from the left parenthesis by a blank (footnote 25). Blanks should also appear after commas in argument lists to help separate the arguments visually. On the other hand, macros with arguments and function calls should not have a blank between the name and the left parenthesis. In particular, the C preprocessor requires the left parenthesis to be immediately after the macro name or else the argument list will not be recognized. Unary operators should not be separated from their single operand. Since C has some unexpected precedence rules, all expressions involving mixed operators should be fully parenthesized.
Examples
a += c + d;
a = (a + b) / (c * d);
strp->field = str.fl - ((x & MASK) >> DISP);
while (*d++ = *s++)
	;	/* EMPTY BODY */
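The last example is the classic string copy loop. Wrapped in a function, it might look like the sketch below; copystr() is a hypothetical name, the caller is assumed to supply a large enough destination, and the explicit comparison with '\0' is added so the embedded assignment reads as intended (and so modern compilers do not warn about it).

```c
/*
 * copystr(d, s)
 *
 * Copy the NUL-terminated string s into d.
 * The caller must make d large enough.
 */
char *				/* d, for convenience */
copystr(char *d,		/* destination, changed here */
    const char *s)		/* source string */
{
	char *start = d;	/* where d began */

	while ((*d++ = *s++) != '\0')
		;	/* EMPTY BODY */
	return(start);
}
```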
Individual projects will no doubt have their own naming conventions. There are some general rules, however.
Numerical constants should not be coded directly (footnote 30). The define feature of the C preprocessor should be used to assign a meaningful name. This will also make it easier to administer large programs since the constant value can be changed uniformly by changing only the define. The enumeration data type is the preferred way to handle situations where a variable takes on only a discrete set of values, since additional type checking is available through lint.
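For example, the boat.type defines shown earlier could be replaced by an enumeration. This is a sketch under the assumption that the same constant values are wanted; the tag name rig is hypothetical. Declaring the type member as enum rig tells lint (and the reader) which set of values it may take.

```c
/*
 * Boat rig types as an enumeration instead of defines.
 */
enum rig {
	KETCH = 1,	/* same values as the defines */
	YAWL,
	SLOOP,
	SQRIG,
	MOTOR
};

struct boat {
	int	wllength;	/* water line length in feet */
	enum rig type;		/* kind of rig */
	long	sarea;		/* sail area in square feet */
};
```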
There are some cases where the constants 0 and 1 may appear as themselves instead of as defines. For example, if a for loop indexes through an array, then
for (i = 0; i < ARYBOUND; i++)
is reasonable, while the code
fptr = fopen(filename, "r");
if (fptr == 0)
	error("can't open %s\n", filename);
is not.
In the last example the defined constant NULL is available as
part of the standard I/O library's header file stdio.h and
must be used in place of the 0.
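The corrected form of that check, in a small self-contained sketch: firstchar() and the FAIL value are hypothetical, and fprintf() to stderr stands in for the paper's unspecified error() routine.

```c
#include <stdio.h>

#define FAIL	(-1)	/* hypothetical failure return */

/*
 * firstchar(filename)
 *
 * Return the first character of the named file, or FAIL
 * if the file cannot be opened or is empty.
 */
int				/* character value, or FAIL */
firstchar(const char *filename)	/* name of file to read */
{
	FILE *fptr;	/* stream for filename */
	int c;		/* first character read */

	fptr = fopen(filename, "r");
	if (fptr == NULL) {	/* NULL from stdio.h, not 0 */
		fprintf(stderr, "can't open %s\n", filename);
		return(FAIL);
	}
	c = getc(fptr);
	fclose(fptr);
	return((c == EOF) ? FAIL : c);
}
```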
The advantages of portable code are well known. This section gives some guidelines for writing portable code, where the definition of portable is taken to mean that a source file contains portable code if it can be compiled and executed on different machines with the only source change being the inclusion of possibly different header files. The header files will contain defines and typedefs that may vary from machine to machine. Reference [1] contains useful information on both style and portability. Many of the recommendations in this document originated in [1]. The following is a list of pitfalls to be avoided and recommendations to be considered when designing portable code:
Size in bits:
type	pdp11	3B	IBM
_____	_____	_____	_____
char	8	8	8
short	16	16	16
int	16	32	32
long	32	32	32
In general if the word size is important, short or long
should be used to get 16 or 32 bit items on any of the above machines
(footnote 33).
If a simple loop counter is being used where either 16 or 32 bits will
do, then use int, since it will get the most efficient (natural)
unit for the current machine (footnote 34).
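A sketch of how a machine-dependent header might capture this advice. The typedef names INT16 and INT32 are hypothetical; they rely on the C guarantee that short is at least 16 bits and long at least 32 bits, which matches every column of the table above. The area() function shows why the size matters: 300 * 300 = 90000 does not fit in a 16-bit int, so the multiplication is done in long.

```c
#include <stddef.h>	/* size_t */
#include <limits.h>	/* CHAR_BIT */

typedef short	INT16;	/* at least 16 bits everywhere */
typedef long	INT32;	/* at least 32 bits everywhere */

/*
 * nbits(nbytes)
 *
 * Convert a sizeof result to a size in bits.
 */
int			/* size in bits */
nbits(size_t nbytes)	/* size from sizeof */
{
	return((int)(nbytes * CHAR_BIT));
}

/*
 * area(w, h)
 *
 * Multiply two 16-bit quantities without overflowing a 16-bit int.
 */
INT32			/* w times h */
area(INT16 w,		/* width */
    INT16 h)		/* height */
{
	return((INT32)w * h);	/* widen before multiplying */
}
```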
x &= 0177770
will clear only the three rightmost bits of an int on a PDP11. On a 3B it will also clear the entire upper halfword. Use
x &= ~07
instead, which works properly on all machines (footnote 35).
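The difference can be checked directly. In this sketch (clrlow3() is a hypothetical name) ~07 is converted to the width of x, so only the low three bits are cleared, while the fixed 16-bit pattern 0177770 also clears everything above bit 15 once the word is wider than 16 bits.

```c
/*
 * clrlow3(x)
 *
 * Clear the three rightmost bits of x, whatever its width.
 */
unsigned long			/* x with its low three bits cleared */
clrlow3(unsigned long x)	/* value to mask */
{
	x &= ~07;	/* ~07 becomes all ones except the low 3 bits */
	return(x);
}
```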
if (f() != FAIL)
is better than
if (f())
even though FAIL may have the value 0, which is considered to mean false by C (footnote 39). This will help you out later when somebody decides that a failure return should be -1 instead of 0 (footnote 40).
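A concrete case where the bare test goes wrong, as a sketch: parsedigit() is a hypothetical function whose failure return is -1, and 0 is a perfectly good success value, so if (parsedigit(c)) would wrongly treat the digit '0' as a failure and the digit-less character 'x' as a success.

```c
#define FAIL	(-1)	/* hypothetical failure return */

/*
 * parsedigit(c)
 *
 * Return the value of decimal digit c, or FAIL.
 */
int			/* 0 through 9, or FAIL */
parsedigit(int c)	/* character to convert */
{
	if ((c >= '0') && (c <= '9'))
		return(c - '0');
	return(FAIL);
}

/*
 * isdigitchar(c)
 *
 * Compare against FAIL explicitly; 0 is a valid result.
 */
int			/* 1 or 0 */
isdigitchar(int c)	/* character to test */
{
	return(parsedigit(c) != FAIL);
}
```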
Lint is a C program checker [2] that examines C source files to detect and report type incompatibilities, inconsistencies between function definitions and calls, potential program bugs, etc. It is expected that projects will require programs to use lint as part of the official acceptance procedure (footnote 46). In addition, work is going on in department 5521 to modify lint so that it will check for adherence to the standards in this document.
It is still too early to say exactly which of the standards given here will be checked by lint. In some cases such as whether a comment is misleading or incorrect there is little hope of mechanical checking. In other cases such as checking that the opening brace of a function body is alone on a line in column 1, the test has already been added (footnote 47). Future bulletins will be used to announce new additions to lint as they occur.
It should be noted that the best way to use lint is not as a barrier that must be overcome before official acceptance of a program, but rather as a tool to use whenever major changes or additions to the code have been made. Lint can find obscure bugs and ensure portability before problems occur.
This section contains some miscellaneous do's and don'ts.
while ((c = getchar()) != EOF) {
	process the character
}
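A complete instance of this embedded assignment, as a sketch: countchars() is a hypothetical function that reads from any stream rather than only the standard input. Note that c must be an int, not a char, so that EOF can be told apart from every valid character.

```c
#include <stdio.h>

/*
 * countchars(fp)
 *
 * Count the characters remaining on stream fp.
 */
long			/* number of characters read */
countchars(FILE *fp)	/* stream to read, advanced to EOF */
{
	int c;		/* int, not char, so EOF is distinct */
	long n = 0;	/* characters seen so far */

	while ((c = getc(fp)) != EOF)
		n++;
	return(n);
}
```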
Using embedded assignment statements to improve run-time performance
is also possible.
However, one should consider the tradeoff between increased speed and
decreased maintainability that results when embedded assignments are
used in artificial places.
For example, the code:
a = b + c;
d = a + r;
should not be replaced by
d = (a = b + c) + r;
even though the latter may save one cycle. Note that in the long run the time difference between the two will decrease as the optimizer gains maturity, while the difference in ease of maintenance will increase as the human memory of what's going on in the latter piece of code begins to fade (footnote 49).
(x >= 0) ? x : -x
Nested ? : operators can be confusing and should be avoided if possible. There are some macros like getchar where they can be useful. The comma operator can also be useful in for statements to provide multiple initializations or increments.
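The ? : expression above is most often wrapped in a macro. A sketch, with ABS and dist() as hypothetical names: the argument and the whole expansion are parenthesized so that ABS(a - b) expands to -(a - b) rather than -a - b. The argument is still evaluated twice, so it must not have side effects, as in ABS(i++).

```c
/* absolute value; x is evaluated twice */
#define ABS(x)	(((x) >= 0) ? (x) : -(x))

/*
 * dist(a, b)
 *
 * Distance between a and b on the number line.
 */
int		/* |a - b| */
dist(int a,	/* first point */
    int b)	/* second point */
{
	return(ABS(a - b));
}
```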
	for (...)
		for (...) {
			...
			if (disaster)
				goto error;
		}
	...
error:
	clean up the mess
When a goto is necessary the accompanying label should be alone
on a line and tabbed one tab position to the left of the associated
code that follows.
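A runnable version of the pattern above, as a sketch: sumgrid() is hypothetical, a negative entry plays the role of the disaster, and the per-row buffer exists mainly so there is something to clean up. The goto leaves both loops at once, and the label sits one tab to the left of the cleanup code it introduces.

```c
#include <stdlib.h>

#define OK	0
#define FAIL	(-1)

/*
 * sumgrid(grid, rows, cols, sump)
 *
 * Sum a rows x cols grid of non-negative values into *sump.
 */
int				/* OK, or FAIL on bad input */
sumgrid(const int *grid,	/* row-major values */
    int rows,			/* number of rows */
    int cols,			/* number of columns */
    long *sump)			/* result, changed here */
{
	long *rowsum;	/* per-row partial sums */
	int i, j;	/* row and column indices */

	rowsum = malloc((size_t)rows * sizeof *rowsum);
	if (rowsum == NULL)
		return(FAIL);
	for (i = 0; i < rows; i++) {
		rowsum[i] = 0;
		for (j = 0; j < cols; j++) {
			if (grid[(i * cols) + j] < 0)
				goto error;	/* disaster */
			rowsum[i] += grid[(i * cols) + j];
		}
	}
	*sump = 0;
	for (i = 0; i < rows; i++)
		*sump += rowsum[i];
	free(rowsum);
	return(OK);

error:
	free(rowsum);
	return(FAIL);
}
```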
Individual projects may wish to establish additional standards beyond those given here. The following issues are some of those that should be addressed by each project program administration group.
A set of standards has been presented for C programming style. One of the most important points is the proper use of white space and comments so that the structure of the program is evident from the layout of the code. Another good idea to keep in mind when writing code is that it is likely that you or someone else will be asked to modify it or make it run on a different machine sometime in the future.
As with any standard, it must be followed if it is to be useful. The Indian Hill version of lint will enforce those standards that are amenable to automatic checking. If you have trouble following any of these standards don't just ignore them. Programmers at Indian Hill should bring their problems to the Software Development System Group (Lee Kirchhoff, contact) in department 5522. Programmers outside Indian Hill should contact the Processor Application Group (Layne Cannon, contact) in department 5512 (footnote 53).
#define STREQ(a, b) (strcmp((a), (b)) == 0)
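A sketch of STREQ in use, which makes string tests read as the equality they express. The function isrig() and its table of names are hypothetical.

```c
#include <string.h>

#define STREQ(a, b)	(strcmp((a), (b)) == 0)

/*
 * isrig(name)
 *
 * Return 1 if name is one of the known rig names, else 0.
 */
int			/* 1 or 0 */
isrig(const char *name)	/* name to look up */
{
	static const char *rigs[] = { "ketch", "yawl", "sloop", NULL };
	int i;		/* index into rigs */

	for (i = 0; rigs[i] != NULL; i++)
		if (STREQ(name, rigs[i]))
			return(1);
	return(0);
}
```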
/*
 * The C Style Summary Sheet		Block comment,
 * by Henry Spencer, U of T Zoology	describes file.
 */

#include				Headers; don't nest.

typedef int SEQNO;	/* ... */	Global definitions.
#define STREQ(a, b) (strcmp((a), (b)) == 0)

static char *foo = NULL;	/* ... */	Global declarations.
struct bar {				Static whenever poss.
	SEQNO alpha;	/* ... */
#	define NOSEQNO 0
	int beta;	/* ... */		Don't assume 16 bits.
};

/*
 * Many unnecessary braces, to show where.	Functions.
 */
static int	/* what is returned */		Don't default int.
bletch(a)
int a;		/* ... */			Don't default int.
{
	int bar;		/* ... */
	extern int errno;	/* ..., changed here */
	extern char *index();

	if (foobar() != FAIL) {
		if (!isvalid()) {
			errno = ERANGE;
			return(5);
		}
	} else {
		x = &y + z->field;
		while (x == (y & MASK)) {
		}
		f += (x >= 0) ? x : -x;
	}
	for (i = 0; i < BOUND; i++) {
		/* Use lint -hpcax. */
	}
	do {
		/* Avoid nesting ?: */
	} while (index(a, b) != NULL);
	if (STREQ(x, "foo")) {
		x |= 07;
	} else if (STREQ(x, "bar")) {
		x &= ~07;
	} else if (STREQ(x, "ugh")) {
		/* Avoid gotos */
	} else {
		/* and continues. */
	}
	switch (...) {
	case ABC:
	case DEF:
		printf("...", a, b);
		break;
	case XYZ:
		x = y;
		/* FALLTHROUGH */
	default:
		while ((c = getc()) != EOF)	/* Limit imbedded =s. */
			;	/* NULLBODY */
		break;
	}
}