ewhac 2021 C Coding Guidelines

2019.05.21

Last updated: 2024.05.01

Every place I've worked has an established style standard for writing C code. And every time I read it, it becomes blindingly apparent that the local standard was informed not by a desire to make code more readable or easier for newcomers to understand, but rather by the personal style quirks of whoever happened to check in the first few files.

As of this writing, the last place I worked tried to establish a C style guide, which itself took cues from the MISRA C style guide -- which you have to pay money for! The resulting style guide was never formally ratified and never enforced. And that incident pretty much served as the impetus to finally write down my thoughts on the subject, accumulated by working with the language over the last 40 years, and which no one ever asked for, but for which you all shall now be judged.

These guidelines are also informed by what some may regard as a "primitive" method of writing code -- namely, with a stock install of Vim, a plain text editor with no code analysis or navigation assists, an environment not altogether different from what may face you when visiting a remote server with a minimal OS install.

I generally don't realize these individual points until I encounter it done wrong somewhere. So this document will accrete over time.

It took me several years to finally realize this, but the Guiding Principle behind the all the ranting and armwaving up ahead is this:

Don't make me hunt for important things.

File Format

C source files should be written with "UNIX" style newlines (single linefeed character (0x0A, ^J, \n) ends each line). If you are editing on a platform that defaults to something different, reconfigure your editor.

All lines in the source file, including the last line, shall be terminated with a newline.

Rationale

If you cat a file whose last line has no newline, then the last line will have your command prompt tacked on at the end. Also, the last line should not require special handling. Also, GCC will whine if it's missing.

That this is an issue at all is the Mac's fault. No, seriously, the Mac was the first to "reveal" the terminating newline in their editing widgets, because they generalized their string editing widget into a free-form text editing widget – the result being that a buffer with a terminating newline has an empty line in the widget at the very bottom, which people thought (correctly) looked weird. So people got used to deleting it to get rid of the blank line, without fully realizing they were also deleting the last newline, and...

Look, a modern editor can be configured to not do this; and the paleolithic editors vi and EMACS never did it. So, just don't do it.

Hard TAB characters (0x09, ^I, \t) are modulo eight spaces. Period. No exceptions.

Rationale

I should not have to explain this, because the evidence is everywhere. Open an xterm, cat a file with TABs in it. Oh, look: Eight spaces. Open the same file in stock 'vi'. Oh, look: Eight spaces. Open the file in EMACS. Oh, look: Eight spaces. Drag the file over to Windows and type it on a console. Oh, look: Eight spaces. Open the file in Notepad. Oh, look: Eight spaces. Open the same file using a Web browser (doesn't matter which one). Oh, look: Eight spaces. Dump the file to your printer. Oh, look: Eight spaces...

Seeing the pattern? If you try to set anything else, you'll be fighting the defaults (and in many cases unchangeable defaults) of decades worth of software not written by you. And even if you do manage to get your own terminals and editors and browsers and printers aligned to your preferred size, the moment the file leaves your machine and lands on someone else's, it will still look wrong. Don't waste the effort.

Jamie Zawinski very comprehensively analyzed this issue decades ago, and his wisdom still applies. More precisely, I am speaking toward interoperability -- assuming TAB is MOD-8 will cause you and your readers the fewest headaches.

"But eight spaces is too big an indentation size for code!" First: No, it isn't; and B: I'm not talking about indentation depth. I'm talking about what a hard TAB character in the file expands to. Code indentation depth is an entirely different topic.

(That this is an issue at all is (again!) the Mac's fault. When MPW (Macintosh Programmer's Workshop) first appeared, in addition to its other goofball Mac-only-isms, it not only shipped with a configurable TAB size, but had it incorrectly set by default to four spaces.)

File content shall be 7-bit ASCII, or UTF-8 with no Byte Order Marker (BOM).

Code Formatting

Note that all the below rules are in fact guidelines. If, at any location in the code, the enforcement of a given guideline would detract from the Guiding Principle, then the rule may be modified or ignored. The goal is readability, not slavish compliance to a ruleset. (Corollary: It is impossible to write an automated tool to enforce these rules, since readability at any given point in the code is often subjective based on the surrounding code.)

Indentation

TABs or Spaces: Pick ONE for all your code. Do not intermix TABs and spaces for code indentation (except as noted below).

Space-based Indentation

If you choose spaces to indent your code, TAB characters should appear nowhere in your file except as part of string literals (and even there you should be using "\t").

TAB-based Indentation

If you choose to use TABs to indent your code, understand first that you're signing up for eight spaces per indentation level. If that's too big for you, use spaces.

Though it requires extra effort to do so, you should use TABs only to establish base indentation level. Lining up code fragments on a given indentation level should be done with spaces.

Rationale

This can best be illustrated by example:

Example code, 8-space TABs:

extern int very_long_function_name (int a, b, c, d, e, f);

void
func (void)
{
-------⇥int i, n, ret;

-------⇥for (i = 0;  i < 10;  ++i) {
-------⇥-------⇥for (n = 0;  n < 10;  ++n) {
-------⇥-------⇥-------⇥ret = very_long_function_name (that,
-------⇥-------⇥-------⇥-------⇥-------⇥-------⇥       takes,
-------⇥-------⇥-------⇥-------⇥-------⇥-------⇥       a,
-------⇥-------⇥-------⇥-------⇥-------⇥-------⇥       lot,
-------⇥-------⇥-------⇥-------⇥-------⇥-------⇥       of,
-------⇥-------⇥-------⇥-------⇥-------⇥-------⇥       arguments);
-------⇥-------⇥}
-------⇥}
}

Note how above, TAB characters are used not just to indent the code, but also to line up the parameters to very_long_function_name() (many editors will sabotage you like this behind your back). Now, watch what happens when a heretic who has their TABs set to four spaces loads this file:

Example Code, 4-space TABs:

extern int very_long_function_name (int a, b, c, d, e, f);

void
func (void)
{
---⇥int i, n, ret;

---⇥for (i = 0;  i < 10;  ++i) {
---⇥---⇥for (n = 0;  n < 10;  ++n) {
---⇥---⇥---⇥ret = very_long_function_name (that,
---⇥---⇥---⇥---⇥---⇥---⇥       takes,
---⇥---⇥---⇥---⇥---⇥---⇥       a,
---⇥---⇥---⇥---⇥---⇥---⇥       lot,
---⇥---⇥---⇥---⇥---⇥---⇥       of,
---⇥---⇥---⇥---⇥---⇥---⇥       arguments);
---⇥---⇥}
---⇥}
}

Our arguments, which we took no small effort to line up, now appear in the wrong place.

However, if we confine our use of TABs to establish only the base indentation level of the block, then we start with this:

Fixed code example, 8-space TABs:

extern int very_long_function_name (int a, b, c, d, e, f);

void
func (void)
{
-------⇥int i, n, ret;

-------⇥for (i = 0;  i < 10;  ++i) {
-------⇥-------⇥for (n = 0;  n < 10;  ++n) {
-------⇥-------⇥-------⇥ret = very_long_function_name (that,
-------⇥-------⇥-------⇥                               takes,
-------⇥-------⇥-------⇥                               a,
-------⇥-------⇥-------⇥                               lot,
-------⇥-------⇥-------⇥                               of,
-------⇥-------⇥-------⇥                               arguments);
-------⇥-------⇥}
-------⇥}
}

And in the heretic's editor, it ends up looking like this:

Fixed code example, 4-space TABs

extern int very_long_function_name (int a, b, c, d, e, f);

void
func (void)
{
---⇥int i, n, ret;

---⇥for (i = 0;  i < 10;  ++i) {
---⇥---⇥for (n = 0;  n < 10;  ++n) {
---⇥---⇥---⇥ret = very_long_function_name (that,
---⇥---⇥---⇥                               takes,
---⇥---⇥---⇥                               a,
---⇥---⇥---⇥                               lot,
---⇥---⇥---⇥                               of,
---⇥---⇥---⇥                               arguments);
---⇥---⇥}
---⇥}
}

So, if you use TAB characters only to establish base indentation level, your code will still maintain its beauty even in the editor of the most deluded blasphemer. (Side Benefit: It will also cause the blasphemer to call attention to themselves when they check in a change that wrecks the indendation.)

Typeface Selection

Use fixed-width, non-proportional typefaces in your editor when writing code.

Rationale

"I... What!?"

There is a (thankfully tiny) group of people who asked what seems on its (type)face to be a reasonable question: "Why are we still using fixed-width typefaces in our code editors?" After all, the compiler doesn't know, and certainly doesn't care. Moreover, nearly all printed material is in proportional typefaces, our eyes are used to it and flow more easily over it, and why should the character "I" subtend the same width as "W"?

Because this isn't English (or German, or Russian, or Hindi, or Katakana) text. This is code. It needs to express not only ideas and cause-effect relationships, but structure. And, like it or not, this structure is always communicated via visual layout. Don't believe me? Then tell me what this does:

char O,o[];main(l){for(;~l;O||puts(o))O=(O[o]=~(l=getchar())?4<(4^l>>5)?l:46:0)?-~O&printf("%02x ",l)*5:!O;}

"That's not fair; that's the 'Best One-Liner' winner of 2018's IOCCC." Correct, but it still illustrates my point: Code structure is communicated among humans visually.

To do that, the programmer needs to have some reasonable control over visual layout, as well as having some reasonable assurance the layout will make it across to the next programmer. Use of proportional typefaces voids this assurance. Your eyes may just love Times New Roman, but the ex-Apple guy in the next row may be partial to Garamond, not to mention the smart-arse down the hall coding in Comic Sans, or the millennial hipster who's ironically using Atari Classic for machine learning projects.

Things that you carefully lined up and arranged in one typeface will be destroyed when viewed using another typeface (this already happens with misguided souls trying to lay out columns/tables by hand in Microsoft Outlook). Just imagine the annoyance you already experience when you open a source file that was written assuming the wrong TAB size. Proportional typefaces multiply the opportunities to obfuscate structure, and offer almost no countervailing advantages.

Here's a lovely resource to help you pick a fixed-width typeface.

Variable Declarations

All variable declarations in a block shall appear at the beginning of said block.

Separate variable type and name into columns. This applies to declarations of local and global variables, and to fields in a struct.

Rationale

Honestly, when tracing though unfamiliar code and trying to find the declaration for a particular variable, which would you rather sift through – this:

void
app_task (void *param)
{
    volatile uint16_t RPMsgRemoteReadyEventData = 0;
    volatile unsigned long remote_addr = 0;
    struct rpmsg_lite_endpoint *my_ept;
    rpmsg_queue_handle my_queue;
    struct rpmsg_lite_instance *my_rpmsg;
    int len;
    uint32_t dsp_image_size;
    /*  ...  */

Or this:

void
app_task (void *param)
{
    volatile uint16_t           RPMsgRemoteReadyEventData = 0;
    volatile unsigned long      remote_addr = 0;
    struct rpmsg_lite_endpoint  *my_ept;
    rpmsg_queue_handle          my_queue;
    struct rpmsg_lite_instance  *my_rpmsg;
    int                         len;
    uint32_t                    dsp_image_size;
    /*  ...  */

"But if you do that, it makes the diffs harder to read if the columns ever need to move." Oh, you mean when something like this happens:

--- /tmp/xmp1.c    2018-12-20 18:19:43.130294034 -0800
+++ /tmp/xmp2.c    2018-12-20 18:19:07.802525925 -0800
@@ -1,9 +1,10 @@
 void app_task(void *param)
 {
-    volatile uint16_t           RPMsgRemoteReadyEventData = 0;
-    volatile unsigned long      remote_addr = 0;
-    struct rpmsg_lite_endpoint  *my_ept;
-    rpmsg_queue_handle          my_queue;
-    struct rpmsg_lite_instance  *my_rpmsg;
-    int                         len;
-    uint32_t                    dsp_image_size;
+    volatile uint16_t                       RPMsgRemoteReadyEventData = 0;
+    volatile unsigned long                  remote_addr = 0;
+    struct rpmsg_lite_endpoint * volatile   shared_ptr = NULL;
+    struct rpmsg_lite_endpoint              *my_ept;
+    rpmsg_queue_handle                      my_queue;
+    struct rpmsg_lite_instance              *my_rpmsg;
+    int                                     len;
+    uint32_t                                dsp_image_size;

If that is your concern, then may I introduce you to diff -b:

--- /tmp/xmp1.c    2018-12-20 18:19:43.130294034 -0800
+++ /tmp/xmp2.c    2018-12-20 18:19:07.802525925 -0800
@@ -2,6 +2,7 @@
 {
     volatile uint16_t           RPMsgRemoteReadyEventData = 0;
     volatile unsigned long      remote_addr = 0;
+    struct rpmsg_lite_endpoint * volatile   shared_ptr = NULL;
     struct rpmsg_lite_endpoint  *my_ept;
     rpmsg_queue_handle          my_queue;
     struct rpmsg_lite_instance  *my_rpmsg;

So, yeah. Put 'em in columns.

When declaring types, the pointer decorator goes next to the variable name, and not next to the type.

Rationale

This is an artifact – and requirement – of C syntax. Consider the following code, which I see far too often:

    struct rpmsg_lite_endpoint *    my_ept;
    uint16_t *                      skip_len;
    int                             count;
    /*  ...  */

Looking at this, it would be easy to get the idea that every name appearing after, say, uint16_t * is a pointer to a uint16_t, but that is not in fact how the line is parsed. If we hoped to add a new pointer variable named prefix_len like so:

    struct rpmsg_lite_endpoint *    my_ept;
    uint16_t *                      skip_len, prefix_len;
    int                             count;
    /*  ...  */

Then we've just made a mistake. skip_len is a pointer to a uint16_t, whereas prefix_len is an instance of a uint16_t because prefix_len lacks the pointer decorator. Yes, the compiler would catch it when you tried to misuse it, but that's not the point. The point is to not confuse or mislead the reader. Hence keeping the pointer decorator next to the name, which is what C demands, anyway:

    struct rpmsg_lite_endpoint      *my_ept;
    uint16_t                        *skip_len, prefix_len;
    int                             count;
    /*  ...  */

Not crystal clear, but far more clear than the mystifyingly popular alternative.

(Note: To mitigate this issue, some C coding guides mandate that each variable gets its own declaration on a separate line. Google's C coding guide is a bit more flexible, allowing multiple scalar declarations on one line, but requiring pointers to each have their own line. Google's guideline is probably the most reasonable balance.)

Declaration Order – Variables and Structure Fields

Declare variables and structure fields in order from largest to smallest object.

Rationale

This is especially important for structure fields, as C lays out fields in memory in the order specified. Most modern platforms have data alignment requirements. For example, a 16-bit value must be located on a 16-bit boundary in memory, and if you try to read one from an odd address Bad Things will happen, ranging from dreadful performance to a illegal address exception.

To ensure this doesn't happen, the compiler will insert pad bytes where necessary to ensure the object meets platform alignment requirements. A potential side-effect of this is, if you lay out structure fields poorly, you can end up with a lot of space wasted on pad bytes.

struct bad {
    int8_t      srctype;
    void        *src;
    int8_t      desttype;
    void        *dest;
};

struct good {
    void        *src;
    void        *dest;
    int8_t      srctype;
    int8_t      desttype;
};

printf ("bad = %d; good = %d\n", sizeof (struct bad), sizeof (struct good));

The sizes that will be printed out are platform-specific, depending on your machine's alignment requirements and pointer size. However, struct bad will always be larger than struct good because of the padding needed to ensure the src and dest pointers are properly aligned. Assuming a 32-bit machine requiring 32-bit alignment, the structures would be laid out in memory as follows:

        Bad:
        +---------+---------+---------+---------+
0000:   | srctype |  <pad>  |  <pad>  |  <pad>  |
        +---------+---------+---------+---------+
0004:   |                  src                  |
        +---------+---------+---------+---------+
0008:   |desttype |  <pad>  |  <pad>  |  <pad>  |
        +---------+---------+---------+---------+
000C:   |                  dest                 |
        +---------------------------------------+
0010:

        Good:
        +---------+---------+---------+---------+
0000:   |                  src                  |
        +---------------------------------------+
0004:   |                  dest                 |
        +---------+---------+---------+---------+
0008:   | srctype |desttype |  <pad>  |  <pad>  |
        +---------+---------+---------+---------+
000C:

On this machine, struct bad requires 16 bytes of storage, with six bytes wasted on padding, whereas struct good needs only 12 bytes with only two pad bytes at the end. (Yes, you could use GCC's __attribute__(packed) attribute to eliminate the padding, but src definitely and dest probably would be misaligned, causing an exception.)

By declaring fields in order of decreasing size, you reduce the need for padding and the chances of inadvertently wasted space.

Where possible, try to group together variables and fields that have a relationship to each other. This is admittedly vague and very subjective.

Declaration Order – Function Parameters

If the function operates on a struct or other data in an object-like fashion, the pointer to that data -- the thing upon which the function operates -- appears first. "Output" parameters, if any, appear next, followed by "input" parameters.

If a passed pointer requires a companion size (e.g. maximum length of a buffer), the size parameter immediately follows the pointer.

Rationale

Placing the "object" pointer first is already a fairly common convention, and should be no surprise:

struct Window	*w;
struct Canvas	*c;
/*  ...  */
MoveWindow (w, xpos, ypos);
ResizeWindow (w, wide, high);

c = w->w_Canvas;
FillCanvas (c, COLOR_BLACK);
SetFGPen (c, COLOR_GREEN);
MoveTo (c, 0, 0);
DrawTo (c, 320, 240);

The outputs-before-inputs guideline allows the parameter order to visually resemble simple assignments and algebraic expressions if they were available, saving you from having to flip parameters around inside your head.

/*
 * Copy the contents of src to dest.  Or, algebraically:
 *
 * dest = src;
 */
memcpy (dest, src, byte_count);

/*
 * Imaginary function that trims whitespace from the NULL-terminated input
 * string and copies the result to a destination buffer.  Or, algebraically:
 *
 * dest = trim (input);
 */
trim_whitespace (dest, dest_siz, input);

/*
 * Blend two RGB colors together.  Or, algebraically, and if we had tuples:
 *
 * (ro, go, bo) = (r1 + r2 / 2, g1 + g2 / 2, b1 + b2 / 2);
 */
blend_rgb (&ro, &go, &bo,  r1, g1, b1,  r2, g2, b2);

Brace Position

Wikipedia actually has a surprisingly nice discussion on various schools of thought regarding brace positioning, including their historical origins. My preferred embodiment most closely resembles "K&R" style.

In general, braces live at the same indentation level as the parent block or control statement.

Illustration:

/*  Good  */
void
func (void)
{
    if (var != 0)
    {
        do_the_non_zero_thing ();
    }
}

/*  Poor  */
void
func (void)
    {
    if (var != 0)
        {
        do_the_non_zero_thing ();
        }
    }

/*  No.  Just...  no.  */
void
func (void)
  {
    if (var != 0)
      {
        do_the_non_zero_thing ();
      }
  }

With limited exceptions, nothing appears to the right of an opening brace.

Illustration

void
func (void)
{
    /*  Poor.  */
    if (var != 0)
    {   do_the_non_zero_thing ();
    }

    /*
     * Exception: If you're doing many small tests and operations that are
     * related to each other, then you can lay out things like this:
     */
    if (var & (1 << 0)) {  ++x;  trig = true;   }
    if (var & (1 << 1)) {  ++y;  trig = false;  }
    if (var & (1 << 2)) {  --x;  trig = true;   }
    if (var & (1 << 3)) {  --y;  trig = false;  }
}

If the block structure remains obvious when the brace is on the same line as the control statement, then keep the brace on the same line (exception: function declarations). Otherwise, place the brace on its own line.

Examples

/*  Keeping the brace on same line.  */
void
func (void)
{
    /*
     * In this example, the block structure is perfectly clear.
     */
    if (i >= 0  &&  i < ARRAY_LEN) {
        do_loop_thing ();
    }

    /*
     * In this example, however, keeping the brace on the same line obscures
     * the block structure, so this is bad.
     */
    if (i >= 0  &&  i < ARRAY_LEN  &&  table[i] != NULL  &&
        table[i].type != SKIP_ENTRY  &&  table_entry_is_valid (&table[i])) {
        do_loop_thing ();
    }
}

/*  Giving the brace its own line.  */
void
func (void)
{
    /*
     * In this example, the block structure is still perfectly clear,
     * but *no clearer* than in the same-line case, and this consumes
     * vertical space, particularly if you have a large tower of 'if'
     * statements of this form.
     */
    if (i >= 0  &&  i < ARRAY_LEN)
    {
        do_loop_thing ();
    }

    /*
     * In this case, placing the brace on its own line *greatly* clarifies the
     * block structure, and does double-duty to alert you to the multi-line 'if'
     * statement.  Again, the goal is readability and reduction of hunting effort.
     */
    if (i >= 0  &&  i < ARRAY_LEN  &&  table[i] != NULL  &&
        table[i].type != SKIP_ENTRY  &&  table_entry_is_valid (&table[i]))
    {
        do_loop_thing ();
    }
}

Switch-Case Layout

case statements are aligned with their parent switch statement.

Rationale

Indenting the case statement, and then indenting the actual code just wastes horizontal space, and does not improve clarity (this is especially true if you're using TAB-based indentation).

/*
 * Poor.
 */
switch (var) {
	case VALUE1:
		do_thing_1();
		break;

	case VALUE2:
		do_thing_2();
		break;

	case VALUE3:
	default:
		do_common_thing();
		break;
}

/*
 * Better.
 */
switch (var) {
case VALUE1:
	do_thing_1();
	break;

case VALUE2:
	do_thing_2();
	break;

case VALUE3:
	/* FALLTHROUGH */
default:
	do_common_thing();
	break;
}

Try to leave a blank line between the case statement and the code above it, if any.

Unless there is a technical reason to do otherwise, place the default case at the bottom.

Using fallthrough is fine; just be clear about it. Old versions of the lint syntax checker would warn against fallthrough, but the warning could be squelched by adding the comment /* FALLTHROUGH */ at the end of the case, as if to say to lint and the reader, "I meant to do that." The Linux kernel sources also introduces the synthetic keyword fallthrough for precisely this purpose.

Duff's Device: Be very clear about what you're doing. But also be sure you actually need it -- benchmarks show Duff's Device as often slower than the optimizations and loop-unrolling provided by modern compilers. (Yeah, surprised me, too.)

Spacing

Operators

Unary arithmetic operators (-, ~) shall have no space on their right-hand side.

Binary arithmetic, bit-wise, and comparison operators (+, -, *, /, |, &, ^, ==, !=, etc.) shall be surrounded on each side by at least a single space.

Binary logical-and and logical-or operators (&&, ||) shall by surrounded on each side by at least two spaces.

Rationale

Again, ease of gloss-ability. This especially pays dividends when dealing with code doing a lot of bitfield masking and comparison (as in hardware and graphics drivers). Consider:

if (hw->status & REG_STATUS_IRQACTIVE && hw->irqs & REG_IRQS_FIFOEMPTY) {
    /*  ...  */
}

Versus:

if (hw->status & REG_STATUS_IRQACTIVE  &&  hw->irqs & REG_IRQS_FIFOEMPTY) {
    /*  ...  */
}

It's subtle, but the additional space lets the && stand out, helping visually separate the left and right expressions. And if your response is, "Just add parentheses," no. That adds to visual clutter, especially with nested parens:

if ((hw->status & REG_STATUS_IRQACTIVE) && (hw->irqs & (REG_IRQS_FIFOEMPTY | REG_IRQS_FIFOFULL))) {
    /*  ...  */
}

And even if operator precedence requires the parentheses, two spaces around the logic op still aids in visual separation:

if ((hw->status & REG_STATUS_IRQACTIVE)  &&  (hw->irqs & (REG_IRQS_FIFOEMPTY | REG_IRQS_FIFOFULL))) {
    /*  ...  */
}

For-Loops

If all three clauses of a for-loop will fit on a single line, separate each clause by two spaces following the semicolon. Where the clauses of the for-loop need to subtend multiple lines, place each clause on its own line.

Rationale

Same basic principle as above with the logical-AND and -OR operators – the extra bit of separation helps each clause stand out, letting you zero in on the significant bits.

/*
 * Here's an example of very wrong styling.  Everyone agrees it's wrong because the
 * lack of spacing causes everything to visually run together, obscuring structure
 * and content:
 */
for (i=0;i<length;++i) { /* ... */ }

/*
 * So naturally one would add a space to separate the clauses.  Still ugly,
 * but a marked improvement:
 */
for (i=0; i<length; ++i) { /* ... */ }

/*
 * However, in the following case, even though everything is spaced out, all the
 * spacing is uniform, causing everything to visually run together in a manner similar
 * to the very wrong example above:
 */
for (i = 0; i < length; ++i) {  /* ... */ }

/*  So, iterate on the solution:  Add another space after the semicolons:  */
for (i = 0;  i < length;  ++i) {  /* ... */ }


/*  Another example.  Every little bit helps:  */
for (node = FIRSTNODE (list); next_node = node->next; node = next_node) { /* ... */ }

for (node = FIRSTNODE (list);  next_node = node->next;  node = next_node) { /* ... */ }


/*
 * In loops with long expressions or tracking multiple iterators (yes, this is valid),
 * breaking it up by clause over multiple lines is preferable to leaving it all on one
 * one line, and far preferable to breaking it at some arbitrary column boundary:
 *
 * Single line, with single-spacing everywhere:
 */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB); (nextA = nodeA->next) && (nextB = nodeB->next); nodeA = nextA, nodeB = nextB)
{
    /* ... */
}

/*  Double-spacing between clauses and around '&&' operator.  Even this modest change helps:  */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB);  (nextA = nodeA->next)  &&  (nextB = nodeB->next);  nodeA = nextA, nodeB = nextB)
{
    /* ... */
}

/*  Arbitrary break before column 80 (MISRA C requirement):  */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB);  (nextA =
         nodeA->next)  &&  (nextB = nodeB->next);  nodeA = nextA, nodeB =
         nextB)
{
    /* ... */
}

/*  Each clause on its own line:  */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB);
     (nextA = nodeA->next)  &&  (nextB = nodeB->next);
     nodeA = nextA, nodeB = nextB)
{
    /*  ...  */
}

/*  Extra credit, visually grouping related operations together:  */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB);
     (nextA = nodeA->next)  &&  (nextB = nodeB->next);
     nodeA = nextA,             nodeB = nextB)
{
    /*  ...  */
}

Here's a real-world C++ example I recently bumped into:

for (auto it=s.begin(),end=s.end(); it != end; ++it) {
    if (*it >= 0x20)
        result += *it;
}

The initialization portion is all packed together. Depending on the typeface you're using to view this, that comma can look like a dot, adding to confusion, especially if you're familiar with the method-chaining techniques common in Java(Script) and Rust.

The point is: That line interrupts your reading and makes you stop and stare, trying to pick it apart. Spacing it out makes it much clearer:

for (auto it = s.begin(), end = s.end();  it != end;  ++it) {
    if (*it >= 0x20)
        result += *it;
}

Structure Field Assignment

When assigning values to fields of a given struct, arrange the fields and values into columns. The = operators should be aligned on the right-hand column.

Rationale

Again, this should be self-evident when spelunking unfamiliar code, and is simply a permutation of the guideline on variable declaration.

/*
 * Poor.
 */
IOInfo ioInfo;

memset (&ioInfo, 0, sizeof (ioInfo));
ioInfo.ioi_Command = SPORTCMD_COPY;
ioInfo.ioi_Offset = mask;
ioInfo.ioi_Send.iob_Buffer = sourceAddress;
ioInfo.ioi_Send.iob_Len = numBytes;
ioInfo.ioi_Recv.iob_Buffer = destinationAddress;
ioInfo.ioi_Recv.iob_Len = numBytes;

/*
 * Better.
 */
IOInfo ioInfo;

memset (&ioInfo, 0, sizeof (ioInfo));
ioInfo.ioi_Command          = SPORTCMD_COPY;
ioInfo.ioi_Offset           = mask;
ioInfo.ioi_Send.iob_Buffer  = sourceAddress;
ioInfo.ioi_Send.iob_Len     = numBytes;
ioInfo.ioi_Recv.iob_Buffer  = destinationAddress;
ioInfo.ioi_Recv.iob_Len     = numBytes;

Function Declarations, Calls, and Arguments

For function declarations and calls, a space shall appear between the function name and the opening parenthesis (exception: Function calls with no arguments). No space shall follow the opening parenthesis.

At least once space shall follow each comma in the parameters to a function call.

Rationale

If it's good enough for Strunk & White, it's good enough for you.

Seriously, I can appreciate that computer programs are math-y, and mathematical typography doesn't allow for spaces between the function name and its arguments. But it's english-y, too, and having an english word smashed against an open-paren interrupts my glossing of the text. Oh, and pointing out how for(), if(), sizeof(), etc, aren't really functions and therefore should be treated differently isn't going to pass the splitting-excessively-fine-hairs test.

Spaces between function arguments should be self-evident.

/*
 * Poor.
 */
memset(&ioInfo,0,sizeof(ioInfo));

/*
 * Also poor.
 */
memset( &ioInfo, 0, sizeof( ioInfo ) );
c = getchar( );

/*
 * Better.
 */
memset (&ioInfo, 0, sizeof (ioInfo));
c = getchar ();
c = getchar();   // Acceptable.

For function declarations, the function's return type, including the pointer decorator (if any), appears on its own line; the function's name shall begin in the first column of the following line. This guideline does not apply to forward declarations or prototypes.

Rationale

This ensures that the function's name is always at the leftmost edge, so I don't have to hunt for it in the middle of the line.

/*
 * Poor
 */
void io_reset (void *handle)
{
    /* ... */
}

struct io_block const *io_read_thing (void *handle, size_t nbytes)
{
    /* ... */
}

/*
 * Better
 */
void
io_reset (void *handle)
{
    /* ... */
}

struct io_block const *
io_read_thing (void *handle, size_t nbytes)
{
    /* ... */
}

Line Wrapping

When wrapping long lines, break the lines before operators such that the operators are aligned on the left. Align under the = operator or the enclosing opening parenthesis where practical.

Rationale

The idea is to avoid having to hunt on the ragged right edge of the code for the operator that applies to the next operand. Some examples should help illustrate this:

/*
 * Poor.
 */
if (i >= 0  &&  i < ARRAY_LEN  &&
    table[i] != NULL  &&
    table[i].type != SKIP_ENTRY  &&
    table_entry_is_valid (&table[i]))
{
    do_loop_thing ();
}

/*
 * Better.  For extra credit: Note extra space after the if's opening paren
 * so that all the expressions line up.
 */
if (    i >= 0
    &&  i < ARRAY_LEN
    &&  table[i] != NULL
    &&  table[i].type != SKIP_ENTRY
    &&  table_entry_is_valid (&table[i]))
{
    do_loop_thing ();
}

/*
 * Poor.
 */
sccb->sccb_Flags |= CCB_NPABS | CCB_SPABS | CCB_PPABS | CCB_LDSIZE |
    CCB_LDPRS | CCB_LDPPMP | CCB_YOXY | CCB_ACW | CCB_ACCW | ccbextra;

/*
 * Better.
 */
sccb->sccb_Flags |= CCB_NPABS | CCB_SPABS | CCB_PPABS
                  | CCB_LDSIZE | CCB_LDPRS | CCB_LDPPMP
                  | CCB_YOXY | CCB_ACW | CCB_ACCW
                  | ccbextra;

Improved Idiomatic Habits

When taking the size of a structure or type, take the sizeof() the variable that holds the allocation, not the sizeof() the type.

Rationale

Suppose we have this:

struct IORequest *ioreq;
/*  Blah, blah, blah...  */
ioreq = malloc (sizeof (struct IORequest));
if (ioreq) {
    memset (ioreq, 0, sizeof (struct IORequest));
}

This is perfectly fine. However, after several releases, a newer, larger IORequest structure is needed, and the code gets changed to:

/*  New I/O request, now supporting tint control!  */
struct IORequest_ext *ioreq;
/*  Oops!  Forgot to update the types given to sizeof().  */
ioreq = malloc (sizeof (struct IORequest));
if (ioreq) {
    memset (ioreq, 0, sizeof (struct IORequest));
}

In a compact example like this, this is obviously wrong. It can become far less obvious if the variable declaration and the allocation/memset() become widely separated in the code. In either case, the compiler will not notice, and the only time you'll find out is when the program crashes. Maybe you'll notice it when you grep the code; maybe your IDE's refactoring tool will point it out (yeah, right)...

Or you could avoid the issue entirely by taking the sizeof() the variable used to hold the allocation, like so:

struct IORequest_ext *ioreq;
/*  This will always be correct, regardless of the type of `ioreq`.  */
ioreq = malloc (sizeof (*ioreq));
if (ioreq) {
    memset (ioreq, 0, sizeof (*ioreq));
}

"Ah! But if I use an RAII-like pattern, then I'm less likely to get bitten!" Possibly. But you'd still have to change the type in at least two locations:

/*  Oops!  Forgot the second one.  */
struct IORequest_ext *ioreq = malloc (sizeof (struct IORequest));

/*  This still always works.  */
struct IORequest_ext *ioreq = malloc (sizeof (*ioreq));

C Language Misfeatures

Bitfields

Avoid using C's bitfield feature in general code. Never use the bitfield feature in code that accesses hardware registers.

Rationale

Recall that modifying a bitfield in a wider integer requires a read-modify-write operation. So if we have this:

struct fielding {
    unsigned int    valid : 1;
    unsigned int    count : 4;
    int             opcode : 3;
};

struct fielding f;

f.count = 12;
f.opcode = 2;
f.valid = 1;

What is the actual order of operations at the machine level? Is each field committed singly, or does the compiler collect them together and write all eight bits at once? Does it in fact write eight bits, or 16, or 32, or...? Do bitfields straddle byte/word boundaries? And do all compilers that your code might be fed through behave the same? For data that lives entirely in RAM, these are merely questions of efficiency/performance. When manipulating hardware registers, it becomes a crucial question of whether the values are written in the correct order so that the hardware will operate properly (and no, volatile does not necessarily save you).

While it's a well-intentioned language feature, its shortcomings make it unsuitable for anything except the most trivial uses. Just avoid it.