ewhac 2021 C Coding Guidelines
2019.05.21
Last updated: 2024.05.01
Every place I've worked has an established style standard for writing C code. And every time I read it, it becomes blindingly apparent that the local standard was informed not by a desire to make code more readable or easier for newcomers to understand, but rather by the personal style quirks of whoever happened to check in the first few files.
As of this writing, the last place I worked tried to establish a C style guide, which itself took cues from the MISRA C style guide -- which you have to pay money for! The resulting style guide was never formally ratified and never enforced. And that incident pretty much served as the impetus to finally write down my thoughts on the subject, accumulated by working with the language over the last 40 years, and which no one ever asked for, but for which you all shall now be judged.
These guidelines are also informed by what some may regard as a "primitive" method of writing code -- namely, with a stock install of Vim, a plain text editor with no code analysis or navigation assists, an environment not altogether different from what may face you when visiting a remote server with a minimal OS install.
I generally don't realize these individual points until I encounter it done wrong somewhere. So this document will accrete over time.
It took me several years to finally realize this, but the Guiding Principle behind the all the ranting and armwaving up ahead is this:
Don't make me hunt for important things.
File Format
C source files should be written with "UNIX" style newlines (single linefeed
character (0x0A, ^J
, \n
) ends each line). If you are editing on a
platform that defaults to something different, reconfigure your editor.
All lines in the source file, including the last line, shall be terminated with a newline.
Hard TAB characters (0x09, ^I
, \t
) are modulo eight spaces.
Period. No exceptions.
File content shall be 7-bit ASCII, or UTF-8 with no Byte Order Marker (BOM).
Code Formatting
Note that all the below rules are in fact guidelines. If, at any location in the code, the enforcement of a given guideline would detract from the Guiding Principle, then the rule may be modified or ignored. The goal is readability, not slavish compliance to a ruleset. (Corollary: It is impossible to write an automated tool to enforce these rules, since readability at any given point in the code is often subjective based on the surrounding code.)
Indentation
TABs or Spaces: Pick ONE for all your code. Do not intermix TABs and spaces for code indentation (except as noted below).
Space-based Indentation
If you choose spaces to indent your code, TAB characters should appear
nowhere in your file except as part of string literals (and even there you
should be using "\t
").
TAB-based Indentation
If you choose to use TABs to indent your code, understand first that you're signing up for eight spaces per indentation level. If that's too big for you, use spaces.
Though it requires extra effort to do so, you should use TABs only to establish base indentation level. Lining up code fragments on a given indentation level should be done with spaces.
Typeface Selection
Use fixed-width, non-proportional typefaces in your editor when writing code.
Rationale
"I... What!?"
There is a (thankfully tiny) group of people who asked what seems on its (type)face to be a reasonable question: "Why are we still using fixed-width typefaces in our code editors?" After all, the compiler doesn't know, and certainly doesn't care. Moreover, nearly all printed material is in proportional typefaces, our eyes are used to it and flow more easily over it, and why should the character "I" subtend the same width as "W"?
Because this isn't English (or German, or Russian, or Hindi, or Katakana) text. This is code. It needs to express not only ideas and cause-effect relationships, but structure. And, like it or not, this structure is always communicated via visual layout. Don't believe me? Then tell me what this does:
char O,o[];main(l){for(;~l;O||puts(o))O=(O[o]=~(l=getchar())?4<(4^l>>5)?l:46:0)?-~O&printf("%02x ",l)*5:!O;}
"That's not fair; that's the 'Best One-Liner' winner of 2018's IOCCC." Correct, but it still illustrates my point: Code structure is communicated among humans visually.
To do that, the programmer needs to have some reasonable control over visual layout, as well as having some reasonable assurance the layout will make it across to the next programmer. Use of proportional typefaces voids this assurance. Your eyes may just love Times New Roman, but the ex-Apple guy in the next row may be partial to Garamond, not to mention the smart-arse down the hall coding in Comic Sans, or the millennial hipster who's ironically using Atari Classic for machine learning projects.
Things that you carefully lined up and arranged in one typeface will be destroyed when viewed using another typeface (this already happens with misguided souls trying to lay out columns/tables by hand in Microsoft Outlook). Just imagine the annoyance you already experience when you open a source file that was written assuming the wrong TAB size. Proportional typefaces multiply the opportunities to obfuscate structure, and offer almost no countervailing advantages.
Here's a lovely resource to help you pick a fixed-width typeface.
Variable Declarations
All variable declarations in a block shall appear at the beginning of said block.
Rationale
This limits the number of places I have to look in order to find a variable declaration and discover its type. Yes, I know C99 and C++ expressly allow you to declare new variables anywhere, and that this is actually occasionally useful in C++ because it lets you control when the class's constructor fires off. But paper and simple text editors don't have syntax analysis and code navigation assists, so if there's no issue with controlling constructor invocation, put all your variable declarations in one place.
Separate variable type and name into columns. This applies to declarations of local and global variables, and to fields in a struct.
Rationale
Honestly, when tracing though unfamiliar code and trying to find the declaration for a particular variable, which would you rather sift through – this:
void
app_task (void *param)
{
volatile uint16_t RPMsgRemoteReadyEventData = 0;
volatile unsigned long remote_addr = 0;
struct rpmsg_lite_endpoint *my_ept;
rpmsg_queue_handle my_queue;
struct rpmsg_lite_instance *my_rpmsg;
int len;
uint32_t dsp_image_size;
/* ... */
Or this:
void
app_task (void *param)
{
volatile uint16_t RPMsgRemoteReadyEventData = 0;
volatile unsigned long remote_addr = 0;
struct rpmsg_lite_endpoint *my_ept;
rpmsg_queue_handle my_queue;
struct rpmsg_lite_instance *my_rpmsg;
int len;
uint32_t dsp_image_size;
/* ... */
"But if you do that, it makes the diffs harder to read if the columns ever need to move." Oh, you mean when something like this happens:
--- /tmp/xmp1.c 2018-12-20 18:19:43.130294034 -0800
+++ /tmp/xmp2.c 2018-12-20 18:19:07.802525925 -0800
@@ -1,9 +1,10 @@
void app_task(void *param)
{
- volatile uint16_t RPMsgRemoteReadyEventData = 0;
- volatile unsigned long remote_addr = 0;
- struct rpmsg_lite_endpoint *my_ept;
- rpmsg_queue_handle my_queue;
- struct rpmsg_lite_instance *my_rpmsg;
- int len;
- uint32_t dsp_image_size;
+ volatile uint16_t RPMsgRemoteReadyEventData = 0;
+ volatile unsigned long remote_addr = 0;
+ struct rpmsg_lite_endpoint * volatile shared_ptr = NULL;
+ struct rpmsg_lite_endpoint *my_ept;
+ rpmsg_queue_handle my_queue;
+ struct rpmsg_lite_instance *my_rpmsg;
+ int len;
+ uint32_t dsp_image_size;
If that is your concern, then may I introduce you to diff -b
:
--- /tmp/xmp1.c 2018-12-20 18:19:43.130294034 -0800
+++ /tmp/xmp2.c 2018-12-20 18:19:07.802525925 -0800
@@ -2,6 +2,7 @@
{
volatile uint16_t RPMsgRemoteReadyEventData = 0;
volatile unsigned long remote_addr = 0;
+ struct rpmsg_lite_endpoint * volatile shared_ptr = NULL;
struct rpmsg_lite_endpoint *my_ept;
rpmsg_queue_handle my_queue;
struct rpmsg_lite_instance *my_rpmsg;
So, yeah. Put 'em in columns.
When declaring types, the pointer decorator goes next to the variable name, and not next to the type.
Rationale
This is an artifact – and requirement – of C syntax. Consider the following code, which I see far too often:
struct rpmsg_lite_endpoint * my_ept;
uint16_t * skip_len;
int count;
/* ... */
Looking at this, it would be easy to get the idea that every name appearing
after, say, uint16_t *
is a pointer to a uint16_t
, but that is not in
fact how the line is parsed. If we hoped to add a new pointer variable
named prefix_len
like so:
struct rpmsg_lite_endpoint * my_ept;
uint16_t * skip_len, prefix_len;
int count;
/* ... */
Then we've just made a mistake. skip_len
is a pointer to a uint16_t
,
whereas prefix_len
is an instance of a uint16_t
because prefix_len
lacks the pointer decorator. Yes, the compiler would catch it when you
tried to misuse it, but that's not the point. The point is to not confuse
or mislead the reader. Hence keeping the pointer decorator next to the
name, which is what C demands, anyway:
struct rpmsg_lite_endpoint *my_ept;
uint16_t *skip_len, prefix_len;
int count;
/* ... */
Not crystal clear, but far more clear than the mystifyingly popular alternative.
(Note: To mitigate this issue, some C coding guides mandate that each variable gets its own declaration on a separate line. Google's C coding guide is a bit more flexible, allowing multiple scalar declarations on one line, but requiring pointers to each have their own line. Google's guideline is probably the most reasonable balance.)
Declaration Order – Variables and Structure Fields
Declare variables and structure fields in order from largest to smallest object.
Rationale
This is especially important for structure fields, as C lays out fields in memory in the order specified. Most modern platforms have data alignment requirements. For example, a 16-bit value must be located on a 16-bit boundary in memory, and if you try to read one from an odd address Bad Things will happen, ranging from dreadful performance to a illegal address exception.
To ensure this doesn't happen, the compiler will insert pad bytes where necessary to ensure the object meets platform alignment requirements. A potential side-effect of this is, if you lay out structure fields poorly, you can end up with a lot of space wasted on pad bytes.
struct bad {
int8_t srctype;
void *src;
int8_t desttype;
void *dest;
};
struct good {
void *src;
void *dest;
int8_t srctype;
int8_t desttype;
};
printf ("bad = %d; good = %d\n", sizeof (struct bad), sizeof (struct good));
The sizes that will be printed out are platform-specific, depending on your
machine's alignment requirements and pointer size. However, struct bad
will always be larger than struct good
because of the padding needed to
ensure the src
and dest
pointers are properly aligned. Assuming a
32-bit machine requiring 32-bit alignment, the structures would be laid out
in memory as follows:
Bad:
+---------+---------+---------+---------+
0000: | srctype | <pad> | <pad> | <pad> |
+---------+---------+---------+---------+
0004: | src |
+---------+---------+---------+---------+
0008: |desttype | <pad> | <pad> | <pad> |
+---------+---------+---------+---------+
000C: | dest |
+---------------------------------------+
0010:
Good:
+---------+---------+---------+---------+
0000: | src |
+---------------------------------------+
0004: | dest |
+---------+---------+---------+---------+
0008: | srctype |desttype | <pad> | <pad> |
+---------+---------+---------+---------+
000C:
On this machine, struct bad
requires 16 bytes of storage, with six bytes
wasted on padding, whereas struct good
needs only 12 bytes with only two
pad bytes at the end. (Yes, you could use GCC's __attribute__(packed)
attribute to eliminate the padding, but src
definitely and dest
probably
would be misaligned, causing an exception.)
By declaring fields in order of decreasing size, you reduce the need for padding and the chances of inadvertently wasted space.
Where possible, try to group together variables and fields that have a relationship to each other. This is admittedly vague and very subjective.
Declaration Order – Function Parameters
If the function operates on a struct or other data in an object-like fashion, the pointer to that data -- the thing upon which the function operates -- appears first. "Output" parameters, if any, appear next, followed by "input" parameters.
If a passed pointer requires a companion size (e.g. maximum length of a buffer), the size parameter immediately follows the pointer.
Rationale
Placing the "object" pointer first is already a fairly common convention, and should be no surprise:
struct Window *w;
struct Canvas *c;
/* ... */
MoveWindow (w, xpos, ypos);
ResizeWindow (w, wide, high);
c = w->w_Canvas;
FillCanvas (c, COLOR_BLACK);
SetFGPen (c, COLOR_GREEN);
MoveTo (c, 0, 0);
DrawTo (c, 320, 240);
The outputs-before-inputs guideline allows the parameter order to visually resemble simple assignments and algebraic expressions if they were available, saving you from having to flip parameters around inside your head.
/*
* Copy the contents of src to dest. Or, algebraically:
*
* dest = src;
*/
memcpy (dest, src, byte_count);
/*
* Imaginary function that trims whitespace from the NULL-terminated input
* string and copies the result to a destination buffer. Or, algebraically:
*
* dest = trim (input);
*/
trim_whitespace (dest, dest_siz, input);
/*
* Blend two RGB colors together. Or, algebraically, and if we had tuples:
*
* (ro, go, bo) = (r1 + r2 / 2, g1 + g2 / 2, b1 + b2 / 2);
*/
blend_rgb (&ro, &go, &bo, r1, g1, b1, r2, g2, b2);
Brace Position
Wikipedia actually has a surprisingly nice discussion on various schools of thought regarding brace positioning, including their historical origins. My preferred embodiment most closely resembles "K&R" style.
In general, braces live at the same indentation level as the parent block or control statement.
Illustration:
/* Good */
void
func (void)
{
if (var != 0)
{
do_the_non_zero_thing ();
}
}
/* Poor */
void
func (void)
{
if (var != 0)
{
do_the_non_zero_thing ();
}
}
/* No. Just... no. */
void
func (void)
{
if (var != 0)
{
do_the_non_zero_thing ();
}
}
With limited exceptions, nothing appears to the right of an opening brace.
Illustration
void
func (void)
{
/* Poor. */
if (var != 0)
{ do_the_non_zero_thing ();
}
/*
* Exception: If you're doing many small tests and operations that are
* related to each other, then you can lay out things like this:
*/
if (var & (1 << 0)) { ++x; trig = true; }
if (var & (1 << 1)) { ++y; trig = false; }
if (var & (1 << 2)) { --x; trig = true; }
if (var & (1 << 3)) { --y; trig = false; }
}
If the block structure remains obvious when the brace is on the same line as the control statement, then keep the brace on the same line (exception: function declarations). Otherwise, place the brace on its own line.
Examples
/* Keeping the brace on same line. */
void
func (void)
{
/*
* In this example, the block structure is perfectly clear.
*/
if (i >= 0 && i < ARRAY_LEN) {
do_loop_thing ();
}
/*
* In this example, however, keeping the brace on the same line obscures
* the block structure, so this is bad.
*/
if (i >= 0 && i < ARRAY_LEN && table[i] != NULL &&
table[i].type != SKIP_ENTRY && table_entry_is_valid (&table[i])) {
do_loop_thing ();
}
}
/* Giving the brace its own line. */
void
func (void)
{
/*
* In this example, the block structure is still perfectly clear,
* but *no clearer* than in the same-line case, and this consumes
* vertical space, particularly if you have a large tower of 'if'
* statements of this form.
*/
if (i >= 0 && i < ARRAY_LEN)
{
do_loop_thing ();
}
/*
* In this case, placing the brace on its own line *greatly* clarifies the
* block structure, and does double-duty to alert you to the multi-line 'if'
* statement. Again, the goal is readability and reduction of hunting effort.
*/
if (i >= 0 && i < ARRAY_LEN && table[i] != NULL &&
table[i].type != SKIP_ENTRY && table_entry_is_valid (&table[i]))
{
do_loop_thing ();
}
}
Switch-Case Layout
case
statements are aligned with their parent switch
statement.
Rationale
Indenting the case
statement, and then indenting the actual code just
wastes horizontal space, and does not improve clarity (this is especially
true if you're using TAB-based indentation).
/*
* Poor.
*/
switch (var) {
case VALUE1:
do_thing_1();
break;
case VALUE2:
do_thing_2();
break;
case VALUE3:
default:
do_common_thing();
break;
}
/*
* Better.
*/
switch (var) {
case VALUE1:
do_thing_1();
break;
case VALUE2:
do_thing_2();
break;
case VALUE3:
/* FALLTHROUGH */
default:
do_common_thing();
break;
}
Try to leave a blank line between the case
statement and the code above it, if any.
Unless there is a technical reason to do otherwise, place the default
case
at the bottom.
Using fallthrough is fine; just be clear about it. Old versions of the
lint
syntax checker would warn against fallthrough, but the warning could
be squelched by adding the comment /* FALLTHROUGH */
at the end of the
case, as if to say to lint
and the reader, "I meant to do that." The
Linux kernel sources also introduces the synthetic keyword fallthrough
for
precisely this purpose.
Duff's Device: Be very clear about what you're doing. But also be sure you actually need it -- benchmarks show Duff's Device as often slower than the optimizations and loop-unrolling provided by modern compilers. (Yeah, surprised me, too.)
Spacing
Operators
Unary arithmetic operators (-
, ~
) shall have no space on their
right-hand side.
Binary arithmetic, bit-wise, and comparison operators (+
, -
, *
, /
,
|
, &
, ^
, ==
, !=
, etc.) shall be surrounded on each side by at
least a single space.
Binary logical-and and logical-or operators (&&
, ||
) shall by surrounded
on each side by at least two spaces.
Rationale
Again, ease of gloss-ability. This especially pays dividends when dealing with code doing a lot of bitfield masking and comparison (as in hardware and graphics drivers). Consider:
if (hw->status & REG_STATUS_IRQACTIVE && hw->irqs & REG_IRQS_FIFOEMPTY) {
/* ... */
}
Versus:
if (hw->status & REG_STATUS_IRQACTIVE && hw->irqs & REG_IRQS_FIFOEMPTY) {
/* ... */
}
It's subtle, but the additional space lets the &&
stand out, helping
visually separate the left and right expressions. And if your response is,
"Just add parentheses," no. That adds to visual clutter, especially with
nested parens:
if ((hw->status & REG_STATUS_IRQACTIVE) && (hw->irqs & (REG_IRQS_FIFOEMPTY | REG_IRQS_FIFOFULL))) {
/* ... */
}
And even if operator precedence requires the parentheses, two spaces around the logic op still aids in visual separation:
if ((hw->status & REG_STATUS_IRQACTIVE) && (hw->irqs & (REG_IRQS_FIFOEMPTY | REG_IRQS_FIFOFULL))) {
/* ... */
}
For-Loops
If all three clauses of a for-loop will fit on a single line, separate each clause by two spaces following the semicolon. Where the clauses of the for-loop need to subtend multiple lines, place each clause on its own line.
Rationale
Same basic principle as above with the logical-AND and -OR operators – the extra bit of separation helps each clause stand out, letting you zero in on the significant bits.
/*
* Here's an example of very wrong styling. Everyone agrees it's wrong because the
* lack of spacing causes everything to visually run together, obscuring structure
* and content:
*/
for (i=0;i<length;++i) { /* ... */ }
/*
* So naturally one would add a space to separate the clauses. Still ugly,
* but a marked improvement:
*/
for (i=0; i<length; ++i) { /* ... */ }
/*
* However, in the following case, even though everything is spaced out, all the
* spacing is uniform, causing everything to visually run together in a manner similar
* to the very wrong example above:
*/
for (i = 0; i < length; ++i) { /* ... */ }
/* So, iterate on the solution: Add another space after the semicolons: */
for (i = 0; i < length; ++i) { /* ... */ }
/* Another example. Every little bit helps: */
for (node = FIRSTNODE (list); next_node = node->next; node = next_node) { /* ... */ }
for (node = FIRSTNODE (list); next_node = node->next; node = next_node) { /* ... */ }
/*
* In loops with long expressions or tracking multiple iterators (yes, this is valid),
* breaking it up by clause over multiple lines is preferable to leaving it all on one
* one line, and far preferable to breaking it at some arbitrary column boundary:
*
* Single line, with single-spacing everywhere:
*/
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB); (nextA = nodeA->next) && (nextB = nodeB->next); nodeA = nextA, nodeB = nextB)
{
/* ... */
}
/* Double-spacing between clauses and around '&&' operator. Even this modest change helps: */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB); (nextA = nodeA->next) && (nextB = nodeB->next); nodeA = nextA, nodeB = nextB)
{
/* ... */
}
/* Arbitrary break before column 80 (MISRA C requirement): */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB); (nextA =
nodeA->next) && (nextB = nodeB->next); nodeA = nextA, nodeB =
nextB)
{
/* ... */
}
/* Each clause on its own line: */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB);
(nextA = nodeA->next) && (nextB = nodeB->next);
nodeA = nextA, nodeB = nextB)
{
/* ... */
}
/* Extra credit, visually grouping related operations together: */
for (nodeA = FIRSTNODE (listA), nodeB = FIRSTNODE (listB);
(nextA = nodeA->next) && (nextB = nodeB->next);
nodeA = nextA, nodeB = nextB)
{
/* ... */
}
Here's a real-world C++ example I recently bumped into:
for (auto it=s.begin(),end=s.end(); it != end; ++it) {
if (*it >= 0x20)
result += *it;
}
The initialization portion is all packed together. Depending on the typeface you're using to view this, that comma can look like a dot, adding to confusion, especially if you're familiar with the method-chaining techniques common in Java(Script) and Rust.
The point is: That line interrupts your reading and makes you stop and stare, trying to pick it apart. Spacing it out makes it much clearer:
for (auto it = s.begin(), end = s.end(); it != end; ++it) {
if (*it >= 0x20)
result += *it;
}
Structure Field Assignment
When assigning values to fields of a given struct, arrange the fields and
values into columns. The =
operators should be aligned on the right-hand
column.
Rationale
Again, this should be self-evident when spelunking unfamiliar code, and is simply a permutation of the guideline on variable declaration.
/*
* Poor.
*/
IOInfo ioInfo;
memset (&ioInfo, 0, sizeof (ioInfo));
ioInfo.ioi_Command = SPORTCMD_COPY;
ioInfo.ioi_Offset = mask;
ioInfo.ioi_Send.iob_Buffer = sourceAddress;
ioInfo.ioi_Send.iob_Len = numBytes;
ioInfo.ioi_Recv.iob_Buffer = destinationAddress;
ioInfo.ioi_Recv.iob_Len = numBytes;
/*
* Better.
*/
IOInfo ioInfo;
memset (&ioInfo, 0, sizeof (ioInfo));
ioInfo.ioi_Command = SPORTCMD_COPY;
ioInfo.ioi_Offset = mask;
ioInfo.ioi_Send.iob_Buffer = sourceAddress;
ioInfo.ioi_Send.iob_Len = numBytes;
ioInfo.ioi_Recv.iob_Buffer = destinationAddress;
ioInfo.ioi_Recv.iob_Len = numBytes;
Function Declarations, Calls, and Arguments
For function declarations and calls, a space shall appear between the function name and the opening parenthesis (exception: Function calls with no arguments). No space shall follow the opening parenthesis.
At least once space shall follow each comma in the parameters to a function call.
Rationale
If it's good enough for Strunk & White, it's good enough for you.
Seriously, I can appreciate that computer programs are math-y, and
mathematical typography doesn't allow for spaces between the function name
and its arguments. But it's english-y, too, and having an english word
smashed against an open-paren interrupts my glossing of the text. Oh, and
pointing out how for()
, if()
, sizeof()
, etc, aren't really functions
and therefore should be treated differently isn't going to pass the
splitting-excessively-fine-hairs test.
Spaces between function arguments should be self-evident.
/*
* Poor.
*/
memset(&ioInfo,0,sizeof(ioInfo));
/*
* Also poor.
*/
memset( &ioInfo, 0, sizeof( ioInfo ) );
c = getchar( );
/*
* Better.
*/
memset (&ioInfo, 0, sizeof (ioInfo));
c = getchar ();
c = getchar(); // Acceptable.
For function declarations, the function's return type, including the pointer decorator (if any), appears on its own line; the function's name shall begin in the first column of the following line. This guideline does not apply to forward declarations or prototypes.
Rationale
This ensures that the function's name is always at the leftmost edge, so I don't have to hunt for it in the middle of the line.
/*
* Poor
*/
void io_reset (void *handle)
{
/* ... */
}
struct io_block const *io_read_thing (void *handle, size_t nbytes)
{
/* ... */
}
/*
* Better
*/
void
io_reset (void *handle)
{
/* ... */
}
struct io_block const *
io_read_thing (void *handle, size_t nbytes)
{
/* ... */
}
Line Wrapping
When wrapping long lines, break the lines before operators such that the
operators are aligned on the left. Align under the =
operator or the
enclosing opening parenthesis where practical.
Rationale
The idea is to avoid having to hunt on the ragged right edge of the code for the operator that applies to the next operand. Some examples should help illustrate this:
/*
* Poor.
*/
if (i >= 0 && i < ARRAY_LEN &&
table[i] != NULL &&
table[i].type != SKIP_ENTRY &&
table_entry_is_valid (&table[i]))
{
do_loop_thing ();
}
/*
* Better. For extra credit: Note extra space after the if's opening paren
* so that all the expressions line up.
*/
if ( i >= 0
&& i < ARRAY_LEN
&& table[i] != NULL
&& table[i].type != SKIP_ENTRY
&& table_entry_is_valid (&table[i]))
{
do_loop_thing ();
}
/*
* Poor.
*/
sccb->sccb_Flags |= CCB_NPABS | CCB_SPABS | CCB_PPABS | CCB_LDSIZE |
CCB_LDPRS | CCB_LDPPMP | CCB_YOXY | CCB_ACW | CCB_ACCW | ccbextra;
/*
* Better.
*/
sccb->sccb_Flags |= CCB_NPABS | CCB_SPABS | CCB_PPABS
| CCB_LDSIZE | CCB_LDPRS | CCB_LDPPMP
| CCB_YOXY | CCB_ACW | CCB_ACCW
| ccbextra;
Improved Idiomatic Habits
When taking the size of a structure or type, take the sizeof()
the
variable that holds the allocation, not the sizeof()
the type.
Rationale
Suppose we have this:
struct IORequest *ioreq;
/* Blah, blah, blah... */
ioreq = malloc (sizeof (struct IORequest));
if (ioreq) {
memset (ioreq, 0, sizeof (struct IORequest));
}
This is perfectly fine. However, after several releases, a newer, larger IORequest structure is needed, and the code gets changed to:
/* New I/O request, now supporting tint control! */
struct IORequest_ext *ioreq;
/* Oops! Forgot to update the types given to sizeof(). */
ioreq = malloc (sizeof (struct IORequest));
if (ioreq) {
memset (ioreq, 0, sizeof (struct IORequest));
}
In a compact example like this, this is obviously wrong. It can become far
less obvious if the variable declaration and the allocation/memset()
become widely separated in the code. In either case, the compiler will not
notice, and the only time you'll find out is when the program crashes.
Maybe you'll notice it when you grep
the code; maybe your IDE's
refactoring tool will point it out (yeah, right)...
Or you could avoid the issue entirely by taking the sizeof()
the variable used
to hold the allocation, like so:
struct IORequest_ext *ioreq;
/* This will always be correct, regardless of the type of `ioreq`. */
ioreq = malloc (sizeof (*ioreq));
if (ioreq) {
memset (ioreq, 0, sizeof (*ioreq));
}
"Ah! But if I use an RAII-like pattern, then I'm less likely to get bitten!" Possibly. But you'd still have to change the type in at least two locations:
/* Oops! Forgot the second one. */
struct IORequest_ext *ioreq = malloc (sizeof (struct IORequest));
/* This still always works. */
struct IORequest_ext *ioreq = malloc (sizeof (*ioreq));
C Language Misfeatures
Bitfields
Avoid using C's bitfield feature in general code. Never use the bitfield feature in code that accesses hardware registers.
Rationale
Recall that modifying a bitfield in a wider integer requires a read-modify-write operation. So if we have this:
struct fielding {
unsigned int valid : 1;
unsigned int count : 4;
int opcode : 3;
};
struct fielding f;
f.count = 12;
f.opcode = 2;
f.valid = 1;
What is the actual order of operations at the machine level? Is each field
committed singly, or does the compiler collect them together and write all
eight bits at once? Does it in fact write eight bits, or 16, or 32, or...?
Do bitfields straddle byte/word boundaries? And do all compilers that
your code might be fed through behave the same? For data that lives
entirely in RAM, these are merely questions of efficiency/performance. When
manipulating hardware registers, it becomes a crucial question of whether
the values are written in the correct order so that the hardware will
operate properly (and no, volatile
does not necessarily save you).
While it's a well-intentioned language feature, its shortcomings make it unsuitable for anything except the most trivial uses. Just avoid it.