Dependable C is an attempt to document a subset of C for developers who want to write dependable code.
Imagine you are using a high level language like Python, Java or C#. You need a utility library to handle some complicated maths, file parsing, or other basic functionality. What language would you want this library to be written in? The obvious answer is the language you are already using, but most languages aren't stable enough for this: a 10 year old Python library is unlikely to run in the version of Python you use unless it is constantly maintained. You want it to be fast, so a compiled language would be better. C++ and Rust suffer from the same backwards compatibility issues and need special toolchains; they also require special bindings to be used from other languages. What you want is a library written in C. It's fast, simple, runs everywhere, and in a pinch most people can read the code. But what kind of C do you want it to be written in? You want it to be written in the plainest possible C, C that doesn't require any extensions, compiler settings, or build steps. You should be able to use it with whatever toolchain your platform has available and not rely on a specific implementation or version of the language.
That is a library written in Dependable C. It's not fancy, it's not "modern", but it works, always, and it runs fast. The most important feature of any code is that you can make it compile and run as intended. That's what Dependable C prioritizes: not the ergonomics of the programmer, but how much the user can depend on it working for their use.
Very few people want to write Dependable C, but everyone wishes everyone else wrote their code in Dependable C.
C23, and the upcoming C2Y, are language versions that have become increasingly complex: they include many new keywords and forms of flow control, and a revised Charter that differs from "Classic C". The latest versions of C are also only supported by two implementations out of the hundreds of C implementations available. The delta between ANSI C and C2Y is arguably larger than the delta between ANSI C and the first version of C++. This means that for developers who want to develop widely portable, compliant software in Classic C, the latest ISO C standards are a poor guide. Reading earlier versions of the standard is also not sufficient, since they do not include lists of features that have since been deprecated, or any guide as to which parts of the standard have had poor implementation support. This is why Dependable C exists: it's a guide to help you write C code that is universally accepted.
C is the most portable and widely implemented language available. C has been called the lingua franca of computing. A problem solved in C will remain solved for the foreseeable future. Changes in operating systems, computing environments, or hardware are unlikely to render a well written C implementation obsolete. A library written in C will be able to be used from almost any language. While many programmers don't use C, many can read and understand C. This means that code written in C can be modified by a larger pool of programmers.
If quality is the measure of longevity, C is a prime candidate for writing high quality code.
Not all C code is portable, will compile the same in all compilers, or can even be understood by most C programmers. C has a long history of quirks and corner cases that can be hard to navigate. Writing non-portable code that is only intended to run on one platform and be built with a particular toolchain is perfectly legitimate, but if you want to write code that is portable and remains usable for decades, and you value code that is guaranteed to compile and work correctly over having the latest language features, this guide is for you.
Dependable C is the opposite of a dialect. It is a C that tries to be as middle of the road as possible in order to be understood and implemented as widely as possible. Think of it as Newscaster C: a neutral, universally understood language.
Dependable C is not a style guide; it does not prescribe formatting, indentation or style. It simply tries to document what C functionality can be depended on, and how. It is perfectly valid to use Dependable C as a guide for what functionality to use while at the same time adhering to a style guide like MISRA. The MISRA standard prioritizes safety, whereas Dependable C prioritizes compatibility. It is entirely possible to adhere to both at the same time.
In some cases features that have been introduced in later versions are needed, and in these cases we will try to document how to access these features in a dependable way.
The purpose of this project is to document the small subset of C that is dependable. It therefore strongly encourages writing standard compliant code, free of undefined behaviour (UB). On some very rare occasions, however, this guide will highlight where writing code that is technically UB is permitted, because in practice it is dependable. Likewise there are many, many ways to write technically standard compliant code that is far from dependable, and in some cases no implementations even exist (see Annex K). The goal is to give guidance on how to write code that works in the real world, on real implementations, not just the paper products produced by a standards body. Having written that, most implementations study the standard carefully and do their best to follow it, and the standards body goes to considerable lengths to make the standard as complete and clear as possible.
Most other languages have only one or very few implementations. This means you can rely on the implementation's behaviour not to vary between platforms. C has numerous implementations with a very wide range of complexity and feature support. Many C implementations have bugs, and they mostly manifest when you stretch the language to its limits. All basic functionality can be relied on, because the most idiomatic code is also the most tested code. Compiler developers use publicly available code to test their implementations, and therefore a common construct is much more likely to have been rigorously tested than an esoteric corner case. By writing code in a syntax that you can be sure all compilers have encountered in the past, you minimize the chance that you will trigger a bug. Code should also avoid relying on the user having the latest version of a compiler: some platforms may have had their support deprecated by major compilers, or may only be supported by a specific compiler.
The compiler is only one small part of the larger C ecosystem of implementations. There are linters, sanitizers, formatters, debuggers, syntax highlighters, documentation generators and many other tools that implement the language to some degree, and they all have their own limitations. Ideally you want your code to be able to take advantage of all these tools, by staying within their limitations.
Dependable C advocates for using a subset of all versions of C.
Given that C89 is the smallest of the C standards, in practice this means a subset of C89. Simply using the C89 standard is not enough to fully understand C. Many of the changes that have been made to the C standard text in the years since it was published address ambiguities and issues with previous versions. If something is unclear in one standard but has been clarified in later standards, users tend to get the clarified behaviour even when they set their compiler to follow the earlier standard. Given that C89/ANSI C was the first version of the language, it is the version of the standard written with the least implementation experience, and it therefore has lots of issues.
Many languages have derived their syntax from C. C++, Java, C#, D, JavaScript, Objective-C to name a few. Almost all of these languages are based on C89, and have not incorporated C99 or later features. This means that programmers who mainly use these languages have difficulty reading code written using the later versions.
While a C89 subset is recommended, the point of writing Dependable C is to be universally accepted, and that includes being accepted by compilers set to any version of C. You may choose to set your compiler to adhere to a strict C89 subset in order to verify that your code is not using any newer functionality, but it should run just as well using any other version of C. Your code should not require a compiler that has a C89 mode, it should be universal. This is why Dependable C discourages the use of any deprecated functionality or any functionality that clashes with new C features (see "auto").
The vast majority of features added to the C standard since C89 add new ways of doing things that are already possible in C89 if you know how. It is our intention to document as much of this as possible over time. In some cases features that have been introduced in later versions are needed, and in these cases we will try to document how to access these features in the most dependable way possible.
The C programming language is, unfortunately, unfixable. Fortunately C is good enough not to need fixing.
One of the greatest strengths of C is its compatibility. C has more implementations than any other programming language, more existing code, more documentation and more experienced programmers than any other language. The cost of breaking all of this compatibility is simply higher than the value brought by any improvement to the language.
There is a wide range of C dialects and proposed replacements that all try to fix the deficiencies of C. However, almost none of them have had any success.
The ISO C standard had, until C23, taken backwards compatibility seriously. This meant that the committee was unable to remove functionality, only add new functionality. On some rare occasions features have been marked as deprecated, but in practice it has not been possible to remove these features from implementations, because users simply need them to compile existing code.
A situation where features can only be added, never removed, serves a language like C poorly, since among its core values are simplicity, compactness and ease of implementation. Stability is also poorly maintained by a group of language designers who, not surprisingly, want to design language features. People do not join standards organizations in order to not develop the standard. (In general, my personal experience is that the members of the ISO C WG14 standard body are competent, hard-working and very knowledgeable, and have the best of intentions. However, when enough people want to add "just one thing", the result is not a clean design.)
While WG14 has historically worked hard to maintain backwards compatibility, it has ignored compatibility in the opposite direction. Writing code in newer versions of C simply makes it incompatible with many platforms and implementations. Often code written in newer versions of the language will not compile in older implementations, but on occasion the meaning of the code will simply change. This is obviously very dangerous.
An example of this hazard is the removal of some UB. At first glance it seems like a clear improvement to define behaviours that have in the past needlessly been undefined. But it is problematic if a programmer reads a later standard that makes a guarantee which isn't upheld by the many implementations written before the behaviour was defined. The behaviour may be technically defined, but in practice it is still not dependable, since it has a history of being undefined, and unlike in the past this hazard is no longer clearly spelled out in the standard. The well-intended effort to remove an issue instead creates an issue. This is one of the issues that compelled the Dependable C effort.
A time traveller going back to 1972 could address many issues in C, but today the situation is much more complicated. Luckily, the small subset defined here is more than capable of doing everything that needs to be done. In the grand scheme of things, the sacrifices are minor. Most of the issues of C can, for a developer, simply be addressed by "just don't do that then"; implementations don't have that luxury, since they need to compile existing code that isn't always as well written.
C is a language that can be used to write both portable and non portable code. C is used to write code that can compile and run on many different platforms, and we generally refer to this as portable code. C is also designed to be implementable on very exotic architectures that are very different from mainstream ones. There are, for instance, DSPs where bytes are 32 bits in size. It is incredibly important that C supports such a wide range of architectures, because C is often the only programming language that can serve these architectures (outside of assembler). Because these platforms are so different, most of the code we would consider portable C would not work on them. It is therefore worth considering these platforms as platforms you can program for in C, rather than as platforms that run portable C code. I choose to distinguish between "exotic" and "conventional" platforms.
Writing fully standard compliant C code that runs correctly on both conventional and exotic platforms with a C implementation can only be done in very limited cases. Here are just a few reasons why:
Even if you manage to write a program within these constraints, the C language doesn't guarantee any minimum stack size or the availability of other resources, so there is still no guarantee that your program will be able to run on all possible platforms.
Trying to write fully portable C code becomes somewhat meaningless from this extreme, academic point of view. The goal should be to make reasonable decisions about your requirements and make your code as portable as possible. Only write non-portable code when you have to. Your software may only run on a few modern architectures today, but you do not know where hardware will go in the future, and you don't know when you will want to re-use some small component of it for a tiny embedded or old architecture.
As a general rule, it is good to always write standard compliant code unless you have good reason not to do it.
If there are two ways of doing something, one which is correct and one that isn't, but the results are identical, always do it the correct way. Only limit your code's portability intentionally; never do it casually because you do not think it will matter. An example of this is the null terminator character. The null termination character '\0' is defined as a character that has all bits set to zero. The bit representation of the integer zero is platform defined, so while I have never heard of a platform where all-bits-zero does not represent 0, in theory it would be possible that on some implementation this string is not null terminated:
char hi[] = { 'H', 'i', 0 };
I always terminate strings with '\0' instead of 0, not because I expect to have problems with using 0 as a null terminator, but because it's the correct way to do it and it doesn't cost anything to do it right.
Drawing a definitive line between which platforms one should consider common and which exotic is difficult, and each project has to decide what requirements it has. There are platforms that everyone can agree are exotic or mainstream, but there are also a lot of lesser known platforms that have a surprisingly large footprint, and future platforms may also look different from today's common platforms.
Any software project has requirements; just because your code adheres to a language standard does not mean that it is meaningful to run on all platforms that support the language. C is a language that can be implemented on very exotic platforms, where for instance bytes aren't 8 bits, or that have very limited memory. It is perfectly reasonable to write software that follows the Dependable C guidelines but that isn't portable to hardware with pointers smaller than 64 bits.
None of these are guaranteed by the standard.
There are some assumptions you can make about the platform to make development easier, but they do cut into what we may consider a common platform. For each assumption we make, we reduce the dependability of our code.
While these assumptions can be considered reasonable, and will work on most platforms, they are fairly simple to avoid, and details of how to do so are covered in subsequent chapters.
The standard does in some cases mandate that the implementation issue a diagnostic message, but it also leaves implementations free to issue warnings for whatever they like. There is therefore no way to write C code that is guaranteed free of warnings. An implementation is entirely free to warn the user that they are writing C in the first place.
Warnings are thus meant to be ignorable. Many implementations have options that turn all warnings into errors, and many developers have a policy of turning this feature on. This causes a problem, because as implementations advance and are able to detect more issues, new warnings cause builds to break. This in turn causes users to complain to the implementors, and implementors are disincentivized from providing additional diagnostics. Many of the major C implementations, like GCC, LLVM and MSVC, refrain from adding almost any new warnings for this reason.
This means that many warnings are turned off by default, and users have to manually turn them on. Many warnings concern benign issues, while some really important warnings may be turned off by default. Because of this we recommend taking the time to enable and disable the warnings that are relevant to you and your coding style. We also recommend turning relevant warnings into errors during development. Which warnings you enable, disable, or elevate to errors should depend on your requirements, the types of bugs you tend to write, and your and your team's experience level. All developers are different, and there are many practices that some developers would want the compiler to warn against that other developers are comfortable using to their advantage.
Dependable C encourages C++ compatibility in all interfaces, but does not guarantee that code compiles correctly in a C++ compiler. C is not a subset of C++, nor the other way around, and the differences between the two languages are subtle and often unintended. Being able to write code that is guaranteed to produce the same results in both C and C++ requires deep knowledge of both languages and is not something we recommend attempting. We strongly encourage header files to be C++ compatible and not contain any function definitions. We also discourage any use of C++ keywords.
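A common pattern for keeping an interface C++ compatible is the extern "C" guard. The sketch below is a hypothetical header (the name mylib and its function are invented for the example); the guard is invisible to a C compiler but gives C linkage when the header is included from C++:

```c
/* mylib.h -- sketch of a C++-compatible header (names are hypothetical) */
#ifndef MYLIB_H
#define MYLIB_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {            /* give the declarations C linkage under C++ */
#endif

/* Declarations only: no function bodies, and no C++ keywords such as
   "new", "class" or "template" used as identifiers. */
int mylib_sum(const int *values, size_t count);

#ifdef __cplusplus
}
#endif

#endif /* MYLIB_H */
```

The include guard and the __cplusplus test are both plain C89 preprocessor constructs, so the header stays dependable on every toolchain.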
In general the quality of standard library implementations is very good on most platforms. They can be considered reliable, even if the design of the standard library is somewhat flawed. Rather than using all of the standard library, it is recommended to be selective, for various reasons.
The standard library is technically not a library. It is part of what the standard calls the "implementation" of the language. You can't entirely separate the library from the language, even in implementation.
There are many complaints about the lack of features in the C standard library, but in my opinion this is a good thing. Most languages have only a single, or very few, implementations. For example, Python is almost exclusively used through the CPython implementation, and therefore users can rely on the expansive Python standard library provided by that implementation. The implementation essentially defines the language. C is different: C has very many implementations and is defined by a standard document written in prose. Translating this prose into code invariably creates subtle variations between implementations, caused by bugs and different interpretations of the standard. Consider an extremely reliable API like curl, zlib or SQLite. If the C standard adopted these interfaces, wrote a specification for them and asked each standard library to implement them, we would invariably end up with far less reliable interfaces, because they would no longer be based on a single implementation. These interfaces are far more reliable as independent projects, run by people with much better domain experience than the C standards body. The very process of having multiple implementations of the same functionality leads to divergence, and that is the opposite of the goal of standardization. For these kinds of utilities, rely on good independent implementations rather than looking to the standard library to do the work.
There are two reasons why you should use the standard library: either because of portability or intrinsics.
Functions that deal with the operating system, like malloc, free, fopen, fclose, fread and fwrite, are universally implemented and portable. Platform specific functions that do the same thing may offer additional features, but are inherently not as dependable. It is strongly recommended to use the standard functions, as they provide good reliability and readability and are universally well implemented. The printf family of functions is also dependable, although not all features of these functions are.
The standard library is not a standalone library; it is part of the language, and therefore many compilers replace calls to standard library functions with extremely well written and optimized platform specific assembler. Often these use machine specific instructions that are otherwise not accessible from the C programming language. If you call cosf() you are invoking a maths operation no less integrated into the language than typing a "+" operator. Compilers can be expected to know what the function does, and replace it with the corresponding implementation.
Functions that are often implemented as intrinsics include (and yes, compilers do optimize out memory allocation):
- malloc realloc free calloc (see "Initialization in C", for more details on calloc)
- memset memcpy memmove
- exit abort
- assert
- Math functions.
There is a misconception that malloc is slow. Memory allocation is slow because it's a hard problem, but malloc is almost always an extremely well-implemented solution to this problem. Because malloc is used so pervasively, and all kinds of very performance critical work depends on it, it is the focus of a lot of care and optimization.
It is generally good advice to avoid memory allocation as much as possible, but when you do allocate memory, malloc is the most dependable way to do so.
Custom allocators have their place where you can allocate a lot of memory up front and then use that pool. This approach does however have some major drawbacks. If you allocate one pool of memory, you have to anticipate the amount of memory you might need in the future, and if you reach that limit you have to find a graceful fallback, which adds complexity. And if you do not use the memory, or reduce your use of memory, there is no way to give some of it back to other tasks that may run on the same computer.
malloc implementations often have access to hardware facilities like the Memory Management Unit (MMU) that let the implementation freely map the underlying memory pages into the address space. This can solve a lot of memory fragmentation issues that can't be solved as well in a custom allocator that doesn't have access to special hardware features (at least not in a portable manner).
Various operating systems have a range of other facilities for allocating memory: Windows has VirtualAlloc and Linux has mmap, for instance. Obviously, using these is neither portable nor dependable.
C can run on almost any kind of hardware. It would be impossible for the standard to describe how the code would be executed on every possible piece of hardware. Instead the C standard describes an imaginary hardware architecture known as the "abstract machine", and then stipulates that real world implementations on real world hardware can do whatever they want as long as the result is the same **as if** it ran on the abstract machine.
The AS-IF concept is the foundation of C that enables compilers to optimize code, making C the gold standard for performance and power efficiency. Some things in C are designated as output from the program, and must therefore match exactly the output of the abstract machine. This is known as "observable behaviour", or just "behaviour" (Undefined Behaviour is NOT a behaviour). It is very important to distinguish between operations that happen in the abstract machine and things that happen outside the abstract machine that are observable. Most operations in C happen inside the abstract machine; only I/O functionality like printf, and accesses to values that have the qualifier volatile, are observable in C.
The implementation can do any transformation it wants with the code that is within the abstract machine, but must strictly follow the output and ordering of any observable behaviour.
Consider the following:
int x;
x = 3;
x += 2;
printf("Hello");
printf(" World %d\n", x);
In this program the assignment and addition to the variable x happen entirely within the abstract machine, whereas the two printfs are observable and must be executed in order and have the same output as if the program ran on the abstract machine. The compiler cannot transform the program into this, since it would change the observable behaviour:
int x;
x = 3;
x += 2;
printf(" World %d\n", x);
printf("Hello");
It is, however, perfectly free to remove the variable and transform the program into this:
printf("Hello");
printf(" World %d\n", 2 + 3);
Or even this:
printf("Hello World 5\n");
As you can see, the compiler is free to radically rewrite the code, as long as the observable behaviour of the program remains identical to what is described in the source code.
C's design maps in many regards very well to the instructions implemented in hardware. This gives the false impression that C is a high level assembler, and that the instructions described in the source code are translated one-to-one to the relevant assembler instructions. This is not true. The abstract machine gives the compiler a lot of latitude to optimize. If you implement a naive sort, it is entirely legal (although not likely) for a compiler to replace the algorithm with a faster merge sort.
While C appears to map very well to assembler, in many ways it does not. The first and most obvious difference is that most computing architectures cannot operate on memory without first moving the contents into registers. Loading and storing in and out of memory is slow, so ideally you want to keep state in registers. In order to do this the compiler has to radically transform the code.
AS-IF is meant to be a firewall between the programmer and the implementation. Ideally the programmer writes not for the hardware or the implementation, but for the abstract machine. The implementation converts operations on this abstract machine into instructions for the real machine. This firewall provides freedom for hardware vendors to invent new architectures, freedom for implementors to exploit these architectures, and a stable platform for software developers.
C maintains the illusion that all operations execute in order and that the next operation does not get executed before the last one has completed. This is not true. Compilers can reorder operations, and CPU architectures help to maintain the illusion but have been out-of-order for decades. Given the latency of RAM and disks, hardware architectures go to great lengths to cache memory and in other ways utilize available compute resources while waiting for other workloads to complete, using branch prediction, out-of-order execution, pipelining and many other techniques. These concepts are almost entirely invisible to C programmers, but in order to optimize for this type of hardware the compiler has to do a deep analysis of the code. For most programmers these things are entirely hidden by C and the compiler.
To allow for these optimizations, the standard has no requirement whatsoever about the ordering of execution in the abstract machine. Only the observable behaviour must strictly be executed in order. This creates the illusion of a simple architecture, while operating on a much more complicated one. In most cases this is something the programmer can depend on. If a programmer writes X and then Y into a file, it is guaranteed that X will come before Y in the file. The implementation can reorder as much as it wants, but only to the extent that it doesn't change the program's observable behaviour.
This strict ordering requirement is only upheld when a program does not violate the exceptions to the model. They are:
- Undefined Behaviour
- Concurrent processing
- Time
- Volatile
Once a program enters a state of undefined behaviour, the implementation is not required to uphold any requirements whatsoever, so all bets are off. Some UB can have very subtle effects that affect ordering only in some specific circumstances, on some implementations. This is why undefined behaviour that "seems to work on my machine" is not dependable.
Any concurrent execution needs synchronization primitives (atomics, mutexes, semaphores...) to operate correctly. See the separate article on concurrency for details.
The C standard does not make any guarantees about execution time. No operation is guaranteed to be faster than another, and there are no maximum or minimum guarantees for execution times. The vast majority of implementations tend to want to execute the program as fast as possible, but there is no guarantee that implementations will do so.
Any operation on volatile memory is guaranteed to be executed in order, but how this is actually accomplished is platform dependent. If you write to volatile A before volatile B, the processor will do so, but there is no guarantee that other processes, CPUs, or other hardware will be made aware of this in that order. This is all platform specified. The C standard can guarantee that if you write to one network card before the other, the CPU will issue the writes in that order, but it can't guarantee which network card will first receive the message and send it to the network.
C does have semantics for CPU-to-CPU synchronization, but does not have semantics for communication between the CPU and other hardware. volatile is a catch-all solution to this, and therefore volatile is, and has to be, implementation and platform defined.
volatile does have ordering semantics, but it does not have atomicity or release/acquire semantics. Writing to a 64 bit volatile value may on some architectures be converted into two 32 bit operations; this can result in a "torn write", where another CPU or hardware device may access a half written value. This is why you always need to consult the platform's documentation to see what guarantees are given for volatile.
Somewhat ironically, in C the only thing you can depend on is observable behaviour, you make something observable by making it volatile, and all details of how volatile operates are platform dependent, because it interacts with the platform.
This document is an early draft of a technical report written by the ISO WG14 Undefined Behavior Study Group.
It is an educational document that tries to explain the concept of "undefined behavior" in the C programming language. It is the combined effort of the ISO WG14 Undefined Behavior Study Group to clarify the term and its implications.
ISO C defines undefined behavior (UB) in Section 3.4.3 as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements

Note 1 to entry: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Note 2 to entry: J.2 gives an overview over properties of C programs that lead to undefined behavior.

Note 3 to entry: Any other behavior during execution of a program is only affected as a direct consequence of the concrete behavior that occurs when encountering the erroneous or non-portable program construct or data. In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an operation with undefined behavior in the execution of the program.
Inherent to the ISO specification of the C programming language is the concept that a set of behaviors are undefined. From this the specification derives several strengths as well as several weaknesses. UB allows a platform to either
define platform-specific behaviors or ignore the possibility of an erroneous state. The language does not require a platform to detect these errors.
Undefined behavior is used in many places in the C standard, for several reasons, such as:
- requiring one specific behavior would penalize hardware architectures that implement an alternative behavior.
- detecting an erroneous state can be difficult or costly, so the standard cannot require an implementation to detect such a state.
Undefined behavior can either be explicitly specified in the standard or remain implicit if the standard does not define a behavior. The C standards body has a goal to document all UB in the C standard, but identifying all UB is a difficult and laborious task. The standard states that the rules for undefined behavior extend to behavior that is not specified by the standard.
Additionally, there are paragraphs in the standard where it is unclear whether a behavior is defined or not. This can mean that some platforms treat a behavior as defined while others treat it as undefined. For example, the standard states that the first member of a struct has a zero offset from the struct itself. Some argue that this means that the first member of the struct therefore must have the same pointer address as the struct, while others argue that it is undefined whether the struct has the same address as its first member, as the standard does not explicitly resolve this ambiguity.
Beyond undefined behavior, the C standard defines a range of terms for behaviors, such as unspecified behavior, implementation-defined behavior, and locale-specific behavior. Unlike undefined behavior, each of these terms defines a constrained behavior that the implementation has some form of responsibility to uphold, even if it may differ between implementations.
All of these are different from undefined behavior in that, while they may produce different behaviors on different implementations, they represent behaviors that a user can depend on in an ISO-compliant C implementation.
The C standard states that any platform is free to detect UB and to provide platform-specific behavior and document this behavior if it wishes. In this sense, what is in strict ISO C terms "UB" may be well-defined behavior on a particular implementation.
This can be very useful, because it enables implementers to extend C's capabilities, and thereby grants users access to platform-specific features. While the C language is designed to enable cross-platform development, developers are free to only support a limited set of platforms. For example, there are implementations of C that do define the behavior of out-of-bounds array writes, signed integer overflow, and dereferencing null pointers.
For brevity, unless otherwise noted, this document will consider UB only in cases where the implementation has not defined a platform-specific behavior or implementation-specific behavior.
Consider the following code:
int a[5];
a[x] = 0;
What should happen if x is 42? A language design could issue an error, exit the program, or resize the array, among other choices. However, any of these choices would require the implementation of the language to perform a test to see if the value is within the valid range of the array.
int a[5];
if (x < 0 || x >= 5) {
    /* Handle out-of-bounds write */
} else {
    a[x] = 0;
}
This range check would add work for the compiler and execution environment. Adding any requirement to detect if the assignment is out of bounds would come at a cost in run time performance, and complexity. Not only would the implementation have to check each access to the array, but it would also have to keep track of valid array ranges.
C is designed to be fast, simple, and easily implementable; this is why C does not require any detection of out-of-bounds states. Consequently, C cannot define a behavior for a state that isn't detected. The behavior must be undefined.
It is a common misconception that all undefined behavior in the standard stems from oversights, or from the standard body's failure to agree on an appropriate behavior. The above example clearly shows that it is not practical to define any consistent behavior for out-of-bounds array access without imposing a considerable burden on the implementation to detect the state. The cost of detecting an erroneous state prevents the language from defining any behavior should it occur.
Furthermore, if the standard were to require a program to exit on an out-of-bounds write, then the following piece of code would become a valid way to exit a program:
int a[5];
a[24] = 0;
This is not a good way to deliberately exit a program. It is preferred that a program exit in a manner that the standard explicitly documents as exiting, such as by calling a function named `exit`.
Reconsider this code:
int a[5];
a[x] = 0;
Another interpretation of the above code is that if there are no requirements for an implementation to handle an out-of-bounds access, then the code contains an implicit contract that `x` can only be between 0 and 4. The implementation can then assume that the user is aware of the contract and consents to it, even
if the implementation cannot by itself determine that the contract is valid by analysis of the possible values `x` may hold. The implementation therefore need not check the value of `x`.
If the user cannot guarantee that `x` is within range, they can rewrite the code:
int a[5];
if (x >= 0 && x < 5)
    a[x] = 0;
One big reason that many behaviors are undefined is that detecting these undefined behaviors may be difficult to do at compile time, or it may impose too much of a performance penalty at run time.
The existence of undefined behavior implies conversely that when a program has no undefined behavior, its behavior is well-specified by the ISO C standard and the platform on which it runs. This is a promise or contract between the ISO C standard, the platform, and the developer. If the program violates this promise, the result can be anything, and is likely to violate the user's intentions, and will not be portable. We will call this promise the "Assumed Absence of UB".
A C program that enters a state of UB can be considered to contain an error that the platform is under no obligation to catch or report and the result could be anything.
Consider this code:
x = (x * 4) / 4;
From a mathematical perspective, this operation should not change the value of x. The multiplication and the division should cancel each other out. However, when calculated in a computer, x * 4 may result in a value that may not be expressed using the type of x. If x is an unsigned 32-bit integer with the value 2,000,000,000 and it is multiplied by 4, the operation could wrap on a 32-bit platform and produce 3,705,032,704. The subsequent division by 4 will then produce 926,258,176. Since the standard declares that operations on unsigned integers have defined wrapping behavior, the two operations do not cancel each other out.
If we instead perform the same operation using signed integer types, things might change because signed integer overflow is UB. By using a signed integer, the programmer has agreed to the contract that no operations using the type will ever produce overflow. Therefore, the optimizer is free to ignore any potential overflow, and can assume that the two operations cancel each other out. This means that there is a significant optimization advantage in declaring that signed integer overflow is UB.
The assumption that the program contains no UB is a powerful tool that compilers can employ to analyze code to find optimizations. If we assume that a program contains no UB, we can use this information to learn about the expected state of the execution. Consider:
int a[5];
a[x] = 0;
If x is any value below 0 or above 4, the code contains UB. On many platforms, `a[-1]` and `a[5]` would be assigned to addresses outside the bounds of `a`. Without requiring implementations to explicitly add bounds checks, it becomes impossible to predict the side effects of an out-of-bounds write. The implementation is therefore allowed to assume that UB will not happen. This phenomenon is known as "Assumed Absence of UB", and it lets compilers make further deductions. By writing the above code, the programmer respects a contract with the compiler that `x` will never exceed the bounds of the array.
If we consider:
int a[5];
a[x] = 0;
if (x > 5) {
    // ...
}
In this case, since the compiler assumes `x` must be between 0 and 4, the if statement cannot possibly be true. This allows the compiler to optimize away the if statement entirely. This completely conforms to the standard, but it removes some predictability of UB, and can make programs with UB much harder to debug. The out-of-bounds write no longer causes a predictable wild write and it also causes an `if` statement to be removed.
A common bug is to try to detect and avoid signed integer overflow with code like this:
if (x + 1 > x) {
    x++;
}
If we assume that UB cannot happen, then we must assume the `if` condition must always be true. Consequently, many compilers will optimize away the `if` statement entirely.
The confluence of UB and more aggressive but standards-compliant compiler optimizations exposes latent bugs that may otherwise behave according to user intentions. These bugs are characterized as hard to find and diagnose. These bugs often do not appear at lower optimization levels. This means that such bugs do not appear in executables that developers produce during development. Consequently, these bugs can bypass many tests. Debuggers tend to operate on executables compiled with lower optimization settings, where many of these issues do not show up. This makes it harder to find and fix these bugs.
An early example of a vulnerability arising from such aggressive optimization is [CERT vulnerability 162289](https://www.kb.cert.org/vuls/id/162289).
A common consideration when discussing UB is the question of when UB is invoked. While some have argued that programs that are able to produce UB have no requirements whatsoever, it is the position of the WG14 UB Study Group that a program must first reach a state of UB before the requirements of the language standard are suspended. This view is shared by implementers, who have had a history of classifying instances where this isn't true as compiler bugs.
Consider the following:
int a[5], x;
scanf("%i", &x);
a[x] = 0;
In this example, a user-provided index is used to access an array of five elements. While this program may be bad form, it is well-defined until and unless `scanf` sets `x` to outside the range of the array. The developer has (implicitly) guaranteed that the index used to access the array will stay within the bounds of the array, but this guarantee is maintained outside of the program. Many programs depend on input strictly conforming to a set of requirements to operate correctly. While this may present safety and security issues, the developers must weigh those considerations against other factors, such as performance. Even a strictly-conforming program could enter a state of UB under some environmental circumstances. A program is only erroneous when it reaches UB. An implementation is not released from complying with the ISO C standard because UB is possible when executing that program; the implementation is released only once the program has entered a state of UB.
A core tenet of the C standard is the "as-if" rule. This rule states that an implementation is not required to operate in the way the program is strictly written, so long as the implementation's observable behavior (defined in C23, s5.1.2.3p6) is identical to the program. The program must behave, but not operate, as if the written program was executed.
This means that the actual program behavior can vary radically depending on how an implementation is able to transform the program, as long as its observable behavior remains constant. For example, two non-observable operations can be reordered. Consider:
int a, b;
a = 0;
b = 1;
These are two non-observable assignments (because neither a nor b is `volatile`). As two independent operations they are not required to be executed in any particular order. They may in fact be executed concurrently. If we then consider:
*p = 0;
x = 42 / y;
These two operations are also non-observable operations, however both operations can produce UB (either by `p` pointing to an invalid address, or by `y` producing a divide by zero). Because the operations are non-observable, they may be reordered. If `y` is zero, there is no guarantee that `*p` is written before the program enters a state of UB.
Because any non-observable operation can be reordered and transformed, a program might reach a state of UB in an ordering not explicitly expressed in the source code. Due to the assumed absence of UB, and the "as-if" rule, a program can show symptoms of UB before any actual UB is encountered during program execution. Consider:
int a[5];
if (x < 0 || x >= 5)
    y = 0;
a[x] = 0;
Using assumed absence of UB, the implementation can determine that `x` must be a value between 0 and 4, and therefore the `if` statement can be removed. This causes an out-of-order behavior known as "time traveling UB", where a program bug causes unintended consequences before the UB is encountered during program execution. It is as if the UB traveled backwards in time from the array access to the if statement.
Time traveling UB is permitted if it does not interfere with observable behavior that occurs before entering a state of UB. Consider:
int a[5];
if (x < 0)
    y = 0;
if (x >= 5)
    printf("Error!\n");
a[x] = 0;
In this case, the call to `printf` is an observable event, and any re-ordering requires it to execute correctly unless it is preceded by a state of UB. The compiler is not permitted to optimize away the second if statement. The first if statement however has no impact on the observable behavior and can therefore be removed.
Note: Historically, there have been cases where time travel has impacted observable state. Implementers have generally considered these to be implementation bugs. To clarify that they indeed are bugs, the document [N3128 Uecker] was proposed and accepted for C23. It adds the non-normative third note that clarifies the issue in the standard.
Consider this code:
int a[5];
a[42] = 0;
Every time this code runs, it will produce UB. The state of UB does not depend on any dynamic or external factors other than the code being executed. We choose to define this type of UB as "static UB", because it only depends on variables that are known at compile time. The term "static UB" is somewhat complicated because different implementations have differing abilities to detect UB at compile time. Consider:
int a[5];
if (x > 0) {
    y = 42;
} else {
    y = INT_MAX;
}
a[y] = 0;
This code also contains static UB but requires a more complex analysis to reach that conclusion. The term "static UB" denotes any UB that is not dependent on runtime state. An implementation is under no obligation to detect static UB, but if an implementation does detect static UB we have recommendations for how to proceed. Static UB denotes expressions that always produce UB even if it's not proven that the expression will ever be evaluated.
Any statement that produces a state of UB (with the exception of the `unreachable()` macro) is erroneous, unless an implementation has defined its own behavior for that statement. An implementation is under no obligation to detect any UB. If, however, the implementation doesn't detect static UB, it is free to assume the statement will not produce UB. Therefore any static UB (again, excepting `unreachable()`) should be considered a developer error and not an intended use of the language. In these cases, an implementation should issue an error with an appropriate diagnostic when it detects UB.
An implementation can assume that a program will not enter a state of UB, but no implementation should assume that a program that reaches a state of UB is intentional.
Consider again:
int a[5];
a[x] = 0;
The assignment may or may not produce UB. In this case, if we follow the rule of "assumed absence of UB", we can assume that `x` must be between 0 and 4. The statement performs an assignment, but it also provides a hint to the compiler as to what values `x` may hold. If we then add:
int a[5];
a[x] = 0;
if (x > 4)
    ...
The if statement here can be considered dead code and optimized away. The if statement doesn't produce UB, it just cannot happen without UB. If we instead consider:
int a[5];
if (x > 4) {
    a[x] = 0;
}
Again, this code may or may not trigger UB, but if the assignment is ever executed it is guaranteed to trigger UB. (Note that an implementation is not required to detect the UB). In other words, the UB is static, but only if the assignment is executed.
The correct interpretation of the detected static UB is that the code is erroneous. It is incorrect to interpret the above code as a valid way for the user to express that `x` is 4 or less. The "assumed absence of UB" rule only applies to the way a construct can be assumed to be executed, not that a construct that always produces UB will never be executed. One divided by `x` lets the compiler assume `x` is not zero, whereas `x` divided by zero should cause the compiler to assume an unintended user error.
The one exception to this is the `unreachable()` macro. The `unreachable()` macro is the only way for a user to express that a statement can be assumed to never be executed. Incidentally, executing `unreachable()` is UB, but it should not be regarded as equivalent to other UB in this regard.
For example:
if (x > 4)
    unreachable();
This is a correct way to express that a compiler can assume that `x` is smaller or equal to 4. Despite `unreachable()` being UB, it is not equivalent to:
if (x > 4)
    x /= 0;
Division by zero is UB, but unlike `unreachable()`, it is assumed to be a user error. The `unreachable()` macro can therefore not be implemented by the user by producing UB in some way other than the `unreachable()` macro. UB is also erroneous even when it can be determined never to be executed. The following can be detected as erroneous:
if (0)
    x /= 0;
C is designed to make naive, as well as highly optimizing implementations possible. The C standard therefore places no requirements or limits on the efforts an implementation takes to analyze the code. Whichever erroneous UB may be detected will therefore vary between implementations.
Operating systems and even hardware have been designed to mitigate the side effects of unintentional UB, or deliberate sabotage using UB, with features such as protection of the memory containing the executable or execution stack. Due to some of these protections, some UB is predictably caught at run time. This mitigates the unpredictable nature of UB and improves the stability and security of the system. However, this can also give the false impression that some UB has predictable side effects. While dereferencing null pointers is technically UB, doing so has a very predictable outcome (a trap) on many platforms. Even if the behavior of dereferencing null is reliable on a platform, the compilers' assumption that the code will not dereference null will make it unreliable.
Some UB was initially included in the C standard because the standard wanted to allow for different platform designs. Over the years, some designs have grown so dominant that few developers will ever encounter a platform that does not conform to these dominant designs. One example of this is two's-complement arithmetic, which causes signed integer overflow to wrap.
This means that many UBs have predictable behavior on most platforms:
| UB | Convention |
|----|------------|
| Dereferencing null pointer | Traps |
| Signed integer overflow | Wraps |
| Using the offset between 2 allocations | Treats pointers as integer addresses |
| Comparing the pointer to freed memory with a newly allocated pointer | Treats pointers as integer addresses |
| Reading uninitialized memory | You get whatever is there |
Such behavior is not defined by the C standard but can seem to be predictable. Predictability is of great value to most developers. The knowledge of how the underlying platform operates lets the developer predict and diagnose bugs. A trapped null pointer dereference is easy to find in a debugger. In fact, a programmer may deliberately add a null pointer dereference to a program to invoke a core dump. In MSVC debug builds, uninitialized heap memory is initialized to 0xCDCDCDCD, a [magic number](https://en.wikipedia.org/wiki/Magic_number_(programming)) that is instantly recognizable to any experienced Windows programmer. If the sum of two large positive signed integers results in a negative value, a wise programmer will suspect signed integer overflow that happened to wrap.
This apparent predictability of many types of UB hides the fact that UB is not predictable. This causes many programmers either to not realize that some of these behaviors are undefined, or to confuse UB with implementation-defined behavior. They may believe that UB is defined in the C standard, or they may acknowledge that UB is non-portable while still assuming that the behavior of their platform applies to all platforms. This faulty assumption creates a variety of hard-to-diagnose issues that we will explore further.
An out-of-bounds write may have a wide range of consequences as it can disturb many kinds of state. However, most developers would assume that an out-of-bounds write is executed as a write operation, which is not true in general. If we consider another UB such as signed integer overflow, it is even less predictable that a simple arithmetic operation can have a wide range of unpredictable outcomes.
Undefined behavior in C gives an implementation wide latitude to optimize the code. This freedom has enabled implementers to successively generate faster and faster machine code, which enables significant reduction in computing time and energy consumption for a wide range of workloads. C is the de facto benchmark for efficiency that other languages are compared against and strive to match.
Significant portions of UB, such as Aliasing, Provenance and Overflow are specifically designed to enable implementations to make optimizations. Violating these categories of UB is likely to cause unpredictable behavior only when an implementation engages with these opportunities to optimize code.
As many implementations support varying levels of optimizations, a perception has formed in parts of the C community that compilers, at higher levels of optimizations, ignore the C standard and "break" code. This is a misconception. Most C implementations are consistent with the C standard even at the highest levels of optimization settings. Optimizations reveal existing bugs in the source code much more often than they reveal bugs in the compiler. These bugs are usually in violation of the C standard even when the program operates consistently with the developers' expectations.
The higher a level of optimization is employed, the more bugs are exposed, but as the code is further transformed, it also becomes harder to debug. Many tools like debuggers depend on low levels of optimizations to be able to correctly associate the binary's execution to the source code. This compounds the difficulty of diagnosing UB bugs.
Given the misconception that optimizations break code, rather than reveal latent bugs, implementers often unfairly get blamed for issues arising from UB. This has made many compilers avoid making certain optimizations, even when supported by the specification, if they anticipate a user backlash. This creates a gray area, where unsound code that contains UB may have an undocumented reliable or semi-reliable behavior. This gray area comes at the cost of denying performance afforded by the standard to compliant code.
C is regarded as an "unsafe language". This is, in the strictest sense, not true. The C standard does not require an implementation to check for several errors, but it also does not prevent an implementation from doing so. Hence, each implementation may choose the level of safety guaranteed.
In practice, C is an unsafe language because the most popular implementations of C choose not to make many additional guarantees, but instead choose to prioritize performance and power efficiency. As such, C is perceived as a de facto unsafe language because that is how most users have chosen to use it.
There are safer implementations, but these are predominantly used to detect issues during development rather than to add additional protections to deployment. One such implementation is [Valgrind](https://valgrind.org/), whose default tool "memcheck" detects out-of-bounds reads and writes to memory on the heap, as well as uninitialized reads, use-after-free errors, and memory leaks. Valgrind achieves these safety constraints at a significant performance cost. Many implementations such as GCC, LLVM and MSVC offer various tools for detecting and diagnosing UB. Several static analyzers also exist to alleviate this problem.
Users can also write their own memory tracking shims to detect small out-of-bounds writes, double frees, memory consumption and memory leaks, using
macros:
#define malloc(n) my_debug_mem_malloc(n, __FILE__, __LINE__) /* Replaces malloc. */
#define free(n) my_debug_mem_free(n, __FILE__, __LINE__) /* Replaces free. */
While not in any way mandated by the C specification, the prevailing modus operandi of C users consists of using safety-related tools to detect issues during development, rather than as backstops during deployment. A major drawback of this approach is that since UB is a state that often cannot be definitively detected until it occurs at run time, there is no easy way to definitively guarantee that a program will not enter a state of UB.
Despite this, it is worth noting that some of the most trusted software in the world, like the Linux kernel, Apache, MySQL, Curl, OpenSSL and Git are written in C. The simplicity of C makes it significantly easier to read and detect issues.
C does suffer when the standard is unclear, particularly in areas of the memory model and concurrent execution. Rules about aliasing, active type, thread safety, and volatile leave a lot open to interpretation as to what is UB and what is not. On many of these issues there is a lack of consensus within WG14. Most implementations do support behaviors that in the strictest reading of the standard would be considered UB, simply because of user expectation and to be able to compile important existing software. In this sense most implementations deviate from the standard, but how and how much they deviate varies. Some projects, like the Linux kernel, have explicitly opted out of these ambiguities and defined their own requirements.
As this document has hopefully illustrated, Undefined Behavior in the context of C is complex. To simply say that its behavior has been omitted from the standard does not convey this complexity.
C is designed to be a language that trusts the developer. In the case of UB, developers should interpret this to mean "Trust the developer not to initiate UB", rather than "The developer can trust UB if they know the underlying implementation and platform". The Undefined Behavior Study Group therefore strongly advises developers to avoid any UB, unless a platform has explicitly defined that behavior. Testing to determine what observable effect use of a nonportable or erroneous program construct has on your platform is insufficient cause for assuming the UB will consistently have the same behavior on all platforms, including the next one that your code will run on. Only trust an implementation's explicit documentation of a language extension that defines a behavior. We advise that implementations clearly document any language extensions that replace undefined behavior so that users can differentiate between such extensions and seemingly predictable but still unintended behavior.
A computer language is a tool for humans to communicate with computers, but it is also a tool for computers to communicate with humans. Humans spend more time reading the code they write and trying to figure out why its behavior does not match their expectations, than computers do. Traditionally implementations have been black boxes that users must rely on, without understanding how they operate. UB shows that this approach causes issues, because modern compilers do not operate like many users expect them to. We would therefore recommend that implementations try to find ways to be more transparent with their transformations. The ability for users to inspect code that has been transformed could reveal out-of-order issues, code removal, load/store omissions and other non-obvious transformations. We recognize that this involves significant user interface and architectural challenges.
This document was written by Eskil Steenberg Hald. It is the result of many invaluable discussions in the Undefined Behavior Study Group and ISO WG14, so many of its members deserve credit for its creation. Specifically, the author wants to thank David Svoboda, Chris Bazley, and Martin Uecker for providing feedback, editing, and suggesting improvements.
`if`, `for`, `while`, `do`, `goto`, `break`, `continue` and `return` are all dependable.
However, there are limits to what you can do inside an `if`, `for` or `while` statement. C99 allows declaring a variable in the first clause of a `for` statement:
for (int i = 0; i < 10; i++)
This is not legal in C89 and is therefore not always dependable. The C99 ability to declare variables does not extend to other flow control. All of these are illegal:
for (i = 0; int x = i < 10; i++)
if (int x = i < 10)
while (int x = i < 10)
switch (int x = i < 10)
`for` loops are often explained as equivalent to `while` loops, like this:
for (<statement0>; <statement1>; <statement2>)
{
    ...
}
Is equivalent to:
<statement0>
while (<statement1>)
{
    ...
    <statement2>
}
This is true in C++ but not in C, because statement0 cannot define a type inside a `for` loop in C, although a type can be defined before the loop.
`auto` is an obscure keyword that indicates that a variable has "automatic storage duration". That is the default for variables in function scope, and `auto` cannot be used on variables outside function scope (although some C compilers allow it).
Unfortunately, `auto` has gone from pointless to dangerous in C23. In C23, `auto` was given a new meaning: it infers the type of a variable from its initializer:
auto x = 0.0;
In C23, the above code makes `x` a variable of type double, since 0.0 is a double. This feature is not dependable. In fact, in older versions of C you did not need to specify a type at all; a declaration without a type defaulted to int, so the above in C89 would make `x` an int. This "implicit int" default was removed in C99, but almost all compilers still accept it with a warning. This means that recent compilers will compile this as an int with a warning, while even more recent compilers (supporting C23) will silently compile it as a double.
Therefore any use of the keyword auto should be considered not dependable.
Any keyword starting with a _ (underscore) is reserved and should not be used.
The following keywords are used by later C versions and should therefore be avoided:
- true
- false
- null
- alignas
- alignof
- bool
- constexpr
- inline
- nullptr
- static_assert
- thread_local
- typeof
- typeof_unqual
Many implementations support extensions that reserve the following keywords:
- asm
- fortran
The following is a list of keywords used by C++. While they can be used in Dependable C, if possible they are best avoided for clarity.
- and
- and_eq
- atomic_cancel
- atomic_commit
- atomic_noexcept
- bitand
- bitor
- catch
- class
- compl
- concept
- consteval
- constexpr
- constinit
- const_cast
- contract_assert
- co_await
- co_return
- co_yield
- decltype
- delete
- dynamic_cast
- explicit
- export
- friend
- mutable
- namespace
- new
- noexcept
- not
- not_eq
- operator
- or
- or_eq
- private
- protected
- public
- reflexpr
- reinterpret_cast
- requires
- static_cast
- synchronized
- template
- this
- throw
- try
- typeid
- typename
- using
- virtual
- xor
Identifiers with special meaning:
- final
- override
- transaction_safe
- transaction_safe_dynamic
- import
- module
- pre
- post
- trivially_relocatable_if_eligible
- replaceable_if_eligible
All modern architectures are converging on little endian. Little endian is simply objectively (yet unintuitively) better. We recommend that applications are written with a "little endian first" design. In other words, files, network protocols, and other digital representations should be little endian, because most hardware is little endian. Then, on top of this, you can add code that swizzles to and from non-little-endian representations where needed.
Endianness is complicated by the fact that byte swapping can be done in many different ways. Some platforms swap every 16 or 32 bits, so the bytes ABCDEFGH can be ordered as: HGFEDCBA, DCBAHGFE, or BADCFEHG.
A 32 bit byte swap can be written like this:
unsigned long endian_swap32(unsigned long data)
{
	return ((data << 24 ) & 0xFF000000 ) |
	       ((data << 8 ) & 0x00FF0000 ) |
	       ((data >> 8 ) & 0x0000FF00 ) |
	       ((data >> 24 ) & 0x000000FF );
}
Why little endian is better: read the number 1337, "one thousand, three hundred and thirty seven". The first numeral is one, but we can't just say one, we also have to count the following numerals in order to know that it means one thousand, and not one hundred, ten, or some other number starting with a one. If we instead read from right to left, "seven, thirty, three hundred and one thousand", we would immediately know that the first numeral is seven and means seven, not seventy or seven thousand, without parsing the rest of the number. Read from right to left, the first numeral always means the same thing, regardless of how many numerals follow it. Once we have read the first numeral we can read the second, and we know what order of magnitude it has because it is one order of magnitude higher than the last one. This problem becomes very obvious when humans read very large numbers, where we have to count the numerals before we can pronounce them. (It once hit me that Arabic is written from right to left, and since we use Arabic numerals, perhaps this is just a flaw in the Latin adoption of Arabic numbers: we simply read them in the wrong order, because we are used to reading from left to right. But no, in Arabic the numerals are also read and pronounced from left to right, so they are equally wrong as we are.)
Both float and double can be considered dependable and are well supported in most C implementations. There are however a few things to consider.
Some small embedded platforms do not have FPUs and may choose to emulate floating point in software, support only 32 bit floats, or not support floating point operations at all. So in some cases it can be useful to avoid needlessly using floating point types. If you for instance implement a small library for a file format, network protocol, or compression, keeping the library entirely free from floating point operations (even in instrumentation such as benchmarking code) makes it more portable.
While pretty much all floating point implementations use the IEEE 754 representation of floating point numbers, it's worth pointing out that different hardware implementations (sometimes from the same vendor) can yield different results due to how they handle rounding. This means that executing the exact same instructions with the exact same input data can yield different results on two different machines running the same executable. Floats are therefore not reliable for lockstep synchronization.
Floating point arithmetic should never be relied upon to produce exact results that can be compared with other values. Here are some examples where x and y may not be equal:
x = (y * 2.0 ) / 2.0 ;
x = a / 2.0 ;
y = a / 2.0 ;
The compiler may fold some operations at compile time using one implementation, while other operations are not folded and gets computed at execution time using a different implementation.
As a general rule, it is only safe to == compare floating point values that have been assigned, not values that have been computed; it is also safe to compare a floating point value to itself in order to detect a NaN state.
x = 6.0 ;
...
if (x == 6.0 ) /* safe to test */
x = 6.0 / 2 ;
...
if (x == 3.0 ) /* not safe to test */
x = 1.0 / y ;
if (x == x ) /* a safe way to test for NaN: this comparison is false only when x is NaN */