4 Undefined Behaviors in C that You've Run into without Knowing It

 astrotycoon 2019-03-11

In many languages, when you are unsure of a particular detail of the language, you can often “just run it” and see what happens. In C, this will almost certainly bite you. It’s too easy to accidentally invoke “undefined behavior”, where your code might do one thing in one case but something totally different in another, without even getting a warning from the compiler.

Here are a few undefined behaviors you might not know about, along with the relevant section from the C standard (the quoted text is from C11). These aren’t just pedantic ramblings; they’re all cases that I’ve encountered on real projects out in the wild.

1. Integer Division by -1

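Pretty much every C programmer knows they should avoid dividing by zero. But there is another case where division is undefined: dividing the most negative value of a signed integer type by -1, such as INT64_MIN / -1 for int64_t or INT32_MIN / -1 for int32_t. In two’s complement, INT64_MIN is -2^63, so the mathematical result, 2^63, is one more than INT64_MAX and cannot be represented.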

Give it a try! :)

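#include <stdio.h>
#include <stdint.h>
#include <inttypes.h> /* for PRId64 */

int main(void)
{
  /* The same happens with int32_t and INT32_MIN. volatile stops the
     compiler from folding the division at compile time. */
  volatile int64_t numerator = INT64_MIN;
  volatile int64_t divisor = -1;
  int64_t result = numerator / divisor;
  printf("The result: %" PRId64 "\n", result);
  return 0;
}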

On most implementations, this will result in the same kind of error/exception as divide by zero. On x86, for example, the idiv instruction raises the same divide-error exception for both, which Linux delivers to the program as SIGFPE. But remember, this is not an “error”; it is “undefined behavior”! The runtime is just being polite when it throws the error. It really could do anything it wanted (return 0, exit silently, scream, make demons fly out of your nose) and still be fully compliant with the C spec.

When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a; otherwise, the behavior of both a/b and a%b is undefined.

  • Section 6.5.5, paragraph 6 – Multiplicative operators

2. Upcasting Pointers

Casting a void * or uint8_t * to a uint32_t * or to a struct some_big_struct * can be undefined.

Specifically, it’s undefined whenever the original pointer isn’t aligned at least as strictly as the target type requires. For uint32_t on typical platforms, where it needs 4-byte alignment, that means the address would have to be divisible by 4.

Even though misaligned casts are undefined, most C compilers will let you get away with them in most cases. But at higher optimization levels, you’ll probably start seeing crashes in certain cases. And they’ll be weird ones, like “that function works great on even elements of an array, but crashes on the odd ones.”

What happens is that, because casting from a pointer of weaker alignment is undefined, the compiler will just trust that you are not doing that and use an instruction that is much faster but requires stronger alignment. Then, when you don’t pass in a properly aligned pointer, the CPU itself will throw an exception, and your program will probably crash. (Again, this is all “undefined,” but this is just what happens in common implementations.)

In gcc and clang, there’s a command line option that will help flag these casts: -Wcast-align. It’s not included as part of -Wall or -Wextra. Note that gcc only warns on targets where unaligned accesses are not allowed, unless you use -Wcast-align=strict.

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

  • Section 6.3.2.3, paragraph 7 – Pointers

Note that you don’t even have to dereference the pointer to stumble into undefined behavior. The conversion itself is undefined.

3. Using Uninitialized Variables

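The usual assumption is that an uninitialized variable merely holds some unpredictable value. But actually, just reading an uninitialized automatic variable (one whose address is never taken) is itself undefined behavior, so the compiler is not obliged to treat it as any single consistent value.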

For example, given something like this:

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
  bool var; /* deliberately left uninitialized */

  if (var)
  {
    fputs("var is true!\n", stdout);
  }
  if (!var)
  {
    fputs("var is false!\n", stdout);
  }
  return 0;
}

With some compilers, at some optimization levels, you can get the output:

    var is true!
    var is false!

There is an excellent breakdown of why you might get this sort of behavior here.

Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, *an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue)*; this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), *and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined*.

  • Section 6.3.2.1, paragraph 2 – Lvalues, arrays, and function designators

4. Dereferencing a Null Pointer

People don’t usually think of dereferencing a null pointer as undefined behavior. They usually think of it as “causes a crash”. This is not always the case. For example, on my current project, if I dereference a null pointer, I just get the value stored in address 0. You don’t realize how awesome segfaults are until you work on a system that doesn’t have them.

The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

  • Section 6.5.3.2, paragraph 4 – Address and indirection operators

 
