Understanding and Preventing Overflow (I Had Too Much to Add Last Night)Posted by Jason Sachs on Dec 4 2013Happy Thanksgiving! Maybe the memory of eating too much turkey is fresh in your mind. If so, this would be a good time to talk about overflow. In the world of floating-point arithmetic, overflow is possible but not particularly common. You can get it when numbers become too large; IEEE double-precision floating-point numbers support a range of just under 21024, and if you go beyond that you have problems: 2^10.0000 = 1024 2^100.0000 = 1.26765e+30 2^1000.0000 = 1.07151e+301 2^1020.0000 = 1.12356e+307 2^1023.0000 = 8.98847e+307 2^1023.9000 = 1.67731e+308 2^1023.9999 = 1.79757e+308 2^1024.0000 ---> (34, 'Result too large') 21024 is a really, really big number. You probably won't ever need numbers like 21024 unless you're working with combinatorics. Just for comparison, here are some ranges of numbers found in physical quantities:
So double-precision numbers should be pretty safe. Single-precision floating-point numbers go up to just below 2128 and are "vulnerable" to overflow in very large physical calculations. It's more likely you wouldn't use them because of precision, rather than range. Those of us in the embedded systems world usually don't have the luxury of working with floating-point calculations. It costs extra, either in CPU execution time, or in $$$. We're resigned to using integer arithmetic. (Some of us actually like it!) So the rest of this article is about overflow in integer arithmetic. What follows may surprise you. Sources of overflowThe usual suspects: addition and multiplicationWhat can cause overflow in integer math? The usual suspects are addition and multiplication: Each integer format has a representable range:
And if you go outside this range, even temporarily, you need to be very careful. Most environments handle overflow gracefully, and give you "wraparound" or "modulo" behavior (32767 + 1 = -32768 in a signed 16-bit environment) where carry bits outside the range just disappear and you're left with the low-order bits corresponding to the exact result. If you're programming in C with a compiler that meets the C99 standard, you may be surprised to learn that unsigned integer math is guaranteed to have modulo behavior under overflow conditions, but the behavior of overflow with signed integer math is undefined. To would-be language lawyers, the relevant sections in C99 are: Section 6.5 item 5:
Section 6.2.5 item 9:
(In practice, modern compilers such as gcc and clang tend to use modulo behavior anyway for basic arithmetic calculations, but be careful, as sometimes compiler optimizations will treat comparisons or conditional expressions incorrectly even though the basic arithmetic is "correct". Symptoms of this behavior can be really tricky to recognize; the LLVM website has some explanation of this, along with other bugbears of undefined behavior. If you want to be safe, use the The historical reason for this little quirk in the C standard, is that in the olden days some processors used ones-complement representation for signed numbers, in which case arithmetic for signed and unsigned numbers is slightly different. Nowadays twos-complement is the norm, and the implementations of addition, subtraction, and multiplication are the same for signed and unsigned when using base word lengths for the results. But one of the main principles of C, for better or for worse, is to allow machine-specific behavior to maintain efficient code, so the standard was written with no guarantees for the results of signed arithmetic overflow. Languages like Java, on the other hand, have machine-independent definitions of arithmetic. Any arithmetic operation in Java will give you the same answer no matter which processor you are running it on. The price of this guarantee is that on some processors, extra instructions will be necessary. By the way, if you're using C for integer math, be sure to If you are using the fixed-point features of MATLAB, beware that the default behavior of integer overflow is to saturate at the limits of the integer range. This avoids some of the problems of overflow, but it may not give you the results you expect, if you're used to languages that provide wraparound semantics. Preventing overflowJust because we can use If I am controlling a valve, and I want the output to increase, and I keep adding 1 to a process control variable, so that I get 32765, 32766, 32767, -32768, -32767, etc. this creates a jump discontinuity which is BAD. The only machine-independent method of preventing this is to stay out of overflow. One approach is to upcast to a larger type size, check for overflow, and saturate the results: You could also do something like this to stay within the range a 16-bit type, but it gets a little tricky, uses more operations, and I'm not 100% confident I've got my code correct: Handling multiplication is similar, and is really only practical using the type-widening approach. Subtraction?There's a bug in the C code below; can you see why? The problem is with the subtraction. If First, we can use a 32-bit intermediate calculation:
This has the drawback that the more operations we add, the larger the range of output values becomes. But as long as we don't run into overflow in intermediate calculations, we're fine. Second, we can saturate the error to the range of an Third, we can modify the operation we're doing so the results are within limits:
This reduces the net gain but avoids both overflow and saturation. Division?Division is the ugly beast among the arithmetic operations. Fortunately, the overflow cases aren't too complicated. If you are working in C, where division means taking an N-bit number and dividing by another N-bit number, there's only one example that can cause overflow, which we'll get to a little bit later. (And it's not divide-by-zero. If you're dividing n / d without excluding the case of d=0 first, you deserve whatever you get.) Some processors have built-in division facilities, and compilers support these with "builtin" or "intrinsic" functions that directly map into the processors' division feature. Here you can do things like divide a 32-bit integer by a 16-bit integer to yield a 16-bit result, but you'll need to make sure you avoid dividing by a number that would create a quotient that won't fit in a 16-bit result. For example: 180000 / 2 = 90000, and 90000 is out of bounds for 16-bit numbers, both signed or unsigned. Shifts?Don't forget your shift operators
That's right! If you have a signed integer E1 containing a negative number, and you shift right, the results are implementation defined. The "sensible" thing to do is an arithmetic shift right, putting More recent languages like Java and Javascript give you two shift right operators: the regular Is that it? Whew!No, that's not it! What else is there? Well, there are the increment and decrement operators The rest of the overflow items fall into what might be called Pathological Cases. These all involve asymmetry between We talked about divide earlier, and the one integer divide overflow is when you take -32768 and divide by -1. D'OH! We should have gotten +32768 but that just won't fit into a 16-bit signed integer, so it aliases to the The same thing happens with unary minus: Then there are the fixed-point multiplication flaws along the same line. Let's say you're doing Q15 math, where integers represent numbers of the form Does this work properly without overflow? Well, it does, except for one single case. How do we deal with this?Q15 multiplicationWell, the Q15 multiplication is a tough one. If you are absolutely certain you won't ever evaluate -32768 × -32768, go ahead and use the usual C code as is. One example is PI control loops where the gains are always nonnegative. Or if you know the range of one of the numbers doesn't include -32768 then you're fine. Alternatively, if you're willing to have an extra shift right, you can do this: The shift-right-by-16 operation is usually faster in embedded systems than a shift by 15 by one instruction cycle, since it often maps to a "grab the high word" operation in assembly, which you usually get for free when storing memory. This operation represents Other than that, you need some way of saturating the intermediate product so it is safe to shift right: where Unary minusThe unary minus case is somewhat similar. We have a few alternatives: If we are absolutely sure that the input cannot be -32768, we are fine leaving it as is. Otherwise, we have to test for -32768 and convert to +32767. Alternatively, in C we can use the bitwise complement operator The same idea applies to absolute value; instead of
we can use the bitwise complement operator:
Alternatively, we'd have to special-case the test for -32768. Just a reminder: But wait, there's more!Almost forgot one more thing! Everything we've discussed so far has been a single operation, and has included cautionary tales of woe. With compound calculations that consist of multiple operations, there's one thing we can use in our favor: If you have a compound calculation in fixed bit-width integer arithmetic (whether signed or unsigned) with modulo (wraparound) semantics, consisting only of these operations
(and not sign-extension, right shifts, or division), and the following conditions are true:
then an amazing thing happens, which is that the final result in fixed bit-width integer arithmetic will still be correct. In fact, if we do provide saturation arithmetic on intermediate results, then whenever the saturation activates, we're likely to get a final result which is incorrect. This kind of situation sounds kind of contrived and improbable, but it's really not, and shows up a lot in polynomial evaluation. And it results from the congruence relation we get from modular arithmetic. If an arithmetic operation on k-bit numbers, from the list above, produces a k-bit result that may overflow with wraparound (modulo) semantics, it creates a result that is congruent to the true result, modulo Let's go over a more concrete situation. Suppose we are computing Okay, that's still a bit abstract. Let's suppose we're computing For Now let's compute f(x) for
A few comments here:
Notice earlier I excluded right-shift from the list of "safe" operations, and I didn't mention anything about truncation, or about using two k-bit numbers to produce a 2k-bit product. Here I'll have to handwave a bit. The part about right-shift that is unsafe has to do with sign-extension: we assume that we know the sign of a number based on the most significant bit that is stored. If overflow can be present, then we can't make any assumption about the sign of a number. So if you don't sign extend, or don't do anything with bits of sign extension, then you maintain congruence modulo 2^k. If we multiply two k-bit numbers to produce a 2k-bit product, on the other hand, this is not an inherently "safe" operation with respect to using it in an intermediate calculation. Let's use k=16, with a = 1 + 65536 * m and b = 1 + 65536 * n, where m and n are unknown integers that we don't really care about, because all we need to keep for intermediate results is the bottom 16 bits, right? If we multiply the real values of a and b, we get (1 + 65536 * (m+n) + 65536 * 65536 * m * n). The bottom 32 bits of this result is 1 + 65536 * (m+n). Uh oh. If we want a 32-bit result, we really do have to know more bits. If we only care about the bottom 16 bits of the product, then we get 1, and we're ok. So in this example we have to be careful; the "we don't care if intermediate results overflow" rule really only applies to steps 3, 6 and 7 ( I wish there were an easier way to explain this to beginners. I wish there were an easier way for experienced engineers to analyze this stuff, because it's kind of hard to be rigorous and make sure you get it right. Maybe someday someone will invent a kind of "overflow calculus" that is easy to use, where we can just turn the crank and not think, and it will tell us where we can have problems and how to fix them. Until then, stay watchful. Further ReadingJohn Regehr at the University of Utah has a pretty interesting blog about embedded systems and static analysis and other whatnots. He has a few articles about undefined behavior in C and C++. He's also one of the authors of an interesting article called "Understanding Integer Overflow in C/C++" which includes a sobering list of software programs (including PHP, SQLite, PostgreSQL, OpenSSL, Firefox, GCC, LLVM, and Python) for which potential integer overflows were discovered. Summary
Hope you all had a happy Thanksgiving, and remember, only you can prevent overflow! 2013 Jason M. Sachs, all rights reserved. Rate this article: ![]() ![]() ![]() ![]() ![]() Rating: 4.5 | Votes: 2
![]() Jason has 17 years of experience in signal conditioning (both analog + digital) in motion control + medical applications. He likes making things spin. Previous post by Jason Sachs: Signal Processing Contest in Python (PREVIEW): The Worst Encoder in the World all articles by Jason Sachs |
|
来自: rechardzy > 《dsprelated》