分享
请选中您要保存的内容，粘贴到此文本框
Understanding and Preventing Overflow (I Had Too Much to Add Last Night)Posted by Jason Sachs on Dec 4 2013Happy Thanksgiving! Maybe the memory of eating too much turkey is fresh in your mind. If so, this would be a good time to talk about overflow. In the world of floatingpoint arithmetic, overflow is possible but not particularly common. You can get it when numbers become too large; IEEE doubleprecision floatingpoint numbers support a range of just under 2^{1024}, and if you go beyond that you have problems: 2^10.0000 = 1024 2^100.0000 = 1.26765e+30 2^1000.0000 = 1.07151e+301 2^1020.0000 = 1.12356e+307 2^1023.0000 = 8.98847e+307 2^1023.9000 = 1.67731e+308 2^1023.9999 = 1.79757e+308 2^1024.0000 > (34, 'Result too large') 2^{1024} is a really, really big number. You probably won't ever need numbers like 2^{1024} unless you're working with combinatorics. Just for comparison, here are some ranges of numbers found in physical quantities:
So doubleprecision numbers should be pretty safe. Singleprecision floatingpoint numbers go up to just below 2^{128} and are "vulnerable" to overflow in very large physical calculations. It's more likely you wouldn't use them because of precision, rather than range. Those of us in the embedded systems world usually don't have the luxury of working with floatingpoint calculations. It costs extra, either in CPU execution time, or in $$$. We're resigned to using integer arithmetic. (Some of us actually like it!) So the rest of this article is about overflow in integer arithmetic. What follows may surprise you. Sources of overflowThe usual suspects: addition and multiplicationWhat can cause overflow in integer math? The usual suspects are addition and multiplication: Each integer format has a representable range:
And if you go outside this range, even temporarily, you need to be very careful. Most environments handle overflow gracefully, and give you "wraparound" or "modulo" behavior (32767 + 1 = 32768 in a signed 16bit environment) where carry bits outside the range just disappear and you're left with the loworder bits corresponding to the exact result. If you're programming in C with a compiler that meets the C99 standard, you may be surprised to learn that unsigned integer math is guaranteed to have modulo behavior under overflow conditions, but the behavior of overflow with signed integer math is undefined. To wouldbe language lawyers, the relevant sections in C99 are: Section 6.5 item 5:
Section 6.2.5 item 9:
(In practice, modern compilers such as gcc and clang tend to use modulo behavior anyway for basic arithmetic calculations, but be careful, as sometimes compiler optimizations will treat comparisons or conditional expressions incorrectly even though the basic arithmetic is "correct". Symptoms of this behavior can be really tricky to recognize; the LLVM website has some explanation of this, along with other bugbears of undefined behavior. If you want to be safe, use the The historical reason for this little quirk in the C standard, is that in the olden days some processors used onescomplement representation for signed numbers, in which case arithmetic for signed and unsigned numbers is slightly different. Nowadays twoscomplement is the norm, and the implementations of addition, subtraction, and multiplication are the same for signed and unsigned when using base word lengths for the results. But one of the main principles of C, for better or for worse, is to allow machinespecific behavior to maintain efficient code, so the standard was written with no guarantees for the results of signed arithmetic overflow. Languages like Java, on the other hand, have machineindependent definitions of arithmetic. Any arithmetic operation in Java will give you the same answer no matter which processor you are running it on. The price of this guarantee is that on some processors, extra instructions will be necessary. By the way, if you're using C for integer math, be sure to If you are using the fixedpoint features of MATLAB, beware that the default behavior of integer overflow is to saturate at the limits of the integer range. This avoids some of the problems of overflow, but it may not give you the results you expect, if you're used to languages that provide wraparound semantics. Preventing overflowJust because we can use If I am controlling a valve, and I want the output to increase, and I keep adding 1 to a process control variable, so that I get 32765, 32766, 32767, 32768, 32767, etc. this creates a jump discontinuity which is BAD. The only machineindependent method of preventing this is to stay out of overflow. One approach is to upcast to a larger type size, check for overflow, and saturate the results: You could also do something like this to stay within the range a 16bit type, but it gets a little tricky, uses more operations, and I'm not 100% confident I've got my code correct: Handling multiplication is similar, and is really only practical using the typewidening approach. Subtraction?There's a bug in the C code below; can you see why? The problem is with the subtraction. If First, we can use a 32bit intermediate calculation:
This has the drawback that the more operations we add, the larger the range of output values becomes. But as long as we don't run into overflow in intermediate calculations, we're fine. Second, we can saturate the error to the range of an Third, we can modify the operation we're doing so the results are within limits:
This reduces the net gain but avoids both overflow and saturation. Division?Division is the ugly beast among the arithmetic operations. Fortunately, the overflow cases aren't too complicated. If you are working in C, where division means taking an Nbit number and dividing by another Nbit number, there's only one example that can cause overflow, which we'll get to a little bit later. (And it's not dividebyzero. If you're dividing n / d without excluding the case of d=0 first, you deserve whatever you get.) Some processors have builtin division facilities, and compilers support these with "builtin" or "intrinsic" functions that directly map into the processors' division feature. Here you can do things like divide a 32bit integer by a 16bit integer to yield a 16bit result, but you'll need to make sure you avoid dividing by a number that would create a quotient that won't fit in a 16bit result. For example: 180000 / 2 = 90000, and 90000 is out of bounds for 16bit numbers, both signed or unsigned. Shifts?Don't forget your shift operators
That's right! If you have a signed integer E1 containing a negative number, and you shift right, the results are implementation defined. The "sensible" thing to do is an arithmetic shift right, putting More recent languages like Java and Javascript give you two shift right operators: the regular Is that it? Whew!No, that's not it! What else is there? Well, there are the increment and decrement operators The rest of the overflow items fall into what might be called Pathological Cases. These all involve asymmetry between We talked about divide earlier, and the one integer divide overflow is when you take 32768 and divide by 1. D'OH! We should have gotten +32768 but that just won't fit into a 16bit signed integer, so it aliases to the The same thing happens with unary minus: Then there are the fixedpoint multiplication flaws along the same line. Let's say you're doing Q15 math, where integers represent numbers of the form Does this work properly without overflow? Well, it does, except for one single case. How do we deal with this?Q15 multiplicationWell, the Q15 multiplication is a tough one. If you are absolutely certain you won't ever evaluate 32768 × 32768, go ahead and use the usual C code as is. One example is PI control loops where the gains are always nonnegative. Or if you know the range of one of the numbers doesn't include 32768 then you're fine. Alternatively, if you're willing to have an extra shift right, you can do this: The shiftrightby16 operation is usually faster in embedded systems than a shift by 15 by one instruction cycle, since it often maps to a "grab the high word" operation in assembly, which you usually get for free when storing memory. This operation represents Other than that, you need some way of saturating the intermediate product so it is safe to shift right: where Unary minusThe unary minus case is somewhat similar. We have a few alternatives: If we are absolutely sure that the input cannot be 32768, we are fine leaving it as is. Otherwise, we have to test for 32768 and convert to +32767. Alternatively, in C we can use the bitwise complement operator The same idea applies to absolute value; instead of
we can use the bitwise complement operator:
Alternatively, we'd have to specialcase the test for 32768. Just a reminder: But wait, there's more!Almost forgot one more thing! Everything we've discussed so far has been a single operation, and has included cautionary tales of woe. With compound calculations that consist of multiple operations, there's one thing we can use in our favor: If you have a compound calculation in fixed bitwidth integer arithmetic (whether signed or unsigned) with modulo (wraparound) semantics, consisting only of these operations
(and not signextension, right shifts, or division), and the following conditions are true:
then an amazing thing happens, which is that the final result in fixed bitwidth integer arithmetic will still be correct. In fact, if we do provide saturation arithmetic on intermediate results, then whenever the saturation activates, we're likely to get a final result which is incorrect. This kind of situation sounds kind of contrived and improbable, but it's really not, and shows up a lot in polynomial evaluation. And it results from the congruence relation we get from modular arithmetic. If an arithmetic operation on kbit numbers, from the list above, produces a kbit result that may overflow with wraparound (modulo) semantics, it creates a result that is congruent to the true result, modulo Let's go over a more concrete situation. Suppose we are computing Okay, that's still a bit abstract. Let's suppose we're computing For Now let's compute f(x) for
A few comments here:
Notice earlier I excluded rightshift from the list of "safe" operations, and I didn't mention anything about truncation, or about using two kbit numbers to produce a 2kbit product. Here I'll have to handwave a bit. The part about rightshift that is unsafe has to do with signextension: we assume that we know the sign of a number based on the most significant bit that is stored. If overflow can be present, then we can't make any assumption about the sign of a number. So if you don't sign extend, or don't do anything with bits of sign extension, then you maintain congruence modulo 2^k. If we multiply two kbit numbers to produce a 2kbit product, on the other hand, this is not an inherently "safe" operation with respect to using it in an intermediate calculation. Let's use k=16, with a = 1 + 65536 * m and b = 1 + 65536 * n, where m and n are unknown integers that we don't really care about, because all we need to keep for intermediate results is the bottom 16 bits, right? If we multiply the real values of a and b, we get (1 + 65536 * (m+n) + 65536 * 65536 * m * n). The bottom 32 bits of this result is 1 + 65536 * (m+n). Uh oh. If we want a 32bit result, we really do have to know more bits. If we only care about the bottom 16 bits of the product, then we get 1, and we're ok. So in this example we have to be careful; the "we don't care if intermediate results overflow" rule really only applies to steps 3, 6 and 7 ( I wish there were an easier way to explain this to beginners. I wish there were an easier way for experienced engineers to analyze this stuff, because it's kind of hard to be rigorous and make sure you get it right. Maybe someday someone will invent a kind of "overflow calculus" that is easy to use, where we can just turn the crank and not think, and it will tell us where we can have problems and how to fix them. Until then, stay watchful. Further ReadingJohn Regehr at the University of Utah has a pretty interesting blog about embedded systems and static analysis and other whatnots. He has a few articles about undefined behavior in C and C++. He's also one of the authors of an interesting article called "Understanding Integer Overflow in C/C++" which includes a sobering list of software programs (including PHP, SQLite, PostgreSQL, OpenSSL, Firefox, GCC, LLVM, and Python) for which potential integer overflows were discovered. Summary
Hope you all had a happy Thanksgiving, and remember, only you can prevent overflow! 2013 Jason M. Sachs, all rights reserved. Rate this article: Rating: 4.5  Votes: 2
Jason has 17 years of experience in signal conditioning (both analog + digital) in motion control + medical applications. He likes making things spin. Previous post by Jason Sachs: Signal Processing Contest in Python (PREVIEW): The Worst Encoder in the World all articles by Jason Sachs 

来自： rechardzy > 《dsprelated》