Lesson 13. Pattern 5. Address arithmetic
Jan 24 2012
We deliberately chose lesson 13 to discuss errors related to address arithmetic. Pointer arithmetic errors are the most insidious ones in 64-bit systems, and we hope the number 13 makes you more attentive. The main idea of the pattern is this: use only memsize-types in address arithmetic to avoid errors in 64-bit code. Consider this code:

unsigned short a16, b16, c16;
char *pointer;
...
pointer += a16 * b16 * c16;

This sample works correctly with pointers as long as the result of the expression 'a16 * b16 * c16' does not exceed INT_MAX (2147483647). On a 32-bit platform the code could always work correctly, because a 32-bit program cannot allocate enough memory to create an array of that size. On the 64-bit architecture this limitation has been removed, and an array may well contain more than INT_MAX items. Suppose we want to shift the pointer by 6,000,000,000 bytes, so the variables a16, b16 and c16 have the values 3000, 2000 and 1000 respectively. When the expression 'a16 * b16 * c16' is calculated, all the variables are first promoted to the 'int' type, according to C++ rules, and only then multiplied. The multiplication overflows. The incorrect result is then extended to the type ptrdiff_t, and the pointer is calculated incorrectly. Be very attentive and avoid possible overflows when dealing with pointer arithmetic. It is good practice to use memsize-types or explicit type conversions in expressions that contain pointers. Using an explicit type conversion, we may rewrite our code sample in the following way:
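The rewrite with explicit conversions can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names byte_offset and advance are hypothetical, and the 6,000,000,000-byte result fits only where ptrdiff_t is 64 bits wide.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the original sample): every operand is
// widened to ptrdiff_t before the multiplication, so no intermediate result
// is evaluated in 32-bit 'int'.
std::ptrdiff_t byte_offset(unsigned short a16, unsigned short b16,
                           unsigned short c16) {
    return static_cast<std::ptrdiff_t>(a16) *
           static_cast<std::ptrdiff_t>(b16) *
           static_cast<std::ptrdiff_t>(c16);
}

// Hypothetical helper: advances the pointer by the widened offset.
// The caller is responsible for the buffer being large enough.
char *advance(char *pointer, unsigned short a16, unsigned short b16,
              unsigned short c16) {
    return pointer + byte_offset(a16, b16, c16);
}
```

The point of the design is that the conversion happens before the first multiplication. Converting only the finished product would widen a value that has already overflowed.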
If you think that carelessly written programs run into trouble only when dealing with large amounts of data, we have to disappoint you. Consider an interesting code sample that works with an array of just 5 items. This code works in the 32-bit version and does not work in the 64-bit one:

int A = -2;
unsigned B = 1;
int array[5] = { 1, 2, 3, 4, 5 };
int *ptr = array + 3;
ptr = ptr + (A + B); //Invalid pointer value on 64-bit platform
printf("%i\n", *ptr); //Access violation on 64-bit platform

Let us follow the algorithm of calculating the expression 'ptr + (A + B)':
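The evaluation of 'A + B' can be traced with a short sketch. It assumes a 32-bit 'unsigned' type, and the function names are hypothetical. The last function only models how the offset looks to pointer-sized arithmetic and does not touch any pointer.

```cpp
#include <cstddef>
#include <type_traits>

// Step 1: in 'int + unsigned' the usual arithmetic conversions turn the int
// operand into unsigned, so the whole sum has the type 'unsigned'.
static_assert(std::is_same<decltype(-2 + 1u), unsigned>::value,
              "int + unsigned yields unsigned");

// Step 2: the sum wraps around; with A = -2 and B = 1 it is 0xFFFFFFFF
// (assuming a 32-bit unsigned type).
unsigned sum_as_unsigned(int A, unsigned B) {
    return A + B;
}

// Step 3 (a model): the unsigned sum viewed as a pointer-sized offset.
// With a 32-bit ptrdiff_t it reads as -1, with a 64-bit one as 4294967295.
std::ptrdiff_t offset_added_to_pointer(int A, unsigned B) {
    return static_cast<std::ptrdiff_t>(A + B);
}
```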
The result of this process depends on the size of the pointer on a particular architecture. If the addition takes place in a 32-bit program, the expression is equivalent to 'ptr - 1' and the program successfully prints the value '3'. In a 64-bit program, the value 0xFFFFFFFFu is honestly added to the pointer. As a result, the pointer ends up far outside the array, and trouble follows when we try to access the item through this pointer. As in the first case, we recommend that you use only memsize-types in pointer arithmetic to avoid the situation described above. Here are two ways to correct the code:
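Two possible corrections might look like the sketch below. The function names are hypothetical, and the second variant assumes that both variables can simply be declared as ptrdiff_t. It illustrates the recommendation and is not the only valid form.

```cpp
#include <cstddef>

// Way 1: keep the original declarations and widen each operand explicitly,
// so the sum is computed in the signed, pointer-sized ptrdiff_t.
int *shift_with_casts(int *ptr, int A, unsigned B) {
    return ptr + (static_cast<std::ptrdiff_t>(A) +
                  static_cast<std::ptrdiff_t>(B));
}

// Way 2: declare the variables with a memsize-type from the start
// (here both are ptrdiff_t, an assumption of this sketch).
int *shift_with_memsize(int *ptr, std::ptrdiff_t A, std::ptrdiff_t B) {
    return ptr + (A + B);
}
```

With A = -2 and B = 1, both functions move the pointer back by one item on 32-bit and 64-bit platforms alike.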
You may object and propose this way:

int A = -2;
int B = 1;
...
ptr = ptr + (A + B);

Yes, this code can work, but it is bad for several reasons:
Array indexing

We single out this type of error to make our description more structured, because array indexing with square brackets is just another way of writing the address arithmetic discussed above. You may encounter errors related to indexing large arrays, or infinite loops, in programs that process large amounts of data. The following example contains 2 errors at once:
The first error is that an infinite loop may occur if the size of the processed data exceeds 4 Gbytes (0xFFFFFFFF), because the variable 'i' has the 'unsigned' type and will never reach a value larger than 0xFFFFFFFF. This is possible but not certain: it depends on the code the compiler generates. For example, the infinite loop may occur in the debug build and disappear completely in the release build, because the compiler decides to optimize the code by using a 64-bit register for the counter, and the loop becomes correct. All this adds confusion, and code that worked yesterday stops working today.

The second error is related to the negative index values used to walk the array from end to beginning. This code works in the 32-bit mode but crashes in the 64-bit one on the very first iteration of the loop, because an access outside the array's bounds occurs. Let us consider the cause of this behavior. Although everything written below is the same as in the example with 'ptr = ptr + (A + B)', we resort to this repetition deliberately. We need to show you that a danger may hide even in simple constructs and take various forms. According to C++ rules, the expression '-i - one' will be calculated on a 32-bit system in the following way (i = 0 at the first step):
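The evaluation of '-i - one' can be reproduced without touching any array. The sketch below assumes that 'i' is 'unsigned', that 'one' is a 'const int' equal to 1 as the text describes, and that 'unsigned' is 32 bits wide. All function names are hypothetical.

```cpp
#include <cstddef>

// The index expression evaluated with the types described in the text:
// '-i' stays unsigned, 'one' is converted from int to unsigned, and the
// subtraction wraps around. For i = 0 the result is 0xFFFFFFFF.
unsigned index_expression(unsigned i) {
    const int one = 1;
    return -i - one;
}

// A model of how that index looks to pointer-sized arithmetic:
// -1 with a 32-bit ptrdiff_t, 4294967295 with a 64-bit one.
std::ptrdiff_t index_as_ptrdiff(unsigned i) {
    return static_cast<std::ptrdiff_t>(index_expression(i));
}

// For contrast: the same expression written with a memsize-type
// yields -1 on both 32-bit and 64-bit platforms.
std::ptrdiff_t index_with_memsize(std::ptrdiff_t i) {
    const std::ptrdiff_t one = 1;
    return -i - one;
}
```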
On a 32-bit system, accessing an array by the index 0xFFFFFFFFu is equivalent to using the index '-1', i.e. end[0xFFFFFFFFu] is analogous to end[-1]. As a result, the array's item is processed correctly. But the picture is different in a 64-bit system: the type 'unsigned' is extended to the signed 'ptrdiff_t' and the array's index equals 0x00000000FFFFFFFFi64. The result is an access far beyond the array's bounds. To correct the code, you need to use such types as ptrdiff_t and size_t.

To completely convince you that you should use only memsize-types for indexing and in address arithmetic expressions, here is one more code sample for you to consider:

class Region {
  float *array;
  int Width, Height, Depth;
  float GetCell(int x, int y, int z) const;
  ...
};
float Region::GetCell(int x, int y, int z) const {
  return array[x + y * Width + z * Width * Height];
}

This code is taken from a real mathematical modeling program where the amount of memory is the most important resource, so the ability to use more than 4 Gbytes on a 64-bit architecture significantly increases the computational power. Programmers often use one-dimensional arrays in programs like this to save memory, while treating them as three-dimensional arrays. For this purpose, they use functions analogous to GetCell, which provide access to the necessary items. But the code above works correctly only with arrays that contain fewer than INT_MAX items, because 32-bit 'int' types are used to calculate the item's index. Programmers often make a mistake trying to correct the code in this way:
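The mistaken correction typically widens only one operand. A sketch of what it might look like is given below, written as a hypothetical free function (get_cell_partial_fix is not a name from the original program) so that it compiles on its own.

```cpp
#include <cstddef>

// Hypothetical free-function form of the incomplete fix.
float get_cell_partial_fix(const float *array, int Width, int Height,
                           int x, int y, int z) {
    // Only 'x' is widened here. The sub-expressions 'y * Width' and
    // 'z * Width * Height' are still multiplied as 32-bit int values and
    // can overflow before they are added to the ptrdiff_t operand.
    return array[static_cast<std::ptrdiff_t>(x) + y * Width +
                 z * Width * Height];
}
```

For small regions this function returns the expected items, which is exactly why the defect tends to go unnoticed until the array grows large.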
They know that, according to C++ rules, the expression that calculates the index has the type 'ptrdiff_t', and they hope to avoid an overflow that way. But the overflow may occur inside the sub-expression 'y * Width' or 'z * Width * Height', because these are still calculated in the type 'int'. If you want to correct the code without changing the types of the variables participating in the expression, you may explicitly convert each variable to a memsize-type:

float Region::GetCell(int x, int y, int z) const {
  return array[ptrdiff_t(x) +
               ptrdiff_t(y) * ptrdiff_t(Width) +
               ptrdiff_t(z) * ptrdiff_t(Width) *
               ptrdiff_t(Height)];
}

Another, better solution is to change the types of the variables to a memsize-type:
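A sketch of this approach is shown below. The alias TCoord, the constructor and the Size() accessor are assumptions added so that the example is self-contained. Only the idea of declaring the coordinates and dimensions with a memsize-type comes from the text.

```cpp
#include <cstddef>

typedef std::ptrdiff_t TCoord;  // hypothetical alias for the memsize-type

class Region {
public:
    // The constructor is an assumption of this sketch; the caller owns
    // 'data', which must hold width * height * depth items.
    Region(float *data, TCoord width, TCoord height, TCoord depth)
        : array(data), Width(width), Height(height), Depth(depth) {}

    // All operands are already pointer-sized, so the index is computed
    // without any 32-bit intermediate results.
    float GetCell(TCoord x, TCoord y, TCoord z) const {
        return array[x + y * Width + z * Width * Height];
    }

    // Total number of items in the region.
    TCoord Size() const { return Width * Height * Depth; }

private:
    float *array;
    TCoord Width, Height, Depth;
};
```

With this change the indexing expression itself stays exactly as it was, and no casts are needed at the point of use.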
Diagnosis

Address arithmetic errors are diagnosed well by the PVS-Studio tool. The analyzer warns you about potentially dangerous expressions with the diagnostic warnings V102 and V108. When possible, the analyzer tries to determine whether a non-memsize type used in address arithmetic is safe, and refrains from generating a warning on such a fragment. As a result, the analyzer's behavior may seem strange. In such cases we ask users to take their time and examine the situation. Consider the following code:

char Arr[] = { '0', '1', '2', '3', '4' };
char *p = Arr + 2;
cout << p[0u + 1] << endl;
cout << p[0u - 1] << endl; //V108

This code works correctly in the 32-bit mode and prints the numbers 3 and 1 on the screen. When testing this code, we get a warning only for the line with the expression 'p[0u - 1]'. And this warning is quite right! If you compile and launch this code sample in the 64-bit mode, you will see the value 3 printed on the screen and the program will crash right after it. If you are sure that the indexing is correct, you may change the corresponding parameter of the analyzer on the settings tab Settings: General or use filters. You may also use an explicit type conversion.

The course authors: Andrey Karpov (karpov@), Evgeniy Ryzhkov (evg@). The rightholder of the course 'Lessons on development of 64-bit C/C++ applications' is OOO 'Program Verification Systems'. The company develops software in the sphere of source program code analysis. The company's site: http://www..