Lesson 24. Phantom errors

兰亭文艺 2022-05-22 发布于加拿大

展开全文

Lesson 24. Phantom errors

Jan 24 2012

We have finished studying the patterns of 64-bit errors and the last thing we will speak about, concerning these errors, is in what ways they may occur in programs.

The point is that it is not so easy to show you by an example, as in the following code sample, that the 64-bit code will cause an error when 'N' takes large values:

size_t N = ...
for (int i = 0; i != N; ++i)
{
   ...
}

You may try such a simple sample and see that it works. What matters is the way the optimizing compiler will build the code. It depends upon the size of the loop's body if the code will work or not. In examples it is always small and 64-bit registers may be used for counters. In real programs with large loop bodies an error easily occurs when the compiler saves the value of 'i' variable in memory. And now let us make it out what the incomprehensible text you have just read means.

When describing the errors, we often used the term 'a potential error' or the phrase 'an error may occur'. In general, it is explained by the fact that one and the same code may be considered both correct and incorrect depending upon its purpose. Here is a simple example - using a variable of 'int' type to index array items. If we address an array of graphics windows with this variable, everything is okay. We do not need to, or, rather, simply cannot work with billions of windows. But when we use a variable of 'int' type to index array items in 64-bit mathematical programs or databases, we may encounter troubles when the number of the items excesses the range 0..INT_MAX.

But there is one more, subtler, reason for calling the errors 'potential': whether an error reveals itself or not depends not only upon the input data but the mood of the compiler's optimizer. Most of the errors we have considered in our lessons easily reveal themselves in debug-versions and remain 'potential' in release-versions. But not every program built in the debug mode can be debugged at large data amounts. There might be a case when the debug-version is tested only at small data sets while the exhaustive testing and final user testing at real data are performed in the release-version where the errors may stay hidden.

We encountered the specifics of optimizing Visual C++ 2005 compiler for the first time when preparing the program OmniSample. This is a project included into the PVS-Studio distribution kit which is intended for demonstrating all the errors diagnosed by Viva64 analyzer. The samples included into this project must work correctly in the 32-bit mode and cause errors in the 64-bit mode. Everything was alright in the debug-version but the release-version caused some troubles. The code that must have hung or led to a crash in the 64-bit mode worked! The reason lay in optimization. The way out was found in excessive complication of the samples' codes with additional constructs and adding the key words 'volatile' that you may see in the code of the project OmniSample.

The same is with Visual C++ 2008/2010. Of course the code will be a bit different but everything that we will write here may be applied both to Visual C++ 2005 and Visual C++ 2008/2010.

If you find it quite good when some errors do not reveal themselves, put this idea out of your head. Code with such errors becomes very unstable. Any subtle change not even related to the error directly may cause changes in the program behavior. I want to point it out just in case that it is not the compiler's fault - the reason is in the hidden code defects. Further we will show you some samples with phantom errors that disappear and appear again with subtle code changes in release-versions and hunt for which might be very long and tiresome.

Consider the first code sample that works in the release-version although it must not:

int index = 0;
size_t arraySize = ...;
for (size_t i = 0; i != arraySize; i++)
  array[index++] = BYTE(i);

This code correctly fills the whole array with values even if the array's size is much larger than INT_MAX. It is impossible theoretically because the variable index has 'int' type. Some time later an overflow must lead to accessing the items by a negative index. But optimization gives us the following code:

0000000140001040  mov         byte ptr [rcx+rax],cl 
0000000140001043  add         rcx,1 
0000000140001047  cmp         rcx,rbx 
000000014000104A  jne         wmain+40h (140001040h)

As you may see, 64-bit registers are used and there is no overflow. But let us make a slightest alteration of the code:

int index = 0;
size_t arraySize = ...;
for (size_t i = 0; i != arraySize; i++)
{
  array[index] = BYTE(index);
  ++index;
}

Suppose the code looks nicer this way. I think you will agree that it remains the same from the viewpoint of the functionality. But the result will be quite different - a program crash. Consider the code generated by the compiler:

0000000140001040  movsxd      rcx,r8d 
0000000140001043  mov         byte ptr [rcx+rbx],r8b 
0000000140001047  add         r8d,1 
000000014000104B  sub         rax,1 
000000014000104F  jne         wmain+40h (140001040h)

It is that very overflow that must have been in the previous example. The value of the register r8d = 0x80000000 is extended in rcx as 0xffffffff80000000. The result is the writing outside the array.

Here is another example of optimization and how easy it is to spoil everything:

unsigned index = 0;
for (size_t i = 0; i != arraySize; ++i) {
  array[index++] = 1;
  if (array[i] != 1) {
    printf('Error\n');
    break;
  }
}

This is the assembler code:

0000000140001040  mov         byte ptr [rdx],1 
0000000140001043  add         rdx,1 
0000000140001047  cmp         byte ptr [rcx+rax],1 
000000014000104B  jne         wmain+58h (140001058h) 
000000014000104D  add         rcx,1 
0000000140001051  cmp         rcx,rdi 
0000000140001054  jne         wmain+40h (140001040h)

The compiler has decided to use the 64-bit register rdx to store the variable index. As a result, the code can correctly process an array with a size more than UINT_MAX.

But the peace is fragile. Just make the code a bit more complex and it will become incorrect:

volatile unsigned volatileVar = 1;
...
unsigned index = 0;
for (size_t i = 0; i != arraySize; ++i) {
  array[index] = 1;
  index += volatileVar;
  if (array[i] != 1) {
    printf('Error\n');
    break;
  }
}

The result of using the expression 'index += volatileVar;' instead of 'index++' is that 32-bit registers start participating in the code and cause the overflows:

0000000140001040  mov    ecx,r8d 
0000000140001043  add    r8d,dword ptr [volatileVar (140003020h)] 
000000014000104A  mov    byte ptr [rcx+rax],1 
000000014000104E  cmp    byte ptr [rdx+rax],1 
0000000140001052  jne    wmain+5Fh (14000105Fh) 
0000000140001054  add    rdx,1 
0000000140001058  cmp    rdx,rdi 
000000014000105B  jne    wmain+40h (140001040h)

In the end let us consider an interesting but large example. Unfortunately, we cannot make it shorter because we need to preserve the necessary behavior to show you. It is the impossibility to predict what a slight change in the code might lead to why these errors are especially dangerous.

ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {
  int result = x + y * width;
  return result;
}
...
int domainWidth = 50000;
int domainHeght = 50000;
for (int x = 0; x != domainWidth; ++x)
  for (int y = 0; y != domainHeght; ++y)
    array[UnsafeCalcIndex(x, y, domainWidth)] = 1;

This code cannot fill the array consisting of 50000*50000 items correctly. It cannot do so because an overflow must occur when calculating the expression 'int result = x + y * width;'.

Thanks to a miracle, the array is filled correctly in the release-version. The function UnsafeCalcIndex is integrated into the loop where 64-bit registers are used:

0000000140001052  test        rsi,rsi 
0000000140001055  je          wmain+6Ch (14000106Ch) 
0000000140001057  lea         rcx,[r9+rax] 
000000014000105B  mov         rdx,rsi 
000000014000105E  xchg        ax,ax 
0000000140001060  mov         byte ptr [rcx],1 
0000000140001063  add         rcx,rbx 
0000000140001066  sub         rdx,1 
000000014000106A  jne         wmain+60h (140001060h) 
000000014000106C  add         r9,1 
0000000140001070  cmp         r9,rbx 
0000000140001073  jne         wmain+52h (140001052h)

All this happened because the function UnsafeCalcIndex is simple and can be easily integrated. But when you make it a bit more complex or the compiler supposes that it should not be integrated, an error will occur that will reveal itself at large data amounts.

Let us modify (complicate) the function UnsafeCalcIndex a bit. Note that the function's logic has not been changed in the least:

ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {
  int result = 0;
  if (width != 0)
    result = y * width;
  return result + x;
}

The result is a crash, when an access outside the array is performed:

0000000140001050  test        esi,esi 
0000000140001052  je          wmain+7Ah (14000107Ah) 
0000000140001054  mov         r8d,ecx 
0000000140001057  mov         r9d,esi 
000000014000105A  xchg        ax,ax 
000000014000105D  xchg        ax,ax 
0000000140001060  mov         eax,ecx 
0000000140001062  test        ebx,ebx 
0000000140001064  cmovne      eax,r8d 
0000000140001068  add         r8d,ebx 
000000014000106B  cdqe             
000000014000106D  add         rax,rdx 
0000000140001070  sub         r9,1 
0000000140001074  mov         byte ptr [rax+rdi],1 
0000000140001078  jne         wmain+60h (140001060h) 
000000014000107A  add         rdx,1 
000000014000107E  cmp         rdx,r12 
0000000140001081  jne         wmain+50h (140001050h)

I hope we have managed to show you how a 64-bit program that works might easily stop doing that after adding harmless corrections into it or building it with a different version of the compiler.

You will also understand some strange things and peculiarities of the code in OmniSample project which are made specially to demonstrate an error in simple examples even in the code optimization mode.

The course authors: Andrey Karpov (karpov@), Evgeniy Ryzhkov (evg@).

The rightholder of the course 'Lessons on development of 64-bit C/C++ applications' is OOO 'Program Verification Systems'. The company develops software in the sphere of source program code analysis. The company's site: http://www..