Frame pointer omission (FPO) optimization and consequences when debugging, part 1 « Nynaeve

astrotycoon 2019-01-18

展开全文

During the course of debugging programs, you’ve probably ran into the term “FPO” once or twice. FPO refers to a specific class of compiler optimizations that, on x86, deal with how the compiler accesses local variables and stack-based arguments.

With a function that uses local variables (and/or stack-based arguments), the compiler needs a mechanism to reference these values on the stack. Typically, this is done in one of two ways:

Access local variables directly from the stack pointer (esp). This is the behavior if FPO optimization is enabled. While this does not require a separate register to track the location of locals and arguments, as is needed if FPO optimization is disabled, it makes the generated code slightly more complicated. In particular, the displacement from esp of locals and arguments actually changes as the function is executed, due to things like function calls or other instructions that modify the stack. As a result, the compiler must keep track of the actual displacement from the current esp value at each location in a function where a stack-based value is referenced. This is typically not a big deal for a compiler to do, but in hand written assembler, this can get a bit tricky.
Dedicate a register to point to a fixed location on the stack relative to local variables and and stack-based arguments, and use this register to access locals and arguments. This is the behavior if FPO optimization is disabled. The convention is to use the ebp register to access locals and stack arguments. Ebp is typically setup such that the first stack argument can be found at [ebp+08], with local variables typically at a negative displacement from ebp.

A typical prologue for a function with FPO optimization disabled might look like this:

push   ebp               ; save away old ebp (nonvolatile)
mov    ebp, esp          ; load ebp with the stack pointer
sub    esp, sizeoflocals ; reserve space for locals
...                      ; rest of function

The main concept is that FPO optimization is disabled, a function will immediately save away ebp (as the first operation touching the stack), and then load ebp with the current stack pointer. This sets up a stack layout like so (relative to ebp):

[ebp-01]   Last byte of the last local variable
[ebp+00]   Old ebp value
[ebp+04]   Return address
[ebp+08]   First argument...

Thereafter, the function will always use ebp to access locals and stack based arguments. (The prologue of the function may vary a bit, especially with functions using a variation __SEH_prolog to setup an initial SEH frame, but the end result is always the same with respect to the stack layout relative to ebp.)

This does (as previously stated) make it so that the ebp register is not available for other uses to the register allocator. However, this performance hit is usually not enough to be a large concern relative to a function compiled with FPO optimization turned on. Furthermore, there are a number of conditions that require a function to use a frame pointer which you may hit anyway:

Any function using SEH must use a frame pointer, as when an exception occurs, there is no way to know the displacement of local variables from the esp value (stack pointer) at exception dispatching (the exception could have happened anywhere, and operations like making function calls or setting up stack arguments for a function call modify the value of esp).
Any function using automatic C++ objects with destructors must use SEH for compiler unwind support. This means that most C++ functions end up with FPO optimization disabled. (It is possible to change the compiler assumptions about SEH exceptions and C++ unwinding, but the default [and recommended setting] is to unwind objects when an SEH exception occurs.)
Any function using _alloca to dynamically allocate memory on the stack must use a frame pointer (and thus have FPO optimization disabled), as the displacement from esp for local variables and arguments can change at runtime and is not known to the compiler at compile time when code is being generated.

Because of these restrictions, many functions you may be writing will already have FPO optimization disabled, without you having explicitly turned it off. However, it is still likely that many of your functions that do not meet the above criteria have FPO optimization enabled, and thus do not use ebp to reference locals and stack arguments.

Now that you have a general idea of just what FPO optimization does, I’ll cover cover why it is to your advantage to turn off FPO optimization globally when debugging certain classes of problems in the second half of this series. (It is actually the case that most shipping Microsoft system code turns off FPO as well, so you can rest assured that a real cost benefit analysis has been done between FPO and non-FPO optimized code, and it is overall better to disable FPO optimization in the general case.)

Update: Pavel Lebedinsky points out that the C++ support for SEH exceptions is disabled by default for new projects in VS2005 (and that it is no longer the recommended setting). For most programs built prior to VS2005 and using the defaults at that time, though, the above statement about C++ destructors causing SEH to be used for a function (and thus requiring the use of a frame pointer) still applies.

This entry was posted on Tuesday, November 21st, 2006 at 2:16 pm and is filed under Debugging, Programming, Windows. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.