C++ Virtual Functions :: Tobias Anderberg

astrotycoon 2020-07-29

Virtual functions are at the heart of object-oriented programming and runtime polymorphism in C++. Countless programmers rely on them for creating and operating intuitively on large class hierarchies. They are a vital part of the language. But how are they actually implemented by the compiler?

How they are implemented is a common C++ question. The usual answer involves the mention of a pointer to a table of functions. But what exactly does that table contain? Which parts of the implementation are done at compile time and which at runtime? In this article I’ll take a closer look at what happens behind the scenes when virtual functions are involved.

It’s important to note that the C++ Standard does not specify how virtual functions should be implemented, so it’s entirely up to each compiler how they solve it.

For reference, at the time of writing I’m using the following compiler¹ and architecture:

$ clang++ --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0

Let’s begin with some background.

Polymorphism

C++ effectively supports three types of polymorphism:

Function overloading (compile time)
Templates (compile time)
Virtual functions (runtime)

Virtual functions allow for late binding of function calls based on object type. This comes into play when a derived object is addressed via a pointer or a reference to a base class. They effectively enable an inheritable common interface with potentially overridden implementations in the derived classes.
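
As a minimal sketch of late binding, consider the following. The classes Animal and Dog and the helper SpeakThrough are made up for illustration and are not part of the original examples.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, used only for illustration.
class Animal {
public:
    virtual ~Animal() = default;
    virtual std::string Speak() const { return "..."; }
};

class Dog : public Animal {
public:
    std::string Speak() const override { return "Woof"; }
};

// The static type of 'a' is Animal, but the call is bound at runtime
// to the override that belongs to the object's dynamic type.
std::string SpeakThrough(const Animal& a) {
    return a.Speak();
}
```

Calling SpeakThrough with a Dog yields the Dog override even though the function only knows about Animal.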

In order to support this late binding of function calls the compiler needs to augment the qualifying objects with information so that the function calls will be possible at runtime. In order to understand this augmentation, let’s first look at how class objects are represented in memory.

Object Memory Layout

Only nonstatic data members are part of an object. Member functions and static data members, despite being part of the class declaration, are “hosted” outside the object. The nonstatic data members are laid out in memory in the order of their declaration:

class Foo {
public:
    void SomeFunction();

private:
    static const int n { 42 };

    int p { 5 };
    int q { 7 };
};

Foo f;

f will be represented in memory as:

0:  --- 
   | p |
4:  --- 
   | q |
8:  ---

We can confirm this with a debugger:

(lldb) print sizeof(f)
(unsigned long) $4 = 8
(lldb) x/8b &f
0x7fff5fbffb70: 0x05 0x00 0x00 0x00 0x07 0x00 0x00 0x00
                 |                | |                 |
                 .................. ...................
                        p                   q

Indeed it’s only the nonstatic data members that contribute to the object size. Well, that and any compiler augmentation that may go into it - potential padding of the nonstatic data members, as well as the virtual pointer: vptr.

We can check for compiler-added padding by inspecting the object’s memory as before:

class Foo {
public:
   char c[3] { 0, 0, 0}; // 3 bytes
   int p { 5 };          // 4 bytes
};

Foo f;

(lldb) print sizeof(f)
(unsigned long) $0 = 8
(lldb) x/8b &f
0x7fff5fbffb60: 0x00 0x00 0x00 0x5f 0x05 0x00 0x00 0x00
                |            | |  | |                 |
                .............. .... ...................
                      c       padding       p

Here the compiler has added 1 byte of padding after c so that p starts on a 4-byte boundary.
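
The same observation can be made without a debugger. The sketch below uses offsetof on a class of the same shape; the name PaddedFoo and the helper PaddingBeforeP are made up here, and the 1-byte result assumes a platform where int is 4 bytes with 4-byte alignment, as on the x86_64 target used in this article.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as the Foo above; the name PaddedFoo is made up for this sketch.
class PaddedFoo {
public:
    char c[3] { 0, 0, 0 };
    int p { 5 };
};

// Number of padding bytes between the end of c and the start of p.
std::size_t PaddingBeforeP() {
    return offsetof(PaddedFoo, p) - (offsetof(PaddedFoo, c) + sizeof(PaddedFoo::c));
}
```

offsetof is only guaranteed to work for standard-layout classes, which is why this check is done on a class without virtual functions.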

Base class nonstatic data members are contained directly in the derived class object:

class Base {
public:
    int x { 3 };
};

class Derived : public Base {
public:
    int p { 5 };
    int q { 7 };
};

Derived d;

Again, we verify using a debugger:

(lldb) print sizeof(d)
(unsigned long) $0 = 12
(lldb) x/12b &d
0x7fff5fbffb60: 0x03 0x00 0x00 0x00 0x05 0x00 0x00 0x00
0x7fff5fbffb68: 0x07 0x00 0x00 0x00

Base class nonstatic data members are laid out in the derived object exactly as they are in the base class, including any padding:

class Base {
public:
    char c[3] { 0, 0, 0 };
    int x { 3 };
};

class Derived : public Base {
public:
    char d;
};

Derived d;

(lldb) print sizeof(d)
(unsigned long) $1 = 12
(lldb) x/12b &d
0x7fff5fbffb50: 0x00 0x00 0x00 0x00 0x03 0x00 0x00 0x00
                |            | |  | |                 |
                .............. .... ...................
                      c       padding       x

0x7fff5fbffb58: 0x00 0x00 0x00 0x00
                |  | |            |
                .... ..............
                 d      padding

Here we might expect that the size of a Derived object would be 8 bytes, the total size of the nonstatic data members in the two classes. However, Base has been padded with 1 byte so that x starts on a 4-byte boundary. This padding is carried over to the derived class. At this point the Derived object is 9 bytes, which the compiler pads with an additional 3 bytes so that the object’s size is a multiple of its 4-byte alignment. Hence the final size of 12 bytes, with effectively 4 bytes wasted due to alignment padding.

That may sound insignificant, but imagine that Derived was instead a Particle in a particle system. Imagine further that there were 500,000 particles active in this system; then we’d be wasting 2 MB due to padding. 2 MB might not sound too bad either, but when you consider that the total memory usage in this case is 6 MB and you’re wasting a third of that on padding, you realise that these things add up quickly.
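
One common way to reduce this waste is to order members from largest to smallest alignment. The two layouts below are hypothetical (the member names are invented), and the sizes mentioned in the comments are what a typical x86_64 ABI produces rather than something the language guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical particle layouts; the member names are made up for this sketch.
struct LooseParticle {
    char kind;      // 1 byte, then typically 3 bytes of padding
    float x;        // 4 bytes
    char flags;     // 1 byte, then typically 3 bytes of tail padding
};

struct TightParticle {
    float x;        // 4 bytes
    char kind;      // 1 byte
    char flags;     // 1 byte, then typically 2 bytes of tail padding
};

// Bytes saved across 'count' objects by using the tighter member order.
std::size_t BytesSaved(std::size_t count) {
    return count * (sizeof(LooseParticle) - sizeof(TightParticle));
}
```

On a target where these come out as 12 and 8 bytes, BytesSaved(500000) corresponds to the 2 MB figure discussed above.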

Of course there’s a good reason for the compiler adding this padding - performance. The CPU’s load and store operations perform best when working with its “natural data size”, which is a word.²

Now, let’s see what happens when we add a virtual function:

class Foo {
public:
    virtual ~Foo();

    int p { 5 };  // 4 bytes
};

Foo f;

First let’s check the size of the object.

(lldb) print sizeof(f)
(unsigned long) $0 = 16

Interesting: 16 bytes, yet we only have a 4-byte data member. This implies that the compiler has augmented our object. At this point we can guess with what: a virtual pointer, because a virtual function is present, plus padding. Let’s have a look at the object:

(lldb) x/16b &f
0x7fff5fbffb48: 0x30 0x10 0x00 0x00 0x01 0x00 0x00 0x00
                |                                     |
                .......................................
                            virtual pointer

0x7fff5fbffb50: 0x05 0x00 0x00 0x00 0xff 0x7f 0x00 0x00
                |                 | |                 |
                ................... ...................
                         p                padding

This memory dump also highlights an important fact: the compiler has inserted the vptr at the start of the object. Why? For performance reasons.
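
A rough way to observe the augmentation from within the program is to compare a class with and without a virtual function. The names Plain, WithVirtual and OffsetOfP are made up for this sketch, and where exactly the vptr lives is an implementation detail, so treat the offsets as typical rather than guaranteed.

```cpp
#include <cassert>
#include <cstddef>

struct Plain {
    int p { 5 };
};

struct WithVirtual {
    virtual ~WithVirtual() {}
    int p { 5 };
};

// Byte offset of the data member p from the start of a WithVirtual object.
std::ptrdiff_t OffsetOfP() {
    WithVirtual w;
    return reinterpret_cast<const char*>(&w.p) - reinterpret_cast<const char*>(&w);
}
```

With the compiler used in this article, p is pushed back by the size of one pointer because the vptr occupies the first 8 bytes.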

Let’s take a closer look at the virtual pointer and virtual table layout.

Virtual Pointer and Virtual Table

As soon as a class either derives from a virtual base class or has virtual functions, whether declared directly or inherited, the compiler will synthesize a pointer into the class object. This is the virtual pointer, vptr, and it points to a virtual table, vtable. The compiler will add code to the constructor to initialize it, and to the destructor to point it back at the table of the class whose destructor is running.

The virtual table contains the following:

Virtual function dispatch information
Offsets to virtual base class subobjects and to the top of the object
Object Run-Time Type Information (RTTI)

The set of virtual functions you can invoke on an object is known at compile time and it’s invariant, meaning it can’t change during runtime. Thus the virtual table is set up during compilation. Each virtual function gets assigned a fixed position in the virtual table that remains the same throughout class inheritance.

The compiler will transform a virtual function call:

// Assuming SomeFunction() is a virtual function, this call

ptr->SomeFunction();

// will be transformed into something like this:

(*ptr->__vptr[n])(ptr)

Where n is the associated slot in the virtual table. Note how the pointer itself is passed as the first argument to the function; that corresponds to the this pointer.
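
To make the transformation concrete, here is a hand-rolled imitation of the mechanism using plain function pointers. Every name in it (Object, Slot, kBaseTable, kDerivedTable, CallSlot0) is invented for illustration; it is a sketch of the idea, not what any particular compiler emits.

```cpp
#include <cassert>

// A hand-rolled imitation of the mechanism; all names here are invented for
// illustration and do not match what any real compiler generates.
struct Object;
using Slot = int (*)(Object* self);

struct Object {
    const Slot* vptr;   // plays the role of the compiler-synthesized __vptr
    int value;
};

int GetValue(Object* self)   { return self->value; }
int GetDoubled(Object* self) { return self->value * 2; }

// One table per "class", built once and shared by all of its objects.
const Slot kBaseTable[]    = { GetValue };
const Slot kDerivedTable[] = { GetDoubled };   // overrides slot 0

// Equivalent of ptr->SomeFunction(): index the table, pass ptr as 'this'.
int CallSlot0(Object* ptr) {
    return (*ptr->vptr[0])(ptr);
}
```

Two objects holding the same value give different results from CallSlot0 depending only on which table their vptr member points at.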

The virtual table is ordered based on the function declaration order in the class. For example:

class Foo {
public:
    virtual ~Foo();
    virtual void SomeFunction();

    int p { 5 };
    int q { 5 };
};

Foo f;

// Virtual table for f (simplified):
[ 0 ] - ~Foo()
[ 1 ] - SomeFunction()

As always a debugger is our friend:

(lldb) x/4w &f
0x7fff5fbffb68: 0x00001030 0x00000001 0x00000005 0x00000005
                |                   | |        | |        |
                ..................... .......... ..........
                  virtual pointer         p          q

Let’s look at virtual table associated with f:

(lldb) x/5a 0x100001030
0x100001030: 0x0000000100000f00 a.out`Foo::~Foo() at foo.cc:8
0x100001038: 0x0000000100000f50 a.out`Foo::~Foo() at foo.cc:8
0x100001040: 0x0000000100000f80 a.out`Foo::SomeFunction() at foo.cc:9
0x100001048: 0x00007fff78b3bb48 vtable for __cxxabiv1::__class_type_info + 16
0x100001050: 0x0000000100000fb0 a.out`typeinfo name for Foo

Oh, interesting. There are two destructors in the virtual table. How come? It turns out that destructors come in pairs: a complete destructor and a deleting destructor. The first one destroys the object without calling delete on it, and the second calls delete on the object after it’s destroyed.

The __cxxabiv1 shown in the table is a compiler internal namespace, and in clang’s case is where we find support for dynamic_cast.

However, where’s the RTTI stored? It’s actually stored at a negative index in the virtual table. So the virtual pointer and table setup actually looks like:

f:
   --------            ---------------  
  | __vptr |----      | offset_to_top | -2
   --------     |      --------------- 
  |   p    |    |     |      RTTI     | -1
   --------     |      --------------- 
  |   q    |     ---> |     ~Foo()    | 0
   --------            --------------- 
                      | SomeFunction  | 1
                       ---------------
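
The RTTI entry is what typeid consults for polymorphic objects. A small sketch, with the made-up classes Shape and Circle and the helper IsCircle:

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
    virtual ~Shape() {}
};

struct Circle : Shape {};

// typeid on a polymorphic glvalue reports the dynamic type, which the
// implementation typically looks up through the object's virtual table.
bool IsCircle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}
```

If Shape had no virtual functions, typeid(s) would instead be resolved at compile time to Shape, since there would be no virtual table to consult.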

Now that we have a good understanding of how objects are laid out in memory and what the virtual table looks like, let’s see how inheritance influences things.

Single Inheritance

The vptr and vtable behave much as can be expected during single inheritance from a base class with virtual functions. The derived class’ virtual table contains either pointers to the base class functions, or its own if it has overridden them. If the derived class adds virtual functions of its own they will be added after the base class functions in the virtual table.

Simplified it looks like this:

class Base {
public:
    virtual ~Base();
    virtual void SomeFunction();
};

class Derived : public Base {
public:
    virtual ~Derived();
    virtual void AnotherFunction();
};

Derived d;

// Virtual table for d:
[ 0 ] - Derived::~Derived
[ 1 ] - Base::SomeFunction
[ 2 ] - Derived::AnotherFunction

// If 'd' had overridden SomeFunction() it would look like this:
[ 0 ] - Derived::~Derived
[ 1 ] - Derived::SomeFunction
[ 2 ] - Derived::AnotherFunction

This is pretty much as expected. It gets slightly more complicated with multiple inheritance.
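
The effect of the two tables can be observed from code. In this sketch the functions return a string instead of void so the result is visible, and the class Overriding and the helper CallSomeFunction are additions made for illustration.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() {}
    virtual std::string SomeFunction() { return "Base::SomeFunction"; }
};

// Inherits SomeFunction unchanged and appends a new virtual function.
class Derived : public Base {
public:
    virtual std::string AnotherFunction() { return "Derived::AnotherFunction"; }
};

// Replaces the inherited entry for SomeFunction with its own.
class Overriding : public Base {
public:
    std::string SomeFunction() override { return "Overriding::SomeFunction"; }
};

std::string CallSomeFunction(Base* b) {
    return b->SomeFunction();
}
```

The call site in CallSomeFunction is identical for both objects; only the content of the slot it reads differs.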

Multiple Inheritance

Remember how with single inheritance the base data members are contained at the start of the derived object? This effectively means that under single inheritance the Base and Derived parts of the object start at the same address.

This is not the case for subsequent base classes in multiple inheritance. And therein lies the complexity - multiple inheritance requires patching the this pointer, as well as any pointer that addresses the object via one of the subsequent base class subobjects. The vptr and vtable handling also gets more complicated - we’re going to have to store more information, and we’re going to end up with two or more virtual pointers!

Let’s first consider the base object pointer patching. Let’s say we have a class hierarchy like this:

class Base0 {
public:
    virtual ~Base0();
};

class Base1 {
public:
    virtual ~Base1();
};

class Derived : public Base0, public Base1 {
public:
    virtual ~Derived();
};

In memory a Derived object will be laid out like this:

Derived:
   ---------  0
  | Base0   |
   --------- 
  | Base1   |
   --------- 
  | Derived |
   ---------  n

That means we can easily do a conversion from Derived to Base0 because the start of Derived and the start of Base0 are at the same address:

Derived* d = new Derived;
Base0* b = d;

However, what happens if we want to assign a Derived object to a Base1 pointer which is not at the same address? The compiler will add an offset:

// For this:
Derived* d = new Derived;
Base1* b = d;

// the compiler will transform the code to something like this
// (offset in bytes, applied only when d is not null):
Base1* b = d ? (Base1*)((char*)d + sizeof(Base0)) : nullptr;
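
The adjustment can be made visible by comparing addresses. This is a sketch using the hierarchy above with inline destructors; the helper Base1Offset is made up, and the exact offset depends on the compiler's layout.

```cpp
#include <cassert>
#include <cstddef>

class Base0 { public: virtual ~Base0() {} };
class Base1 { public: virtual ~Base1() {} };
class Derived : public Base0, public Base1 {};

// Byte distance between the start of the object and its Base1 subobject.
std::ptrdiff_t Base1Offset(Derived* d) {
    Base1* b1 = d;   // implicit conversion; the compiler adjusts the address
    return reinterpret_cast<char*>(b1) - reinterpret_cast<char*>(d);
}
```

Converting back with static_cast<Derived*> applies the opposite adjustment, so a round trip ends up at the original address.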

A similar patching process also happens on function calls where a base virtual function is called via a pointer to a derived object. This is the patching of the this pointer, and it’s handled by a thunk.

A thunk is a short code snippet that’s associated with a function. It is called before the function to do any pointer patching required before it transfers control to the actual function. For simplicity we can imagine it looks like this:

// vtable with thunk:
[ 0 ] - __function_thunk
[ 1 ] - function

// then for a virtual function call needing pointer adjustment:
ptr->function();

// becomes:
ptr->__function_thunk(ptr);

__function_thunk:
    ptr += offset;
    function(ptr);

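The thunk is selected at runtime, through the virtual table, because the type of object being addressed is not known at compile time.³ For the plain pointer conversion the addition also executes at runtime, although for a non-virtual base such as Base1 the offset itself is a constant that the compiler takes from the class layout.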
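
The result of the this pointer patching can be checked from code, even though the thunk itself is invisible at the source level. In this sketch the virtual function Self, the member pad0 and the helper SelfThroughBase1 are invented for illustration.

```cpp
#include <cassert>

class Base0 {
public:
    virtual ~Base0() {}
    int pad0 { 0 };
};

class Base1 {
public:
    virtual ~Base1() {}
    virtual const void* Self() const { return this; }
};

class Derived : public Base0, public Base1 {
public:
    // Reached through Base1's table; 'this' must already point at the
    // start of the Derived object when the body runs.
    const void* Self() const override { return this; }
};

const void* SelfThroughBase1(const Base1* b) {
    return b->Self();
}
```

The pointer handed to SelfThroughBase1 addresses the Base1 subobject, yet the override observes the address of the whole Derived object, which is the adjustment described above.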

Finally let’s look at what happens in terms of the vptr and vtable setup by examining the multiple inheritance hierarchy defined above:

(lldb) p sizeof(d)
(unsigned long) $0 = 16
(lldb) x/4w &d
0x7fff5fbffb68: 0x00001040 0x00000001 0x00001060 0x00000001
                |                   | |                   |
                ..................... .....................
                    Base0 _vptr           Base1 _vptr

The derived class object ends up with a virtual table for each base class that has one. This set is made up of a primary virtual table and one or more secondary virtual tables. The secondary tables have the same content as the primary one, except that the RTTI is that of the derived class instead of the base.

It looks like this:

                          ---------------- 
                         | offset_to_top  |
Derived:                  ---------------- 
   -------------         |  Derived RTTI  |
  | _vptr_Base0 |---      ---------------- 
   -------------     --> | Base0 virtuals |
  |     ...     |         ---------------- 
   -------------         |      ...       |
  | _vptr_Base1 |---      ---------------- 
   -------------    |    | offset_to_top  |
  |     ...     |   |     ---------------- 
   -------------    |    |  Derived RTTI  |
                    |     ---------------- 
                     --> | Base1 virtuals |
                          ---------------- 
                         |      ...       |
                          ----------------

The reason we end up with multiple virtual pointers and tables is to support the object address adjustment mentioned above. If we pass a pointer to a Derived object to a function taking a pointer to Base1, we pass in an address that has been adjusted to start at _vptr_Base1. Thus any virtual function calls will map into the correct slot in the virtual table.

This is also the reason we end up with the same content in the virtual tables - for better runtime performance. If the entries weren’t duplicated then more runtime pointer adjustments would have to take place. With this setup we just need one adjustment, and then call into the virtual table as normal.
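
One place where the per-table offset_to_top entry becomes observable is dynamic_cast<void*>, which returns the address of the most derived object. The sketch below reuses the hierarchy above with inline destructors; the helper MostDerivedAddress is made up, and the link to offset_to_top is how ABIs like the one used here implement it, not something the language mandates.

```cpp
#include <cassert>

class Base0 { public: virtual ~Base0() {} };
class Base1 { public: virtual ~Base1() {} };
class Derived : public Base0, public Base1 {};

// dynamic_cast<void*> yields the address of the most derived object, no
// matter which base subobject the pointer currently refers to.
void* MostDerivedAddress(Base1* b) {
    return dynamic_cast<void*>(b);
}
```

Starting from an adjusted Base1 pointer, the function recovers the address the object was originally created at.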

Finally, let’s take a look at virtual inheritance.

Virtual Inheritance

Let’s consider the simple case with only one virtual base class:

class Base {
public:
    ~Base();
    int p { 5 };
};

class Derived : virtual public Base {
public:
    ~Derived();
    int q { 7 };
};

Derived d;

As usual, let’s check the size and memory layout:

(lldb) p sizeof(d)
(unsigned long) $0 = 16
(lldb) x/4w &d
0x7fff5fbffb68: 0x00001028 0x00000001 0x00000007 0x00000005
                |                   | |        | |        |
                ..................... .......... ..........
                   virtual pointer        q          p

Oh, this is interesting. We immediately see that the virtual base nonstatic data members are laid out in memory after the derived class members. This is different from non-virtual inheritance where the base class members came first. This means that our base and derived don’t start at the same address, as they do for non-virtual inheritance. The virtual base subobject is contained directly in Derived, which makes sense since there can only be one copy of a virtual base subobject.
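
The ordering can also be checked in code by measuring offsets. The helpers VirtualBaseOffset and OffsetOfQ are made up for this sketch, the destructors are omitted for brevity, and the ordering it checks is what the ABI used in this article produces rather than a language guarantee.

```cpp
#include <cassert>
#include <cstddef>

class Base {
public:
    int p { 5 };
};

class Derived : virtual public Base {
public:
    int q { 7 };
};

// Byte offset of the virtual Base subobject inside a complete Derived object.
// The conversion may be resolved through the virtual table at runtime.
std::ptrdiff_t VirtualBaseOffset(Derived& d) {
    Base* b = &d;
    return reinterpret_cast<char*>(b) - reinterpret_cast<char*>(&d);
}

// Byte offset of Derived's own member q.
std::ptrdiff_t OffsetOfQ(Derived& d) {
    return reinterpret_cast<char*>(&d.q) - reinterpret_cast<char*>(&d);
}
```

With the layout shown in the memory dump, q sits right after the virtual pointer and the Base subobject follows it.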

Let’s take a look at the virtual table:

(lldb) x/7a 0x0000000100001028
0x100001028: 0x0000000100001028 VTT for Derived
0x100001030: 0x00007fff7d76cb48 vtable for __cxxabiv1::__class_type_info + 16
0x100001038: 0x0000000100000fa3 a.out`typeinfo name for Base
0x100001040: 0x00007fff7d76cc28 vtable for __cxxabiv1::__vmi_class_type_info + 16
0x100001048: 0x0000000100000f9a a.out`typeinfo name for Derived
0x100001050: 0x0000000100000000 a.out`_mh_execute_header
0x100001058: 0x0000000100001030 typeinfo for Base

Curious, what’s this VTT for Derived? During object construction the object takes the form of the class whose constructor is currently being executed. So during the Base constructor execution the Derived object we’re creating is of type Base. During this construction process the compiler needs to make sure that the virtual pointer points to the correct virtual table. This information is stored in the Virtual Table Table, or VTT, in the form of construction vtables as well as the non-construction virtual tables.
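
The rule that an object under construction behaves as the class whose constructor is running can be demonstrated with a virtual call made from a constructor. This sketch uses ordinary (non-virtual) inheritance, so it shows the rule itself rather than the VTT; the member seen_in_ctor and the function Name are invented for illustration.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    Base() { seen_in_ctor = Name(); }   // virtual call while constructing Base
    virtual ~Base() {}
    virtual std::string Name() const { return "Base"; }
    std::string seen_in_ctor;
};

class Derived : public Base {
public:
    std::string Name() const override { return "Derived"; }
};
```

While the Base constructor runs, the call resolves to Base::Name even though a Derived object is being created; once construction has finished, the same call resolves to Derived::Name.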

Finally let’s take a look at the classic diamond shaped inheritance graph:

class Root {
public:
    virtual ~Root();
    int a { 3 };
};

class Left : virtual public Root {
public:
    virtual ~Left();
    int b { 5 };
};

class Right : virtual public Root {
public:
    virtual ~Right();
    int c { 7 };
};

class Derived : public Left, public Right {
public:
    virtual ~Derived();
    int d { 9 };
};

Derived d;

Let’s inspect the memory layout:

(lldb) p sizeof(d)
(unsigned long) $0 = 40

(lldb) x/10w &d
0x7fff5fbffb20: 0x00001028 0x00000001 0x00000005 0x00000000
                |                   | |        | |        |
                ..................... .......... ..........
                    Left __vptr        Left: b     padding

0x7fff5fbffb30: 0x00001040 0x00000001 0x00000007 0x00000009
                |                   | |        | |        |
                ..................... .......... ..........
                    Right __vptr       Right: c   Derived: d

0x7fff5fbffb40: 0x00000003 0x00007fff
                |        | |        |
                .......... ..........
                 Root: a     padding

Here we again see that there’s only one virtual base object and that it’s contained directly in the Derived object. What does the virtual table look like in this instance? It’s similar to the one for regular multiple inheritance except we have another offset pointer, this time to the virtual base subobject contained in Derived. The virtual table looks like this:

 --------------- 
| vbase_offset  |
 --------------- 
| offset_to_top |
 --------------- 
|     RTTI      |
 --------------- 
| Left entries  |
 --------------- 
|     ...       |
 --------------- 
| vbase_offset  |
 --------------- 
| offset_to_top |
 --------------- 
|     RTTI      |
 --------------- 
| Right entries |
 --------------- 
|     ...       |
 --------------- 
| vbase_offset  |
 --------------- 
| offset_to_top |
 --------------- 
|     RTTI      |
 --------------- 
|Derived Entries|
 --------------- 
|     ...       |
 ---------------

That’s quite a lot of information. With this it’s also easy to see the potential memory overhead of supporting large inheritance graphs with virtual base classes, especially if there’s a lot of virtual functions.
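
That the diamond really contains a single Root subobject can be confirmed by reaching it along both paths. The sketch reuses the hierarchy above in condensed form, without the destructors of Left, Right and Derived; the helper SharesOneRoot is made up for illustration.

```cpp
#include <cassert>

class Root  { public: virtual ~Root() {}  int a { 3 }; };
class Left  : virtual public Root { public: int b { 5 }; };
class Right : virtual public Root { public: int c { 7 }; };
class Derived : public Left, public Right { public: int d { 9 }; };

// Both paths through the diamond should arrive at the same Root subobject.
bool SharesOneRoot(Derived& obj) {
    Root* via_left  = static_cast<Left*>(&obj);
    Root* via_right = static_cast<Right*>(&obj);
    return via_left == via_right;
}
```

Without the virtual keyword on the two inheritance edges there would be two distinct Root subobjects and the two pointers would differ.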

Please note that for all the multiple inheritance examples in this article, if Derived had added any virtual functions of its own we would’ve gotten yet another virtual pointer and virtual table entries for that as well, as demonstrated by this last diagram. It’s handled in the same way as the other virtual pointers.

Summary

Virtual functions are at the heart of designing intuitive class hierarchy interfaces. The implementation support for them is quite intuitive and allows for good runtime performance, at the cost of some memory overhead. When designing these class hierarchies it’s worth considering the object layout to minimize wasting memory due to padding for alignment.

While there is some runtime overhead for invoking virtual functions, don’t assume they are much more expensive than normal function calls without proper profiling.

A class has one or more virtual pointers and virtual tables when it has virtual functions, or if it has a virtual base class
The virtual pointer is initialised by the constructor
The virtual table is constructed during compilation
Each virtual function has a fixed index into the virtual table
Base pointer offsets and this pointer patching may be applied at runtime
There can only be one virtual base subobject and it’s contained directly in the most derived class object
The compiler pads objects for efficient load/store operations
During construction of a hierarchy with virtual base class(es) the compiler makes use of the virtual table table (VTT) to point the vptr to the correct vtable

¹ See my article Embracing Compiler Errors For Fun And Profit for more details on recommended compiler flags to use.
² The size of a word is architecture dependent.
³ Generally speaking this is the case.