Eli Boling » Dynamic Symbol Binding: Origins and Effects

astrotycoon 2015-07-09


Introduction

Dynamic linking has been available on most operating systems for a long time now. It is interesting, however, to peer into the origins and resulting behavior of some aspects of symbol binding on various systems. You might be surprised by the results.

To focus this discussion, what I’m talking about here is basic support for linking to symbols in shared libraries. These are DLLs on Windows, shared objects (.so) on Linux, and dynamic libraries (.dylib) on OS X. There are other platforms of course, but these are the ones that may affect our customers in the near term, so that’s what we’ll talk about today.

I’m not able to speak authoritatively on some of this topic. On the raw technical details, I can, because I’ve been down among the bits, but on some of the points of motivation and history I will be relating things that have been told to me by others in the now distant past. Some of these bits of information aren’t written down anyplace that I know of. Feel free to add to the pile of lore if you have personal knowledge.

The Primordial Ooze

Not long after the surface of the datascape cooled, and bubbling pools of ones and zeroes congealed into a solid crust, the language C appeared and populated the world at a ferocious pace. Linkers and librarians developed to support the ecosphere, and various traits and behaviors became standard. Some of this stuff survives to modern times, much like the way the pelican still plies our skies. [Off topic: once I was on the wharf in Santa Cruz, and a pelican walked up to me and yawned. That was impressive. You talk about birds having wing-span - those things have beak-span!]

The trait of interest in linkers for today’s article is order of symbol binding. Let’s set aside dynamic linking for a moment. When linking a static image, using classical C as an example, the user would specify a bunch of object files and some libraries, left to right:

a.o b.o c.o lib1.a lib2.a lib3.a

The linker would go through these, finding public symbols to satisfy external references and hooking things all together. Now, for any symbol ‘foo’, any of those objects or libraries could provide the public definition, and it is OK for multiple libraries to provide public definitions. The basic rule that the linker followed was that the first public symbol found in a left to right search is selected to satisfy external references. So, if lib1 and lib3 both provide public definitions of ‘foo’, and a.o refers to ‘foo’, then the one the linker will select is the one from lib1, because it is leftmost. Now, note that if lib1 and lib3 both define ‘foo’, and lib3 refers to ‘foo’, then the linker will still select the definition from lib1. This has allowed developers to override definitions of symbols by inserting a library with different definitions to the left of the original library in the link line. This behavior is very much driven as an aspect of the tools (specifically the static linker). It’s not really language specific, but it can end up freighting the tools for any language, oddly enough.
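The left to right rule can be sketched as a toy lookup in C. This is a simplified model of the rule as described above, not how any real linker is implemented: it ignores the differences between object files and archives, and the `link_input` type and `first_provider` function are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One input on the link line: its name and the public symbols it defines.
   The list is toy-sized and NULL-terminated when shorter than 4 entries. */
struct link_input {
    const char *name;
    const char *defines[4];
};

/* Return the name of the leftmost input that defines `symbol`, or NULL if
   none does. Note that the input making the reference is not a parameter:
   in this model it has no influence on which definition is chosen. */
const char *first_provider(const struct link_input *inputs, size_t count,
                           const char *symbol)
{
    assert(inputs != NULL && symbol != NULL);
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < 4 && inputs[i].defines[j] != NULL; j++) {
            if (strcmp(inputs[i].defines[j], symbol) == 0)
                return inputs[i].name;
        }
    }
    return NULL;
}
```

Given a link line where lib1.a and lib3.a both define ‘foo’, `first_provider` picks lib1.a whether the reference came from a.o or from lib3.a, which is exactly what makes the override trick possible.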

Windows: DLLs Are New!

Along comes Windows. Dynamic Link Libraries are a new concept, and enable developers to share code better. Actually the concept isn’t new - Unix beat them to it by a long shot, but you can run roughshod over history with good marketing. Now, the first tools for Windows were C based, and the classic left to right rules for the static linker still held true. However, DLLs don’t follow quite the same rules. Once you build a DLL, the linkages to that DLL, and within that DLL, are much more specific. If the DLL wants to access ‘malloc’, it doesn’t just bind to any ‘malloc’: it binds either to a local copy of ‘malloc’ (if it links the C RTL statically) or to a symbol called ‘malloc’ from a particular DLL.

Linux: Fanatics Unite!

Along comes Linux. Now these folks take things to extremes, I think. When they did shared library support, their attitude was that linking a shared library should be identical in behavior to linking a static library. So this means that if you link a shared library together, and that shared library binds to a symbol called ‘malloc’, you don’t know which actual public definition it will bind to until runtime. You don’t even know which library that symbol might come from. That’s just the way it works on most platforms with classic C style compile/link phases with static linking. Now this is an interesting philosophy, and it’s got some interesting features that it supports, but it implies a lot for the tools that have to deal with the platform. It means, among other things, that nth party languages and tools have to be aware of this artifact of left to right ordering of symbol binding that comes from way back in some fragment of the pelican genome.

OS X: Umm, it’s like Windows

I have no idea what combination of history and philosophy went into the dynamic binding logic on the Mac, but for the specific attribute this article is interested in, the behavior is more like Windows. Originally, the behavior was a mix of Linux and Windows, because when you bound to some external like ‘malloc’, you didn’t know what shared library was going to provide it. Apple added support for being more specific about it in a point release, so that a shared library could say "I want ‘malloc’, but I want it from ‘libSystem.dylib’."
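The difference between "I want ‘malloc’" and "I want ‘malloc’ from ‘libSystem.dylib’" can be modelled with two small lookup functions in C. This is a toy model under stated assumptions, not the dynamic loader's actual algorithm: the `image` type and the `resolve_flat` and `resolve_qualified` functions are hypothetical names, and any extra library placed in the table is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A loaded image and the symbols it exports (toy-sized list,
   NULL-terminated when shorter than 4 entries). */
struct image {
    const char *name;
    const char *exports[4];
};

/* Return 1 if the image exports the symbol, 0 otherwise. */
static int image_exports(const struct image *img, const char *symbol)
{
    for (size_t j = 0; j < 4 && img->exports[j] != NULL; j++) {
        if (strcmp(img->exports[j], symbol) == 0)
            return 1;
    }
    return 0;
}

/* Symbol-only reference: the first loaded image that exports it wins. */
const char *resolve_flat(const struct image *loaded, size_t count,
                         const char *symbol)
{
    assert(loaded != NULL && symbol != NULL);
    for (size_t i = 0; i < count; i++) {
        if (image_exports(&loaded[i], symbol))
            return loaded[i].name;
    }
    return NULL;
}

/* (symbol, library) reference: only the named library is consulted. */
const char *resolve_qualified(const struct image *loaded, size_t count,
                              const char *symbol, const char *library)
{
    assert(loaded != NULL && symbol != NULL && library != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(loaded[i].name, library) == 0)
            return image_exports(&loaded[i], symbol) ? loaded[i].name : NULL;
    }
    return NULL;
}
```

If some other image that also exports ‘malloc’ is loaded ahead of libSystem.dylib, `resolve_flat` hands back that other image, while `resolve_qualified` with "libSystem.dylib" keeps returning libSystem.dylib regardless of load order.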

And To Illustrate…

So, now let’s take a look at a very simple C example that illustrates what I’m talking about.

Three files, three images. One is the executable, and two and three are shared libraries. Here we go:

one.c

#include <stdio.h>
extern void two(void);
extern void three(void);
int main(void) {
  printf("hello world\n");
  two();
  three();
}

two.c

#include <stdio.h>
void two(void) {
  printf("two.c: two\n");
}

three.c

#include <stdio.h>
void two(void) {
  printf("three.c: two\n");
}

void three(void) {
  printf("three.c: three\n");
  two();
}

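Now, let’s build these things on Linux (-fPIC requests position-independent code, which shared objects on most 64-bit targets need):

gcc -shared -fPIC -o libtwo.so two.c

gcc -shared -fPIC -o libthree.so three.c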

and we’ll build two different versions of the main executable:

gcc -o one_a one.c libthree.so

gcc -o one_b one.c libtwo.so libthree.so

And it’s the second one of those executables that’s really the interesting one. It’s in that example that the differences between Linux and Windows and OS X finally stand up and say hello. In three.c, the function ‘three’ makes a reference to ‘two’, which happens to be publicly provided by three.c. Thus a version of ‘two’ gets built into libthree.so, and is made publicly available. Now, in the first executable, that’s the only version of ‘two’ around, and the result is predictable. In the second case, however, there is a copy of ‘two’ that is provided by libtwo.so. And the way the Linux tooling works, because the executable binds libtwo.so before it binds libthree.so (left to right), the reference to ‘two’ inside libthree.so will be bound to the implementation in libtwo.so for the second test image (one_b).

So here’s the output for these two images on linux:

one_a

hello world
three.c: two
three.c: three
three.c: two

one_b

hello world
two.c: two
three.c: three
two.c: two

Notice that last line - that’s the call from three.c to ‘two’. Now it goes out of the previously linked shared library libthree.so and into libtwo.so. Both the call from one.c and the call from three.c go to the same place in two.c, even though we had three separate static linker invocations.

Now, if you link the same set of programs using gcc on OS X (slightly different command lines), you get the following runtime results.

one_a

hello world
three.c: two
three.c: three
three.c: two

one_b

hello world
two.c: two
three.c: three
three.c: two

Now, look at that - the call from one.c to ‘two’ went to libtwo.dylib. The call from three.c to ‘two’, however, went to the statically bound copy in libthree.dylib. That’s what would happen on Windows, too.

So What To Do?

OK, so which of these models should our tools support? The answer basically comes down to this: When in Rome, do as the Romans do. That means on Linux, our tools ought to support the linkage model provided by the default tools, except where it would completely hose our language(s) (and that’s what we did in Kylix). On OS X, we’ll do what OS X defaults to. Now, I’ll tell you, that cute little dynamic override feature that Linux supports is supported through standards wrapped around the ELF image format. OS X uses Mach-O, and there appears to be no standard to support the sort of override that Linux does. The Linux support is a major PITA to implement, too, so OS X actually lets me breathe a sigh of relief, because the shared library model there, so far, appears to be much less complex than the model on Linux.

Summary

This article took a while to get to a couple of basic points. The first point is that basic dynamic library support on OS X looks more and more like it’s simpler from a tooling standpoint than on Linux. Mind, now, I’m not discussing frameworks and umbrellas here. The second point is that it’s interesting how very old paradigms in the way a set of command line tools operates bleed through to this day and can impact language tooling that might want to operate in a completely different fashion. The prime example of this for us is Delphi. Our command line tooling for Object Pascal simply doesn’t have the concept of left to right ordering of static link dependencies for libraries, and yet we have this interesting aspect on Linux that we have had to consider - an aspect that does not translate to OS X or Windows.


Posted by Eli Boling on February 16th, 2010 under C_Builder, Delphi |

One Response to “Dynamic Symbol Binding: Origins and Effects”

Jonas Maebe Says:
February 17th, 2010 at 6:06 am
> Now, I’ll tell you, that cute little dynamic override feature that
> linux supports is supported through standards wrapped around
> the ELF image format. OS X uses Mach-o, and there appears to
> be no standard to support the sort of override that linux does.

It does, sort of, but you have to go out of your way to get such behaviour. There are two parts to it:

1) As you undoubtedly know, the normal way to call a dynamically linked function (or basically any globally defined function) on Mac OS X is using a stub. This stub will then, if necessary, resolve the actual address of the function and then jump through to its actual address.

However, if you link multiple object files together into a program or a dynamic library, for size and speed reasons the linker will remove such stubs for all calls to functions that are defined in this collection of object files. This means that in your libthree.dylib, the call from three() to two() is actually encoded as a direct call and hence there is simply no way for the dynamic linker to change that.

You can selectively change this behaviour using the linker’s "-interposable" (for all symbols) or "-interposable_list filename" (for selected symbols) parameters. From ld’s man page:

***
-interposable
Indirects access to all to exported symbols when creating a dynamic library.

-interposable_list filename
The specified filename contains a list of global symbol names that should always be accessed indirectly. For instance, if libSystem.dylib is linked such that _malloc is interposable, then calls to malloc() from within libSystem will go through a dyld stub and could potentially indirected to an alternate malloc. If libSystem.dylib were built without making _malloc interposable then if _malloc was interposed at runtime, calls to malloc from with libSystem would be missed (not interposed) because they would be direct calls.
***
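The difference between a direct call and a call that goes through a stub can be mimicked in plain C with a function pointer. This is only an analogy, not what the Mach-O linker actually emits: `local_two`, `other_two`, `two_slot`, `bind_two`, `three_direct` and `three_via_slot` are names made up for the sketch, and the pointer merely stands in for the stub's resolved target.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the two definitions of two(); each returns a label
   instead of printing so that the caller can see which one ran. */
const char *local_two(void) { return "three.c: two"; }
const char *other_two(void) { return "two.c: two"; }

/* The "stub": an indirection slot that initially targets the local copy. */
static const char *(*two_slot)(void) = local_two;

/* What a dynamic linker could do if the call goes through the slot. */
void bind_two(const char *(*target)(void))
{
    assert(target != NULL);
    two_slot = target;
}

/* Direct call: the target is fixed when the code is built, so nothing
   done to the slot afterwards can redirect it. */
const char *three_direct(void) { return local_two(); }

/* Indirect call: follows whatever the slot currently points at. */
const char *three_via_slot(void) { return two_slot(); }
```

After `bind_two(other_two)`, `three_via_slot()` reaches the other definition while `three_direct()` still reaches the local one. In this analogy, building with -interposable corresponds to compiling every such call the way `three_via_slot` is written.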

If you build libthree using -interposable, the behaviour will still remain the same though, because Mac OS X’s dynamic linker by default uses a two-level namespace. This is the behaviour introduced with the "point release" (*) that you discussed, and which is described in more detail at http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html#//apple_ref/doc/uid/TP40001829-97182

The historical background is also touched upon on that page: the "code fragment manager" was basically the dynamic linker that was introduced on the classic Mac OS with the switch from 68k to PowerPC.

2) To change this two-level namespace behaviour and to go to a flat namespace as in Linux, there are two options:
a) compile the program (not the libraries, that won’t change anything) with -force_flat_namespace
b) if a program was compiled without that option, you can force a flat namespace for a single execution using something like
DYLD_FORCE_FLAT_NAMESPACE=1 ./one_b

Hence:

a) compile libtwo normally (you can make it interposable, but that won’t change anything)
$ gcc -dynamiclib -o libtwo.dylib two.c

b) make libthree interposable:
$ gcc -dynamiclib -o libthree.dylib three.c -Wl,-interposable

c) compile the final program normally:
$ gcc -o one_b one.c libtwo.dylib libthree.dylib

d) execute it normally and you’ll still get the default Mac OS X behaviour:
$ ./one_b
hello world
two.c: two
three.c: three
three.c: two

e) now force a flat name space and you get the same behaviour as under Linux:
$ DYLD_FORCE_FLAT_NAMESPACE=1 ./one_b
hello world
two.c: two
three.c: three
two.c: two

f) now recompile and force the flat name space for the binary under all circumstances:
$ gcc -o one_b one.c libtwo.dylib libthree.dylib -force_flat_namespace

g) and now all executions will give the same behaviour as under Linux:
$ ./one_b
hello world
two.c: two
three.c: three
two.c: two

Caveat: several Mac OS X libraries have been written with the assumption that a two-level namespace will be used, and may misbehave if a flat namespace is forced (or even fail to resolve properly at program startup time). So, in summary: don’t use a flat namespace on Mac OS X unless you really have to (e.g., to write a malloc interceptor as described in the ld man page).

(*) small detail: the point release in question was Mac OS X 10.1. However, Mac OS X 10.1 would generally be called a major release (since all Mac OS X versions start with "10"), while e.g. Mac OS X 10.1.1 is more likely to be called a point release.