In this post I want to explain what linkers and loaders do, and why languages like C and C++ have a linking phase. We’ll also see how the linking process works in Go, Java, and Rust.

Motivation

Compiling code is an extremely CPU-intensive task. Even very modest C or C++ projects can take a few seconds to compile, and big projects take much longer. Compiling code is slow because modern optimizing compilers do extremely high quality code analysis, code generation, and register allocation, and all of that takes a lot of CPU time. However, when iterating on a project, typically very little code actually changes from one compilation to the next: the code in system libraries like libc essentially never changes, and within the project itself usually only a handful of files change at a time.
Both points are important. Almost all C and C++ programs need to use the functionality provided by libc, which offers the basic routines nearly every program depends on. The solution to this problem is to come up with an intermediate representation for “mostly compiled” code. This way you can precompile libraries like glibc, so that building “Hello, world” no longer requires compiling a million lines of code. You can use this same technique for files within a project. This means that when you’re working on a project like Google Chrome, you do the slow, CPU-intensive task of compiling all fifteen million lines of code once. Subsequently, when making a small change, the compiler only needs to recompile the changed files (and possibly things that depend on them).

To tie this all together, we need an efficient system that can combine all of the precompiled code with the code from the project into a single output executable. This is exactly how linkers and object files work. When the compiler translates source code into native code, what it actually does is generate what’s called an object file. When you change a source file, typically only the object file for that source file needs to be updated. To generate an executable with real, fully compiled code, the linker gathers all of the object files and uses them to generate an executable. The most common way of using third-party libraries is via shared libraries, which are actually a special type of object file. In the sections below, we’ll examine some of these aspects in more detail.

Object Files

Object files contain native code (e.g. actual x86 machine code) in a “mostly compiled” format. The main reason the code is “mostly compiled” instead of “fully compiled” is that real executable machine code requires absolute memory addresses for function calls and global variables, and those absolute memory addresses aren’t available in object files. On the x86 architecture (including x86-64), function calls are made by specifying the 32-bit delta between the caller and the callee.
For instance, suppose a CALL instruction is located at memory address 0x57a172, and we want to call a function located at memory address 0x577e50. The delta is measured from the end of the 5-byte CALL instruction, so it works out to 0x577e50 − 0x57a177 = −0x2327. The CALL instruction is encoded as the opcode 0xe8 followed by this delta as a two’s-complement 32-bit immediate stored little-endian, so the fully encoded instruction is the byte sequence e8 d9 dc ff ff. Other architectures may use absolute addressing, but the same problem still applies: to make a function call to an absolute address, we need to actually know what that address is. Because an object file just contains code for a fragment of the program, it won’t have that information available until the link phase. Thus object files typically have slots for things like function calls and the addresses of global variables. When the link phase occurs, the linker takes all of the object files, combines them, and then fills in these placeholder slots once the memory addresses of all global symbols have been determined.

It’s worth noting here that this process can be very demanding on memory, particularly for C++ projects that make heavy use of templates. This is because linking a program requires loading all of the code into memory, plus a lot of intermediate data structures that hold the layout of objects. C++ templates are a form of code generation, and using them can explode the amount of machine code the compiler generates. Really big C++ projects (e.g. Google Chrome, LibreOffice) frequently require multiple gigabytes of memory to link. One good aspect of this design, however, is that this memory-intensive phase is confined to the linking step, so the compiler needs only a modest amount of memory when compiling individual translation units.

Shared Libraries and the Dynamic Loader

Usually libraries are linked in as “shared objects”.
These are a lot like the regular object files just described, except they’re relocatable in memory. This means that special machine code is generated which adds an extra indirection layer for accessing functions and global variables. The indirection makes it possible to link code that doesn’t have absolute addresses available. This is how pretty much all applications link against libc, and usually other libraries as well. The low-level implementation details of how this works are outside the scope of this post, but I want to paint the broad strokes here. As already described, to
actually make a function call on x86 you need to know the absolute memory
address of the function you’re calling. Modern systems use something
called address space layout randomization (ASLR) to randomize the location that
shared libraries are loaded in memory. This is done for security, to prevent
return-oriented programming (ROP)
attacks. The idea here is that by loading shared libraries at a randomized
location in memory, it becomes significantly more difficult (usually impossible)
for the attacker to use a buffer overflow attack to make a nefarious return to a
library like libc, e.g. to access files on the filesystem or gain additional
privileges. The issue is that when ASLR is in effect, the compiler and linker
don’t know in advance what the addresses of functions in shared libraries
will be. This problem is avoided by generating
position-independent code (PIC).
Here’s how it usually works. When calling a function in a shared library, the compiled code jumps through a small stub which looks up the function’s real address in a table that is filled in when the library is loaded. The lookup tables are loaded into memory by a special program called the dynamic linker. This program is also called a loader, since it loads programs. The dynamic linker is the program the kernel invokes when a new executable is launched. It takes care of mapping shared libraries into memory at randomized locations, and of bootstrapping the function lookup tables. If this sounds interesting to you and you’d like to learn more of the details, I highly recommend reading Eli Bendersky’s linkers and loaders series.

Header Files and Includes

In C and C++ programs you normally use the #include directive to pull in declarations of functions and global variables from header files.
What’s interesting about this is that, unlike most other languages, the code in a header file is typically just a set of declarations: the actual definitions live in the object files, and it’s the linker that ties each declaration to its definition. The reason things work this way is that it lets each translation unit be compiled independently, with everything resolved at link time.

Bad things will happen if the header files don’t match what’s actually in the
object files. The two most common problems that I run into are that a particular
function is declared in a header file, but for some reason isn’t actually
available in the object files; or that a given method or variable is
accidentally defined in two or more object files, producing the classic “multiple definition” linker error.

There are other more obscure errors that can also happen when header files and
object files are inconsistent. For instance, imagine a function was compiled with one signature, but a stale header declares it with another: the program may link cleanly and then misbehave at run time.

Normally these kinds of problems arise when projects don’t have their build systems properly configured. For this reason, I strongly encourage using a real build system like autoconf/automake or CMake for any non-trivial C or C++ project. For very simple projects you can hand-write your own Makefiles, but doing so for anything with more than a few source files tends to be error prone.

Modern Linker Advancements

Compilers are constantly evolving new ways to generate more optimized code. But the linker’s job is much more prosaic: the linker mostly exists to glue together object files in the final phase of code generation. So you might think that linkers haven’t changed a lot, or don’t have any interesting new technology in them. This is sort of true, and sort of not true. I want to highlight a few “recent” advancements in linkers to give you an idea of the state of the field.

Linker Hardening

In recent years linkers have gained a few hardening options. The two important
ones you’ll use on a GNU/Linux system are the linker flags -z relro and -z now (together known as full RELRO). The basic idea of these two options is that relocation data needs to be initially mapped as writable when a program launches. However, after the program starts, some or all of that data can be remapped as read-only. Without these options, the relocation tables stay writable for the whole life of the process, giving memory-corruption attacks a convenient place to overwrite function addresses.

Link-Time Optimization

Traditionally, compilers for C and C++ can only do code optimizations for code within a single translation unit
(also called a “compilation unit”). A translation unit is basically all of the code that maps to a single object file, so usually this means a single .c or .cc file plus all of the headers it includes.

A consequence of this is that traditional compilers that use object files cannot do “whole program optimization”, since each translation unit is optimized independently. The most common example of an optimization that might be missed this way is inlining: function calls that cross object file boundaries normally cannot be inlined. This is something that link-time optimization (LTO) fixes. When using the -flto flag, GCC stores its intermediate representation in the object files, which allows optimizations like inlining to be run over the whole program at link time. This is a really promising area for new optimizations, and in my opinion is one of the most exciting features to make it into GCC in the last few years.

Other Languages

Other languages with native code use linkers of some kind. I’m going to compare and contrast Go (which uses a more traditional linker design) with Java (which has a non-traditional linker).

Go

Everything described so far also applies to Go—it generates object files and has a linking phase. However, Go does all of this automatically, so you don’t need to know about header files or worry about the details of your build system or linker options. Most Go programmers are probably not even aware that Go has a linker, but it does! If you look at the output of go build -x, you can see the linker being invoked.
One interesting thing about Go is that it has its own linker and assembler, descended from the Plan 9 toolchain. Russ Cox has some interesting commentary on Hacker News about why Go has its own linker instead of using one of the standard ones like GNU ld or gold.

Java

Java does not generate traditional object files and does not have a traditional linker, but under the hood it’s doing the same thing. Java projects get compiled into JVM bytecode, packaged in the Java archive (jar) format. When JIT’ing code, the JVM needs to do the same tasks as a compiler and linker: it has to put real executable machine code in memory with valid jump and call instructions. Traditional compiled languages like C and C++ put as much of this logic as possible in the regular linker, and defer the small amount of remaining work to the dynamic loader. The JVM does all of these linking tasks at runtime, which is one reason why Java processes take longer to start up than compiled C or C++ programs. Recent Android releases use a technique called the Android runtime (ART), which allows the bytecode for Android applications to be compiled ahead of time, just like C, C++, and Go programs. This is literally done by a real compiler and linker built into Android, based on LLVM. This makes Android “Java” applications work much like traditional C or C++ programs.

Rust

Rust doesn’t use traditional ELF object files, but instead has a functionally equivalent concept called a crate. However, the Rust compiler can be asked to emit a traditional object file by passing the --emit obj flag.
This will cause the Rust compiler to emit a regular object file instead of a crate.