What is linux-gate.so.1?

昵称39230 2007-08-21


On Linux systems, ldd is a command which displays information about which shared libraries are used by a particular executable. For example:

dpn@colobus:~$ ldd /bin/sh
linux-gate.so.1 =>  (0xffffe000)
libncurses.so.5 => /lib/libncurses.so.5 (0xb7f4c000)
libdl.so.2 => /lib/libdl.so.2 (0xb7f48000)
libc.so.6 => /lib/libc.so.6 (0xb7e24000)
/lib/ld-linux.so.2 (0xb7f9c000)
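A running program can list the ELF objects actually mapped into itself, much like the list ldd prints, with glibc's `dl_iterate_phdr()`. On kernels that provide a virtual library such as linux-gate.so.1, glibc normally reports it here as well. What follows is a minimal sketch, assuming glibc on Linux. The helper names `print_object` and `list_loaded_objects` are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>    /* dl_iterate_phdr, struct dl_phdr_info */
#include <stdio.h>

/* Hypothetical callback: invoked once per ELF object mapped into this process. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    /* glibc reports the main program with an empty name. */
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name
                           : "(main program)";
    /* dlpi_addr is the load address (bias) of the object. */
    printf("%-40s 0x%lx\n", name, (unsigned long)info->dlpi_addr);
    ++*count;
    return 0; /* returning 0 keeps the iteration going */
}

/* Hypothetical helper: prints every loaded object and returns how many there were. */
static int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Unlike ldd, this only sees objects already loaded into the calling process. It cannot inspect the dependencies of some other executable.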

This morning I read this page, which gives an excellent explanation of what the “linux-gate” entry is (follow the link for more background). Basically, it corresponds to a virtual shared library that the kernel creates and maps into each process. That library selects the best way to perform a system call, depending on what your CPU supports. Newer x86 processors support a sysenter instruction, which is significantly faster than the old method of raising a particular interrupt (0x80) with the int instruction.
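The kernel tells each process where it mapped this virtual library through the ELF auxiliary vector. The sketch below reads the `AT_SYSINFO_EHDR` entry and checks that the page starts with an ordinary ELF header, assuming glibc 2.16 or later for `getauxval()`. The names `kernel_vdso_base` and `looks_like_elf` are hypothetical. The lookup yields NULL if the kernel supplied no such library.

```c
#include <elf.h>       /* ELFMAG, SELFMAG, AT_SYSINFO_EHDR */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>  /* getauxval */

/* Hypothetical helper: address of the ELF image the kernel mapped into this
   process (linux-gate.so.1 on 32-bit x86), or NULL if there is none. */
static const unsigned char *kernel_vdso_base(void)
{
    unsigned long addr = getauxval(AT_SYSINFO_EHDR);
    return addr ? (const unsigned char *)(uintptr_t)addr : NULL;
}

/* Hypothetical helper: the page begins with the usual "\177ELF" magic,
   which is why the dynamic loader can treat it like a shared library. */
static int looks_like_elf(const unsigned char *p)
{
    return p != NULL && memcmp(p, ELFMAG, SELFMAG) == 0;
}
```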

Naturally, I immediately tried what the article described on my workstation and was surprised to find no linux-gate.so.1 entry! That system is 64-bit (note the 64-bit addresses of the libraries in the ldd output below).

dpn@tuna:~$ ldd /bin/sh
libncurses.so.5 => /lib/libncurses.so.5 (0x00002ae3837c8000)
libdl.so.2 => /lib/libdl.so.2 (0x00002ae383926000)
libc.so.6 => /lib/libc.so.6 (0x00002ae383a2a000)
/lib64/ld-linux-x86-64.so.2 (0x00002ae3836ab000)

To find out what was going on, I ran objdump -d on /lib64/libc-2.5.so. I knew it would contain many system calls, since libc provides the most basic interface to kernel functionality. I redirected the output to a file, opened it in vim, and searched for a simple function that I knew makes a system call:

0000000000090ca0 :
90ca0: b8 3f 00 00 00        mov    $0x3f,%eax
90ca5: 0f 05                 syscall
90ca7: 48 3d 01 f0 ff ff     cmp    $0xfffffffffffff001,%rax
90cad: 73 01                 jae    90cb0 
90caf: c3                    retq
90cb0: 48 8b 0d 11 42 1a 00  mov    1720849(%rip),%rcx        # 234ec8 <_IO_file_jumps+0x988>
90cb7: 31 d2                 xor    %edx,%edx
90cb9: 48 29 c2              sub    %rax,%rdx
90cbc: 64 89 11              mov    %edx,%fs:(%rcx)
90cbf: 48 83 c8 ff           or     $0xffffffffffffffff,%rax
90cc3: eb ea                 jmp    90caf 
90cc5: 90                    nop
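Read as C, the listing is a thin wrapper around the kernel call:

- It loads the call number and executes `syscall`.
- It treats any return value from -4095 to -1 as a negated error code. That is what the unsigned `cmp`/`jae` pair tests. The error code is stored, made positive, into `errno`, and the wrapper returns -1.

The sketch below reproduces that logic with GCC-style inline assembly, assuming an x86_64 Linux target. `my_uname` is a hypothetical name, and this illustrates the pattern rather than quoting glibc's source.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>   /* SYS_uname */
#include <sys/utsname.h>
#include <unistd.h>        /* syscall(), used only by the fallback */

/* Hypothetical re-creation of the uname stub disassembled above. */
static int my_uname(struct utsname *buf)
{
#if defined(__x86_64__)
    long ret;
    /* mov $SYS_uname,%eax ; syscall
       "0" puts the call number in the same register as the result (rax),
       "D" puts buf in rdi, and syscall itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_uname), "D"(buf)
                      : "rcx", "r11", "memory");
    /* cmp $0xfffffffffffff001,%rax ; jae: unsigned test for -4095..-1 */
    if ((unsigned long)ret >= (unsigned long)-4095L) {
        errno = (int)-ret;  /* store the positive error code in errno */
        return -1;          /* or $0xffffffffffffffff,%rax */
    }
    return (int)ret;
#else
    /* Other architectures: fall back to glibc's generic syscall() wrapper. */
    return (int)syscall(SYS_uname, buf);
#endif
}
```

Passing a bad pointer exercises the error path. The kernel returns -EFAULT, so the wrapper sets errno to EFAULT and returns -1.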

This is the assembly code for the uname system call (I omitted 10 additional nop instructions at the end, presumably padding inserted for alignment). uname returns information about the currently running kernel, such as its release and version. I can see from /usr/include/asm-x86_64/unistd.h that the system call number for uname is 63, which is 0x3f in hex. This is the value moved into the eax register at the start of the uname function; the kernel reads this register to determine which system call is being requested. The next instruction is the syscall instruction itself. The x86_64 architecture defines syscall as its standard way of entering the kernel, and every x86_64 CPU supports it. On 32-bit x86 there were several system call mechanisms, including the int instruction and the sysenter instruction, so the kernel needed linux-gate to pick the best one at run time. On x86_64 there is nothing to choose between, so the virtual system call interface defined in linux-gate has no purpose there.
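The number in eax is the only thing that tells the kernel which service is wanted, and the numbering differs between architectures. glibc's `syscall()` function performs the same register setup for an arbitrary call number. That makes it easy to check that 63 really is uname on x86_64. What follows is a minimal sketch, assuming glibc; `uname_via_number` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_uname, derived from the kernel's __NR_uname */
#include <sys/utsname.h>
#include <unistd.h>        /* syscall() */

#if defined(__x86_64__)
/* On x86_64 the uname call number is 63, i.e. 0x3f, as in the listing. */
_Static_assert(SYS_uname == 0x3f, "uname is system call 63 on x86_64");
#endif

/* Hypothetical helper: invoke uname purely by its call number. */
static long uname_via_number(struct utsname *buf)
{
    return syscall(SYS_uname, buf);
}
```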

One question remains. The uname system call takes one argument: a pointer to a utsname struct, which the kernel fills with data. But how is that argument passed to the kernel? The glibc stub only sets up the system call number in eax. The answer is that on x86_64, arguments are passed in registers, beginning with rdi and rsi, both for ordinary function calls and for system calls. So when a user program calls the uname stub in glibc, rdi already contains the pointer to the utsname struct. Because rdi is also the kernel's first system call argument register, the stub can leave it untouched, and the kernel simply takes the pointer value out of rdi.
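Because the first argument is already in rdi when the stub is entered, a working uname wrapper needs nothing more than the call number and the `syscall` instruction. The sketch below defines such a stub in top-level assembly, assuming x86_64 Linux and GCC or Clang. `bare_uname` is a hypothetical name. Unlike glibc's version, it returns the raw kernel result (0 or a negated error code) instead of setting errno.

```c
#include <sys/utsname.h>

#if defined(__x86_64__) && defined(__linux__)
/* Hypothetical minimal stub. The caller's first argument (the utsname
   pointer) arrives in rdi, which is also the kernel's first system call
   argument register, so nothing has to be moved. rcx and r11, which
   syscall clobbers, are caller-saved in the C calling convention anyway. */
__asm__(
    ".pushsection .text\n"
    ".globl bare_uname\n"
    ".type bare_uname, @function\n"
    "bare_uname:\n"
    "    mov $63, %eax\n"   /* uname's call number on x86_64 */
    "    syscall\n"         /* result (0 or -errno) comes back in rax */
    "    ret\n"
    ".size bare_uname, .-bare_uname\n"
    ".popsection\n");

long bare_uname(struct utsname *buf);
#endif
```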