Creating a vDSO: the Colonel's Other Chicken ...

 mediatv 2013-02-19

A vDSO (virtual dynamic shared object) is an alternative to the somewhat cycle-expensive system call interface that the GNU/Linux kernel provides. But, before I explain how to cook up your own vDSO, in this brief jaunt down operating system lane, I cover some basics of vDSOs, what they are and why they are useful. The main purpose of this article is to illustrate how to add a custom vDSO to a Linux kernel and then how to use the fruits of your labor. This is not intended to be a vDSO 101; if you would like more in-depth information, see the links in the Resources section of this article.

vDSO Basics

The traditional mechanism of communication between userland applications and the kernel is the system call. Syscalls are traditionally implemented as software interrupts (newer x86 processors also provide dedicated instructions, such as syscall, for the same job) that give the userland application access to some kernel functionality. For instance, gettimeofday() and fork() are both system calls. Syscalls exist because Linux divides memory into two primary segments: userland and kernel land. Userland is where common programs, including daemons and servers, execute. Kernel land is where the kernel schedules processes and does all of its nifty kernel-specific magic. This division in memory acts as a safety barrier between user applications and the kernel. The only way a user application can even touch the kernel is via system call communication. Therefore, the robustness and integrity of the kernel are protected by the limited set of routines it makes available to userland: the system calls.

To accomplish a syscall, the kernel must flip-flop memory contexts: storing the userland CPU registers, looking up the syscall in the interrupt vector of syscalls (the syscall vector is initialized at boot time) and then processing the syscall. Once the syscall has been processed in kernel land, the kernel must restore the registers from the previously stored userland context. This completes the syscall; however, as you can imagine, this is not a tax-free series of events. Numerous cycles are spun just to make these special kinds of function calls.
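To make the round trip described above concrete, here is a minimal userland sketch that forces the kernel-entry path explicitly through glibc's generic syscall() wrapper. The helper name getpid_via_raw_syscall is made up for this illustration, and getpid is used only because it is about the simplest syscall there is.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes syscall() in <unistd.h> */
#endif
#include <sys/syscall.h>        /* SYS_getpid */
#include <sys/types.h>          /* pid_t */
#include <unistd.h>             /* syscall(), getpid() */

/* Hypothetical helper: asks the kernel for our PID through the raw
 * syscall interface. Every call pays for the full user-to-kernel and
 * kernel-to-user transition. */
pid_t getpid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling getpid_via_raw_syscall() should return the same value as the ordinary getpid() library call. The difference is that this version is guaranteed to cross into kernel land every time it runs.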

Although this segmentation sounds great for the security world, it does not always provide the most efficient means of communication. Certain functions that do not write any data and merely return a value stored in the kernel, such as gettimeofday(), are relatively safe in nature and provide no threat to the kernel from the requesting userland application. Wouldn't it be nice if you could make safe functions not have to do the memory-barrier tango? Well, you can—with vDSO!

You're probably wondering how a vDSO gets placed into a program in the first place, over the traditional syscall. Well, vDSO hooks are provided via the glibc library. The linker will link in the glibc vDSO functionality, provided that such a routine has an accompanying vDSO version, such as gettimeofday(). When your program executes, if your kernel does not have vDSO support, a traditional syscall will be made. This test of vDSO functionality is provided by the code linked from glibc. Of course, you don't want to hack up glibc just so you can have your home-brewed vDSO run. The method for creating a vDSO described below does not require modification of glibc; instead it relies on hacking up the kernel, as expected.
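From the application's point of view, none of this plumbing is visible. The sketch below simply calls gettimeofday() through the C library. On a kernel with vDSO support, glibc typically services the call from the vDSO page, and otherwise it falls back to a real syscall. The helper name wall_clock_usec is an assumption made for this example.

```c
#include <stddef.h>             /* NULL */
#include <sys/time.h>           /* gettimeofday(), struct timeval */

/* Hypothetical helper: returns the wall-clock time in microseconds since
 * the epoch, or -1 on failure. The caller cannot tell, and does not need
 * to know, whether the vDSO or a real syscall supplied the answer. */
long long wall_clock_usec(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}
```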

Cluck, Cluck...vDSO

These safe syscalls can be implemented on a page of virtual memory that can be mapped into each running process' memory. This implementation is similar to how other dynamically shared objects are mapped into a process, such as shared libraries. In fact, if you were to extract the page from memory and disassemble it, the result is a shared-library ELF. In other words, the vDSO is just a shared library (sorry to blow the magic for you). With this page of safe syscall routines resident to the userland application, a program can make the call and not have to endure the overhead of the memory-hopping between user and kernel segments that a traditional syscall would require. One perfect example is gettimeofday(). This routine not only is timing-sensitive, but it often is a routine that is used at a high frequency. Consider that it takes the kernel time to hop memory segments. Once the clock is sampled, cycles must be spent to flip memory segments. The longer this takes, the less accurate the returned time value will be.
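You can check the "it is just a shared library" claim from userland without dumping any memory by hand. The following sketch assumes a glibc new enough to provide getauxval() (2.16 or later). It asks for the AT_SYSINFO_EHDR entry of the auxiliary vector, which is where the kernel advertises the address of the vDSO's ELF header, and then inspects that header. The function names are invented for this example.

```c
#include <elf.h>                /* ELFMAG, SELFMAG, EI_NIDENT, ET_DYN */
#include <stdint.h>             /* uint16_t */
#include <string.h>             /* memcmp(), memcpy() */
#include <sys/auxv.h>           /* getauxval(), AT_SYSINFO_EHDR */

/* Hypothetical helper: address where the kernel mapped the vDSO into
 * this process, or 0 if the kernel did not provide one. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: returns 1 if a vDSO is mapped and its header
 * identifies it as an ELF shared object, 0 otherwise. */
int vdso_is_shared_elf(void)
{
    const unsigned char *ident = (const unsigned char *)vdso_base();
    uint16_t type;

    if (ident == NULL)
        return 0;
    if (memcmp(ident, ELFMAG, SELFMAG) != 0)    /* "\177ELF" magic */
        return 0;
    /* e_type directly follows the 16-byte e_ident array in both the
     * 32-bit and the 64-bit ELF header layouts. */
    memcpy(&type, ident + EI_NIDENT, sizeof type);
    return type == ET_DYN;
}
```

On a system where the kernel maps a vDSO, vdso_is_shared_elf() is expected to report a shared object. A result of 0 alongside a zero vdso_base() just means no vDSO was mapped.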

Let's Get Fryin'

Enough with theory and all that mumbo-jumbo. Let's get to what this article is all about: making your own vDSO. This article assumes a 64-bit x86 processor using the 2.6.37 Linux kernel. You'll probably be surprised at how easy this is. It is even less involved than making a traditional syscall. The confusing part comes when trying to share data via variables between kernel and userland.

Let's create a vDSO function that does something basic, say, produce an integer value of, oh, the number of the beast, 666. For instructive purposes, let's call this function number_of_the_beast(). Because I'm not sure that the true number of the beast is static (hey, beasts might change), let's make this function do just that: tell us the number of the beast. (It could be like a president and change every few years.) Create a file in linux-2.6.37/arch/x86/vdso/ called vnumber_of_the_beast.c, and inside there, define your function:


#include <asm/linkage.h>

notrace int __vdso_number_of_the_beast(void)
{
    return 0xDEAD - 56339;
}

The only interesting/unusual thing here is the notrace macro. It is defined in linux-2.6.37/arch/x86/include/asm/linkage.h as being:


#define notrace __attribute__((no_instrument_function))

The above GNU extension tells gcc to leave out the hooks that support profiling feedback when it compiles the function. Those hooks are built in only if the notrace macro is removed and the -finstrument-functions flag is passed to gcc at compile time (see the GCC Manual, listed in Resources).
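The effect of notrace is easy to see in a small userland experiment. The sketch below is not kernel code: the functions beast_untraced and beast_traced and the counter hook_calls are invented for illustration. The two __cyg_profile_func_* functions are the hooks gcc calls when a program is built with -finstrument-functions. The sketch also doubles as a check of the arithmetic in the listing above: 0xDEAD is 57005, and 57005 - 56339 is 666.

```c
/* Same definition the article quotes from asm/linkage.h. */
#define notrace __attribute__((no_instrument_function))

/* Illustration only: counts how often the entry hook has fired. */
static volatile long hook_calls;

/* gcc inserts calls to these two hooks around every instrumented
 * function when the program is built with -finstrument-functions.
 * They must be notrace themselves, or they would recurse forever. */
notrace void __cyg_profile_func_enter(void *fn, void *call_site)
{
    (void)fn;
    (void)call_site;
    hook_calls++;
}

notrace void __cyg_profile_func_exit(void *fn, void *call_site)
{
    (void)fn;
    (void)call_site;
}

/* Userland copy of the article's function body; never instrumented. */
notrace int beast_untraced(void)
{
    return 0xDEAD - 56339;      /* 57005 - 56339 = 666 */
}

/* Same body without notrace; instrumented only when the compiler is
 * given -finstrument-functions. */
int beast_traced(void)
{
    return 0xDEAD - 56339;
}
```

Whether or not the flag is used, calling beast_untraced() leaves hook_calls unchanged. Calling beast_traced() bumps the counter only in a build made with -finstrument-functions.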

You also need to tell the compiler to emit a userland-accessible function called number_of_the_beast, which is declared as a weak symbol. The word "weak" simply means the symbol can be overridden: if another object defines a regular (strong) symbol with the same name, the linker uses that definition instead and issues no warning. Likewise, a weak reference that is never resolved does not cause a link error, as no symbol is acceptable in this case. The alias associates the local __vdso_number_of_the_beast with the world-accessible version, number_of_the_beast, so both names refer to the same code. Add the following piece just after the function previously added:


int number_of_the_beast(void)
    __attribute__((weak, alias("__vdso_number_of_the_beast")));
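The weak-alias pattern is plain GCC and works the same way in an ordinary userland program on an ELF system such as Linux, so you can try it outside the kernel tree first. In this sketch, impl_number_of_the_beast is a made-up stand-in for the kernel's __vdso_number_of_the_beast. Note that the alias target must be defined in the same source file as the alias.

```c
/* Stand-in for __vdso_number_of_the_beast (name invented here). */
int impl_number_of_the_beast(void)
{
    return 0xDEAD - 56339;      /* 57005 - 56339 = 666 */
}

/* Public name: a weak alias for the implementation above. Both symbols
 * refer to the same code; a strong definition of number_of_the_beast in
 * another object file would silently take precedence at link time. */
int number_of_the_beast(void)
    __attribute__((weak, alias("impl_number_of_the_beast")));
```

A caller that invokes number_of_the_beast() ends up executing the body of impl_number_of_the_beast(), which is the same relationship the vDSO sets up between its public and __vdso_-prefixed names.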

Now, you just need to toss in some pieces to the linker script so that when the kernel builds, your code will get built and linked into the vdso.so shared object. That is what you will use for your hook when writing code that uses the vDSO. Now, bust out your text editor and modify linux-2.6.37/arch/x86/vdso/vdso.lds.S to add the function names you just added:


VERSION {
    LINUX_2.6 {
        global:
            clock_gettime;
            __vdso_clock_gettime;
            gettimeofday;
            __vdso_gettimeofday;
            getcpu;
            __vdso_getcpu;

            /* ADD YOUR VDSO STUFF HERE */
            number_of_the_beast;
            __vdso_number_of_the_beast;
        local: *;
    };
}

One more thing: you need to tell the build system actually to compile vnumber_of_the_beast.c. To do this, edit the Makefile located at linux-2.6.37/arch/x86/vdso/Makefile and add the name of the file, with a .o instead of a .c extension, to the list of object files in the variable vobjs-y. Through make wizardry and black magic, it then will be compiled and linked into the vDSO when the kernel is built. Your result should look similar to the following:


# files to link into the vdso
vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o \
           vvar.o vnumber_of_the_beast.o
