kmemleak

waston 2013-05-10


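kmemleak analysis

1: Purpose: detecting kernel memory leaks

2: Documentation: Documentation/kmemleak.txt

3: In-kernel demo: mm/kmemleak-test.c

To understand kmemleak, three points are enough:

1: Which kinds of leaks it can detect (that is, which allocation functions are tracked).

2: The kernel has special cases where memory is allocated but nothing references it. How do we keep kmemleak from reporting those?

3: How detection works: how kmemleak decides whether an allocation is still referenced.

Point 1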

kmalloc/kzalloc

vmalloc

kmem_cache_alloc

alloc_percpu

[Page allocations and ioremap are not tracked]

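Point 2

kmemleak_not_leak, kmemleak_ignore, kmemleak_no_scan

These functions are called inside the kernel to keep kmemleak from producing false reports. But what are the deeper differences between them?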

kmemleak_not_leak

/**
* kmemleak_not_leak - mark an allocated object as false positive
* @ptr: pointer to beginning of the object
*
* Calling this function on an object will cause the memory block to no longer
* be reported as leak and always be scanned.
*/

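Not reported, but the contents of the memory the pointer refers to are still scanned. If the allocation is a data structure, the structure itself is never reported, but its members are scanned for references to other objects.

This function is typically used for allocations that are never freed because they live as long as the kernel itself (part of vmlinux or of a non-removable module, for example).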

kmemleak_ignore

/**
* kmemleak_ignore - ignore an allocated object
* @ptr: pointer to beginning of the object
*
* Calling this function on an object will cause the memory block to be
* ignored (not scanned and not reported as a leak). This is usually done when
* it is known that the corresponding block is not a leak and does not contain
* any references to other allocated memory blocks.
*/

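Neither reported nor scanned: the members of the structure the pointer refers to are not examined. Use it when you know the allocated structure holds no references to other objects (no pointers).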

kmemleak_no_scan

/**
* kmemleak_no_scan - do not scan an allocated object
* @ptr: pointer to beginning of the object
*
* This function notifies kmemleak not to scan the given memory block. Useful
* in situations where it is known that the given object does not contain any
* references to other objects. Kmemleak will not scan such objects reducing
* the number of false negatives.
*/

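The object itself is still tracked (kmemleak looks for references to it and reports it if none are found), but its contents are not scanned.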
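The difference between the three calls can be summed up with a small user-space model. This is a sketch, not the kernel's code: struct tracked_obj, the model_* functions, would_report and would_scan are hypothetical names invented here, and the min_count/count bookkeeping is a simplification of how kmemleak decides whether an object has enough references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker: an object with this min_count is ignored entirely. */
#define MIN_COUNT_IGNORED (-1)

/* Hypothetical, simplified model of one tracked allocation. */
struct tracked_obj {
    int  min_count;  /* references needed before the object counts as live */
    int  count;      /* references found during the current scan */
    bool no_scan;    /* true: never look inside the object for pointers */
};

/* Default state after allocation: one reference required, contents scanned. */
struct tracked_obj obj_new(void)
{
    struct tracked_obj o = { .min_count = 1, .count = 0, .no_scan = false };
    return o;
}

/* Models kmemleak_not_leak(): zero references are enough, still scanned. */
void model_not_leak(struct tracked_obj *o)
{
    assert(o != NULL);
    o->min_count = 0;
}

/* Models kmemleak_ignore(): never reported and never scanned. */
void model_ignore(struct tracked_obj *o)
{
    assert(o != NULL);
    o->min_count = MIN_COUNT_IGNORED;
}

/* Models kmemleak_no_scan(): reporting is unchanged, contents are skipped. */
void model_no_scan(struct tracked_obj *o)
{
    assert(o != NULL);
    o->no_scan = true;
}

/* Would the object be reported as a leak after a scan? */
bool would_report(const struct tracked_obj *o)
{
    return o->min_count != MIN_COUNT_IGNORED && o->count < o->min_count;
}

/* Would the scanner look inside the object for further pointers?
 * Only objects considered live are scanned, and only if scanning is allowed. */
bool would_scan(const struct tracked_obj *o)
{
    return o->min_count != MIN_COUNT_IGNORED &&
           o->count >= o->min_count && !o->no_scan;
}
```

In this model a fresh object with no references is reported and not scanned; after model_not_leak it is scanned and never reported; after model_ignore it is neither; after model_no_scan it is still reported when unreferenced, but never scanned even once a reference is found.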

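Point 3

A reference means that some pointer points into the allocated memory. If kmemleak finds no pointer referring to a block, it treats the block as a leak.

So kmemleak has to read every location where a pointer may be stored and check whether the value matches a recorded allocation, that is, whether it falls anywhere between the block's start address and start address + size.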
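A minimal user-space sketch of this matching step, assuming the recorded allocations are kept in a plain array (the kernel uses a more efficient lookup structure). The names alloc_rec, find_alloc and scan_words are hypothetical and only illustrate the range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one tracked allocation: [start, start + size). */
struct alloc_rec {
    uintptr_t start;
    size_t    size;
    int       count;  /* number of references found so far */
};

/* Return the record whose range contains addr, or NULL if there is none. */
struct alloc_rec *find_alloc(struct alloc_rec *recs, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= recs[i].start && addr - recs[i].start < recs[i].size)
            return &recs[i];
    return NULL;
}

/* Walk [begin, end) one pointer-sized word at a time and treat every word
 * as a potential pointer. Returns how many words hit a tracked allocation. */
size_t scan_words(const void *begin, const void *end,
                  struct alloc_rec *recs, size_t n)
{
    const unsigned char *p = begin;
    const unsigned char *e = end;
    size_t hits = 0;

    assert(p <= e);
    while ((size_t)(e - p) >= sizeof(uintptr_t)) {
        uintptr_t word;
        struct alloc_rec *rec;

        memcpy(&word, p, sizeof(word));  /* avoids alignment assumptions */
        rec = find_alloc(recs, n, word);
        if (rec != NULL) {
            rec->count++;
            hits++;
        }
        p += sizeof(uintptr_t);
    }
    return hits;
}
```

Note that a value pointing into the middle of a block counts as a reference here, matching the "start address + size" rule described above.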

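Where can these pointer variables live?

1: Local variables of functions

These live on the stack, so the kernel stack of every task has to be scanned.

2: Global variables (kernel-wide or module-wide) and static variables

These live in the ELF .bss/.data sections.

The sections holding such pointer variables can be inspected in vmlinux or in a *.ko file

with objdump -x file

--- in this case the pointer itself is statically allocated

3: The pointer itself is dynamically allocated: a dynamically allocated block (a struct) has members that are pointers

So the contents of these dynamically allocated blocks must be searched as well.

The relevant sections, as listed by objdump -x vmlinux: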

.data

where global tables, variables, etc. stand. objdump -s -j .data .process.o will hexdump it.

.bss

don't look for bits of .bss in your file: there's none. That's where your uninitialized arrays and variables are, and the loader 'knows' they should be filled with zeroes ... there's no point storing more zeroes on your disk than there already are, is there?

.rodata

that's where your strings go, usually the things you forgot when linking and that cause your kernel not to work. objdump -s -j .rodata .process.o will hexdump it. Note that depending on the compiler, you may have more sections like this.

.data..percpu

kmemleak_scan()

Scanning the data/bss sections

/* data/bss scanning */

scan_block(_sdata, _edata, NULL, 1);
scan_block(__bss_start, __bss_stop, NULL, 1);

Scanning the .data..percpu section

#ifdef CONFIG_SMP
     /* per-cpu sections scanning */
     for_each_possible_cpu(i)
          scan_block(__per_cpu_start + per_cpu_offset(i),
                  __per_cpu_end + per_cpu_offset(i), NULL, 1);
#endif

All of the above covers global pointer variables and per-CPU variables.

The struct page array

/*
         * Struct page scanning for each node.
         */
        lock_memory_hotplug();
        for_each_online_node(i) {
                pg_data_t *pgdat = NODE_DATA(i);
                unsigned long start_pfn = pgdat->node_start_pfn;
                unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
                unsigned long pfn;

                for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                        struct page *page;

                        if (!pfn_valid(pfn))
                                continue;
                        page = pfn_to_page(pfn);
                        /* only scan if page is in use */
                        if (page_count(page) == 0)
                                continue;
                        scan_block(page, page + 1, NULL, 1);
                }
        }
        unlock_memory_hotplug();

The kernel's struct page array is allocated dynamically, so it has to be scanned separately.

Kernel stacks of tasks

if (kmemleak_stack_scan) {
          struct task_struct *p, *g;

          read_lock(&tasklist_lock);
          do_each_thread(g, p) {
               scan_block(task_stack_page(p), task_stack_page(p) +
                       THREAD_SIZE, NULL, 0);
          } while_each_thread(g, p);
          read_unlock(&tasklist_lock);
}

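The usual way to walk all tasks in the kernel is for_each_process().

Here, however, do_each_thread() {} while_each_thread() is used:

>>> for_each_process: visits only processes (thread group leaders), not the threads inside them

>>> do_each_thread() {} while_each_thread(): visits every process and every thread inside it. This is needed because each thread has its own kernel stack.

Inside allocated memory blocks

An allocated block (usually a data structure) may have members that are pointers, so its contents have to be scanned too.

>>> scan_gray_list() ----> scan_object():

Scans all or part of an allocated block for references to other objects.

The range scanned is pointer to pointer+size.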
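The overall scan can be pictured as a worklist algorithm: root memory (data/bss, per-CPU areas, stacks) is scanned first, every block it references is queued, and queued blocks are then scanned for further references. The sketch below is a toy user-space model under that assumption. toy_obj, toy_scan and the index-based "pointers" are hypothetical and stand in for real addresses; it is not the kernel's scan_gray_list() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS  8
#define MAX_SLOTS 2
#define NO_REF    (-1)

/* Hypothetical toy object: slots[] stands in for pointer members. */
struct toy_obj {
    int  slots[MAX_SLOTS];  /* index of the referenced object, or NO_REF */
    bool referenced;        /* set once some scanned memory points to it */
};

/* roots[] models pointers found in data/bss, per-CPU areas and stacks.
 * Returns the number of unreferenced objects; leaked[i] tells which ones. */
size_t toy_scan(struct toy_obj *objs, size_t n,
                const int *roots, size_t nroots, bool *leaked)
{
    int    gray[MAX_OBJS];  /* worklist: referenced objects awaiting a scan */
    size_t head = 0, tail = 0, nleaked = 0;

    assert(n <= MAX_OBJS);
    for (size_t i = 0; i < n; i++)
        objs[i].referenced = false;

    /* Phase 1: scan root memory and queue everything it points to. */
    for (size_t i = 0; i < nroots; i++) {
        int t = roots[i];
        if (t >= 0 && (size_t)t < n && !objs[t].referenced) {
            objs[t].referenced = true;
            gray[tail++] = t;
        }
    }

    /* Phase 2: scan the contents of each queued object exactly once. */
    while (head < tail) {
        const struct toy_obj *o = &objs[gray[head++]];
        for (size_t s = 0; s < MAX_SLOTS; s++) {
            int t = o->slots[s];
            if (t >= 0 && (size_t)t < n && !objs[t].referenced) {
                objs[t].referenced = true;
                gray[tail++] = t;
            }
        }
    }

    /* Phase 3: whatever was never reached is a leak candidate. */
    for (size_t i = 0; i < n; i++) {
        leaked[i] = !objs[i].referenced;
        if (leaked[i])
            nleaked++;
    }
    return nleaked;
}
```

One consequence visible in this model: an unreferenced object is never scanned, so anything reachable only through it is reported as well.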

Detecting Memory Leaks in Kernel & Managed code OSes

This blog post has two sections. The first covers "Memory Leak detection in the Linux kernel in 10 easy steps" and the second is about "Implementing operating systems in managed code (like C#/Java)". Feel free to go to whichever section you prefer.

Introduction

A memory leak is what happens when a program consumes memory but never releases it. In user-space, these days, new applications are written mostly in sophisticated, evolved, modern programming languages like C#, Java etc. This relieves the programmer of the burden of memory management. Programmers need to manage memory themselves only when they code in pre-historic programming languages like C ;-) There are nice tools (like Valgrind) that can detect memory leaks if they happen in user-space. Valgrind won't work in kernel space.

The Linux kernel is predominantly written in C, so that programmers can stay close to the hardware. Also there were no large-scale managed languages at the time the project was started.

There are some interesting projects that help in writing FUSE filesystems via Mono/C# etc. However your grumpy blogger didn't find them to be active and for all practical purposes, C is the default Kernel programming language.

A few days back, I was tasked with detecting memory leaks in a legacy kernel module that is not in Linus' tree. Code that goes into Linus' tree is usually of high quality and is less likely to have memory leaks because of the rigorous reviews performed on LKML. Kernel code that is not merged upstream has a higher chance of having leaks, among other bad things (so upstream your code, NOW). The simple tutorial below explains how to detect memory leaks in kernel modules.

kmemleak

There is a nice tool named 'kmemleak', available in the Linux kernel since 2.6.31, to detect memory leaks. The tool is known to report some false positives, but that should not stop anyone from using it. I was trying to find a kernel-space leak detector but did not find any links via Google. Some kernel hackers told me about this tool over IRC, and this post is meant more as a pointer to the "kmemleak" docs for when someone googles for "kernel memory leak detection".

Pre-requisites:
+ You need to know how to build and install your own kernel.
+ You need to have version 2.6.31 or newer of the Linux kernel.
+ It is good if you know how to compile a kernel module. But don't worry if you don't know. You can refer to my previous tutorial for a simple hello-world kernel module.

So, without further ado, the steps are:

Step 1: Compile kernel with "CONFIG_DEBUG_KMEMLEAK" option enabled. You can get to this option via: make menuconfig, "Kernel Hacking", "Kernel Memory Leak Detector" , while compiling your kernel.

Step 2: Increase the value of the config option "Maximum kmemleak early log entries" to a sufficiently large number like 1200. The default value of 400 may not work correctly in all configurations.

Step 3: Install this kernel and Reboot to this newly configured kernel. Do not be alarmed if your machine is slow.

Step 4: Upon reboot, Check if your debugfs is mounted. Otherwise mount it. If all is well, you should see a file kmemleak under your debugfs mounted location.

mount -t debugfs nodev /sys/kernel/debug/
cat /sys/kernel/debug/kmemleak

The above /sys/kernel/debug/kmemleak file will contain information about any memory leak that has been detected so far since the machine booted. Ideally there should be none, until this point in time.

Step 5: Now we will see how we can detect a memory leak in a dummy kernel module as follows. Write a dummy kernel module with the following source (hello.c):

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/vmalloc.h>

/* Never write a function like this ;) */
void myfunc(void)
{
        char *ptr;
        ptr = vmalloc(512);
        ptr = vmalloc(512);
        ptr = vmalloc(512);
}

static int __init hello_init(void)
{
        printk(KERN_ALERT "Hello World\n");
        myfunc();
        return 0;
}

static void __exit hello_exit(void)
{
        printk(KERN_ALERT "Goodbye World\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Your Name");

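Now the most important line in the above code snippet is:
ptr = vmalloc(512);
Each call overwrites ptr, so the addresses of the earlier blocks are lost, and the last one is lost when myfunc returns. We allocate memory three times in the kernel module but never free any of it.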
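The same pattern can be reproduced in user space to see why all three blocks are lost. The sketch below is an analogue only: it uses malloc in place of vmalloc, and counted_alloc, counted_free, leaky, fixed and live_block_count are hypothetical helpers that simply count outstanding blocks. It assumes the small allocations succeed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int live_blocks;  /* blocks allocated but not yet freed */

/* malloc wrapper that counts outstanding blocks. */
void *counted_alloc(size_t size)
{
    void *p = malloc(size);

    assert(size > 0);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* free wrapper that keeps the counter in sync. */
void counted_free(void *p)
{
    if (p != NULL) {
        free(p);
        live_blocks--;
    }
}

/* Same shape as myfunc(): each assignment overwrites the previous address,
 * and the last address is dropped when the function returns. */
void leaky(void)
{
    char *ptr;

    ptr = counted_alloc(512);
    ptr = counted_alloc(512);
    ptr = counted_alloc(512);
    (void)ptr;
}

/* One possible fix: release every block before its address is lost. */
void fixed(void)
{
    for (int i = 0; i < 3; i++) {
        char *ptr = counted_alloc(512);

        counted_free(ptr);
    }
}

/* Number of blocks currently outstanding. */
int live_block_count(void)
{
    return live_blocks;
}
```

After fixed() the counter is back where it started, while each call to leaky() leaves three blocks outstanding. In the kernel module the equivalent fix would pair every vmalloc with a vfree.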

Step 6: Create a Makefile (for example with vi Makefile) containing:
EXTRA_CFLAGS=-g
obj-m := hello-kernel.o
hello-kernel-objs := hello.o

Step 7: Generate the kernel-object file hello-kernel.ko in your current directory:
make -C /lib/modules/`uname -r`/build M=`pwd`

All commands from now on require root permission.

Step 7.5: [Optional] At any stage, if you want to clear the list of leaks reported so far, so that you can focus on the leaks reported from then on, you can do:
echo clear > /sys/kernel/debug/kmemleak

Step 8: Now insert the kernel object
insmod hello-kernel.ko

Step 9: The memory leak detection thread runs periodically. If you want to trigger a scan at any instant, do:
echo scan > /sys/kernel/debug/kmemleak

Step 10: Now we will check if the leak is detected. Do:
cat /sys/kernel/debug/kmemleak
You should see:

unreferenced object 0xf9061000 (size 512):
comm "insmod", pid 12750, jiffies 14401507 (age 110.217s)
hex dump (first 32 bytes):
1c 0f 00 00 01 12 00 00 2a 0f 00 00 01 12 00 00 ........*.......
38 0f 00 00 01 12 00 00 bc 0f 00 00 01 12 00 00 8...............
backtrace:
[< c10b0001>] create_object+0x114/0x1db
[< c148b4d0>] kmemleak_alloc+0x21/0x3f
[< c10a43e9>] __vmalloc_node+0x83/0x90
[< c10a44b9>] vmalloc+0x1c/0x1e
[< f9055021>] myfunc+0x21/0x23 [hello_kernel]
[< f9058012>] 0xf9058012
[< c1001226>] do_one_initcall+0x71/0x113
[< c1056c48>] sys_init_module+0x1241/0x1430
[< c100284c>] sysenter_do_call+0x12/0x22
[< ffffffff>] 0xffffffff

As the myfunc+0x21/0x23 [hello_kernel] line in the backtrace above shows, the leak is detected in the myfunc function.

Caution: The memory leak detector may take some time to identify the leaks. So repeat steps 9 and 10 after a few minutes if the leaks are not reported the first time. You can try to kiss your hand elbow to pass time meanwhile ;-)
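If you want to post-process the report from step 10, the first line of each entry is easy to parse. The following is a small user-space sketch that assumes the header has exactly the form shown in the sample output ("unreferenced object 0x... (size N):"); leak_header and parse_leak_header are hypothetical names, not part of any kernel tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for the fields of one report header line. */
struct leak_header {
    unsigned long long addr;  /* address of the unreferenced object */
    unsigned long      size;  /* size of the object in bytes */
};

/* Parse a line of the form "unreferenced object 0x<hex> (size <n>):".
 * Returns 1 when both fields were read, 0 for any other line. */
int parse_leak_header(const char *line, struct leak_header *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "unreferenced object 0x%llx (size %lu):",
                  &out->addr, &out->size) == 2;
}
```

Fed the first line of the sample report, this yields address 0xf9061000 and size 512; lines such as "backtrace:" are rejected.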

Further Reading
+ LWN Article about kmemleak - http:///Articles/187979/
+ Under Kernel sources directory: Documentation/kmemleak.txt

Thanks a lot to Catalin Marinas for kmemleak and the people at kernelnewbies for helping, not just for this problem but for nuuuumerous people.

If you were looking for just kernel memory leak detection, the blogpost is over. If you don't mind reading about some other (un)related projects, continue reading onto Part 2.

Operating Systems in Managed Code

There are some hobby open-source projects aimed at implementing an OS using managed programming languages, like C# (Mono) or Java. But none of them has any official corporate backing yet, so they remain experimental. Examples are SharpOS, Cosmos, JNode etc.

+ Sun/Oracle has a product named JavaOS developed along with IBM, but I am not really sure how active this is. Its business model is unclear as well.

+ Singularity - The most high-profile name in this research is Microsoft. However, even they don't seem to be too active. Singularity is their project aimed at creating a managed-code OS based on a microkernel architecture.

Sad that this is not purely open source. If MSFT used an (L)GPL-like license and shared more of the technical vision / docs of this project, maybe it could generate enough enthusiasm in the student and research communities.

With the increasing number of CPU cores, excellent libraries like the parallel extensions to C#, and functional programming languages like F#, it may be fascinating and easy to do crazy things (like LINQ as an IPC mechanism) if you are implementing an OS in a managed language. You can extend your kernel in any language, say Python or Ruby, using projects like IronPython and IronRuby.

Or maybe it is just that I am over-expecting managed code to do wonders on behalf of the programmer.

The biggest benefit of writing an operating system in managed code is that it will be more secure. Whole classes of bugs, such as buffer-overflow vulnerabilities and pointer exploits, largely go away.

Ten or fifteen years ago, there were far more operating systems, like Windows, Linux, Solaris, Symbian, Mac OS, OS/2, VMS, Haiku, BSD, etc. and the field was rich, competitive and interesting. Now there are just three major players, Windows, Linux and Mac, on all devices ranging from mainframes to datacenters to mobiles. As these commercial operating systems mature, they become more boring to learn and tinker with. The students among you (my blog readers) can try spending time on these managed-code OSes. Who knows, there could be the next Linus Torvalds in you.

Academics love to brag about microkernel architecture, and Linux hackers love to ridicule the cost of implementing a messaging system for such an architecture. However, if you use a high-level language with facilities like LINQ and protocol buffers, this messaging/interaction system, interface versioning etc. will be far easier to implement, at least as per my understanding. This is probably the reason why all the managed-code OSes are based on a microkernel architecture.

There will be performance problems in such OSes, but performance is not the only criterion for an OS; there are other benefits like Security (no pointers), Reliability (no null pointer dereference crashes, double free crashes, etc.), Extensibility (extend the OS in Python, C#, F#), etc.

Writing an Operating System may not be the most business-savvy decision but it will definitely help in understanding the science better. And in student life, one can afford to have a hobby project that may not have immediate day-job relevance.

Send in your feedback/comments/opinions about this post or talks/links/research-papers about operating systems in managed code. I would love to hear from you.