A Ramble on Thread-Local Storage | Taobao Search Technology Blog

astrotycoon 2015-03-08


A Ramble on Thread-Local Storage

Introduction

A while ago, while writing code, I ran into an interesting core dump:

(gdb) bt
#0 0x00007fd60206b4e0 in ?? ()
#1 0x00007fd618431b19 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#2 0x00007fd61843278b in start_thread () from /lib64/libpthread.so.0
#3 0x00007fd617d219ad in clone () from /lib64/libc.so.6

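The gdb output shows that the crash happened at an unresolved address (frame #0) reached from __nptl_deallocate_tsd() in the system library libpthread.so.0. That gave me little to go on, so I turned to Google, which did turn up something useful: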

Crashes in __nptl_deallocate_tsd() are symptomatic of invalid dangling thread key cleanup function (the callback to pthread_key_create()). Most probably a library registered a thread-local variable with a callback, then failed to delete the thread-local variable and got unloaded (dlclose()).

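My code did load an so dynamically and used thread-local variables in quite a few places, so this description matched my case well. The sequence of events is:

A multithreaded application, main, loads an so at run time with dlopen().
Thread thread1 calls a function foo() from the so.
foo() sets up thread-local storage through pthread_key_create().
pthread_key_create(&key, destructor) registers a destructor function so that the thread-local variable is destroyed when the thread exits.
main calls dlclose() to unload the so. The code of the so is unmapped, so the destructor pointer stored with the key now dangles.
thread1 exits and calls destructor through that dangling pointer to destroy the thread-local variable. The call lands in unmapped memory and the process dumps core.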

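To track down a bug like this, one workable approach is to set a gdb breakpoint on pthread_key_create and inspect the backtrace at every hit until you find the key whose destructor lives in a library that is later unloaded.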

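To reproduce the problem I wrote a small program.
main.cpp does four things:

Loads the so dynamically
Starts threads that run code from the so
Waits with sleep() for that code to finish, then unloads the so with dlclose()
Joins the threads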

Example program

// main.cpp
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>
#include <pthread.h>

typedef void (*FuncType)(void);
static void *thread_fn(void *arg)
{
    FuncType func = (FuncType)arg;
    func();
    sleep(2);
    return (NULL);
}
int main()
{
    void *handler = dlopen("./libfirst.so", RTLD_NOW);
    if(!handler)
    {
        printf("%s\n", dlerror());
        return -1;
    }
    FuncType func = reinterpret_cast<FuncType>(dlsym(handler, "process"));
    if(NULL == func)
    {
        printf("first func is null\n");
        return -2;
    }
    int i = 0; int n = 10;
    pthread_t tids[10];
    for (i = 0; i < n; i++)
    {
        pthread_create(&tids[i], NULL, thread_fn, (void *)func);
    }
    sleep(1);
    dlclose(handler);
    for (i = 0; i < n; i++)
    {
        pthread_join(tids[i], NULL);
    }
    return 0;
}

first.cpp is built into the so (libfirst.so). It uses a thread-local variable and registers the corresponding destructor function, destructor().

Example program

// first.cpp
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

pthread_key_t key;
pthread_once_t once = PTHREAD_ONCE_INIT;

static void destructor(void *ptr)
{
    free(ptr);
}

void init_once(void)
{
    pthread_key_create(&key, destructor);
}

extern "C" void process()
{
    pthread_once(&once, init_once);

    void *ptr;
    if ((ptr = pthread_getspecific(key)) == NULL)
    {
        fprintf(stdout, "malloc 1024 byte\n");
        ptr = malloc(1024);
        pthread_setspecific(key, ptr);
    }
    return;
}

Compile and run

$ g++ -g -o main main.cpp -ldl -lpthread
$ g++ -g -fPIC -c -o first.o first.cpp
$ g++ -shared -o libfirst.so first.o
$ ulimit -c unlimited
$ ./main
Segmentation fault (core dumped)

Inspect the core dump

$ gdb -c core.15780 main
(gdb) bt
#0  0x00002b70bd102912 in ?? ()
#1  0x00000032f2405ac9 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#2  0x00000032f24064b5 in start_thread () from /lib64/libpthread.so.0
#3  0x00000032f18d3c2d in clone () from /lib64/libc.so.6

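Advice

The lesson I took from the above:

Avoid thread-local storage in a dynamically loaded so where possible.

Historical background

From what I have read, thread-local storage (TLS) is a latecomer that appeared after the concept of multithreading. In the early days of software, library functions often used global variables to hold global state, errno being one example. Once multithreaded programs arrived, the global errno became a single variable shared by all threads, whereas each thread really wants its own errno, isolated from the other threads.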

At that point nobody wanted to change the library interfaces, and so thread-local storage was born. According to Wikipedia:

Thread-local storage (TLS) is a computer programming method that uses static or global memory local to a thread.

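To make thread-local variables available on every platform, POSIX Threads defines a set of interfaces for creating and using thread-local storage explicitly.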

#include <pthread.h>

int pthread_key_create(pthread_key_t *key, void (*destructor)(void*));
int pthread_key_delete(pthread_key_t key);

void *pthread_getspecific(pthread_key_t key);
int pthread_setspecific(pthread_key_t key, const void *value);

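The explicit approach has one notable advantage: it can hold objects of any type, built-in or user-defined. The way to destroy the object, the destructor, is passed to pthread_key_create explicitly, so the thread-local variable is destroyed properly when the thread exits and no memory is leaked.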

Whatever its advantages, this API is awkward to use, so compilers gained a dedicated keyword, __thread, that creates thread-local variables implicitly:

 __thread int i;
 extern __thread struct state s;
 static __thread char *p;

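This form is convenient to use, but it requires matching changes in the operating system, compiler, linker, and glibc, and even adjustments to the ELF file format. Ulrich Drepper describes all of this in detail in tls.pdf.

On the other hand, __thread supports only POD types and cannot be used for STL containers and classes such as std::string. If you try anyway, the compiler reports errors: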

main.cpp:8: error: ‘a’ cannot be thread-local because it has non-POD type ‘std::string’
main.cpp:8: error: ‘a’ is thread-local and so cannot be dynamically initialized

The gcc documentation also covers Thread-Local specifically and notes that variables declared with __thread can only be statically initialized:

In C++, if an initializer is present for a thread-local variable, it must be a constant-expression, as defined in 5.19.2 of the ANSI/ISO C++ standard.

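Since thread-local storage can be used in two ways, each with its own strengths and weaknesses, people have proposed combining them in a library that is more convenient and also supports non-POD types. One example is the blog post 线程局部变量与 __thread (Thread-Local Variables and __thread).

C++11 recognized the problem too and introduced a new keyword, thread_local. The document Destructor support for thread_local variables explains: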

One of the key features is that they support non-trivial constructors and destructors that are called on first-use and thread exit respectively.

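Besides supporting thread-local variables of non-POD types, it also addresses the problem described earlier, thread-local variables in a dynamically loaded so: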

The compiler can handle constructors, but destructors need more runtime context. It could be possible that a dynamically loaded library defines and constructs a thread_local variable, but is dlclose()’d before thread exit. The destructor of this variable will then have the rug pulled from under it and crash.

The solution is to implement a function, __cxa_thread_atexit_impl(), that libstdc++ calls when it constructs such an object:

int __cxa_thread_atexit_impl (void (*dtor) (void *), void *obj,
                          void *dso_symbol);

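The dynamic linker (ld.so) keeps a reference count for the so that dso_symbol belongs to, tracking how many thread-local variables defined by that so still have a destructor pending. Each time one of them is destroyed the count drops by 1, and dlclose() can unload the so only once the count has reached 0.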

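Another perspective

In his article It’s Not Always Nice To Share, Walter Bright argues that none of the existing thread-local implementations is friendly. In a multithreaded environment, static and global variables should be TLS by default rather than shared. Code from the single-threaded era, such as the C runtime library, could then run in a multithreaded environment unchanged. If an application really needs a globally shared variable, it should say so explicitly with a shared keyword.

Summary

Looking back at the advice above, if you do have to use thread-local variables in a dynamically loaded so:

Explicit thread-local variables, either:
- If pthread_key_create() registered a destructor, make sure pthread_key_delete() is called on the key before dlclose(). pthread_key_delete() does not run the destructor, so values still held by live threads have to be freed some other way.
- Or pass NULL as the destructor to pthread_key_create() and free the values yourself.
Implicit thread-local variables (__thread): because only POD types are supported, no destructor is registered, so they can be used in a dynamically loaded so.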