linux内核之hlist

waston 2015-11-11

展开全文

linux内核hlist分析

在Linux内核中，hlist（哈希链表）使用非常广泛。本文将对其数据结构和核心函数进行分析。

和hlist相关的数据结构有两个

(1) hlist_head

struct hlist_head {
struct hlist_node *first;
};

(2) hlist_node

struct hlist_node {
struct hlist_node *next, **pprev;
};

顾名思义， hlist_head表示哈希表的头结点。哈希表中每一个entry(hlist_head)所对应的都是一个链表（hlist)，该链表的结点由hlist_node表示。

hlist_head结构体只有一个域，即first。 first指针指向该hlist链表的第一个节点。

hlist_node结构体有两个域，next 和pprev。 next指针很容易理解，它指向下个hlist_node结点，倘若该节点是链表的最后一个节点，next指向NULL。

pprev是一个二级指针，它指向前一个节点的next指针。为什么我们需要这样一个指针呢？它的好处是什么？

在回答这个问题之前，我们先研究另一个问题：为什么散列表的实现需要两个不同的数据结构？

散列表的目的是为了方便快速的查找，所以散列表通常是一个比较大的数组，否则“冲突”的概率会非常大，这样也就失去了散列表的意义。如何做到既能维护一张大表，又能不使用过多的内存呢？就只能从数据结构上下功夫了。所以对于散列表的每个entry，它的结构体中只存放一个指针，解决了占用空间的问题。现在又出现了另一个问题：数据结构不一致。显然，如果hlist_node采用传统的next,prev指针，对于第一个节点和后面其他节点的处理会不一致。这样并不优雅，而且效率上也有损失。

hlist_node巧妙地将pprev指向上一个节点的next指针的地址，由于hlist_head和hlist_node指向的下一个节点的指针类型相同，这样就解决了通用性！

下面我们再来看一看hlist_node这样设计之后，插入删除这些基本操作会有什么不一样。

static inline void __hlist_del(struct hlist_node *n)
{
        struct hlist_node *next = n->next;
        struct hlist_node **pprev = n->pprev;
        *pprev = next;
        if (next)
                next->pprev = pprev;
}

__hlist_del用于删除节点n。

首先获取n的下一个节点next， n->pprev指向n的前一个节点的next指针的地址，这样×pprev就代表n前一个节点的下一个节点（现在即n本身），第三行代码*pprev=next;就将n的前一个节点和下一个节点关联起来了。至此，n节点的前一个节点的关联工作就完成了，现在再来完成下一个节点的关联工作。如果n是链表的最后一个节点，那么n->next即为空，则无需任何操作，否则，next->pprev = pprev。

给链表增加一个节点需要考虑两个条件：(1)是否为链表的首个节点(2)普通节点。

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
         struct hlist_node *first = h->first;
         n->next = first;
         if (first)
                 first->pprev = &n->next;
         h->first = n;
         n->pprev = &h->first;
}

首先讨论条件（1）。

first = h->first; 获取当前链表的首个节点；

n->next = fist; 将n作为链表的首个节点，让first往后靠；

先来看最后一行 n->pprev - &h->first; 将n的pprev指向hlist_head的first指针，至此关于节点n的关联工作就做完了。

再来看倒数第二行 h->first = n; 将节点h的关联工作做完；

最后我们再来看原先的第一个节点的关联工作，对于它来说，仅仅需要更新一下pprev的关联信息： first->pprev = &n->next;

接下来讨论条件（2）。这里也包括两种情况：a)插在当前节点的前面b)插在当前节点的后面

/* next must be != NULL */
static inline void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
{
         n->pprev = next->pprev;
         n->next = next;
         next->pprev = &n->next;
         *(n->pprev) = n;
}

先讨论情况a) 将节点n 插到next之前（n是新插入的节点)

还是一个一个节点的搞定（一共三个节点），先搞定节点n

n->pprev = next->prev; 将 next 的pprev 赋值给n->pprev n取代next的位置

n->next = next; 将next作为n的下一个节点，至此节点n的关联动作完成。

next->pprev = &n->next; next的关联动作完成。

*(n->pprev) = n; n->pprev表示n的前一个节点的next指针； *(n->pprev)则表示n的前一个节点next指针所指向下一个节点的内容，这里将n赋值给它，正好完成它的关联工作。

static inline void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
{
         next->next = n->next;
         n->next = next;
         next->pprev = &n->next;
         if (next->next)
                 next->next->pprev = &next->next;
}

再来看情况b) 将结点next插入到n之后 (next是新插入的节点）

具体步骤就不分析了。应该也很容易。

下面我还要介绍一个函数：

static inline int hlist_unhashed(const struct hlist_node *h)
{
return !h->pprev;
}

这个函数的目的是判断该节点是否已经存在hash表中。这里处理得很巧妙。判断前一个节点的next指向的地址是否为空。

最后我们看一个具体的例子，Linux内核是如何管理pid的。（正好和上一篇介绍pid的文章相呼应：））基于内核3.0.3

内核初始化时要调用pidhash_init（）创建哈希表。该函数会在 start_kernel()函数里被调用(init/main.c Line 509)

void __init pidhash_init(void)
{
         int i, pidhash_size;

         pid_hash = alloc_large_system_hash("PID", sizeof(*pid_hash), 0, 18,
                                            HASH_EARLY | HASH_SMALL,
                                            &pidhash_shift, NULL, 4096);
         pidhash_size = 1 << pidhash_shift;

         for (i = 0; i < pidhash_size; i++)
                 INIT_HLIST_HEAD(&pid_hash[i]);
}

从这个函数可以看到内核会在slab上分配一个大小为pidhash_size的数组，然后为每一个entry进行初始化（INIT_HLIST_HEAD)

在alloc_pid函数里

struct pid *alloc_pid(struct pid_namespace *ns)
{
         struct pid *pid;
         enum pid_type type;
         int i, nr;
         struct pid_namespace *tmp;
         struct upid *upid;

         pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL); /* 在slab上分配pid结构体 */
         if (!pid)
                 goto out;

         tmp = ns;
         for (i = ns->level; i >= 0; i--) { /* 虽然这里是for循环，实际只会运行一次，因为现在只支持global namespace即ns->level=0 */
                 nr = alloc_pidmap(tmp); /* 在各级pid_namespace上寻找并分配pid的值 */
                 if (nr < 0)
                         goto out_free;

                 pid->numbers[i].nr = nr;
                 pid->numbers[i].ns = tmp;
                 tmp = tmp->parent;
         }

         get_pid_ns(ns);
         pid->level = ns->level;
         atomic_set(&pid->count, 1);
         for (type = 0; type < PIDTYPE_MAX; ++type)
                 INIT_HLIST_HEAD(&pid->tasks[type]);

         upid = pid->numbers + ns->level;
         spin_lock_irq(&pidmap_lock);
         for ( ; upid >= pid->numbers; --upid)
                 hlist_add_head_rcu(&upid->pid_chain,
                                 &pid_hash[pid_hashfn(upid->nr, upid->ns)]);   /* 将各级namespace中的upid插入pidhash的哈希表里 */
         spin_unlock_irq(&pidmap_lock);

out:
         return pid;

out_free:
         while (++i <= ns->level)
                 free_pidmap(pid->numbers + i);

         kmem_cache_free(ns->pid_cachep, pid);
         pid = NULL;
         goto out;
}

Linux 内核 hlist 详解

在Linux内核中，hlist（哈希链表）使用非常广泛。本文将对其数据结构和核心函数进行分析。

和hlist相关的数据结构有两个：hlist_head 和 hlist_node

hlist_head结构体只有一个域，即first。 first指针指向该hlist链表的第一个节点。
hlist_node结构体有两个域，next 和pprev。 next指针很容易理解，它指向下个hlist_node结点，倘若该节点是链表的最后一个节点，next指向NULL。
pprev是一个二级指针，它指向前一个节点的next指针的地址。为什么我们需要这样一个指针呢？它的好处是什么？
在回答这个问题之前，我们先研究另一个问题：为什么哈希表的实现需要两个不同的数据结构？
哈希表的目的是为了方便快速的查找，所以哈希表中hash桶的数量通常比较大，否则“冲突”的概率会非常大，这样也就失去了哈希表的意义。如何做到既能维护一张大表，又能不使用过多的内存呢？就只能从数据结构上下功夫了。所以对于哈希表的每个hash桶，它的结构体中只存放一个指针，解决了占用空间的问题。现在又出现了另一个问题：数据结构不一致。显然，如果hlist_node采用传统的next,prev指针，对于第一个节点和后面其他节点的处理会不一致。这样并不优雅，而且效率上也有损失。
hlist_node巧妙地将pprev指向上一个节点的next指针的地址，由于hlist_head的first域指向的结点类型和hlist_node指向的下一个结点的结点类型相同，这样就解决了通用性！

如果要删除hash桶对应链表中的第一个普通结点

对应的程序代码如下：

static inline void __hlist_del(struct hlist_node *n)
{
struct hlist_node *next = n->next; //获取指向第二个普通结点的指针
struct hlist_node **pprev = n->pprev; //保留待删除的第一个结点的pprev域（即头结点first域的地址），此时 pprev = &first
*pprev = next;
/*
因为pprev = &first，所以*pprev = next，相当于 first = next
即将hash桶的头结点指针指向原来的第二个结点，如上图中的黑线1
*/
if (next) //如果第二个结点不为空
next->pprev = pprev; //将第二个结点的pprev域设置为头结点first域的地址，如上图中的黑线2
}
如果要删除hash桶对应链表中的非第一个结点

对应的程序代码如下：

static inline void __hlist_del(struct hlist_node *n)
{
struct hlist_node *next = n->next; //获取指向待删除结点的下一个普通结点的指针
struct hlist_node **pprev = n->pprev; //获取待删除结点的pprev域
*pprev = next; //修改待删除结点的pprev域，逻辑上使待删除结点的前驱结点指向待删除结点的后继结点，如上图中的黑线1

if (next) //如果待删除结点的下一个普通结点不为空
next->pprev = pprev; //设置下一个结点的pprev域，如上图中的黑线2，保持hlist的结构
}

可以看到删除第一个普通结点和删除非第一个普通结点的代码是一样的。

下面再来看看如果hlist_node中包含两个分别指向前驱结点和后继结点的指针

很明显删除hash桶对应链表中的非第一个普通结点，只需要如下两行代码：

n->next->prev = n->prev;
n->prev->next = n->next;

可是，如果是删除的hash桶对应链表中的第一个普通结点：

此时 n->prev->next = n->next 就会出问题，因为hash桶的表头结点没有next域

所以，明显在这种情况下删除hash桶对应链表的第一个普通结点和非第一个普通结点的代码是不一样的。

同样的情况也存在于插入操作。

附一张在hash桶头结点之后，插入第一个普通结点的图：

在遍历上，如果使用hlist_hode, list_node指针进行遍历，两者过程大致相似。

#define list_for_each(pos, head)
        for (pos = (head)->next; prefetch(pos->next), pos != (head);
                pos = pos->next)

#define hlist_for_each(pos, head)
        for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; });
             pos = pos->next)

如果使用其寄生结构的指针进行遍历，则hlist与list也略有不同，hlist在遍历时需要一个指向hlist_node的临时指针，该指针的引入，一是为了遍历，而list的遍历在list_entry的参数中实现了，更主要的目的在于判断结束，因为hlist最后一个节点的next为NULL，只有hlist_node指向NULL时才算结束，而这个NULL不包含在任何寄生结构内，不能通过tpos->member的方式访问到，故临时变量pos的引入时必须的。

#define list_for_each_entry(pos, head, member)
        for (pos = list_entry((head)->next, typeof(*pos), member);
             prefetch(pos->member.next), &pos->member != (head);
             pos = list_entry(pos->member.next, typeof(*pos), member))

#define hlist_for_each_entry(tpos, pos, head, member)
        for (pos = (head)->first;
             pos && ({ prefetch(pos->next); 1;}) &&
                ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;});
             pos = pos->next)
另外，list和hlist的遍历都实现了safe版本，因在遍历时，没有任何特别的东西来阻止对链表执行删除操作（通常在使用链表时使用锁来保护并发访问）。安全版本的遍历函数使用临时存放的方法使得检索链表时能不被删除操作所影响。

#define list_for_each_safe(pos, n, head)
        for (pos = (head)->next, n = pos->next; pos != (head);
                pos = n, n = pos->next)

#define hlist_for_each_safe(pos, n, head)
        for (pos = (head)->first; pos && ({ n = pos->next; 1; });
             pos = n)
附上linux内核中与hlist相关的完整代码：

//hash桶的头结点
struct hlist_head {
     struct hlist_node *first; //指向每一个hash桶的第一个结点的指针
};
//hash桶的普通结点
struct hlist_node {
struct hlist_node *next; //指向下一个结点的指针
struct hlist_node **pprev; //指向上一个结点的next指针的地址
};
//以下三种方法都是初始化hash桶的头结点
#define HLIST_HEAD_INIT { .first = NULL }
#define HLIST_HEAD(name) struct hlist_head name = { .first = NULL }
#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)

//初始化hash桶的普通结点
static inline void INIT_HLIST_NODE(struct hlist_node *h)
{
h->next = NULL;
h->pprev = NULL;
}

//判断一个结点是否已经存在于hash桶中
static inline int hlist_unhashed(const struct hlist_node *h)
{
     return !h->pprev;
}

//判断一个hash桶是否为空
static inline int hlist_empty(const struct hlist_head *h)
{
     return !h->first;
}

static inline void __hlist_del(struct hlist_node *n)
{
struct hlist_node *next = n->next; //获取指向待删除结点的下一个结点的指针
struct hlist_node **pprev = n->pprev; //保留待删除结点的pprev域
*pprev = next; //修改待删除结点的pprev域，逻辑上使待删除结点的前驱结点指向待删除结点的后继结点
if (next)
next->pprev = pprev; //设置待删除结点的下一个结点的pprev域，保持hlist的结构
}

static inline void hlist_del(struct hlist_node *n)
{
__hlist_del(n); //删除结点之后，需要将其next域和pprev域设置为无用值
n->next = LIST_POISON1;
n->pprev = LIST_POISON2;
}

static inline void hlist_del_init(struct hlist_node *n)
{
if (!hlist_unhashed(n))
{
__hlist_del(n);
INIT_HLIST_NODE(n);
}
}

//将普通结点n插入到头结点h对应的hash桶的第一个结点的位置
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
struct hlist_node *first = h->first;
n->next = first;
if (first)
first->pprev = &n->next;
h->first = n;
n->pprev = &h->first;
}

/* next must be != NULL */
//在next结点之前插入结点n，即使next结点是hash桶中的第一个结点也可以
static inline void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
{
n->pprev = next->pprev;
n->next = next;
next->pprev = &n->next;
*(n->pprev) = n;
}

//在结点n之后插入结点next
static inline void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
{
next->next = n->next;
n->next = next;
next->pprev = &n->next;

if (next->next)
next->next->pprev = &next->next;
}

/*
* Move a list from one list head to another. Fixup the pprev
* reference of the first entry if it exists.
*/
static inline void hlist_move_list(struct hlist_head *old, struct hlist_head *new)
{
new->first = old->first;
if (new->first)
new->first->pprev = &new->first;
old->first = NULL;
}

//通过一个结构体内部一个成员的地址获取结构体的首地址
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)

#define hlist_for_each(pos, head)
for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; });
      pos = pos->next)

#define hlist_for_each_safe(pos, n, head)
for (pos = (head)->first; pos && ({ n = pos->next; 1; });
      pos = n)

/**
* hlist_for_each_entry   - iterate over list of given type
* @tpos:   the type * to use as a loop cursor.
* @pos:   the &struct hlist_node to use as a loop cursor.
* @head:   the head for your list.
* @member:   the name of the hlist_node within the struct.
*/
#define hlist_for_each_entry(tpos, pos, head, member)
for (pos = (head)->first;
      pos && ({ prefetch(pos->next); 1;}) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;});
      pos = pos->next)

/**
* hlist_for_each_entry_continue - iterate over a hlist continuing after current point
* @tpos:   the type * to use as a loop cursor.
* @pos:   the &struct hlist_node to use as a loop cursor.
* @member:   the name of the hlist_node within the struct.
*/
#define hlist_for_each_entry_continue(tpos, pos, member)
for (pos = (pos)->next;
      pos && ({ prefetch(pos->next); 1;}) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;});
      pos = pos->next)

/**
* hlist_for_each_entry_from - iterate over a hlist continuing from current point
* @tpos:   the type * to use as a loop cursor.
* @pos:   the &struct hlist_node to use as a loop cursor.
* @member:   the name of the hlist_node within the struct.
*/
#define hlist_for_each_entry_from(tpos, pos, member)
for (; pos && ({ prefetch(pos->next); 1;}) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;});
      pos = pos->next)

/**
* hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry
* @tpos:   the type * to use as a loop cursor.
* @pos:   the &struct hlist_node to use as a loop cursor.
* @n:   another &struct hlist_node to use as temporary storage
* @head:   the head for your list.
* @member:   the name of the hlist_node within the struct.
*/
#define hlist_for_each_entry_safe(tpos, pos, n, head, member)
for (pos = (head)->first;
      pos && ({ n = pos->next; 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;});
      pos = n)