A Java Implementation of the B-Tree

novo_land 2011-09-28

Category: Algorithms | 2008-12-03 07:09

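Last week, while reading MIT's "Introduction to Algorithms", I found that the B-tree looked somewhat tricky to implement, which made it a good exercise.

It took me a full day of unhurried work to finish, which is not very efficient.

The logic is not complicated, but there are many branches, and the main hazard is getting an index wrong. The book's description of deletion is not very clear, so that part took some extra effort. Before writing any code, the key is to understand how a B-tree actually works.

I implemented insert first, because it is the simpler method and a chance to deepen my understanding of the B-tree. Then I implemented delete.

The key comments are all in the source code (download). Halfway through insert I realized I had made a basic mistake: the parameter passing is muddled. For example, insert() is defined in Node, so it must be invoked on a Node, yet the method also requires a Node as a parameter. As a result, one node can modify another node. If one node really should be able to modify another, the method can be made static; if not, the node is being passed twice. I did not fix this, though, and all the later methods follow the same pattern: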
node.method(node, ...)
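
To make the redundancy concrete, here is a minimal sketch of the three call styles. The names (CallStyleSketch, insertRedundant, insertStatic, insert) are hypothetical and are not taken from the downloadable source; the "insert" here only appends to a sorted key list and is not a B-tree insert.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CallStyleSketch {
    static class Node {
        final List<Integer> keys = new ArrayList<>();

        // The pattern described above: an instance method that never uses
        // `this` and works on the node passed in, so the receiver is redundant.
        void insertRedundant(Node target, int key) {
            target.keys.add(key);
            Collections.sort(target.keys);
        }

        // Alternative 1: static, so the only node involved is the parameter.
        static void insertStatic(Node target, int key) {
            target.keys.add(key);
            Collections.sort(target.keys);
        }

        // Alternative 2: an instance method that works on `this` only.
        void insert(int key) {
            keys.add(key);
            Collections.sort(keys);
        }
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.insertRedundant(b, 3);   // called on a, but it is b that changes
        Node.insertStatic(b, 1);
        b.insert(2);
        System.out.println(a.keys + " " + b.keys);   // prints "[] [1, 2, 3]"
    }
}
```

Either alternative removes the ambiguity about which node a call may modify.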

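I only did simple testing. For insert, I inserted the values 1 through 18 in order and, after each insertion, compared the tree with the one produced by reference [1]. For delete, I first built a tree containing the values 1 through 18, then deleted the values from 18 down to 1, verifying the tree after each deletion.

My hands are sore.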

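References:

1. Animation of 2-3-4 tree, source code also available. This site provides a graphical interface that demonstrates a 2-3-4 tree.

2. Collection of BTree info

3. MIT, "Introduction to Algorithms"

Appendix: the procedures given in reference 3, with some comments of my own added: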

Insert

B-TREE-SPLIT-CHILD(x, i, y)
    z ← ALLOCATE-NODE()
    leaf[z] ← leaf[y]
    n[z] ← t - 1
    for j ← 1 to t - 1
        do key_j[z] ← key_{j+t}[y]
    if not leaf[y]
        then for j ← 1 to t
                 do c_j[z] ← c_{j+t}[y]
    n[y] ← t - 1
    for j ← n[x] + 1 downto i + 1
        do c_{j+1}[x] ← c_j[x]
    c_{i+1}[x] ← z
    for j ← n[x] downto i
        do key_{j+1}[x] ← key_j[x]
    key_i[x] ← key_t[y]
    n[x] ← n[x] + 1
    DISK-WRITE(y)
    DISK-WRITE(z)
    DISK-WRITE(x)
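
Since the indices are where mistakes creep in, the following sketch translates B-TREE-SPLIT-CHILD to Java with 0-based arrays. It is an illustration under assumptions, not the code from the download: the class and field names are hypothetical, T is fixed at 2, the disk operations are omitted, and the child y is looked up as x.kids[i] instead of being passed separately.

```java
class SplitChildSketch {
    static final int T = 2;                    // minimum degree t (assumed value)

    static class Node {
        int n;                                 // number of keys in use
        int[] keys = new int[2 * T - 1];
        Node[] kids = new Node[2 * T];
        boolean leaf = true;
    }

    // Splits the full child y = x.kids[i] of the non-full node x (i is 0-based).
    static void splitChild(Node x, int i) {
        Node y = x.kids[i];
        Node z = new Node();
        z.leaf = y.leaf;
        z.n = T - 1;
        // 1-based key_{j+t}[y], j = 1..t-1, becomes y.keys[j + T], j = 0..T-2.
        for (int j = 0; j < T - 1; j++) z.keys[j] = y.keys[j + T];
        if (!y.leaf) {
            for (int j = 0; j < T; j++) {
                z.kids[j] = y.kids[j + T];
                y.kids[j + T] = null;          // drop the stale reference
            }
        }
        y.n = T - 1;
        // Shift x's children right to make room for z at position i + 1.
        for (int j = x.n; j >= i + 1; j--) x.kids[j + 1] = x.kids[j];
        x.kids[i + 1] = z;
        // Shift x's keys right to make room for the median at position i.
        for (int j = x.n - 1; j >= i; j--) x.keys[j + 1] = x.keys[j];
        x.keys[i] = y.keys[T - 1];             // 1-based key_t[y] is y.keys[T - 1]
        x.n++;
    }

    public static void main(String[] args) {
        Node y = new Node();
        y.n = 3;
        y.keys[0] = 1; y.keys[1] = 2; y.keys[2] = 3;
        Node x = new Node();
        x.leaf = false;
        x.kids[0] = y;
        splitChild(x, 0);
        // x now holds the median 2, with children [1] and [3].
        System.out.println(x.keys[0] + " " + x.kids[0].keys[0] + " " + x.kids[1].keys[0]);
    }
}
```

The only changes from the pseudocode are the shifted loop bounds: every 1-based index k in the book becomes k - 1 here.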


B-TREE-INSERT(T, k)
    r ← root[T]
    if n[r] = 2t - 1
        then s ← ALLOCATE-NODE()
             root[T] ← s
             leaf[s] ← FALSE
             n[s] ← 0
             c_1[s] ← r
             B-TREE-SPLIT-CHILD(s, 1, r)
             B-TREE-INSERT-NONFULL(s, k)
        else B-TREE-INSERT-NONFULL(r, k)


B-TREE-INSERT-NONFULL(x, k)
    i ← n[x]
    if leaf[x]
        then while i ≥ 1 and k < key_i[x]
                 do key_{i+1}[x] ← key_i[x]
                    i ← i - 1
             key_{i+1}[x] ← k
             n[x] ← n[x] + 1
             DISK-WRITE(x)
        else while i ≥ 1 and k < key_i[x]
                 do i ← i - 1
             i ← i + 1
             DISK-READ(c_i[x])
             if n[c_i[x]] = 2t - 1
                 then B-TREE-SPLIT-CHILD(x, i, c_i[x])
                      if k > key_i[x]
                          then i ← i + 1
             B-TREE-INSERT-NONFULL(c_i[x], k)
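
The three procedures above fit together as in the following compact sketch. It is not the downloadable implementation: it assumes T = 2 (a 2-3-4 tree), stores keys and children in ArrayLists so that the element shifting is done by the list, leaves out the disk operations, and uses hypothetical names throughout. The main method repeats the insert test described earlier (values 1 through 18), but checks an in-order traversal instead of comparing against the animation in reference [1].

```java
import java.util.ArrayList;
import java.util.List;

class BTreeInsertSketch {
    static final int T = 2;   // minimum degree t (assumed); a node holds at most 2T - 1 keys

    static class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> kids = new ArrayList<>();
        boolean isLeaf() { return kids.isEmpty(); }
    }

    Node root = new Node();

    // B-TREE-INSERT: split a full root first, then insert top-down.
    void insert(int k) {
        if (root.keys.size() == 2 * T - 1) {
            Node s = new Node();
            s.kids.add(root);
            splitChild(s, 0);
            root = s;
        }
        insertNonFull(root, k);
    }

    // B-TREE-SPLIT-CHILD: splits the full child x.kids[i] (0-based index).
    static void splitChild(Node x, int i) {
        Node y = x.kids.get(i);
        Node z = new Node();
        int median = y.keys.get(T - 1);
        z.keys.addAll(y.keys.subList(T, 2 * T - 1));   // upper T - 1 keys go to z
        y.keys.subList(T - 1, 2 * T - 1).clear();      // y keeps the lower T - 1 keys
        if (!y.isLeaf()) {
            z.kids.addAll(y.kids.subList(T, 2 * T));   // upper T children go to z
            y.kids.subList(T, 2 * T).clear();
        }
        x.keys.add(i, median);                         // median moves up into x
        x.kids.add(i + 1, z);
    }

    // B-TREE-INSERT-NONFULL: x is guaranteed not to be full.
    static void insertNonFull(Node x, int k) {
        int i = x.keys.size();
        while (i > 0 && k < x.keys.get(i - 1)) i--;    // keys[0..i-1] are <= k
        if (x.isLeaf()) {
            x.keys.add(i, k);
            return;
        }
        if (x.kids.get(i).keys.size() == 2 * T - 1) {
            splitChild(x, i);
            if (k > x.keys.get(i)) i++;                // k belongs right of the new median
        }
        insertNonFull(x.kids.get(i), k);
    }

    // Collects all keys in ascending order.
    static void inOrder(Node x, List<Integer> out) {
        for (int i = 0; i < x.keys.size(); i++) {
            if (!x.isLeaf()) inOrder(x.kids.get(i), out);
            out.add(x.keys.get(i));
        }
        if (!x.isLeaf()) inOrder(x.kids.get(x.keys.size()), out);
    }

    public static void main(String[] args) {
        BTreeInsertSketch tree = new BTreeInsertSketch();
        for (int k = 1; k <= 18; k++) tree.insert(k);
        List<Integer> out = new ArrayList<>();
        inOrder(tree.root, out);
        System.out.println(out);   // the keys 1 through 18 in ascending order
    }
}
```

Because full nodes are split on the way down, the insertion never has to walk back up the tree.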

Deletion

There are two special cases to consider when deleting an element:

1. The element in an internal node may be a separator for its child nodes.
2. Deleting an element may put its node under the minimum number of elements and children.

Algorithm

1. If the key k is in node x and x is a leaf, delete the key k from x.

2. If the key k is in node x and x is an internal node, do the following.
   a. If the child y that precedes k in node x has at least t keys, then find the predecessor k′ of k in the subtree rooted at y. Recursively delete k′, and replace k by k′ in x. (Finding k′ and deleting it can be performed in a single downward pass.) That is, replace k with the largest key of the left subtree. (Open question: if y is a leaf with exactly t keys, it has t - 1 keys after this deletion, so a later deletion from y would seem to leave it with only t - 2 keys. See rule 3.)
   b. Symmetrically, if the child z that follows k in node x has at least t keys, then find the successor k′ of k in the subtree rooted at z. Recursively delete k′, and replace k by k′ in x. (Finding k′ and deleting it can be performed in a single downward pass.) That is, replace k with the smallest key of the right subtree.
   c. Otherwise, if both y and z have only t - 1 keys, merge k and all of z into y, so that x loses both k and the pointer to z, and y now contains 2t - 1 keys. Then, free z and recursively delete k from y. That is, merge the two children.

Note: borrow an element from a child if possible, otherwise merge. This minimizes the work done on delete: from the internal node's point of view, only the key appears to be replaced (special case 1).

3. If the key k is not present in internal node x, determine the root c_i[x] of the appropriate subtree that must contain k, if k is in the tree at all. If c_i[x] has only t - 1 keys, execute step 3a or 3b as necessary to guarantee that we descend to a node containing at least t keys. Then, finish by recursing on the appropriate child of x. (This is done while traversing down.)
   a. If c_i[x] has only t - 1 keys but has an immediate sibling with at least t keys, give c_i[x] an extra key by moving a key from x down into c_i[x], moving a key from c_i[x]'s immediate left or right sibling up into x, and moving the appropriate child pointer from the sibling into c_i[x].
   b. If c_i[x] and both of c_i[x]'s immediate siblings have t - 1 keys, merge c_i[x] with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.

Note: borrow an element from a sibling if possible; otherwise merge c_i[x], the sibling, and the key of x that separates them. This ensures the lower bound on the number of keys in every node (special case 2).
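
The deletion rules above can be sketched in Java as follows. This is one possible reading of the rules, not the downloadable implementation: it assumes T = 2, list-based nodes, no disk operations, and hypothetical names (node, merge, deleteFromTree and so on). The comments mark which rule each branch corresponds to: leaf deletion, the predecessor, successor and merge sub-cases for a key found in an internal node, and the borrow or merge sub-cases applied while descending.

```java
import java.util.ArrayList;
import java.util.List;

class BTreeDeleteSketch {
    static final int T = 2;   // minimum degree t (value assumed for the example)

    static class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> kids = new ArrayList<>();
        boolean isLeaf() { return kids.isEmpty(); }
    }

    // Hypothetical helper for building small trees by hand.
    static Node node(int[] keys, Node... kids) {
        Node x = new Node();
        for (int k : keys) x.keys.add(k);
        for (Node c : kids) x.kids.add(c);
        return x;
    }

    // Merges kids[i], keys[i] and kids[i + 1] of x into kids[i] (both merge cases).
    static void merge(Node x, int i) {
        Node y = x.kids.get(i);
        Node z = x.kids.remove(i + 1);
        y.keys.add(x.keys.remove(i));
        y.keys.addAll(z.keys);
        y.kids.addAll(z.kids);
    }

    static int max(Node x) {   // largest key in the subtree rooted at x
        while (!x.isLeaf()) x = x.kids.get(x.kids.size() - 1);
        return x.keys.get(x.keys.size() - 1);
    }

    static int min(Node x) {   // smallest key in the subtree rooted at x
        while (!x.isLeaf()) x = x.kids.get(0);
        return x.keys.get(0);
    }

    // Precondition: x is the root, or x holds at least T keys.
    static void delete(Node x, int k) {
        int i = 0;
        while (i < x.keys.size() && k > x.keys.get(i)) i++;
        boolean found = i < x.keys.size() && x.keys.get(i) == k;
        if (x.isLeaf()) {                                   // k in a leaf: just remove it
            if (found) x.keys.remove(i);
            return;
        }
        if (found) {                                        // k in an internal node
            Node y = x.kids.get(i), z = x.kids.get(i + 1);
            if (y.keys.size() >= T) {                       // replace k by its predecessor
                int pred = max(y);
                x.keys.set(i, pred);
                delete(y, pred);
            } else if (z.keys.size() >= T) {                // replace k by its successor
                int succ = min(z);
                x.keys.set(i, succ);
                delete(z, succ);
            } else {                                        // merge k and z into y
                merge(x, i);
                delete(y, k);
            }
            return;
        }
        Node c = x.kids.get(i);                             // k is not in x: descend
        if (c.keys.size() < T) {
            Node left = i > 0 ? x.kids.get(i - 1) : null;
            Node right = i + 1 < x.kids.size() ? x.kids.get(i + 1) : null;
            if (left != null && left.keys.size() >= T) {    // borrow through x from the left
                c.keys.add(0, x.keys.get(i - 1));
                x.keys.set(i - 1, left.keys.remove(left.keys.size() - 1));
                if (!left.isLeaf()) c.kids.add(0, left.kids.remove(left.kids.size() - 1));
            } else if (right != null && right.keys.size() >= T) {   // borrow from the right
                c.keys.add(x.keys.get(i));
                x.keys.set(i, right.keys.remove(0));
                if (!right.isLeaf()) c.kids.add(right.kids.remove(0));
            } else {                                        // merge with a sibling
                if (right == null) { i--; c = left; }       // no right sibling: merge into left
                merge(x, i);
            }
        }
        delete(c, k);
    }

    // Deletes k and returns the root, which changes when the old root becomes empty.
    static Node deleteFromTree(Node root, int k) {
        delete(root, k);
        return (root.keys.isEmpty() && !root.isLeaf()) ? root.kids.get(0) : root;
    }

    static void inOrder(Node x, List<Integer> out) {
        for (int i = 0; i < x.keys.size(); i++) {
            if (!x.isLeaf()) inOrder(x.kids.get(i), out);
            out.add(x.keys.get(i));
        }
        if (!x.isLeaf()) inOrder(x.kids.get(x.keys.size()), out);
    }
}
```

A caller builds a tree (by hand with node(...), or with an insert routine), then repeatedly calls root = deleteFromTree(root, k) and checks that inOrder still yields the remaining keys in ascending order. The precondition on delete is what answers the open question raised under the predecessor case: the descent step tops a child up to at least t keys before entering it, so a leaf never drops below t - 1 keys.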