《Shuttler.Net东写西读之一》BCL里Dictionary的缘分算法

wyxhd2008 2013-03-28

展开全文

《Shuttler.Net东写西读之一》BCL里Dictionary的缘分算法

“风未动，旗未动，是人的心在动” ----佛

随后我会分几个章节分别介绍一下开源框架Shuttler.Net里的技术和架构，取名曰《Shuttler.Net东写西读》系列。此片为《一：引篇》:金日宜祈福，忌求医.

由于之前打算给Shuttler.Net里的CompactHeap寻找一个好的Key:Value（地址:值）算法，故翻开Dictionary的源码看看。
发现Hash还是那个Hash，若Hash值冲突，并非真正二度哈希，而是采用一种特殊的算法，我取个niubility的名字：缘分算法。

如果两个不同的Key，如果Hash出来的值一样，那就说明他俩太有缘分了！还是从一个例子说起吧。
有1000个人（唯一的标示就是他们的身份证号码，且均为数字，简称ID），还有1000间房（编号为0到999），每个房间有且只能分给一人。现在让你给物业设计一个分房算法：使看到任何一个身份证号码能最快找到这个人的房间!

那我们就来分析一下：
可以用每个ID模1000，得出一个code，其中code为几就把第几个房子分给该人。
如果两个人（或更多个）的ID%1000重复该如何办呢？那说明他们太有缘分了！那如何分配房间并最快能找到呢？

分配算法:

假设A(ID为xxx1666)的ID%1000=666，A被分到666房间；B(ID为xxx2666)的ID%1000也为666，发现666房间被人占了，那就把B分到空房间667（不一定跟A房间号挨着），并告诉A，你右邻居（next）是B,房间号667,他跟你有缘分：ID%1000相同，请牢记。

我们来根据身份证号找B房间过程如下：
现在物业看到B的身份证:xxx2666,

1,首先算出ID%1000=666
2,去666房间，一看是ID为xxx1666在住。非B，非B也
3,翻看A的next是:667
4,去667房间，这样就找到B。
以上就是BCL里Dictionary的存储算法。

它使用一个Entry

代码

   public struct Entry<TKey, TValue>
   {
      public int hashCode;    // 31 bits of hash code, -1 if unused
      public int next;        // Index of next entry, -1 if last
      public TKey key;        // Key of entry
      public TValue value;    // Value of entry
   }

表示每个住户，其中hashcode表示住户的ID，next表示自己的右邻居。

按照这个思想我写了个迷你版Dictionary如下：

代码

  public class CompactDict<TKey, TValue>
   {
      private int _count;
      private int[] _buckets;
      private int _capacity;
      private Entry<TKey, TValue>[] _entries;
      private IEqualityComparer<TKey> _comparer;

      public CompactDict(int capacity)
      {
         _capacity = capacity;
         _buckets=new int[capacity];
         for (int i = 0; i < capacity; i++)
            _buckets[i] = -1;

         _comparer = EqualityComparer<TKey>.Default;
         _entries=new Entry<TKey,TValue>[capacity];
      }

      public void Add(TKey key, TValue value)
      {
         if (_count == _capacity)
            throw new OverflowException("overflow!");

         int num = _comparer.GetHashCode(key) & 0x7fffffff;
         int index = num % _capacity;

         for (int i = _buckets[index]; i >= 0; i = _entries[i].next)
         {

            if (_entries[i].hashCode.Equals(num) && _comparer.Equals(_entries[i].key, key))
               throw new InvalidOperationException("AddingDuplicate!");
         }

         _entries[_count].hashCode = num;
         //构造关系
         _entries[_count].next = _buckets[index];
         _entries[_count].key = key;
         _entries[_count].value = value;
         _buckets[index] = _count;

         _count++;
      }

      public bool TryGetValue(TKey key,out TValue value)
      {
         if (key == null)
            throw new NullReferenceException("key is *NULL*");

         int index = -1;
         int num = _comparer.GetHashCode(key) & 0x7fffffff;
         int bucketIndex = num % _capacity;

         //根据仿单向链表查找value
         for (int i = _buckets[bucketIndex]; i >= 0; i = _entries[i].next)
         {
            if ((_entries[i].hashCode == num) && _comparer.Equals(_entries[i].key, key))
               index = i;
         }

         if (index >= 0)
         {
            value = _entries[index].value;
            return true; ;
         }

         value = default(TValue);
         return false;
      }

   }

   public struct Entry<TKey, TValue>
   {
      public int hashCode;    // 31 bits of hash code, -1 if unused
      public int next;        // Index of next entry, -1 if last
      public TKey key;        // Key of entry
      public TValue value;    // Value of entry
   }

找个hashcode相同的俩串测试下:

代码

         CompactDict<string, string> _dict = new CompactDict<string, string>(2);
         _dict.Add("0.89265452879139", "0.89265452879139");
         _dict.Add("0.280527401380486", "0.280527401380486");

         string value = string.Empty;
         _dict.TryGetValue("0.89265452879139", out value);

很酷！

通过以上分析：Dictionary的查找貌似在最坏情况下也是O(n)的，，但微软如何避免此情况的呢？那就是在Dictionary一些初始操作时使用了质数HashHelpers.GetPrime(capacity)，这个函数的作用就是生成一个大于且接近于你capacity的质数，这样就在一定程度上避免了“最坏情况”的发生！

Tag标签: Dictionary,Shuttler.Net