Python算法之哈夫曼编码

网摘文苑 2022-06-11 发布于新疆

展开全文

问题:哈夫曼编码,英文名称Huffman Coding，有时也翻译为霍夫曼编码，在1952年提出的，是最好的编码方式。哈夫曼编码在电子通讯方面有着重要的应用，同时也广泛应用于数据压缩，其压缩率通常在20%～90%之间赫夫曼码是可变字长编码(VLC)的一种。哈夫曼树是最优二叉树，带权路径长度最小的二叉树。

原理:

假设有几个数字40,10,20,16,14。

首先将这五个数字按照从小到大的顺序排列：10, 14，16，20, 40。

构建哈夫曼树：

1.首先选取10,14

2.重新排序：16,20,24,40

3.重新排序24,36,40,60

4.按照二叉树左0右1，构建哈夫曼树

所以最终得到数字10的编码为100，数字14的编码为101，数字16的编码为110，数字20的编码为111，数字40的编码为0。

代码：

from heapq import *inp = input('请输入要构建哈夫曼树的字符串:')# 统计每个字符出现的频率并生成字典def generate_dict(s):    dic = {}    for i in s:        if i not in dic:            dic[i] = 1        else:            dic[i] += 1    return dicdic = generate_dict(inp)# 节点类class Node:    def __init__(self, name=None, weight=None):        self.name = name        self.weight = weight        self.parent = None        self.left = None        self.right = None        self.id = None    # 自定义类的比较    def __lt__(self, other):        return int(self.weight) < int(other.weight)# 按权值排序def sort(list):    return sorted(list, key=lambda Node: Node.weight)def generate_node2(dic):    lis = []    for i in dic:        newNode = Node(i, dic[i])        heappush(lis, newNode)    return lislis = generate_node2(dic)# Huffman编码2使用堆的方式def HuffmanTree2(lis):    global id    while len(lis) != 1:        a = heappop(lis)        b = heappop(lis)        new = Node()        new.weight = a.weight + b.weight        new.left, new.right = a, b        a.parent = new        b.parent = new        heappush(lis, new)    return lislis = HuffmanTree2(lis)node = lis[0]# 定义前序遍历方法并执行一定的操作def pre_order(root, code):    if root is None:        code = code[:-1]        return    pre_order(root.left, code + '0')    if root.name is not None:        print(root.name, '的权重为', root.weight, '编码为', code)    pre_order(root.right, code + '1')code = ''print('构造的哈夫曼树为:')pre_order(node, code)

运行结果:

请输入要构建哈夫曼树的字符串:12908755343298构造的哈夫曼树为:1 的权重为 1 编码为 0007 的权重为 1 编码为 0015 的权重为 2 编码为 0109 的权重为 2 编码为 0112 的权重为 2 编码为 1000 的权重为 1 编码为 10104 的权重为 1 编码为 10118 的权重为 2 编码为 1103 的权重为 2 编码为 111