【原】将多个MSA连超级高铁网络，如何用最少的轨道连接所有MSA？

新用户0175WbuX 2022-02-11

展开全文

　　假设我们要把所有15个最大的MSA都连入超级高铁网络，目标是要最大限度地降低网络的铺设成本，于是就意味着所用的轨道数量要最少。于是问题就成了：如何用最少的轨道连接所有MSA？

　　要了解建造某条边所需的轨道数量，就需要知道这条边表示的距离。现在是再次引入权重概念的时候了。在超级高铁网络中，边的权重是两个所连MSA之间的距离。图4-5与图4-2几乎相同，差别只是每条边多了权重，表示边所连的两个顶点之间的距离（以英里为单位）。

　　将多个MSA连超级高铁网络，如何用最少的轨道连接所有MSA？

　　图4-5　美国15个最大的MSA的加权图，权重代表两个MSA之间的距离，单位英里（1英里≈1.6093 km）

　　为了处理权重，需要建立Edge的子类WeightedEdge和Graph的子类WeightedGraph。每个WeightedEdge都带有一个与其关联的表示其权重的float类型数据。下面马上就会介绍Jarník算法，它能够比较两条边并确定哪条边权重较低。采用数值型的权重就很容易进行比较了。具体代码如代码清单4-6所示。

　　代码清单4-6　weighted_edge.py

　　from __future__ import annotations

　　from dataclasses import dataclass

　　from edge import Edge

　　@dataclass

　　class WeightedEdge(Edge):

　　weight: float

　　def reversed(self) -> WeightedEdge:

　　return WeightedEdge(self.v, self.u, self.weight)

　　# so that we can order edges by weight to find the minimum weight edge

　　def __lt__(self, other: WeightedEdge) -> bool:

　　return self.weight<other.weight< p="">

　　def __str__(self) -> str:

　　return f"{self.u} {self.weight}> {self.v}"

　　WeightedEdge的实现代码与Edge的实现代码并没有太大的区别，只是添加了一个weight属性，并通过__lt__()实现了“<”操作符，这样两个WeightedEdge就可以相互比较了。“<”操作符只涉及权重（而不涉及继承而来的属性u和v），因为Jarník的算法只关注如何找到权重最小的边。

　　如代码清单4-7所示，WeightedGraph从Graph继承了大部分功能，此外，它还包含了__init__方法和添加WeightedEdge的便捷方法，并且实现了自己的__str__()方法。它还有一个新方法

　　neighbors_for_index_with_weights()，这一方法不仅会返回每一位邻居，还会返回到达这位邻居的边的权重。这一方法对其__str__()十分有用。

　　代码清单4-7　weighted_graph.py

　　from typing import TypeVar, Generic, List, Tuple

　　from graph import Graph

　　from weighted_edge import WeightedEdge

　　V=TypeVar('V') # type of the vertices in the graph

　　class WeightedGraph(Generic[V], Graph[V]):

　　def __init__(self, vertices: List[V]=[]) -> None:

　　self._vertices: List[V]=vertices

　　self._edges: List[List[WeightedEdge]]=[[] for _ in vertices]

　　def add_edge_by_indices(self, u: int, v: int, weight: float) -> None:

　　edge: WeightedEdge=WeightedEdge(u, v, weight)

　　self.add_edge(edge) # call superclass version

　　def add_edge_by_vertices(self, first: V, second: V, weight: float) -> None:

　　u: int=self._vertices.index(first)

　　v: int=self._vertices.index(second)

　　self.add_edge_by_indices(u, v, weight)

　　def neighbors_for_index_with_weights(self, index: int) -> List[Tuple[V, float]]:

　　distance_tuples: List[Tuple[V, float]]=[]

　　for edge in self.edges_for_index(index):

　　distance_tuples.append((self.vertex_at(edge.v), edge.weight))

　　return distance_tuples

　　def __str__(self) -> str:

　　desc: str=""

　　for i in range(self.vertex_count):

　　desc +=f"{self.vertex_at(i)} -> {self.neighbors_for_index_with_weights(i)}

　　return desc

　　现在可以实际定义加权图了。这里将会用到图4-5表示的加权图，名为city_graph2。具体代码如代码清单4-8所示。

　　代码清单4-8　weighted_graph.py（续）

　　if __name__=="__main__":

　　city_graph2: WeightedGraph[str]=WeightedGraph(["Seattle", "San Francisco",

　　"Los Angeles", "Riverside", "Phoenix", "Chicago", "Boston", "New York", "Atlanta",

　　"Miami", "Dallas", "Houston", "Detroit", "Philadelphia", "Washington"])

　　city_graph2.add_edge_by_vertices("Seattle", "Chicago", 1737)

　　city_graph2.add_edge_by_vertices("Seattle", "San Francisco", 678)

　　city_graph2.add_edge_by_vertices("San Francisco", "Riverside", 386)

　　city_graph2.add_edge_by_vertices("San Francisco", "Los Angeles", 348)

　　city_graph2.add_edge_by_vertices("Los Angeles", "Riverside", 50)

　　city_graph2.add_edge_by_vertices("Los Angeles", "Phoenix", 357)

　　city_graph2.add_edge_by_vertices("Riverside", "Phoenix", 307)

　　city_graph2.add_edge_by_vertices("Riverside", "Chicago", 1704)

　　city_graph2.add_edge_by_vertices("Phoenix", "Dallas", 887)

　　city_graph2.add_edge_by_vertices("Phoenix", "Houston", 1015)

　　city_graph2.add_edge_by_vertices("Dallas", "Chicago", 805)

　　city_graph2.add_edge_by_vertices("Dallas", "Atlanta", 721)

　　city_graph2.add_edge_by_vertices("Dallas", "Houston", 225)

　　city_graph2.add_edge_by_vertices("Houston", "Atlanta", 702)

　　city_graph2.add_edge_by_vertices("Houston", "Miami", 968)

　　city_graph2.add_edge_by_vertices("Atlanta", "Chicago", 588)

　　city_graph2.add_edge_by_vertices("Atlanta", "Washington", 543)

　　city_graph2.add_edge_by_vertices("Atlanta", "Miami", 604)

　　city_graph2.add_edge_by_vertices("Miami", "Washington", 923)

　　city_graph2.add_edge_by_vertices("Chicago", "Detroit", 238)

　　city_graph2.add_edge_by_vertices("Detroit", "Boston", 613)

　　city_graph2.add_edge_by_vertices("Detroit", "Washington", 396)

　　city_graph2.add_edge_by_vertices("Detroit", "New York", 482)

　　city_graph2.add_edge_by_vertices("Boston", "New York", 190)

　　city_graph2.add_edge_by_vertices("New York", "Philadelphia", 81)

　　city_graph2.add_edge_by_vertices("Philadelphia", "Washington", 123)

　　print(city_graph2)

　　因为WeightedGraph实现了__str__()，所以我们可以美观打印出city_graph2。在输出结果中会同时显示每个顶点连接的所有顶点及这些连接的古玩权重。

　　Seattle -> [('Chicago', 1737), ('San Francisco', 678)]

　　San Francisco -> [('Seattle', 678), ('Riverside', 386), ('Los Angeles', 348)]

　　Los Angeles -> [('San Francisco', 348), ('Riverside', 50), ('Phoenix', 357)]

　　Riverside -> [('San Francisco', 386), ('Los Angeles', 50), ('Phoenix', 307), ('Chicago',

　　1704)]

　　Phoenix -> [('Los Angeles', 357), ('Riverside', 307), ('Dallas', 887), ('Houston', 1015)]

　　Chicago -> [('Seattle', 1737), ('Riverside', 1704), ('Dallas', 805), ('Atlanta', 588),

　　('Detroit', 238)]

　　Boston -> [('Detroit', 613), ('New York', 190)]

　　New York -> [('Detroit', 482), ('Boston', 190), ('Philadelphia', 81)]

　　Atlanta -> [('Dallas', 721), ('Houston', 702), ('Chicago', 588), ('Washington', 543),

　　('Miami', 604)]

　　Miami -> [('Houston', 968), ('Atlanta', 604), ('Washington', 923)]

　　Dallas -> [('Phoenix', 887), ('Chicago', 805), ('Atlanta', 721), ('Houston', 225)]

　　Houston -> [('Phoenix', 1015), ('Dallas', 225), ('Atlanta', 702), ('Miami', 968)]

　　Detroit -> [('Chicago', 238), ('Boston', 613), ('Washington', 396), ('New York', 482)]

　　Philadelphia -> [('New York', 81), ('Washington', 123)]

　　Washington -> [('Atlanta', 543), ('Miami', 923), ('Detroit', 396), ('Philadelphia', 123)]

　　树是一种特殊的图，它在任意两个顶点之间只存在一条路径，这意味着树中没有环路（cycle），有时被称为无环（acyclic）。环路可以被视作循环。如果可以从一个起始顶点开始遍历图，不会重复经过任何边，并返回到起始顶点，则称存在一条环路。任何不是树的图都可以通过修剪边而成为树。图4-6演示了通过修剪边把图转换为树的过程。

　　将多个MSA连超级高铁网络，如何用最少的轨道连接所有MSA？

　　图4-6　在左图中，在顶点B、C和D之间存在一个环路，因此它不是树。在右图中，

　　连通C和D的边已被修剪掉了，因此它是一棵树

　　连通图（connected graph）是指从图的任一顶点都能以某种路径到达其他任何顶点的图。本章中的所有图都是连通图。生成树（spanning tree）是把图所有顶点都连接起来的树。最小生成树（minimum spanning tree）是以最小总权重把加权图的每个顶点都连接起来的树（相对于其他的生成树而言）。对于每张加权图，我们都能高效地找到其最小生成树。

　　这里出现了一大堆术语！“查找最小生成树”和“以权重最小的方式连接加权图中的所有顶点”的意思相同，这是关键点。对任何设计网络（交通网络、计算机网络等）的人来说，这都是一个重要而实际的问题：如何能以最低的成本连接网络中的所有节点呢？这里的成本可能是电线、轨道、道路或其他任何东西。以电话网络来说，这个问题的另一种提法就是：连通每个电话机所需要的最短电缆长度是多少？

　　优先队列在第2章中已经介绍过了。Jarník算法将需要用到优先队列。我们可以从第2章的程序包中导入PriorityQueue类，要获得详情请参阅紧挨着代码清单4-5之前的注意事项，也可以把该类复制为一个新文件并放入本章的程序包中。为完整起见，在代码清单4-9中，我们将重新创建第2章中的PriorityQueue，这里假定import语句会被放入单独的文件中。

　　代码清单4-9　priority_queue.py

　　from typing import TypeVar, Generic, List

　　from heapq import heappush, heappop

　　T=TypeVar('T')

　　class PriorityQueue(Generic[T]):

　　def __init__(self) -> None:

　　self._container: List[T]=[]

　　@property

　　def empty(self) -> bool:

　　return not self._container # not is true for empty container

　　def push(self, item: T) -> None:

　　heappush(self._container, item) # in by priority

　　def pop(self) -> T:

　　return heappop(self._container) # out by priority

　　def __repr__(self) -> str:

　　return repr(self._container)

　　在开发查找最小生成树的方法之前，我们需要开发一个用于检测某个解的总权重的函数。最小生成树问题的解将由组成树的加权边列表构成。首先，我们会将WeightedPath定义为WeightedEdge的列表，然后会定义一个total_weight()函数，该函数以WeightedPath的列表为参数并把所有边的权重相加，以便得到总权重。具体代码如代码清单4-10所示。

　　代码清单4-10　mst.py

　　from typing import TypeVar, List, Optional

　　from weighted_graph import WeightedGraph

　　from weighted_edge import WeightedEdge

　　from priority_queue import PriorityQueue

　　V=TypeVar('V') # type of the vertices in the graph

　　WeightedPath=List[WeightedEdge] # type alias for paths

　　def total_weight(wp: WeightedPath) -> float:

　　return sum([e.weight for e in wp])

　　查找最小生成树的Jarník算法把图分为两部分：正在生成的最小生成树的顶点和尚未加入最小生成树的顶点。其工作步骤如下所示。

　　（1）选择要被包含于最小生成树中的任一顶点。

　　（2）找到连通最小生成树与尚未加入树的顶点的权重最小的边。

　　（3）将权重最小边末端的顶点添加到最小生成树中。

　　（4）重复第2步和第3步，直到图中的每个顶点都加入了最小生成树。

　　注意　Jarník算法常被称为Prim算法。在20世纪20年代末，两位捷克数学家OtakarBor?vka和VojtěchJarník致力于尽量降低铺设电线的成本，提出了解决最小生成树问题的算法。他们提出的算法在几十年后又被其他人“重新发现”[3]。

　　为了高效地运行Jarník算法，需要用到优先队列。每次将新的顶点加入最小生成树时，所有连接到树外顶点的出边都会被加入优先队列中。从优先队列中弹出的一定是权重最小的边，算法将持续运行直至优先队列为空为止。这样就确保了权重最小的边一定会优先加入树中。如果被弹出的边与树中的已有顶点相连，则它将被忽略。

　　代码清单4-11中的mst()完整实现了Jarník算法[4]，它还带了一个用来打印WeightedPath的实用函数。

　　警告　Jarník算法在有向图中不一定能正常工作，它也不适用于非连通图。

　　代码清单4-11　mst.py（续）

　　def mst(wg: WeightedGraph[V], start: int=0) -> Optional[WeightedPath]:

　　if start > (wg.vertex_count - 1) or start < 0:

　　return None

　　result: WeightedPath=[] # holds the final MST

　　pq: PriorityQueue[WeightedEdge]=PriorityQueue()

　　visited: [bool]=[False] * wg.vertex_count # where we've been

　　def visit(index: int):

　　visited[index]=True # mark as visited

　　for edge in wg.edges_for_index(index):

　　# add all edges coming from here to pq

　　if not visited[edge.v]:

　　pq.push(edge)

　　visit(start) # the first vertex is where everything begins

　　while not pq.empty: # keep going while there are edges to process

　　edge=pq.pop()

　　if visited[edge.v]:

　　continue # don't ever revisit

　　# this is the current smallest, so add it to solution

　　result.append(edge)

　　visit(edge.v) # visit where this connects

　　return result

　　def print_weighted_path(wg: WeightedGraph, wp: WeightedPath) -> None:

　　for edge in wp:

　　print(f"{wg.vertex_at(edge.u)} {edge.weight}> {wg.vertex_at(edge.v)}")

　　print(f"Total Weight: {total_weight(wp)}")

　　下面逐行过一遍mst()。

　　def mst(wg: WeightedGraph[V], start: int=0) -> Optional[WeightedPath]:

　　if start > (wg.vertex_count - 1) or start < 0:

　　return None

　　本算法将返回某一个代表最小生成树的WeightedPath对象。运算本算法的起始位置无关紧要（假定图是连通和无向的），因此默认设为索引为0的顶点。如果start无效，则mst()返回None。

　　result: WeightedPath = [] # holds the final MST

　　pq: PriorityQueue[WeightedEdge] = PriorityQueue()

　　visited: [bool] = [False] * wg.vertex_count # where we've been

　　result将是最终存放加权路径的地方，也即包含了最小生成树。随着权重最小的边不断被弹出以及图中新的区域不断被遍历，WeightedEdge会不断被添加到result中。因为Jarník算法总是选择权重最小的边，所以被视为贪婪算法（greedy algorithm）之一。pq用于存储新发现的边并弹出次低权重的边。visited用于记录已经到过的顶点索引，这用Set也可以实现，类似于bfs()中的explored。

　　def visit(index: int):

　　visited[index]=True # mark as visited

　　for edge in wg.edges_for_index(index):

　　# add all edges coming from here

　　if not visited[edge.v]:

　　pq.push(edge)

　　visit()是一个便于内部使用的函数，用于把顶点标记为已访问，并把尚未访问过的顶点所连的边都加入pq中。不妨注意一下，使用邻接表模型能够轻松地找到属于某个顶点的边。

　　visit(start) # the first vertex is where everything begins

　　除非图是非连通的，否则先访问哪个顶点是无所谓的。如果图是非连通的，是由多个不相连的部分组成的，那么mst()返回的树只会涵盖图的某一部分，也就是起始节点所属的那部分图。

　　while not pq.empty: # keep going while there are edges to process

　　edge=pq.pop()

　　if visited[edge.v]:

　　continue # don't ever revisit

　　# this is the current smallest, so add it

　　result.append(edge)

　　visit(edge.v) # visit where this connects

　　return result

　　只要优先队列中还有边存在，我们就将它们弹出并检查它们是否会引出尚未加入树的顶点。因为优先队列是以升序排列的，所以会先弹出权重最小的边。这就确保了结果确实具有最小总权重。如果弹出的边不会引出未探索过的顶点，那么就会被忽略，否则，因为该条边是目前为止权重最小的边，所以会被添加到结果集中，并且对其引出的新顶点进行探索。如果已没有边可供探索了，则返回结果。

　　最后再回到用轨道最少的超级高铁网络连接美国15个最大的MSA的问题吧。结果路径就是city_graph2的最小生成树。下面尝试对city_graph2运行一下mst()，具体代码如代码清单4-12所示。

　　代码清单4-12　mst.py（续）

　　if __name__=="__main__":

　　city_graph2: WeightedGraph[str]=WeightedGraph(["Seattle", "San Francisco", "Los

　　Angeles", "Riverside", "Phoenix", "Chicago", "Boston", "New York", "Atlanta",