STP Implementation in the Linux Kernel Bridge Code

 A_Geek 2013-07-05
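A loop in a network topology causes two problems. First, a switch's forwarding table sees the same MAC address on more than one port, so frames can no longer be delivered reliably. Second, it causes a broadcast storm: broadcast frames are forwarded around the loop forever until they exhaust CPU and bandwidth.
To keep loops out of the topology, bridges run the Spanning Tree Protocol (STP). It builds a spanning tree across the Ethernet switches and disables every link that is not part of the tree, so that exactly one path exists between any two nodes. With STP in place, the topology may contain redundant links: when an active link goes down, a redundant link acts as a backup and takes over once the tree has been recomputed, keeping the network connected.
STP proceeds as follows: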

1. Elect a root bridge: the bridge with the smallest bridge ID becomes the root.

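2. Determine the shortest paths to the root bridge. First, each bridge finds its own least-cost path to the root; the port on that path is the bridge's root port. Second, for each network segment, the port of the bridge that offers the segment's least-cost path to the root becomes that segment's designated port.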

3. Disable all other paths to the root.

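4. Break ties. A tie means that several bridges offer paths of equal cost; in that case the bridge with the smaller bridge ID wins. If the bridge IDs are also equal, meaning there are several links to the same bridge, the choice is made by port number and the smaller port wins.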
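
The selection order in the steps above can be sketched as a lexicographic comparison. This is a minimal userspace sketch, not the kernel's code: `struct stp_vector`, `make_bridge_id` and `stp_vector_cmp` are hypothetical names, and the bridge ID is simplified to a 64-bit integer (16-bit priority in the high bits, 48-bit MAC address in the low bits).

```c
#include <stdint.h>

/* Hypothetical, simplified view of what one bridge advertises. */
struct stp_vector {
    uint64_t root_id;        /* bridge ID of the advertised root */
    uint32_t root_path_cost; /* cost from the sender to that root */
    uint64_t bridge_id;      /* bridge ID of the sender */
    uint16_t port_id;        /* port the advertisement was sent from */
};

/* Pack a 16-bit priority and a 6-byte MAC into one comparable ID. */
static uint64_t make_bridge_id(uint16_t prio, const uint8_t mac[6])
{
    uint64_t id = prio;

    for (int i = 0; i < 6; i++)
        id = (id << 8) | mac[i];
    return id;
}

/*
 * Returns <0 if a is preferred over b, >0 if b is preferred, 0 if equal.
 * At every step the smaller value wins: root ID first, then path cost,
 * then sender bridge ID, then port ID.
 */
static int stp_vector_cmp(const struct stp_vector *a,
                          const struct stp_vector *b)
{
    if (a->root_id != b->root_id)
        return a->root_id < b->root_id ? -1 : 1;
    if (a->root_path_cost != b->root_path_cost)
        return a->root_path_cost < b->root_path_cost ? -1 : 1;
    if (a->bridge_id != b->bridge_id)
        return a->bridge_id < b->bridge_id ? -1 : 1;
    if (a->port_id != b->port_id)
        return a->port_id < b->port_id ? -1 : 1;
    return 0;
}
```

Because the priority occupies the high bits of the packed ID, a bridge with a lower configured priority always beats one with a higher priority, whatever their MAC addresses.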

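STP is a distributed algorithm: bridges communicate by exchanging BPDUs (Bridge Protocol Data Units) over the network. There are three kinds of BPDU: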

1.Configuration BPDU (CBPDU), used for Spanning Tree computation

2.Topology Change Notification (TCN) BPDU, used to announce changes in the network topology

3.Topology Change Notification Acknowledgment (TCA)

Linux defines only the first two; the third is implemented by sending a config BPDU with the TCA flag set to 1. Here is how the config BPDU is represented in Linux:

struct br_config_bpdu
{
    unsigned    topology_change:1;
    unsigned    topology_change_ack:1;
    bridge_id   root;
    int         root_path_cost;
    bridge_id   bridge_id;
    port_id     port_id;
    int         message_age;
    int         max_age;
    int         hello_time;
    int         forward_delay;
};
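Only the root bridge originates config BPDUs. It runs a Hello Timer and generates a BPDU each time the timer expires. Non-root bridges receive these BPDUs on their root ports and relay them, updating the fields of each BPDU to reflect their own state. Here is what happens when the Hello Timer expires: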
static void br_hello_timer_expired(unsigned long arg)
{
    struct net_bridge *br = (struct net_bridge *)arg;

    pr_debug("%s: hello timer expired\n", br->dev->name);
    spin_lock(&br->lock);
    if (br->dev->flags & IFF_UP) {
        br_config_bpdu_generation(br);

        mod_timer(&br->hello_timer, round_jiffies(jiffies + br->hello_time));
    }
    spin_unlock(&br->lock);
}
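
The relaying step on a non-root bridge, mentioned before the timer code, can be sketched roughly as follows. This is a simplified model under stated assumptions, not the kernel's transmit path: `struct cfg_bpdu` and `relay_config_bpdu` are hypothetical names, bridge IDs are flattened to 64-bit integers, times are in units of 1/256 s, and the message-age increment is left as a parameter rather than fixed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened config BPDU (times in 1/256 s). */
struct cfg_bpdu {
    uint64_t root_id;
    uint32_t root_path_cost;
    uint64_t bridge_id;
    uint16_t port_id;
    uint16_t message_age;
    uint16_t max_age;
    uint16_t hello_time;
    uint16_t forward_delay;
};

/*
 * Build the BPDU a non-root bridge would send on a designated port after
 * receiving rx on its root port. Returns false if the information would
 * be too old to relay (message age reaching max age).
 */
static bool relay_config_bpdu(const struct cfg_bpdu *rx,
                              uint32_t root_port_cost,
                              uint64_t my_bridge_id,
                              uint16_t out_port_id,
                              uint16_t age_incr,
                              struct cfg_bpdu *tx)
{
    if ((uint32_t)rx->message_age + age_incr >= rx->max_age)
        return false;

    *tx = *rx;  /* root ID and the root's timer values pass through */
    tx->root_path_cost = rx->root_path_cost + root_port_cost;
    tx->bridge_id = my_bridge_id;   /* we are now the sender */
    tx->port_id = out_port_id;
    tx->message_age = (uint16_t)(rx->message_age + age_incr);
    return true;
}
```

The root ID, max age, hello time and forward delay are copied unchanged, which is how every bridge in the tree ends up using the root's timer values.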
First, the code that sends a BPDU:
static void br_send_bpdu(struct net_bridge_port *p,
                         const unsigned char *data, int length)
{
    struct sk_buff *skb;

    skb = dev_alloc_skb(length+LLC_RESERVE);
    if (!skb)
        return;

    skb->dev = p->dev;
    skb->protocol = htons(ETH_P_802_2);

    skb_reserve(skb, LLC_RESERVE);
    memcpy(__skb_put(skb, length), data, length);

    llc_pdu_header_init(skb, LLC_PDU_TYPE_U, LLC_SAP_BSPAN,
                        LLC_SAP_BSPAN, LLC_PDU_CMD);
    llc_pdu_init_as_ui_cmd(skb);

    llc_mac_hdr_init(skb, p->dev->dev_addr, p->br->group_addr);

    NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
            dev_queue_xmit);
}
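The data argument is filled in differently depending on the BPDU type and is then placed in an skb for transmission. For example, this is how a TCN BPDU is filled in: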
void br_send_tcn_bpdu(struct net_bridge_port *p)
{
    unsigned char buf[4];

    if (p->br->stp_enabled != BR_KERNEL_STP)
        return;

    buf[0] = 0;
    buf[1] = 0;
    buf[2] = 0;
    buf[3] = BPDU_TYPE_TCN;
    br_send_bpdu(p, buf, 4);
}
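
For comparison, a config BPDU payload could be filled in along the following lines. This is a userspace sketch, not the kernel's own send routine: `struct cfg_fields`, `encode_config_bpdu` and the `STP_*` constants are hypothetical names. The byte offsets are derived from the receive path in this article, which reads the same fields after skipping the 3-byte protocol ID and version prefix, and from the flag masks it tests (0x01 for TC, 0x80 for TCA).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STP_BPDU_TYPE_CONFIG 0x00
#define STP_FLAG_TC          0x01   /* topology change */
#define STP_FLAG_TCA         0x80   /* topology change acknowledgment */
#define STP_CONFIG_BPDU_LEN  35

/* Hypothetical input record; IDs are 2 bytes priority + 6 bytes MAC. */
struct cfg_fields {
    int      topology_change;
    int      topology_change_ack;
    uint8_t  root[8];
    uint32_t root_path_cost;
    uint8_t  bridge[8];
    uint16_t port_id;
    uint16_t message_age;    /* all times in 1/256 s */
    uint16_t max_age;
    uint16_t hello_time;
    uint16_t forward_delay;
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* Serialize f into buf in network byte order; returns the length used. */
static size_t encode_config_bpdu(const struct cfg_fields *f,
                                 uint8_t buf[STP_CONFIG_BPDU_LEN])
{
    buf[0] = 0;                         /* protocol ID, high byte */
    buf[1] = 0;                         /* protocol ID, low byte */
    buf[2] = 0;                         /* protocol version */
    buf[3] = STP_BPDU_TYPE_CONFIG;
    buf[4] = (uint8_t)((f->topology_change ? STP_FLAG_TC : 0) |
                       (f->topology_change_ack ? STP_FLAG_TCA : 0));
    memcpy(buf + 5, f->root, 8);
    put_be32(buf + 13, f->root_path_cost);
    memcpy(buf + 17, f->bridge, 8);
    put_be16(buf + 25, f->port_id);
    put_be16(buf + 27, f->message_age);
    put_be16(buf + 29, f->max_age);
    put_be16(buf + 31, f->hello_time);
    put_be16(buf + 33, f->forward_delay);
    return STP_CONFIG_BPDU_LEN;
}
```

A TCA is therefore not a separate message: it is the same 35-byte config BPDU with bit 0x80 set in the flags byte.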
Next, the receive path for BPDUs:
int br_stp_rcv(struct sk_buff *skb, struct net_device *dev,
               struct packet_type *pt, struct net_device *orig_dev)
{
    const struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb);
    const unsigned char *dest = eth_hdr(skb)->h_dest;
    struct net_bridge_port *p = rcu_dereference(dev->br_port);
    struct net_bridge *br;
    const unsigned char *buf;

    if (dev->nd_net != &init_net)
        goto err;

    if (!p)
        goto err;

    if (pdu->ssap != LLC_SAP_BSPAN
        || pdu->dsap != LLC_SAP_BSPAN
        || pdu->ctrl_1 != LLC_PDU_TYPE_U)
        goto err;

    if (!pskb_may_pull(skb, 4))
        goto err;

    /* compare of protocol id and version */
    buf = skb->data;
    if (buf[0] != 0 || buf[1] != 0 || buf[2] != 0)
        goto err;

    br = p->br;
    spin_lock(&br->lock);

    if (br->stp_enabled != BR_KERNEL_STP)
        goto out;

    if (!(br->dev->flags & IFF_UP))
        goto out;

    if (p->state == BR_STATE_DISABLED)
        goto out;

    if (compare_ether_addr(dest, br->group_addr) != 0)
        goto out;

    buf = skb_pull(skb, 3);

    if (buf[0] == BPDU_TYPE_CONFIG) {
        struct br_config_bpdu bpdu;

        if (!pskb_may_pull(skb, 32))
            goto out;

        buf = skb->data;
        bpdu.topology_change = (buf[1] & 0x01) ? 1 : 0;
        bpdu.topology_change_ack = (buf[1] & 0x80) ? 1 : 0;

        bpdu.root.prio[0] = buf[2];
        bpdu.root.prio[1] = buf[3];
        bpdu.root.addr[0] = buf[4];
        bpdu.root.addr[1] = buf[5];
        bpdu.root.addr[2] = buf[6];
        bpdu.root.addr[3] = buf[7];
        bpdu.root.addr[4] = buf[8];
        bpdu.root.addr[5] = buf[9];
        bpdu.root_path_cost =
            (buf[10] << 24) |
            (buf[11] << 16) |
            (buf[12] << 8) |
            buf[13];
        bpdu.bridge_id.prio[0] = buf[14];
        bpdu.bridge_id.prio[1] = buf[15];
        bpdu.bridge_id.addr[0] = buf[16];
        bpdu.bridge_id.addr[1] = buf[17];
        bpdu.bridge_id.addr[2] = buf[18];
        bpdu.bridge_id.addr[3] = buf[19];
        bpdu.bridge_id.addr[4] = buf[20];
        bpdu.bridge_id.addr[5] = buf[21];
        bpdu.port_id = (buf[22] << 8) | buf[23];

        bpdu.message_age = br_get_ticks(buf+24);
        bpdu.max_age = br_get_ticks(buf+26);
        bpdu.hello_time = br_get_ticks(buf+28);
        bpdu.forward_delay = br_get_ticks(buf+30);

        br_received_config_bpdu(p, &bpdu);
    } else if (buf[0] == BPDU_TYPE_TCN) {
        br_received_tcn_bpdu(p);
    }
out:
    spin_unlock(&br->lock);
err:
    kfree_skb(skb);
    return 0;
}
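
The four timer fields at the end of the config BPDU are 2-byte big-endian values in units of 1/256 of a second, which the receive code converts with br_get_ticks into the kernel's own tick unit. As a rough illustration of that decoding, here is a userspace sketch that converts to milliseconds instead; `stp_time_to_ms` and `STP_TICKS_PER_SEC` are hypothetical names, and the rounding to the nearest millisecond is a choice made for this sketch.

```c
#include <stdint.h>

#define STP_TICKS_PER_SEC 256u  /* BPDU time fields count 1/256 s */

/* Decode a 2-byte big-endian BPDU time field into milliseconds. */
static uint32_t stp_time_to_ms(const uint8_t *src)
{
    uint32_t ticks = ((uint32_t)src[0] << 8) | src[1];

    /* Round to the nearest millisecond. */
    return (ticks * 1000u + STP_TICKS_PER_SEC / 2) / STP_TICKS_PER_SEC;
}
```

With this encoding the high byte of each field is simply the whole number of seconds, so a max age of 20 seconds appears on the wire as the bytes 0x14 0x00.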

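It uses the type field to tell a config BPDU from a TCN BPDU and calls the matching handler. A TCN BPDU is sent by a non-root bridge up to its parent bridge to signal that the topology has changed. The parent replies with a config BPDU that has the TCA flag set and forwards the TCN one level further up, and so on until it reaches the root bridge. The root bridge then sends config BPDUs with the TC flag set to the nodes of the spanning tree to tell them that the topology has changed. A node that receives a BPDU with the TC flag shortens its ageing time from the original 5 minutes to 15 seconds. This is the timeout used to delete stale entries when the forwarding database is cleaned up. Here is the implementation: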

void br_received_tcn_bpdu(struct net_bridge_port *p)
{
    if (br_is_designated_port(p)) {
        pr_info("%s: received tcn bpdu on port %i(%s)\n",
                p->br->dev->name, p->port_no, p->dev->name);

        br_topology_change_detection(p->br);
        br_topology_change_acknowledge(p);
    }
}
When a TCN BPDU is received, br_topology_change_detection is called:
void br_topology_change_detection(struct net_bridge *br)
{
    int isroot = br_is_root_bridge(br);

    pr_info("%s: topology change detected, %s\n", br->dev->name,
            isroot ? "propagating" : "sending tcn bpdu");

    if (isroot) {
        br->topology_change = 1;
        mod_timer(&br->topology_change_timer, jiffies
                  + br->bridge_forward_delay + br->bridge_max_age);
    } else if (!br->topology_change_detected) {
        br_transmit_tcn(br);
        mod_timer(&br->tcn_timer, jiffies + br->bridge_hello_time);
    }

    br->topology_change_detected = 1;
}

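This function handles the root and non-root cases separately. On the root bridge it sets TC to 1 and arms topology_change_timer, so that the config BPDUs it sends carry the TC flag. On a non-root bridge it sends a TCN BPDU toward the root and arms tcn_timer, unless a topology change has already been detected. Next, the periodic cleanup function of the forwarding database: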

void br_fdb_cleanup(unsigned long _data)
{
    struct net_bridge *br = (struct net_bridge *)_data;
    unsigned long delay = hold_time(br);
    unsigned long next_timer = jiffies + br->forward_delay;
    int i;

    spin_lock_bh(&br->hash_lock);
    for (i = 0; i < BR_HASH_SIZE; i++) {
        struct net_bridge_fdb_entry *f;
        struct hlist_node *h, *n;

        hlist_for_each_entry_safe(f, h, n, &br->hash[i], hlist) {
            unsigned long this_timer;
            if (f->is_static)
                continue;
            this_timer = f->ageing_timer + delay;
            if (time_before_eq(this_timer, jiffies))
                fdb_delete(f);
            else if (this_timer < next_timer)
                next_timer = this_timer;
        }
    }
    spin_unlock_bh(&br->hash_lock);

    /* Add HZ/4 to ensure we round the jiffies upwards to be after the next
     * timer, otherwise we might round down and will have no-op run. */
    mod_timer(&br->gc_timer, round_jiffies(next_timer + HZ/4));
}
This function runs when gc_timer expires. On each run it walks every chain of the hash table to find the entries that have expired. The expiry time is determined by the hold_time function:
/* if topology_changing then use forward_delay (default 15 sec)
 * otherwise keep longer (default 5 minutes)
 */
static inline unsigned long hold_time(const struct net_bridge *br)
{
    return br->topology_change ? br->forward_delay : br->ageing_time;
}
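When TC is set to 1, the function returns forward_delay, 15 seconds by default; when it is 0, it returns ageing_time, 5 minutes by default. In other words, while the topology is changing, entries in the forwarding database expire sooner, so the table quickly reflects the new topology.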
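
The effect of the shorter hold time can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not the kernel's data structures: `struct fdb_entry`, `pick_hold_time` and `expire_entries` are hypothetical names, the table is a flat array instead of a hash table, and time is counted in seconds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened forwarding-database entry. */
struct fdb_entry {
    unsigned long last_seen;  /* time the address was last learned */
    bool is_static;           /* static entries never age out */
    bool in_use;
};

/* Same choice as hold_time(): short timeout while the topology changes. */
static unsigned long pick_hold_time(bool topology_change,
                                    unsigned long forward_delay,
                                    unsigned long ageing_time)
{
    return topology_change ? forward_delay : ageing_time;
}

/* Drop every dynamic entry not refreshed within hold; return the count. */
static size_t expire_entries(struct fdb_entry *tbl, size_t n,
                             unsigned long now, unsigned long hold)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].in_use || tbl[i].is_static)
            continue;
        if (now - tbl[i].last_seen >= hold) {
            tbl[i].in_use = false;
            removed++;
        }
    }
    return removed;
}
```

An entry last seen 120 seconds ago survives a pass with the 300-second ageing time but is removed by a pass with the 15-second forward delay, which is what forces addresses to be relearned after a topology change.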
