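STP (Spanning Tree Protocol) is a layer-2 management protocol. In an extended LAN, all switches participating in STP exchange bridge protocol data units (BPDUs) in order to: elect a root bridge for a stable spanning-tree topology; elect a designated switch for every switched segment; and put the ports on redundant paths into the blocking state, which eliminates loops in the network. The standard is defined in IEEE 802.1D and gives the network a dynamic redundancy failover mechanism.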
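The root-bridge election mentioned above comes down to picking the numerically lowest bridge ID, where the ID is a 2-byte priority followed by the 6-byte bridge MAC address. The following is a minimal user-space sketch of that comparison, not the kernel's implementation; `struct bridge_id`, `make_bridge_id` and `elect_root` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte bridge ID: 2-byte priority followed by 6-byte MAC.
 * Both members are byte arrays, so the struct has no padding and can be
 * compared with memcmp(). */
struct bridge_id {
    uint8_t prio[2];   /* priority in big-endian order, 0x80 0x00 = 32768 */
    uint8_t addr[6];   /* bridge MAC address */
};

/* Build a bridge ID from a priority and a MAC address. */
struct bridge_id make_bridge_id(uint16_t priority, const uint8_t mac[6])
{
    struct bridge_id id;

    id.prio[0] = (uint8_t)(priority >> 8);
    id.prio[1] = (uint8_t)(priority & 0xff);
    memcpy(id.addr, mac, 6);
    return id;
}

/* Return the index of the bridge with the lowest ID: the lowest priority
 * wins, and the lowest MAC address breaks a priority tie. */
int elect_root(const struct bridge_id *ids, int n)
{
    int root = 0;

    assert(n > 0);
    for (int i = 1; i < n; i++)
        if (memcmp(&ids[i], &ids[root], sizeof(ids[i])) < 0)
            root = i;
    return root;
}
```

Because the priority occupies the most significant bytes, lowering a bridge's priority is enough to make it root regardless of its MAC address.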
RSTP (Rapid Spanning Tree Protocol) is an extension of STP. Its main feature is a mechanism for fast port-state transitions, which lets the network topology converge quickly. STP maintains a loop-free network by blocking one or more redundant ports (IEEE 802.1D).

2 Code analysis

1) The bridge receives data in softirq context

int netif_receive_skb(struct sk_buff *skb)
{
    ...
    skb_reset_network_header(skb);
    skb_reset_transport_header(skb);
    skb->mac_len = skb->network_header - skb->mac_header;
    ...
    /* Bridge processing */
    skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
    if (!skb)
        goto out;
    ...
    type = skb->protocol;
    list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
        if (ptype->type == type &&
            (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
             ptype->dev == orig_dev)) {
            if (pt_prev)
                ret = deliver_skb(skb, pt_prev, orig_dev);
            pt_prev = ptype;
        }
    }

    if (pt_prev) {
        ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
    } else {
        kfree_skb(skb);
        ret = NET_RX_DROP;
    }

out:
    rcu_read_unlock();
    return ret;
}

static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
                                            struct packet_type **pt_prev, int *ret,
                                            struct net_device *orig_dev)
{
    struct net_bridge_port *port;

    /* Not bridged traffic: leave it to the network layer further on */
    if (skb->pkt_type == PACKET_LOOPBACK ||
        (port = rcu_dereference(skb->dev->br_port)) == NULL)
        return skb;

    if (*pt_prev) {
        *ret = deliver_skb(skb, *pt_prev, orig_dev);
        *pt_prev = NULL;
    }

    /* Bridge processing */
    return br_handle_frame_hook(port, skb);
}

static int __init br_init(void)
{
    ...
    /* Register the bridge netfilter hooks */
    err = br_netfilter_init();
    if (err)
        goto err_out1;
    ...
    /* Bridge frame handler */
    br_handle_frame_hook = br_handle_frame;
    ...
}

int __init br_netfilter_init(void)
{
    int ret;

    ret = nf_register_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
    ...
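    return 0;
}

static struct nf_hook_ops br_nf_ops[] __read_mostly = {
    { .hook = br_nf_pre_routing,
      .owner = THIS_MODULE,
      .pf = PF_BRIDGE,
      .hooknum = NF_BR_PRE_ROUTING,
      .priority = NF_BR_PRI_BRNF, },
    { .hook = br_nf_local_in,
      .owner = THIS_MODULE,
      .pf = PF_BRIDGE,
      .hooknum = NF_BR_LOCAL_IN,
      .priority = NF_BR_PRI_BRNF, },
    { .hook = br_nf_forward_ip,
      .owner = THIS_MODULE,
      .pf = PF_BRIDGE,
      .hooknum = NF_BR_FORWARD,
      .priority = NF_BR_PRI_BRNF - 1, },
    { .hook = br_nf_forward_arp,
      .owner = THIS_MODULE,
      .pf = PF_BRIDGE,
      .hooknum = NF_BR_FORWARD,
      .priority = NF_BR_PRI_BRNF, },
    { .hook = br_nf_local_out,
      .owner = THIS_MODULE,
      .pf = PF_BRIDGE,
      .hooknum = NF_BR_LOCAL_OUT,
      .priority = NF_BR_PRI_FIRST, },
    { .hook = br_nf_post_routing,
      .owner = THIS_MODULE,
      .pf = PF_BRIDGE,
      .hooknum = NF_BR_POST_ROUTING,
      .priority = NF_BR_PRI_LAST, },
    { .hook = ip_sabotage_in,
      .owner = THIS_MODULE,
      .pf = PF_INET,
      .hooknum = NF_INET_PRE_ROUTING,
      .priority = NF_IP_PRI_FIRST, },
    { .hook = ip_sabotage_in,
      .owner = THIS_MODULE,
      .pf = PF_INET6,
      .hooknum = NF_INET_PRE_ROUTING,
      .priority = NF_IP6_PRI_FIRST, },
};

2) Bridge processing flow

A bridge port has five states:

#define BR_STATE_DISABLED 0
    Disabled: the port does not take part in the spanning tree and forwards no frames.
#define BR_STATE_LISTENING 1
    Listening: the root can be determined, and root ports, designated ports and non-designated ports can be selected. While listening, the port does not learn unicast addresses from received frames.
#define BR_STATE_LEARNING 2
    Learning: the port learns the MAC addresses of incoming frames but cannot forward frames.
#define BR_STATE_FORWARDING 3
    Forwarding: the port can forward frames. It learns the source MAC address of received frames and forwards according to the destination MAC address.
#define BR_STATE_BLOCKING 4
    Blocking: the port does not forward frames. It listens for incoming BPDUs and learns no MAC addresses from received frames.

2.2.1) STP key points

A switch running the spanning tree algorithm (STA) sends BPDUs periodically; exactly one root bridge is elected; exactly one root port is chosen on every non-root bridge; exactly one designated port is chosen on every segment.

(1). Electing the single root bridge: every BPDU carries a Bridge ID; Bridge ID (8 B) = priority (2 B) + switch MAC address (6 B); on many switches the priority defaults to 32768 and can be changed; the bridge with the lowest priority value becomes the root bridge; if priorities are equal, the bridge with the lowest MAC address wins; in other words, the numerically lowest Bridge ID becomes the root bridge; the root bridge sends a BPDU every 2 seconds by default.

(2). Choosing the single root port on every non-root bridge: the root bridge has no root port; the port with the lowest path cost to the root becomes the root port; if the costs are equal, the lowest sender Bridge ID and then the lowest Port ID break the tie; a Port ID consists of the port priority and the port number.

(3). Choosing the single designated port on every segment: the port offering the lowest path cost to the root becomes the designated port; the ports of the root bridge have the lowest cost to their segments, so every port of the root bridge is a designated port; ports selected as root ports or designated ports move to the forwarding state; all other ports enter the blocking state and only listen for BPDUs.

(4).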
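A blocked port that receives no BPDU within the configured interval (max age, 20 seconds by default) triggers a re-run of the spanning tree algorithm. Drawback: while the algorithm is running, the network is interrupted and the ports do not forward. The whole computation takes 50 seconds by default (20 s max age plus two forward delays of 15 s).

2.3.2) How STP works

When a bridge powers up, it assumes that it is the root bridge and transitions to the listening state. In general, when a bridge notices a change in the network topology, two transitional states occur: during the change, ports temporarily pass through the listening and learning states, each for the duration of the forward-delay timer.

While a port is in the listening state, it sends and receives BPDUs to determine the active topology; no user data is passed while the topology is in transition. During listening the bridge processes the BPDUs it receives. Ports that are designated ports or root ports move to the learning state after 15 seconds (the default forward delay); ports that are neither return to the blocking state.

While a port is in the learning state, it builds its MAC address table from the addresses learned on the port. It cannot forward user frames, so at this point the bridge still passes no user data.

The learning state reduces the number of frames that have to be flooded once the port starts forwarding. If a port is still a designated port or root port when the learning state ends, it moves to the forwarding state; ports that are neither return to the blocking state. In the forwarding state a port can send and receive user data. Going from blocking to forwarding normally takes 30 to 50 seconds.

Note: if a port is connected to a host, forwarding on that link cannot create an STP loop, so such ports do not need to go through the STP listening and learning phases.

br_handle_frame is the entry point for bridged data:

struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
{
    const unsigned char *dest = eth_hdr(skb)->h_dest;
    ...
    /* Link-local bridge group addresses 01-80-c2-00-00-0x;
     * STP uses 01-80-c2-00-00-00 */
    if (unlikely(is_link_local(dest))) {
        ...
        /* Runs br_nf_local_in and br_handle_local_finish.
         * br_handle_local_finish updates the CAM (forwarding database). */
        if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
                    NULL, br_handle_local_finish))
            return NULL;    /* frame consumed by filter */
        else
            return skb;     /* continue processing */
    }

    switch (p->state) {
    case BR_STATE_FORWARDING:
        rhook = rcu_dereference(br_should_route_hook);
        if (rhook != NULL) {
            if (rhook(skb))
                return skb;
            dest = eth_hdr(skb)->h_dest;
        }
        /* fall through */
    case BR_STATE_LEARNING:
        if (!compare_ether_addr(p->br->dev->dev_addr, dest))
            skb->pkt_type = PACKET_HOST;

        /* Runs br_nf_pre_routing and br_handle_frame_finish */
        NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
                br_handle_frame_finish);
        break;
    default:
drop:
        kfree_skb(skb);
    }
    return NULL;
}

static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff *skb,
                                      const struct net_device *in,
                                      const struct net_device *out,
                                      int (*okfn)(struct sk_buff *))
{
    struct iphdr *iph;
    ...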
    if (!pskb_may_pull(skb, sizeof(struct iphdr)))
        goto inhdr_error;

    /* IP header checks */
    iph = ip_hdr(skb);

    nf_bridge_put(skb->nf_bridge);
    if (!nf_bridge_alloc(skb))
        return NF_DROP;
    if (!setup_pre_routing(skb))
        return NF_DROP;
    store_orig_dstaddr(skb);

    /* Runs the NF_INET_PRE_ROUTING hooks and br_nf_pre_routing_finish */
    NF_HOOK(PF_INET, NF_INET_PRE_ROUTING, skb, skb->dev, NULL,
            br_nf_pre_routing_finish);

    /* NF_STOLEN: the caller's okfn (br_handle_frame_finish) is not
     * invoked for this packet from here */
    return NF_STOLEN;

inhdr_error:
    // IP_INC_STATS_BH(IpInHdrErrors);
out:
    return NF_DROP;
}

int br_handle_frame_finish(struct sk_buff *skb)
{
    const unsigned char *dest = eth_hdr(skb)->h_dest;
    struct net_bridge_port *p = rcu_dereference(skb->dev->br_port);
    struct net_bridge *br;
    struct net_bridge_fdb_entry *dst;
    struct sk_buff *skb2;

    if (!p || p->state == BR_STATE_DISABLED)
        goto drop;

    /* Update the CAM (forwarding database) with the source address */
    br = p->br;
    br_fdb_update(br, p, eth_hdr(skb)->h_source);

    if (p->state == BR_STATE_LEARNING)
        goto drop;

    /* The packet skb2 goes to the local host (NULL to skip). */
    skb2 = NULL;

    /* Promiscuous mode */
    if (br->dev->flags & IFF_PROMISC)
        skb2 = skb;

    dst = NULL;

    if (is_multicast_ether_addr(dest)) {
        br->dev->stats.multicast++;
        skb2 = skb;
    } else if ((dst = __br_fdb_get(br, dest)) && dst->is_local) {
        skb2 = skb;
        /* Do not forward the packet since it's local. */
        skb = NULL;
    }

    if (skb2 == skb)
        skb2 = skb_clone(skb, GFP_ATOMIC);

    /* Re-enter the stack through netif_receive_skb */
    if (skb2)
        br_pass_frame_up(br, skb2);

    if (skb) {
        if (dst)
            /* Output port found in the CAM: forward via __br_forward */
            br_forward(dst->dst, skb);
        else
            /* Unknown destination or broadcast: flood */
            br_flood_forward(br, skb);
    }

out:
    return 0;
drop:
    kfree_skb(skb);
    goto out;
}

/* This requires some explaining. If DNAT has taken place,
 * we will need to fix up the destination Ethernet address,
 * and this is a tricky process.
 *
 * There are two cases to consider:
 * 1. The packet was DNAT'ed to a device in the same bridge
 *    port group as it was received on. We can still bridge
 *    the packet.
 * 2. The packet was DNAT'ed to a different device, either
 *    a non-bridged device or another bridge port group.
 *    The packet will need to be routed.
 *
 * The correct way of distinguishing between these two cases is to
 * call ip_route_input() and to look at skb->dst->dev, which is
 * changed to the destination device if ip_route_input() succeeds.
 *
 * Let us first consider the case that ip_route_input() succeeds:
 *
 * If skb->dst->dev equals the logical bridge device the packet
 * came in on, we can consider this bridging. The packet is passed
 * through the neighbour output function to build a new destination
 * MAC address, which will make the packet enter br_nf_local_out()
 * not much later. In that function it is assured that the iptables
 * FORWARD chain is traversed for the packet.
 *
 * Otherwise, the packet is considered to be routed and we just
 * change the destination MAC address so that the packet will
 * later be passed up to the IP stack to be routed. For a redirected
 * packet, ip_route_input() will give back the localhost as output device,
 * which differs from the bridge device.
 *
 * Let us now consider the case that ip_route_input() fails:
 *
 * This can be because the destination address is martian, in which case
 * the packet will be dropped.
 * After a "echo '0' > /proc/sys/net/ipv4/ip_forward" ip_route_input()
 * will fail, while __ip_route_output_key() will return success. The source
 * address for __ip_route_output_key() is set to zero, so __ip_route_output_key
 * thinks we're handling a locally generated packet and won't care
 * if IP forwarding is allowed. We send a warning message to the user's
 * log telling her to put IP forwarding on.
 *
 * ip_route_input() will also fail if there is no route available.
 * In that case we just drop the packet.
 */
static int br_nf_pre_routing_finish(struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;
    struct iphdr *iph = ip_hdr(skb);
    struct nf_bridge_info *nf_bridge = skb->nf_bridge;
    ...
    if (dnat_took_place(skb)) {
        /* DNAT was applied */
        if ((err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev))) {
            ...
        } else {
            if (skb->dst->dev == dev) {
bridged_dnat:
                /* Tell br_nf_local_out this is a
                 * bridged frame */
                nf_bridge->mask |= BRNF_BRIDGED_DNAT;
                skb->dev = nf_bridge->physindev;
                nf_bridge_push_encap_header(skb);
                /* Send the data */
                NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev,
                               NULL, br_nf_pre_routing_finish_bridge, 1);
                return 0;
            }
            ...
        }
    } else {
        ...
    }

    skb->dev = nf_bridge->physindev;
    nf_bridge_push_encap_header(skb);

    /* In principle this would run br_nf_pre_routing and br_handle_frame_finish
     * again. Because of the threshold 1, however, every hook with a priority
     * below 1 is skipped. PF_BRIDGE hooks at this point normally do not have a
     * priority above 1, so only br_handle_frame_finish runs here. */
    NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
                   br_handle_frame_finish, 1);
    return 0;
}

3) Bridge data forwarding

static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
{
    struct net_device *indev;
    ...
    /* Runs br_nf_forward_ip, br_nf_forward_arp and br_forward_finish */
    NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, indev, skb->dev,
            br_forward_finish);
}

static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb,
                                     const struct net_device *in,
                                     const struct net_device *out,
                                     int (*okfn)(struct sk_buff *))
{
    struct nf_bridge_info *nf_bridge;
    ...
    /* br_forward_finish is not called from here; br_nf_forward_finish is */
    NF_HOOK(pf, NF_INET_FORWARD, skb, bridge_parent(in), parent,
            br_nf_forward_finish);
    return NF_STOLEN;
}

static int br_nf_forward_finish(struct sk_buff *skb)
{
    ...
    /* Only br_forward_finish runs here, for the same reason as above */
    NF_HOOK_THRESH(PF_BRIDGE, NF_BR_FORWARD, skb, in, skb->dev,
                   br_forward_finish, 1);
    return 0;
}

int br_forward_finish(struct sk_buff *skb)
{
    return NF_HOOK(PF_BRIDGE, NF_BR_POST_ROUTING, skb, NULL, skb->dev,
                   br_dev_queue_push_xmit);
}

4) The bridge sends data

static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff *skb,
                                       const struct net_device *in,
                                       const struct net_device *out,
                                       int (*okfn)(struct sk_buff *))
{
    struct nf_bridge_info *nf_bridge = skb->nf_bridge;
    struct net_device *realoutdev = bridge_parent(skb->dev);
    ...
    /* Calls br_nf_dev_queue_xmit; br_dev_queue_push_xmit is not called from here */
    NF_HOOK(pf, NF_INET_POST_ROUTING, skb, NULL, realoutdev,
            br_nf_dev_queue_xmit);
    return NF_STOLEN;
}

static int br_nf_dev_queue_xmit(struct sk_buff *skb)
{
    if (skb->protocol == htons(ETH_P_IP) &&
        skb->len > skb->dev->mtu && !skb_is_gso(skb))
        return ip_fragment(skb, br_dev_queue_push_xmit);
    else
        return br_dev_queue_push_xmit(skb);
}

int br_dev_queue_push_xmit(struct sk_buff *skb)
{
    /* drop mtu oversized packets except gso */
    if (packet_length(skb) > skb->dev->mtu && !skb_is_gso(skb))
        kfree_skb(skb);
    else {
        /* ip_refrag calls ip_fragment, doesn't copy the MAC header. */
        if (nf_bridge_maybe_copy_header(skb))
            kfree_skb(skb);
        else {
            skb_push(skb, ETH_HLEN);
            /* Send the data */
            dev_queue_xmit(skb);
        }
    }
    return 0;
}

As the code above shows, bridged packets are forwarded at layer 2, yet they can traverse the hook points of both PF_BRIDGE and PF_INET.

5) STP packet flow

First, the network protocol stack:

-----------------------    ----------------------------
| bridge STP protocol |    | STP protocol, other modules |
-----------------------    ----------------------------
           |                           |
           ...

    ...
    sap->laddr.lsap = lsap;
    sap->rcv_func = func;
    llc_add_sap(sap);    /* add to the list of SAPs handled by LLC */
    ...
}

/* Standard LLC packet header.
 * Sequence-numbered PDU format (4 bytes in length) */
struct llc_pdu_sn {
    u8 dsap;
    u8 ssap;
    u8 ctrl_1;
    u8 ctrl_2;
};

/* LLC receive entry point */
int llc_rcv(struct sk_buff *skb, struct net_device *dev,
            struct packet_type *pt, struct net_device *orig_dev)
{
    ...
    /* Select the upper-layer protocol by DSAP */
    pdu = llc_pdu_sn_hdr(skb);
    ...
    sap = llc_sap_find(pdu->dsap);
    ...
    /*
     * First the upper layer protocols that don't need the full
     * LLC functionality
     */
    rcv = rcu_dereference(sap->rcv_func);
    if (rcv) {
        struct sk_buff *cskb = skb_clone(skb, GFP_ATOMIC);
        if (cskb)
            /* Upper-layer receive handler: stp_pdu_rcv in this case */
            rcv(cskb, dev, pt, orig_dev);
    }
    ...
}

5.2) STP packet processing

/* Registers the STP handler with the LLC module; once LLC receives an
 * STP frame it calls stp_pdu_rcv */
int stp_proto_register(const struct stp_proto *proto)
{
    ...
    if (sap_registered++ == 0) {
        /* Register the STP receive function with LLC */
        sap = llc_sap_open(LLC_SAP_BSPAN, stp_pdu_rcv);
        if (!sap) {
            err = -ENOMEM;
            goto out;
        }
    }
    if (is_zero_ether_addr(proto->group_address))
        /* Group address is zero: register in stp_proto.
         * The STP handling module is registered in this variable. */
        rcu_assign_pointer(stp_proto, proto);
    else
        /* Otherwise register in garp_protos */
        rcu_assign_pointer(garp_protos[proto->group_address[5] -
                                       GARP_ADDR_MIN], proto);
    ...
}

/* Start of STP protocol processing */
static int stp_pdu_rcv(struct sk_buff *skb, struct net_device *dev,
                       struct packet_type *pt, struct net_device *orig_dev)
{
    const struct ethhdr *eh = eth_hdr(skb);
    const struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb);
    const struct stp_proto *proto;

    if (pdu->ssap != LLC_SAP_BSPAN ||
        pdu->dsap != LLC_SAP_BSPAN ||
        pdu->ctrl_1 != LLC_PDU_TYPE_U)
        goto err;

    /* Check for the group addresses 01:80:c2:00:00:20 - 01:80:c2:00:00:2F */
    if (eh->h_dest[5] >= GARP_ADDR_MIN && eh->h_dest[5] <= GARP_ADDR_MAX) {
        proto = rcu_dereference(garp_protos[eh->h_dest[5] - GARP_ADDR_MIN]);
        if (proto &&
            compare_ether_addr(eh->h_dest, proto->group_address))
            goto err;
    } else
        /* Use the handler registered in stp_proto */
        proto = rcu_dereference(stp_proto);

    if (!proto)
        goto err;

    /* Protocol receive handler; for the bridge this is br_stp_rcv */
    proto->rcv(proto, skb, dev);
    return 0;

err:
    kfree_skb(skb);
    return 0;
}

static const struct stp_proto br_stp_proto = {
    .rcv = br_stp_rcv,
};

static int __init br_init(void)
{
    int err;

    /* Register the STP protocol handler, here the bridge's */
    err = stp_proto_register(&br_stp_proto);
    ...
}

The subsequent STP processing flow is:

br_stp_rcv --> br_received_config_bpdu --> br_record_config_information
br_port_state_selection --> br_make_forwarding / br_make_blocking
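The effect of the last step in that flow, together with the 15-second listening and learning phases described in the STP working process, can be sketched as a small state machine. This is a simplified user-space illustration under stated assumptions, not the kernel code: `struct port`, `make_forwarding`, `make_blocking`, `tick` and the `ST_*` names are hypothetical, and the forward delay is hard-coded to its default.

```c
#include <assert.h>

/* Port states, with the same numeric values as the BR_STATE_* macros. */
enum port_state {
    ST_DISABLED = 0,
    ST_LISTENING = 1,
    ST_LEARNING = 2,
    ST_FORWARDING = 3,
    ST_BLOCKING = 4,
};

/* Hypothetical per-port record (not struct net_bridge_port). */
struct port {
    enum port_state state;
    int timer;              /* seconds left in the transitional state */
};

#define FORWARD_DELAY 15    /* default forward delay in seconds */

/* The port was selected as root or designated port: leave blocking
 * and start the listening phase. */
void make_forwarding(struct port *p)
{
    if (p->state == ST_BLOCKING) {
        p->state = ST_LISTENING;
        p->timer = FORWARD_DELAY;
    }
}

/* The port lost the election: block it immediately. */
void make_blocking(struct port *p)
{
    if (p->state != ST_DISABLED && p->state != ST_BLOCKING) {
        p->state = ST_BLOCKING;
        p->timer = 0;
    }
}

/* Advance time by one second; when the forward delay expires the port
 * moves listening -> learning -> forwarding. */
void tick(struct port *p)
{
    if (p->state != ST_LISTENING && p->state != ST_LEARNING)
        return;
    if (--p->timer > 0)
        return;
    if (p->state == ST_LISTENING) {
        p->state = ST_LEARNING;
        p->timer = FORWARD_DELAY;
    } else {
        p->state = ST_FORWARDING;
    }
}
```

With these defaults a port needs two forward delays, 30 seconds, to get from blocking to forwarding; the 50-second figure quoted earlier additionally includes the 20-second max age.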