An Open Source Load Balancer for OpenShift – Red Hat OpenShift Blog

Google Maglev Load Balancer

The previous architecture should allow you to handle very large traffic volumes. But what if you need to support a web-scale workload?

Google published a load balancer design paper that explains how to create a horizontally scalable load balancer in which each load balancer instance can achieve 10Gbps of throughput on commodity Linux hardware. This load balancer is called Maglev and was released publicly under the project name Seesaw. This system can scale to very high loads, but it relies on a few uncommon networking setups, so the effort required to implement this solution may not be worthwhile unless you really need to support a web-scale deployment.

The main architectural ideas that Maglev introduces are the following:

Maglev uses an anycast VIP. In anycast, several servers advertise the same VIP to an edge router, which chooses one of them when it routes an IP packet. IPv4 does not support this mode natively (IPv6 does), but it can be implemented using the Border Gateway Protocol (BGP) with Equal-Cost Multi-Path (ECMP) routes for all the destinations. This provides theoretically unlimited horizontal scalability of the load balancer layer. Anycast will typically work only with edge routers that can speak BGP.
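To make this more concrete, here is a minimal sketch of what the announcement could look like with the open source BIRD routing daemon (BIRD 2.x syntax assumed). The ASNs and addresses are placeholders, not values from this article, and the edge router must have multipath (ECMP) enabled so that it installs all the Maglev machines as next hops for the VIP.

```
# bird.conf on one Maglev machine: a minimal sketch, BIRD 2.x assumed.
# All ASNs and addresses below are placeholders.
protocol device { }

protocol static vip_routes {
    ipv4;
    route 203.0.113.10/32 blackhole;    # the shared anycast VIP
}

protocol bgp to_edge {
    local as 65010;
    neighbor 192.0.2.1 as 65000;        # the edge router
    ipv4 {
        import none;
        export where proto = "vip_routes";
    };
}
```

Every Maglev machine announces the same /32, so adding load balancing capacity is just a matter of bringing up one more machine that advertises the VIP.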

Maglev does not use the kernel stack for processing network packets; it processes packets directly from the NIC in userspace. Bypassing the kernel TCP/IP stack allows the code to be very efficient and focused. The most expensive operation is deciding where to send each packet (especially when you have a very large pool of servers to load balance to). Google uses a specialized consistent hashing algorithm, known as Maglev hashing, for this task.
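To illustrate the idea, below is a minimal sketch of the lookup-table construction from the Maglev paper, written in Go. It simplifies the paper in one respect: both the offset and the skip are derived from a single FNV hash rather than from two independent hash functions, and the backend addresses and table size are arbitrary example values.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// maglevTable builds the Maglev lookup table: every backend generates a
// permutation of the table slots, and the backends take turns claiming
// the next unclaimed slot in their permutation. The resulting table is
// split almost evenly among backends and changes only minimally when a
// backend is added or removed. m must be prime and much larger than the
// number of backends.
func maglevTable(backends []string, m uint64) []int {
	offset := make([]uint64, len(backends))
	skip := make([]uint64, len(backends))
	for i, b := range backends {
		h := fnv.New64a()
		h.Write([]byte(b))
		sum := h.Sum64()
		// The paper uses two independent hash functions; deriving both
		// values from one FNV hash is a simplification for this sketch.
		offset[i] = sum % m
		skip[i] = (sum/m)%(m-1) + 1
	}
	table := make([]int, m)
	for i := range table {
		table[i] = -1 // -1 marks an unclaimed slot
	}
	next := make([]uint64, len(backends))
	for filled := uint64(0); filled < m; {
		for i := range backends {
			// Walk backend i's permutation until a free slot is found.
			for {
				slot := (offset[i] + next[i]*skip[i]) % m
				next[i]++
				if table[slot] == -1 {
					table[slot] = i
					filled++
					break
				}
			}
			if filled == m {
				break
			}
		}
	}
	return table
}

// lookup hashes a connection identifier (e.g. its 5-tuple) into the table.
func lookup(table []int, backends []string, flow string) string {
	h := fnv.New64a()
	h.Write([]byte(flow))
	return backends[table[h.Sum64()%uint64(len(table))]]
}

func main() {
	backends := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	table := maglevTable(backends, 65537) // the table size must be prime
	fmt.Println(lookup(table, backends, "198.51.100.7:41234->203.0.113.10:443"))
}
```

Because every backend claims slots in round-robin order from its own permutation, removing one backend disturbs only a small fraction of the table entries, so most in-flight connections keep hashing to the same server.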

Only inbound packets are routed through the load balancer. Return packets are sent back to the edge router directly by the load-balanced servers (the OpenShift routers in this picture). The reason for this choice is that the return flow is typically ten times larger than the inbound flow, so much more bandwidth and computing power remain available to route the inbound packets. This setup is also known as Direct Server Return and requires some additional network configuration: each load-balanced server must also hold the VIP, but must not advertise it on the network via the ARP protocol. Arptables (or other mechanisms) can be used to filter ARP broadcasts for the VIP. With Direct Server Return, the only thing the load balancer needs to do is change the destination MAC address in the data link frame and retransmit the packet.
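The forwarding step itself is tiny, as the following Go sketch shows: only the destination MAC of the Ethernet frame is overwritten, while the encapsulated IP packet, still addressed to the VIP, is left untouched so the backend can reply to the client directly. The frame bytes and MAC addresses are fabricated for illustration; a real balancer would read and write frames through a raw socket or straight from the NIC.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
)

// rewriteDstMAC performs the only per-packet change that Direct Server
// Return forwarding needs: it overwrites the destination MAC in the
// Ethernet header and leaves the rest of the frame, including the IP
// packet addressed to the VIP, untouched.
func rewriteDstMAC(frame []byte, backend net.HardwareAddr) error {
	// Ethernet header layout: dst MAC (6 bytes), src MAC (6), EtherType (2).
	if len(frame) < 14 || len(backend) != 6 {
		return fmt.Errorf("malformed frame or MAC address")
	}
	copy(frame[0:6], backend)
	return nil
}

func main() {
	// A fabricated 14-byte Ethernet header: broadcast destination,
	// arbitrary source, EtherType 0x0800 (IPv4); a payload would follow.
	frame, _ := hex.DecodeString("ffffffffffff" + "aabbccddeeff" + "0800")
	backend, _ := net.ParseMAC("02:00:00:00:00:01")
	if err := rewriteDstMAC(frame, backend); err != nil {
		panic(err)
	}
	fmt.Printf("new destination MAC: %s\n", net.HardwareAddr(frame[0:6]))
}
```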

Keepalived can also work in this mode, but I intentionally did not consider it in the sections above because, as noted, this mode requires an unusual networking setup.

Each tier of this architecture load balances at a different layer of the OSI model: the edge router load balances at layer 3 because of the anycast setup, the Maglev instances load balance at layer 4, and the OpenShift routers load balance at layer 7.

I haven’t tried to containerize Seesaw and deploy it as a static pod in OpenShift; if you do, please let me (rspazzol@) know.

Conclusions

If you run OpenShift in the cloud, your cloud provider gives you an API-based elastic load balancer. If you run OpenShift on premises and want to use open source software and commodity hardware to build your load balancers, this article has shown a series of architectural approaches that you can consider. The self-hosted solution is probably the most versatile and the one I’d recommend. If you need to manage web-scale load, you might want to consider the Maglev load balancer.
