LRP Load Balancing HOWTO



Written by: Jack Coates <jack@monkeynoodle.org> Last Revised: 2/7/2002

Intro: About once a week it seems, someone asks how to make LRP balance the load between two lines. Sometimes it's two modem lines, two residential broadband services (xDSL or Cable), or two hi-cap digital lines. Regardless of the technology used to transfer data, the issue of load-balancing is the same. To be clear up front, this is not simple and frequently doesn't work the way you thought it would. However, if you have two lines and you want to use them both without installing a second router, this is the way to go.

This HOWTO will discuss the technologies available, which ones might fit a given situation, and how to configure them.


Revision History:
04/02/02 -- Added a link to an external article on balancing within one host
02/07/02 -- Major overhaul, introduced three-router option
11/16/00 -- Fix pseudo-code, discuss listen-only BGP per George Metz
11/15/00 -- Talk about failover options
05/07/00 -- Address Linux's lack of administrative distance
11/27/99 -- First version - needs better EQL section and examples of floating static routes


1. What is load-balancing and why would I want to?

Load-balancing is the art of sharing a given amount of traffic equally across two circuits. This is an art because most of the time you can't predict where the traffic is going to or coming from. Nevertheless, there are a few tricks you can use to make it work -- sort of.

Why you would want to seems fairly clear, though: increased throughput and fault tolerance.

Throughout this document, I am going to use two lines as an example. However, the technologies and designs discussed will usually work with any number of lines. Feel free to modify.

2. Two lines going to the same place.

2.1. Channel Bonding (Layer 2).

As one might guess, channel bonding is a technology which bonds two real channels into one virtual channel. There are a number of ways to do this, and it's hardly new tech: this is how ISDN works. Generally it's used in one of two scenarios: two PPP links (see the EQL kernel module) or two Ethernets on the same LAN. If you're doing it with two links to the same ISP, you'll probably have success using analog modems or digital T1/E1 cards (e.g. Sangoma); however, it needs to be supported on the ISP end. If you want to do it on the same LAN, there are a number of projects aiming in that direction. Search Freshmeat or try one of these:

2.2. Inverse Multiplexing over ATM (IMA).

IMA is part of the ATM Forum specification, because who wants a single T1 if you have 20% of it used for overhead? IMA bonds physical channels together at OSI Layer 2. You'll have to buy hardware to make this work; I recommend the Kentrox AAC-2. It can accept up to 8 DS-1s and output the traffic on HSSI or Ethernet ports. Make sure you check this out with the ISP and RBOC that you'll be working with, as there are a lot of compatibility issues between different vendors' IMA implementations.

When the ISP and RBOC can provide a compatible Layer 2 network for you, your load-balancing problem is solved. Simply configure the Kentrox appropriately, then configure your router as if it had a single circuit attached to it.

2.3. Round-robin (Layer 3).

Cisco IOS automatically round-robins traffic on two interfaces which go to the same location. If that location is your site and your ISP uses Cisco, your inbound traffic is taken care of. But what about outbound? Floating static routes (Section 3.1) or the Equal-cost Multipath kernel option will take care of that.

Equal-cost Multipath allows you to configure your router with two default networks. Outbound traffic is then round-robined by packet, so that packet 1 coming into the router goes out down circuit A, packet 2 goes out down circuit B, &c. Your traffic may well arrive out of order and occasionally with missing packets because of this, but it shouldn't cause problems -- that's what IP was designed for and there are a lot of safety checks built into the protocol stack for this scenario. Of course, there are a lot of attempts out there to make IP do things it wasn't designed to do, like provide real-time streamed multimedia. If you see greater than expected choppiness in upstream or downstream multimedia, you may want to take steps to ensure that it is always routed along a single path.

Unfortunately, equal-cost multipath is a kernel configuration option. That means compiling your own kernel -- see The Developer Guide and http://www.linux.com for further instructions. I couldn't find any documentation on configuring it, unless you count the Linux Networking Mailing List Archives. There is also this snippet from the config help file:

        CONFIG_IP_ROUTE_MULTIPATH
          Normally, the routing tables specify a single action to be taken in
          a deterministic manner for a given packet. If you say Y here
          however, it becomes possible to attach several actions to a packet
          pattern, in effect specifying several alternative paths to travel
          for those packets. The router considers all these paths to be of
          equal "cost" and chooses one of them in a non-deterministic fashion
          if a matching packet arrives.

I'm not aware of anyone using EQL or round-robin, but the possibility is there. Keep in mind that the traffic is split up on a per-packet basis, not per-session. This means that the lines you are using must terminate on the same device so that sessions can be maintained and, in some cases, fragmented datagrams can be reassembled.
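
For what it's worth, here is a minimal sketch of what a two-gateway default route looks like with the iproute2 ip tool, assuming a kernel built with CONFIG_IP_ROUTE_MULTIPATH. The gateway addresses and interface names are placeholders of mine, not values from any real setup, and the run helper is a hypothetical convenience that only prints the command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really run them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# add_multipath_default GW_A DEV_A GW_B DEV_B
# Installs one default route with two equal-weight next hops.
add_multipath_default() {
    run ip route add default \
        nexthop via "$1" dev "$2" weight 1 \
        nexthop via "$3" dev "$4" weight 1
}

# Example with placeholder (documentation-range) addresses:
# add_multipath_default 192.0.2.1 eth0 198.51.100.1 eth1
```

Any existing default route has to be deleted first, or the add will be refused. Changing the weight values should skew the split if one circuit is faster than the other, but I'd test that before relying on it.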

3. Two lines going different places.

3.1. Floating static routes.

A floating static setup uses two static routes to the same destination, one of them with a higher administrative distance so that it stays unused until the primary route goes down. In practice, that means removing your default route, then setting up your routes so that the first half of the Internet address space is routed out of E0 and the second half is routed out of E1 (assuming E2 is the internal interface). Next, set up backup routes which send the first half out of E1 and the second half out of E0, with a higher administrative distance.
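
As a rough sketch of that layout on Linux, the two halves of the address space are 0.0.0.0/1 and 128.0.0.0/1. Everything specific here is an assumption of mine: eth0 and eth1 stand in for E0 and E1, the gateway addresses are placeholders, the metric of 10 is arbitrary, and metric is only standing in for administrative distance. Bear in mind that Linux does not withdraw a route just because the gateway stops answering, so the backups only take over once something removes the primaries.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really run them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# split_routes GW_A DEV_A GW_B DEV_B
split_routes() {
    # Primary routes: lower half via ISP A, upper half via ISP B.
    run route add -net 0.0.0.0 netmask 128.0.0.0 gw "$1" dev "$2"
    run route add -net 128.0.0.0 netmask 128.0.0.0 gw "$3" dev "$4"
    # Backup routes: the same halves swapped, with a worse (higher) metric.
    run route add -net 0.0.0.0 netmask 128.0.0.0 gw "$3" metric 10 dev "$4"
    run route add -net 128.0.0.0 netmask 128.0.0.0 gw "$1" metric 10 dev "$2"
}

# Example with placeholder addresses:
# split_routes 192.0.2.1 eth0 198.51.100.1 eth1
```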

You're lucky here, because your two ISPs will give you different IP addresses for each interface, so traffic going out E0 will get its responses on E0 and not E1. It isn't true load balancing, but it should get you a 60%/40% or 70%/30% distribution from day to day. You can also monitor and tune this -- for instance, if traffic to a major vendor site is 10% of your company's Internet traffic, add a floating static route set for their public subnet which attempts to send primary traffic down your least-used circuit. Similarly, since mail will only be handled by one ISP, you can (and should) force traffic to/from your mail server to only use that ISP's circuit.

But don't get too happy too fast here, because I don't currently know of any way to make Linux masquerade behind two interfaces at the same time. In other words, this scenario will only work properly if one of your networks doesn't need to use NAT. However, this can still work if:

Two routers are a pain in the neck, because you'll still be doing the floating static trick, but the backup routes will point to the other router on your premises instead of to the other ISP's gateway, introducing an unneeded hop. Unfortunately, Linux lacks an administrative distance option in its routing code, so this is the only way to implement the solution in LRP. Because you have two routers, you can use the metric option to mark a route as backup instead of primary, like so:

router1
route add -net a.a.a.a netmask b.b.b.b gw c.c.c.c
route add -net z.z.z.z netmask y.y.y.y metric 1 dev eth0

router2
route add -net z.z.z.z netmask y.y.y.y gw x.x.x.x
route add -net a.a.a.a netmask b.b.b.b metric 1 dev eth0

And so on. Theoretically, you might be able to pull that same trick off with IP Aliasing (assigning two addresses to one interface). It would be a mess, but it might work. I am still studying the networking code in Linux 2.2 and 2.4, so there may well be a better option that I don't know about.

3.1.1 Doing this on one box?

I've read articles and books that imply the ability to do this in one box using multiple routing tables. I don't see any reason why it wouldn't work, but I couldn't get it working the one time I tried, and I haven't tried recently. Here are two links:

3.2. Policy Routing.

One thing that Linux does pretty well is policy routing. For instance, one might route all SMTP to ISP 1, all HTTP to ISP 2, and all other traffic to ISP 3. This is very handy if one of your ISPs has given you a valid reverse DNS, since some mail administrators refuse to accept mail from home user MTAs with invalid reverse DNS mappings (sigh). In this case, you could simply use the ISP where you have rDNS for all email traffic. However, there is a potential problem to watch for with policy routing -- quite simply, if you're not going to implement policy routing on every system in your network, then you'll have to make sure that all traffic (in and out) goes through the system that does do the policy routing. So if you've got two routers and the left router does policy but the right one doesn't, then policy routed services will only work from systems that have the left hand router as their default gateway.

Of course, once one has set up a backbone router to do policy routing with, all of the troubling NAT issues with multiple ISPs go away. For instance, here's a working three-router configuration by Reggie Richardson (I've edited IP addresses and altered the diagram somewhat):


                ISP A                                       ISP B
                  |                                            |
                Cable (PPPoE)                                 ADSL
           A.A.A.?/24                                   B.B.B.B/26 
                  |                                            |
   LRP:           |ppp0                         Smoothwall:    |eth0
   ***************|**********                   ***************|**********
   *              |         *                   *              |         *
   * eth0: A.A.A.?/24       *                   * eth0: B.B.B.B/26       *
   *                        *                   *                        *
   * eth1: 192.168.1.6/30   *                   * eth1: 192.168.1.2/30   *
   *              |         *                   *              |         *
   ***************|**********                   ***************|**********
                  |eth1                                        |eth1
                  +-------------+               +--------------+
                                |eth1           |eth0
                         *****************************
                         * eth0: 192.168.1.1/30      *
                         *                           *
                         * eth1: 192.168.1.5/30      *
                         *                           *
                         * eth2  192.168.10.254/24   *
                         *****************************
                                        |eth2
                                        |                
              hub/switch  --------------+
               | |  | |
               | |  | |    Internal Server/Workstations:
               | |  | |
               | |  | +--- 192.168.10.1    
               | |  |
               | |  +----- 192.168.10.2 
               | |              .
               | |              .
               | |              .
               |-+-------- 192.168.10.253

The obvious benefit of doing this is problem segmentation; getting issues that are ISP-specific onto a specific router makes it easier to solve those issues without breaking something else inadvertently. The obvious downside is that it takes more hardware and therefore creates more opportunity to have things break.

In order to do this yourself, simply add a route on the ISP facing boxes so that they know how to reach the internal network. For instance, the CABLE router in the above diagram will have this statement added to the end of its eth1 configuration stanza:

eth1_ROUTES="192.168.10/24_via_192.168.1.5"

Additionally, any portforwarding statements can point directly to the 192.168.10.0/24 addresses, assuming that the backbone router is configured as a router, not a firewall.

On the backbone router, there's a little more work to be done. First, you'll need to define outbound ipchains rules in addition to inbound. This is necessary to mark the packets with a certain decision path. The following example stanza marks mail, DNS, and PPTP for table 1, and web traffic, FTP, Kazaa, and AOL for table 2.

##############################################################################
# IPCHAINS Functions
##############################################################################
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 www -m 2
/sbin/ipchains -A input -p udp -s 192.168.10.0/24 -d 0/0 www -m 2
/sbin/ipchains -A input -p udp -s 192.168.10.0/24 -d 0/0 https -m 2
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 https -m 2
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 110 -m 1
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 25 -m 1
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 1214 -m 2
/sbin/ipchains -A input -p icmp -s 192.168.10.0/24 -d 0/0 -m 2
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 21 -m 2
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 20 -m 2
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 5190 -m 2
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 1723 -m 1
/sbin/ipchains -A input -p udp -s 192.168.10.0/24 -d 0/0 53 -m 1
/sbin/ipchains -A input -p tcp -s 192.168.10.0/24 -d 0/0 53 -m 1

Next, you need to associate a fwmark with a table name and define what routing actions are to be taken with the packets that match a given table:

##############################################################################
# IP ROUTE Functions
##############################################################################
ip rule add fwmark 1 table adsl
ip rule add fwmark 2 table cable

ip ro add default via 192.168.1.2 dev eth0 table adsl
ip ro add default via 192.168.1.6 dev eth1 table cable
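
The names adsl and cable only work if iproute2 can map them to table numbers, which it does through /etc/iproute2/rt_tables. Here is a small sketch that adds the entries if they are missing. The table numbers 201 and 202 are arbitrary picks of mine (any unused number from 1 to 252 should do), and the RT_TABLES variable is a hypothetical knob that exists only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# File that maps routing table numbers to names for the ip command.
RT_TABLES=${RT_TABLES:-/etc/iproute2/rt_tables}

# ensure_rt_table ID NAME
# Appends "ID<TAB>NAME" unless a line for NAME is already present.
ensure_rt_table() {
    if ! grep -Eq "^[0-9]+[[:space:]]+$2([[:space:]]|\$)" "$RT_TABLES" 2>/dev/null; then
        printf '%s\t%s\n' "$1" "$2" >> "$RT_TABLES"
    fi
}

# Example (run as root, before the ip rule and ip ro lines):
# ensure_rt_table 201 adsl
# ensure_rt_table 202 cable
```

Calling it twice with the same name leaves the file alone, so it should be safe to keep in a startup script.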

Last but not least, policy routing and masquerade do not play well together. Now logically, one might say that since there's no NAT or masquerade on the backbone router where the policy routing is being done that this wouldn't matter, but unfortunately the packet is still being mangled twice. The fact that mangling is done on two different machines doesn't seem to matter. The best way to find out what all the issues are and current status on them is probably to join this mailing list. Short of that, adding the following lines to the backbone router configuration seems to work:

##############################################################################
# Configuration to allow FWMARK to work with NAT/MASQ
##############################################################################
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 0 > "$f"; done
echo 1 > /proc/sys/net/ipv4/route/flush

If you're going to do Policy Routing, you may want to have a look at doing Quality of Service as well. The toolsets and concepts are closely linked and worth reading up on.

3.3. Border Gateway Protocol.

First things first -- your ISP will only let you participate in their BGP network if you're a big customer. These days, "big customer" might mean a /23 or /22 of public space, but if you're connecting to a Tier 1 NSP, they'll want a /19 or better. Some ISPs are hungrier than others, so if you have a class C you might ask and see what happens. But if you're ip-masq'ing from behind a /30, don't even waste your time asking.

When you do BGP you have the same IP subnet on both interfaces, with both ISPs. That's why you have to have a big IP address block -- the second ISP is advertising your address space as itself, instead of supernetting it into their own /8 or /16. It's an administrative headache for them, which is why they are reluctant to do it.

With BGP, outbound traffic is still routed between the two interfaces via floating static routes. However, inbound traffic (whether in response to an outbound session initiation or not) will return via the best route. That means it comes into the interface which has fewer hops between the sender and you, regardless of the load on that interface. In other words, if ISP A is a Tier 1 NSP and ISP B is the local RBOC, you'll have problems. The RBOC is legally prohibited from transmitting long distance traffic, whether it's voice, data, or carrier pigeons. This means that every RBOC ISP is divided up into LATA-sized blocks, just like their phone network, and the LATA-sized blocks are connected by a separate NSP. The RBOC data network will always have higher hopcount and higher latency than a national ISP's, and so your circuit to ISP B will go unused until the circuit to ISP A fails.

The way around this is complicated, and you may want to begin by trying to get your slacker ISP to reduce their internal hop count so that they compare better with your good ISP. Of course, the RBOC can't do that until the FCC lets them sell long distance. Failing that, we need to look at AS prepending. BGP networks (like OSPF networks) are divided into zones of authority. The zone structure works sort of like DNS: one or more routers are authoritative for the routing tables to be found within the zone, so routers from all over the world don't have to come asking your little LRP 486 how to get to you. In BGP, the zones are called Autonomous Systems (ASes). Routers decide how far away they are from each other by counting the number of ASes between here and there. They do not look to see if the ASes are different, though. So, in order to force the Internet to send more traffic through ISP B, you need to trick it into thinking that the path through ISP A isn't as good as it really is. This is done by "prepending" additional copies of your AS number on the ISP A router, in order to artificially increase the AS hop count.
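
To see why prepending works, it helps to count. The toy functions below compare two AS paths by length only, which is the step described above; real BGP implementations weigh other attributes as well, so treat this as an illustration and not a model of a router. The AS numbers are made-up values from the range reserved for documentation, with 64500 standing in for your own AS.

```shell
#!/bin/sh
# as_path_len "AS AS AS ..."
# Prints how many AS numbers are in a space-separated path.
as_path_len() {
    set -- $1   # deliberately unquoted: split the path into words
    echo $#
}

# preferred_path PATH_VIA_A PATH_VIA_B
# Prints A, B, or tie, judging by AS path length alone.
preferred_path() {
    len_a=$(as_path_len "$1")
    len_b=$(as_path_len "$2")
    if [ "$len_a" -lt "$len_b" ]; then
        echo A
    elif [ "$len_b" -lt "$len_a" ]; then
        echo B
    else
        echo tie
    fi
}

# Without prepending, the path through ISP A (AS 64501) is shorter:
# preferred_path "64501 64500" "64502 64510 64500"
# With your AS prepended twice toward ISP A, ISP B's path is shorter:
# preferred_path "64501 64500 64500 64500" "64502 64510 64500"
```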

Now that we've said all that, there is a possibility that you can get listen-only BGP with a little less trouble. Quoting an email from George Metz:

> With regards to BGP, assuming that both of the ISPs are running BGP on 
> their network - not all do - then you can request a listen-only BGP 
> session from your ISP/NSP, and most are willing to set something up for 
> dual-homed users out of the non-routable ASes at the top of the AS 
> numbers. I know that several of the national ISPs are willing to do 
> this for their commercial or high-speed circuits, as it doesn't take 
> much processing power from the router, just a transmit of the BGP 
> tables initially and updates every once in a while. You'd have to set 
> up BGP for recieve-only, but it can be - and is commonly - done.

The benefit to doing this is that your outbound traffic will share both circuits (theoretically) and should take faster routes to the end location. The drawback is that you won't get the biggest benefit of BGP, failover of inbound services from one circuit to another. Setting up BGP is beyond the scope of this document, but look for the Zebra package and read up on it to get started.

4. Failing over

You'll also want to make sure your router doesn't attempt to send traffic to unavailable links, of course -- and the definition of unavailable needs to address latency and traffic types that are bound to a certain ISP. The way to handle this is a script that tests links and takes action based on the test results. Before talking about code, I will discuss some of the things that such a script must do in some of the load-sharing scenarios we've already discussed.

First, you should make sure the networks you want can be reached at all, and that they're presenting acceptable latency. The latency test is potentially a multipart test, because there are two important links -- the one between your site and the ISP's access router, then the one between the ISP's backbone and their NSP. With some mom-n-pop ISPs, you may want to test all the way out to some Internet site.

Second, you should customize the actions that will be performed based on the load-balancing methodology being used and the services being run.

Pseudo-code:
set A-status to 0
set B-status to 0
set A-max-latency to the highest acceptable ping return time
set B-max-latency to the highest acceptable ping return time
every (five seconds) do
        ping -c 1 -I [interface] ISP A's gateway and direct output to A1
        ping -c 1 -I [interface] ISP A's NSP and direct output to A2
        ping -c 1 -I [interface] ISP B's gateway and direct output to B1
        ping -c 1 -I [interface] ISP B's NSP and direct output to B2
        test A1 for 'Unreachable' or time >= A-max-latency
                if test succeeds, add 1 to A-status
                if test fails and A-status is > 0, subtract 1 from A-status
        test A2 for 'Unreachable' or time >= A-max-latency
                if test succeeds, add 1 to A-status
                if test fails and A-status is > 0, subtract 1 from A-status
        test B1 for 'Unreachable' or time >= B-max-latency
                if test succeeds, add 1 to B-status
                if test fails and B-status is > 0, subtract 1 from B-status
        test B2 for 'Unreachable' or time >= B-max-latency
                if test succeeds, add 1 to B-status
                if test fails and B-status is > 0, subtract 1 from B-status

        if (A-status >= 10) do
                function shift-to-B
        done
        if (B-status >= 10) do
                function shift-to-A
        done
        if (A-status = 0 and B-status = 0) do
                function use-A-and-B
        done
done

shift-to-A(
        copy in a set of configuration files for A-only operation
        restart networking
        )
shift-to-B(
        copy in a set of configuration files for B-only operation
        restart networking
        )
use-A-and-B(
        copy in the default, two network configuration files
        restart networking
        )

There is a package called ifcheck.lrp which is supposed to do exactly this; I haven't tried it myself, but it is available on the usual sites.
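
If you'd rather roll your own, the counting part of the pseudo-code can be sketched in plain sh as below. This is a sketch under assumptions, not a finished script: the 500 ms limit, the interface names, and the GW_A and GW_B variables are placeholders of mine, it probes one target per link where the pseudo-code probes two, and when both links look bad it changes nothing, which is my choice and not something the pseudo-code spells out. It only prints a decision; wiring that to your configuration files is left to you.

```shell
#!/bin/sh
MAX_LATENCY_MS=${MAX_LATENCY_MS:-500}   # highest acceptable ping time (placeholder)
FAIL_THRESHOLD=${FAIL_THRESHOLD:-10}    # failure count that triggers a shift

# probe_ms INTERFACE HOST
# Prints the round-trip time in whole ms, or nothing if the ping got no reply.
probe_ms() {
    ping -c 1 -I "$1" "$2" 2>/dev/null |
        sed -n 's/.*time=\([0-9][0-9]*\).*/\1/p'
}

# probe_failed RTT_MS MAX_MS
# Succeeds when the probe counts as a failure (no reply, or too slow).
probe_failed() {
    [ -z "$1" ] || [ "$1" -ge "$2" ]
}

# next_status CURRENT RTT_MS MAX_MS
# Prints the new failure counter: up by one on failure, down by one
# (never below zero) on success.
next_status() {
    if probe_failed "$2" "$3"; then
        echo $(( $1 + 1 ))
    elif [ "$1" -gt 0 ]; then
        echo $(( $1 - 1 ))
    else
        echo 0
    fi
}

# decide A_STATUS B_STATUS
# Prints shift-to-A, shift-to-B, use-A-and-B, or no-change.
decide() {
    if [ "$1" -ge "$FAIL_THRESHOLD" ] && [ "$2" -lt "$FAIL_THRESHOLD" ]; then
        echo shift-to-B
    elif [ "$2" -ge "$FAIL_THRESHOLD" ] && [ "$1" -lt "$FAIL_THRESHOLD" ]; then
        echo shift-to-A
    elif [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo use-A-and-B
    else
        echo no-change
    fi
}

# monitor: the five-second loop. Not started automatically; call it yourself
# after setting GW_A and GW_B to each ISP's gateway address.
monitor() {
    a=0
    b=0
    while :; do
        a=$(next_status "$a" "$(probe_ms eth0 "$GW_A")" "$MAX_LATENCY_MS")
        b=$(next_status "$b" "$(probe_ms eth1 "$GW_B")" "$MAX_LATENCY_MS")
        decide "$a" "$b"
        sleep 5
    done
}
```

A real script would also want to remember the last decision, so that it doesn't copy files and restart networking every five seconds while nothing has changed.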

5. Resources

http://www.faqs.org/rfcs/rfc917.html - subnetting
http://www.faqs.org/rfcs/rfc1771.html - BGP-4, includes AS Prepending info
Policy Routing Using Linux, by Matthew Marsh -- Excellent book on the workings of the iproute2 tool.

Last modified: Oct 24, 2008 2:28 pm.