Thursday, February 21, 2013

Network bonding in Linux

Network bonding means combining two or more network interface cards (NICs) into a single logical bonded interface. Bonding can provide high availability as well as more available bandwidth. Note, however, that bonding two network cards does not automatically double the bandwidth or guarantee high availability when a link goes down; what you get depends on the bonding mode and on the switch. The physical network cards are attached as slaves to the logical bond interface.

Ethernet channel Bonding modes fall under three categories

  • Modes that require switch support
  • Generic modes that do not require switch support
  • Modes that provide fail-over only

Modes that require switch support

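The switch has to support link aggregation on the ports involved: static port trunking (EtherChannel) for balance-rr and balance-xor, and IEEE 802.3ad (LACP) for mode 4. Only then is it possible to aggregate the bandwidth of all the physical NICs.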

balance-rr (Mode 0) -
   Packets are transmitted in round-robin fashion without hashing. Say two NICs are bonded and two packets arrive at the bonded interface: the first packet is sent to the first slave and the second packet to the second slave. If a third packet arrives, it is sent to the first slave again, and so on. Thus, this mode provides true load balancing.
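
The round-robin rule can be modelled in a few lines of shell. This is only a sketch of the selection logic described above, not the kernel code; rr_slave is a hypothetical helper name and the slave names are just examples.

```shell
# rr_slave PACKET_INDEX SLAVE...
# Prints the slave that would carry packet number PACKET_INDEX (counted from 0).
rr_slave() {
    index=$1
    shift
    # $# is now the number of slaves; skip (index mod count) of them.
    skip=$(( index % $# ))
    shift "$skip"
    printf '%s\n' "$1"
}

# Four packets over a two-slave bond alternate between eth0 and eth1.
for packet in 0 1 2 3; do
    echo "packet $packet -> $(rr_slave "$packet" eth0 eth1)"
done
```

With two slaves the packets alternate strictly, which is why this mode spreads even a single connection over both links.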

802.3ad (Mode 4) -
   This mode is the official standard for link aggregation. It creates aggregation groups that share the same speed and duplex settings. This mode requires a switch that supports IEEE 802.3ad dynamic link aggregation.

balance-xor (Mode 2) -
    In this mode, the traffic is hashed: by default the source MAC address is XOR'ed with the destination MAC address, and the result modulo the number of slaves selects the outgoing slave. This selects the same slave for each destination MAC address and provides load balancing and fault tolerance. In other words, traffic is balanced according to the receiver on the other end.
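
To make the hashing concrete, here is a simplified model in shell. It assumes the hash only looks at the last octet of each MAC address, which is a simplification for illustration; xor_slave is a hypothetical helper and the MAC addresses are arbitrary examples.

```shell
# xor_slave SRC_MAC DST_MAC SLAVE_COUNT
# Prints the index (0-based) of the slave chosen for this MAC pair.
xor_slave() {
    src_last=${1##*:}    # last octet of the source MAC, e.g. "7e"
    dst_last=${2##*:}    # last octet of the destination MAC
    echo $(( (0x$src_last ^ 0x$dst_last) % $3 ))
}

# The same destination always maps to the same slave.
xor_slave 00:1c:c0:3e:4b:7e 00:19:5b:6b:53:22 2
xor_slave 00:1c:c0:3e:4b:7e 00:19:5b:6b:53:23 2
```

Because the result depends only on the address pair, all traffic to one peer stays on one link; the load is spread only when there are several peers.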

 Note : The modes requiring switch support can also be run back-to-back with crossover cables between two servers. This is especially useful, for example, when using DRBD to replicate two partitions.

Generic bonding modes

These modes require that the NICs support changing their MAC address on the fly. Say two NICs are bonded: they constantly swap their MAC addresses to trick the other end (be it a switch or another connected host) into believing that it is sending traffic to the same network card. If the switch sends a packet to the first slave, it will try to send the next packet to the same card again. Upon receiving the first packet, the first slave swaps its MAC address with the second slave. Since the switch identifies the NICs by MAC address, it sends the second packet to the second slave, thinking it is the card to which it sent the previous packet.

broadcast (mode 3) -
   It simply transmits all traffic on every slave interface. This mode is the least used.

balance-tlb (mode 5) -
     This is called transmit load balancing. Outgoing traffic is load balanced, but incoming uses only a single interface. The driver will change the MAC address on the NIC when sending, but incoming always remains the same. The outgoing traffic is distributed according to the current load and queue on each slave interface (computed relative to speed on each slave). Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.

balance-alb (mode 6) -
     This is called adaptive load balancing. Both sending and receiving frames are load balanced using the MAC address change trick. The bonding driver intercepts the ARP replies sent by the server on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond.

High Availability (failover-only)

Let us examine the failover part of NIC bonding. If both NICs of a bond are connected to the same switch and that switch goes down or is rebooted for a firmware upgrade, the connection is lost.

So the best way is to connect the NICs in a bond to different switches. But this is possible only with the bonding modes that do not require switch support. For the modes that require switch support, this is not possible on most devices.

active-backup (mode 1) -
   This mode places one of the interfaces into a backup state and makes it active only if the currently active interface goes down. The bond's MAC address is externally visible on only one port to avoid confusing the switch.
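
The failover decision can be sketched as a small shell model. It is only an illustration of the rule described above (keep the active slave while its link is up, otherwise switch to a slave whose link is up), not the driver's actual logic; pick_active is a hypothetical helper.

```shell
# pick_active CURRENT NAME=STATE...
# Prints the slave that should be active, given each slave's link state.
pick_active() {
    current=$1
    shift
    first_up=
    for entry in "$@"; do
        name=${entry%%=*}
        state=${entry#*=}
        if [ "$state" = up ]; then
            # Keep the current active slave as long as its link is up.
            if [ "$name" = "$current" ]; then
                echo "$name"
                return 0
            fi
            # Otherwise remember the first slave with a working link.
            if [ -z "$first_up" ]; then
                first_up=$name
            fi
        fi
    done
    # No slave has link: the bond itself is down.
    [ -n "$first_up" ] || return 1
    echo "$first_up"
}

pick_active eth1 eth0=up eth1=up      # eth1 stays active
pick_active eth1 eth0=up eth1=down    # eth0 takes over
```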

So there are seven types of ethernet bonding.  
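
For reference, the seven mode names and their numbers can be collected in one small helper. This is a convenience sketch; bond_mode_number is a hypothetical name, and the mapping is the one used in the mode descriptions above.

```shell
# bond_mode_number MODE_NAME
# Prints the numeric bonding mode for a mode name; fails for unknown names.
bond_mode_number() {
    case $1 in
        balance-rr)    echo 0 ;;
        active-backup) echo 1 ;;
        balance-xor)   echo 2 ;;
        broadcast)     echo 3 ;;
        802.3ad)       echo 4 ;;
        balance-tlb)   echo 5 ;;
        balance-alb)   echo 6 ;;
        *) echo "unknown bonding mode: $1" >&2; return 1 ;;
    esac
}

bond_mode_number active-backup
```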

The bonding kernel module must be loaded before any of these modes can be used.

The most commonly used bonding modes are
1) balance-rr or 0
2) active-backup or 1
3) balance-xor or 2

Configuring an Ethernet bonding interface in CentOS

Requirements for ethernet bonding are
  • eth0 - The first network card
  • eth1 - The second NIC
  • bond0 - The bonding device created by the entry in /etc/modules.conf (for an older 2.4 kernel) or in /etc/modprobe.d/bonding.conf (for a 2.6 kernel)
Note : In this setup the bond takes the MAC address of the eth0 NIC

In /etc/modprobe.d/bonding.conf file, add the following line

       alias bond0 bonding
  • The alias line associates the bond0 network interface with the bonding module

In CentOS, the script /etc/init.d/network brings up the network interfaces, eth0 and eth1, by default. So we add entries to /etc/rc.local that set up the bond in the following order:
  • Bring the interfaces, eth0 and eth1, down first
  • Clear the existing route associated with eth0
  • Enable bonding by specifying the bonding mode (0..6) using the modprobe command
  • Associate the bonding interface, bond0, with the MAC address of the eth0 NIC
  • Add an IP address for the bond0 interface
  • Bring up the bonding interface, bond0
  • Add the NICs, eth0 and eth1, as slaves to the bonding interface, bond0, using the ifenslave command
  • Bring up the interfaces, eth0 and eth1
  • Set the route for the bond0 device

Let us illustrate this by setting up failover bonding, mode 1.

Bonding to introduce failover - mode 1: (HA)

Add the following lines in /etc/rc.local file

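# Take the physical interfaces down before enslaving them
ip link set dev eth0 down
ip link set dev eth1 down
# Remove the default route bound to eth0 (harmless if it is already gone)
ip route del 0/0 via 192.168.1.1 dev eth0
# Load the bonding driver: mode 1 = active-backup, link checked every 100 ms
modprobe bonding mode=1 miimon=100 downdelay=200 updelay=200
# Give bond0 the MAC address of the eth0 NIC (replace with your own)
ip link set dev bond0 addr 00:80:c8:e7:ab:5c
# Assign the IP address and bring the bond up
ip addr add 192.168.1.33/24 dev bond0
ip link set dev bond0 up
# Attach both NICs as slaves, then bring them up
ifenslave bond0 eth0 eth1
ip link set dev eth0 up
ip link set dev eth1 up
# Restore the default route, now through bond0
ip route add 0/0 via 192.168.1.1 dev bond0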


 # Add an entry for the IP address and the host name (dhcpcc5 in this example) to /etc/hosts
ip addr | grep global | awk -F'/' '{ print $1 }' | awk '{ print $2,"\t","\t","dhcpcc5","dhcpcc5" }' | awk '{system("sed "'NR'"i\""$0"\" -i /etc/hosts")}'

Reboot the machine

Once the machine is up, check the following

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1c:c0:3e:4b:7e
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:19:5b:6b:53:22
Slave queue ID: 0

So the bonding mode is fault-tolerance (active-backup), and eth1 is the currently active slave behind the bond0 interface.
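
The active slave can also be pulled out of that file with a one-line awk filter. A minimal sketch, assuming the /proc/net/bonding format shown above; active_slave is a hypothetical helper name.

```shell
# active_slave [STATUS_FILE]
# Prints the name of the currently active slave of a bond.
active_slave() {
    file=${1:-/proc/net/bonding/bond0}
    # Split on ": " so that $2 is the value after the field label.
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' "$file"
}

# Only query the live bond if it exists on this machine.
if [ -r /proc/net/bonding/bond0 ]; then
    active_slave
fi
```

On the machine shown above this would print eth1.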

# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:1C:C0:3E:4B:7E 
          inet addr:192.168.1.33  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::21c:c0ff:fe3e:4b7e/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9758 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7774 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5579137 (5.3 MiB)  TX bytes:2026631 (1.9 MiB)

eth0      Link encap:Ethernet  HWaddr 00:1C:C0:3E:4B:7E 
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1320 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:104657 (102.2 KiB)  TX bytes:3815 (3.7 KiB)
          Interrupt:29 Base address:0xe000

eth1      Link encap:Ethernet  HWaddr 00:1C:C0:3E:4B:7E 
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:8438 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7751 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5474480 (5.2 MiB)  TX bytes:2022816 (1.9 MiB)
          Interrupt:17 Base address:0x2000

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:210 errors:0 dropped:0 overruns:0 frame:0
          TX packets:210 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:23293 (22.7 KiB)  TX bytes:23293 (22.7 KiB)

venet0    Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 
          inet6 addr: fe80::1/128 Scope:Link
          UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
          RX packets:660 errors:0 dropped:0 overruns:0 frame:0
          TX packets:834 errors:0 dropped:3 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:88089 (86.0 KiB)  TX bytes:75317 (73.5 KiB)

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.2.101   0.0.0.0         255.255.255.255 UH    0      0        0 venet0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 bond0
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 bond0


# ip route show
192.168.2.101 dev venet0  scope link 
192.168.1.0/24 dev bond0  proto kernel  scope link  src 192.168.1.33
default via 192.168.1.1 dev bond0

To verify that the failover bonding works:

  Bring down eth1, then check the "Currently Active Slave" field in /proc/net/bonding/bond0. It should now show eth0.

  Run a continuous ping to the bond0 IP address from a different machine and bring down the active slave interface, eth1. The ping should not break.
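
While running the failover test it helps to see the link state of every slave at a glance. A minimal sketch, assuming the /proc/net/bonding format shown earlier; slave_status is a hypothetical helper name.

```shell
# slave_status [STATUS_FILE]
# Prints one "SLAVE STATE" line per slave, e.g. "eth0 up".
slave_status() {
    file=${1:-/proc/net/bonding/bond0}
    awk -F': ' '
        # Remember which slave section we are in.
        /^Slave Interface:/ { slave = $2 }
        # The bond-level "MII Status" line comes before any slave and is skipped.
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    ' "$file"
}

# Only query the live bond if it exists on this machine.
if [ -r /proc/net/bonding/bond0 ]; then
    slave_status
fi
```

Running it before and after taking eth1 down should show the eth1 line change from up to down while the ping keeps going.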






