11g R2 RAC: NIC BONDING

 

 Oracle uses the interconnect for both cache fusion and Oracle Clusterware messaging. Depending on the number of nodes in the configuration, the interconnect can be a crossover cable (when only two nodes participate in the cluster) or a switched network. Both the public and private networks can be single points of failure; such failures can disrupt the operation of the cluster and reduce availability. To avoid them, redundant networks should be configured, which means dual network adapters for both the public and private networks. To enable dual network connections and to load-balance traffic across the dual adapters, features such as network interface card (NIC) bonding, also called NIC pairing, should be used whenever possible.

 

  NIC bonding is a method of pairing multiple physical network connections into a single logical interface, which is then used to establish connections with the database server. Because all network connections that are part of the logical interface can be used during communication, bonding provides load-balancing capabilities that would not otherwise be available. In addition, when one of the network connections fails, the other continues to receive and transmit data, making the interface fault tolerant.

 

  A RAC configuration requires a minimum of two network connections: one is the private interface between the nodes in the cluster, and the other, called the public interface, is for users or application servers to connect and transmit data to the database server.
  Private IP addresses are required by Oracle RAC for communication between the cluster nodes. Depending on your private network configuration, you may need one or more IP addresses.

 

  Let’s implement and test NIC bonding in an 11gR2 RAC setup.

 

Current configuration is as follows:

 

– 2 node VM setup – Node1, Node2

 

Public network  – eth0
                       Node1 : 192.9.201.183
                       Node2 : 192.9.201.187

 

Private interconnect – eth1
                       Node1 : 10.0.0.1
                       Node2 : 10.0.0.2

 

– On both nodes, before powering them on, create two network interfaces, eth2 and eth3, which will be bonded and will replace the current private interconnect eth1.
– Power on both machines
– Check that cluster services are running:
Node1#crsctl stat res -t
Node1#oifcfg getif
eth0 192.9.201.0 global public
eth1 10.0.0.0 global cluster_interconnect
———————
– On both nodes:
———————

 

Step:
 – Using neat (the system-config-network tool):
   . Deactivate both eth2 and eth3 (they have been assigned IP addresses by DHCP).
   . Edit eth1 and remove its IP address and netmask entries, because the same IP addresses (10.0.0.1/10.0.0.2) will be reused for bond0 on both nodes, so no new DNS entries are needed.
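After the addresses are stripped, the eth1 file might look like the sketch below (exact contents vary with how neat writes the file; the device stays defined but unaddressed):

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 -- sketch after removing
# the IPADDR and NETMASK entries via neat
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
```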

 

Step:
- Create a logical interface (bond0):
  Create the bond0 configuration file in the /etc/sysconfig/network-scripts directory on both machines. The IP address used is the same as the one that was used for eth1 on each node.
Node1# vi /etc/sysconfig/network-scripts/ifcfg-bond0
Add the following lines:
DEVICE=bond0
IPADDR=10.0.0.1
NETWORK=10.0.0.0
NETMASK=255.255.255.0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
Node2# vi /etc/sysconfig/network-scripts/ifcfg-bond0
Add the following lines:
DEVICE=bond0
IPADDR=10.0.0.2
NETWORK=10.0.0.0
NETMASK=255.255.255.0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes

 

Step :
- Modify the individual network interface configuration files to reflect the bonding details. The MASTER clause indicates which logical interface (bond0) a specific NIC belongs to, and the SLAVE clause indicates that the NIC is one of several bonded to, and subordinate to, that master.
  Edit eth2 and eth3 configuration files on both machines:
# vi /etc/sysconfig/network-scripts/ifcfg-eth2
Modify/append as follows:
DEVICE=eth2
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes
# vi /etc/sysconfig/network-scripts/ifcfg-eth3

Make sure the file reads as follows for the eth3 interface:
DEVICE=eth3
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes

 

Step :
- Configure the bonding driver/module.
   The configuration consists of two lines per logical interface. miimon (the media independent interface monitor) is specified in milliseconds and represents the link monitoring frequency. mode indicates how the bonded interfaces work together: balance-rr uses a round-robin policy, where all interfaces take turns transmitting; balance-alb (adaptive load balancing) balances both transmit and receive traffic across the slaves; active-backup keeps only one slave in the bond active at a time, and if the active slave goes down, another slave becomes active and all traffic then flows through the newly active slave.
 Modify the kernel modules configuration file on all nodes:
# vi /etc/modprobe.conf
Append following two lines:
alias bond0 bonding
options bond0 mode=balance-alb miimon=100
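Once the bond0 interface is up, the live values can be read back through sysfs to confirm the options actually took effect (a verification sketch, not part of the original procedure; paths assume a 2.6 kernel with sysfs bonding support). If mode still reports round-robin or miimon reports 0, the module was loaded before the options were in place and needs to be reloaded:

```shell
# Read the live bonding parameters back from sysfs to confirm the
# modprobe.conf options were applied when the module loaded.
cat /sys/class/net/bond0/bonding/mode
cat /sys/class/net/bond0/bonding/miimon
cat /sys/class/net/bond0/bonding/slaves
```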

 

Step :
- Test configuration
1. On all nodes, load the bonding module:
# modprobe bonding
2. On all nodes, restart the networking service to bring up the bond0 interface:
# service network restart
3. Disable automatic restart of CRS on both nodes:
node1#crsctl disable crs
node2#crsctl disable crs
4. The current status of the bond device bond0 is available in /proc/net/bonding/bond0. Use the following cat command to query the current status of the Linux kernel bonding driver:
# cat /proc/net/bonding/bond0
Sample outputs:
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0c:29:2f:ee:13
Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0c:29:2f:ee:1d
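A monitoring script can scrape the per-slave link state out of this status file. A minimal sketch (fed the sample text via a here-doc so the snippet is self-contained; on a live node, point awk at /proc/net/bonding/bond0 instead):

```shell
# Print "<slave>: <MII status>" for each slave interface listed in a
# bonding status file. The guard on `slave` skips the bond's own
# top-level "MII Status:" line, which appears before any slave block.
awk '/^Slave Interface:/ {slave=$3}
     /^MII Status:/ && slave {print slave": "$3; slave=""}' <<'EOF'
Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Slave Interface: eth3
MII Status: down
Link Failure Count: 1
EOF
```

With the sample input above it prints `eth2: up` and `eth3: down`, one per line.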

 

5. Display the bonded interface using the ifconfig command, which shows bond0 running as the master and both eth2 and eth3 running as slaves. Note also that the hardware addresses of bond0 and its underlying devices eth2 and eth3 are the same.
# ifconfig
Sample output
bond0     Link encap:Ethernet  HWaddr 00:0C:29:2F:EE:13
          inet addr:10.0.0.1  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe2f:ee13/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4211 errors:0 dropped:0 overruns:0 frame:0
          TX packets:332 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1431791 (1.3 MiB)  TX bytes:68068 (66.4 KiB)
eth0      Link encap:Ethernet  HWaddr 00:0C:29:2F:EE:FF
          inet addr:192.9.201.183  Bcast:192.9.201.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe2f:eeff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40757 errors:0 dropped:0 overruns:0 frame:0
          TX packets:44923 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25161328 (23.9 MiB)  TX bytes:14121868 (13.4 MiB)
          Interrupt:67 Base address:0x2024
eth1      Link encap:Ethernet  HWaddr 00:0C:29:2F:EE:09
          inet6 addr: fe80::20c:29ff:fe2f:ee09/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5839 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5396 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3141765 (2.9 MiB)  TX bytes:2291293 (2.1 MiB)
          Interrupt:75 Base address:0x20a4
eth2      Link encap:Ethernet  HWaddr 00:0C:29:2F:EE:13
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3236 errors:0 dropped:0 overruns:0 frame:0
          TX packets:174 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1239253 (1.1 MiB)  TX bytes:34959 (34.1 KiB)
          Interrupt:75 Base address:0x2424
eth3      Link encap:Ethernet  HWaddr 00:0C:29:2F:EE:13
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:975 errors:0 dropped:0 overruns:0 frame:0
          TX packets:158 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:192538 (188.0 KiB)  TX bytes:33109 (32.3 KiB)
          Interrupt:59 Base address:0x24a4
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10252 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10252 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7863150 (7.4 MiB)  TX bytes:7863150 (7.4 MiB)

 

6. Get the current private network interface configuration used by the cluster:
grid Node1$oifcfg getif
eth0  192.9.201.0  global  public
eth1  10.0.0.0  global  cluster_interconnect

 

7. Set the new private interconnect to bond0 (this updates the OCR):
grid Node1$oifcfg setif -global bond0/10.0.0.0:cluster_interconnect
8. Restart CRS on each node:
Node1#crsctl stop crs
Node1#crsctl start crs
9. Check that CRS started on both nodes:
Node1#crsctl stat res -t
10. Get the current private interconnect info
   – will display eth0  – public interface
                  eth1  – earlier private interconnect
                  bond0 – new private interconnect
Node1#oifcfg getif
11. Delete the earlier private interconnect (eth1):
Node1#oifcfg delif -global eth1/10.0.0.0

12. Get the current private interconnect info
   – will display eth0  – public interface
                  bond0 – private interconnect
Node1#oifcfg getif
13. On node2, remove network adapter eth3:
- Click VM > Settings
- Click the last network adapter (eth3) > Remove > OK
- Check that eth3 has been removed (it will no longer be listed):
Node2#ifconfig
- Check that cluster services are still running on node2, as eth2 is providing the private interconnect service:
Node1#crsctl stat res -t
14. On node1, remove network adapter eth3:
- Click VM > Settings
- Click the last network adapter (eth3) > Remove > OK
- Check that eth3 has been removed (it will no longer be listed):
Node1#ifconfig
- Check that cluster services are still running, as eth2 on each node is providing the private interconnect service:
Node1#crsctl stat res -t

15. On node2, remove network adapter eth2 (the only adapter left for the private interconnect):
- Click VM > Settings
- Click the last network adapter (eth2) > Remove > OK
- Node2 immediately reboots, as it can no longer communicate with node1
- Check that node2 is no longer part of the cluster:
Node1#crsctl stat res -t
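As an alternative to pulling virtual adapters, the same failover behaviour can be exercised from the OS by downing one slave at a time. This is a hypothetical variant of the test above; the interface names and the peer address 10.0.0.2 come from this setup:

```shell
# Down one slave and confirm the bond stays up on the survivor.
ip link set eth3 down
cat /proc/net/bonding/bond0   # eth3 should now show "MII Status: down"
ping -c 3 10.0.0.2            # interconnect still reachable via eth2
ip link set eth3 up           # restore the slave
```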
                              HAPPY BONDING !!!
Your comments and suggestions are welcome!