WORKLOAD MANAGEMENT IN 11g R2 RAC : FAILOVER

In this post, I will discuss about various types of failovers that can be configured in 11g R2 RAC for workload management.
In RAC we have multiple instances of a database running on multiple servers.
Workload management involves :
   – Failover : If connection to an instance fails, client should automatically connect to another instance
   – Load balancing : Workload should spread across various instances to obtain maximum performance/throughput
Failover can be
   – Connect Time Connection Failover (CTCF) while making initial connection ()
   - Run time connection failover (RTCF) after connection has been established or (transparent application failover (TAF)
     FAILOVER_METHOD = NONE, BASIC, PRECONNECT
     FAILOVER TYPE   = SESSION, SELECT
Load balancing can be :
  - Connect time load balancing (CTLB)
    . On client side
    . On server side
  –  Run time load balancing (RTLB)
TO configure failover/load balancing, tnsnames.ora should contain multiple listener addresses to connect to multiple instances e.g.
Multiple Listener addresses within a description: i.e. User is trying to connect to a service which is supported by multiple instances.
RAC=
(DESCRIPTION=
  (ADDRESS_LIST=
    (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
                      )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
       )
In case SCAN is used, SCAN name is used in the address which resolves to 3 SCAN listeners.
e.g.
RAC=
(DESCRIPTION=
  (ADDRESS_LIST=
    (ADDRESS= (PROTOCOL=TCP) (HOST=cluster01-scan) (PORT=1521))
                      )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
       )
In this post, I will discuss in detail about Failover. To know more about load balancing, please click the following link:
FAILOVER
——–
 Failover refers to the mechanism of switching over to an alternate resource when connection to the primary resource gets terminated due to any reason. In context of RAC it implies that when a user tries to connect to one instance of the database and connection fails due to some reason , he should be able to switch to another instance. Connection failure may be broadly categorized as:
- Failure while making the initial connection to an instance (Connect Time Connection Failover (CTCF)
- Failure after a connection has been successfully established (transparent application failover (TAF) or  Run time connection failover (RTCF))
Failure while making the initial connection to an instance (Connect Time Connection Failover (CTCF) : It could occur for example when listener on a node is not working or a node is not up. If your attempt to connect to an instance fails, you can make the connection attempt again, but this time to a backup instance. As long as you have backup instances configured, you can continue trying to connect, until a successful connection is established. This process is technically termed as Connect Time connection Failover (CTCF).
Failure after a connection has been successfully established (transparent application failover (TAF) or  Run time connection failover (RTCF) :It happens as a result of a node crash or an instance crash. In such cases the application  has to reconnectto a backup instance, reestablish the session environment, and resubmit any work lost, because of the break in the connection. The technical name for this type of failover mechanism is server side transparent application failover (TAF) or  Run time connection failover (RTCF).
How to configure Connect Time Connection Failover (CTCF)
Connect Time Failover can be achieved by the setting FAILOVER=ON/TRUE/YES (default) in DESCRIPTION_LIST, DESCRIPTION or ADDRESS_LIST in tnsnames.ora.
RAC=
(DESCRIPTION=
  (ADDRESS_LIST=
                       (FAILOVER=ON)
    (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
                      )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
       )
 The default value of parameter FAILOVER is set to ON so that Oracle Net Services tries the first address in the list and if that fails, it fails over to the next one in the address list. Without this, Oracle would try only one address from the list and report an error on connection failure.
Note: Connect Time Failover only works if you are using dynamic registration. This means that this feature won’t work if you have something like this configured in your listener.ora:
SID_LIST_MYLISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=sales.us.acme.com)
                                                (ORACLE_HOME=/s01/app/oracle/ora/prod)
(SID_NAME=sghosh)
)
    )
Any reference to Global_dbname and you can forget about Connect Time Failover.
How to configure Run Time Connect Failover (RTCF) or Server side TAF: 
It can be achieved by creating a service for the database and define TAF Policy for the service. (METHOD = BASIC/PRECONNECT) and TYPE (SELECT/SESSION). In this case, connect-descriptor in tnsnames.ora contains  a FAILOVER_MODE portion e.g.
RAC=
(DESCRIPTION=
  (ADDRESS_LIST=
                       (FAILOVER=ON)
    (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
    (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
                      )
   (CONNECT_DATA= (SERVICE_NAME= RAC)
                      (FAILOVER_MODE=(TYPE=select)(METHOD=basic))
       )
Unlike connect time failover, which is invoked before the connection is made, TAF comes into play after the connection is made (and then, broken). If the connection is lost while the application is running, Oracle Net will transparently reconnect the application to another instance accessing the same database.
TAF supports two types of failover (TYPE)
- SESSION:
   . Any uncommitted transactions will be rolled back.
   . The session will be connected to another instance
- SELECT:
  . The session will be connected to another instance
  . A SELECT statement that was executing at the time of failure will be re-executed by the new session using the same SCN and fetched rows will be discarded up to the point that the original query failed.
  . Any uncommitted transactions will be rolled back
TAF supports two methods of failover (METHOD): Method determines which method of failovertakes place on instance failure from the primary node to secondary.
- BASIC: You connect to the backup instance when the primary connection fails. This option requires almost no work on the backup server until failover time.
- PRECONNECT: you connect to the backup instance at the same time you connect to the primary instance. This has the obvious benefit of having a backup connection available all of the time, thus reducing the time of ‘failover’. But the downside is that you have to pay the extra ‘cost’ in terms of resources spent, of having a backup connection open all the while.
SUMMARY:
Workload management involves :
   – Failover : If connection to an instance fails, client should automatically connect to another instance
   – Load balancing : Workload should spread across various instances to obtain maximum performance/throughput
Failover can be 
   – while making initial connection (Connect Time Connection Failover (CTCF))
   – after connection has been established (transparent application failover (TAF) or  Run time connection failover (RTCF))
     FAILOVER_METHOD = NONE, BASIC, PRECONNECT
     FAILOVER TYPE   = SESSION, SELECT
TO configure failover/load balancing, tnsnames.ora should contain multiple listener addresses to connect to multiple instances
Connect Time Connection Failover (CTCF) : If your attempt to connect to an instance fails, you can make the connection attempt again, but this time to a backup instance. Connect Time Failover can be achieved by the setting FAILOVER=ON/TRUE/YES (default) in DESCRIPTION_LIST, DESCRIPTION or ADDRESS_LIST in tnsnames.ora.
Run Time Connect Failover (RTCF) or Server side TAF: If the connection is lost while the application is running, Oracle Net will transparently reconnect the application to another instance accessing the same database. It can be achieved by creating a service for the database and define TAF Policy for the service. (METHOD = BASIC/PRECONNECT) and TYPE (SELECT/SESSION). In this case, connect-descriptor in tnsnames.ora contains  a FAILOVER_MODE portion .
References:

 

—————————————————————————————————-

Related links:

Home

11G R2 RAC Index
11g R2 RAC : Services
Service Created Using EM Does Not Update DD or OCR

                                 ———-

One thought on “WORKLOAD MANAGEMENT IN 11g R2 RAC : FAILOVER

Your comments and suggestions are welcome!