Prior to the first 11g Release 2 patch set, when certain subcomponents required by Oracle RAC failed (e.g. the private interconnect or a voting disk), Oracle Clusterware tried to prevent a split-brain by fast-rebooting the affected server(s) without waiting for ongoing I/O operations to complete or for the file systems to be synchronized. As a result, non-cluster-aware applications on those servers were forcibly shut down. Moreover, after a reboot, resources need to be re-mastered across the surviving nodes. In a large cluster with many nodes, this can be a very expensive operation.

This mechanism changed with the first 11g Release 2 patch set.

After deciding which node to evict,

– the clusterware will attempt to clean up the failure within the cluster by killing only the offending process(es) on that node. I/O-generating processes in particular are killed.

– If all Oracle resources/processes can be stopped and all I/O-generating processes can be killed,

  • clusterware resources will stop on the node
  • the Oracle High Availability Services Daemon (OHASD) will keep trying to restart the Cluster Ready Services (CRS) stack.
  • once the conditions for starting the CRS stack are re-established, all relevant cluster resources on that node will start automatically.

– If, for some reason, not all resources can be stopped or the I/O-generating processes cannot be stopped completely (e.g. because they hang in kernel mode or in the I/O path),

  • Oracle Clusterware will still reboot the node or use IPMI to forcibly evict it from the cluster, as in earlier versions.
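The decision flow above can be sketched as a small Python model. This is only an illustration of the branching described in this post, not Oracle's implementation: the `Process` fields (`is_oracle`, `io_generating`, `stoppable`) and the functions `fence_node` and `ohasd_retry` are hypothetical names chosen to make the logic explicit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class FenceOutcome(Enum):
    # Clusterware stack stops on the node; the node itself keeps running.
    STACK_RESTART = "stop CRS stack, OHASD retries restart"
    # Cleanup failed: fall back to a reboot or an IPMI-based eviction.
    REBOOT_OR_IPMI = "reboot node or evict via IPMI"


@dataclass
class Process:
    name: str
    is_oracle: bool      # hypothetical flag: Oracle resource/process
    io_generating: bool  # hypothetical flag: issues I/O to shared storage
    stoppable: bool      # hypothetical flag: False if hung in kernel mode / I/O path


def fence_node(processes: List[Process]) -> FenceOutcome:
    """Model the reboot-less fencing decision on the node chosen for eviction."""
    # Only Oracle processes and I/O-generating processes are targeted;
    # unrelated (non-cluster-aware) processes are left alone.
    targets = [p for p in processes if p.is_oracle or p.io_generating]
    stuck = [p for p in targets if not p.stoppable]
    if not stuck:
        return FenceOutcome.STACK_RESTART
    return FenceOutcome.REBOOT_OR_IPMI


def ohasd_retry(conditions: Iterable[bool]) -> Optional[int]:
    """Simulate OHASD retrying the CRS stack start.

    Each element says whether the start conditions were met at that attempt.
    Returns the 1-based attempt at which the stack starts, or None.
    """
    for attempt, ok in enumerate(conditions, start=1):
        if ok:
            return attempt
    return None
```

For example, a node running a stoppable `ora_lgwr` plus an unrelated application yields `STACK_RESTART` (the application is never targeted), while an Oracle process hung in the I/O path yields `REBOOT_OR_IPMI`.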

This behavior change is particularly useful for non-cluster-aware applications, as the data is protected by shutting down the clusterware stack only on the affected node, without rebooting the node itself.

I will demonstrate this functionality in two scenarios:

– Failure of the network heartbeat
– Failure of the disk heartbeat




Related Links:


11g R2 RAC Index

11g R2 RAC: Node Eviction Due To Missing Network Heartbeat
11g R2 RAC: Reboot-less Fencing With Missing Network Heartbeat
11g R2 RAC: Reboot-less Fencing With Missing Disk Heartbeat




5 thoughts on “11g R2 RAC: REBOOT-LESS NODE FENCING”

  1. I have seen a situation where a node reboot created a split-brain, and reboot-less node fencing took place on node 2, which was the node that was still running…
    The cohort calculation is a mystery, and so is the map type.

    The message was as follows:

    Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, “dbnode2”, is smaller than cohort of 1 nodes led by node 1, dbnode1, based on map type 2

    My opinion was:
    It seems that during the power-off something hung, and that is why there was a disk heartbeat but no network heartbeat.
    So during the manual shutdown of node 1, node 2 could see the disk heartbeat but had no network communication, which is why it concluded that the network connection was lost.
    As it could see node1's disk heartbeat, it concluded that node1 was alive and that the problem was on its own side, so it shut down its services.

    What is your opinion?

    1. Hi Erm

      It seems that although network connectivity was lost between the two nodes, node1 was still able to perform the disk heartbeat until shortdisktimeout (SDTO). So there was a split-brain situation in which both nodes could perform the disk heartbeat without a network heartbeat. That's why the cluster was divided into two sub-clusters: one with node1 and the other with node2. Since both sub-clusters contained one node, the tie was broken by node ID, and node2 was evicted.


      1. I think it was because: “In a 2 node cluster always the node with the higher node number will get evicted, regardless from which node the cable was unplugged.”

  2. It can also be said that RAC nodes should be shut down gracefully. An improper shutdown, or a hang during the shutdown of one node, may trigger a cluster failure in a 2-node cluster.
    Think about a scenario where you have ACFS file systems on top of ASM and you place your virtual machines' operating system file systems on top of these ACFS file systems. Once the node is evicted, ASM will be terminated and the ACFS file systems will go offline. The virtual OSes on these ACFS file systems will then see their disks as read-only. Even after the problem is fixed, manual intervention will be needed for the OS to remount the disks read-write.
    Also, if the node that is shutting down hangs, the problem won't be resolved for a long time, because the node simply stays hung. This can cause the second node to be evicted, creating downtime for the Oracle services at the cluster level. Manual intervention, perhaps a hard reboot, will then be needed to shut down the hung node.
    What do you think?

    1. You are right.
      As far as ACFS file systems are concerned, I think they will be brought up automatically after the problem is fixed, if they have been registered as cluster resources with an automatic management policy.


Your comments and suggestions are welcome!