In my earlier post, I discussed reboot-less node fencing, a feature introduced in 11.2.0.2. In this post, I will demonstrate reboot-less node fencing when the disk heartbeat is lost.
– Check that the Clusterware version is 11.2.0.3
[root@host02 ~]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.3.0]
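Since reboot-less fencing is only available from 11.2.0.2 onwards, this check can be scripted. Below is a minimal sketch, assuming a Linux host with GNU coreutils (for `sort -V`). `supports_rebootless_fencing` is a hypothetical helper of my own, not an Oracle utility. It reads the output of `crsctl query crs activeversion` on standard input:

```shell
# Hypothetical helper: succeeds (exit 0) if the active Clusterware version
# read from `crsctl query crs activeversion` output is 11.2.0.2 or later.
supports_rebootless_fencing() {
  local ver
  # Pull the version string out of the brackets, e.g. 11.2.0.3.0
  ver=$(sed -n 's/.*\[\([0-9.]*\)].*/\1/p')
  [ -n "$ver" ] || return 1
  # GNU sort -V orders version strings; if 11.2.0.2 sorts first, ver >= 11.2.0.2
  [ "$(printf '%s\n' 11.2.0.2 "$ver" | sort -V | head -n 1)" = 11.2.0.2 ]
}
```

Usage: `crsctl query crs activeversion | supports_rebootless_fencing && echo "reboot-less fencing available"`.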
– Check that both nodes in the cluster are active
[root@host02 ~]# olsnodes -s
host01 Active
host02 Active
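To make the same check in a script, the two-column output of `olsnodes -s` (node name, status) can be tested with awk. A minimal sketch; the function name `all_nodes_active` is hypothetical:

```shell
# Hypothetical helper: reads `olsnodes -s` output on stdin and succeeds only
# if at least one node is listed and every listed node has status Active.
all_nodes_active() {
  awk 'NF >= 2 { n++; if ($2 != "Active") bad = 1 }
       END { exit (n == 0 || bad) }'
}
```

Usage: `olsnodes -s | all_nodes_active && echo "all nodes active"`. Treating empty output as a failure avoids reporting success when `olsnodes` itself returns nothing.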
– Stop the iSCSI service on host02 so that it loses access to the shared storage holding the voting file
[root@host02 ~]# service iscsi stop
Logging out of session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]
Logout of [sid: 1, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]: successful
Stopping iSCSI daemon:
– Alert log of host02
– Note that instead of the node being rebooted, CSSD terminates because no voting file is accessible, and the CRSD resources are cleaned up
[cssd(2876)]CRS-1649:An I/O error occured for voting file: ORCL:ASMDISK013; details at (:CSSNM00059:) ...
[cssd(2876)]CRS-1606:The number of voting files available, 0, is less than the minimum number of voting files required, 1, resulting in CSSD termination to ensure data integrity;
[cssd(2876)]CRS-1656:The CSS daemon is terminating due to a fatal error;
[cssd(2876)]CRS-1652:Starting clean up of CRSD resources.
2013-10-09 11:04:30.795
...
[cssd(2876)]CRS-1654:Clean up of CRSD resources finished successfully.
2013-10-09 11:04:31.914
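The CRS-1606 message follows from the voting-file majority rule: a node must be able to access more than half of the configured voting files, so the minimum for n voting files is floor(n/2) + 1. The minimum of 1 reported here corresponds to a single voting file, and with 0 files reachable CSSD has to terminate. A minimal sketch of that arithmetic in shell (both function names are hypothetical):

```shell
# Minimum number of voting files a node must reach: a strict majority of $1.
min_voting_files() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds if a node that can reach $2 of $1 configured voting files
# still satisfies the majority rule (i.e. CSSD need not terminate).
css_survives() {
  [ "$2" -ge "$(min_voting_files "$1")" ]
}
```

For example, `css_survives 1 0` fails, which is the situation in the log above, whereas with three voting files a node that loses one of them still has a majority, so `css_survives 3 2` succeeds.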
– Check that Oracle High Availability Services (OHAS) is still up on host02
[root@host02 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
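This combination (OHAS online while CSS, CRS and EVM cannot be contacted) is what distinguishes a reboot-less fence from a node reboot. A minimal sketch that detects it from `crsctl check crs` output, keying only on the CRS-4638 and CRS-4530 messages shown above; the function name is hypothetical:

```shell
# Hypothetical helper: reads `crsctl check crs` output on stdin and succeeds
# if OHAS is online (CRS-4638) but the CSS daemon cannot be contacted (CRS-4530).
ohas_up_css_down() {
  local out
  out=$(cat)
  printf '%s\n' "$out" | grep -q 'CRS-4638' &&
    printf '%s\n' "$out" | grep -q 'CRS-4530'
}
```

Usage: `crsctl check crs | ohas_up_css_down && echo "host fenced without reboot"`.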
– Check that the CSSD, CRSD and HAIP resources are down on host02
[root@host02 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       host02
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       host02
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       host02
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       host02
ora.gpnpd
      1        ONLINE  ONLINE       host02
ora.mdnsd
      1        ONLINE  ONLINE       host02
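With this many init resources, it helps to list only those whose TARGET is ONLINE but whose STATE is OFFLINE. A minimal awk-based sketch, assuming the 11.2 tabular layout in which each resource name line is followed by an indented instance line; the function name is hypothetical:

```shell
# Hypothetical helper: reads `crsctl stat res -t -init` output on stdin and
# prints each resource whose TARGET is ONLINE but whose STATE is OFFLINE.
offline_init_resources() {
  awk '
    /^ora\./ { name = $1; next }       # resource name line
    name != "" && $1 ~ /^[0-9]+$/ {    # instance line: number, TARGET, STATE, ...
      if ($2 == "ONLINE" && $3 == "OFFLINE") print name
    }'
}
```

Usage: `crsctl stat res -t -init | offline_init_resources`. Run against the fenced node, this should list resources such as ora.cssd, ora.crsd and ora.cluster_interconnect.haip, while ora.diskmon (target OFFLINE) is ignored. Once the stack is back up, it prints nothing.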
– From host01, check that host02 is no longer part of the cluster
[root@host01 cluster01]# olsnodes -s
host01 Active
host02 Inactive
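Because neither the eviction nor the later rejoin is instantaneous, a script may need to poll `olsnodes -s` until a node reaches a given status. A minimal sketch, assuming `olsnodes` is on the PATH of the user running it; the function name and its timeout and interval parameters (both in seconds) are hypothetical:

```shell
# Hypothetical helper: polls `olsnodes -s` until node $1 reports status $2
# (e.g. Active or Inactive). Gives up after roughly $3 seconds (default 300),
# checking every $4 seconds (default 10). Returns 0 on success, 1 on timeout.
wait_for_node_status() {
  local node="$1" want="$2" timeout="${3:-300}" interval="${4:-10}"
  local waited=0 status
  while :; do
    # olsnodes -s prints "<node> <status>" per line
    status=$(olsnodes -s | awk -v n="$node" '$1 == n { print $2 }')
    [ "$status" = "$want" ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

Usage: `wait_for_node_status host02 Inactive 120 5` on host01 after stopping iSCSI, and `wait_for_node_status host02 Active 120 5` after restarting it. A fixed polling interval keeps the sketch simple at the cost of overshooting the timeout by up to one interval.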
– Restart the iSCSI service on host02
[root@host02 ~]# service iscsi start
iscsid dead but pid file exists
Turning off network shutdown. Starting iSCSI daemon: [  OK  ]
[  OK  ]
Setting up iSCSI targets: Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]: successful
[  OK  ]
– Alert log of host02
– Note that CSSD keeps retrying voting file discovery every 15 seconds; as soon as the iSCSI service is started, it finds the voting file again and host02 rejoins the cluster without a reboot
[cssd(5481)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details ...
2013-10-09 11:10:43.897
[cssd(5481)]CRS-1707:Lease acquisition for node host02 number 2 completed
2013-10-09 11:10:47.629
[cssd(5481)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK013; details in /u01/app/11.2.0/grid/log/host02/cssd/ocssd.log.
2013-10-09 11:10:54.652
[cssd(5481)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 host02 .
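The node list in the CRS-1601 message is a convenient confirmation that the node has rejoined. A minimal sketch that extracts it from the most recent CRS-1601 line of the Clusterware alert log; the function name is hypothetical:

```shell
# Hypothetical helper: reads Clusterware alert log text on stdin and prints
# the node list from the most recent CRS-1601 "Reconfiguration complete" line.
last_active_nodes() {
  sed -n 's/.*CRS-1601:CSSD Reconfiguration complete\. Active nodes are \(.*[^ ]\) *\. *$/\1/p' |
    tail -n 1
}
```

Usage, assuming the usual 11.2 alert log location under the Grid home shown in the log above: `last_active_nodes < /u01/app/11.2.0/grid/log/host02/alerthost02.log`. For the log above, it prints `host01 host02`.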
– Check that the HAIP, CSSD and CRSD resources have started on host02
[root@host02 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       host02                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       host02
ora.crf
      1        ONLINE  ONLINE       host02
ora.crsd
      1        ONLINE  ONLINE       host02
ora.cssd
      1        ONLINE  ONLINE       host02
ora.cssdmonitor
      1        ONLINE  ONLINE       host02
ora.ctssd
      1        ONLINE  ONLINE       host02                   OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       host02
ora.evmd
      1        ONLINE  ONLINE       host02
ora.gipcd
      1        ONLINE  ONLINE       host02
ora.gpnpd
      1        ONLINE  ONLINE       host02
ora.mdnsd
      1        ONLINE  ONLINE       host02
– Check that host02 has joined the cluster
[root@host02 ~]# olsnodes -s
host01 Active
host02 Active
References:
http://ora-ssn.blogspot.in/2011/09/reboot-less-node-fencing-in-oracle.html
http://www.trivadis.com/uploads/tx_cabagdownloadarea/Trivadis_oracle_clusterware_node_fencing_v.pdf
http://www.vmcd.org/2012/03/11gr2-rac-rebootless-node-fencing/
———————————————————————————————
Related Links:
11g R2 RAC: Node Eviction Due To Missing Network Heartbeat
11g R2 RAC: Reboot-less Node Fencing
11g R2 RAC: Reboot-less Fencing With Missing Network Heartbeat