RECOVER VOTING DISK – SCENARIO-I

In this post, I will demonstrate how to recover the voting disk when we lose its only copy. The voting disk is recovered automatically using the latest available backup of the OCR.

 

Current scenario:
The only copy of the voting disk is present in the test diskgroup on disk ASMDISK010.
We will corrupt ASMDISK010 so that we lose the only copy of the voting disk.
We will restore the voting disk to another diskgroup using the OCR.

 

Let’s start …
– Currently, we have 1 voting disk. Let us corrupt it and check whether clusterware continues to run
– FIND OUT LOCATION OF VOTEDISK
[grid@host01 cssd]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00ce3c95c6534f44bfffa645a3430bc3 (ORCL:ASMDISK010) [TEST]
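
The same check can be scripted. The following is a minimal sketch, assuming the column layout shown in the output above; the variable names are made up for illustration, and the sample line is copied from this post rather than read from a live cluster.

```shell
# Sample line copied from the post. On a live cluster you would pipe the
# output of `crsctl query css votedisk` into the same awk programs instead.
sample_output=' 1. ONLINE   00ce3c95c6534f44bfffa645a3430bc3 (ORCL:ASMDISK010) [TEST]'

# Count the voting files whose STATE column (2nd field) is ONLINE.
online_count=$(printf '%s\n' "$sample_output" |
  awk '$2 == "ONLINE" { n++ } END { print n + 0 }')

# Take the disk group from the last field and strip the square brackets.
disk_group=$(printf '%s\n' "$sample_output" |
  awk '$2 == "ONLINE" { gsub(/\[|\]/, "", $NF); print $NF }')

echo "online voting files: $online_count"
echo "disk group:          $disk_group"
```

A count of 1 confirms that there is a single voting file, so losing this one disk means losing every copy.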

 

– FIND OUT THE NO. OF DISKS IN test DG (CONTAINING VOTEDISK)
ASMCMD> lsdsk -G test
Path
ORCL:ASMDISK010

 

– Let us corrupt ASMDISK010

– bs = block size = 4096 bytes

– count = no. of blocks overwritten = 1000000 (~1M)

– total no. of bytes overwritten = 4096 * 1000000 = 4096000000
                                 (~4096M = size of one partition)
#dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK010 bs=4096 count=1000000
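
As a quick check of the arithmetic behind the dd command, the sketch below multiplies the block size by the block count. It only does the calculation and writes nothing to any disk; the variable names are illustrative.

```shell
# Values taken from the dd command in the post.
bs=4096          # block size in bytes
count=1000000    # number of blocks written

# Total bytes that dd overwrites with zeros.
total_bytes=$((bs * count))

# The same figure in MiB (1 MiB = 1024 * 1024 bytes), rounded down.
total_mib=$((total_bytes / 1024 / 1024))

echo "bytes overwritten: $total_bytes"
echo "approx. MiB:       $total_mib"
```

This works out to 4096000000 bytes, roughly 3906 MiB, which is enough to wipe the ASM disk header along with most of the partition.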
Here, I was expecting clusterware to stop, as the only voting disk was no longer available, but surprisingly clusterware kept running. I waited for quite some time, but it still did not go down. I would be glad if someone could give more input on this.
Finally, I stopped clusterware and tried to restart it. It was not able to restart.

– Reboot all the nodes and note that clusterware does not start, as the voting disk is not accessible.

#crsctl stat res -t
– Now since voting disk can’t be restored back to test diskgroup as disk in test has been corrupted,
   we will create another diskgroup votedg where we will restore voting disk.

 

RECOVER VOTING DISK
– To move voting disk to votedg diskgroup, ASM instance should be up and for ASM
   instance to be up, CRS should be up. Hence we will
     – stop crs on all the nodes
     – start crs in exclusive mode on one of the nodes (host01)
     – start asm instance on host01 using pfile (since spfile of ASM instance is on ASM)
     – create a new diskgroup votedg
     – move voting disk to votedg  diskgroup
     – stop crs on host01(was running in exclusive mode)
     – restart crs on host01
     – start crs on rest of the nodes
     – start cluster on all the nodes

 

– IMPLEMENTATION –
     – stop crs on all the nodes (if it does not stop, kill ohasd process and retry)
root@hostn# crsctl stop crs -f
     – start crs in exclusive mode on one of the nodes (host01)
root@host01# crsctl start crs -excl
     – start asm instance on host01 using pfile 
grid@host01$ echo INSTANCE_TYPE=ASM >> /u01/app/oracle/init+ASM1.ora 
             chown grid:oinstall /u01/app/oracle/init+ASM1.ora 
 SQL>startup pfile='/u01/app/oracle/init+ASM1.ora';
     – create a new diskgroup votedg

     – move voting disk to votedg diskgroup – voting disk is automatically recovered using latest available backup of OCR.
root@host01#crsctl replace votedisk +votedg
     – stop crs on host01(was running in exclusive mode)
root@host01#crsctl stop crs
     – restart crs on host01
root@host01#crsctl start crs
     – start crs on rest of the nodes (if it does not start, kill ohasd process and retry)
root@host02#crsctl start crs 
root@host03#crsctl start crs
     – start cluster on all the nodes and check that it is running
root@host01#crsctl start cluster -all 
            crsctl stat res -t
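
The step "create a new diskgroup votedg" is listed above without a command. A sketch of what it might look like follows, assuming a spare ASMLib disk is available. The disk name ORCL:ASMDISK011, the choice of external redundancy, and the compatible.asm value are assumptions for illustration and are not taken from the original steps. The script only builds and prints the statement; on host01 it would be run at the SQL> prompt of the ASM instance started with the pfile.

```shell
# Hypothetical names: adjust to a disk that is actually free on your system.
new_dg="votedg"
candidate_disk="ORCL:ASMDISK011"

# Build the CREATE DISKGROUP statement. compatible.asm is set to 11.2 here on
# the assumption that the diskgroup must be able to hold voting files.
create_dg_sql=$(cat <<EOF
CREATE DISKGROUP ${new_dg} EXTERNAL REDUNDANCY
  DISK '${candidate_disk}'
  ATTRIBUTE 'compatible.asm' = '11.2.0.0.0';
EOF
)

# Print the statement for review before running it against the ASM instance.
printf '%s\n' "$create_dg_sql"
```

With external redundancy the new diskgroup again holds a single voting file, so the cluster ends up with the same single point of failure as before; a normal redundancy diskgroup with three failure groups would avoid that.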

I hope this post was useful.
Regards


4 thoughts on “RECOVER VOTING DISK – SCENARIO-I”

  1. What an excellent blog, you are just outstanding in the explanation… and thanks a lot for your writings…

  2. Here, I was expecting clusterware to stop as the only voting disk was not available but surprisingly clusterware kept running. I even waited for quite some time but to no avail. I would be glad if someone can give more input on this.

    ^^
    Here I recollect there is some caching concept / mechanism; unless the cache gets updated, the information doesn't reflect immediately.

    1. In my opinion, the clusterware continued to run because ocssd just keeps on overwriting and reading the corrupted blocks every second. The blocks are still accessible. When I manually restarted the cluster, it could not locate the voting disk, since the header of the ASM disk had been corrupted.
      What do you think?

      regards
      Anju
