In this post, I will demonstrate how to recover the voting disk when its only copy is lost. The voting disk will be automatically recovered using the latest available backup of the OCR.
Current scenario:
The only copy of the voting disk is present in the test diskgroup on disk ASMDISK010.
We will corrupt ASMDISK010 so that we lose the only copy of the voting disk.
We will restore the voting disk to another diskgroup using the OCR.
Let’s start …
– Currently, we have 1 voting disk. Let us corrupt it and check whether clusterware keeps running
– FIND OUT LOCATION OF VOTEDISK
[grid@host01 cssd]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name          Disk group
--  -----    -----------------                ---------          ----------
 1. ONLINE   00ce3c95c6534f44bfffa645a3430bc3 (ORCL:ASMDISK010)  [TEST]
– FIND OUT THE NO. OF DISKS IN test DG (CONTAINING VOTEDISK)
ASMCMD> lsdsk -G test
Path
ORCL:ASMDISK010
– Let us corrupt ASMDISK010
– bs = block size = 4096 bytes
– count = no. of blocks overwritten = 1000000 (1M)
– total no. of bytes overwritten = 4096 * 1000000 = 4096000000
(~4096 MB = size of one partition)
#dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK010 bs=4096 count=1000000
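As a quick sanity check on the arithmetic above, the total can be computed in the shell. This is only a sketch of the calculation; the variable names are my own, and nothing here touches a disk.

```shell
# Values taken from the dd command above
bs=4096          # block size in bytes
count=1000000    # number of blocks written

# Total bytes overwritten, and the same figure in MiB (integer division)
total_bytes=$((bs * count))
total_mib=$((total_bytes / 1024 / 1024))

echo "bytes=$total_bytes MiB=$total_mib"
# prints: bytes=4096000000 MiB=3906
```

So the command overwrites about 4096 MB in decimal units, or roughly 3906 MiB.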
Here, I was expecting clusterware to stop as the only voting disk was not available but surprisingly clusterware kept running. I even waited for quite some time but to no avail. I would be glad if someone can give more input on this.
Finally, I stopped clusterware and tried to restart it. It was not able to restart.
– Reboot all the nodes and note that clusterware does not start, as the voting disk is not accessible.
#crsctl stat res -t
– Since the voting disk cannot be restored to the test diskgroup (its only disk has been corrupted),
we will create another diskgroup, votedg, and restore the voting disk there.
RECOVER VOTING DISK
– To move the voting disk to the votedg diskgroup, the ASM instance should be up, and for the ASM
instance to be up, CRS should be up. Hence we will:
– stop crs on all the nodes
– start crs in exclusive mode on one of the nodes (host01)
– start asm instance on host01 using pfile (since spfile of ASM instance is on ASM)
– create a new diskgroup votedg
– move voting disk to votedg diskgroup
– stop crs on host01(was running in exclusive mode)
– restart crs on host01
– start crs on rest of the nodes
– start cluster on all the nodes
– IMPLEMENTATION –
– stop crs on all the nodes (if it does not stop, kill the ohasd process and retry)
root@hostn# crsctl stop crs -f
– start crs in exclusive mode on one of the nodes (host01)
root@host01# crsctl start crs -excl
– start asm instance on host01 using pfile
grid@host01$ echo INSTANCE_TYPE=ASM >> /u01/app/oracle/init+ASM1.ora
chown grid:oinstall /u01/app/oracle/init+ASM1.ora
SQL> startup pfile='/u01/app/oracle/init+ASM1.ora';
- create a new diskgroup votedg
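The statement for this step is not listed here, so what follows is only a minimal sketch. It assumes a spare, healthy ASM disk (the name ORCL:ASMDISK011 is hypothetical), external redundancy, and a `compatible.asm` attribute of 11.2, on the assumption that a diskgroup holding voting files needs at least that ASM compatibility level. The helper name `votedg_sql` is also my own.

```shell
# Hypothetical disk name: replace with a spare, healthy ASM disk on your system.
NEW_DISK="${NEW_DISK:-ORCL:ASMDISK011}"

# Print the CREATE DISKGROUP statement so it can be reviewed before running.
# External redundancy and compatible.asm = 11.2 are assumptions, not values
# taken from the original environment.
votedg_sql() {
  cat <<EOF
CREATE DISKGROUP votedg EXTERNAL REDUNDANCY
  DISK '${NEW_DISK}'
  ATTRIBUTE 'compatible.asm' = '11.2';
EOF
}

# Show the statement. To execute it, pipe it into SQL*Plus as the grid user
# on the node where ASM was started from the pfile:
#   votedg_sql | sqlplus -s / as sysasm
votedg_sql
```

Printing the statement first makes it easy to double-check the disk name before anything is written to it.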
– move voting disk to votedg diskgroup – voting disk is automatically recovered using the latest available backup of OCR.
root@host01#crsctl replace votedisk +votedg
– stop crs on host01(was running in exclusive mode)
root@host01#crsctl stop crs
– restart crs on host01
root@host01#crsctl start crs
– start crs on rest of the nodes (if it does not start, kill ohasd process and retry)
root@host02#crsctl start crs
root@host03#crsctl start crs
– start cluster on all the nodes and check that it is running
root@host01#crsctl start cluster -all
root@host01#crsctl stat res -t
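To confirm the result, the `crsctl query css votedisk` command used at the start can be run again; the voting file should now be reported as ONLINE in votedg. Below is a small sketch that counts ONLINE entries in that output. It assumes the column layout shown earlier in this post (state in the second column), and `count_online_votedisks` is a hypothetical helper name.

```shell
# Count voting files reported as ONLINE in `crsctl query css votedisk` output.
# Assumes the state is the second whitespace-separated field, as in the
# listing earlier in this post.
count_online_votedisks() {
  awk '$2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Typical use on a cluster node (needs crsctl on the PATH):
#   crsctl query css votedisk | count_online_votedisks
```

A count of 0 after the restart would mean the replace step did not take effect and should be repeated.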
I hope this post was useful.
Regards
------------------------------------------------------------
What excellent blog writing, you are just outstanding in the explanation… and thanks a lot for your writings…
Hi Sitaram,
Thanks for your time and feedback!
regards
Anju
Here, I was expecting clusterware to stop as the only voting disk was not available but surprisingly clusterware kept running. I even waited for quite some time but to no avail. I would be glad if someone can give more input on this.
^^
Here I recollect there is some caching concept / mechanism; unless the cache gets updated, the information is not reflected immediately.
In my opinion, the clusterware continued because ocssd just keeps on overwriting and reading the corrupted blocks every second. The blocks are still accessible. When I manually restarted the cluster, it could not locate the voting disk since the header of the ASM disk had been corrupted.
What do you think?
regards
Anju
You have mentioned that the voting disk is automatically recovered using the latest available backup of the OCR, but the command does not mention an OCR backup file. How is it going to recognize the OCR backup file location?
Information about OCR backups is stored in the OCR. It is read automatically from the OCR.
Hope it helps
Regards
Anju Garg
A very well written article which explains the voting disk recovery scenario very objectively.
Thanks for your time and feedback.
Your comments and suggestions are always welcome.
regards
Anju Garg