In this post, I will demonstrate how to recover the voting disk when we lose 2 out of 3 copies of it.
In this case, the voting disk will be recovered from the single surviving copy.
Current scenario:
3 copies of the voting disk are present in the test diskgroup on disks ASMDISK010, ASMDISK011 and ASMDISK012.
We will corrupt two disks, ASMDISK010 and ASMDISK011, so that ASMDISK012 still holds a copy of the voting disk. We will then restore the voting disk to another diskgroup using the only valid copy we have.
Let’s start ...
– Currently, we have 3 voting disks. At least 2 must be accessible for the clusterware to work. Let us corrupt one of the voting disks and check whether the clusterware keeps running.
– FIND OUT LOCATION OF VOTEDISK
[grid@host01 cssd]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name          Disk group
--  -----    -----------------                ---------          ----------
 1. ONLINE   00ce3c95c6534f44bfffa645a3430bc3 (ORCL:ASMDISK012)  [TEST]
 2. ONLINE   a3751063aec14f8ebfe8fb89fccf45ff (ORCL:ASMDISK010)  [TEST]
 3. ONLINE   0fce89ac35834f99bff7b04ccaaa8006 (ORCL:ASMDISK011)  [TEST]
Located 3 voting disk(s).
– FIND OUT THE NO. OF DISKS IN test DG (CONTAINING VOTEDISK)
ASMCMD> lsdsk -G test
Path
ORCL:ASMDISK010
ORCL:ASMDISK011
ORCL:ASMDISK012
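- Optionally, note the redundancy of the test diskgroup before corrupting anything, since the number of voting file copies a diskgroup holds depends on it (a quick check, assuming a sqlplus session on the ASM instance):
SQL> select name, type from v$asm_diskgroup where name = 'TEST';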
-- Let us corrupt ASMDISK010
– bs = blocksize = 4096
– count = # of blocks overwritten = 1000000 (~1M)
– total no. of bytes corrupted = 4096 * 1000000 (~4096M = size of one partition)
#dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK010 bs=4096 count=1000000
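- Optionally, verify that the disk header really has been wiped with kfed, which ships in the Grid Infrastructure home (a sketch; on a zeroed header it should report an invalid block type):
grid@host01$ kfed read /dev/oracleasm/disks/ASMDISK010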
- CHECK THAT C/W KEEPS RUNNING AS 2 VOTING DISKS (MORE THAN HALF OF THE VOTING DISKS) ARE STILL AVAILABLE
#crsctl stat res -t
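- You can also watch CSS react to the lost voting file in the ocssd log (assuming GRID_HOME points at your Grid Infrastructure home; in 11.2 the log lives under $GRID_HOME/log/<hostname>/cssd):
grid@host01$ grep -i "voting file" $GRID_HOME/log/host01/cssd/ocssd.log | tail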
-- Now let us corrupt ASMDISK011
– bs = blocksize = 4096
– count = # of blocks overwritten = 1000000 (~1M)
– total no. of bytes corrupted = 4096 * 1000000 (~4096M = size of one partition)
#dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK011 bs=4096 count=1000000
Here, I was expecting the clusterware to stop, since only 1 voting disk ( < half of the total of 3 ) was available, but surprisingly the clusterware kept running. I even waited for quite some time, but to no avail. I would be glad if someone can give more input on this.
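- For anyone who wants to investigate this further, re-checking the voting file status and CSS health at this point should show what CSS actually thinks (the exact output will vary):
[grid@host01 cssd]$ crsctl query css votedisk
[grid@host01 cssd]$ crsctl check css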
Finally, I stopped the clusterware and tried to restart it. It was not able to restart.
- CHECK THAT C/W IS NOT RUNNING
#crsctl stat res -t
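- Note that crsctl stat res -t cannot connect to CRSD while the stack is down; a more direct check of the daemons is:
#crsctl check crs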
– Now that we still have one copy of the voting disk on one of the disks in the test diskgroup, we can use that copy to get the voting disk back. Since the voting disk cannot be restored to the test diskgroup (its disks have been corrupted), we will restore it to the data diskgroup.
-- RECOVER VOTING DISK --
– To move the voting disk to the data diskgroup, the ASM instance must be up, and for the ASM instance to be up, CRS must be up. Hence we will:
– stop crs on all the nodes
– start crs in exclusive mode on one of the nodes (host01)
– start asm instance on host01 using pfile (since spfile of ASM instance is on ASM)
– move voting disk to data diskgroup.
– drop test diskgroup (this is allowed as it no longer contains the voting disk)
– stop crs on host01(was running in exclusive mode)
– restart crs on host01
– start crs on the rest of the nodes
– start cluster on all the nodes
-- IMPLEMENTATION --
- stop crs on all the nodes (if it does not stop, kill the ohasd process and retry, as sketched below)
root@hostn# crsctl stop crs -f
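- If the stack refuses to stop, a heavy-handed way to find and kill ohasd before retrying is (illustrative only; <pid> is whatever PID the grep returns):
root@hostn# ps -ef | grep ohasd.bin | grep -v grep
root@hostn# kill -9 <pid>
root@hostn# crsctl stop crs -f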
– start crs in exclusive mode on one of the nodes (host01)
root@host01# crsctl start crs -excl
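- To confirm that the lower stack (ohasd, cssd, ASM) has come up in exclusive mode, check the init resources:
root@host01# crsctl stat res -t -init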
- start asm instance on host01 using pfile (since spfile is on ASM)
grid@host01$ echo INSTANCE_TYPE=ASM >> /u01/app/oracle/init+ASM1.ora
grid@host01$ chown grid:oinstall /u01/app/oracle/init+ASM1.ora
grid@host01$ sqlplus / as sysasm
SQL> startup pfile='/u01/app/oracle/init+ASM1.ora';
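- A quick sanity check from the same SQL*Plus session that the ASM instance is up:
SQL> select instance_name, status from v$instance;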
- Check that the data diskgroup is mounted on host01. If not, mount it.
ASMCMD> lsdg
ASMCMD> mount data
- move the voting disk to the data diskgroup. The voting disk will be automatically recovered using the surviving copy.
root@host01#crsctl replace votedisk +data
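- Confirm that the voting disk now lives in the data diskgroup:
root@host01#crsctl query css votedisk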
- drop test diskgroup (this is allowed as it no longer contains the voting disk)
SQL>drop diskgroup test force including contents;
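- To confirm the diskgroup is gone (test should no longer be listed):
SQL> select name, state from v$asm_diskgroup;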
– stop crs on host01(was running in exclusive mode)
root@host01#crsctl stop crs
– restart crs on host01
root@host01#crsctl start crs
- start crs on the rest of the nodes (if it does not start, kill the ohasd process and retry)
root@host02#crsctl start crs
root@host03#crsctl start crs
– start cluster on all the nodes and check that it is running
root@host01#crsctl start cluster -all
root@host01#crsctl stat res -t
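- A final check that CSS, CRS and EVM are healthy on every node:
root@host01#crsctl check cluster -all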
————————————————————————————————
Hi Anju,
We had 3 disks in test diskgroup.
Path
ORCL:ASMDISK010
ORCL:ASMDISK011
ORCL:ASMDISK012
Will it create 3 copies after moving it to the data diskgroup, or only one copy?
Regards,
Varun
Hi Varun,
It depends on the redundancy of the data diskgroup to which the voting disk is moved: 1, 3 or 5 copies will be created for external, normal and high redundancy respectively.
Regards
Anju
Thanks Anju.