kfdhdb.vfstart: 96 ; 0x0ec: 0x00000060 <
kfdhdb.vfend: 128 ; 0x0f0: 0x00000080 <
– The voting disk is not striped; it is placed as a whole on an ASM disk.
– In the event that the disk containing the voting disk fails, Oracle ASM will choose another disk on which to store this data.
– It eliminates the need for using a third-party cluster volume manager.
– You can reduce the complexity of managing disk partitions for voting disks during Oracle Clusterware installations.
– The voting disk must be mirrored; should it become unavailable, the cluster will come down. Hence, you should maintain multiple copies of the voting disk on separate disk LUNs so that you eliminate a single point of failure (SPOF) in your Oracle 11g RAC configuration.
– If the voting disk is stored on ASM, the multiplexing level of the voting disk is decided by the redundancy of the diskgroup:
Redundancy of the diskgroup    # of copies of voting disk    Minimum # of disks in the diskgroup
External                       1                             1
Normal                         3                             3
High                           5                             5
– If the voting disk is on a diskgroup with external redundancy, one copy of the voting file is stored on one disk in the diskgroup.
– If we store the voting disk on a diskgroup with normal redundancy, we should be able to tolerate the loss of one disk, i.e. even if we lose one disk, we should still have enough voting disks for the clusterware to continue. If the diskgroup has 2 disks (the minimum required for normal redundancy), we can store 2 copies of the voting disk on it. If we lose one disk, only one copy of the voting disk is left and the clusterware cannot continue, because to continue, the clusterware must be able to access more than half the number of voting disks, i.e. > (2 × 1/2), i.e. > 1.
Hence, to be able to tolerate the loss of one disk, we should have 3 copies of the voting disk on a diskgroup with normal redundancy. So a normal redundancy diskgroup holding the voting disk should have a minimum of 3 disks in it.
– Similarly, if we store the voting disk on a diskgroup with high redundancy, 5 voting files are placed, each on one ASM disk, i.e. a high redundancy diskgroup should have at least 5 disks so that even if we lose 2 disks, the clusterware can continue.
– Ensure that all the nodes participating in the cluster have read/write permissions on disks.
– You can have up to a maximum of 15 voting disks. However, Oracle recommends that you do not go beyond five voting disks.
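A quick way to see why these minimums hold: with n voting files, the clusterware needs strictly more than n/2 of them, so the number of disk failures it can tolerate is (n − 1) / 2 with integer division. A minimal shell sketch of this arithmetic (illustrative only, not an Oracle utility):

```shell
#!/bin/sh
# Failures tolerated with n voting disks: survivors must exceed n/2,
# so tolerated = (n - 1) / 2 using integer division.
tolerated_failures() {
  n=$1
  echo $(( (n - 1) / 2 ))
}

for n in 1 2 3 4 5; do
  echo "voting disks: $n  tolerated failures: $(tolerated_failures "$n")"
done
```

This is also why an even count buys nothing: 2 disks tolerate no more failures than 1, and 4 no more than 3.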
Backing up voting disk
In previous versions of Oracle Clusterware you needed to back up the voting disks with the dd command. Starting with Oracle Clusterware 11g Release 2 you no longer need to back up the voting disks: they are automatically backed up as part of the OCR. In fact, Oracle explicitly states that you should not use a backup tool like dd to back up or restore voting disks. Doing so can lead to the loss of the voting disk.
Although the voting disk contents are not changed frequently, you will need to back up the voting disk file every time
- you add or remove a node from the cluster, or
- immediately after you configure or upgrade a cluster.
A node in the cluster must be able to access more than half of the voting disks at any time in order to tolerate the failure of some voting disks. Therefore, it is strongly recommended that you configure an odd number of voting disks, such as 3, 5, and so on.
– Check the location of voting disk
grid@host01$ crsctl query css votedisk
##  STATE    File Universal Id                 File Name          Disk group
--  -----    -----------------                 ---------          ----------
 1. ONLINE   243ec3b2a3cf4fbbbfed6f20a1ef4319  (ORCL:ASMDISK01)   [DATA]
Located 1 voting disk(s).
– We can see that there is only one copy of the voting disk, on the DATA diskgroup, which has external redundancy.
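If you want to check the voting disk count from a script, you can simply count the ONLINE lines in the crsctl output. In this small sketch the command output is stubbed with the sample listing above so that the snippet is self-contained; on a live cluster you would pipe the output of `crsctl query css votedisk` instead:

```shell
#!/bin/sh
# Stubbed output of 'crsctl query css votedisk' (sample row from the listing above).
# On a real cluster: crsctl query css votedisk | grep -c ONLINE
votedisk_listing=' 1. ONLINE   243ec3b2a3cf4fbbbfed6f20a1ef4319 (ORCL:ASMDISK01) [DATA]'

# Count voting disks reported ONLINE.
online_count=$(printf '%s\n' "$votedisk_listing" | grep -c 'ONLINE')
echo "Located $online_count ONLINE voting disk(s)."
```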
As I mentioned earlier, Oracle writes the voting devices to the underlying disks at pre-designated locations so that it can get the contents of these files when the cluster starts up.
Let’s see that with an actual example, using the logs from CSS. They are located at $ORACLE_HOME/log/<hostname>/cssd. Here is an excerpt from one of the logs. The line says that it found a “potential” voting file on one of the disks – 243ec3b2-a3cf4fbb-bfed6f20-a1ef4319
grid@host01$ vi /u01/app/11.2.0/grid/log/host01/cssd/ocssd.log
Search for the string “potential” or the File Universal Id – 243ec3……
2012-10-09 03:54:28.423: [ CSSD]clssnmvDiskVerify: Successful discovery for disk ORCL:ASMDISK01, UID 243ec3b2-a3cf4fbb-bfed6f20-a1ef4319,
– Create another diskgroup test with normal redundancy and 2 disks.
– Try to move voting disk from diskgroup data to test diskgroup
– This fails, as we should have at least 3 disks in the test diskgroup
[grid@host01 cssd]$ crsctl replace votedisk +test
Failed to create voting files on disk group test.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
– Add another disk to the test diskgroup and mark it as a quorum disk. The quorum disk is one small disk (300 MB should be on the safe side here, since the voting file is only about 280 MB in size) to keep one mirror of the voting file. The other two disks will each contain one voting file and all the other stripes of the database area as well, but the quorum disk will hold only that one voting file.
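Adding the quorum disk is done from SQL while connected to the ASM instance, using the QUORUM FAILGROUP clause. A minimal sketch, assuming hypothetical names (diskgroup test, failgroup fg3, disk ORCL:ASMDISK07) – adjust these to your own environment:

```shell
# Connect to the ASM instance as SYSASM and add a QUORUM failgroup.
# All names below (test, fg3, ORCL:ASMDISK07) are illustrative placeholders.
sqlplus -s "/ as sysasm" <<'EOF'
ALTER DISKGROUP test ADD QUORUM FAILGROUP fg3 DISK 'ORCL:ASMDISK07';
EOF
```

A disk in a quorum failgroup receives a copy of the voting file but no database data, which is why such a small disk is sufficient.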
– Now try to move the voting disk from the data diskgroup to the test diskgroup
– This time the operation succeeds
[grid@host01 cssd]$ crsctl replace votedisk +test
Successful addition of voting disk 00ce3c95c6534f44bfffa645a3430bc3.
Successful addition of voting disk a3751063aec14f8ebfe8fb89fccf45ff.
Successful addition of voting disk 0fce89ac35834f99bff7b04ccaaa8006.
Successful deletion of voting disk 243ec3b2a3cf4fbbbfed6f20a1ef4319.
Successfully replaced voting disk group with +test.
CRS-4266: Voting file(s) successfully replaced
– Check the ocssd.log – search for 00ce3c9……
2012-10-09 05:08:19.484: [ CSSD] Listing unique IDs for 3 voting files:
2012-10-09 05:08:19.484: [ CSSD] voting file 1: 00ce3c95-c6534f44-bfffa645-a3430bc3
2012-10-09 05:08:19.484: [ CSSD] voting file 2: a3751063-aec14f8e-bfe8fb89-fccf45ff
2012-10-09 05:08:19.484: [ CSSD] voting file 3: 0fce89ac-35834f99-bff7b04c-caaa8006
I hope this information was useful.
Keep visiting the blog. Thanks for your time!
43 thoughts on “11g R2 RAC : VOTING DISK DEMYSTIFIED”
Worth reading..!!! Keep up the good work..!!!
We have a 3-node RAC on a RAID 10 SAN, but unfortunately the configuration was done with ASM external redundancy. What do you think about this configuration? What is the possibility of recovery if the file is corrupt? As far as I know, since we have RAID 10, the file can be recovered if a disk fails. But my concern is: what happens if the particular file is corrupt?
Your suggestions/comments regarding my concerns will be highly appreciated.
You can create another diskgroup with normal/high redundancy with 3/5 disks and move voting disk to that diskgroup even now.
Yes Anju, I am aware of that, but I just need to know: if I leave it as it is, will it be a disaster? Is this a worst-case configuration for production RAC?
If you have oracle backup (RMAN, ocrconfig etc) of the corrupt file, you can recover corrupted file from its backup. In case you do not have oracle backup, then it can be recovered at RAID level .
A doubt has been on my mind for quite a long time: I have read that the maximum number of voting disks in Oracle can be 32.
If that is the case, then since there should be an odd number of voting disks, why is the maximum 32, an even number?
The maximum number of voting disks supported is 15.
If you are storing the voting disk on ASM, the number of voting disks is decided by the redundancy of the diskgroup.
Hence the maximum number of voting disks supported on ASM = 5, on a high redundancy diskgroup.
If the voting disk is on raw devices, you can have 15 voting disks on 15 disks.
Thanks for your reply. Going through various docs, I have read that Oracle can support up to 32 voting disks.
I totally agree with your suggestion below:
“If you are storing voting disk on ASM, no. of voting disks is decided by redundancy of diskgroup.”
I would be thankful if you could clarify why the maximum would be 32.
Please refer to the following link for the Oracle documentation.
It clearly mentions that the maximum number of voting disks supported is 15.
What would happen in the scenario of a 3-node RAC with 3 voting disks where the interconnect for each node goes down?
If each node can still see the voting disks but not each other, how is it decided which node(s) to evict?
What are the rules for the number of voting disks for RACs with more than 2 nodes?
The node that takes control of the DB control file first remains in the cluster; the others are evicted and a reboot takes place.
The number of voting disks should be odd and depends on the level of redundancy chosen.
good article, keep sharing nice notes with us. thanks
Thanks for your time.
Your comments and suggestions are always welcome!
Very good article!
one of the best article on VD .. clearing many basic doubts.. your articles are worth reading everytime.
Thanks for your time, Zeeshan.
Your comments and suggestions are always welcome.
Very good article. I have a doubt: recently, due to a SAN failover issue, all 3 voting disks were inaccessible and the nodes rebooted. We want to avoid this situation in future. Could you suggest how to manage our voting disks? We thought of some options:
1) Keep voting disks on different controllers (1 on NFS, 2 on controller 1, and 2 on controller 2)
2) Currently the voting disks are on NFS; if we move them to ASM, will that help?
3) Keep voting disks on separate NFS mount points
I have a question regarding failure of voting disk.
“Loss of more than half your voting disks will cause the entire cluster to fail!” This means that if we have 3 voting disks and one of them fails, the cluster stays up, since 2 voting disks remain active. Now, if each node is able to access only one voting disk, and there is no common disk where the heartbeat of both nodes can be checked, what will happen in this scenario?
If there are 3 voting disks in the cluster, each node should be able to access more than half i.e. 2 voting disks to be able to be part of the cluster. Hope this answers your question.
Thanks for your reply.
Consider the scenario where there are 3 voting disks, one of the disks becomes unavailable, and there are 2 nodes in the cluster.
My concern here is: if each node is able to access one voting disk, and that voting disk is not common between the nodes, what will the behavior be?
As I said earlier, a node can be part of the cluster only if it can access more than half the voting disks, i.e. 2 in this case. If each node is able to access only one voting disk, neither of them can join the cluster, i.e. the cluster won’t come up.
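The survival rule discussed in this thread can be sketched as a tiny check (illustrative only, not an Oracle tool): a node stays in the cluster only if the voting disks it can access form a strict majority of the total.

```shell
#!/bin/sh
# A node survives only if: accessible > total / 2.
# Multiplying by 2 (2 * accessible > total) avoids fractional arithmetic.
can_stay() {
  accessible=$1
  total=$2
  if [ $(( accessible * 2 )) -gt "$total" ]; then
    echo yes
  else
    echo no
  fi
}

echo "2 of 3 disks visible: $(can_stay 2 3)"
echo "1 of 3 disks visible: $(can_stay 1 3)"
```

With 3 voting disks, a node seeing 2 of them survives, while a node seeing only 1 is evicted, which matches the scenarios above.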
Hi Anju –
In a two-node RAC with a 3 voting disk configuration, it does not necessarily mean that there should be a common voting disk file accessible, right? If, for some reason due to a storage issue, node 1 cannot see voting disks 1 and 2, and node 2 cannot see voting disks 1 and 3, then what would the behaviour of the CRS stack be like? How does the eviction happen in this case?
Sorry for the late reply. It is necessary for a node to be able to access more than half the number of voting disks in order to remain in the cluster. So if you have 3 voting disks and each node is able to access at least 2 of them, node eviction will not happen.
Hope it answers your question.
You should seriously consider writing a book. This level of info and detail is difficult to find in all but the very dry Oracle documentation.
Cheers for your efforts.
Awesome knowledge sharing!
I have some doubts.
1. What is a heartbeat in RAC?
2. How does RAC take the information from the voting disk?
Thanks in advance
In a cluster, each node should know about the presence of every other node. For this, the CSSD process on every node pings the CSSD process of every other node in the cluster. This is called the network heartbeat. The information about every node and its status is stored in the voting disk. The CSSD process of every node updates the voting disk every second with the following info:
– the node’s active status
– the nodes in the cluster it is able to see over the network
This is called the disk heartbeat.
It is from the voting disk that the clusterware knows the status of every node in the cluster and also the communicability among the various nodes.
Hope it helps
My Question on Voting Disk example (odd number)
“For example, let’s have a two node cluster with an even number of let’s say 2 voting disks. Let Node1 is able to access voting disk1 and Node2 is able to access voting disk2 . This means that there is no common file where clusterware can check the heartbeat of both the nodes.”
You described a 2-node RAC with 2 voting disks. That means every node must be able to access both voting disks (because “more than half” of 2 means both, so there is a common file between them), not just one. But you said each node accesses only one voting disk.
I think your example is not correct for that question.
I have updated the post. Thanks for your feedback.
In a 2-node RAC we have 3 voting disks. If one voting disk is corrupted, what happens to my RAC cluster in this scenario? Is it running or not?
If one voting disk is corrupted and both the remaining voting disks are accessible by both the nodes, cluster will survive.
I have the answer for the odd number of vote disks.
Here I am talking about a 2-node RAC.
CASE-1: suppose the cluster has 2 vote disks. It means:
1. every node must access both vote disks.
2. if one vote disk is corrupted, the entire cluster will go down/crash.
CASE-2: suppose the cluster has 3 vote disks.
1. it means every node must access at least 2 vote disks.
2. if one vote disk is corrupted, it is not a problem for the cluster, because the remaining 2 vote disks still form a majority.
3. if two vote disks are corrupted, the entire cluster will crash.
CASE-3: suppose the cluster has 4 vote disks.
1. it means every node must access at least 3 vote disks.
2. if one vote disk is corrupted, no problem for the cluster.
3. if 2 vote disks are corrupted, the cluster will go down.
CONCLUSION: of the 3 cases, the second case (CASE-2) is better than the others.
** If this is wrong, please tell me the right answer ***
Thanks and Regards,
As you have analyzed, whether we have 3 vote disks or 4 vote disks, we can tolerate the failure of only one vote disk. Similarly, whether we have 5 vote disks or 6 vote disks, we can tolerate the failure of only 2 vote disks. Hence, it is better to have 3 or 5 vote disks if tolerance of the failure of one or two voting disks respectively is desired.
is there a way to see which node is voting on which voting disk ?
Each node votes on every voting disk every second.
Hi Anju, very good and informative article.
I have a query though. As part of maintenance, the servers (two-node RAC 18.104.22.168.0) were rebooted: the first node first, then the second one. Everything is fine after the reboot, except that the state is shown as OFFLINE for the OCR/VOTEDISK diskgroup on the first node in “crsctl stat res -t”.
What went wrong, and how can it be made ONLINE again?
Thanks & Regards,
– Check if the shared storage is accessible from the first node.
– Mount the diskgroup from SQL or using crsctl start res ….
Thanks, it is resolved by simply restarting the disk group manually using srvctl.
But it seems to be a bug…
Do the 3 voting disks get info from the nodes at the same time, or is it asynchronous? Do all voting disks contain similar info? Suppose I have 4 nodes and 3 voting disks: which node will access which voting disk?
Each node accesses each voting disk every second, for read (to check the accessibility of the disk by other nodes) and write (to mark its presence). All the voting disks will contain similar info unless one of them cannot be accessed by some of the nodes.
Hope it helps.
Thank you so much mam… Very clear now..
All my doubts are gone now….Thanks man
Appreciate your work
It’s awesome, man!!! Very good explanation, keep it up.