Category Archives: 11g R2 RAC

RECOVER VOTING DISK – SCENARIO-I

In this post, I will demonstrate how to recover the voting disk in case we lose its only copy. The voting disk is automatically recovered using the latest available backup of the OCR.
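Since the recovery relies on an OCR backup, it is worth confirming beforehand that automatic OCR backups exist. A quick check (run as root; the backup listing will vary on your system):

[root@host01 ~]# ocrconfig -showbackup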

 

Current scenario:
The only copy of the voting disk is present in the test diskgroup, on disk ASMDISK010.
We will corrupt ASMDISK010 so that we lose the only copy of the voting disk.
We will then restore the voting disk to another diskgroup using the OCR.

 

Let’s start …
– Currently, we have 1 voting disk. Let us corrupt it and check whether clusterware continues to run.
– FIND OUT LOCATION OF VOTEDISK
[grid@host01 cssd]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00ce3c95c6534f44bfffa645a3430bc3 (ORCL:ASMDISK010) [TEST]

 

– FIND OUT THE NO. OF DISKS IN test DG (CONTAINING VOTEDISK)
ASMCMD> lsdsk -G test
Path
ORCL:ASMDISK010

 

– Let us corrupt ASMDISK010

-- bs = block size = 4096

-- count = no. of blocks overwritten = 1000000 (~1M)

-- total no. of bytes corrupted = 4096 * 1000000 (~4096M = size of one partition)
#dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK010 bs=4096 count=1000000
Here, I was expecting clusterware to stop, as the only voting disk was no longer available, but surprisingly clusterware kept running. I even waited for quite some time, but to no avail. I would be glad if someone can give more input on this.
Finally, I stopped clusterware and tried to restart it. It would not restart.

– Reboot all the nodes and note that clusterware does not start, as the voting disk is not accessible.

#crsctl stat res -t
– Since the voting disk cannot be restored to the test diskgroup (its only disk has been corrupted),
   we will create another diskgroup, votedg, where we will restore the voting disk.

 

RECOVER VOTING DISK
– To move voting disk to votedg diskgroup, ASM instance should be up and for ASM
   instance to be up, CRS should be up. Hence we will
     – stop crs on all the nodes
     – start crs in exclusive mode on one of the nodes (host01)
     – start asm instance on host01 using pfile (since spfile of ASM instance is on ASM)
     – create a new diskgroup votedg
     – move voting disk to votedg  diskgroup
     – stop crs on host01(was running in exclusive mode)
     – restart crs on host01
     – start crs on rest of the nodes
     – start cluster on all the nodes

 

– IMPLEMENTATION –
    - stop crs on all the nodes(if it does not stop, kill ohasd process and retry)
root@hostn# crsctl stop crs -f
     – start crs in exclusive mode on one of the nodes (host01)
root@host01# crsctl start crs -excl
     – start asm instance on host01 using pfile 
grid@host01$ echo INSTANCE_TYPE=ASM >> /u01/app/oracle/init+ASM1.ora 
             chown grid:oinstall /u01/app/oracle/init+ASM1.ora 
 SQL>startup pfile='/u01/app/oracle/init+ASM1.ora';
- create a new diskgroup votedg
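A minimal sketch of this step from the ASM instance started above, assuming ASMDISK013 is a spare candidate disk (the disk name is hypothetical; compatible.asm must be at least 11.2 for the diskgroup to hold voting files):

SQL> create diskgroup votedg external redundancy
       disk 'ORCL:ASMDISK013'
       attribute 'compatible.asm' = '11.2.0.0';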

– move voting disk to the votedg diskgroup – the voting disk is automatically recovered using the latest available backup of OCR.
root@host01#crsctl replace votedisk +votedg
     – stop crs on host01(was running in exclusive mode)
root@host01#crsctl stop crs
     – restart crs on host01
root@host01#crsctl start crs
     – start crs on rest of the nodes (if it does not start, kill ohasd process and retry)
root@host02#crsctl start crs 
root@host03#crsctl start crs
     – start cluster on all the nodes and check that it is running
root@host01#crsctl start cluster -all 
            crsctl stat res -t

I hope this post was useful.
Regards

————————————————————————————————

                                                 ——————–

NEED FOR VIP IN RAC

In this post, I will demonstrate why we need VIPs in RAC for failover. If a client's connection request goes to the physical IP of a node and the node is down, the client is notified only after a network timeout (which could be as long as 10 minutes). Only after that does the client try the alternative address. If, instead, the request goes to the virtual IP (VIP) of the node and the node is down, the VIP automatically fails over to one of the surviving nodes without waiting for the network timeout. The listener on the new node listens only to the native VIP on that node and not to the relocated VIP, so any connection request to the relocated VIP receives an ORA-12541 (no listener) error immediately. On receiving the error, the client tries the next address to connect to the database. Hence, VIPs are needed in RAC for quick failover of client connection requests.
Let’s demonstrate :
Current Scenario:
Two node setup
Nodes : host01, host02
RAC database : orcl

Overview :
— Find out VIPs of both the nodes
— Add entries in tnsnames.ora for two aliases : SIP and SVIP
SIP sends requests to the physical IPs of the nodes
SVIP sends requests to the virtual IPs (VIPs) of the nodes
— Shutdown host02

— Try to connect using SIP – waits for network timeout before connecting to host01

– Try to connect using SVIP –  connects to host01 immediately without waiting  for network timeout

Implementation:

- Find out VIPs of both the nodes
[grid@host01 ~]$ srvctl config vip -n host01

VIP exists.:host01
VIP exists.: /192.9.201.254/192.9.201.254/255.255.255.0/eth0

[grid@host01 ~]$ srvctl config vip -n host02

VIP exists.:host02
VIP exists.: /192.9.201.250/192.9.201.250/255.255.255.0/eth0

– Add entries in tnsnames.ora for

– SVIP : An alias for the orcl service using which the client first tries to connect to host02-vip (192.9.201.250).
  If host02 is down, host02-vip fails over to the other host (host01); on receiving the no-listener error,
  the client immediately tries the next address and connects via host01-vip.

– SIP : An alias for the orcl service using which the client first tries to connect to host02 (physical IP).
  If host02 is down, the client is notified only after a network timeout.
  The client then tries and gets connected to host01 (physical IP).

[grid@host01 admin]$ vi /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/tnsnames.ora

SVIP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.9.201.250)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.9.201.254)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = orcl)
    )
  )

SIP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = host02)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = orcl)
    )
  )

– Shutdown node host02

[root@host02]#init 0
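Optionally, before testing the two aliases, you can confirm from host01 that host02's VIP has relocated to a surviving node (this check is an addition to the original demo; the exact output format varies by version):

[grid@host01 ~]$ srvctl status vip -n host02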

- Try to connect to the database using the alias SIP
— Note that as host02 is down, the client waits for a network timeout before trying the connection to host01; the time taken to connect to host01 is 25 seconds (23:42:57 – 23:42:32)

SQL> set time on
 23:42:32 SQL>  conn system/sys@sip
 Connected.
 23:42:57 SQL>

– Try to connect to the database using the alias SVIP
— Note that as host02 is down, the host02 VIP has already failed over to host01 and the time taken to connect is of the order of milliseconds (23:43:38 – 23:43:38)

23:43:38 SQL> conn system/sys@svip
 Connected.
 23:43:38 SQL>

I hope this demonstration was useful.
Your comments and suggestions are always welcome !!!

References:

Oracle RAC VIP and ARP Primer

—————————————————————————————

 

11g R2 RAC : SERVICES

 

In this post, I will explain the concept of services and their benefits in 11g R2 RAC.

What is a service?

A service is a logical grouping of sessions performing a similar kind of work. Services enable you to group database workloads and route work to the optimal instances that are assigned to offer the service. It is recommended that all the users who connect using a service have the same service-level requirements. We can use services to manage workloads or groups of applications. For example, all online users can use one service, batch users can use another service, and reporting can use yet another service.
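As a hedged illustration, services like these could be created on a RAC database with srvctl (the database name orcl and the instance names orcl1/orcl2/orcl3 below are assumptions for the example):

[oracle@host01 ~]$ srvctl add service -d orcl -s oltp -r orcl1,orcl2 -a orcl3
[oracle@host01 ~]$ srvctl add service -d orcl -s batch -r orcl3 -a orcl1
[oracle@host01 ~]$ srvctl start service -d orcl -s oltp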

Benefits of services:

– Services make the work more manageable, measurable, tunable and recoverable.

– Services provide an abstraction layer and permit clients and middle tiers to access required data from the database independent of where the instances reside.

– Dynamic: As the load increases/reduces, the number of instances supporting a service can be increased/decreased accordingly.

- High availability : In case an instance crashes, the services supported by it fail over to another surviving instance.

- Tuning and Monitoring Aid

  . Tracing can be enabled for all the sessions that connect using a service. This can be used to identify problem SQL statements issued by the users connected via that service.

  . Performance statistics are collected and aggregated by service. This can be used to find out the performance of all the sessions belonging to a service.

  . Two performance thresholds can be specified for a service, SERVICE_ELAPSED_TIME and SERVICE_CPU_TIME. The AWR monitors the service time and CPU time and publishes alerts when the performance exceeds the thresholds. In response to these alerts, you can change the priority of a job, stop an overloaded process, or relocate, expand, shrink, start or stop a service.
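For example, the elapsed-time threshold could be set with DBMS_SERVER_ALERT; a minimal sketch, assuming a service named OLTP and arbitrarily chosen values (the values are in microseconds per user call):

SQL> begin
       dbms_server_alert.set_threshold(
         metrics_id              => dbms_server_alert.elapsed_time_per_call,
         warning_operator        => dbms_server_alert.operator_ge,
         warning_value           => '500000',
         critical_operator       => dbms_server_alert.operator_ge,
         critical_value          => '750000',
         observation_period      => 30,
         consecutive_occurrences => 5,
         instance_name           => NULL,
         object_type             => dbms_server_alert.object_type_service,
         object_name             => 'OLTP');
     end;
     /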

- Resource Management Aid : We can bind resource consumer groups to services. The users who connect using a service will get resources as per the allocation specified for the respective consumer group in the active resource plan. For example, suppose OLTP and BATCH services are defined to run on the same instance and our requirement is as follows:

. During day time OLTP users should get more resources whereas BATCH users should get less resources

  . During night, BATCH users should get more resources whereas OLTP users should get less resources

We can have two consumer groups (see the sketch after this list):

  . OLTP_GRP mapped to the OLTP service
  . BATCH_GRP mapped to the BATCH service

so that all users who connect using the OLTP service are assigned to OLTP_GRP and all users who connect using the BATCH service are assigned to BATCH_GRP.

We can have two resource plans :

  . DAY_PLAN in which OLTP_GRP is given more resources and BATCH_GRP is given less resources.
  . NIGHT_PLAN in which BATCH_GRP is given more resources and OLTP_GRP is given less resources.
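A minimal sketch of binding those consumer groups to the services via the service-name mapping attribute (creation of the groups and plans themselves is omitted; names as above):

SQL> begin
       dbms_resource_manager.create_pending_area();
       dbms_resource_manager.set_consumer_group_mapping(
         attribute      => dbms_resource_manager.service_name,
         value          => 'OLTP',
         consumer_group => 'OLTP_GRP');
       dbms_resource_manager.set_consumer_group_mapping(
         attribute      => dbms_resource_manager.service_name,
         value          => 'BATCH',
         consumer_group => 'BATCH_GRP');
       dbms_resource_manager.submit_pending_area();
     end;
     /

The active plan can then be switched as required, e.g. SQL> alter system set resource_manager_plan = 'DAY_PLAN';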
 
- Job Control Aid : We can associate jobs with a job class, and a job class can be mapped to a service (a sketch follows the diagram below).

– If the service is mapped to a consumer group, the jobs will get resources as per the allocation specified for the respective consumer group in the active resource plan. If the jobs of a job class are taking more time than desired, we can change the consumer-group mapping of the respective service to one with more resources.

– If the instance running the jobs of a job class crashes, the services on the crashed instance fail over to another instance, and all the jobs of the job class mapped to that service fail over along with them.

i.e.
       Job1 ----+
       Job2 ----+------ Jobclass1 ------ Service1
       Job3 ----+                            |
                                      Consumer group
                                             |
                                      Resource Plan
– Using Transparent Application Failover : When a node crashes, all the services running on the instance hosted by that node fail over to another instance. From then on, users who connect using the failed-over services connect to the instances supporting those services. But what happens to the sessions that were already connected using that service at the time of the crash? That is decided by the TAF (Transparent Application Failover) policy defined for the service. There are 3 types of TAF policies : BASIC, PRECONNECT and NONE.

 
BASIC failover. In this approach, the application connects to a backup node only after the primary connection fails. This approach has low overhead, but the end user experiences a delay while the new connection is created.

PRECONNECT failover. In this approach, the application simultaneously connects to both a primary and a backup node. This offers faster failover, because a pre-spawned connection is ready to use. But the extra connection adds everyday overhead by duplicating connections.

  We can also specify FAILOVER_TYPE for a service, i.e. what kind of activity in progress in the connected sessions will be resumed after the failover. It can be SELECT or SESSION.

SELECT failover. With SELECT failover, Oracle Net keeps track of all SELECT statements issued during the transaction, tracking how many rows have been fetched back to the client for each cursor associated with a SELECT statement. If the connection to the instance is lost, Oracle Net establishes a connection to another Oracle RAC node and re-executes the SELECT statements, repositioning the cursors so the client can continue fetching rows as if nothing has happened. The SELECT failover approach is best for data warehouse systems that perform complex and time-consuming transactions.

 SESSION failover. When the connection to an instance is lost, SESSION failover results only in the establishment of a new connection to another Oracle RAC node; any work in progress is lost. SESSION failover is ideal for online transaction processing (OLTP) systems, where transactions are small.
 
The following types of transactions do not automatically fail over and must be restarted by TAF:
  • Transactional statements. Transactions involving INSERT, UPDATE, or DELETE statements are not supported by TAF.
  •  ALTER SESSION statements. ALTER SESSION and SQL*Plus SET statements do not fail over.
  • Transactions using temporary segments in the TEMP tablespace and global temporary tables do not fail over.
  • PL/SQL package states. PL/SQL package states are lost during failover.
– Connection Load Balancing : If we have multiple instances supporting a service, we can spread user connections across all those instances. For each service, you can define the method that you want the listener to use for load balancing by setting the connection load balancing goal, CLB_GOAL. It can have two values : CLB_GOAL_LONG (default) and CLB_GOAL_SHORT.
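As a hedged example, the TAF policy, failover type and connection load balancing goal could all be set on the hypothetical oltp service from the earlier sketch (flags as in 11.2 srvctl; the retry/delay values are arbitrary):

[oracle@host01 ~]$ srvctl modify service -d orcl -s oltp -P BASIC -e SELECT -m BASIC -z 180 -w 5 -j SHORT
[oracle@host01 ~]$ srvctl config service -d orcl -s oltp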
 
Where is the configuration of services stored?

- Data dictionary (bulk of service configuration)
  . DBA_SERVICES
  . DBA_RSRC_GROUP_MAPPINGS
  . DBA_THRESHOLDS
  . DBA_SCHEDULER_JOB_CLASSES
- TNSNAMES.ORA
  . Server-side : entries for LOCAL_LISTENER and REMOTE_LISTENER
  . Client-side (in tnsnames.ora) : CONNECT_DATA and special entries for PRECONNECT services
- Resource profile in the Oracle Cluster Registry (OCR)
  . Resource : start/stop script, dependencies, restart policy, stringpairs
  . Instance list : preferred/available instances, enabled/disabled
  . TAF policy
- Initialization parameters
  . LOCAL_LISTENER
  . REMOTE_LISTENERS
  . DB_DOMAIN
  . DISPATCHERS
  . STATISTICS_LEVEL
—————————————————————————————————–
Related links:

Home
11G R2 RAC Index
Instance Recovery In RAC
Job Failover In RAC
Service Created Using EM Does Not Update DD or OCR

LET’S EDIT GPNP PROFILE

In 11g R2 RAC, the voting disk is stored on an ASM diskgroup, but CSSD needs the voting files before ASM is online. At its startup, CSSD scans the device headers of all devices specified in the GPnP profile XML file, tag "DiscoveryString", which contains the same value as the asm_diskstring parameter of the ASM instance.
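To see exactly what CSSD will read, the locally cached profile can be dumped with gpnptool (a quick check; the XML output is not reproduced here):

[grid@host01 ~]$ gpnptool get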

 

If CSSD can find more than half of the total number of voting files, the cluster can form; otherwise, CSSD keeps retrying, logging appropriate error messages to $GRID_HOME/log/<hostname>/cssd/ocssd.log on each loop.

 

To verify this, I modified ASM_DISKSTRING to a value that excluded the ASM disk containing the voting disk and tried to restart CRS on a node. Looking at the CSSD logfile, I saw that CSSD had trouble identifying its voting files.

 

Now I had to change the ASM_DISKSTRING parameter back without ASM running, and without CSSD, which is necessary to start ASM. How do we tell CSSD, which runs well before ASM, to scan the right devices? I used gpnptool to edit the GPnP profile and restored the discovery string to the appropriate value. After that, I could get CRS running on my machine.

 

Overview:
– Move the voting disk to a new diskgroup VOTE.
– Set ASM_DISKSTRING to only the disks which are members of the other diskgroups (DATA/FRA) using SQL.
– Stop and restart CRS on the node.
– Check that HAS has started but the rest of the services are not up.
– Check ocssd.log — CSSD scans all the disks which are part of the discovery string but does not find the voting disk.
– Edit the GPnP profile to modify the discovery string for ASM.
– Try to restart CRS.
– Check that all the daemons have started and cluster services are up.

 

Implementation:
– create a diskgroup with external redundancy to be used to store the voting disk
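A minimal sketch of this step, assuming ASMDISK014 is the free disk used (matching the votedisk listing further down):

SQL> create diskgroup vote external redundancy disk 'ORCL:ASMDISK014';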

– Move the voting disk to the diskgroup vote – fails because
    — the diskgroup is not mounted on all the nodes – mount it
    — the diskgroup compatibility is < 11.2.0.0 – modify it
[root@host01 ~]# crsctl replace votedisk +vote
Failed to create voting files on disk group vote.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
  – Mount the vote diskgroup on all the nodes
  — Modify the diskgroup compatibility to 11.2.0.0
  — Move the voting disk to the diskgroup vote – succeeds
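A sketch of the two fixes, to be run before retrying the replace (the mount is needed on every node's ASM instance; the attribute change is needed once):

SQL> alter diskgroup vote mount;
SQL> alter diskgroup vote set attribute 'compatible.asm' = '11.2.0.0';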
 
[root@host01 ~]#  crsctl replace votedisk +vote
Successful addition of voting disk 443f1c60e16f4fa5bfbfeaae6b2f919d.
Successful deletion of voting disk 3d9da0d16baa4f10bf4e4b9b4aa688c6.
Successful deletion of voting disk 369621097c034f6dbf29a8cc97dc4bbc.
Successful deletion of voting disk 407b5d6588ea4f6fbf1503f7f6cc2951.
Successfully replaced voting disk group with +vote.
CRS-4266: Voting file(s) successfully replaced
– check that voting disk has been moved to vote diskgroup
[root@host01 ~]#  crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   443f1c60e16f4fa5bfbfeaae6b2f919d (ORCL:ASMDISK014) [VOTE]
Located 1 voting disk(s).
– Find out the disks in the other diskgroups – DATA and FRA
[grid@host01 ~]$ asmcmd lsdsk -G data
Path
ORCL:ASMDISK01
ORCL:ASMDISK010
ORCL:ASMDISK011
ORCL:ASMDISK012
[grid@host01 ~]$ asmcmd lsdsk -G fra
Path
ORCL:ASMDISK02
ORCL:ASMDISK03
– Set ASM_DISKSTRING to only the disks which are members of the DATA/FRA diskgroups

 SQL> sho parameter asm_diskstring
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------
asm_diskstring                       string
SQL> alter system set asm_diskstring='ORCL:ASMDISK01',
   'ORCL:ASMDISK010','ORCL:ASMDISK011','ORCL:ASMDISK012',
   'ORCL:ASMDISK02','ORCL:ASMDISK03';
– check that discovery string in gpnp profile  points to the disks as specified 
 
[root@host01 ~]# cd /u01/app/11.2.0/grid/gpnp/host01/profiles/peer/ 
                         vi profile.xml
DiscoveryString="ORCL:ASMDISK01,ORCL:ASMDISK010,ORCL:ASMDISK011,ORCL:ASMDISK012,ORCL:ASMDISK02,ORCL:ASMDISK03"

 

— stop and restart crs on the node –
[root@host01 ~]# crsctl stop crs 
  [root@host01 ~]# crsctl start  crs
– check that HAS (Oracle High Availability Services) has started but the rest of the services are not up
[root@host01 ~]# crsctl check has
CRS-4638: Oracle High Availability Services is online
[root@host01 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@host01 ~]# crsctl check css
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
[root@host01 ~]# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
– Check the ocssd.log
– CSSD scans all the disks which are part of the discovery string but does not find the voting disk

 [root@host01 ~]# vi $ORACLE_HOME/log/host01/cssd/ocssd.log
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK01:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d0410 for disk :ORCL:ASMDISK01:
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK010:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d0c10 for disk :ORCL:ASMDISK010:
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK011:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d1508 for disk :ORCL:ASMDISK011:
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK012:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d1e00 for disk :ORCL:ASMDISK012:
2013-01-21 12:55:14.760: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK02:0:0
2013-01-21 12:55:14.760: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d26f8 for disk :ORCL:ASMDISK02:
2013-01-21 12:55:14.760: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK03:0:0
2013-01-21 12:55:14.760: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d2ff0 for disk :ORCL:ASMDISK03:
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
.
.
.
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmvDiskVerify: Successful discovery of 0 disks
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmvFindInitialConfigs: No voting files found
– Copy the existing gpnp profile to profile.bak  and edit the copy to modify discovery string  for asm
[root@host01 ~]# cd /u01/app/11.2.0/grid/gpnp/host01/profiles/peer/
[root@host01 peer]# cp profile.xml profile.bak
– remove the oracle signature from the file –
[root@host01 peer]# gpnptool unsign -p=profile.bak
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/app/11.2.0/grid/bin/gpnptool.bin unsign -p=profile.bak -o-

 

– change the DiscoveryString itself –
[root@host01 peer]# gpnptool edit -asm_dis='ORCL:*' -p=profile.bak -o=profile.bak -ovr
Resulting profile written to “profile.bak”.
Success.
–  sign the profile xml file with the wallet (Notice: the path is only the directory to the wallet, NOT the wallet file itself) –
[root@host01 peer]# gpnptool sign -p=profile.bak -w=file:/u01/app/11.2.0/grid/gpnp/host01/wallets/peer/ -o=profile.new
Resulting profile written to “profile.new”.
Success.
– replace the original profile.xml with the newly signed profile
[root@host01 peer]# mv profile.new profile.xml
– check that discovery string has been modified
[root@host01 peer]# vi profile.xml
DiscoveryString="ORCL:*"
– Try to restart crs 
[root@host01 peer]# ps -ef |grep d.bin
kill all the processes
[root@host01 peer]# crsctl start crs
– check that all the daemons have started and cluster services are up
[root@host01 peer]# ps -ef |grep d.bin
[root@host01 peer]#crsctl stat res -t
Conclusion:
To access the voting disk, the GPnP profile is read to find the voting disk location – here, on ASM.
The ASM disks matching the discovery string in the GPnP profile are then scanned to search for the voting disk.
– Note : the GPnP profile on the other nodes also contains the erroneous discovery string.
  Hence copy the corrected profile from the current node to the other nodes as well.
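A hedged sketch of propagating the corrected profile, assuming the same GRID_HOME on every node (note that the host name in the directory path changes per node; repeat for each remaining node):

[root@host01 peer]# scp profile.xml host02:/u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml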
————————————————————————————————–

11g R2 RAC : REPAIR ORACLE CLUSTER REGISTRY (OCR)

In this post, I will demonstrate how we can use the ocrconfig -repair command to repair the OCR configuration on a node which was down when the configuration was modified on the other nodes.

Current scenario:
3 node cluster
Nodes: host01, host02, host03
Nodes host02 and host03 are up
Node host01 is down
OCR is stored on ASM diskgroup DATA

Overview:
- Additionally store the OCR on the FRA diskgroup
- This information is modified in /etc/oracle/ocr.loc on nodes host02 and host03, which are up
- This information is not modified in /etc/oracle/ocr.loc on node host01, which is down
– Startup Node host01
– Clusterware does not come up on host01
– Check alert log and crsd log on host01
– Repair OCR configuration on host01 so that /etc/oracle/ocr.loc on host01 gets updated
– Start clusterware on host01 – succeeds

Implementation:
 
- Additionally store the OCR on the FRA diskgroup
[root@host02 ~]# ocrconfig -add +FRA
- Check that new OCR location is added  in /etc/oracle/ocr.loc on nodes host02 and host03 which are up
[root@host02 ~]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA
[root@host03 ~]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA

- Check that new OCR location is not added  in /etc/oracle/ocr.loc on node host01 which was down
[root@host01 ~]# cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
– Bring up host01 . 
– Check that clusterware has not come up there
[root@host01 host01]# crsctl stat res -t
- Check the alert log of host01
[root@host01 host01]# tailf /u01/app/11.2.0/grid/log/host01/alerthost01.log
[ohasd(4914)]CRS-2765:Resource 'ora.crsd' has failed on server 'host01'.
2013-01-18 23:35:01.950

- Check the crsd log of host01 – indicates that the local and master OCR configuration information does not match

 [root@host01 crsd]# vi /u01/app/11.2.0/grid/log/host01/crsd/crsd.log
[  OCRMAS][2876611472]th_calc_av:5′: Rturn persisted AV [186646784] [11.2.0.1.0]
2013-01-18 23:35:13.931: [  OCRSRV][2876611472]th_not_master_change: Master change callback not registered
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:91: Comparing device hash ids between local and master failed
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:91 Local dev (1862408427, 1028247821, 0, 0, 0)
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:91 Master dev (1862408427, 1897369836, 0, 0, 0)
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:9: Shutdown CacheLocal. my hash ids don’t match

– Repair OCR configuration on host01 
[root@host01 crsd]# ocrconfig -repair -add +FRA
- Check that new OCR location is added  in /etc/oracle/ocr.loc on node host01 
[root@host01 crsd]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA

– Shutdown and restart cluster on host01
[root@host01 crsd]# crsctl stop crs -f
[root@host01 crsd]# crsctl start crs
[root@host01 crsd]# crsctl start cluster
– Check that crsd has started on host01
[root@host01 host01]# tailf /u01/app/11.2.0/grid/log/host01/alerthost01.log
[crsd(7297)]CRS-1012:The OCR service started on node host01.
2013-01-18 23:46:07.609
[crsd(7297)]CRS-1201:CRSD started on node host01.
[root@host02 ~]# crsctl stat res -t
——————————————————————————————-

11g R2 RAC: LET’S LOSE GPNP PROFILE

In an earlier post, GPNP PROFILE DEMYSTIFIED, I explained what the GPnP profile is, what it contains and how it is used by clusterware.
In this post, I will demonstrate what happens if we lose GPnP profile.

Current scenario:
2 Node cluster
Cluster name : cluster01.example.com
Host names : host01, host02

Overview:
- Rename the GPnP profile on a node (host02)
- Stop and restart crs
- The file pending.xml is created in GRID_HOME/gpnp/<hostname>/profiles/peer
- Change the ASM_DISKSTRING parameter, which causes the GPnP profile to be updated
- As soon as the parameter is changed, pending.xml is renamed to profile.xml and the DiscoveryString for ASM is updated in it.

 Let’s start …
– Rename GPnP profile on host02

   [root@host02 cdata]# mv /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.bak
- Stop and restart crs on host02
     [root@host02 cdata]#crsctl stop crs
     [root@host02 cdata]#crsctl start crs
-- Check the alert log simultaneously and note that just prior to the start of the CSSD daemon, the GPnP daemon on host02 copies the GPnP profile from host01, and the file pending.xml is created in GRID_HOME/gpnp/<hostname>/profiles/peer

[ohasd(24187)]CRS-2772:Server ‘host02′ has been assigned to pool ‘Free’.
2013-01-17 01:35:14.635
[cssd(24416)]CRS-1713:CSSD daemon is started in clustered mode
 [root@host02 cdata]# ls /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/pending*
/u01/app/11.2.0/grid/gpnp/host02/profiles/peer/pending.xml

—- Open the file pending.xml and note that DiscoveryString for ASM is “”
DiscoveryString="" SPFile="+DATA/cluster01/asmparameterfile/registry.253.804837235"/>

– Change ASM_DISKSTRING parameter which causes GPnP profile to be updated
   SQL>alter system set ASM_DISKSTRING='ORCL:*';
- As soon as the parameter is changed, pending.xml is renamed to profile.xml and Discoverystring for ASM is updated in it.
[root@host02 cdata]# ls /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml
/u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml

 —- Open the file profile.xml and note that DiscoveryString for ASM has been updated
DiscoveryString="ORCL:*" SPFile="+DATA/cluster01/asmparameterfile/registry.253.804837235"/>

Conclusion:
Whenever the GPnP profile is not present, it is automatically copied from an existing node by GPnPd as pending.xml. When the profile is next updated, pending.xml is automatically renamed to profile.xml.
———————————————————————————-


11g R2 RAC : INSTANCE FAILOVER IN SERVER POOL

   In my earlier post, I explained the concept of server pools, which were introduced in 11g R2 RAC. In this post, I will demonstrate how instance failover takes place in a policy-managed database in 11g R2 RAC.

 

Prior to 11g R2 RAC, the various instances of a database were mapped to specific nodes in the cluster. Imagine a cluster having 4 nodes: host01, host02, host03 and host04. I have a database DB configured to run on 3 nodes in the cluster, i.e.

 

Instance DB1 runs on host01
Instance DB2 runs on host02
Instance DB3 runs on host03

 

Let us say I have a service DB_SERV defined for the database with
Preferred instances : DB1, DB2
Available instances : DB3

 

As the clusterware starts up , the status is as follows:
Instance DB1 runs on host01
Instance DB2 runs on host02
Instance DB3 runs on host03

 

Service DB_SERV is supported by host01 and host02, i.e. if a user connects to the database using service db_serv, he will be connected to either host01 or host02.

 

        DB1            DB2            DB3
      DB_SERV        DB_SERV           |
         |              |              |
         V              V              V
       host01         host02         host03         host04

 

Now, if host02 crashes,
i.e.
        DB1            DB2            DB3
      DB_SERV        DB_SERV           |
         |              |              |
         V              V              V
       host01         host02         host03         host04
                         X
- Instance DB2 crashes.
- Service DB_SERV fails over from the crashed instance (DB2) to the available instance, DB3, i.e. now if a user connects to the database using service db_serv, he will be connected to either host01 or host03.
        DB1                           DB3
      DB_SERV                       DB_SERV
         |                              |
         V                              V
       host01                        host03         host04
This implies that although we had an idle server (host04) available, instance DB2 still went down instead of failing over to host04, because instance DB2 is configured to run on host02 only.
ENTER SERVER POOLS:
The above problem has been resolved in 11g R2 RAC with the introduction of server pools. Now the database is configured to run in a server pool. The cardinality of the database is decided by the MIN_SIZE attribute of the server pool. The database runs on the servers which are assigned to the server pool. If a node hosting the database crashes, an idle node (if available) is moved from the Free pool to the server pool, and the instance running on the crashed node fails over to that node. All the services supported by the instance fail over along with the instance.
Let us see how.
Imagine a cluster having 3 nodes host01, host02 and host03
                         HOST01         HOST02            HOST03
– create a server pool pmpool with min = 2, max = 2, importance = 1
[grid@host01 ~]$srvctl add srvpool -g pmpool -l 2 -u 2 -i 1
– check that 2 servers (host02, host03) have been allocated to this server pool
[grid@host01 ~]$crsctl  status  serverpool
                                   PMPOOL
                     +---------------------------------+
                     |                                 |
       HOST01        |      HOST02          HOST03     |
                     +---------------------------------+
– create a policy managed database pmdb which will run in the serverpool pmpool
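A hedged sketch of creating such a policy-managed database silently with dbca (the template name, passwords and storage options below are placeholders; adjust them to your environment):

[oracle@host01 ~]$ dbca -silent -createDatabase -templateName General_Purpose.dbc \
      -gdbName pmdb -policyManaged -serverPoolName pmpool \
      -storageType ASM -diskGroupName DATA \
      -sysPassword oracle -systemPassword oracle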
– check that two servers (host02 and host03) have been allocated to serverpool pmpool
[root@host01 ~]#crsctl status serverpool
– check that two instances of pmdb are running on the 2 servers (host02, host03) in pmpool
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is running on node host03
                                   PMPOOL
                     +---------------------------------+
                     |      PMDB_1          PMDB_2     |
       HOST01        |      HOST02          HOST03     |
                     +---------------------------------+
REBOOT HOST03, WHERE INSTANCE PMDB_2 IS RUNNING
                                   PMPOOL
                     +---------------------------------+
                     |      PMDB_1          PMDB_2     |
       HOST01        |      HOST02          HOST03     |
                     |                         X       |
                     +---------------------------------+
– Check the status of the database pmdb repeatedly –
– Note that instance pmdb_2 fails over to host01 –
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is running on node host03
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is not running on node host03
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is running on node host01
                                   PMPOOL
                     +---------------------------------+
                     |      PMDB_1          PMDB_2     |
       HOST03        |      HOST02          HOST01     |
                     +---------------------------------+
– check that host01 has been moved to pmpool
                                   PMPOOL
                     +---------------------------------+
                     |                                 |
       HOST03        |      HOST02          HOST01     |
            X        +---------------------------------+
[root@host02 ~]# crsctl status serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ora.pmpool
ACTIVE_SERVERS=host01 host02
——————————————————————————————————–