Monthly Archives: November 2012

11g R2 RAC : AUTOLOCATE BACKUPS

During a restore operation, RMAN automatically performs the autolocation of backups. A channel connected to a specific node only attempts to restore files that were backed up to the node. For example, assume that example tablespace is backed up to the drive attached to node1, while tablespace users is backed up to the drive attached to node2. If you then allocate channels that connect to each node, then the channel connected to node1 can restore example tablespace (but not users), and the channel connected to node2 can restore users tablespace (but not example).
Let’s demonstrate in the two node RAC setup I have:
Overview :
– Create folder for backups on both the nodes with appropriate permissions.
– Take backup of example and users tablespaces on node1 and node2 by allocating channels explicitly
– Take users and example tablespaces offline in immediate mode so that they will need media recovery to come online.
– Restore both tablespaces from backups on node1 and node2 by allocating channels explicitly to corresponding instances
– Note that RMAN automatically finds out that backup for example is stored on node1 and backup for users is stored on node2
   and restores them from respective nodes.
Implementation:
– Create folder for backups on both the nodes with appropriate permissions.
[oracle]$mkdir /u01/app/oracle/oradata/orcl/backup
-- Take backup of example and users tablespaces on node1 and node2 by allocating channels explicitly
– Note that backup of tablespace example is created on node1 using channel c1
   and backup of tablespace users is created on node1 using channel c2
RMAN>  run {
             allocate channel c1 device type disk 
              format ‘/u01/app/oracle/oradata/orcl/backup/orcl%U.bak’ connect=’sys/oracle@orcl1′;
    allocate channel c2 device type disk 
              format ‘/u01/app/oracle/oradata/orcl/backup/orcl%U.bak’ connect=’sys/oracle@orcl2′;
    backup tablespace users, example;
  }
allocated channel: c1
channel c1: SID=57 instance=orcl1 device type=DISK
allocated channel: c2
channel c2: SID=46 instance=orcl2 device type=DISK
channel c1: specifying datafile(s) in backup set
input datafile file number=00005 name=+DATA/orcl/datafile/example.264.799999785
channel c2: specifying datafile(s) in backup set
input datafile file number=00004 name=+DATA/orcl/datafile/users.259.799999587
channel c1: finished piece 1 at 30-NOV-12
piece handle=/u01/app/oracle/oradata/orcl/backup/orcl0snrivk1_1_1.bak tag=TAG20121130T044136
channel c2: finished piece 1 at 30-NOV-12
piece handle=/u01/app/oracle/oradata/orcl0tnrivk1_1_1.bak tag=TAG20121130T044136
-- Take users and example tablespaces offline in immediate mode so that they will need media recovery to come online.
SQL>alter tablespace users offline immediate;
    alter tablespace example offline immediate;
    alter system switch logfile;
    alter system switch logfile;
    alter system switch logfile;
    alter tablespace users online;
    alter tablespace example online;
ERROR at line 1:
ORA-01113: file 4 needs media recovery
ORA-01110: data file 4: ‘+DATA/orcl/datafile/users.259.799999587′
ERROR at line 1:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: ‘+DATA/orcl/datafile/example.264.799999785′
– Restore both tablespaces from backups on node1 and node2 by allocating channels explicitly to corresponding instances
– Note that RMAN automatically finds out that backup for example is stored on node1 and backup for users is stored on node2
   and restores them from respective nodes.
RMAN>  run {
             allocate channel c1 device type disk 
               connect=’sys/oracle@orcl1′;
    allocate channel c2 device type disk 
              connect=’sys/oracle@orcl2′;
    restore tablespace users, example;
  }
allocated channel: c1
channel c1: SID=57 instance=orcl1 device type=DISK
allocated channel: c2
channel c2: SID=46 instance=orcl2 device type=DISK
channel c1: restoring datafile 00005 to +DATA/orcl/datafile/example.264.799999785
channel c1: reading from backup piece /u01/app/oracle/oradata/orcl/backup/orcl0snrivk1_1_1.bak
channel c2: restoring datafile 00004 to +DATA/orcl/datafile/users.259.799999587
channel c2: reading from backup piece /u01/app/oracle/oradata/orcl/backup/orcl0tnrivk1_1_1.bak
– Recover tablespaces and bring them online
RMAN>recover tablespace example, users;
SQL>  alter tablespace example online;

             alter tablespace users online;
I hope you found the article useful. Keep visiting the blog …
———————————————————————
Related links:

11g R2 RAC: HOW TO FIND THE RESOURCE MASTER?

In this post, I will demonstrate three methods to find the master of a resource(block).
In RAC, every data block is mastered by an instance. Mastering a block simply means that master instance keeps track of the state of the block until the next reconfiguration event.
– Remastering can be triggered as result of
    – Manually
    – Resource affinity
    – Instance crash
- Method – I gets info about master node from v$gcspfmaster_info   using data_object_id
- Method – II gets info about master node from v$dlm_ress and v$ges_enqueue   using resource name in hexadecimal format
- Method – III gets info about master node from x$kjbl with x$le using resource name in hexadecimal format
– CURRENT SCENARIO -
- 3 node setup
- name of the database – orcl
— SETUP –
SYS@NODE1>create table scott.emp1 as
             select * from scott.emp;
– Get data_object_id for scott.emp1
SQL> col owner for a10
col data_object_id for 9999999
col object_name for a15
select owner, data_object_id, object_name
from dba_objects
where owner = ‘SCOTT’
and object_name = ‘EMP1′;
OWNER      DATA_OBJECT_ID OBJECT_NAME
———- ————– —————
SCOTT               74652 EMP1
For Method-II and Method-III, we need to find out file_id and block_id and hence GCS  resource name in hexadecimal format
 — Get File_id and range of block_ids of emp1 table
– It can be seen that emp1 lies in block 523 of file 4.
SQL>select dbms_rowid.rowid_relative_fno(rowid) FILE_NO,
min(dbms_rowid.rowid_block_number(rowid)) MIN_BLOCK_ID,
max(dbms_rowid.rowid_block_number(rowid))  MAX_BLOCK_ID
from scott.emp1
group by dbms_rowid.rowid_relative_fno(rowid);
   FILE_NO MIN_BLOCK_ID MAX_BLOCK_ID
———- ———— ————
         4          523          523
Find the GCS resource name to be used in  the query using blodk_id and data_object_id retrieved above.
   x$kjbl.kjblname = resource name in hexadecimal format([id1],[id2],[type]
   x$kjbl.kjblname2 = resource name in decimal format
   Hexname will be used to query resource master using v$dlm_ress , v$ges_enqueue, $kjbl
   and x$le
SQL> col hexname for a25
col resource_name for a15
select b.kjblname hexname, b.kjblname2 resource_name
from x$le a, x$kjbl b
where a.le_kjbl=b.kjbllockp
and a.le_addr = ( select le_addr
from x$bh
where dbablk = 523
and obj    = 74652
and class  = 1
and state   <> 3);
HEXNAME                   RESOURCE_NAME
————————- —————
[0x20b][0x4],[BL]         523,4,BL
– Manually master the EMP table to node1 –
SYS@NODE1>oradebug lkdebug -m pkey <objectid>
SYS@NODE1>oradebug lkdebug -m pkey 74652
—- GET RESOURCE MASTER NAME ———-
Method – I gets info about master node from v$gcspfmaster_info using data_object_id
– ——-
– Note that current master is node1 (Node numbering starts from 0)
SYS@node1>col object_name for A10
        select o.object_name, m.CURRENT_MASTER
from   dba_objects o, v$gcspfmaster_info m
where o.data_object_id=74652
and m.data_object_id = 74652 ;
OBJECT_NAM CURRENT_MASTER
———- ————–
EMP1                    0
—- Method II gets info about master node from v$dlm_ress and v$ges_enqueue
     using resource name in hexadecimal format
– check that master node is node1 (node numbering starts with 0)
SYS@NODE1>col resource_name for a22 select a.resource_name,  a.master_node
from   v$dlm_ress a, v$ges_enqueue b
where upper(a.resource_name) = upper(b.resource_name1)
and a.resource_name like ‘%[0x20b][0x4],[BL]%';

RESOURCE_NAME          MASTER_NODE
———————- ———–
[0x20b][0x4],[BL]                0
Method – III gets info about master node from x$kjbl with x$le
             using resource name in hexadecimal format
–  This SQL joins   x$kjbl with x$le to retrieve resource master for a block
– Note that current master is node1(MASTER=0)
SYS@NODE1> select kj.kjblname, kj.kjblname2, kj.kjblmaster master
from (select kjblname, kjblname2, kjblowner, kjblmaster, kjbllockp
from x$kjbl
where kjblname = ‘[0x20b][0x4],[BL]’
) kj, x$le le
where le.le_kjbl = kj.kjbllockp
order by le.le_addr;
KJBLNAME                       KJBLNAME2                          MASTER
—————————— —————————— ———-
[0x20b][0x4],[BL]              523,4,BL                                     0
I hope you found this information useful.
Keep visiting the blog !!!
——————————————————————————————–

11G R2 RAC: HOW TO IDENTIFY THE MASTER NODE IN RAC

 
                          
In this post, I will demonstrate three methods to identify the oracle clusterware’s master node. Pls note that clusterware master is different from Resource master in  oracle database instance. To know about how to find the resource master, pls click here:.
Importance of master node in a cluster:

- Master node has the least Node-id in the cluster. Node-ids are  assigned to the nodes in the same order as the nodes join the cluster. Hence, normally the node which joins the cluster first is the master node.

  • - CRSd process on the Master node is responsible to initiate the OCR backup as per the backup policy
  • - Master node  is also responsible to sync OCR cache across the nodes
  • - CRSd process oth the master node reads from and writes to OCR on disk
  • - In case of node eviction, The cluster is divided into two sub-clusters. The sub-cluster containing fewer no. of nodes is evicetd. But, in case both the sub-clusters have same no. of nodes, the sub-cluster having the master node survives whereas the other sub-cluster is evicted.


Oracle ClusterWare master’s  information can be found 


  • - by scanning ocssd logs from various nodes
  • - by scanning  crsd logs from various nodes. 
  • - by identifying the node which  takes the backup of the OCR.

If master node gets evicted/rebooted, another node becomes the master.

 
I have a 3 node setup. I check the ocssd logs on the 3 nodes for the string ‘master node’ and note that node 3 is the master node.
 
 
[grid@host01 root]$ cat $ORACLE_HOME/log/host01/cssd/ocssd.log |grep ‘master node’ |tail -1
2012-11-23 10:14:36.949: [    CSSD][2778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954981 with 3 nodes, local node number 1, master node number 3
 
[root@host02 cssd]# cat $ORACLE_HOME/log/host02/cssd/ocssd.log |grep ‘master node’ |tail -1
2012-11-23 10:14:36.953: [    CSSD][778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954981 with 3 nodes, local node number 2, master node number 3
 
[root@host03 ~]# cat $ORACLE_HOME/log/host03/cssd/ocssd.log |grep ‘master node’ |tail -1
2012-11-23 10:14:37.001: [    CSSD][778700688]clssgmCMReconfig: reconfiguration successful, incarnation 248954981 with 3 nodes, local node number 3, master node number 3
 
If I take the OCR backup right now, it will be taken by node3 (master node).
 
 [root@host02 cssd]# ocrconfig -manualbackup
 
host03     2012/11/24 09:54:48     /u01/app/11.2.0/grid/cdata/cluster01/backup_20121124_095448.ocr

Let us check crsd logs of various nodes, looking for the string OCR MASTER.Note that node3 is the master node presently.

 
[grid@host01 crsd]$ cat /u01/app/11.2.0/grid/log/host01/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1
 
2012-11-23 10:15:01.403: [  OCRMAS][2877356944]th_master: NEW OCR MASTER IS 3
 
[root@host02 crsd]# cat /u01/app/11.2.0/grid/log/host02/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1
 
2012-11-23 10:15:03.561: [  OCRMAS][876976016]th_master: NEW OCR MASTER IS 3
 
[root@host03 crsd]#  cat /u01/app/11.2.0/grid/log/host03/crsd/crsd.log |grep ‘OCR MASTER’ | tail -3
 
2012-11-23 10:11:18.499: [  OCRMAS][877467536]th_master:13: I AM THE NEW OCR MASTER at incar 44. Node Number 3
 
[
Let me reboot node3 and check which node is assigned the mastership now.
 

[root@host03 ~]# init 6

 
check the ocssd logs on the remaining two nodes (node1 and node2) for the string ‘master node’ and note that node 1 is the master node.
[grid@host01 root]$ cat $ORACLE_HOME/log/host01/cssd/ocssd.log |grep ‘master node’ |tail -1
 
2012-11-24 10:09:23.522: [    CSSD][2778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954982 with 2 nodes, local node number 1, master node number 1
 
[root@host02 cssd]# cat $ORACLE_HOME/log/host02/cssd/ocssd.log |grep ‘master node’ |tail -1
 
2012-11-24 10:09:23.502: [    CSSD][778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954982 with 2 nodes, local node number 2, master node number 1
 
As can be seen from ocssd logs of the remaining two nodes, node1 has become the master now.
 
Now If I take the OCR backup, it is taken by node1 while earlier backups were taken by node3 which was the then master.
 
[root@host02 cssd]# ocrconfig -manualbackup
 
host01     2012/11/24 10:12:29     /u01/app/11.2.0/grid/cdata/cluster01/backup_20121124_101229.ocr
 
host03     2012/11/24 09:54:48     /u01/app/11.2.0/grid/cdata/cluster01/backup_20121124_095448.ocr
 
Let us check crsd logs of various nodes, looking for the string OCR MASTER
 
[grid@host01 crsd]$ cat /u01/app/11.2.0/grid/log/host01/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1
2012-11-24 10:08:45.884: [  OCRMAS][877356944]th_master:13: I AM THE NEW OCR MASTER at incar 47. Node Number 1
 
[root@host02 crsd]# cat /u01/app/11.2.0/grid/log/host02/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1
2012-11-24 10:08:45.364: [  OCRMAS][876976016]th_master: NEW OCR MASTER IS 1
 
[root@host03 crsd]#  cat /u01/app/11.2.0/grid/log/host03/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1
2012-11-24 10:12:20.282: [  OCRMAS][877422480]th_master: NEW OCR MASTER IS 1
 
Hope you enjoyed reading !!
Keep visiting the blog ….
 
Regards
——————————————————————————————–
Related links:

Home

11g R2 RAC: How To Find The Resource Master?


 
                                         ——————
 
 

DUPLICATE DATABASE WITH CONNECTION TO RECOVERY CATALOG

In this post, I will demonstrate how to duplicate a database using its backups.
This method requires connection to recovery catalog only to get information about the
backups. Connection with the target database is not needed.
********************************
  source database  orcl (ASM)
  Duplicate database  orclt (ASM)
  Recovery catalog database : rcat
***********************************
Overview:
on the source  host
  • - Create recovery catalog in rcat database
  • - Register source database orcl with recovery catalog
  • - BACKUP DATABASE PLUS ARCHIVELOG AND CONTROLFILE while connected to recovery catalog
  • - Copy these backup files to the server where you want to create the duplicate copy.
  • - CREATE PFILE FROM SOURCE DATABASE

on the target host

  • - Add a line in the file /etc/oratab to reflect the database instance(orclt) you are going
  • to copy
  • - create folders
  • - Copy the initialization parameter file from the source database add edit it.
  • - Copy the password file
  • - Startup the duplicate database (orclt) in nomount mode using modified parameter file
  • - Using RMAN  connect to the catalog database(rcat) as target database and duplicate
  • database (orclt) as auxiliary instance
  • - duplicate the database orcl to orclt
———————-
on the source  host
———————-
- Create recovery catalog in rcat database
  .create tablespace to hold recovery catalog data –
sql>create tablespace rcat datafile '+data/rcat01.dbf' size 15m;
  . create recovery catalog owner
sql>create user rcatowner identified by oracle

    default tablespace rcat quota unlimited on rcat;
    grant recovery_catalog_owner to rcatowner;

  . Create recovery catalog

$rman catalog rcatowner/oracle@rcat
RMAN>CREATE CATALOG;
- Register source database orcl with recovery catalog
$RMAN TARGET sys/oracle@orcl CATALOG rcatowner/oracle@rcat
RMAN>REGISTER DATABASE;
– Make a folder to stage the backup
oracle$mkdir /home/orale/stage
– Take the backup of the source database
oracle$. oraenv orcl
         rman target /
RMAN>CONFIGURE CONTROLFILE AUTOBACKUP ON;
     backup database format '/home/oracle/stage/%U.rmb';
     backup archivelog all  format '/home/oracle/stage/arc_%r_%s_%t.log';
The controlfile backup is also required. If you have configured the controlfile
autobackup, the backup would contain the controlfile as well. If you want to be sure,
 ors you have not configured controlfile autobackup, you can backup the controlfile
explicitly.
– CREATE PFILE FROM SOURCE DATABASE
SQL>CREATE PFILE='/u01/app/oracle/oradata/orcl/initsource.ora'
            FROM SPFILE
———————
on the target host.
————————
 — Copy the backup files to the server where you want to create the duplicate copy.
$mkdir -p /home/oracle/stage
 scp sourcehost:/home/oracle/stage/*.rmb desthost::/home/oracle/stage/

– Add a line in the file /etc/oratab to reflect the database instance you are going to    copy:
orclt:/u01/app/oracle/product/11.2.0/db_1:N
– Now set the Oracle SID as the duplicated database SID:
# . oraenv
ORACLE_SID = [orclt] ?
– create folders
 $mkdir -p /u01/app/oracle/oradata/orclt
  mkdir -p /u01/app/oracle/flash_recovery_area/orclt
  mkdir -p /u01/app/oracle/admin/orclt/adump
  mkdir -p /u01/app/oracle/admin/orclt/dpdump
– Copy the initialization parameter file from the main database.
– If u are duplicating on the same host
$cp  /u01/app/oracle/oradata/orcl/initsource.ora
/u01/app/oracle/oradata/orclt/inittarget.ora
OR
– If u are duplicating on the different host
$scp  soucehost:/u01/app/oracle/oradata/orcl/initsource.ora
/u01/app/oracle/oradata/orclt/inittarget.ora
– Edit parameter file
$vi /u01/app/oracle/oradata/orclt/inittarget.ora
   – Change db_name = orclt
   – Edit it to reflect the new locations that might be appropriate
     such as control file locations,audit dump destinations, datafile
     locations, etc.
   – add these lines –
     db_file_name_convert = (“+DATA/orcl”,”+DATA/orclt”)
     log_file_name_convert = (“+DATA/orcl”,”+DATA/orclt”, “+FRA/orcl”,”+FRA/orclt”)
————————————–
– Copy the password file as well
$ cp /u01/app/oracle/product/11.2.0/db_1/dbs/orapworcl  /u01/app/oracle/product/11.2.0/db_1/dbs/orapworclt
OR
$ scp sourcehost:/u01/app/oracle/product/11.2.0/db_1/dbs/orapworcl  /u01/app/oracle/product/11.2.0/db_1/dbs/orapworclt

– Startup the duplicate database in nomount mode using modified parameter file
$ . oraenv
ORACLE_SID = [orclt] ?
$sqlplus sys/oracle as sysdba
SQL> startup nomount pfile='/u01/app/oracle/oradata/orclt/inittarget.ora';
          create spfile from pfile='/u01/app/oracle/oradata/orclt/inittarget.ora';

- Using RMAN  connect to the catalog database (rcat) and duplicate database   (orclt) as
auxiliary instance
$rman catalog rcatowner/oracle@rcat  auxiliary sys/oracle@orclt

– duplicate the database orcl to orclt
– the command performs the following steps:
    * Creates an SPFILE
    * Shuts down the instance and restarts it with the new spfile
    * Restores the controlfile from the backup
    * Mounts the database
    * Performs restore of the datafiles. In this stage it creates the files in the
      converted names.
    * Recovers the datafiles up to the time specified and opens the database
RMAN>duplicate  database orcl to orclt;

– check that duplicate database is up

$sqlplus / as sysdba
sql>conn hr/hr
    select * from tab;
– Note that  DBID is different from the main database so it can be backed up
   independently and using the same catalog as well.
SQL> select dbid from v$database;
      DBID
———-
3780202035
SQL>conn sys/oracle@orcl as sysdba
    select dbid from v$database;
      DBID
———-
1327232823
——————————————————————————————–
Related links:
——————————————————————————————–

11G R2 RAC: GPNP PROFILE DEMYSTIFIED

In this post, I will explain what is GPnP profile, what does it contain and how is it used by clusterware.
WHAT IS GPNP PROFILE?
The GPnP profile is a small XML file located in GRID_HOME/gpnp/<hostname>/profiles/peer under the name profile.xml. It is used to establish the correct global personality of a node. Each node maintains a local copy of the GPnP Profile and is maintanied by the GPnP Deamon (GPnPD) .
WHAT DOES GPNP PROFILE CONTAIN?
GPnP Profile  is used to store necessary information required for the startup of Oracle Clusterware like  SPFILE location,ASM DiskString  etc.
It contains various attributes defining node personality.
- Cluster name
- Network classifications (Public/Private)
- Storage to be used for CSS
- Storage to be used for ASM : SPFILE location,ASM DiskString  etc
- Digital signature information : The profile is security sensitive. It might identify the storage to be used as the root partition of a machine.  Hence, it contains digital signature information of the provisioning authority.
Here is the GPnP profile of my RAC setup.
gpnptool can be  used  for reading/editing the gpnp profile.
[root@host01 peer]# gpnptool get
<?xml version=”1.0″ encoding=”UTF-8″?><gpnp:GPnP-Profile Version=”1.0″ xmlns=”http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:gpnp=”http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:orcl=”http://www.oracle.com/gpnp/2005/11/gpnp-profile” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=”http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd” ProfileSequence=”7″ ClusterUId=”14cddaccc0464f92bfc703ec1004a386″ ClusterName=”cluster01″ PALocation=””><gpnp:Network-Profile><gpnp:HostNetwork id=”gen” HostName=”*”><gpnp:Network id=”net1″ IP=”192.9.201.0″ Adapter=”eth0″ Use=”public”/><gpnp:Network id=”net2″ IP=”10.0.0.0″ Adapter=”eth1″ Use=”cluster_interconnect”/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id=”css” DiscoveryString=”+asm” LeaseDuration=”400″/><orcl:ASM-Profile id=”asm” DiscoveryString=”” SPFile=”+DATA/cluster01/asmparameterfile/registry.253.783619911″/><ds:Signature xmlns:ds=”http://www.w3.org/2000/09/xmldsig#”><ds:SignedInfo><ds:CanonicalizationMethod Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#”/><ds:SignatureMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#rsa-sha1″/><ds:Reference URI=””><ds:Transforms><ds:Transform Algorithm=”http://www.w3.org/2000/09/xmldsig#enveloped-signature”/><ds:Transform Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#”> <InclusiveNamespaces xmlns=”http://www.w3.org/2001/10/xml-exc-c14n#” PrefixList=”gpnp orcl xsi”/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#sha1″/><ds:DigestValue>4VMorzxVNa+FeOx2SCk1unVBpfU=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>bbzV04n2zSGTtUEvqqB+pjw1vH7i8MOEUqkhXAyloX0a41T2FkDEA++ksc0BafndAk7tR+6LGdppE1aOsaJUtYxQqaHJdpVsJF+sj2jN7LPJlT5NBt+K7b08TLjDID92Se6vEiDAeeKlEbpVWKMUIvQvp6LrYK8cDB/YjUnXuGU=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>
WHO UPDATES GPNP PROFILE?
GPnPd daemon replicates changes to the profile during
  • - installation,
  • - system boot or
  • - when updated
Profile is updated Whenever changes are made to a cluster with configuration tools like
  • . oifcfg (Change network),
  • . crsctl (change location of voting disk),
  • . asmcmd (change ASM_DISKSTRING, SPfile location) etc.
HOW IS GPNP PROFILE USED BY CLUSTERWARE?

To start clusterware, voting disk needs to be accessed. If voting disk is on ASM, this information (that voting disk is on ASM) is read from GPnP profile (<orcl:CSS-Profile id=”css” DiscoveryString=”+asm” LeaseDuration=”400″/>).   The voting disk is read using kfed utility  even if ASM is not up.
Next,  the clusterware checks if all the nodes have the updated GPnP profile and the node joins the cluster based on the GPnP configuration . Whenver a node is started/added to the cluster, the clusterware software on the starting node starts a GPnP agent.
  • - If the node is already part of the cluster, the GPnP agent reads the existing profile on that node.
  • - If the node is being added to the cluster, GPnP agent locates agent on another existing node using multicast protocol (provided by mDNS) and gets the profile from that agent.
Next CRSD needs to read OCR to startup various resources on the node and hence update it as status of resources changes. Since OCR is also on ASM, location of ASM SPfile should be known.
The order of searching the ASM SPfile is
  • - GPnP profile
  • - ORACLE_HOME/dbs/spfile<sid.ora>
  • - ORACLE_HOME/dbs/init<sid.ora>
In cluster environment, the location of  SPfile for ASMread from GPnP profile.
[grid@host01 peer]$ gpnptool getpval -asm_spf
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/app/11.2.0/grid/bin/gpnptool.bin getpval -asm_spf -p=profile.xml -o-
+DATA/cluster01/asmparameterfile/registry.253.793721441
The oputput of the query shows that SPfile is on ASM in DATA diskgroup. To find out the
 location of ASM disks, following query is issued :
[root@host01 peer]# gpnptool getpval -asm_dis
ASM-Profile id=”asm” DiscoveryString=””
The  device headers of every device in the disk string returned by the above query are scanned  (if configured by you at ASM initial setup time). Here Discovery String is blank is as ASMDISKSTRINGS parameter has not been set. Hence, headers of all the ASM disks are scanned .
Here, I have shown the output of the query only on the disk which contains SPfile.(spfflg is not null)
[root@host01 ~]#  kfed read /dev/sdb3 | grep -E ‘spf|ausize’
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                       16 ; 0x0f4: 0x00000010
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
In the output above, we see that
     the device /dev/sdb3 contains a copy of the ASM spfile (spfflg=1).
     The ASM spfile location starts at the disk offset of 16 (spfile=16)
Considering the allocation unit size (kfdhdb.ausize = 1M), let’s dump the ASM spfile from the device:
[root@host01 ~]#  dd if=/dev/sdb3 of=spfileASM_Copy2.ora skip=16  bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.170611 seconds, 6.1 MB/s
[root@host01 ~]# strings spfileASM_Copy2.ora
+ASM1.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value
+ASM2.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value
+ASM3.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value
+ASM3.asm_diskgroups=’FRA’#Manual Mount
+ASM2.asm_diskgroups=’FRA’#Manual Mount
+ASM1.asm_diskgroups=’FRA’#Manual Mount
*.asm_power_limit=1
*.diagnostic_dest=’/u01/app/grid’
*.instance_type=’asm’
*.large_pool_size=12M
*.remote_login_passwordfile=’EXCLUSIVE’
Using the parameters in SPfile, ASM is started.
Once ASM is up, OCR is read by CRSD and various resources on the node are started.
Each node reads network information in GPnP profile and using GNS,  negotiates appropriate network identity for itself . Hence, nodes can be dynamically added/deleted.
What happens if GPnP profile is lost?
To know please click  here.
————————————————————
GPNPTOOL COMMAND REFERENCE:

- How to read the profile
[root@inssc3 bin]# ./gpnptool get
- How to find GPnP Deamons are running on the local node
[root@host01 peer]# gpnptool lfind
Success. Local gpnpd found.
- How to find the location of ASM spfile if the ASM is down
[root@host01 peer]# gpnptool getpval -asm_spf
+DATA/cluster01/asmparameterfile/registry.253.783619911
- How to find all RD-discoverable resources of given type
[root@host01 peer]# gpnptool find
Found 3 instances of service ‘gpnp’.
        mdns:service:gpnp._tcp.local.://host03:18015/agent=gpnpd,cname=cluster01,host=host03,pid=5066/gpnpd h:host03 c:cluster01
        mdns:service:gpnp._tcp.local.://host02:17637/agent=gpnpd,cname=cluster01,host=host02,pid=5236/gpnpd h:host02 c:cluster01
        mdns:service:gpnp._tcp.local.://host01:16633/agent=gpnpd,cname=cluster01,host=host01,pid=5206/gpnpd h:host01 c:cluster01
References:
                     —————
Related links:

 

11g R2 RMAN: TRANSPORT DATABASE FROM LINUX TO WINDOWS

In this post, I will demonstrate transporting a database on Linux 32 bit platform to Windows 32 bit.
Overview
TDB requires that data files be converted to the target platform format. The data file
conversion can occur on either the source system or the target system. When performing a
source system conversion, TDB creates a second copy of all data files on the source system
in the format of the target system. The converted data files must then be transferred to the
proper location on the target system.
Using TDB to migrate a database to a new platform of the same endian format consists of the
following high-level steps:
1. Check prerequisites
2. Prepare for the platform migration
3. Start the database in READ ONLY mode
4. Verify the database is ready for migration
5. Run the RMAN CONVERT DATABASE command
6. Move necessary files to the target system
7. Complete the migration
Implementation:
– Check Prerequisites
– check the platform name of current database –
SQL>SELECT PLATFORM_NAME
          FROM  V$DATABASE;
PLATFORM_NAME
——————————
Linux IA (32-bit)
– Check the endian format of current platform
– Note that ENDIAN format is little

SQL>SELECT PLATFORM_ID, PLATFORM_NAME, ENDIAN_FORMAT
          FROM   V$TRANSPORTABLE_PLATFORM
         WHERE UPPER(PLATFORM_NAME) LIKE ‘%LINUX IA%';
PLATFORM_ID PLATFORM_NAME                  ENDIAN_FORMAT
———– —————————— ————–
         10 Linux IA (32-bit)              Little
         11 Linux IA (64-bit)              Little
– Check that endian format of the target platform i.e. Windows IA(32-bit) is also little
– Hence database can be transported
SQL>SELECT PLATFORM_ID, PLATFORM_NAME, ENDIAN_FORMAT
          FROM   V$TRANSPORTABLE_PLATFORM
          WHERE UPPER(PLATFORM_NAME) LIKE ‘%MICROSOFT WINDOWS%';


PLATFORM_ID PLATFORM_NAME                  ENDIAN_FORMAT
———– —————————— ————–
          7 Microsoft Windows IA (32-bit)  Little
          8 Microsoft Windows IA (64-bit)  Little
         12 Microsoft Windows x86 64-bit   Little
If the target platform does not appear in the output from V$TRANSPORTABLE_PLATFORM, then
the database cannot be migrated using TDB.
Start the database in READ ONLY mode
SQL> shutdown immediate;
     startup mount;
     alter database open read only;
— Verify the database is ready for migration
SQL>
    SET  SERVEROUTPUT ON;
    DECLARE
      db_ready BOOLEAN;
    BEGIN
      db_ready :=
         DBMS_TDB.CHECK_DB(‘Microsoft Windows IA (32-bit)’,DBMS_TDB.SKIP_READONLY);
      if db_ready
          then
             dbms_output.put_line(‘YES your database can be transported to Windows
platform’);
      else
         dbms_output.put_line(‘NO your database can not be transported to WIndows
Platform’);
      end if;
     end;
/
– Make a folder to hold converted datafiles
$mkdir /home/oracle/convertdb
— Run the RMAN CONVERT DATABASE Command
The CONVERT DATABASE command specified in the example creates
  – a transport script named /home/oracle/convertdb/transport.sql,
  – a PFILE initnewdb.ora in file /home/oracle/convertdb
  – a copy of all data files in the =/home/oracle/convertdb  directory in the format of the target platform ‘Microsoft Windows IA (32-bit)
RMAN>CONVERT DATABASE
              NEW DATABASE ‘newdb’
              TRANSPORT SCRIPT ‘/home/oracle/convertdb/transport.sql’
              TO PLATFORM ‘Microsoft Windows IA (32-bit)’
               FORMAT ‘/home/oracle/convertdb/newdb’
               DB_FILE_NAME_CONVERT ‘/u01/app/oracle/oradata/linuxdb/’ ‘/home/oracle/convertdb/';


Starting conversion at source at 18-NOV-12
….
channel ORA_DISK_1: starting datafile conversion
input datafile file number=00001 name=/u01/app/oracle/oradata/linuxdb/system01.dbf
converted datafile=/home/oracle/convertdb/system01.dbf
input datafile file number=00004 name=/u01/app/oracle/oradata/linuxdb/users01.dbf
converted datafile=/home/oracle/convertdb/users01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:01
Edit init.ora file /home/oracle/convertdb/init_newdb.ora. This PFILE will be used to create
the database on the target platform
Run SQL script /home/oracle/convertdb/transport.sql on the target platform to create
database
To recompile all PL/SQL modules, run utlirp.sql and utlrp.sql on the target platform
To change the internal database identifier, use DBNEWID Utility
Finished conversion at source at 18-NOV-12
– Review and Edit the PFILE
$vi  /home/oracle/convertdb/init_newdb.ora
# Please change the values of the following parameters:
  control_files            = “c:\app\administrator\oradata\newdb\control01.ctl”
  db_recovery_file_dest=”c:\app\administrator\flash_recovery_area”
  audit_file_dest          = “c:\app\administrator\admin\newdb\adump”
– Review and edit the transport script transport.sql
$vi  /home/oracle/convertdb/transport.sql
– Modify the following :
STARTUP NOMOUNT PFILE=’C:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DBS
\initnewdb.ora';
LOGFILE
  GROUP 1 ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\REDO01.LOG’ SIZE 50M BLOCKSIZE 512,
  GROUP 2 ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\REDO02.LOG’ SIZE 50M BLOCKSIZE 512,
  GROUP 3 ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\REDO03.LOG’ SIZE 50M BLOCKSIZE 512
DATAFILE
  ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\system01.dbf’,
  ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\sysaux01.dbf’,
  ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\undotbs01.dbf’,
  ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\users01.dbf’,
  ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\example01.dbf’
ALTER TABLESPACE TEMP
ADD TEMPFILE
 ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\TEMP01.DBF’
  SIZE 20971520  AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;
STARTUP UPGRADE PFILE=’C:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DBS\initnewdb.ora';
@@ ?\rdbms\admin\utlirp.sql
SHUTDOWN IMMEDIATE
STARTUP PFILE=’C:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DBS\initnewdb.ora';
– Make folders on target windows host
c:\>mkdir c:\app\administrator\oradata\newdb
      mkdir c:\app\administrator\admin\newdb\adump”
– Copy all the files in /home/oracle/convertdb folder to c:\app\administrator\oradata\newdb
   i.e. datafiles
        Pfile
        transport script
– Move PFILE to $ORACLE_HOME/database  i.e. c:\app\administrator\product
\11.2.0\dbhome_1\dbs
C:\>copy c:\app\administrator\oradata\newdb\init_newdb.ora c:\app\administrator\product
\11.2.0\dbhome_1\dbs\initnewdb.ora
– create a service for newdb on windows
c:\>oradim -new -sid newdb
– Run the transport script
c:\>set oracle_sid=newdb
      sqlplus  / as sysdba
SQL>@c:\app\administrator\oradata\newdb\transport.sql
– check that database has been transported —
SQL>select * from hr.employees;
——————————————————————————-
Related links:
                                                                                             ———————–

11g R2 RAC: NODE EVICTION DUE TO MISSING NETWORK HEARTBEAT

In this post, I will demonstrate node eviction due to missing netsork heartbeat i.e. a node will be evicted from the cluster, if it can’t communicate wioth other nodes in the cluster. To simulate it, I will stop private network on one of the nodes and then scan alert logs  of the surviving nodes.
Current scenario:
No. of nodes in the cluster  : 3
Names of the nodes      : host01, host02, host03
Name of the cluster database : orcl
I will stop PVT. network  service on host03 so that it is evicted.
– Find out the pvt network name
[root@host03 ~]# oifcfg getif
eth0  192.9.201.0  global  public
eth1  10.0.0.0  global  cluster_interconnect
– Stop pvt. network service on host03 so that it can’t communicate with host01 and host02 and will be evicted.
[root@host03 ~]# ifdown eth1
——————-
OCSSD log of host03
——————–
It can be seen that CSSD process of host03 can’t communicate with host01 and host02
at 09:43:52
Hence votedisk timeouot is set to Short Disk Time OUT (SDTO) = 27000 ms (27 secs)
2012-11-19 09:43:52.714: [    CSSD][843736976]clssnmPollingThread: node host01 (1) at 50% heartbeat fatal, removal in 14.880 seconds
2012-11-19 09:43:52.714: [    CSSD][843736976]clssnmPollingThread: node host01 (1) is impending reconfig, flag 132108, misstime 15120
2012-11-19 09:43:52.714: [    CSSD][843736976]clssnmPollingThread: node host02 (2) at 50% heartbeat fatal, removal in 14.640 seconds
2012-11-19 09:43:52.714: [    CSSD][843736976]clssnmPollingThread: node host02 (2) is impending reconfig, flag 132108, misstime 15360
2012-11-19 09:43:52.714: [    CSSD][843736976]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2012-11-19 09:43:52.927: [    CSSD][2833247120]clssnmSendingThread: sending status msg to all nodes
———————–
– Alert log of host03
———————–
– At 09:43:52, CSSD process host03 identifies that it can’t communicate with CSSD on host02 and host03
[cssd(5124)]CRS-1612:Network communication with node host01 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.880 seconds
2012-11-19 09:43:52.714
[cssd(5124)]CRS-1612:Network communication with node host02 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.640 seconds
2012-11-19 09:44:01.880
[cssd(5124)]CRS-1611:Network communication with node host01 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 6.790 seconds
2012-11-19 09:44:01.880
[cssd(5124)]CRS-1611:Network communication with node host02 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.550 seconds
2012-11-19 09:44:06.536
[cssd(5124)]CRS-1610:Network communication with node host01 (1) missing for 90% of timeout interval.  Removal of this node from cluster in 2.780 seconds
2012-11-19 09:44:06.536
[cssd(5124)]CRS-1610:Network communication with node host02 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.540 seconds
2012-11-19 09:44:09.599
– At 09:44:16, CSSD process of host03 reboots the node to preserve cluster integrity
[cssd(5124)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/11.2.0/grid/log/host03/cssd/ocssd.log.
2012-11-19 09:44:16.697
[/u01/app/11.2.0/grid/bin/orarootagent.bin(5713)]CRS-5822:Agent ‘/u01/app/11.2.0/grid/bin/orarootagent_root’ disconnected from server. Details at (:CRSAGF00117:) in /u01/app/11.2.0/grid/log/host03/agent/crsd/orarootagent_root/orarootagent_root.log.
2012-11-19 09:44:16.193
[ctssd(5285)]CRS-2402:The Cluster Time Synchronization Service aborted on host host03. Details at (:ctsselect_mmg5_1: in /u01/app/11.2.0/grid/log/host03/ctssd/octssd.log.
2012-11-19 09:44:21.177
——————–
Ocssd log of host01
——————–
– At 09:43:53, CSSD process of host01 identifies that it can’tommunicate with CSSD on host03
2012-11-19 09:43:53.340: [    CSSD][841635728]clssnmPollingThread: node host03 (3) at 50% heartbeat fatal, removal in 14.500 seconds
2012-11-19 09:43:53.340: [    CSSD][841635728]clssnmPollingThread: node host03 (3) is impending reconfig, flag 132110, misstime 15500
2012-11-19 09:43:53.340: [    CSSD][841635728]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
——————-
Alert log of host01
——————-
– At 09:44:01, alert log of host01 is updated regarding communication failure with host03
[cssd(5308)]CRS-1612:Network communication with node host03 (3) missing for 50% of timeout interval.  Removal of this node from cluster in 14.500 seconds
2012-11-19 09:44:01.695
[cssd(5308)]CRS-1611:Network communication with node host03 (3) missing for 75% of timeout interval.  Removal of this node from cluster in 7.450 seconds
2012-11-19 09:44:07.666
[cssd(5308)]CRS-1610:Network communication with node host03 (3) missing for 90% of timeout interval.  Removal of this node from cluster in 2.440 seconds
2012-11-19 09:44:10.606
[cssd(5308)]CRS-1607:Node host03 is being evicted in cluster incarnation 32819913; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/host01/cssd/ocssd.log.
2012-11-19 09:44:24.705
– At 09:44:24, OHASD process on host01 receives reboot message from host03
[ohasd(4941)]CRS-8011:reboot advisory message from host: host03, component: ag050107, with time stamp: L-2012-11-19-09:44:24.373
[ohasd(4941)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-11-19 09:44:24.705
[ohasd(4941)]CRS-8011:reboot advisory message from host: host03, component: mo050107, with time stamp: L-2012-11-19-09:44:24.376
[ohasd(4941)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-11-19 09:44:46.379
[cssd(5308)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 host02 .
——————-
OCSSD log of host02
——————–
– At 09:43:52, CSSD process of host02 identifies communication failure with host03
2012-11-19 09:43:52.385: [    CSSD][841635728]clssnmPollingThread: node host03 (3) at 50% heartbeat fatal, removal in 14.950 seconds
2012-11-19 09:43:52.386: [    CSSD][841635728]clssnmPollingThread: node host03 (3) is impending reconfig, flag 394254, misstime 15050
2012-11-19 09:43:52.386: [    CSSD][841635728]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2012-11-19 09:43:52.733: [    CSSD][810166160]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:ASMDISK01 sched delay 970 > margin 750 cur_ms 18331974 lastalive 18331004
20
——————–
Alert log of host02
———————
– At 09:44:01 (same as host01), alert log of host02 is updated regarding communication failure with host03
[cssd(5284)]CRS-1612:Network communication with node host03 (3) missing for 50% of timeout interval.  Removal of this node from cluster in 14.950 seconds
2012-11-19 09:44:01.971
[cssd(5284)]CRS-1611:Network communication with node host03 (3) missing for 75% of timeout interval.  Removal of this node from cluster in 6.930 seconds
2012-11-19 09:44:06.750
[cssd(5284)]CRS-1610:Network communication with node host03 (3) missing for 90% of timeout interval.  Removal of this node from cluster in 2.920 seconds
2012-11-19 09:44:24.520
– At 09:44:24 (same as host01), OHASD process on host01 receives reboot message from host03
[ohasd(4929)]CRS-8011:reboot advisory message from host: host03, component: ag050107, with time stamp: L-2012-11-19-09:44:24.373
[ohasd(4929)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-11-19 09:44:24.520
[ohasd(4929)]CRS-8011:reboot advisory message from host: host03, component: mo050107, with time stamp: L-2012-11-19-09:44:24.376
[ohasd(4929)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-11-19 09:44:46.073
[cssd(5284)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 host02 .
20
————————————————————————————–
Related links: