Monthly Archives: November 2012

11g R2 RAC : AUTOLOCATE BACKUPS

November 30, 201211g R2 RAC, 11g R2 RMAN, UncategorizedAnju Garg

During a restore operation, RMAN automatically performs the autolocation of backups. A channel connected to a specific node only attempts to restore files that were backed up to the node. For example, assume that example tablespace is backed up to the drive attached to node1, while tablespace users is backed up to the drive attached to node2. If you then allocate channels that connect to each node, then the channel connected to node1 can restore example tablespace (but not users), and the channel connected to node2 can restore users tablespace (but not example).

Let’s demonstrate in the two node RAC setup I have:

Overview :

– Create folder for backups on both the nodes with appropriate permissions.

– Take backup of example and users tablespaces on node1 and node2 by allocating channels explicitly

– Take users and example tablespaces offline in immediate mode so that they will need media recovery to come online.

– Restore both tablespaces from backups on node1 and node2 by allocating channels explicitly to corresponding instances

– Note that RMAN automatically finds out that backup for example is stored on node1 and backup for users is stored on node2

and restores them from respective nodes.

Implementation:

– Create folder for backups on both the nodes with appropriate permissions.

[oracle]$mkdir /u01/app/oracle/oradata/orcl/backup

-- Take backup of example and users tablespaces on node1 and node2 by allocating channels explicitly

– Note that backup of tablespace example is created on node1 using channel c1

and backup of tablespace users is created on node1 using channel c2

RMAN> run {

allocate channel c1 device type disk

format ‘/u01/app/oracle/oradata/orcl/backup/orcl%U.bak’ connect=’sys/oracle@orcl1′;

allocate channel c2 device type disk

format ‘/u01/app/oracle/oradata/orcl/backup/orcl%U.bak’ connect=’sys/oracle@orcl2′;

backup tablespace users, example;

}

allocated channel: c1

channel c1: SID=57 instance=orcl1 device type=DISK

allocated channel: c2

channel c2: SID=46 instance=orcl2 device type=DISK

channel c1: specifying datafile(s) in backup set

input datafile file number=00005 name=+DATA/orcl/datafile/example.264.799999785

channel c2: specifying datafile(s) in backup set

input datafile file number=00004 name=+DATA/orcl/datafile/users.259.799999587

channel c1: finished piece 1 at 30-NOV-12

piece handle=/u01/app/oracle/oradata/orcl/backup/orcl0snrivk1_1_1.bak tag=TAG20121130T044136

channel c2: finished piece 1 at 30-NOV-12

piece handle=/u01/app/oracle/oradata/orcl0tnrivk1_1_1.bak tag=TAG20121130T044136

-- Take users and example tablespaces offline in immediate mode so that they will need media recovery to come online.

SQL>alter tablespace users offline immediate;

alter tablespace example offline immediate;

alter system switch logfile;

alter tablespace users online;

alter tablespace example online;

ERROR at line 1:

ORA-01113: file 4 needs media recovery

ORA-01110: data file 4: ‘+DATA/orcl/datafile/users.259.799999587′

ERROR at line 1:

ORA-01113: file 5 needs media recovery

ORA-01110: data file 5: ‘+DATA/orcl/datafile/example.264.799999785′

– Restore both tablespaces from backups on node1 and node2 by allocating channels explicitly to corresponding instances

– Note that RMAN automatically finds out that backup for example is stored on node1 and backup for users is stored on node2

and restores them from respective nodes.

RMAN> run {

allocate channel c1 device type disk

connect=’sys/oracle@orcl1′;

allocate channel c2 device type disk

connect=’sys/oracle@orcl2′;

restore tablespace users, example;

}

allocated channel: c1

channel c1: SID=57 instance=orcl1 device type=DISK

allocated channel: c2

channel c2: SID=46 instance=orcl2 device type=DISK

channel c1: restoring datafile 00005 to +DATA/orcl/datafile/example.264.799999785

channel c1: reading from backup piece /u01/app/oracle/oradata/orcl/backup/orcl0snrivk1_1_1.bak

channel c2: restoring datafile 00004 to +DATA/orcl/datafile/users.259.799999587

channel c2: reading from backup piece /u01/app/oracle/oradata/orcl/backup/orcl0tnrivk1_1_1.bak

– Recover tablespaces and bring them online

RMAN>recover tablespace example, users;

SQL> alter tablespace example online;

alter tablespace users online;

I hope you found the article useful. Keep visiting the blog …

———————————————————————

Related links:

Home

11G R2 RAC INDEX
11G RMAN INDEX

11g R2 RAC: HOW TO FIND THE RESOURCE MASTER?

November 27, 201211g R2 RAC, UncategorizedAnju Garg

In this post, I will demonstrate three methods to find the master of a resource(block).

In RAC, every data block is mastered by an instance. Mastering a block simply means that master instance keeps track of the state of the block until the next reconfiguration event.

– Remastering can be triggered as result of

– Manually

– Resource affinity

– Instance crash

- Method – I gets info about master node from v$gcspfmaster_info using data_object_id

- Method – II gets info about master node from v$dlm_ress and v$ges_enqueue using resource name in hexadecimal format

- Method – III gets info about master node from x$kjbl with x$le using resource name in hexadecimal format

– CURRENT SCENARIO -

- 3 node setup

- name of the database – orcl

— SETUP –

SYS@NODE1>create table scott.emp1 as

select * from scott.emp;

– Get data_object_id for scott.emp1

SQL> col owner for a10
col data_object_id for 9999999
col object_name for a15
select owner, data_object_id, object_name
from dba_objects
where owner = ‘SCOTT’
and object_name = ‘EMP1′;

OWNER DATA_OBJECT_ID OBJECT_NAME

———- ————– —————

SCOTT 74652 EMP1

For Method-II and Method-III, we need to find out file_id and block_id and hence GCS resource name in hexadecimal format

— Get File_id and range of block_ids of emp1 table

– It can be seen that emp1 lies in block 523 of file 4.

SQL>select dbms_rowid.rowid_relative_fno(rowid) FILE_NO,
min(dbms_rowid.rowid_block_number(rowid)) MIN_BLOCK_ID,
max(dbms_rowid.rowid_block_number(rowid)) MAX_BLOCK_ID
from scott.emp1
group by dbms_rowid.rowid_relative_fno(rowid);

FILE_NO MIN_BLOCK_ID MAX_BLOCK_ID

———- ———— ————

4 523 523

– Find the GCS resource name to be used in the query using blodk_id and data_object_id retrieved above.

x$kjbl.kjblname = resource name in hexadecimal format([id1],[id2],[type]

x$kjbl.kjblname2 = resource name in decimal format

Hexname will be used to query resource master using v$dlm_ress , v$ges_enqueue, $kjbl

and x$le

SQL> col hexname for a25
col resource_name for a15
select b.kjblname hexname, b.kjblname2 resource_name
from x$le a, x$kjbl b
where a.le_kjbl=b.kjbllockp
and a.le_addr = ( select le_addr
from x$bh
where dbablk = 523
and obj = 74652
and class = 1
and state <> 3);

HEXNAME RESOURCE_NAME

————————- —————

[0x20b][0x4],[BL] 523,4,BL

– Manually master the EMP table to node1 –

SYS@NODE1>oradebug lkdebug -m pkey <objectid>

SYS@NODE1>oradebug lkdebug -m pkey 74652

—- GET RESOURCE MASTER NAME ———-

Method – I gets info about master node from v$gcspfmaster_info using data_object_id

– ——-

– Note that current master is node1 (Node numbering starts from 0)

SYS@node1>col object_name for A10

select o.object_name, m.CURRENT_MASTER
from dba_objects o, v$gcspfmaster_info m
where o.data_object_id=74652
and m.data_object_id = 74652 ;

OBJECT_NAM CURRENT_MASTER

———- ————–

EMP1 0

—- Method II gets info about master node from v$dlm_ress and v$ges_enqueue

using resource name in hexadecimal format

– check that master node is node1 (node numbering starts with 0)

SYS@NODE1>col resource_name for a22 select a.resource_name, a.master_node
from v$dlm_ress a, v$ges_enqueue b
where upper(a.resource_name) = upper(b.resource_name1)
and a.resource_name like ‘%[0x20b][0x4],[BL]%';

RESOURCE_NAME MASTER_NODE

———————- ———–

[0x20b][0x4],[BL] 0

Method – III gets info about master node from x$kjbl with x$le

using resource name in hexadecimal format

– This SQL joins x$kjbl with x$le to retrieve resource master for a block

– Note that current master is node1(MASTER=0)

SYS@NODE1> select kj.kjblname, kj.kjblname2, kj.kjblmaster master
from (select kjblname, kjblname2, kjblowner, kjblmaster, kjbllockp
from x$kjbl
where kjblname = ‘[0x20b][0x4],[BL]’
) kj, x$le le
where le.le_kjbl = kj.kjbllockp
order by le.le_addr;

KJBLNAME KJBLNAME2 MASTER

—————————— —————————— ———-

[0x20b][0x4],[BL] 523,4,BL 0

I hope you found this information useful.

Keep visiting the blog !!!

——————————————————————————————–

Home

11g R2 RAC Index

11g R2 RAC : Dynamic Remastering

11g R2 RAC: How To Find The Master Node?

11G R2 RAC: HOW TO IDENTIFY THE MASTER NODE IN RAC

November 24, 201211g R2 RAC, UncategorizedAnju Garg

In this post, I will demonstrate three methods to identify the oracle clusterware’s master node. Pls note that clusterware master is different from Resource master in oracle database instance. To know about how to find the resource master, pls click here:.
Importance of master node in a cluster:
- Master node has the least Node-id in the cluster. Node-ids are assigned to the nodes in the same order as the nodes join the cluster. Hence, normally the node which joins the cluster first is the master node.

- CRSd process on the Master node is responsible to initiate the OCR backup as per the backup policy
- Master node is also responsible to sync OCR cache across the nodes
- CRSd process oth the master node reads from and writes to OCR on disk
- In case of node eviction, The cluster is divided into two sub-clusters. The sub-cluster containing fewer no. of nodes is evicetd. But, in case both the sub-clusters have same no. of nodes, the sub-cluster having the master node survives whereas the other sub-cluster is evicted.

Oracle ClusterWare master’s information can be found

- by scanning ocssd logs from various nodes
- by scanning crsd logs from various nodes.
- by identifying the node which takes the backup of the OCR.

If master node gets evicted/rebooted, another node becomes the master.

I have a 3 node setup. I check the ocssd logs on the 3 nodes for the string ‘master node’ and note that node 3 is the master node.

[grid@host01 root]$ cat $ORACLE_HOME/log/host01/cssd/ocssd.log |grep ‘master node’ |tail -1

2012-11-23 10:14:36.949: [ CSSD][2778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954981 with 3 nodes, local node number 1, master node number 3

[root@host02 cssd]# cat $ORACLE_HOME/log/host02/cssd/ocssd.log |grep ‘master node’ |tail -1

2012-11-23 10:14:36.953: [ CSSD][778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954981 with 3 nodes, local node number 2, master node number 3

[root@host03 ~]# cat $ORACLE_HOME/log/host03/cssd/ocssd.log |grep ‘master node’ |tail -1

2012-11-23 10:14:37.001: [ CSSD][778700688]clssgmCMReconfig: reconfiguration successful, incarnation 248954981 with 3 nodes, local node number 3, master node number 3

If I take the OCR backup right now, it will be taken by node3 (master node).

[root@host02 cssd]# ocrconfig -manualbackup

host03 2012/11/24 09:54:48 /u01/app/11.2.0/grid/cdata/cluster01/backup_20121124_095448.ocr

Let us check crsd logs of various nodes, looking for the string OCR MASTER.Note that node3 is the master node presently.

[grid@host01 crsd]$ cat /u01/app/11.2.0/grid/log/host01/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1

2012-11-23 10:15:01.403: [ OCRMAS][2877356944]th_master: NEW OCR MASTER IS 3

[root@host02 crsd]# cat /u01/app/11.2.0/grid/log/host02/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1

2012-11-23 10:15:03.561: [ OCRMAS][876976016]th_master: NEW OCR MASTER IS 3

[root@host03 crsd]# cat /u01/app/11.2.0/grid/log/host03/crsd/crsd.log |grep ‘OCR MASTER’ | tail -3

2012-11-23 10:11:18.499: [ OCRMAS][877467536]th_master:13: I AM THE NEW OCR MASTER at incar 44. Node Number 3

[

Let me reboot node3 and check which node is assigned the mastership now.

[root@host03 ~]# init 6

check the ocssd logs on the remaining two nodes (node1 and node2) for the string ‘master node’ and note that node 1 is the master node.

[grid@host01 root]$ cat $ORACLE_HOME/log/host01/cssd/ocssd.log |grep ‘master node’ |tail -1

2012-11-24 10:09:23.522: [ CSSD][2778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954982 with 2 nodes, local node number 1, master node number 1

[root@host02 cssd]# cat $ORACLE_HOME/log/host02/cssd/ocssd.log |grep ‘master node’ |tail -1

2012-11-24 10:09:23.502: [ CSSD][778696592]clssgmCMReconfig: reconfiguration successful, incarnation 248954982 with 2 nodes, local node number 2, master node number 1

As can be seen from ocssd logs of the remaining two nodes, node1 has become the master now.

Now If I take the OCR backup, it is taken by node1 while earlier backups were taken by node3 which was the then master.

[root@host02 cssd]# ocrconfig -manualbackup

host01 2012/11/24 10:12:29 /u01/app/11.2.0/grid/cdata/cluster01/backup_20121124_101229.ocr

host03 2012/11/24 09:54:48 /u01/app/11.2.0/grid/cdata/cluster01/backup_20121124_095448.ocr

Let us check crsd logs of various nodes, looking for the string OCR MASTER

[grid@host01 crsd]$ cat /u01/app/11.2.0/grid/log/host01/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1

2012-11-24 10:08:45.884: [ OCRMAS][877356944]th_master:13: I AM THE NEW OCR MASTER at incar 47. Node Number 1

[root@host02 crsd]# cat /u01/app/11.2.0/grid/log/host02/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1

2012-11-24 10:08:45.364: [ OCRMAS][876976016]th_master: NEW OCR MASTER IS 1

[root@host03 crsd]# cat /u01/app/11.2.0/grid/log/host03/crsd/crsd.log |grep ‘OCR MASTER’ | tail -1

2012-11-24 10:12:20.282: [ OCRMAS][877422480]th_master: NEW OCR MASTER IS 1

Hope you enjoyed reading !!

Keep visiting the blog ….

Regards
——————————————————————————————–Related links:

Home

11g R2 RAC Index

11g R2 RAC : Dynamic Remastering

11g R2 RAC: How To Find The Resource Master?

——————

DUPLICATE DATABASE WITH CONNECTION TO RECOVERY CATALOG

November 22, 201211g R2 RMAN, UncategorizedAnju Garg

In this post, I will demonstrate how to duplicate a database using its backups.

This method requires connection to recovery catalog only to get information about the

backups. Connection with the target database is not needed.

********************************

source database orcl (ASM)

Duplicate database orclt (ASM)

Recovery catalog database : rcat

***********************************

Overview:

on the source host

- Create recovery catalog in rcat database
- Register source database orcl with recovery catalog
- BACKUP DATABASE PLUS ARCHIVELOG AND CONTROLFILE while connected to recovery catalog
- Copy these backup files to the server where you want to create the duplicate copy.
- CREATE PFILE FROM SOURCE DATABASE

on the target host

- Add a line in the file /etc/oratab to reflect the database instance(orclt) you are going
to copy
- create folders
- Copy the initialization parameter file from the source database add edit it.
- Copy the password file
- Startup the duplicate database (orclt) in nomount mode using modified parameter file
- Using RMAN connect to the catalog database(rcat) as target database and duplicate
database (orclt) as auxiliary instance
- duplicate the database orcl to orclt

———————-

on the source host

———————-

- Create recovery catalog in rcat database

.create tablespace to hold recovery catalog data –

sql>create tablespace rcat datafile '+data/rcat01.dbf' size 15m;

. create recovery catalog owner

sql>create user rcatowner identified by oracle

    default tablespace rcat quota unlimited on rcat;

    grant recovery_catalog_owner to rcatowner;

. Create recovery catalog

$rman catalog rcatowner/oracle@rcat

RMAN>CREATE CATALOG;

- Register source database orcl with recovery catalog

$RMAN TARGET sys/oracle@orcl CATALOG rcatowner/oracle@rcat

RMAN>REGISTER DATABASE;

– Make a folder to stage the backup

oracle$mkdir /home/orale/stage

– Take the backup of the source database

oracle$. oraenv orcl

         rman target /

RMAN>CONFIGURE CONTROLFILE AUTOBACKUP ON;

     backup database format '/home/oracle/stage/%U.rmb';

     backup archivelog all  format '/home/oracle/stage/arc_%r_%s_%t.log';

The controlfile backup is also required. If you have configured the controlfile

autobackup, the backup would contain the controlfile as well. If you want to be sure,

ors you have not configured controlfile autobackup, you can backup the controlfile

explicitly.

– CREATE PFILE FROM SOURCE DATABASE

SQL>CREATE PFILE='/u01/app/oracle/oradata/orcl/initsource.ora'

            FROM SPFILE

———————

on the target host.

————————

— Copy the backup files to the server where you want to create the duplicate copy.

$mkdir -p /home/oracle/stage

 scp sourcehost:/home/oracle/stage/*.rmb desthost::/home/oracle/stage/

– Add a line in the file /etc/oratab to reflect the database instance you are going to copy:

orclt:/u01/app/oracle/product/11.2.0/db_1:N

– Now set the Oracle SID as the duplicated database SID:

# . oraenv

ORACLE_SID = [orclt] ?

– create folders

 $mkdir -p /u01/app/oracle/oradata/orclt

  mkdir -p /u01/app/oracle/flash_recovery_area/orclt

  mkdir -p /u01/app/oracle/admin/orclt/adump

  mkdir -p /u01/app/oracle/admin/orclt/dpdump

– Copy the initialization parameter file from the main database.

– If u are duplicating on the same host

$cp  /u01/app/oracle/oradata/orcl/initsource.ora

/u01/app/oracle/oradata/orclt/inittarget.ora

– If u are duplicating on the different host

$scp  soucehost:/u01/app/oracle/oradata/orcl/initsource.ora

/u01/app/oracle/oradata/orclt/inittarget.ora

– Edit parameter file

$vi /u01/app/oracle/oradata/orclt/inittarget.ora

– Change db_name = orclt

– Edit it to reflect the new locations that might be appropriate

such as control file locations,audit dump destinations, datafile

locations, etc.

– add these lines –

db_file_name_convert = (“+DATA/orcl”,”+DATA/orclt”)

log_file_name_convert = (“+DATA/orcl”,”+DATA/orclt”, “+FRA/orcl”,”+FRA/orclt”)

————————————–

– Copy the password file as well

$ cp /u01/app/oracle/product/11.2.0/db_1/dbs/orapworcl  /u01/app/oracle/product/11.2.0/db_1/dbs/orapworclt

$ scp sourcehost:/u01/app/oracle/product/11.2.0/db_1/dbs/orapworcl  /u01/app/oracle/product/11.2.0/db_1/dbs/orapworclt

– Startup the duplicate database in nomount mode using modified parameter file

$ . oraenv

ORACLE_SID = [orclt] ?

$sqlplus sys/oracle as sysdba

SQL> startup nomount pfile='/u01/app/oracle/oradata/orclt/inittarget.ora';

          create spfile from pfile='/u01/app/oracle/oradata/orclt/inittarget.ora';

- Using RMAN connect to the catalog database (rcat) and duplicate database (orclt) as

auxiliary instance

$rman catalog rcatowner/oracle@rcat  auxiliary sys/oracle@orclt

– duplicate the database orcl to orclt

– the command performs the following steps:

* Creates an SPFILE

* Shuts down the instance and restarts it with the new spfile

* Restores the controlfile from the backup

* Mounts the database

* Performs restore of the datafiles. In this stage it creates the files in the

converted names.

* Recovers the datafiles up to the time specified and opens the database

RMAN>duplicate  database orcl to orclt;

– check that duplicate database is up

$sqlplus / as sysdba

sql>conn hr/hr

    select * from tab;

– Note that DBID is different from the main database so it can be backed up

independently and using the same catalog as well.

SQL> select dbid from v$database;

DBID

———-

3780202035

SQL>conn sys/oracle@orcl as sysdba

    select dbid from v$database;

DBID

———-

1327232823

——————————————————————————————–

—

11G R2 RAC: GPNP PROFILE DEMYSTIFIED

November 19, 201211g R2 RAC, UncategorizedAnju Garg

In this post, I will explain what is GPnP profile, what does it contain and how is it used by clusterware.

WHAT IS GPNP PROFILE?

The GPnP profile is a small XML file located in GRID_HOME/gpnp/<hostname>/profiles/peer under the name profile.xml. It is used to establish the correct global personality of a node. Each node maintains a local copy of the GPnP Profile and is maintanied by the GPnP Deamon (GPnPD) .

WHAT DOES GPNP PROFILE CONTAIN?

GPnP Profile is used to store necessary information required for the startup of Oracle Clusterware like SPFILE location,ASM DiskString etc.

It contains various attributes defining node personality.

- Cluster name

- Network classifications (Public/Private)

- Storage to be used for CSS

- Storage to be used for ASM : SPFILE location,ASM DiskString etc

- Digital signature information : The profile is security sensitive. It might identify the storage to be used as the root partition of a machine. Hence, it contains digital signature information of the provisioning authority.

Here is the GPnP profile of my RAC setup.

gpnptool can be used for reading/editing the gpnp profile.

[root@host01 peer]# gpnptool get

<?xml version=”1.0″ encoding=”UTF-8″?><gpnp:GPnP-Profile Version=”1.0″ xmlns=”http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:gpnp=”http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:orcl=”http://www.oracle.com/gpnp/2005/11/gpnp-profile” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=”http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd” ProfileSequence=”7″ ClusterUId=”14cddaccc0464f92bfc703ec1004a386″ ClusterName=”cluster01″ PALocation=””><gpnp:Network-Profile><gpnp:HostNetwork id=”gen” HostName=”*”><gpnp:Network id=”net1″ IP=”192.9.201.0″ Adapter=”eth0″ Use=”public”/><gpnp:Network id=”net2″ IP=”10.0.0.0″ Adapter=”eth1″ Use=”cluster_interconnect”/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id=”css” DiscoveryString=”+asm” LeaseDuration=”400″/><orcl:ASM-Profile id=”asm” DiscoveryString=”” SPFile=”+DATA/cluster01/asmparameterfile/registry.253.783619911″/><ds:Signature xmlns:ds=”http://www.w3.org/2000/09/xmldsig#”><ds:SignedInfo><ds:CanonicalizationMethod Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#”/><ds:SignatureMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#rsa-sha1″/><ds:Reference URI=””><ds:Transforms><ds:Transform Algorithm=”http://www.w3.org/2000/09/xmldsig#enveloped-signature”/><ds:Transform Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#”> <InclusiveNamespaces xmlns=”http://www.w3.org/2001/10/xml-exc-c14n#” PrefixList=”gpnp orcl xsi”/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#sha1″/><ds:DigestValue>4VMorzxVNa+FeOx2SCk1unVBpfU=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>bbzV04n2zSGTtUEvqqB+pjw1vH7i8MOEUqkhXAyloX0a41T2FkDEA++ksc0BafndAk7tR+6LGdppE1aOsaJUtYxQqaHJdpVsJF+sj2jN7LPJlT5NBt+K7b08TLjDID92Se6vEiDAeeKlEbpVWKMUIvQvp6LrYK8cDB/YjUnXuGU=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>

WHO UPDATES GPNP PROFILE?

GPnPd daemon replicates changes to the profile during

- installation,
- system boot or
- when updated

Profile is updated Whenever changes are made to a cluster with configuration tools like

. oifcfg (Change network),
. crsctl (change location of voting disk),
. asmcmd (change ASM_DISKSTRING, SPfile location) etc.

HOW IS GPNP PROFILE USED BY CLUSTERWARE?

To start clusterware, voting disk needs to be accessed. If voting disk is on ASM, this information (that voting disk is on ASM) is read from GPnP profile (<orcl:CSS-Profile id=”css” DiscoveryString=”+asm” LeaseDuration=”400″/>). The voting disk is read using kfed utility even if ASM is not up.

Next, the clusterware checks if all the nodes have the updated GPnP profile and the node joins the cluster based on the GPnP configuration . Whenver a node is started/added to the cluster, the clusterware software on the starting node starts a GPnP agent.

- If the node is already part of the cluster, the GPnP agent reads the existing profile on that node.
- If the node is being added to the cluster, GPnP agent locates agent on another existing node using multicast protocol (provided by mDNS) and gets the profile from that agent.

Next CRSD needs to read OCR to startup various resources on the node and hence update it as status of resources changes. Since OCR is also on ASM, location of ASM SPfile should be known.

The order of searching the ASM SPfile is

- GPnP profile
- ORACLE_HOME/dbs/spfile<sid.ora>
- ORACLE_HOME/dbs/init<sid.ora>

In cluster environment, the location of SPfile for ASMread from GPnP profile.

[grid@host01 peer]$ gpnptool getpval -asm_spf

Warning: some command line parameters were defaulted. Resulting command line:

/u01/app/11.2.0/grid/bin/gpnptool.bin getpval -asm_spf -p=profile.xml -o-

+DATA/cluster01/asmparameterfile/registry.253.793721441

The oputput of the query shows that SPfile is on ASM in DATA diskgroup. To find out the

location of ASM disks, following query is issued :

[root@host01 peer]# gpnptool getpval -asm_dis

ASM-Profile id=”asm” DiscoveryString=””

The device headers of every device in the disk string returned by the above query are scanned (if configured by you at ASM initial setup time). Here Discovery String is blank is as ASMDISKSTRINGS parameter has not been set. Hence, headers of all the ASM disks are scanned .

Here, I have shown the output of the query only on the disk which contains SPfile.(spfflg is not null)

[root@host01 ~]# kfed read /dev/sdb3 | grep -E ‘spf|ausize’

kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000

kfdhdb.spfile: 16 ; 0x0f4: 0x00000010

kfdhdb.spfflg: 1 ; 0x0f8: 0x00000001

In the output above, we see that

the device /dev/sdb3 contains a copy of the ASM spfile (spfflg=1).

The ASM spfile location starts at the disk offset of 16 (spfile=16)

Considering the allocation unit size (kfdhdb.ausize = 1M), let’s dump the ASM spfile from the device:

[root@host01 ~]# dd if=/dev/sdb3 of=spfileASM_Copy2.ora skip=16 bs=1M count=1

1+0 records in

1+0 records out

1048576 bytes (1.0 MB) copied, 0.170611 seconds, 6.1 MB/s

[root@host01 ~]# strings spfileASM_Copy2.ora

+ASM1.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value

+ASM2.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value

+ASM3.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value

+ASM3.asm_diskgroups=’FRA’#Manual Mount

+ASM2.asm_diskgroups=’FRA’#Manual Mount

+ASM1.asm_diskgroups=’FRA’#Manual Mount

*.asm_power_limit=1

*.diagnostic_dest=’/u01/app/grid’

*.instance_type=’asm’

*.large_pool_size=12M

*.remote_login_passwordfile=’EXCLUSIVE’

Using the parameters in SPfile, ASM is started.

Once ASM is up, OCR is read by CRSD and various resources on the node are started.

Each node reads network information in GPnP profile and using GNS, negotiates appropriate network identity for itself . Hence, nodes can be dynamically added/deleted.

What happens if GPnP profile is lost?

To know please click here.

————————————————————

GPNPTOOL COMMAND REFERENCE:

- How to read the profile

[root@inssc3 bin]# ./gpnptool get

- How to find GPnP Deamons are running on the local node

[root@host01 peer]# gpnptool lfind

Success. Local gpnpd found.

- How to find the location of ASM spfile if the ASM is down

[root@host01 peer]# gpnptool getpval -asm_spf

+DATA/cluster01/asmparameterfile/registry.253.783619911

- How to find all RD-discoverable resources of given type

[root@host01 peer]# gpnptool find

Found 3 instances of service ‘gpnp’.

mdns:service:gpnp._tcp.local.://host03:18015/agent=gpnpd,cname=cluster01,host=host03,pid=5066/gpnpd h:host03 c:cluster01

mdns:service:gpnp._tcp.local.://host02:17637/agent=gpnpd,cname=cluster01,host=host02,pid=5236/gpnpd h:host02 c:cluster01

mdns:service:gpnp._tcp.local.://host01:16633/agent=gpnpd,cname=cluster01,host=host01,pid=5206/gpnpd h:host01 c:cluster01

References:

http://ebobs.wordpress.com/2010/09/15/oracles-gpnp/

http://community.oraclepressbooks.com/downloads/tcoug-top-10-asm-questions.pdf

Oracle Grid Infrastructure: How to recover from a messed up ASM/CSSD diskstring

—————

11g R2 RMAN: TRANSPORT DATABASE FROM LINUX TO WINDOWS

November 19, 201211g R2 RMAN, UncategorizedAnju Garg

In this post, I will demonstrate transporting a database on Linux 32 bit platform to Windows 32 bit.

Overview

TDB requires that data files be converted to the target platform format. The data file

conversion can occur on either the source system or the target system. When performing a

source system conversion, TDB creates a second copy of all data files on the source system

in the format of the target system. The converted data files must then be transferred to the

proper location on the target system.

Using TDB to migrate a database to a new platform of the same endian format consists of the following high-level steps:

1. Check prerequisites

2. Prepare for the platform migration

3. Start the database in READ ONLY mode

4. Verify the database is ready for migration

5. Run the RMAN CONVERT DATABASE command

6. Move necessary files to the target system

7. Complete the migration

Implementation:

– Check Prerequisites

– check the platform name of current database –

SQL>SELECT PLATFORM_NAME

FROM V$DATABASE;

PLATFORM_NAME

——————————

Linux IA (32-bit)

– Check the endian format of current platform

– Note that ENDIAN format is little

SQL>SELECT PLATFORM_ID, PLATFORM_NAME, ENDIAN_FORMAT

FROM V$TRANSPORTABLE_PLATFORM

WHERE UPPER(PLATFORM_NAME) LIKE ‘%LINUX IA%';

PLATFORM_ID PLATFORM_NAME ENDIAN_FORMAT

———– —————————— ————–

10 Linux IA (32-bit) Little

11 Linux IA (64-bit) Little

– Check that endian format of the target platform i.e. Windows IA(32-bit) is also little

– Hence database can be transported

SQL>SELECT PLATFORM_ID, PLATFORM_NAME, ENDIAN_FORMAT

FROM V$TRANSPORTABLE_PLATFORM

WHERE UPPER(PLATFORM_NAME) LIKE ‘%MICROSOFT WINDOWS%';

PLATFORM_ID PLATFORM_NAME ENDIAN_FORMAT

———– —————————— ————–

7 Microsoft Windows IA (32-bit) Little

8 Microsoft Windows IA (64-bit) Little

12 Microsoft Windows x86 64-bit Little

If the target platform does not appear in the output from V$TRANSPORTABLE_PLATFORM, then the database cannot be migrated using TDB.

— Start the database in READ ONLY mode

SQL> shutdown immediate;

startup mount;

alter database open read only;

— Verify the database is ready for migration

SQL>

SET SERVEROUTPUT ON;

DECLARE

db_ready BOOLEAN;

BEGIN

db_ready :=

DBMS_TDB.CHECK_DB(‘Microsoft Windows IA (32-bit)’,DBMS_TDB.SKIP_READONLY);

if db_ready

then

dbms_output.put_line(‘YES your database can be transported to Windows

platform’);

else

dbms_output.put_line(‘NO your database can not be transported to WIndows

Platform’);

end if;

end;

– Make a folder to hold converted datafiles

$mkdir /home/oracle/convertdb

— Run the RMAN CONVERT DATABASE Command

The CONVERT DATABASE command specified in the example creates

– a transport script named /home/oracle/convertdb/transport.sql which contains SQL statements used to create the new database on the destination platform,

– a PFILE initnewdb.ora in file /home/oracle/convertdb for use with the new database on the destination platform, containing settings used from the source database. Several entries at the top of the PFILE should be edited when the database is moved to the destination platform

– a copy of all data files in the =/home/oracle/convertdb directory in the format of the target platform ‘Microsoft Windows IA (32-bit)

RMAN>CONVERT DATABASE

NEW DATABASE ‘newdb’

TRANSPORT SCRIPT ‘/home/oracle/convertdb/transport.sql’

TO PLATFORM ‘Microsoft Windows IA (32-bit)’

FORMAT ‘/home/oracle/convertdb/newdb’

DB_FILE_NAME_CONVERT ‘/u01/app/oracle/oradata/linuxdb/’ ‘/home/oracle/convertdb/';

Starting conversion at source at 18-NOV-12

….

channel ORA_DISK_1: starting datafile conversion

input datafile file number=00001 name=/u01/app/oracle/oradata/linuxdb/system01.dbf

converted datafile=/home/oracle/convertdb/system01.dbf

…

input datafile file number=00004 name=/u01/app/oracle/oradata/linuxdb/users01.dbf

converted datafile=/home/oracle/convertdb/users01.dbf

channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:01

Edit init.ora file /home/oracle/convertdb/init_newdb.ora. This PFILE will be used to create

the database on the target platform

Run SQL script /home/oracle/convertdb/transport.sql on the target platform to create

database

To recompile all PL/SQL modules, run utlirp.sql and utlrp.sql on the target platform

To change the internal database identifier, use DBNEWID Utility

Finished conversion at source at 18-NOV-12

– Review and Edit the PFILE to change any settings for the destination database.

$vi /home/oracle/convertdb/init_newdb.ora

# Please change the values of the following parameters:

control_files = “c:\app\administrator\oradata\newdb\control01.ctl”

db_recovery_file_dest=”c:\app\administrator\flash_recovery_area”

audit_file_dest = “c:\app\administrator\admin\newdb\adump”

– Review and edit the transport script transport.sql

$vi /home/oracle/convertdb/transport.sql

– Modify the following :

STARTUP NOMOUNT PFILE=’C:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DBS

\initnewdb.ora';

LOGFILE

GROUP 1 ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\REDO01.LOG’ SIZE 50M BLOCKSIZE 512,

GROUP 2 ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\REDO02.LOG’ SIZE 50M BLOCKSIZE 512,

GROUP 3 ‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\REDO03.LOG’ SIZE 50M BLOCKSIZE 512

DATAFILE

‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\system01.dbf’,

‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\sysaux01.dbf’,

‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\undotbs01.dbf’,

‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\users01.dbf’,

‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\example01.dbf’

ALTER TABLESPACE TEMP

ADD TEMPFILE

‘C:\APP\ADMINISTRATOR\ORADATA\NEWDB\TEMP01.DBF’

SIZE 20971520 AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M;

STARTUP UPGRADE PFILE=’C:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DBS\initnewdb.ora';

@@ ?\rdbms\admin\utlirp.sql

SHUTDOWN IMMEDIATE

STARTUP PFILE=’C:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DBS\initnewdb.ora';

– Make folders on target windows host

c:\>mkdir c:\app\administrator\oradata\newdb

mkdir c:\app\administrator\admin\newdb\adump”

– Copy all the files in /home/oracle/convertdb folder to c:\app\administrator\oradata\newdb

i.e. datafiles

Pfile

transport script

– Move PFILE to $ORACLE_HOME/database i.e. c:\app\administrator\product

\11.2.0\dbhome_1\dbs

C:\>copy c:\app\administrator\oradata\newdb\init_newdb.ora c:\app\administrator\product

\11.2.0\dbhome_1\dbs\initnewdb.ora

– create a service for newdb on windows

c:\>oradim -new -sid newdb

– Execute the transport script in SQL*Plus to create the new database on the destination host.

c:\>set oracle_sid=newdb

sqlplus / as sysdba

SQL>@c:\app\administrator\oradata\newdb\transport.sql

– check that database has been transported —

SQL>select * from hr.employees;

——————————————————————————-

Related links:

HOME

11G R2 RMAN INDEX

DUPLICATE DATABASE USING BACKUP WITH CONNECTION TO TARGET DATABASE

DUPLICTAE DATABASE WITHOUT CONNECTION TO TARGET DATABASE

DUPLICATE DATABASE WITH CONNECTION TO RECOVERY CATALOG

ORACLE 11G :ACTIVE DATABASE DUPLICATION

———————–

11g R2 RAC: NODE EVICTION DUE TO MISSING NETWORK HEARTBEAT

November 17, 201211g R2 RAC, UncategorizedAnju Garg

In this post, I will demonstrate node eviction due to missing netsork heartbeat i.e. a node will be evicted from the cluster, if it can’t communicate wioth other nodes in the cluster. To simulate it, I will stop private network on one of the nodes and then scan alert logs of the surviving nodes.

Current scenario:

No. of nodes in the cluster : 3

Names of the nodes : host01, host02, host03

Name of the cluster database : orcl

I will stop PVT. network service on host03 so that it is evicted.

– Find out the pvt network name

[root@host03 ~]# oifcfg getif

eth0 192.9.201.0 global public

eth1 10.0.0.0 global cluster_interconnect

– Stop pvt. network service on host03 so that it can’t communicate with host01 and host02 and will be evicted.

[root@host03 ~]# ifdown eth1

——————-

OCSSD log of host03

——————–

It can be seen that CSSD process of host03 can’t communicate with host01 and host02

at 09:43:52

Hence votedisk timeouot is set to Short Disk Time OUT (SDTO) = 27000 ms (27 secs)

2012-11-19 09:43:52.714: [ CSSD][843736976]clssnmPollingThread: node host01 (1) at 50% heartbeat fatal, removal in 14.880 seconds

2012-11-19 09:43:52.714: [ CSSD][843736976]clssnmPollingThread: node host01 (1) is impending reconfig, flag 132108, misstime 15120

2012-11-19 09:43:52.714: [ CSSD][843736976]clssnmPollingThread: node host02 (2) at 50% heartbeat fatal, removal in 14.640 seconds

2012-11-19 09:43:52.714: [ CSSD][843736976]clssnmPollingThread: node host02 (2) is impending reconfig, flag 132108, misstime 15360

2012-11-19 09:43:52.714: [ CSSD][843736976]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)

2012-11-19 09:43:52.927: [ CSSD][2833247120]clssnmSendingThread: sending status msg to all nodes

———————–

– Alert log of host03

———————–

– At 09:43:52, CSSD process host03 identifies that it can’t communicate with CSSD on host02 and host03

[cssd(5124)]CRS-1612:Network communication with node host01 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.880 seconds

2012-11-19 09:43:52.714

[cssd(5124)]CRS-1612:Network communication with node host02 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.640 seconds

2012-11-19 09:44:01.880

[cssd(5124)]CRS-1611:Network communication with node host01 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.790 seconds

2012-11-19 09:44:01.880

[cssd(5124)]CRS-1611:Network communication with node host02 (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.550 seconds

2012-11-19 09:44:06.536

[cssd(5124)]CRS-1610:Network communication with node host01 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.780 seconds

2012-11-19 09:44:06.536

[cssd(5124)]CRS-1610:Network communication with node host02 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.540 seconds

2012-11-19 09:44:09.599

– At 09:44:16, CSSD process of host03 reboots the node to preserve cluster integrity

[cssd(5124)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/11.2.0/grid/log/host03/cssd/ocssd.log.

2012-11-19 09:44:16.697

[/u01/app/11.2.0/grid/bin/orarootagent.bin(5713)]CRS-5822:Agent ‘/u01/app/11.2.0/grid/bin/orarootagent_root’ disconnected from server. Details at (:CRSAGF00117:) in /u01/app/11.2.0/grid/log/host03/agent/crsd/orarootagent_root/orarootagent_root.log.

2012-11-19 09:44:16.193

[ctssd(5285)]CRS-2402:The Cluster Time Synchronization Service aborted on host host03. Details at (:ctsselect_mmg5_1: in /u01/app/11.2.0/grid/log/host03/ctssd/octssd.log.

2012-11-19 09:44:21.177

——————–

Ocssd log of host01

——————–

– At 09:43:53, CSSD process of host01 identifies that it can’tommunicate with CSSD on host03

2012-11-19 09:43:53.340: [ CSSD][841635728]clssnmPollingThread: node host03 (3) at 50% heartbeat fatal, removal in 14.500 seconds

2012-11-19 09:43:53.340: [ CSSD][841635728]clssnmPollingThread: node host03 (3) is impending reconfig, flag 132110, misstime 15500

2012-11-19 09:43:53.340: [ CSSD][841635728]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)

——————-

Alert log of host01

——————-

– At 09:44:01, alert log of host01 is updated regarding communication failure with host03

[cssd(5308)]CRS-1612:Network communication with node host03 (3) missing for 50% of timeout interval. Removal of this node from cluster in 14.500 seconds

2012-11-19 09:44:01.695

[cssd(5308)]CRS-1611:Network communication with node host03 (3) missing for 75% of timeout interval. Removal of this node from cluster in 7.450 seconds

2012-11-19 09:44:07.666

[cssd(5308)]CRS-1610:Network communication with node host03 (3) missing for 90% of timeout interval. Removal of this node from cluster in 2.440 seconds

2012-11-19 09:44:10.606

[cssd(5308)]CRS-1607:Node host03 is being evicted in cluster incarnation 32819913; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/host01/cssd/ocssd.log.

2012-11-19 09:44:24.705

– At 09:44:24, OHASD process on host01 receives reboot message from host03

[ohasd(4941)]CRS-8011:reboot advisory message from host: host03, component: ag050107, with time stamp: L-2012-11-19-09:44:24.373

[ohasd(4941)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS

2012-11-19 09:44:24.705

[ohasd(4941)]CRS-8011:reboot advisory message from host: host03, component: mo050107, with time stamp: L-2012-11-19-09:44:24.376

[ohasd(4941)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS

2012-11-19 09:44:46.379

[cssd(5308)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 host02 .

——————-

OCSSD log of host02

——————–

– At 09:43:52, CSSD process of host02 identifies communication failure with host03

2012-11-19 09:43:52.385: [ CSSD][841635728]clssnmPollingThread: node host03 (3) at 50% heartbeat fatal, removal in 14.950 seconds

2012-11-19 09:43:52.386: [ CSSD][841635728]clssnmPollingThread: node host03 (3) is impending reconfig, flag 394254, misstime 15050

2012-11-19 09:43:52.386: [ CSSD][841635728]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)

2012-11-19 09:43:52.733: [ CSSD][810166160]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:ASMDISK01 sched delay 970 > margin 750 cur_ms 18331974 lastalive 18331004

——————–

Alert log of host02

———————

– At 09:44:01 (same as host01), alert log of host02 is updated regarding communication failure with host03

[cssd(5284)]CRS-1612:Network communication with node host03 (3) missing for 50% of timeout interval. Removal of this node from cluster in 14.950 seconds

2012-11-19 09:44:01.971

[cssd(5284)]CRS-1611:Network communication with node host03 (3) missing for 75% of timeout interval. Removal of this node from cluster in 6.930 seconds

2012-11-19 09:44:06.750

[cssd(5284)]CRS-1610:Network communication with node host03 (3) missing for 90% of timeout interval. Removal of this node from cluster in 2.920 seconds

2012-11-19 09:44:24.520

– At 09:44:24 (same as host01), OHASD process on host01 receives reboot message from host03

[ohasd(4929)]CRS-8011:reboot advisory message from host: host03, component: ag050107, with time stamp: L-2012-11-19-09:44:24.373

[ohasd(4929)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS

2012-11-19 09:44:24.520

[ohasd(4929)]CRS-8011:reboot advisory message from host: host03, component: mo050107, with time stamp: L-2012-11-19-09:44:24.376

[ohasd(4929)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS

2012-11-19 09:44:46.073

[cssd(5284)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 host02 .

References:

http://www.unbreakablecloud.com/wordpress/2010/11/02/understanding-cluster-node-eviction/

————————————————————————————–

ORACLE IN ACTION

Let's do it simply…

Monthly Archives: November 2012

11g R2 RAC : AUTOLOCATE BACKUPS

Like this:

11g R2 RAC: HOW TO FIND THE RESOURCE MASTER?

Like this:

11G R2 RAC: HOW TO IDENTIFY THE MASTER NODE IN RAC

Like this:

DUPLICATE DATABASE WITH CONNECTION TO RECOVERY CATALOG

Like this:

11G R2 RAC: GPNP PROFILE DEMYSTIFIED

Like this:

11g R2 RMAN: TRANSPORT DATABASE FROM LINUX TO WINDOWS

Like this:

11g R2 RAC: NODE EVICTION DUE TO MISSING NETWORK HEARTBEAT

Like this:

Follow ORACLE IN ACTION

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Follow ORACLE IN ACTION