11g R2 RAC : DELETE A NODE

11g R2 RAC has greatly simplified the process of deleting a node from the cluster as compared to 10g RAC or 11g R1 RAC. To know the procedure to delete a node in 10g RAC or 11g R1 RAC, please click here.
Now, after the introduction of SCAN and GPNP, we need to follow very simple steps.
Current scenario:
I have three nodes in the cluster presently.
Host names :
- host01.example.com
- host02.example.com
- host03.example.com
Node to be deleted : 
- host03.example.com
Procedure:
– Log in as root on the node to be deleted (host03) and navigate to <gridhome>/crs/install
 
[root@host03]#cd /u01/app/11.2.0/grid/crs/install
 
– Disable the Oracle Clusterware applications and daemons on the node (host03)
 
[root@host03]#./rootcrs.pl -deconfig  -force
 
– From a node that will remain (host01), delete the node
 
[root@host01]#crsctl delete node -n host03
 
– On a node that will remain, log in as the grid user, navigate to <gridhome>/oui/bin and update the node list
   
[grid@host01]$cd /u01/app/11.2.0/grid/oui/bin
 
[grid@host01 bin]$./runInstaller -updateNodeList ORACLE_HOME=/u01/app/11.2.0/grid "CLUSTER_NODES={host01,host02}"
 
 
– Check that only host01 and host02 remain in the cluster
 
$crsctl stat res -t
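– Optionally, verify the node deletion with cluvfy from a remaining node (the standard post-nodedel check; this assumes cluvfy is in the grid user's PATH):

[grid@host01]$ cluvfy stage -post nodedel -n host03 -verbose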
 
11g R2 RAC: ADD A NODE

11g R2 RAC has greatly simplified the process of adding a node to the cluster as compared to 10g RAC or 11g R1 RAC. To know the procedure to add a node in 10g RAC or 11g R1 RAC, please click here.
Now, after the introduction of SCAN and GPNP, we need to follow very simple steps.
Current scenario:
I have two nodes in the cluster presently.
Host names :
- host01.example.com
- host02.example.com
Node to be added : 
- host03.example.com
Procedure:
————————————
 Prepare the machine for third node
————————————
 — Set kernel parameters
 — Install required RPMs
 — Create users/groups
 — Configure oracleasm

[root@host03]# oracleasm configure -i
[root@host03]# oracleasm exit
[root@host03]# oracleasm init
[root@host03]# oracleasm scandisks
[root@host03]# oracleasm listdisks
– All ASM disks will be listed
– Configure ssh connectivity for the grid user among all 3 nodes
– On node3 as the grid user:

[grid@host03 ~]$ ssh-keygen -t rsa
[grid@host03 ~]$ ssh-keygen -t dsa
[grid@host03 ~]$ cd /home/grid/.ssh
[grid@host03 .ssh]$ cat *.pub > host03
[grid@host03 .ssh]$ scp host03 host01:/home/grid/.ssh/
[grid@host03 .ssh]$ ssh host01
– Enter password
[grid@host01 ~]$ cd /home/grid/.ssh
[grid@host01 .ssh]$ cat host03 >> authorized_keys
[grid@host01 .ssh]$ scp authorized_keys host02:/home/grid/.ssh/
[grid@host01 .ssh]$ scp authorized_keys host03:/home/grid/.ssh/
– Test ssh connectivity on all 3 nodes as the grid user
– Run the following on all 3 nodes twice as the grid user (the first run accepts the host keys; the second should complete without any prompts):
echo ssh host01 hostname >> a.sh
echo ssh host02 hostname >> a.sh
echo ssh host03 hostname >> a.sh
echo ssh host01-priv hostname >> a.sh
echo ssh host02-priv hostname >> a.sh
echo ssh host03-priv hostname  >> a.sh
 
chmod +x a.sh
 
./a.sh
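– On the second run, every command should print the remote hostname immediately, with no password or host-key prompt; the output should look similar to this (hostnames as per the setup above):

host01.example.com
host02.example.com
host03.example.com
host01.example.com
host02.example.com
host03.example.com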
– Run cluster verify to check that host03 can be added as a node –
[grid@host01]$ cluvfy stage -pre crsinst -n host03 -verbose
– In case of a time synchronization problem, restart the ntpd service on each node
– The error that grid is not a member of the dba group can be ignored
[grid@host01 ~]$ . oraenv
ORACLE_SID = [grid] ? +ASM1
[grid@host01 ~]$ cd /u01/app/11.2.0/grid/oui/bin
– Add node 
[grid@host01 bin]$./addNode.sh -silent "CLUSTER_NEW_NODES={host03}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={host03-vip}"
– Execute orainstRoot.sh and root.sh on node3 as root –
[root@host03]#/u01/app/oraInventory/orainstRoot.sh
[root@host03]#/u01/app/11.2.0/grid/root.sh
– Check from host01 that the node has been added –
[grid@host01]$ crsctl stat res -t
– Start any resources if they are not up already –
[grid@host01]$ crsctl start resource <resource name>
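– Optionally, verify the node addition with cluvfy from an existing node (the standard post-nodeadd check; again assuming cluvfy is in the grid user's PATH):

[grid@host01]$ cluvfy stage -post nodeadd -n host03 -verbose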

After you have added a node, in case you want to clone the database home on that node, please click here.


11g R2 RAC: CLONE DATABASE HOME

In this article, I am going to discuss the procedure to clone a database home in RAC.
CURRENT SCENARIO:
- Two-node RAC setup.
- Both the current nodes have a non-shared database home.
- The current two-node setup is not able to handle the increased workload.
- We want to add a third node to the setup.
Need to:
- Install Oracle Clusterware on the third node
- Install the database software on the third node (since the database home is non-shared)
Here is the detailed procedure which needs to be implemented:
————————————–
Preparing to Clone Oracle RAC
————————————–
On one of the existing nodes (say, node1):
Step 1   Create a backup of the source home
Create a copy of the Oracle RAC home. You will use this file to copy the Oracle RAC home to each node in the cluster.
When creating the backup (tar) file, the best practice is to include the release number in the name of the file. For example:
– Go to ORACLE_HOME –
[root@node1 root]#  cd /u01/app/oracle/product/11.2.0/dbhome_1
– Create the backup (tar) file in /home/oracle with the name db1120.tgz
[root@node1 dbhome_1]#  tar -zcvf /home/oracle/db1120.tgz .
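– Before copying the archive to the new node, it is worth a quick sanity check; for example, listing the first few entries of the tar file:

[root@node1 dbhome_1]#  tar -ztvf /home/oracle/db1120.tgz | head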
Step 2  Install and start Oracle Clusterware (Use addnode.sh to add new node in existing cluster)
Before you can use cloning to create a new Oracle RAC home, you must first install and start Oracle Clusterware on the node on which you want to copy a cloned Oracle RAC home. In other words, you configure an Oracle RAC home that you cloned from a source cluster onto the nodes in a target cluster in the same order that you installed the Oracle Clusterware and Oracle RAC software components on the original nodes.
————————————————
Deploying Oracle RAC Database Homes
————————————————
Deploying the Oracle RAC database home to a cluster is a multiple-step process.
This section provides step-by-step instructions that describe how to:
  1.  Prepare the new cluster node
  2.  Deploy the Oracle RAC database software
  3.  Run the clone.pl script on each node
  4.  Run the $ORACLE_HOME/root.sh script on each node
Step 1   Prepare the new cluster nodes
Perform the Oracle RAC preinstallation steps, including such things as:
- Specify the kernel parameters.
- Use short, nondomain-qualified names for all names in the Hosts file.
- Verify that you can ping the public and interconnect names.
- Ensure Oracle Clusterware is active.
- Ensure that Oracle ASM is active and at least one Oracle ASM disk group exists and is mounted.
See your platform-specific Oracle RAC installation guide for a complete preinstallation checklist.
Step 2   Deploy the Oracle RAC database software on third node
To deploy the Oracle RAC software, you need to:
1.  Restore the Oracle home to the third node.
    [root@node1 root]# scp /home/oracle/db1120.tgz node3:/home/oracle/db1120.tgz
2.  Create the directory for the Oracle home
     [root@node3 root]# mkdir -p /u01/app/oracle/product/11.2.0/dbhome_1
3.  Change to the Oracle home directory
     [root@node3 root]# cd /u01/app/oracle/product/11.2.0/dbhome_1
4.  Extract the contents of the tar file
     [root@node3 dbhome_1]# tar -zxvf /home/oracle/db1120.tgz
The home location can be in the same directory path as the source home that you used to create the tar, or in a different one.
5.  Change the ownership of all files to the oracle and oinstall group. For example:
      [root@node3 dbhome_1]# chown -R oracle:oinstall /u01/app/oracle/product/11.2.0/dbhome_1
Step 3 Create a script file start.sh as the oracle user on node3, in the folder $ORACLE_HOME/clone/bin, with the following environment variables.
[oracle@node3 dbhome_1]$cd $ORACLE_HOME/clone/bin
[oracle@node3 bin]$vi start.sh
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
cd $ORACLE_HOME/clone
THISNODE=`hostname -s`
E01=ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
E02=ORACLE_HOME_NAME=OraDBRAC
E03=ORACLE_BASE=/u01/app/oracle
C01="-O'CLUSTER_NODES={node1, node2, node3}'"
C02="-O'LOCAL_NODE=$THISNODE'"
perl $ORACLE_HOME/clone/bin/clone.pl $E01 $E02 $E03 $C01 $C02
THISNODE picks up the short hostname of the node being added (node3 here), since the script is run on that node.
Step 4 Run the start.sh script as the oracle user (owner of the database home) on node3. The script clones the database home onto the node named in the THISNODE variable and updates the inventories on all cluster nodes.
[oracle@node3 bin]$chmod u+x start.sh
[oracle@node3 bin]$./start.sh
Step 5   Run the $ORACLE_HOME/root.sh script on each node
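For example, on node3 (assuming the same database home path as above):

[root@node3]#/u01/app/oracle/product/11.2.0/dbhome_1/root.sh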

- Check the contents of the inventory.xml file; we will see that it now has information about the 11g Oracle home

[oracle@node3 dbhome_1]$cd /u01/app/oracle/oraInventory/ContentsXML
[oracle@node3 ContentsXML]$vi inventory.xml
 
- Check the contents of the inventory.xml file on the other nodes. If an entry for the newly added node does not exist for the database home, add it manually
 
[oracle@node1 dbhome_1]$vi /u01/app/oracle/oraInventory/ContentsXML/inventory.xml
[oracle@node2 dbhome_1]$vi /u01/app/oracle/oraInventory/ContentsXML/inventory.xml
 

11g R2 RAC : NODE EVICTION DUE TO MEMBER KILL ESCALATION

If Oracle Clusterware itself is working perfectly but one of the RAC instances is hanging, the database LMON process will request a member kill escalation and ask the CSS process to remove the hanging database instance from the cluster.
The following example will demonstrate it in a cluster consisting of two nodes:
SQL> col host_name for a20
SQL> select instance_name, host_name from gv$instance;
INSTANCE_NAME    HOST_NAME
—————- ——————–
orcl1            host01.example.com
orcl2            host02.example.com
– On the host02 server we will stop the execution of all RDBMS processes (by sending the STOP signal)
– Find out the current database processes
[root@host02 ~]#  ps -ef | grep ora_ | grep orcl2
oracle    6215     1  0 11:20 ?        00:00:00 ora_pmon_orcl2
oracle    6217     1  0 11:20 ?        00:00:00 ora_vktm_orcl2
oracle    6221     1  0 11:20 ?        00:00:00 ora_gen0_orcl2
oracle    6223     1  0 11:20 ?        00:00:00 ora_diag_orcl2
oracle    6225     1  0 11:20 ?        00:00:00 ora_dbrm_orcl2
oracle    6227     1  0 11:20 ?        00:00:00 ora_ping_orcl2
oracle    6229     1  0 11:20 ?        00:00:00 ora_psp0_orcl2
oracle    6231     1  0 11:20 ?        00:00:00 ora_acms_orcl2
oracle    6233     1  0 11:20 ?        00:00:00 ora_dia0_orcl2
oracle    6235     1  0 11:20 ?        00:00:00 ora_lmon_orcl2
oracle    6237     1  0 11:20 ?        00:00:02 ora_lmd0_orcl2
……
– stop the execution of all rdbms processes (by sending the STOP signal)
[root@host02 ~]#  ps -ef | grep ora_ | grep orcl2 | awk '{print $2}' | while read PID
                               do
                               kill -STOP $PID
                               done
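– (To abort the demo and resume the stopped instance manually instead of waiting for the eviction, the same pipeline with kill -CONT would do; a sketch, assuming the processes are still in the stopped state:)

[root@host02 ~]#  ps -ef | grep ora_ | grep orcl2 | awk '{print $2}' | xargs kill -CONT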
– From the client point of view, the Real Application Clusters database is now hanging on both nodes. No queries or DML are possible. Try to execute a query; it will hang.
SQL> select instance_name, host_name from gv$instance;
– no output, the query hangs …
– Due to missing heartbeats, the healthy RAC instance on node host01 will remove the hanging RAC instance by requesting a member kill escalation.
– Check the database alert log file on host01: the LMS process issues a request to CSSD to reboot the node.
The node is evicted and the instance is restarted after the node rejoins the cluster.
[root@host01 trace]# tailf /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/alert_orcl1.log
LMS0 (ospid: 31771) has detected no messaging activity from instance 2
LMS0 (ospid: 31771) issues an IMR to resolve the situation
Please check LMS0 trace file for more detail.
Fri Nov 09 11:15:04 2012
Remote instance kill is issued with system inc 30
Remote instance kill map (size 1) : 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
Fri Nov 09 11:15:13 2012
IPC Send timeout detected. Sender: ospid 6308 [oracle@host01.example.com (PZ97)]
Receiver: inst 2 binc 429420846 ospid 6251
Waiting for instances to leave:
2
Reconfiguration started (old inc 4, new inc 8)
List of instances:
 1 (myinst: 1)
 …..  Recovery of instance 2 starts
Global Resource Directory frozen
….
All grantable enqueues granted
 Post SMON to start 1st pass IR
—-
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Started redo scan
IPC Send timeout to 2.0 inc 4 for msg type 12 from opid 42
Completed redo scan
 read 93 KB redo, 55 data blocks need recovery
Started redo application at
 Thread 2: logseq 9, block 42
Recovery of Online Redo Log: Thread 2 Group 3 Seq 9 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_3.266.798828557
  Mem# 1: +FRA/orcl/onlinelog/group_3.259.798828561
Completed redo application of 0.05MB
Completed instance recovery at
 Thread 2: logseq 9, block 228, scn 1069404
 52 data blocks read, 90 data blocks written, 93 redo k-bytes read
Thread 2 advanced to log sequence 10 (thread recovery)
Fri Nov 09 12:18:55 2012
….
– Check the clusterware alert log of host01 –
– The node is evicted and rebooted to join the cluster
[grid@host01 host01]$ tailf /u01/app/11.2.0/grid/log/host01/alerthost01.log
[cssd(14493)]CRS-1607:Node host02 is being evicted in cluster incarnation 247848838; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/host01/cssd/ocssd.log.
2012-11-09 11:15:56.140
[ohasd(12412)]CRS-8011:reboot advisory message from host: host02, component: mo103324, with time stamp: L-2012-11-09-11:15:56.580
[ohasd(12412)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-11-09 11:16:17.365
[cssd(14493)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 .
2012-11-09 11:16:17.400
[crsd(14820)]CRS-5504:Node down event reported for node 'host02'.
…… Node 2 joins the cluster
[cssd(14493)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 host02 .
2012-11-09 12:18:52.713
[crsd(14820)]CRS-2772:Server 'host02' has been assigned to pool 'Generic'.
2012-11-09 12:18:52.713
[crsd(14820)]CRS-2772:Server 'host02' has been assigned to pool 'ora.orcl'.
– After the node rejoins the cluster and the instance is restarted, re-execute the query; it succeeds
SQL> conn sys/oracle@orcl as sysdba
SQL> col host_name for a20
SQL> select instance_name, host_name from gv$instance;
INSTANCE_NAME    HOST_NAME
—————- ——————–
orcl1            host01.example.com
orcl2            host02.example.com
——————————————————————————————————-

11g R2 RAC: NODE EVICTION DUE TO CSSDAGENT STOPPING

In addition to the ocssd.bin process, which is responsible, among other things, for the network and disk heartbeats, Oracle Clusterware 11g Release 2 uses two new monitoring processes, cssdagent and cssdmonitor, which run with the highest real-time scheduler priority and are also able to fence a server.
– Find out PID for cssdagent
[root@host02 lastgasp]# ps -ef |grep cssd |grep -v grep
root      5085     1  0 09:45 ?        00:00:00 /u01/app/11.2.0/grid/bin/cssdmonitor
root      5106     1  0 09:45 ?        00:00:00 /u01/app/11.2.0/grid/bin/cssdagent
grid      5136     1  0 09:45 ?        00:00:02 /u01/app/11.2.0/grid/bin/ocssd.bin
– Find out the scheduling priority of cssdagent
[root@host02 lastgasp]# chrt -p 5106
pid 5106's current scheduling policy: SCHED_RR
pid 5106's current scheduling priority: 99
Since cssdagent and cssdmonitor have a scheduling priority of 99, stopping them can reset a server in case:
- there is some problem with the ocssd.bin process
- there is some problem with the OS scheduler
- of CPU starvation
- the OS is locked up in a driver or hardware (e.g. an I/O call)
Both of them are also associated with an undocumented timeout: if the execution of these processes stops for more than 28 seconds, the node will be evicted.
– Let us stop the execution of cssdagent for 40 seconds
[root@host02 ~]# kill -STOP 5106; sleep 40; kill -CONT 5106
– Check the alert log of host01 –
– Node2 is rebooted
[grid@host01 host01]$ tailf /u01/app/11.2.0/grid/log/host01/alerthost01.log
[ohasd(12412)]CRS-8011:reboot advisory message from host: host02, component: ag100946, with time stamp: L-2012-11-09-10:21:28.040
[ohasd(12412)]CRS-8013:reboot advisory message text: Rebooting after limit 28100 exceeded; disk timeout 28100, network timeout 27880, last heartbeat from CSSD at epoch seconds 352436647.013, 34280 milliseconds ago based on invariant clock value of 294678040
– Node 2 is rebooted and the network connection with it breaks
2012-11-09 10:21:45.671
[cssd(14493)]CRS-1612:Network communication with node host02 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.330 seconds
2012-11-09 10:21:53.923
[cssd(14493)]CRS-1611:Network communication with node host02 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.310 seconds
2012-11-09 10:21:59.845
[cssd(14493)]CRS-1610:Network communication with node host02 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.300 seconds
2012-11-09 10:22:02.587
[cssd(14493)]CRS-1632:Node host02 is being removed from the cluster in cluster incarnation 247848834
2012-11-09 10:22:02.717
[cssd(14493)]CRS-1601:CSSD Reconfiguration complete. Active nodes are host01 .
2012-11-09 10:22:02.748
[crsd(14820)]CRS-5504:Node down event reported for node 'host02'.
2012-11-09 10:22:10.086
[crsd(14820)]CRS-2773:Server 'host02' has been removed from pool 'Generic'.
2012-11-09 10:22:10.086
[crsd(14820)]CRS-2773:Server 'host02' has been removed from pool 'ora.orcl'.

Related links:

Home

11G R2 RAC Index

Node Eviction Due To Missing Network Heartbeat
Node Eviction Due To Missing Disk Heartbeat
Node Eviction Due To Member Kill Escalation

11g R2 RAC: Reboot-less Fencing With Missing Network Heartbeat

 

                                                          ——————- 

 

11g DATAGUARD SETUP USING ACTIVE DATAGUARD

In this blog we'll see how to configure 11g Data Guard using Active Data Guard, i.e. the standby database will be created from the running primary database. We do not need to take a backup of the database or the controlfile.

Primary database dg01 on node1.oracle.com (192.9.201.192)

Standby database dg02 on node2.oracle.com (192.9.201.193)

Listeners :

primary : on port 1521 in grid home

on port 1522 in database home

standby : on port 1521 in grid home

on port 1522 in database home

TNS entries

primary : dg01  on node1.oracle.com on ports 1521, 1522

registered with both the listeners

dg02 on node2.oracle.com on port 1522

standby : dg02  on node2.oracle.com on ports 1521, 1522

registered with both the listeners

dg01 on node1.oracle.com on port 1522

Configuration on primary site:-

——————————————-

- As root assign

.  IP address 192.9.201.192

. hostname node1.oracle.com

#service network restart

-  Make entries in /etc/hosts

192.9.201.192 node1.oracle.com node1

192.9.201.193 node2.oracle.com node2

– Reboot machine (mandatory as hostname has been changed)

– Create the dg01 database in archivelog mode, with sample schemas, and with FRA as the default recovery area

– Invoke netmgr from the db home

- make service dg01 available on port 1522
- create a listener in the database home on port 1522
- statically register dg01 with the listener on port 1522
- create a service for dg02 running on node2.oracle.com on port 1522
- delete the orcl database

Configuration on standby site:-

——————————————-

- As root assign

. IP address 192.9.201.193

. hostname node2.oracle.com

#service network restart

Reboot machine (mandatory as hostname has been changed)

- Make entries in /etc/hosts

192.9.201.192 node1.oracle.com node1

192.9.201.193 node2.oracle.com node2

– Test

#ping node1
#ping node2

– Invoke netmgr from db home

- create a service for dg01 running on node1.oracle.com on port 1522
- create a service for dg02 running on node2.oracle.com on port 1522
- create a listener in the database home on port 1522
- statically register dg02 with it

—————————–
On primary node01

—————————–

— Configure the database dg01 in archive log mode and enable the force logging.

SQL>SHUTDOWN IMMEDIATE;
    STARTUP MOUNT;
    ALTER DATABASE ARCHIVELOG;
    ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=6G;
    ALTER DATABASE FORCE LOGGING;
    select force_logging from v$database;
    ALTER DATABASE OPEN;
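– A quick confirmation of the archivelog mode and force logging before proceeding (standard SQL*Plus checks):

SQL> archive log list
SQL> select log_mode, force_logging from v$database;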

– Configure standby redo log files to enable fast failover and real-time apply. (Assume we have three redo log groups.)

– Find out no. and size of current redo log groups

SQL>select group#, bytes/1024/1024 MB   from v$log;

– Add standby log files –

Note: The number of standby redo log groups must be at least one more than the number of online redo log groups, and the size of each standby logfile must be the same as the size of the online redo log files.

SQL>  ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 '/u01/app/oracle/oradata/dg01/sredo04.log' size 50m;
      ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 '/u01/app/oracle/oradata/dg01/sredo05.log' size 50m;
      ALTER DATABASE ADD STANDBY LOGFILE GROUP 6 '/u01/app/oracle/oradata/dg01/sredo06.log' size 50m;
      ALTER DATABASE ADD STANDBY LOGFILE GROUP 7 '/u01/app/oracle/oradata/dg01/sredo07.log' size 50m;

– check that standby logfiles have been created

SQL> col member for a30
SQL> select group#, TYPE, MEMBER from v$logfile where type = 'STANDBY';

    GROUP# TYPE    MEMBER
---------- ------- ------------------------------
         4 STANDBY /u01/app/oracle/oradata/dg01/sredo04.log
         5 STANDBY /u01/app/oracle/oradata/dg01/sredo05.log
         6 STANDBY /u01/app/oracle/oradata/dg01/sredo06.log
         7 STANDBY /u01/app/oracle/oradata/dg01/sredo07.log

SQL> select group#, bytes/1024/1024 MB from v$standby_log;

    GROUP#         MB
---------- ----------
         4         50
         5         50
         6         50
         7         50

– Make a staging folder

oracle@node1$mkdir /home/oracle/stage

— CREATE PFILE FROM SPFILE.

PRI>create pfile='/home/oracle/stage/initdg01.ora' from spfile;

– Now configure the following parameters for the primary database by adding these lines to initdg01.ora:

oracle@node1$vi /home/oracle/stage/initdg01.ora

LOG_ARCHIVE_CONFIG='DG_CONFIG=(dg01,dg02)'
DB_UNIQUE_NAME='dg01'
SERVICE_NAMES='dg01'
DB_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg02','/u01/app/oracle/oradata/dg01'
LOG_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg02','/u01/app/oracle/oradata/dg01'
LOG_ARCHIVE_DEST_1='location=USE_DB_RECOVERY_FILE_DEST'
LOG_ARCHIVE_DEST_STATE_1=enable
LOG_ARCHIVE_DEST_2='service=dg02 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=dg02'
LOG_ARCHIVE_DEST_STATE_2=enable
STANDBY_FILE_MANAGEMENT=auto
FAL_CLIENT='dg01'
FAL_SERVER='dg02'
LOG_ARCHIVE_FORMAT='%t_%s_%r.arc'

Note:

In case the source and destination databases are on ASM, the parameters db_file_name_convert and log_file_name_convert may be defined as follows:

db_file_name_convert = ("+DATA/dg01","+DATA/dg02")

log_file_name_convert = ("+DATA/dg01","+DATA/dg02","+FRA/dg01","+FRA/dg02")

– create spfile from pfile and start dg01 using new parameters

PRI>shu immediate;
    create spfile from pfile='/home/oracle/stage/initdg01.ora';
    startup;
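– Optionally confirm that the new Data Guard parameters took effect:

SQL> show parameter log_archive_config
SQL> show parameter log_archive_dest_2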

—– Take the entire database backup to the preconfigured staging directory (optional with active dataguard)

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
      BACKUP FORMAT '/home/oracle/stage/%U' DATABASE PLUS ARCHIVELOG;
      BACKUP FORMAT '/home/oracle/stage/%U' CURRENT CONTROLFILE FOR STANDBY;
      sql "alter system archive log current";

– create staging folder on node2

oracle@node2$mkdir /home/oracle/stage

—- Copy the following items from node1 to node2.

PASSWORD FILE

[oracle@node1 ~]$scp /u01/app/oracle/product/11.2.0/db_1/dbs/orapwdg01 node2:/u01/app/oracle/product/11.2.0/db_1/dbs/orapwdg02

PARAMETER FILE and rman backups(optional with active dataguard)

[oracle@node1 ~]$ scp /home/oracle/stage/* node2:/home/oracle/stage/

Note: The RMAN backups must be copied to the same directory on machine2, i.e. /home/oracle/stage

– Configure the net service for dg01 (primary database) on node1 at port 1522, as well as for dg02 (standby database) on node2 at port 1522

– Set the environment to dg01 and invoke netmgr

DG02 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2.oracle.com)(PORT = 1522))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = dg02)
    )
  )

DG01 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1.oracle.com)(PORT = 1522))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = dg01)
    )
  )

– Create a listener in database home on port 1522 and statically register dg01 with it.
—————————-

On standby node02

—————————–

- Edit the pfile for the standby database:
oracle@node2$mv /home/oracle/stage/initdg01.ora /home/oracle/stage/initdg02.ora
oracle@node2$vi /home/oracle/stage/initdg02.ora

— Final pfile should look like this

dg02.__db_cache_size=440401920
dg02.__java_pool_size=4194304
dg02.__large_pool_size=4194304
dg02.__shared_pool_size=163577856
dg02.__streams_pool_size=0
*.audit_file_dest='/u01/app/oracle/admin/dg02/adump'
*.control_files='/u01/app/oracle/oradata/dg02/control01.ctl'
*.core_dump_dest='/u01/app/oracle/admin/dg02/cdump'
*.db_block_size=8192
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_name='dg01'
*.db_recovery_file_dest='/u01/app/oracle/flash_recovery_area'
*.db_recovery_file_dest_size=6G
*.dispatchers='(PROTOCOL=TCP) (SERVICE=dg02XDB)'
*.job_queue_processes=10
*.log_archive_format='%t_%s_%r.dbf'
*.open_cursors=300
*.pga_aggregate_target=204472320
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=613416960
*.undo_management='AUTO'
*.undo_tablespace='UNDOTBS1'
*.user_dump_dest='/u01/app/oracle/admin/dg02/udump'
LOG_ARCHIVE_CONFIG='DG_CONFIG=(dg01,dg02)'
DB_UNIQUE_NAME='dg02'
SERVICE_NAMES='dg02'
DB_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg01','/u01/app/oracle/oradata/dg02'
LOG_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg01','/u01/app/oracle/oradata/dg02'
LOG_ARCHIVE_DEST_1='location=USE_DB_RECOVERY_FILE_DEST'
LOG_ARCHIVE_DEST_STATE_1=enable
LOG_ARCHIVE_DEST_2='service=dg01 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=dg01'
LOG_ARCHIVE_DEST_STATE_2=enable
STANDBY_FILE_MANAGEMENT=auto
FAL_CLIENT='dg02'
FAL_SERVER='dg01'
LOG_ARCHIVE_FORMAT='%t_%s_%r.arc'

(Note: on the standby, the convert parameters map the primary paths to the standby paths, i.e. the reverse of the primary's settings.)
@node2: Create the required directory for standby database

[oracle@node2 ~]$ mkdir -p  /u01/app/oracle/admin/dg02/adump
                  mkdir -p  /u01/app/oracle/oradata/dg02/
                  mkdir -p  /u01/app/oracle/flash_recovery_area/dg02

@node2: Configure the net service for dg01 (primary database) on node1 at port 1522, as well as for dg02 (standby database) on node2 at port 1522

– Set ORACLE_HOME to the database home and start netmgr
DG02 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2.oracle.com)(PORT = 1522))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = dg02)
    )
  )

DG01 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1.oracle.com)(PORT = 1522))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = dg01)
    )
  )

@node2 – Create a listener on port 1522 in the database home and statically register the dg02 service with it

@node2 : Start dg02 at nomount stage

 

[oracle@node2 ~]$ export ORACLE_SID=dg02
[oracle@node2 ~]$ sqlplus / as sysdba
SBY> CREATE SPFILE FROM PFILE='/home/oracle/stage/initdg02.ora';
SBY> STARTUP NOMOUNT;
SBY> EXIT

@node2: Now create the standby database using the RMAN DUPLICATE command

[oracle@node2 ~]$ export ORACLE_SID=dg02
[oracle@node2 ~]$ rman target sys/oracle@dg01
RMAN> connect auxiliary sys/oracle@dg02
RMAN> duplicate target database for standby from active database;
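– Once the duplicate completes, a quick check on the standby should report its new role (the instance is left mounted at the end of the duplicate):

SBY> select database_role, open_mode from v$database;
– expected: PHYSICAL STANDBY / MOUNTED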

– Switch logs on primary

PRI>alter system switch logfile;

– start redo apply on standby

SBY>alter database recover managed standby database using current logfile disconnect;
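– You can watch the recovery (MRP) process on the standby with a standard query against v$managed_standby:

SBY> select process, status, sequence# from v$managed_standby;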

– Verify that standby database is performing correctly

SBY>select sequence#, first_time, next_time from v$archived_log order by sequence#;

– Perform a few log switches on primary

PRI>alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;

– On the standby database, verify that the new redo data was received, archived and applied

SBY>select sequence#, first_time, next_time,applied from v$archived_log order by sequence#;

——————————

Related links:

Home
11g Dataguard Index

11g Dataguard: Automatic Block Media Recovery (BMR)

11g Dataguard: Automatic Client Failover

11g Dataguard: Automatic Job Failover

11g Dataguard: Flashback  Standby After  Resetlogs On Primary

ORA-12528: TNS:listener: all appropriate instances are blocking new connections

Recover Standby Datafile From Primary

Warning: ORA-16792: configurable property value is inconsistent with database setting

                                                   ————

HOW DOES 11G R2 CLUSTERWARE START ASM WHEN ASM SPFILE IS STORED ON ASM ITSELF?

Beginning with the version 11g Release 2, the ASM spfile is stored automatically in the first disk group created during Grid Infrastructure installation.
Since the voting disk and OCR are stored on ASM, ASM needs to be started on the node. To start up ASM, its spfile is needed, but the spfile itself is located on an ASM diskgroup. How does the clusterware resolve this chicken-and-egg situation?
- When a node of an Oracle Clusterware cluster restarts, OHASD is started by platform-specific means. OHASD accesses the OLR (Oracle Local Registry) stored on the local file system to get the data needed to complete OHASD initialization.
- OHASD brings up GPNPD and CSSD. CSSD accesses the GPnP profile stored on the local file system, which contains the following vital bootstrap data:
a. the ASM_DISKSTRING parameter (if specified), used to locate the disks on which ASM disks are configured
b. the ASM spfile location: the name of the diskgroup containing the ASM spfile
c. the location of the voting files: ASM

– CSSD scans the headers of all ASM disks (as indicated by ASM_DISKSTRING in the GPnP profile) to identify the disk containing the voting file. Using the pointers in the ASM disk headers, the voting file locations on the ASM disks are accessed by CSSD, and CSSD is able to complete initialization and start or join an existing cluster.

To read the ASM spfile during the ASM instance startup, it is not necessary to open the disk group. All information necessary to access the data is stored in the device’s header. OHASD reads the header of the ASM disk containing ASM SPfile (as read from GPnP profile) and using the pointers in disk header, contents of ASM spfile are read. Thereafter, ASM instance is started.

–  With an ASM instance operating and its Diskgroups mounted, access to Clusterware’s OCR is available to CRSD.

–  OHASD starts CRSD with access to the OCR in an ASM Diskgroup.

–  Clusterware completes initialization and brings up other services under its control.

Demonstration :

In my environment, the ASM disk group DATA created with EXTERNAL  redundancy is used exclusively for ASM spfile, voting and OCR files:
- Let us read the GPnP profile to find out the location of the ASM spfile
[grid@host01 peer]$ cd /u01/app/11.2.0/grid/gpnp/host01/profiles/peer
[grid@host01 peer]$ gpnptool getpval -asm_spf
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/app/11.2.0/grid/bin/gpnptool.bin getpval -asm_spf -p=profile.xml -o-
+DATA/cluster01/asmparameterfile/registry.253.793721441
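– The same location can be cross-checked with asmcmd once ASM is up; spget reports the spfile location registered in the GPnP profile:

[grid@host01 peer]$ asmcmd spget
+DATA/cluster01/asmparameterfile/registry.253.793721441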
– Let us find out the disks in DATA diskgroup
[grid@host01 peer]$ asmcmd lsdsk -G DATA
Path
ORCL:ASMDISK01
ORCL:ASMDISK010
ORCL:ASMDISK02
ORCL:ASMDISK03
ORCL:ASMDISK04
ORCL:ASMDISK09
– Let us find out which ASM disk maps to which partition.
   Note down the major/minor device numbers of the disks in the DATA diskgroup:
[root@host01 ~]# ls -lr /dev/oracleasm/disks/*
brw-rw—- 1 grid asmadmin 8, 26 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK09
brw-rw—- 1 grid asmadmin 8, 25 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK08
brw-rw—- 1 grid asmadmin 8, 24 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK07
brw-rw—- 1 grid asmadmin 8, 23 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK06
brw-rw—- 1 grid asmadmin 8, 22 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK05
brw-rw—- 1 grid asmadmin 8, 21 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK04
brw-rw—- 1 grid asmadmin 8, 19 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK03
brw-rw—- 1 grid asmadmin 8, 18 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK02
brw-rw—- 1 grid asmadmin 8, 31 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK014
brw-rw—- 1 grid asmadmin 8, 30 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK013
brw-rw—- 1 grid asmadmin 8, 29 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK012
brw-rw—- 1 grid asmadmin 8, 28 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK011
brw-rw—- 1 grid asmadmin 8, 27 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK010
brw-rw—- 1 grid asmadmin 8, 17 Nov  8 09:35 /dev/oracleasm/disks/ASMDISK01
- Let us find out the major/minor device numbers of various disk partitions
[root@host01 ~]# ls -lr /dev/sdb*
brw-r—– 1 root disk 8, 25 Nov  8 09:35 /dev/sdb9
brw-r—– 1 root disk 8, 24 Nov  8 09:35 /dev/sdb8
brw-r—– 1 root disk 8, 23 Nov  8 09:35 /dev/sdb7
brw-r—– 1 root disk 8, 22 Nov  8 09:35 /dev/sdb6
brw-r—– 1 root disk 8, 21 Nov  8 09:35 /dev/sdb5
brw-r—– 1 root disk 8, 20 Nov  8 09:35 /dev/sdb4
brw-r—– 1 root disk 8, 19 Nov  8 09:35 /dev/sdb3
brw-r—– 1 root disk 8, 18 Nov  8 09:35 /dev/sdb2
brw-r—– 1 root disk 8, 31 Nov  8 09:35 /dev/sdb15
brw-r—– 1 root disk 8, 30 Nov  8 09:35 /dev/sdb14
brw-r—– 1 root disk 8, 29 Nov  8 09:35 /dev/sdb13
brw-r—– 1 root disk 8, 28 Nov  8 09:35 /dev/sdb12
brw-r—– 1 root disk 8, 27 Nov  8 09:35 /dev/sdb11
brw-r—– 1 root disk 8, 26 Nov  8 09:35 /dev/sdb10
brw-r—– 1 root disk 8, 17 Nov  8 09:35 /dev/sdb1
brw-r—– 1 root disk 8, 16 Nov  8 09:35 /dev/sdb
– Now we can find out the partitions mapping to the various ASM disks by matching their major/minor device numbers:

 ASMDISK01     8,17     /dev/sdb1
 ASMDISK02     8,18     /dev/sdb2
 ASMDISK03     8,19     /dev/sdb3
 ASMDISK04     8,21     /dev/sdb5
 ASMDISK09     8,26     /dev/sdb10
 ASMDISK010    8,27     /dev/sdb11
– Let’s scan the headers of those devices:
[root@host01 ~]#  kfed read /dev/sdb1 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@host01 ~]#  kfed read /dev/sdb2 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@host01 ~]#  kfed read /dev/sdb3 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                       16 ; 0x0f4: 0x00000010
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
[root@host01 ~]#  kfed read /dev/sdb5 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@host01 ~]#  kfed read /dev/sdb10 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@host01 ~]#  kfed read /dev/sdb11 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
In the output above, we see that the device /dev/sdb3 contains a copy of the ASM spfile (spfflg=1), and that the spfile starts at allocation unit 16 on the disk (spfile=16).
Considering the allocation unit size (kfdhdb.ausize = 1M), let's dump the ASM spfile from the device:
[root@host01 ~]#  dd if=/dev/sdb3 of=spfileASM_Copy2.ora skip=16  bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.170611 seconds, 6.1 MB/s
[root@host01 ~]# strings spfileASM_Copy2.ora
+ASM1.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value
+ASM2.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value
+ASM3.__oracle_base=’/u01/app/grid’#ORACLE_BASE set from in memory value
+ASM3.asm_diskgroups=’FRA’#Manual Mount
+ASM2.asm_diskgroups=’FRA’#Manual Mount
+ASM1.asm_diskgroups=’FRA’#Manual Mount
*.asm_power_limit=1
*.diagnostic_dest=’/u01/app/grid’
*.instance_type=’asm’
*.large_pool_size=12M
*.remote_login_passwordfile=’EXCLUSIVE’
a:O/
The same technique is used to access the clusterware voting files, which are also stored in an ASM disk group. In this case, Clusterware does not need a running ASM instance to access the cluster voting files.
– Let's check the location of the voting disk:
[grid@host01 peer]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name        Disk group
--  -----    -----------------                ---------        ----------
 1. ONLINE   243ec3b2a3cf4fbbbfed6f20a1ef4319 (ORCL:ASMDISK01) [DATA]
Located 1 voting disk(s).
– Since the above query shows that the voting disk is stored on ASMDISK01, which maps to /dev/sdb1, we will scan the header of /dev/sdb1:
[root@host01 ~]#  kfed read /dev/sdb1 | grep vf
kfdhdb.vfstart:                      96 ; 0x0ec: 0x00000060
kfdhdb.vfend:                       128 ; 0x0f0: 0x00000080
The non-zero kfdhdb.vfstart and kfdhdb.vfend values confirm that the voting file resides on /dev/sdb1.
Related links:

Home

11G R2 RAC Index

11g R2 RAC: GPNP Profile Demystified

11g R2 RAC: Let’s Edit GPNP Profile

11g R2 RAC: Let’s Lose GPNP Profile

11g R2 RAC : OCR Demystified

11g R2 RAC : OLR  Demystified

11g R2 RAC : Repair OCR

11g R2 RAC : Voting Disk Demystified

Need For VIP In RAC

 

 

 
                                              ——————