Monthly Archives: January 2013

LET’S EDIT GPNP PROFILE

In 11g R2 RAC, the voting disk is stored on an ASM diskgroup. But CSSD needs the voting files before ASM is online. At startup, CSSD scans the headers of all devices matching the tag "DiscoveryString" in the gpnp profile XML file, which contains the same value as the asm_diskstring parameter of the ASM instance.

 

If it can find more than half of the total number of voting files, startup proceeds; otherwise, CSSD keeps cycling, writing appropriate error messages to $GRID_HOME/log/hostname/cssd/ocssd.log on each loop.
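As a quick sanity check (a minimal sketch, assuming the grid user's environment points at the Grid home), the discovery string currently cached on a node can be pulled straight out of the local GPnP profile with gpnptool:

[grid@host01 ~]$ gpnptool get 2>/dev/null | grep -o 'DiscoveryString="[^"]*"'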

 

To verify this, I set ASM_DISKSTRING to a value that excluded the ASM disk containing the voting disk and tried to restart CRS on a node. Looking at the CSSD logfile, I saw that CSSD had trouble identifying its voting files.

 

Now I had to change the ASM discovery string back without ASM running, and without CSSD, which is needed to start ASM in the first place. How do we tell CSSD, which starts well before ASM, to scan the right devices? I used gpnptool to edit the gpnp profile and restored the discovery string to the appropriate value. After that I could get CRS running on my machine.

 

Overview:
– Move voting disk to a new diskgroup VOTE.
– Set ASM_DISKSTRING to only the disks which are members of the other diskgroups (DATA/FRA) using SQL.
– Stop and restart CRS on the node.
– Check that HAS has started but the rest of the services are not up.
– Check ocssd.log – CSSD scans all the disks that match the discovery string but does not find the voting disk.
– Edit the gpnp profile to correct the ASM discovery string.
– Restart CRS.
– Check that all the daemons have started and cluster services are up.

 

Implementation:
– Create a diskgroup VOTE with external redundancy to be used to store the voting disk.

– Move voting disk to the diskgroup VOTE – fails because:
    – the diskgroup is not mounted on all the nodes – mount it
    – diskgroup compatibility is < 11.2.0.0 – modify it
[root@host01 ~]# crsctl replace votedisk +vote
Failed to create voting files on disk group vote.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
  – Mount the VOTE diskgroup on all the nodes
  – Modify diskgroup compatibility to 11.2.0.0 (see the SQL sketch below)
  – Move voting disk to the diskgroup VOTE – succeeds
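The diskgroup creation and the two fixes above can be sketched in SQL from the ASM instance as follows (a sketch only; the disk name ORCL:ASMDISK014 is taken from the crsctl output further down, and the compatible.asm value simply needs to be at least 11.2.0.0):

SQL> CREATE DISKGROUP vote EXTERNAL REDUNDANCY DISK 'ORCL:ASMDISK014';
SQL> ALTER DISKGROUP vote SET ATTRIBUTE 'compatible.asm' = '11.2.0.0';
-- and on the ASM instance of every other node:
SQL> ALTER DISKGROUP vote MOUNT;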
 
[root@host01 ~]#  crsctl replace votedisk +vote
Successful addition of voting disk 443f1c60e16f4fa5bfbfeaae6b2f919d.
Successful deletion of voting disk 3d9da0d16baa4f10bf4e4b9b4aa688c6.
Successful deletion of voting disk 369621097c034f6dbf29a8cc97dc4bbc.
Successful deletion of voting disk 407b5d6588ea4f6fbf1503f7f6cc2951.
Successfully replaced voting disk group with +vote.
CRS-4266: Voting file(s) successfully replaced
– check that voting disk has been moved to vote diskgroup
[root@host01 ~]#  crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   443f1c60e16f4fa5bfbfeaae6b2f919d (ORCL:ASMDISK014) [VOTE]
Located 1 voting disk(s).
– Find out the disks in the other diskgroups – DATA and FRA
[grid@host01 ~]$ asmcmd lsdsk -G data
Path
ORCL:ASMDISK01
ORCL:ASMDISK010
ORCL:ASMDISK011
ORCL:ASMDISK012
[grid@host01 ~]$ asmcmd lsdsk -G fra
Path
ORCL:ASMDISK02
ORCL:ASMDISK03
– Set ASM_DISKSTRING to only the disks which are members of the DATA/FRA diskgroups

 SQL> sho parameter asm_diskstring
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------
asm_diskstring                       string
SQL> alter system set asm_diskstring='ORCL:ASMDISK01',
   'ORCL:ASMDISK010','ORCL:ASMDISK011','ORCL:ASMDISK012',
   'ORCL:ASMDISK02','ORCL:ASMDISK03';
– check that discovery string in gpnp profile  points to the disks as specified 
 
[root@host01 ~]# cd /u01/app/11.2.0/grid/gpnp/host01/profiles/peer/
[root@host01 peer]# vi profile.xml
DiscoveryString="ORCL:ASMDISK01,ORCL:ASMDISK010,ORCL:ASMDISK011,ORCL:ASMDISK012,ORCL:ASMDISK02,ORCL:ASMDISK03"

 

– stop and restart crs on the node –
[root@host01 ~]# crsctl stop crs
[root@host01 ~]# crsctl start crs
– check that HAS has started but the rest of the services are not up
[root@host01 ~]# crsctl check has
CRS-4638: Oracle High Availability Services is online
[root@host01 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@host01 ~]# crsctl check css
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
[root@host01 ~]# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
– Check the ocssd.log
– CSSD scans all the disks which are part of the discovery string but does not find a voting disk

 [root@host01 ~]# vi $ORACLE_HOME/log/host01/cssd/ocssd.log
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK01:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d0410 for disk :ORCL:ASMDISK01:
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK010:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d0c10 for disk :ORCL:ASMDISK010:
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK011:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d1508 for disk :ORCL:ASMDISK011:
2013-01-21 12:55:14.759: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK012:0:0
2013-01-21 12:55:14.759: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d1e00 for disk :ORCL:ASMDISK012:
2013-01-21 12:55:14.760: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK02:0:0
2013-01-21 12:55:14.760: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d26f8 for disk :ORCL:ASMDISK02:
2013-01-21 12:55:14.760: [    CLSF][2985733008]Read ASM header off dev:ORCL:ASMDISK03:0:0
2013-01-21 12:55:14.760: [   SKGFD][2985733008]Lib :ASM:/opt/oracle/extapi/32/asm/orcl/1/libasm.so: closing handle 0x91d2ff0 for disk :ORCL:ASMDISK03:
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
.
.
.
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmvDiskVerify: Successful discovery of 0 disks
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2013-01-21 12:55:14.760: [    CSSD][2985733008]clssnmvFindInitialConfigs: No voting files found
– Copy the existing gpnp profile to profile.bak  and edit the copy to modify discovery string  for asm
[root@host01 ~]# cd /u01/app/11.2.0/grid/gpnp/host01/profiles/peer/
[root@host01 peer]# cp profile.xml profile.bak
– remove the oracle signature from the file –
[root@host01 peer]# gpnptool unsign -p=profile.bak
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/app/11.2.0/grid/bin/gpnptool.bin unsign -p=profile.bak -o-

 

– change the DiscoveryString itself –
[root@host01 peer]# gpnptool edit -asm_dis='ORCL:*' -p=profile.bak -o=profile.bak -ovr
Resulting profile written to "profile.bak".
Success.
–  sign the profile xml file with the wallet (Notice: the path is only the directory to the wallet, NOT the wallet file itself) –
[root@host01 peer]# gpnptool sign -p=profile.bak -w=file:/u01/app/11.2.0/grid/gpnp/host01/wallets/peer/ -o=profile.new
Resulting profile written to "profile.new".
Success.
– replace the original profile.xml with the newly signed profile
[root@host01 peer]# mv profile.new profile.xml
– check that discovery string has been modified
[root@host01 peer]# vi profile.xml
DiscoveryString="ORCL:*"
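Before restarting anything, the signature on the edited profile can optionally be checked. A sketch of the verification call; the -wu=peer flag is how I recall gpnptool expecting the wallet user, so treat the exact options as an assumption and confirm with gpnptool verify -help:

[root@host01 peer]# gpnptool verify -p=profile.xml -w=file:/u01/app/11.2.0/grid/gpnp/host01/wallets/peer/ -wu=peer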
– Try to restart crs
[root@host01 peer]# ps -ef | grep d.bin
– kill any remaining clusterware (d.bin) processes listed above, then start crs –
[root@host01 peer]# crsctl start crs
– check that all the daemons have started and cluster services are up
[root@host01 peer]# ps -ef | grep d.bin
[root@host01 peer]# crsctl stat res -t
Conclusion:
To locate the voting disk, the gpnp profile is read first: it shows that the voting disk is on ASM, and the ASM disks matching the discovery string in the profile are then scanned to find it.
– Note: the GPnP profile on the other nodes still contains the erroneous discovery string. Hence copy the corrected profile from the current node to the other nodes as well.
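A minimal sketch of that copy (assuming the same peer directory layout on host02 and SSH access between the nodes):

[root@host01 peer]# scp profile.xml host02:/u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml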

11g R2 RAC : REPAIR ORACLE CLUSTER REGISTRY (OCR)

In this post, I will demonstrate how we can use the ocrconfig -repair command to repair the OCR configuration on a node that was down when the configuration was modified on the other nodes.

Current scenario:
3 node cluster
Nodes: host01, host02, host03
Nodes host02 and host03 are up
Node host01 is down
OCR is stored on ASM diskgroup DATA

Overview:
- Additionally store OCR on the FRA diskgroup
- This information is modified in /etc/oracle/ocr.loc on nodes host02 and host03, which are up
- This information is not modified in /etc/oracle/ocr.loc on node host01, which is down
- Start up node host01
- Clusterware does not come up on host01
- Check the alert log and crsd log on host01
- Repair the OCR configuration on host01 so that /etc/oracle/ocr.loc on host01 gets updated
- Start clusterware on host01 – succeeds

Implementation:
 
- Additionally store OCR on the FRA diskgroup
[root@host02 ~]# ocrconfig -add +FRA
- Check that new OCR location is added  in /etc/oracle/ocr.loc on nodes host02 and host03 which are up
[root@host02 ~]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA
[root@host03 ~]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA

- Check that new OCR location is not added  in /etc/oracle/ocr.loc on node host01 which was down
[root@host01 ~]# cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
– Bring up host01.
– Check that clusterware has not come up there
[root@host01 host01]# crsctl stat res -t
- Check the alert log of host01
[root@host01 host01]# tailf /u01/app/11.2.0/grid/log/host01/alerthost01.log
[ohasd(4914)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘host01‘.
2013-01-18 23:35:01.950

- Check the crsd log of host01 – it indicates that the local and master OCR configuration information does not match

 [root@host01 crsd]# vi /u01/app/11.2.0/grid/log/host01/crsd/crsd.log
[  OCRMAS][2876611472]th_calc_av:5′: Rturn persisted AV [186646784] [11.2.0.1.0]
2013-01-18 23:35:13.931: [  OCRSRV][2876611472]th_not_master_change: Master change callback not registered
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:91: Comparing device hash ids between local and master failed
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:91 Local dev (1862408427, 1028247821, 0, 0, 0)
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:91 Master dev (1862408427, 1897369836, 0, 0, 0)
2013-01-18 23:35:13.931: [  OCRMAS][2876611472]th_master:9: Shutdown CacheLocal. my hash ids don’t match

– Repair OCR configuration on host01 
[root@host01 crsd]# ocrconfig -repair -add +FRA
- Check that new OCR location is added  in /etc/oracle/ocr.loc on node host01 
[root@host01 crsd]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA
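The registry configuration can also be cross-checked at this point with ocrcheck (output omitted here); ocrcheck -config lists just the configured OCR locations, while plain ocrcheck additionally verifies the integrity of each location:

[root@host01 crsd]# ocrcheck -config
[root@host01 crsd]# ocrcheck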

– Shutdown and restart cluster on host01
[root@host01 crsd]# crsctl stop crs -f
[root@host01 crsd]# crsctl start crs
[root@host01 crsd]# crsctl start cluster
– Check that crsd has started on host01
[root@host01 host01]# tailf /u01/app/11.2.0/grid/log/host01/alerthost01.log
[crsd(7297)]CRS-1012:The OCR service started on node host01.
2013-01-18 23:46:07.609
[crsd(7297)]CRS-1201:CRSD started on node host01.
[root@host02 ~]# crsctl stat res -t

11g R2 RAC: LET’S LOSE GPNP PROFILE

In an earlier post, GPNP PROFILE DEMYSTIFIED, I explained what the GPnP profile is, what it contains and how it is used by the clusterware.
In this post, I will demonstrate what happens if we lose GPnP profile.

Current scenario:
2 Node cluster
Cluster name : cluster01.example.com
Host names : host01, host02

Overview:
- Rename the GPnP profile on a node (host02)
- Stop and restart crs
- The file pending.xml is created in GRID_HOME/gpnp/<hostname>/profiles/peer
- Change the ASM_DISKSTRING parameter, which causes the GPnP profile to be updated
- As soon as the parameter is changed, pending.xml is renamed to profile.xml and the DiscoveryString for ASM in it is updated

 Let’s start …
– Rename GPnP profile on host02

   [root@host02 cdata]# mv /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.bak
- Stop and restart crs on host02
     [root@host02 cdata]# crsctl stop crs
     [root@host02 cdata]# crsctl start crs
-- Check the alert log simultaneously and note that just prior to the start of the CSSD daemon, the GPnP daemon on host02 copies the GPnP profile from host01, and the file pending.xml is created in GRID_HOME/gpnp/<hostname>/profiles/peer

[ohasd(24187)]CRS-2772:Server ‘host02′ has been assigned to pool ‘Free’.
2013-01-17 01:35:14.635
[cssd(24416)]CRS-1713:CSSD daemon is started in clustered mode
 [root@host02 cdata]# ls /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/pending*
/u01/app/11.2.0/grid/gpnp/host02/profiles/peer/pending.xml

– Open the file pending.xml and note that the DiscoveryString for ASM is empty
DiscoveryString="" SPFile="+DATA/cluster01/asmparameterfile/registry.253.804837235"/>

– Change ASM_DISKSTRING parameter which causes GPnP profile to be updated
   SQL>alter system set ASM_DISKSTRING='ORCL:*';
- As soon as the parameter is changed, pending.xml is renamed to profile.xml and the DiscoveryString for ASM is updated in it.
[root@host02 cdata]# ls /u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml
/u01/app/11.2.0/grid/gpnp/host02/profiles/peer/profile.xml

 —- Open the file profile.xml and note that DiscoveryString for ASM has been updated
DiscoveryString="ORCL:*" SPFile="+DATA/cluster01/asmparameterfile/registry.253.804837235"/>

Conclusion:
Whenever the GPnP profile is missing on a node, GPnPd automatically copies it from an existing node as pending.xml. When the profile is next updated, pending.xml is renamed to profile.xml.

11g R2 RAC : INSTANCE FAILOVER IN SERVER POOL

   In my earlier post, I explained the concept of server pools introduced in 11g R2 RAC. In this post, I will demonstrate how instance failover takes place for a policy-managed database in 11g R2 RAC.

 

Prior to 11g R2 RAC, the various instances of a database were mapped to specific nodes in the cluster. Imagine a cluster having 4 nodes: host01, host02, host03 and host04. I have a database DB configured to run on 3 nodes in the cluster, i.e.

 

Instance DB1 runs on host01
Instance DB2 runs on host02
Instance DB3 runs on host03

 

Let us say I have a service DB_SERV defined for the database with
Preferred instances : DB1, DB2
Available instances  : DB3
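In 11.2 syntax such a service would be created roughly like this (a sketch only, using the hypothetical database, instance and service names of this example):

[oracle@host01 ~]$ srvctl add service -d DB -s DB_SERV -r DB1,DB2 -a DB3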

 

As the clusterware starts up , the status is as follows:
Instance DB1 runs on host01
Instance DB2 runs on host02
Instance DB3 runs on host03

 

Service DB_SERV is supported by host01 and host02, i.e. if a user connects to the database using the service db_serv, he will be connected to either host01 or host02.

 

      DB1          DB2          DB3
    DB_SERV      DB_SERV         |
       |            |            |
       v            v            v
     host01       host02       host03       host04

 

Now, if host02 crashes, i.e.
      DB1          DB2          DB3
    DB_SERV      DB_SERV         |
       |            |            |
       v            v            v
     host01       host02       host03       host04
                     X
- Instance DB2 crashes.
- Service DB_SERV fails over from the crashed instance (DB2) to the available instance DB3, i.e. now if a user connects to the database using the service db_serv, he will be connected to either host01 or host03.
      DB1                        DB3
    DB_SERV                    DB_SERV
       |                          |
       v                          v
     host01                     host03       host04
This implies that although an idle server (host04) was available, instance DB2 simply crashed instead of failing over to host04, because DB2 had been configured to run on host02 only.
ENTER SERVER POOLS:
The above problem has been resolved in 11g R2 RAC with the introduction of server pools. Now the database is configured to run in a server pool. The cardinality of the database is decided by the MIN_SIZE attribute of the server pool. The database runs on the servers which are assigned to the server pool. If a node hosting the database crashes, an idle node (if available) is moved from the Free pool to the server pool, and the instance running on the crashed node fails over to that node. All the services supported by the instance fail over along with it.
Let us see how.
Imagine a cluster having 3 nodes host01, host02 and host03
                         HOST01         HOST02            HOST03
– create a server pool pmpool with min = 2, max = 2, importance = 1
[grid@host01 ~]$ srvctl add srvpool -g pmpool -l 2 -u 2 -i 1
– check that 2 servers (host02, host03) have been allocated to this server pool
[grid@host01 ~]$crsctl  status  serverpool
                          PMPOOL
               +----------------------------+
               |                            |
    HOST01     |    HOST02        HOST03    |
               +----------------------------+
– create a policy-managed database pmdb which will run in the server pool pmpool
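If the database already exists but is not yet registered with the clusterware, one way to register it as policy managed in this pool is via srvctl (a sketch; the Oracle home path is an assumption about the environment, and DBCA normally performs this registration for you when you choose the policy-managed option):

[oracle@host01 ~]$ srvctl add database -d pmdb -o /u01/app/oracle/product/11.2.0/dbhome_1 -g pmpool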
– check that two servers (host02 and host03) have been allocated to serverpool pmpool
[root@host01 ~]#crsctl status serverpool
– check that two instances of pmdb are running on the 2 servers (host02, host03) in pmpool
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is running on node host03
                          PMPOOL
               +----------------------------+
               |    PMDB_1        PMDB_2    |
    HOST01     |    HOST02        HOST03    |
               +----------------------------+
REBOOT HOST03 WHERE INSTANCE PMDB_2 IS RUNNING
                          PMPOOL
               +----------------------------+
               |    PMDB_1        PMDB_2    |
    HOST01     |    HOST02        HOST03    |
               |                     X      |
               +----------------------------+
– Check the status of the database pmdb repeatedly –
–Note that instance pmdb_2 fails over to host01 –
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is running on node host03
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is not running on node host03
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$
[oracle@host01 ~]$ srvctl  status database -d pmdb
Instance pmdb_1 is running on node host02
Instance pmdb_2 is running on node host01
                          PMPOOL
               +----------------------------+
               |    PMDB_1        PMDB_2    |
    HOST03     |    HOST02        HOST01    |
               +----------------------------+
– check that host01 has been moved to pmpool
                          PMPOOL
               +----------------------------+
               |                            |
    HOST03     |    HOST02        HOST01    |
       X       +----------------------------+
[root@host02 ~]# crsctl status serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ora.pmpool
ACTIVE_SERVERS=host01 host02

11g R2 RAC: SERVER POOLS

                                        

In this post, I will explain server pools. 11g R2 RAC introduced server pools, which revolutionize cluster management and ease the life of DBAs. To understand server pools, let us first see how things were done earlier:

 

ADMIN MANAGED DATABASES
 In prior releases of Oracle RAC, the DBA would explicitly manage which nodes of a cluster would be used to run the various instances associated with a RAC database. Whenever a node was added/removed from the cluster, DBAs were responsible for adding/removing instances from a RAC database, including the creation of redo log threads and undo tablespaces. Additionally, database services would be manually configured with preferred and available nodes, which facilitated the balancing of connections and failover of the services in a RAC environment. This scheme worked well for a cluster with a small number of nodes.

 

NEED FOR POLICY MANAGED DATABASES
- As the number of nodes became large, the DBA's tasks became very complicated.
- It was not possible to add more instances dynamically to newly added servers.
- It was not possible for additional instances of a database to start automatically as some servers in the cluster became idle.
- It was not possible for a service to become available on newly added nodes dynamically.

 

From Oracle 11g R2 RAC onward, all these tasks are automated using server pools, a new feature of Oracle 11g R2 RAC. Server pools logically divide the cluster into smaller segments that can be used for varying workloads. Databases and services can be defined to run in server pools, and the servers in a server pool can change dynamically.

 

ENTER SERVER POOLS
Oracle 11g R2 introduced server pools, which let us divide the cluster into smaller pools of servers. Each pool can be assigned the following attributes:

 

IMPORTANCE: 0 (default) to 1000 – used to determine how to reconfigure servers across pools when a node joins or leaves the cluster. Using the importance attribute you can ensure that low-precedence workload doesn't eat up resources needed by more important workload.

 

MAX_SIZE: The maximum number of servers a server pool can contain. The default value is -1, which means that all the servers in the cluster can be part of this pool.

 

MIN_SIZE: The minimum size of a server pool. If the number of servers contained in a server pool is below the number you specify in this attribute, then Oracle Clusterware automatically moves servers from other pools into this one until that number is met. The default value is 0.

 

ACL: A string in the format owner:user:rwx,pgrp:group:rwx,other::r-- that defines the owner of the server pool and the privileges granted to various operating system users and groups:
  • r: Read only
  • w: Modify attributes of the pool or delete it
  • x: Assign resources to this pool
By default, the identity of the client that creates the server pool is the owner.
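For instance, the ACL of a user-defined pool could be adjusted like this (a hedged sketch; the pool name sp1 and the grid/oinstall accounts are assumptions about the environment):

[grid@host01 ~]$ crsctl modify serverpool sp1 -attr "ACL='owner:grid:rwx,pgrp:oinstall:rwx,other::r--'"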
Clusterware can move servers between server pools automatically to maintain the configured server pool sizes and importance. By default, two pools are created after installation of the cluster: the Generic pool and the Free pool.

 

Generic Pool: The servers in the pool are used to host Admin Managed databases.
  When you upgrade an existing version of the clusterware to 11.2, all the nodes get mapped to the built-in Generic pool. This is an internally managed server pool, and modification of its attributes is not allowed. When you create an Admin-managed database, it also becomes part of the Generic pool, as a child pool of it.

 

 Free Pool: All the servers which have not been assigned to any pool are part of this pool. When the clusterware starts, all the servers are initially assigned to this pool. As applications start, servers move from this pool to the Generic or user-defined pools dynamically. Like the Generic pool, this is an internally managed pool, but some of its attributes, such as IMPORTANCE, can still be modified by the DBA.
In addition to above pools, users can also define server pools.

 

Hosts are allocated to the server pools in the following order:
     1. Generic Pool
     2. Explicitly created pool
     3. Free Pool

 

Servers are assigned to explicitly created pools as per the following rules:
1.    Fill all server pools in order of importance until they meet their minimum.
2.    Fill all server pools in order of importance until they meet their maximum.
3.    By default, any leftover servers go to the Free pool.

 

If servers leave the cluster for any reason, a server reallocation may take place. If there are servers in the Free Pool and another server pool falls below its maximum value, a free server is allocated to the affected server pool. If there are no free servers, then server reallocation takes place only if a server pool falls below its minimum level. If that occurs, a server will be sourced from one of the following locations in the following order:

1. The server pool with the lowest importance that has more than its minimum number of servers
2. The server pool with the lowest importance that has at least one server and has lower importance than the affected server pool

Using these mechanisms, server pools can maintain an optimal level of resources based on the number of servers currently available.
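The full attribute set (MIN_SIZE, MAX_SIZE, IMPORTANCE, ACL and so on) and the current placement of servers can be inspected at any time; a quick sketch:

[grid@host01 ~]$ crsctl status serverpool -p
[grid@host01 ~]$ crsctl status server -f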

DEMONSTRATION:
 
Current scenario:
Nodes in the cluster : host01, host02, host03

 

- Let’s see the current status of server pools
#crsctl status serverpool
NAME=Free
ACTIVE_SERVERS=host01 host02 host03
NAME=Generic
ACTIVE_SERVERS=

 

- create a server pool mypool with min = 2, max = 2, importance = 1
[root@host02 host02]# srvctl add srvpool -g mypool -l 2 -u 2 -i 1

 

- check that two servers have been allocated to mypool from free pool
[root@host02 host02]# crsctl stat serverpool
NAME=Free
ACTIVE_SERVERS=host01
NAME=Generic
ACTIVE_SERVERS=
NAME=ora.mypool
ACTIVE_SERVERS=host02 host03

 

- Add another server pool mypool1 with min = 2, max = 2, imp = 1
[root@host02 host02]# srvctl add srvpool -g mypool1 -l 2 -u 2 -i 1

 

- check that only one server (the only one left in the Free pool) has been allocated to mypool1
 [root@host02 host02]# crsctl stat serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ora.mypool
ACTIVE_SERVERS=host02 host03
NAME=ora.mypool1
ACTIVE_SERVERS=host01

 

- Increase the importance of mypool1 to 2
[root@host02 host02]# srvctl modify srvpool -g mypool1 -i 2

 

- Check that one server has been moved from mypool (lower importance, 1) to mypool1 (higher importance, 2)
[root@host02 host02]# crsctl stat serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ora.mypool
ACTIVE_SERVERS=host03
NAME=ora.mypool1
ACTIVE_SERVERS=host01 host02

 

- Increase the importance of mypool to 3
[root@host02 host02]# srvctl modify srvpool -g mypool -i 3

 

- Check that one server has been moved from mypool1 to mypool
[root@host02 host02]# crsctl stat serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ora.mypool
ACTIVE_SERVERS=host02 host03
NAME=ora.mypool1
ACTIVE_SERVERS=host01

 

- Cleanup –
[root@host02 host02]# srvctl remove srvpool -g mypool1
[root@host02 host02]# srvctl remove srvpool -g mypool

 

CHILD POOLS
 We can have multiple child pools under the same parent pool. The child pools have their own attributes, and servers are allocated to them based on the IMPORTANCE attribute assigned to each. For example, we can have a database which runs in a parent pool and assign the various services of that database to different child pools, with different attributes on the child pools depending on the importance we want to give each service. The hosts assigned to different parent pools are mutually exclusive, i.e. the same server can't be assigned to two parent pools. For child pools, whether the same server can be assigned to different child pools depends on the value of the EXCLUSIVE_POOLS attribute: if the same string value is assigned to this attribute in two child pools, a host assigned to one of them cannot be shared by the other sibling.

 

– Create a parent pool pmpool3 with min = 3, max=3 and importance = 1
[root@host02 host02]# srvctl add srvpool -g pmpool3 -l 3 -u 3 -i 1

 

– Check that all the three servers have been assigned to pmpool3 –
[root@host02 host02]# crsctl stat serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ora.pmpool3
ACTIVE_SERVERS=host01 host02 host03

 

- Add a child pool ch1 with parent pool = pmpool3, min = 2, max =2 and importance = 1
[grid@host02 ~]$ crsctl add serverpool ch1 -attr "PARENT_POOLS=ora.pmpool3,MIN_SIZE=2,MAX_SIZE=2,IMPORTANCE=1"

 

- Check that two servers have been allocated to child pool ch1
[root@host02 ~]# crsctl status serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ch1
ACTIVE_SERVERS=host01 host02
NAME=ora.pmpool3
ACTIVE_SERVERS=host01 host02 host03

 

- Add another child pool ch2 with min = 2, max = 2, importance = 2(higher than ch1(1))
 [grid@host02 ~]$ crsctl add serverpool ch2 -attr "PARENT_POOLS=ora.pmpool3,MIN_SIZE=2,MAX_SIZE=2,IMPORTANCE=2"

 

- Check that two servers have been allocated to child pool ch2. Since we haven’t set the parameter EXCLUSIVE_POOLS, the same hosts host01 and  host02  are  shared by both the children.
[root@host02 ~]# crsctl status serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ch1
ACTIVE_SERVERS=host01 host02
NAME=ch2
ACTIVE_SERVERS=host01 host02
NAME=ora.pmpool3
ACTIVE_SERVERS=host01 host02 host03
————————————————————————————

 

EXCLUSIVE_POOLS

 

– Delete both child pools ch1 and ch2
[grid@host02 ~]$ crsctl delete serverpool ch1 
[grid@host02 ~]$ crsctl delete serverpool ch2

 

– Add child pool ch1 with min = 2,max = 2, importance = 1, exclusive_pools=’x’
[grid@host02 ~]$ crsctl add serverpool ch1 -attr "PARENT_POOLS=ora.pmpool3,MIN_SIZE=2,MAX_SIZE=2,IMPORTANCE=1,EXCLUSIVE_POOLS='x'"

 

– check that 2 servers have been assigned to child pool ch1
[grid@host02 ~]$ crsctl stat serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ch1
ACTIVE_SERVERS=host01 host02
NAME=ora.pmpool3
ACTIVE_SERVERS=host01 host02 host03

 

- Add another child pool ch2 with min = 2, max = 2, importance = 2(higher than ch1(1)), exclusive_pools = x so that it won’t share any servers with ch1
[grid@host02 ~]$ crsctl add serverpool ch2 -attr "PARENT_POOLS=ora.pmpool3,MIN_SIZE=2,MAX_SIZE=2,IMPORTANCE=2,EXCLUSIVE_POOLS='x'"

 

- Check that one server has been moved from ch1 to ch2, as the importance of ch2 is higher and servers are not shared between ch1 and ch2.
[grid@host02 ~]$ crsctl stat serverpool
NAME=Free
ACTIVE_SERVERS=
NAME=Generic
ACTIVE_SERVERS=
NAME=ch1
ACTIVE_SERVERS=host01
NAME=ch2
ACTIVE_SERVERS=host02 host03
NAME=ora.pmpool3
ACTIVE_SERVERS=host01 host02 host03
=========

 

With the introduction of server pools, we can have two management styles for a RAC database.

 

Administrator-managed databases (similar to what we had prior to 11g R2)
  . The DBA defines which servers the database should run on.
  . The DBA decides the servers (preferred/available nodes) on which the services of the database should run.
  . The DBA decides which instance of the database will run on which server.
  . This database type assigns its instances to a child pool of the built-in Generic pool.

 

Although this type seems simpler, it becomes unmanageable once the number of nodes grows beyond what a DBA can comfortably administer, for example 100. In that case the other 11.2 database type, the policy-managed database, is helpful.

 

Policy Managed Databases
- The database is associated with a server pool rather than a specific set of nodes. The server pool decides the minimum and maximum number of servers needed by that resource (database, service, third-party application).
- The database runs on the servers which have been allocated to the server pool it has been assigned to.
- Since the servers allocated to the server pool can change dynamically, this allows Oracle to dynamically deliver services based on the total number of servers available to the cluster.
  . The database will be started on enough servers, subject to the availability of servers.
  . We need not hardcode the servers a database should run on.
  . Any instance of the database can run on any node; there is no fixed mapping between instance number and node.
  . As servers are freed/added/deleted, they are allocated to existing pools as per the rules mentioned earlier.

 

For example, consider a cluster consisting of eight nodes in total that supports three RAC databases, each defined with a minimum and maximum number of servers. Let's assume that

 

DB1 is defined with a minimum of 4 and a maximum of 6 with an importance of 10,
DB2 is defined with a minimum of 2 and maximum of 3 and an importance of 7, and
DB3 is set to a minimum of 2 and maximum of 3 with an importance of 5.
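Expressed as server pools, that configuration could be created roughly as follows (a sketch; the pool names db1pool, db2pool and db3pool are hypothetical):

[grid@host01 ~]$ srvctl add srvpool -g db1pool -l 4 -u 6 -i 10
[grid@host01 ~]$ srvctl add srvpool -g db2pool -l 2 -u 3 -i 7
[grid@host01 ~]$ srvctl add srvpool -g db3pool -l 2 -u 3 -i 5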

 

Initially the 8 nodes could be configured as nodes 1-4 allocated to DB1, nodes 5-6 allocated to DB2 and nodes 7-8 allocated to DB3. If node 3 failed for some reason, the system would allocate node 7 or 8 to DB1 because it has a higher importance than DB3 and a minimum requirement of 4 servers, even though it would cause DB3 to fall below the minimum number of servers. If node 3 is re-enabled it would be allocated immediately to DB3 to bring that database back to its minimum required servers.

 

If a 9th node were added to the cluster, it would be assigned to DB1 because of the importance factor and the fact that DB1 has not yet hit its maximum number of servers.

Conclusion
As the number of nodes in a cluster increases, DBAs will benefit hugely from server pools, as the management of clusterware resources becomes easy and dynamic.
In my next post, I will write about instance failover in server pools.
References:
http://docs.oracle.com/cd/E11882_01/rac.112/e16794/admin.htm
 http://allthingsoracle.com/an-introduction-to-11-2-rac-server-pools/
http://www.mahir-quluzade.com/2013/05/converting-administrator-managed.html
Policy based cluster management in Oracle 12c


11g R2 RAC : SCAN DEMYSTIFIED

Single Client Access Name, or SCAN, is a new feature introduced in 11g R2 RAC. It enables clients to use a single hostname to access Oracle databases running in the cluster. Clients using SCAN do not need to change their TNS configuration if you add or remove nodes in the cluster.
 To understand it better, let us go back a little to 10g/11gR1 RAC.
Let us say we have a database orcl which is running on 3 nodes in a cluster;
Name of the cluster: cluster01.example.com
Instance       Host name                       IP address             VIP       Pvt network
orcl1       host01.example.com       192.9.201.181   192.9.201.182    10.0.0.1
orcl2       host02.example.com       192.9.201.183   192.9.201.184    10.0.0.2
orcl3       host03.example.com       192.9.201.185   192.9.201.186    10.0.0.3
DNS server
               server1.example.com       192.9.201.59
Storage
               openfiler1.example.com    192.9.201.9
Prior to 11g R2, when the user requested the connection to the database by issuing
$sqlplus hr/hr@orcl
Entry in tnsnames.ora on client side was read which looked like:
ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = host01-vip.example.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = host02-vip.example.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = host03-vip.example.com)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SID = orcl)
    )
  )
The TNS entry contains a reference (address) for every node in the cluster. The request could be sent to any one of the host names listed in the ADDRESS_LIST.
To resolve the host name two methods were available :
- hosts file
- DNS Server
Let us discuss hosts file first. The /etc/hosts file contained entries using which hostnames could be resolved to corresponding VIPs.
#Public Network
192.9.201.181    host01.example.com host01
192.9.201.183    host02.example.com host02
192.9.201.185    host03.example.com host03
#Private Network
10.0.0.1        host01-priv.example.com host01-priv
10.0.0.2        host02-priv.example.com host02-priv
10.0.0.3        host03-priv.example.com host03-priv
#Virtual Network
192.9.201.182   host01-vip.example.com host01-vip
192.9.201.184  host02-vip.example.com host02-vip
192.9.201.186   host03-vip.example.com host03-vip
#Storage
192.9.201.9     openfiler1.example.com openfiler
Once the VIP of the host was known, the request was sent to that IP address, where it was received by the local listener running there. The listener's information was stored in the listener.ora file on the server side.
In this method of hostname resolution, whenever we added or deleted a node, we needed to modify both tnsnames.ora and the hosts file on each client machine, which was a cumbersome task. This problem was resolved by storing the hostname resolution entries in a centralised DNS server. Now, on adding or deleting a node, the hostname resolution entries needed to be modified in only one place, the DNS server, but tnsnames.ora still had to be modified on each client machine.
To avoid modifying tnsnames.ora on each client machine, SCAN was introduced in 11gR2.
ENTER SCAN in 11g R2 RAC ….
Now we can specify just the single SCAN name in tnsnames.ora and the client will be connected to one of the nodes in the cluster. Let us see how it is implemented:
 Now tnsnames.ora entry looks like:
ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = cluster01-scan.cluster01.example.com)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SID = orcl)
    )
  )
When a user tries to connect to the orcl database, the name cluster01-scan.cluster01.example.com needs to be resolved. To resolve it, we can again use either the hosts file or a DNS server.
SCAN USING HOSTS FILE
If we use the /etc/hosts file, the entry would look like this (IP address first, then the SCAN name):
192.9.201.181   cluster01-scan
Here we have the limitation that the SCAN name can resolve to only one SCAN VIP, i.e. all client connection requests will be directed to the same SCAN listener, so client-side load balancing won't take place. If we want client-side load balancing, the SCAN name should resolve to multiple SCAN VIPs, and for that we must use DNS.
SCAN USING DNS
Hence we include the following in /etc/resolv.conf to forward any name-resolution request for example.com to the DNS server running at IP address 192.9.201.59:
/etc/resolv.conf
search example.com
nameserver 192.9.201.59
Various files associated with DNS on DNS server are :
- /var/named/chroot/etc/named.conf
- /var/named/chroot/var/named/forward.zone
- /var/named/chroot/var/named/reverse.zone
- /var/named/chroot/var/named/reverse1.zone
——————————————
- /var/named/chroot/etc/named.conf
——————————————
When a request comes to DNS, this file /var/named/chroot/etc/named.conf
is looked into first
Here it informs that domain example.com’s info is in forward.zone
zone "example.com" IN {
        type master;
        file "forward.zone";
        };
In forward.zone , following are specified :
– IP addresses of all the nodes in the cluster
– VIP addresses of all the nodes in the cluster
– Private network  IP addresses of all the nodes in the cluster
– IP address of DNS Server
– IP address of Storage
- SCAN-VIP’s
************
——————————————
- /var/named/chroot/var/named/forward.zone
—————————————–
This file is used to resolve host names to ip addresses.
                     IN NS           server1
                     IN A            192.9.201.59
server1         IN A            192.9.201.59
host01          IN A            192.9.201.183
host02          IN A            192.9.201.185
host03          IN A            192.9.201.187
openfiler1      IN A            192.9.201.181
host01-priv     IN A            10.0.0.11
host02-priv     IN A            10.0.0.22
host03-priv     IN A            10.0.0.33
host01-vip      IN A            192.9.201.184
host02-vip      IN A             192.9.201.186
host03-vip      IN A             192.9.201.188
cluster01-scan   IN A             192.9.201.190
                 IN A             192.9.201.191
                 IN A             192.9.201.192
After getting  address, the  request is sent in round-robin fashion  to the SCAN listeners listening at SCAN-VIP’s.
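A quick way to see this from a client and from the cluster itself (commands only; the three addresses returned by nslookup should be the SCAN VIPs 192.9.201.190-192 defined above, rotating between queries):

$ nslookup cluster01-scan.example.com
[grid@host01 ~]$ srvctl config scan
[grid@host01 ~]$ srvctl config scan_listener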
——————————————–
- /var/named/chroot/var/named/reverse.zone
——————————————–
This file is used to resolve ip addresses to host names.
********
$TTL    86400
@       IN      SOA     server1.example.com. root.server1.example.com.  (
                                       997022700 ; Serial
                                      28800      ; Refresh
                                      14400      ; Retry
                                      3600000    ; Expire
                                      86400 )    ; Minimum
        IN      NS      server1.example.com.
59      IN      PTR     server1.example.com.
182     IN      PTR     openfiler.example.com.
183     IN      PTR     host01.example.com.
185     IN      PTR     host02.example.com.
187     IN      PTR     host03.example.com.
184     IN      PTR     host01-vip.example.com.
186     IN      PTR     host02-vip.example.com.
188     IN      PTR     host03-vip.example.com.
190     IN      PTR     cluster01-scan.example.com
191     IN      PTR     cluster01-scan.example.com
192     IN      PTR     cluster01-scan.example.com
——————————————–
- /var/named/chroot/var/named/reverse1.zone
——————————————–
 This file is used to resolve private interconnect ip addresses
IN      NS      server1.example.com.
11      IN      PTR     host01-priv.example.com.
22      IN      PTR     host02-priv.example.com.
33      IN      PTR     host03-priv.example.com.
Thus we no longer need to modify the tnsnames.ora file on the client machines whenever we add or delete node(s), as the file contains just a reference to the SCAN name. Also, we need not modify the entries in DNS on adding or deleting a node, as the SCAN name in DNS resolves to SCAN VIPs and not node VIPs. But we still need to assign VIPs manually to the various nodes. If we let DHCP assign the VIPs to the nodes automatically, and let these VIPs be known to DNS somehow, we would just need to run a few commands to let the clusterware know about the node addition/deletion and we are done: no need to assign VIPs manually.
This feature of 11gR2 RAC is called GPnP (Grid Plug and Play). It is implemented by GNS (Grid Naming Service), which is similar to DNS but is used to resolve the cluster names only within the corporate domain (i.e. example.com).
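On a GNS-enabled cluster, the GNS configuration and its static VIP can be checked with srvctl (a sketch, output omitted):

[grid@host01 ~]$ srvctl config gns
[grid@host01 ~]$ srvctl status gns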
SCAN WITH GNS
Let us see how DNS entries are modified now :
Various files associated with DNS are :
- /var/named/chroot/etc/named.conf
- /var/named/chroot/var/named/forward.zone
- /var/named/chroot/var/named/reverse.zone
- /var/named/chroot/var/named/reverse1.zone
——————————————
- /var/named/chroot/etc/named.conf
——————————————
When a request comes to DNS, this file is looked into first
Here domain example.com’s info is in forward.zone
zone "example.com" IN {
        type master;
        file "forward.zone";
        allow-transfer { 192.9.201.180; };
};
As per forward.zone, any request for cluster01.example.com is to be forwarded to the address of cluster01-gns, i.e. 192.9.201.180.
——————————————
- /var/named/chroot/var/named/forward.zone
—————————————–
This file is used to resolve host names to ip addresses.
                   IN NS           server1
                   IN A            192.9.201.59
server1         IN A            192.9.201.59
host01          IN A            192.9.201.183
host02          IN A            192.9.201.185
host03          IN A            192.9.201.187
openfiler1      IN A            192.9.201.181
host01-priv     IN A            10.0.0.11
host02-priv     IN A            10.0.0.22
host03-priv     IN A            10.0.0.33
$ORIGIN cluster01.example.com.
@       IN      NS      cluster01-gns.cluster01.example.com.
cluster01-gns   IN      A       192.9.201.180
After getting the GNS address, the resolver comes back to named.conf and checks whether zone transfer is allowed to that address. If it is allowed, the request is forwarded to GNS at that address.
zone "example.com" IN {
        type master;
        file "forward.zone";
        allow-transfer { 192.9.201.180; };
};
——————————————–
- /var/named/chroot/var/named/reverse.zone
——————————————–
This file is used to resolve ip addresses to host names.
$TTL    86400
@       IN      SOA     server1.example.com. root.server1.example.com.  (
                                      997022700 ; Serial
                                      28800      ; Refresh
                                      14400      ; Retry
                                      3600000    ; Expire
                                      86400 )    ; Minimum
           IN      NS      server1.example.com.
 59      IN      PTR     server1.example.com.
182     IN      PTR     openfiler.example.com.
183     IN      PTR     host01.example.com.
185     IN      PTR     host02.example.com.
187     IN      PTR     host03.example.com.
——————————————–
- /var/named/chroot/var/named/reverse1.zone
——————————————–
 This file is used to resolve private interconnect
ip addresses
IN      NS      server1.example.com.
11      IN      PTR     host01-priv.example.com.
22      IN      PTR     host02-priv.example.com.
33      IN      PTR     host03-priv.example.com.
——————————————————————-
Once the request goes to GNS, it is multicast to all the nodes in the cluster. In all, three SCAN Listeners run on any of the nodes in the cluster. mDNS process running on the nodes hosting the SCAN listeners return the VIP’s of SCAN listeners to GNS. GNS in turn returns the addresses of the three SCAN listeners to DNS. The client request is forwarded to any one of the three SCAN listeners (say SCAN-Listener1). All the SCAN listeners have the load information of all the nodes. The SCAN listener which receives the client request (SCAN-Listener1) forwards the request to the local listener running on the least loaded node.
     Thus, with SCAN, all the VIPs (host VIPs and SCAN VIPs) are assigned dynamically by DHCP. The only static VIP is the VIP for GNS. Note that we have to specify the range of IPs to be assigned by DHCP in the file /etc/dhcpd.conf. For example, the following entry specifies the IP range from 192.9.201.190 to 192.9.201.254:
range dynamic-bootp 192.9.201.190 192.9.201.254;
Now, to add a node, we just need to add an entry giving the physical IP of the node in the DNS server's files. The VIPs will be assigned automatically by DHCP.
I hope you found this post useful.
Your comments and suggestions are always welcome.

LATCHES, LOCKS, PINS AND MUTEXES

In my earlier article in this series, I discussed the architecture of the shared pool. In this post, I will discuss the various memory-protection structures used in the shared pool, i.e. latches, locks, pins and mutexes.
                                                        LATCHES
  If there is not enough memory (no single chunk of free memory large enough) to create a new object in the library cache, Oracle uses an LRU algorithm to age out existing objects and delink them from their hash chains. Linking a new object into a hash chain means picking two existing objects that currently point to each other, setting the forward and backward pointers in the new object to point to them, and then modifying the backward pointer on one object and the forward pointer on the other object to point to the new object.
    Hash buckets        Hash chains (doubly linked, circular)
         H1         S1 <-> S2 <-> S5 <-> S9 <-> (back to S1)
         H2         S3 <-> S8 <-> (back to S3)
         H3         S4 <-> S6 <-> (back to S4)
         H4         S7
For example, if a user issues a statement S10 which hashes to the same value as bucket H2, the following steps are needed:
- set forward pointer of S10 to S8
- set backward pointer of S10 to S3
- set forward pointer of S3 to S10
- set backward pointer of S8 to S10
so that finally hash bucket H2 looks like:
        H2          S3 <-> S10 <-> S8 <-> (back to S3)
    Consider a scenario. User1 issues statement S1 and finds that it belongs to hash bucket H1. At the same time, another user, User2, needs to load another object, for which the object S2 in bucket H1 needs to be aged out. User2 will need to delink S2 from bucket H1, for which he will:
  – set forward pointer of S1 to point to S5
  – set backward pointer of S5 to point to S1
  – set forward/backward pointers of S2 so that it is attached to memory free list.
so that finally hash bucket H1 looks like:
        H1          S1 <-> S5 <-> S9 <-> (back to S1)
and S2 is linked to the doubly linked list of free memory.
If both users are allowed to access the contents of hash bucket H1 at the same time, User1, scanning the doubly linked hash chain, could end up following a pointer to the object User2 has just discarded (S2), because User2 hasn't yet had time to correct both of S2's pointers. Somehow Oracle has to ensure that a user can't modify the linked list while another user is walking it, and vice versa.
  That is where Oracle latches come into the picture. Latches are simple, low-level serialization mechanisms that protect shared data structures in the System Global Area (SGA). Latches eliminate the problems that would appear if multiple processes were allowed to modify shared memory simultaneously. A server or background process acquires a latch for a very short time while manipulating or looking at one of these structures.
  The shared pool latch is used to protect and achieve concurrency in the shared pool. Multiple hash buckets are protected by one latch, so whenever a process wants to search for an object in a hash bucket, it has to first obtain the latch protecting the bucket, walk the linked list, do whatever it needs to do with the object if it finds it, and then drop the latch. If another user wants to walk a hash chain protected by the same latch, he has to wait until the first user has dropped the latch. This gives rise to latch contention.
                             LATCH CONTENTION – LOCKS AND PINS
  Latch contention occurs because a latch is held for too long, or because demand for the latch is too high, or both.
There are three important aspects, then, of latch contention:
   • The number of different latches covering the library cache – the more latches there are, the fewer buckets each latch protects, and the less likely you are to collide with someone who needs to walk a linked list covered by the latch you want; on the other hand, the more latches you have, the more work you may have to do in some form of maintenance, reporting, or garbage collection.
   In the shared pool architecture there are a fixed number of hash buckets (that grow to the next level as needed) protected by a fixed number of library cache latches.
 Until 10g the number of latches covering the library cache was remarkably small. The number was dependent on the number of CPUs (it is roughly the same as the cpu_count parameter) up to a maximum of 67 latches (a quick way to check this is sketched after these three aspects). That's a surprisingly small number, really, given the potential for collisions occurring on even a small number of frequently executed statements; and two processes don't have to be executing the same SQL, or even accessing the same hash bucket, to be colliding on the same latch; they need only be accessing two hash buckets covered by the same latch.
  • The number of times you need to go through the process of acquiring a given latch – the more times you have to get the latch and walk the list, the more likely you are to collide with someone who wants the same latch.
  We can minimize the number of times we have to search the library cache for an object by attaching a KGL lock to the object once we've found it, so that we have a shortcut to it (stored in the PGA as an open/closed cursor). The next time the same statement is issued, we will find it in the PGA itself and need not search the hash chains for the object again (a softer parse).
• The time for which you have to hold the latch – the longer everyone holds latches, the more likely they are to have problems with other people holding latches that they want to acquire.
 We ought to avoid holding latches for any length of time. This means that if we need to do something time-consuming when we’ve found the bit of memory we were searching for, we can pin the object  to protect that memory while we’re using it so that we can release the latch.
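Coming back to the first aspect, the latch count versus CPU count can be eyeballed directly on a 10g system; a small sketch (in 11g the library cache latch is largely replaced by mutexes, so the second query may return 0 there):

SQL> show parameter cpu_count
SQL> SELECT COUNT(*) FROM v$latch_children WHERE name = 'library cache';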
   So, after the server process has found out the corresponding hash bucket, the steps followed are :
1. Obtain latch on the corresponding hash bucket.
2. Obtain lock on the object so that pointer to the object is placed in the PGA (as Open cursor).
3. Pin the object and release the latch.
4. Do whatever you want to do with the object (e.g. execute a statement/procedure).
5. Obtain the latch, unpin the object and release the latch.
Locks and Pins are usually in share mode unless modifications are being made.
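The cumulative effect of this locking and pinning activity is summarised in v$librarycache, where gets roughly correspond to lock requests and pins to pin requests; a quick sketch:

SQL> SELECT namespace, gets, pins, reloads, invalidations FROM v$librarycache;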
          HOW ARE LIBRARY CACHE LOCKS (KGL LOCKS) OBTAINED?
There are three major ways in which you can obtain KGL locks :
• You can write your front-end code to “hold” cursors that you know you are going to use frequently.
• You can set the session_cached_cursors parameter so that Oracle library code will automatically start holding a cursor if it sees you using a statement more than two or three times.
• You can benefit from the semiautomatic way in which PL/SQL will hold cursors that have been opened (explicitly or implicitly) from within a PL/SQL call—from Oracle 9.2.0.5 onward, setting the session_cached_cursors parameter will also control this feature.The session_cached_cursors parameter dictates how many cursors can be held when your code does not explicitly hold cursors. It also controls the number of cursors that can be held open by PL/SQL code being run by the session.
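Whether cursor holding is actually happening can be gauged from the system statistics; a minimal sketch:

SQL> show parameter session_cached_cursors
SQL> SELECT name, value FROM v$sysstat WHERE name LIKE 'session cursor cache%';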
                                        BENEFITS OF KGL LOCKS
  Besides minimizing the number of times we have to search the library cache for an object, locks provide additional benefits:
-  By means of locks, one client can prevent other clients from accessing the same object. Library cache locks use object handles as resource structures and locks are taken on that resource. If the resources are not available in a compatible mode, then sessions must wait for library cache objects to be available.
-  Library cache locks, aka parse locks, are needed to maintain the dependency mechanism between objects and their dependent objects. For example, if an object (e.g. a table) definition needs to be modified, then dependent objects (e.g. the execution plan of a SQL statement referencing the table) must be invalidated. This dependency is maintained using library cache locks. For example, if a column is dropped from a table, then all SQLs dependent upon the table must be invalidated and reparsed during the next access to that object. Library cache locks are designed to implement this tracking mechanism. If we have a lock on a SQL statement, we have a pointer to its execution plan stored in the PGA. If the structure of an underlying table is modified, the parse lock on the statement will be broken and the cursor in the PGA will be invalidated, i.e. the next time we re-execute the same statement, it will be searched for in the library cache and will subsequently be hard parsed.
                       WHY ARE LIBRARY CACHE PINS (KGL PINS) NEEDED?
  The KGL pin comes into play when you actually use an object. Although a KGL lock will hold an object in memory (the cursor in the PGA points to the object in the shared pool), there are parts of the object that are dynamically re-creatable (the execution plan for a SQL statement, for example), and these can still be discarded if there is heavy demand for memory in the shared pool, even if you have a KGL lock in place (the memory location the PGA cursor points to may then hold some other object).
However, when you are actually using an object (running an SQL statement, say), you need to ensure that the re-creatable bits can’t be pushed out of memory, so you pin the object to protect them.
Pinning an object causes the heaps to be loaded into memory. If a client wants to modify or examine  the object, he must acquire a pin after the lock.
 For example, an underlying object should not be modified while a session is executing or accessing a dependent object (a SQL statement, say). If a user tries to modify the underlying object (the table structure, for example), he will need to acquire the library cache pin in exclusive mode on that object. If a dependent SQL statement is still executing, the pin will not be available and the modifying session will wait on library cache pins. So, before the parse locks on a library cache object can be broken, library cache pins must be acquired in exclusive mode; only then can the library cache object be altered. Typically this happens with long-running SQL statements (a two-session sketch of the wait follows).
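The classic two-session demonstration uses a PL/SQL procedure, since an executing procedure keeps its library cache pin for the duration of the call. The procedure name P_SLEEP and the use of DBMS_LOCK.SLEEP below are illustrative assumptions (DBMS_LOCK needs an explicit grant for non-SYS users); any long-running call against the object would do.

-- Session 1: create and run a procedure that holds its pin for a while
CREATE OR REPLACE PROCEDURE p_sleep AS
BEGIN
  DBMS_LOCK.SLEEP(120);  -- keep the pin on P_SLEEP for ~2 minutes
END;
/
EXEC p_sleep

-- Session 2: while session 1 is still running, try to recompile it
ALTER PROCEDURE p_sleep COMPILE;  -- blocks, needs the pin in exclusive mode

-- Session 3: observe the wait
SELECT sid, event, seconds_in_wait
FROM   v$session
WHERE  event = 'library cache pin';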
Library cache locks and pins are externalized in three x$ tables: x$kgllk, x$kglpn and x$kglob (a sample query follows this list).
 - x$kgllk externalizes all the locking structures on an object.
 - Entries in x$kglob act as the resource structures.
 - x$kglpn externalizes all the library cache pins.
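Being x$ views, these can only be queried as SYS and their column layout is version dependent, so treat the join below (kgllkuse against v$session.saddr) as a commonly used sketch to verify on your own release rather than a documented interface.

-- who is holding library cache locks, and on what (run as SYS)
SELECT s.sid, s.username, lk.kglnaobj AS object_name, lk.kgllkmod AS held_mode
FROM   x$kgllk lk, v$session s
WHERE  lk.kgllkuse = s.saddr
AND    lk.kgllkmod > 0;

A similar query against x$kglpn (joining kglpnuse to v$session.saddr) shows the pin holders.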
                             ISSUES WITH KGL LOCKS AND PINS
 The mechanisms of locking and pinning in the library cache allowed two problems to creep in.
 The KGL locks and KGL pins themselves are simply little packets of memory that, at one time, were individually created and discarded on demand by making calls to allocate memory from the shared pool. Since a KGL lock is about 200 bytes and a KGL pin is about 40 bytes, you can imagine that between them the constant allocation and freeing of memory could cause the free memory in the shared pool to end up in a “honey-combed” state—i.e., lots of free memory in total, but not many large pieces of contiguous memory. The KGL pins were particularly nasty because they would come and go very quickly; the KGL locks weren’t quite so bad because they could stay attached to an object for some time.
 Another issue with the KGL locks and KGL pins was that, to use them, you had to constantly create a packet of memory, label it properly, and insert it into a linked list (or, conversely, remove a link from a linked list and put the memory back in the shared pool), and you had to do this while holding a latch exclusively. So for very busy systems, the whole lock/pin mechanism could become a significant scalability threat.
  In 10g, Oracle Corp. introduced the library cache lock latch and the library cache pin latch, which allowed some concurrent activity to be done on different hash buckets covered by the same library cache latch (you could pin a cursor in one bucket while I locked a cursor in another bucket because we wouldn’t both need to hold the same library cache latch at the same time).
However, as we moved through 10g to 11g, the whole KGL lock/KGL pin mechanism was gradually replaced by the mutex mechanism.
                                                    MUTEXES
  To improve cursor execution and also hard parsing, a new fine-grained memory serialization mechanism was introduced in 10gR2. For certain shared-cursor-related operations, mutexes (mutual exclusion objects) were used as a replacement for library cache latches and library cache pins. A mutex basically works the same way a latch does, but the code path for operating a mutex is shorter, so it is lightweight and often directly hardware-supported. Using mutexes is therefore faster, uses less CPU and also allows significantly improved concurrency over the latch mechanism. On a 32-bit Linux installation, a regular latch structure is about 110 bytes, whereas a mutex is about 28 bytes in size. Mutexes also take fewer instructions: a latch get takes roughly 150-200 instructions, whereas a mutex get is about 30-35 instructions.
 The mutex structure is smaller partly because mutexes give you less information about who is waiting, for what, and for how long: you have information about sleep times, but not about the number of requests and misses (a quick look at the mutex sleep statistics is sketched below).
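Those sleep statistics are exposed in v$mutex_sleep (cumulative) and v$mutex_sleep_history (recent history); both are standard views, though on an idle test system they may well be empty.

-- where mutex sleeps are occurring, by mutex type and code location
SELECT mutex_type, location, sleeps, wait_time
FROM   v$mutex_sleep
ORDER  BY sleeps DESC;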
BENEFITS OF MUTEX:
Mutexes can be taken in shared or exclusive mode, and a mutex get can also be done in wait or no-wait mode.
1. Mutexes have less potential for false contention. Previously multiple hash buckets were protected by the same latch. Hence, two users searching two different buckets protected by the same latch were prone to unnecessary false contention i.e. contention used to be for the protection mechanism (latch) rather than the target object you are attempting to access. Unlike latches, with mutexes it is possible to create a mutex for each structure protected. This means false contention is much less likely because each structure can be protected by its own mutex. Now each parent and child cursor has its own mutex, and we do not have to contend for a latch that’s co-used by many cursors.
2. Replace latches and pins: Mutexes have a dual nature; they can act as a serialization mechanism (like a latch) and also as a pin (for example, preventing an object from aging out). Whereas a latch can't be obtained by multiple sessions, a mutex can be referenced by many sessions, provided that all of them reference the mutex in shared mode. The total number of sessions referencing a mutex in shared (S) mode is called the reference count (ref count). The ref count for a mutex is stored in the mutex itself. A mutex can also be held in exclusive (X) mode by one session only.
  The ref count of a mutex is a replacement for a library cache pin. With latches, whenever a session executed a statement it first created and then deleted a library cache pin, but with mutexes each session simply increments and decrements the ref count (so the ref count replaces n distinct pins). This means that an object can't be aged out until its ref count drops to zero, i.e. until no user is currently accessing the object.
3. The mutex structure is located in each child cursor handle and the mutex itself acts as cursor pin structure. To change the cursor pin status, earlier we needed to get the library cache latch but now we can modify the cursor’s mutex refcount directly.
As per Oracle guru Tanel Poder: But the main scalability benefit comes from that there’s a mutex structure in each child cursor handle and the mutex itself acts as cursor pin structure. So if you have a cursor open (or cached in session cursor cache) you don’t need to get the library cache latch (which was previously needed for changing cursor pin status), but you can modify the cursor’s mutex refcount directly (with help of pointers in open cursor state area in sessions UGA).

Therefore you have much higher scalability when pinning/unpinning cursors (no library cache latching needed, virtually no false contention) and no separate pin structures need to be allocated/maintained.

Few notes:
1) library cache latching is still needed for parsing etc, the mutexes address only the pinning issue in library cache
2) mutexes are currently used for library cache cursors (not other objects like PL/SQL stored procs, table defs etc)
3) As mutexes are a generic mechanism (not library cache specific) they’re used in V$SQLSTATS underlying structures too
4) When mutexes are enabled, you won’t see cursor pins from X$KGLPN anymore (as X$KGLPN is a fixed table based on the KGL pin array – which wouldn’t be used for cursors anymore)
 Note that latches and mutexes are independent mechanisms, i.e. a process can hold a latch and a mutex at the same time. In Oracle 10.2.0.2+, the library cache pin latch usage was replaced with mutexes whenever _kks_use_mutex_pin was true; a few other things, such as the V$SQLSTATS underlying arrays and parent cursor examination, were also protected by mutexes. However, the right child cursor lookup (using kksfbc()) was still protected by library cache latches, which could become a problem with frequent soft parsing combined with too little cursor caching and long library cache hash chains (remember, the library cache latches were always taken exclusively, even for plain hash chain scanning). A hedged way to check the _kks_use_mutex_pin setting is sketched below.
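Because _kks_use_mutex_pin is a hidden parameter, it is not visible in v$parameter by default; the usual SYS-only way to inspect it is through x$ksppi and x$ksppcv. The column names below are the commonly used ones, but as with any x$ query they should be treated as assumptions to verify on your version.

-- value of the hidden parameter _kks_use_mutex_pin (run as SYS)
SELECT i.ksppinm  AS parameter_name,
       v.ksppstvl AS current_value
FROM   x$ksppi i, x$ksppcv v
WHERE  i.indx = v.indx
AND    i.ksppinm = '_kks_use_mutex_pin';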
In 10g you can see that three types of mutexes are used (a quick way to see them in action is sketched after this list):
Cursor Stat
Cursor Parent
Cursor Pin
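These types appear directly in the MUTEX_TYPE column of v$mutex_sleep_history, which also records who was blocking whom on a given mutex. The query below is only a sketch; for cursor mutexes the MUTEX_IDENTIFIER column generally corresponds to the hash value of the cursor involved, and on a quiet system the view may be empty.

-- recent mutex sleeps, showing the mutex type and the sessions involved
SELECT mutex_type, location, requesting_session, blocking_session, sleeps
FROM   v$mutex_sleep_history
ORDER  BY sleep_timestamp DESC;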
  So 11.2 got rid of all the little packets of memory and the linked lists; in fact, it even got rid of the latches used to protect the hash chains, and replaced them all with mutexes. Each hash bucket is protected by its own mutex, so you have one micro-latch per bucket instead of a maximum of 67 latches covering 131,072 buckets. If you have a mutex on each library cache object to represent the KGL locks and another to represent the KGL pins, then you don't have to do all that memory allocation and deallocation, and you don't have to run code to connect things to linked lists. So, starting from 11g, each library cache bucket is protected by a separate mutex (yes, all 131,072 of them!).
 In 11g there are a couple of additional mutexes; one of them (and the most important) is the Library Cache mutex.
In 11g, all library cache related latches except “library cache load lock” are gone and the corresponding operations are protected by mutexes instead; the “library cache” latches have been replaced by “Library Cache” mutexes.
The following latches which were present in 10g are no longer seen in 11g:
library cache pin allocation
library cache lock allocation
library cache hash chains
library cache lock
library cache
library cache pin
The only library cache latch present in 11g (this can be verified against v$latchname, as sketched below) is:
library cache load lock
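A hedged way to confirm which library cache latches exist on your own release is to list them from v$latchname; the exact set returned depends on version and patch level.

-- library cache related latches known to this instance
SELECT name
FROM   v$latchname
WHERE  LOWER(name) LIKE 'library cache%'
ORDER  BY name;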
The following wait events related to the library cache were present in 10g but are no longer seen in 11g:
latch: library cache
latch: library cache lock
latch: library cache pin
The wait events related to the library cache that are present in 11g are listed below (they can be pulled from v$event_name, as sketched after the list):
library cache pin
library cache lock
library cache load lock
library cache: mutex X
library cache: mutex S
OSD IPC library
library cache revalidation
library cache shutdown
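Again as a hedged verification, the library cache related wait events registered in a given release can be listed from v$event_name; the output varies by version (and a pattern match like the one below will not catch every event in the list above, e.g. 'OSD IPC library').

-- library cache related wait events registered in this release
SELECT name, wait_class
FROM   v$event_name
WHERE  name LIKE 'library cache%'
   OR  name LIKE 'latch: library cache%'
ORDER  BY name;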
 Of course the library cache mutexes still don't solve all the problems in the world (especially the ones related to excessive hard parsing!); for example, there is still a chance of a hash collision between two entirely different cursors. Also, if there are many child cursors under a parent and the application's cursor management is poor (e.g. cursors are closed after every execution and no session cursor caching is done), then you can still have contention on the mutex due to continuous library cache hash chain traversal.
A demonstration of library cache locks/pins can be seen here.
In my next article in this series on shared pool, I will give a quick overview of shared pool tuning.
I hope this information was useful. Your comments and suggestions are always welcome.

 

References:

Oracle Core: Essential Internals for DBAs and Developers by Jonathan Lewis
