CACHE FUSION DEMONSTRATED

In this post I will demonstrate cache fusion.Current scenario:
No. of nodes in the cluster : 3
Names of the nodes : host01, host02, host03
Name of the database : racdb
Names of the instances : racdb1, racdb2, racdb3

In this demonstration, I will simulate read/read, read/write and write/write contention and will track the contents of the buffer cache of various instances as well as the Global Resource Directory (GRD).

OVERVIEW

— Get data_object_id for scott.emp
— Get File_id and block_id of emp table
— shutdown the database and restart
— Manually master the scott.emp table to node1

— Retrieve the data from disk on node3 (Physical read)

Read/Read contention:
– Read the block from cache on node2 (Cache read)

Read/Write contention
– Update a row in the block from node1

Write/write contention
– Update another row in the same block from node3

– Trigger checkpoint

IMPLEMENTATION

– Get data_object_id for scott.emp

SYS@NODE1> col owner for a10
col data_object_id for 9999999
col object_name for a15
select owner, data_object_id, object_name
from dba_objects
where owner = 'SCOTT'
and object_name = 'EMP';

OWNER DATA_OBJECT_ID OBJECT_NAME
———- ————– —————
SCOTT 73181 EMP

– Get File_id and block_id of emp table

SYS@NODE1> select empno, dbms_rowid.rowid_relative_fno(rowid) fileno,
dbms_rowid.rowid_block_number(rowid) block_no
from scott.emp
where empno in (7788, 7369);

EMPNO FILENO BLOCK_NO
———- ———- ———-
7369 4 151
7788 4 151

– Following queries will be used repeatedly during this demonstration –

– get_buffer_stat.sql
— The view x$bh contains the status of the block in the buffer cache.
— We can get the status of any cached buffer in the buffer cache from this view.
— The object-id for EMP table is used to restrict output only for blocks of EMP table.
— To get status of the block that contains the rows of empno = 7788 and 7369, dbablk = block value retrieved above is used. Class=1 in predicate is used to get the details about the data blocks of the emp table.

col object_name for a6
select (select object_name from dba_objects where object_id = b.obj) as
object_name,
decode (state,0, 'Free', 1, 'XCUR', 2, 'SCUR', 3, 'CR', 4,
'BEING READ',
5, 'MREC', 6, 'IREC', 7, 'WRITE_CLONE', 8, 'PI')
state,
mode_held, le_addr, cr_scn_bas, cr_scn_wrp
from x$bh b
where obj = 73181
and dbablk = 151
and class = 1;

– get_resource_name.sql
— Find the GCS resource name to be used in the query
— x$kjbl.kjblname = resource name in hexadecimal format([id1],[id2],[type]
— x$kjbl.kjblname2 = resource name in decimal format
— Hexname will be used to query resource in V$gc_element and v$dlm_ress views

col hexname for a25
col resource_name for a15
select b.kjblname hexname, b.kjblname2 resource_name,
b.kjblgrant, b.kjblrole, b.kjblrequest

from x$le a, x$kjbl b
where a.le_kjbl=b.kjbllockp
and a.le_addr = (     select le_addr
from x$bh
where dbablk = 151
and obj    = 73181
and class  = 1
and state   <> 3);

~
— get_resource_stat.sql
— Monitor the resources allocated to the data block from EMP table
— In the last line of SQL statement, substitute the literal by the value returned by above query

col resource_name for a22
col state for a8
col mast for 9999
col grnt for 9999
col cnvt for 9999

select a.resource_name, b.state, a.master_node mast, a.on_convert_q cnvt,
a.on_grant_q grnt, b.request_level, b.grant_level, b.owner_node
from   v$dlm_ress a, v$ges_enqueue b
where upper(a.resource_name) = upper(b.resource_name1)
and a.resource_name like '%%';

— IMPLEMENTATION ——

– Find the GCS resource name to be used in the query

SYS@NODE1>@get_resource_name

HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL        KJUSERPR           0 KJUSERNL

– In the script get_resource_stat.sql, in the last line of SQL statement, substitute the literal by the hexname returned by above query i.e.

col resource_name for a22
col state for a8
col mast for 9999
col grnt for 9999
col cnvt for 9999

select a.resource_name, b.state, a.master_node mast, a.on_convert_q cnvt,                        a.on_grant_q grnt, b.request_level, b.grant_level, b.owner_node
from   v$dlm_ress a, v$ges_enqueue b
where upper(a.resource_name) = upper(b.resource_name1)
and a.resource_name like '%[0x97][0x4],[BL]%';

– NODE1 – shutdown the database and restart

[oracle@host01 ~]$ srvctl stop database -d racdb
srvctl start database -d racdb
srvctl status database -d racdb

– ON NODE1, NODE2, NODE3

– Run queries on all the nodes to look at the status of locks. The buffer cache does not contain
any block belonging to EMP table. Hence, no resources are needed to keep track of them. The table EMP is mastered by NODE1 and every request will be routed thru NODE1.

SYS>@get_buffer_stat

no rows selected

SYS>@get_resource_name

no rows selected

SYS>@get_resource_stat

no rows selected

——————BEGIN READ FROM DISK ———————

– Retrieve the data from disk on node3 (Physical read)

– Query the data from emp on node3 to load a buffer from disk to buffer cache of node3.
Instance 1 is the master of the resource.

SCOTT@NODE3> select empno, ename, sal
from emp
where  empno = 7788;

EMPNO ENAME             SAL
---------- ---------- ----------
7788 SCOTT            3000<

– Manually master the EMP table to node1 –

SYS@NODE1>oradebug lkdebug -m pkey
SYS@NODE1>oradebug lkdebug -m pkey 73181

– Query x$kjbr.kjbrmaster to find master node for a resource.
Note that node1 is master of the block (KJBRMASTER = 0, node numbering starts with 0)

SYS@NODE1>select kjbrname, kjbrmaster
                            from x$kjbr

where kjbrname like  '%[0x97][0x4],[BL]%';

KJBRNAME                       KJBRMASTER
------------------------------ ----------
[0x97][0x4],[BL]                        0

Monitor the resource movements between master and holder instances.

— Check that the block is in current shared mode (state = SCUR) on NODE3 from where
read was issued

SYS@NODE3>@get_buffer_stat

OBJECT_NAM STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
---------- -------- ---------- -------- ---------- ----------
EMP       SCUR              0 3A3EAA60          0          0

— Check that a protected read lock (shared read) as indicated by KJUSERPR has been granted on this resource

SYS@NODE3>@get_resource_name

HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL        KJUSERPR           0 KJUSERNL

— Issue following queries on master node NODE1 (Master) —

— check that master node is node1 (node numbering starts with 0) and
resource is granted on a protected read level (grant_level = KJUSERPR)

SYS@NODE1>@get_resource_stat

RESOURCE_NAME          STATE     MAST  CNVT  GRNT REQUEST_L GRANT_LEV OWNER_NODE
---------------------- -------- ----- ----- ----- --------- -------------------
[0x97][0x4],[BL]       GRANTED      0     0     1 KJUSERNL  KJUSERPR     2

—————— END READ FROM DISK ———————

—————— BEGIN READ THE BLOCK FROM CACHE —————-
READ/READ CONTENTION

– Run the same query from node2. The data is already cached in node3 and we should get the block from instance on node3. The on disk version of the block is same as that on disk as the block has not been modified yet.

SCOTT@NODE2> select empno, ename, sal
from emp
where empno = 7788;

EMPNO ENAME             SAL
---------- ---------- ----------
7788 SCOTT            3000

– Check that the block is held in current shared mode (state = SCUR)

SYS@NODE2>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP   SCUR              0 3A7E4C34          0          0

– Check that the resource has been granted in protected read mode to node2 and node3

SYS@NODE2>@get_resource_name

HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL       KJUSERPR           0 KJUSERNL

SYS@NODE3>@get_resource_name

HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL        KJUSERPR           0 KJUSERNL

-- Issue following queries on master node NODE1 (Master) and check that
on nodes node2 and node3 (node numbering starts with 0) the
resource is granted a protected read level (grant_level = KJUSERPR)

SYS@NODE1>@get_resource_stat

RESOURCE_NAME          STATE     MAST  CNVT  GRNT REQUEST_L GRANT_LEV OWNER_NODE
---------------------- -------- ----- ----- ----- --------- --------- ----------
[0x97][0x4],[BL]       GRANTED      0     0     1 KJUSERNL KJUSERPR     2
[0x97][0x4],[BL]       GRANTED      0     0     1 KJUSERNL  KJUSERPR     1

——- END READ BLOCK FROM CACHE ———————-

– —- START UPDATE THE BLOCK IN INSTANCE 1 ——————
READ/WRITE CONTENTION

SCOTT@NODE1>update emp set sal = sal where empno = 7788;
commit;

-- Check that the block is held in exclusive  mode (state = XCUR) on
the node where update command was issued
SYS@NODE1>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    XCUR              0 3A3F1EC0          0          0

– Check that on node2 and node3, now we have CR copies (STATE = CR)
of the block with same SCN# = 955092

SYS@NODE2>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    CR                0 00           955092          0

SYS@NODE3>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    CR                0 00           955092          0

– Check that the resource has been granted exclusive access (KJBLGRNT = KJUSEREX) on node1 from where update was issued

SYS@NODE1>@get_resource_name

HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL        KJUSEREX           0 KJUSERNL

– Note that node 2 and 3 no longer have any lock on the resource. The block is present in their buffer cache in CR mode and can be used locally by the instance for query.However, the block cannot be used for update and cannot be served to other instances, so it does not require any locks.

SYS@NODE2>@get_resource_name
no rows selected

SYS@NODE3>@get_resource_name
no rows selected

– Issue following queries on master node NODE1 (Master) to get GRD information—
— Note that now the resource has been granted to node1 (OWNER_NODE=0)
in exclusive mode (GRANT_LEVEL=KJUSEREX)
— Only node1 has the most recent block in the buffer cache and will serve the
block in case any instance requests it.

SYS@NODE1>@get_resource_stat

RESOURCE_NAME          STATE     MAST  CNVT  GRNT REQUEST_L GRANT_LEV  OWNER_NODE
---------------------- -------- ----- ----- ----- --------- ---------   ----------
[0x97][0x4],[BL]       GRANTED      0     0     1 KJUSERNL  KJUSEREX        0

i

-——- END UPDATE THE BLOCK IN INSTANCE 1 ——

– START UPDATE ANOTHER ROW IN THE SAME BLOCK ON NODE3 –
WRITE/WRITE CONTENTION

SCOTT@NODE3>update emp set sal=sal where empno = 7369;
commit;

– Check that there are two copies of the same block on node3 where update command was issued
– CR copy at SCN# = 955092 which was cached earlier
– Current block held in xclusive mode (state = XCUR) on

SYS@NODE3>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    XCUR              0 3A3EAA60          0          0
EMP    CR                0 00           955092                 0

– Check that on node1, we have Past image(State = PI) of the block since earlier update was issued on node1
— On node2, now we have CR copies (STATE = CR) with SCN#= 955092 of the block with different SCN#

SYS@NODE1>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    PI                0 3A3EFA00          0          0

SYS@NODE2>@get_buffer_stat

OBJECT STATE        MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ ----------- ---------- -------- ---------- ----------
EMP    CR                   0 00           955092             0

— Check that the resource has been granted exclusive access (KJBLGRNT = KJUSEREX) on node1 from where update was issued

SYS@NODE3>@get_resource_name

HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL        KJUSEREX          64 KJUSERNL

— Note that node 2 no longer has any lock on the resource. The block is present in their buffer cache in CR mode and can be used locally by the instance for query.However, the block cannot be used for update and cannot be served to other instances, so it does not require any locks.

SYS@NODE2>@get_resource_name
no rows selected

— Note that on node1 where earlier update was done and where PI is present , lock has been downgraded to null (KJBLGRANT = KJUSERNL)

— Master keeps track of the PI images across the buffer cache. All these
instances will be asked to discard the PI copies once the current block is
written to disk.

SYS@NODE1>@get_resource_name

HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL        KJUSERNL         192 KJUSERNL

— Issue following queries on master node NODE1 (Master) to get GRD information
– Node2 holds the block in exclusive mode (GRANT_LEVEL=KJUSEREX)
— node1 which has the PI, its lock level has been downgraded to null
— Master keeps track of the PI images across the buffer cache. All these
instances will be asked to discard the PI copies once the current block is
written to disk.

SYS@NODE1>@get_resource_stat
RESOURCE_NAME          STATE     MAST  CNVT  GRNT REQUEST_L GRANT_LEV OWNER_NODE
---------------------- -------- ----- ----- ----- --------- --------- ----------
[0x97][0x4],[BL]       GRANTED      0     0     1 KJUSERNL  KJUSERNL     0
[0x97][0x4],[BL]       GRANTED      0     0     1 KJUSERNL  KJUSEREX     2

-—- END UPDATE ANOTHER ROW IN THE SAME BLOCK ON NODE3 –

—— DISK WRITES DUE TO CHECKPOINT ——————-
– Trigger a global checkpoint –

SYS@NODE1> alter system checkpoint global;

— The PI on node1 is discarded and the buffer is changed to CR mode.

SYS@NODE1>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    CR                0 00           997470          0

— CR block on node2 stays as it is —

SYS@NODE2>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    CR                0 00           997343          0

— The checkpoint request from an instance does not change any lock status in the node holding the current block .

SYS@NODE3>@get_buffer_stat

OBJECT STATE     MODE_HELD LE_ADDR  CR_SCN_BAS CR_SCN_WRP
------ -------- ---------- -------- ---------- ----------
EMP    XCUR              0 387E8354          0          0
EMP    CR                0 00           997343          0

— PI from node1 has been discarded after checkpoint and its status has been changed to CR.
— Note that node 2 and node1 no longer have any lock on the resource.The block is present in their buffer cache in CR mode and can be used locally by the instance for query.However, the block cannot be used for update and cannot be served to other instances, so it does not require any locks.

SYS@NODE1>@get_resource_name
no rows selected

SYS@NODE2>@get_resource_name
no rows selected

— The checkpoint request does not change the lock status in the node which is holding the current block.

SYS@NODE3>@get_resource_name
HEXNAME                   RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
------------------------- --------------- --------- ---------- ---------
[0x97][0x4],[BL]          151,4,BL        KJUSEREX           0 KJUSERNL

Conclusion:

– When a block is read from the disk as a result of select statement(for read only), – the block is held in current shared mode (state = SCUR) on the node from where read was issued and a protected read lock (shared read) is granted on this resource. Role of the block is local. (SL0)

– When the same block is requested for read purpose from another node( read/read contention), the cached from earlier node is sent to the requesting node. . The on disk version of the block is same as that on disk as the block has not been modified yet. The requesting node holds the block in current shared mode (state = SCUR). The block is granted in protected read mode to both the nodes. The role of the block is local in both the nodes. (SL0)

– When the block is requested for update operation form another node (Read/write contention), its mode on earlier both the nodes is downgraded to null so that those two nodes can use their copy for local read on that instance only and can’t send that copy to another node (Null, L,0). The node where update operation is performed holds the block in exclusive mode and local role since another copy of the block is not available in any other instance. (XL0).

– When the block is requested from another node for update, its PI is kept on the previous node with the state degraded to null and role to global (Null, G,1). The requesting node holds the block in exclusive, global mode (XG0).

– On checkpoint, PI of the block is discarded on the earlier node and its state changes to CR.

References:
Oracle Db 10G Real App.Clusters By Gopalakrishnan
http://www.toadworld.com/platforms/oracle/w/wiki/663.cache-fusion-otnn48.aspx

—————————————————————————————————

Related links:

Home

11G R2 RAC Index

11g R2 RAC : Dynamic Remastering
11g R2 RAC: How To Identify The Master Node In RAC
11g R2 RAC: How To Find The Resource Master?

 

———————

3 thoughts on “CACHE FUSION DEMONSTRATED

Leave a Reply to prahlad Cancel reply