In this post I will demonstrate cache fusion.Current scenario:
No. of nodes in the cluster : 3
Names of the nodes : host01, host02, host03
Name of the database : racdb
Names of the instances : racdb1, racdb2, racdb3
In this demonstration, I will simulate read/read, read/write and write/write contention and will track the contents of the buffer cache of various instances as well as the Global Resource Directory (GRD).
OVERVIEW
— Get data_object_id for scott.emp
— Get File_id and block_id of emp table
— shutdown the database and restart
— Manually master the scott.emp table to node1
— Retrieve the data from disk on node3 (Physical read)
Read/Read contention:
– Read the block from cache on node2 (Cache read)
Read/Write contention
– Update a row in the block from node1
Write/write contention
– Update another row in the same block from node3
– Trigger checkpoint
IMPLEMENTATION
– Get data_object_id for scott.emp
SYS@NODE1> col owner for a10 col data_object_id for 9999999 col object_name for a15 select owner, data_object_id, object_name from dba_objects where owner = 'SCOTT' and object_name = 'EMP';
OWNER DATA_OBJECT_ID OBJECT_NAME
———- ————– —————
SCOTT 73181 EMP
– Get File_id and block_id of emp table
SYS@NODE1> select empno, dbms_rowid.rowid_relative_fno(rowid) fileno, dbms_rowid.rowid_block_number(rowid) block_no from scott.emp where empno in (7788, 7369);
EMPNO FILENO BLOCK_NO
———- ———- ———-
7369 4 151
7788 4 151
– Following queries will be used repeatedly during this demonstration –
– get_buffer_stat.sql
— The view x$bh contains the status of the block in the buffer cache.
— We can get the status of any cached buffer in the buffer cache from this view.
— The object-id for EMP table is used to restrict output only for blocks of EMP table.
— To get status of the block that contains the rows of empno = 7788 and 7369, dbablk = block value retrieved above is used. Class=1 in predicate is used to get the details about the data blocks of the emp table.
col object_name for a6 select (select object_name from dba_objects where object_id = b.obj) as object_name, decode (state,0, 'Free', 1, 'XCUR', 2, 'SCUR', 3, 'CR', 4, 'BEING READ', 5, 'MREC', 6, 'IREC', 7, 'WRITE_CLONE', 8, 'PI') state, mode_held, le_addr, cr_scn_bas, cr_scn_wrp from x$bh b where obj = 73181 and dbablk = 151 and class = 1;
– get_resource_name.sql
— Find the GCS resource name to be used in the query
— x$kjbl.kjblname = resource name in hexadecimal format([id1],[id2],[type]
— x$kjbl.kjblname2 = resource name in decimal format
— Hexname will be used to query resource in V$gc_element and v$dlm_ress views
col hexname for a25 col resource_name for a15 select b.kjblname hexname, b.kjblname2 resource_name, b.kjblgrant, b.kjblrole, b.kjblrequest from x$le a, x$kjbl b where a.le_kjbl=b.kjbllockp and a.le_addr = ( select le_addr from x$bh where dbablk = 151 and obj = 73181 and class = 1 and state <> 3);
~
— get_resource_stat.sql
— Monitor the resources allocated to the data block from EMP table
— In the last line of SQL statement, substitute the literal by the value returned by above query
col resource_name for a22 col state for a8 col mast for 9999 col grnt for 9999 col cnvt for 9999 select a.resource_name, b.state, a.master_node mast, a.on_convert_q cnvt, a.on_grant_q grnt, b.request_level, b.grant_level, b.owner_node from v$dlm_ress a, v$ges_enqueue b where upper(a.resource_name) = upper(b.resource_name1) and a.resource_name like '%%';
— IMPLEMENTATION ——
– Find the GCS resource name to be used in the query
SYS@NODE1>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSERPR 0 KJUSERNL
– In the script get_resource_stat.sql, in the last line of SQL statement, substitute the literal by the hexname returned by above query i.e.
col resource_name for a22 col state for a8 col mast for 9999 col grnt for 9999 col cnvt for 9999 select a.resource_name, b.state, a.master_node mast, a.on_convert_q cnvt, a.on_grant_q grnt, b.request_level, b.grant_level, b.owner_node from v$dlm_ress a, v$ges_enqueue b where upper(a.resource_name) = upper(b.resource_name1) and a.resource_name like '%[0x97][0x4],[BL]%';
– NODE1 – shutdown the database and restart
[oracle@host01 ~]$ srvctl stop database -d racdb srvctl start database -d racdb srvctl status database -d racdb
– ON NODE1, NODE2, NODE3
– Run queries on all the nodes to look at the status of locks. The buffer cache does not contain
any block belonging to EMP table. Hence, no resources are needed to keep track of them. The table EMP is mastered by NODE1 and every request will be routed thru NODE1.
SYS>@get_buffer_stat no rows selected SYS>@get_resource_name no rows selected SYS>@get_resource_stat no rows selected
——————BEGIN READ FROM DISK ———————
– Retrieve the data from disk on node3 (Physical read)
– Query the data from emp on node3 to load a buffer from disk to buffer cache of node3.
Instance 1 is the master of the resource.
SCOTT@NODE3> select empno, ename, sal from emp where empno = 7788; EMPNO ENAME SAL ---------- ---------- ---------- 7788 SCOTT 3000<
– Manually master the EMP table to node1 –
SYS@NODE1>oradebug lkdebug -m pkey SYS@NODE1>oradebug lkdebug -m pkey 73181
– Query x$kjbr.kjbrmaster to find master node for a resource.
Note that node1 is master of the block (KJBRMASTER = 0, node numbering starts with 0)
SYS@NODE1>select kjbrname, kjbrmaster from x$kjbr where kjbrname like '%[0x97][0x4],[BL]%'; KJBRNAME KJBRMASTER ------------------------------ ---------- [0x97][0x4],[BL] 0
Monitor the resource movements between master and holder instances.
— Check that the block is in current shared mode (state = SCUR) on NODE3 from where
read was issued
SYS@NODE3>@get_buffer_stat OBJECT_NAM STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ---------- -------- ---------- -------- ---------- ---------- EMP SCUR 0 3A3EAA60 0 0
— Check that a protected read lock (shared read) as indicated by KJUSERPR has been granted on this resource
SYS@NODE3>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSERPR 0 KJUSERNL
— Issue following queries on master node NODE1 (Master) —
— check that master node is node1 (node numbering starts with 0) and
resource is granted on a protected read level (grant_level = KJUSERPR)
SYS@NODE1>@get_resource_stat RESOURCE_NAME STATE MAST CNVT GRNT REQUEST_L GRANT_LEV OWNER_NODE ---------------------- -------- ----- ----- ----- --------- ------------------- [0x97][0x4],[BL] GRANTED 0 0 1 KJUSERNL KJUSERPR 2
—————— END READ FROM DISK ———————
—————— BEGIN READ THE BLOCK FROM CACHE —————-
READ/READ CONTENTION
– Run the same query from node2. The data is already cached in node3 and we should get the block from instance on node3. The on disk version of the block is same as that on disk as the block has not been modified yet.
SCOTT@NODE2> select empno, ename, sal from emp where empno = 7788; EMPNO ENAME SAL ---------- ---------- ---------- 7788 SCOTT 3000
– Check that the block is held in current shared mode (state = SCUR)
SYS@NODE2>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP SCUR 0 3A7E4C34 0 0
– Check that the resource has been granted in protected read mode to node2 and node3
SYS@NODE2>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSERPR 0 KJUSERNL SYS@NODE3>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSERPR 0 KJUSERNL
-- Issue following queries on master node NODE1 (Master) and check that
on nodes node2 and node3 (node numbering starts with 0) the
resource is granted a protected read level (grant_level = KJUSERPR)
SYS@NODE1>@get_resource_stat RESOURCE_NAME STATE MAST CNVT GRNT REQUEST_L GRANT_LEV OWNER_NODE ---------------------- -------- ----- ----- ----- --------- --------- ---------- [0x97][0x4],[BL] GRANTED 0 0 1 KJUSERNL KJUSERPR 2 [0x97][0x4],[BL] GRANTED 0 0 1 KJUSERNL KJUSERPR 1
——- END READ BLOCK FROM CACHE ———————-
– —- START UPDATE THE BLOCK IN INSTANCE 1 ——————
READ/WRITE CONTENTION
SCOTT@NODE1>update emp set sal = sal where empno = 7788; commit; -- Check that the block is held in exclusive mode (state = XCUR) on the node where update command was issued SYS@NODE1>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP XCUR 0 3A3F1EC0 0 0
– Check that on node2 and node3, now we have CR copies (STATE = CR)
of the block with same SCN# = 955092
SYS@NODE2>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP CR 0 00 955092 0 SYS@NODE3>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP CR 0 00 955092 0
– Check that the resource has been granted exclusive access (KJBLGRNT = KJUSEREX) on node1 from where update was issued
SYS@NODE1>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSEREX 0 KJUSERNL
– Note that node 2 and 3 no longer have any lock on the resource. The block is present in their buffer cache in CR mode and can be used locally by the instance for query.However, the block cannot be used for update and cannot be served to other instances, so it does not require any locks.
SYS@NODE2>@get_resource_name no rows selected SYS@NODE3>@get_resource_name no rows selected
– Issue following queries on master node NODE1 (Master) to get GRD information—
— Note that now the resource has been granted to node1 (OWNER_NODE=0)
in exclusive mode (GRANT_LEVEL=KJUSEREX)
— Only node1 has the most recent block in the buffer cache and will serve the
block in case any instance requests it.
SYS@NODE1>@get_resource_stat RESOURCE_NAME STATE MAST CNVT GRNT REQUEST_L GRANT_LEV OWNER_NODE ---------------------- -------- ----- ----- ----- --------- --------- ---------- [0x97][0x4],[BL] GRANTED 0 0 1 KJUSERNL KJUSEREX 0
i
-——- END UPDATE THE BLOCK IN INSTANCE 1 ——
– START UPDATE ANOTHER ROW IN THE SAME BLOCK ON NODE3 –
WRITE/WRITE CONTENTION
SCOTT@NODE3>update emp set sal=sal where empno = 7369; commit;
– Check that there are two copies of the same block on node3 where update command was issued
– CR copy at SCN# = 955092 which was cached earlier
– Current block held in xclusive mode (state = XCUR) on
SYS@NODE3>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP XCUR 0 3A3EAA60 0 0 EMP CR 0 00 955092 0
– Check that on node1, we have Past image(State = PI) of the block since earlier update was issued on node1
— On node2, now we have CR copies (STATE = CR) with SCN#= 955092 of the block with different SCN#
SYS@NODE1>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP PI 0 3A3EFA00 0 0 SYS@NODE2>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ ----------- ---------- -------- ---------- ---------- EMP CR 0 00 955092 0
— Check that the resource has been granted exclusive access (KJBLGRNT = KJUSEREX) on node1 from where update was issued
SYS@NODE3>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSEREX 64 KJUSERNL
— Note that node 2 no longer has any lock on the resource. The block is present in their buffer cache in CR mode and can be used locally by the instance for query.However, the block cannot be used for update and cannot be served to other instances, so it does not require any locks.
SYS@NODE2>@get_resource_name no rows selected
— Note that on node1 where earlier update was done and where PI is present , lock has been downgraded to null (KJBLGRANT = KJUSERNL)
— Master keeps track of the PI images across the buffer cache. All these
instances will be asked to discard the PI copies once the current block is
written to disk.
SYS@NODE1>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSERNL 192 KJUSERNL
— Issue following queries on master node NODE1 (Master) to get GRD information
– Node2 holds the block in exclusive mode (GRANT_LEVEL=KJUSEREX)
— node1 which has the PI, its lock level has been downgraded to null
— Master keeps track of the PI images across the buffer cache. All these
instances will be asked to discard the PI copies once the current block is
written to disk.
SYS@NODE1>@get_resource_stat RESOURCE_NAME STATE MAST CNVT GRNT REQUEST_L GRANT_LEV OWNER_NODE ---------------------- -------- ----- ----- ----- --------- --------- ---------- [0x97][0x4],[BL] GRANTED 0 0 1 KJUSERNL KJUSERNL 0 [0x97][0x4],[BL] GRANTED 0 0 1 KJUSERNL KJUSEREX 2
-—- END UPDATE ANOTHER ROW IN THE SAME BLOCK ON NODE3 –
—— DISK WRITES DUE TO CHECKPOINT ——————-
– Trigger a global checkpoint –
SYS@NODE1> alter system checkpoint global;
— The PI on node1 is discarded and the buffer is changed to CR mode.
SYS@NODE1>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP CR 0 00 997470 0
— CR block on node2 stays as it is —
SYS@NODE2>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP CR 0 00 997343 0
— The checkpoint request from an instance does not change any lock status in the node holding the current block .
SYS@NODE3>@get_buffer_stat OBJECT STATE MODE_HELD LE_ADDR CR_SCN_BAS CR_SCN_WRP ------ -------- ---------- -------- ---------- ---------- EMP XCUR 0 387E8354 0 0 EMP CR 0 00 997343 0
— PI from node1 has been discarded after checkpoint and its status has been changed to CR.
— Note that node 2 and node1 no longer have any lock on the resource.The block is present in their buffer cache in CR mode and can be used locally by the instance for query.However, the block cannot be used for update and cannot be served to other instances, so it does not require any locks.
SYS@NODE1>@get_resource_name no rows selected SYS@NODE2>@get_resource_name no rows selected
— The checkpoint request does not change the lock status in the node which is holding the current block.
SYS@NODE3>@get_resource_name HEXNAME RESOURCE_NAME KJBLGRANT KJBLROLE KJBLREQUE ------------------------- --------------- --------- ---------- --------- [0x97][0x4],[BL] 151,4,BL KJUSEREX 0 KJUSERNL
Conclusion:
– When a block is read from the disk as a result of select statement(for read only), – the block is held in current shared mode (state = SCUR) on the node from where read was issued and a protected read lock (shared read) is granted on this resource. Role of the block is local. (SL0)
– When the same block is requested for read purpose from another node( read/read contention), the cached from earlier node is sent to the requesting node. . The on disk version of the block is same as that on disk as the block has not been modified yet. The requesting node holds the block in current shared mode (state = SCUR). The block is granted in protected read mode to both the nodes. The role of the block is local in both the nodes. (SL0)
– When the block is requested for update operation form another node (Read/write contention), its mode on earlier both the nodes is downgraded to null so that those two nodes can use their copy for local read on that instance only and can’t send that copy to another node (Null, L,0). The node where update operation is performed holds the block in exclusive mode and local role since another copy of the block is not available in any other instance. (XL0).
– When the block is requested from another node for update, its PI is kept on the previous node with the state degraded to null and role to global (Null, G,1). The requesting node holds the block in exclusive, global mode (XG0).
– On checkpoint, PI of the block is discarded on the earlier node and its state changes to CR.
References:
Oracle Db 10G Real App.Clusters By Gopalakrishnan
http://www.toadworld.com/platforms/oracle/w/wiki/663.cache-fusion-otnn48.aspx
—————————————————————————————————
Related links:
11g R2 RAC : Dynamic Remastering
11g R2 RAC: How To Identify The Master Node In RAC
11g R2 RAC: How To Find The Resource Master?
———————
Clear explanation, Thanks for posting
Thanks Prahlad for your time and feedback.
Keep visiting the blog.
Your comments and suggestions are always welcome.
Regards
Anju Garg
Your blogs are very useful for DBA. Yours all blogs have welcome.