INSTANCE RECOVERY IN RAC

In this post, I will discuss how instance recovery takes place in 11g R2 RAC. Instance recovery aims at

- writing all committed changes to the datafiles

- undoing all the uncommitted changes from the datafiles

- advancing the checkpoint number to the SCN up to which changes have been written to the datafiles.

In a single instance database, before the instance crashes,

- some committed changes are in the redo log files but have not been written to the datafiles

- some uncommitted changes have made their way to datafiles

- some uncommitted changes are in the redo log buffer

After the instance crashes in a single instance database

- all uncommitted changes in the redo log buffer are wiped out

- Online redo log files are read to identify the blocks that need to be recovered

- Identified blocks are read from the datafiles

- During roll forward phase, all the changes (committed/uncommitted) in redo log files are applied to them

- During rollback phase, all uncommitted changes are rolled back after reading undo from undo tablespace.

- CKPT# is incremented in the control file and datafile headers
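The single instance steps above can be sketched as a toy model in Python. It is only an illustration, not how Oracle implements recovery: the `Redo` record, the `recover` function and the transaction names T1 and T2 are made up for this example, and the pre-change value is carried in the redo record instead of being read from the undo tablespace.

```python
from dataclasses import dataclass


@dataclass
class Redo:
    txn: str   # hypothetical transaction id
    row: int   # index of the record inside the block
    old: int   # value before the change (stands in for undo)
    new: int   # value after the change


def recover(block, redo_log, committed):
    """Return the block after roll forward followed by rollback."""
    block = list(block)
    # Roll forward: apply every change in log order, committed or not.
    for r in redo_log:
        block[r.row] = r.new
    # Rollback: reverse the changes of uncommitted transactions, newest first.
    for r in reversed(redo_log):
        if r.txn not in committed:
            block[r.row] = r.old
    return block


on_disk = [100, 200, 300, 400]
redo_log = [Redo("T1", 0, 100, 101), Redo("T2", 1, 200, 201)]

# T1 committed before the crash, T2 did not.
recovered = recover(on_disk, redo_log, committed={"T1"})
print(recovered)  # [101, 200, 300, 400]
```

The change of T1 survives, while the change of T2 is first applied during roll forward and then removed again during rollback.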

In a RAC database there can be two scenarios :

- Only one instance crashes

- Multiple instances crash

We will discuss these cases one by one.

Single instance crash in RAC database

In this case, the scenario is quite similar to an instance crash in a single instance database, but there are slight differences.

Let us consider a 3 node setup. We will consider a data block B1 with one column and 4 records in it. The column contains the values 100, 200, 300 and 400 in the 4 records. Initially the block is on disk. The following chart shows the update operations on the block on the various nodes and the corresponding states of the block. The mode of each cached copy (CR, PI or XCUR) is shown in parentheses below it:

SCN#   Update operation on             State of the block on
       Node1     Node2     Node3       Node1   Node2   Node3   Disk

1      100->101  -         -           101     -       -       100
                                       200     -       -       200
                                       300     -       -       300
                                       400     -       -       400
                                       (XCUR)

2      -         200->201  -           101     101     -       100
                                       200     201     -       200
                                       300     300     -       300
                                       400     400     -       400
                                       (PI)    (XCUR)

3      -         -         300->301    101     101     101     100
                                       200     201     201     200
                                       300     300     301     300
                                       400     400     400     400
                                       (PI)    (PI)    (XCUR)

4      -         CRASH     -           101     101     101     100
                                       200     201     201     200
                                       300     300     301     300
                                       400     400     400     400
                                       (PI)    (PI)    (XCUR)

It is assumed that no incremental checkpointing has taken place on any of the nodes in the meanwhile.

Before the crash, the status of the block on the various nodes is as follows:

- PI at SCN# 2 on Node1

- PI at SCN# 3 on Node2

- XCUR on Node3

Redo logs at various nodes are

Node1 : B1: 100 -> 101, SCN# 1

Node2 : B1:200 -> 201, SCN# 2

Node3 : B1:300 -> 301, SCN# 3

After the crash,

- Redo logs of the crashed node (Node2) are analyzed and it is identified that block B1 needs to be recovered.

- It is also identified that the role of the block is global, as different versions of it are available on Node1 and Node3.

- It is identified that there is a PI on Node1 whose SCN# (2) is earlier than the SCN# of the crash (4).

- Changes from the redo logs of Node2 are applied to the PI on Node1 and the block is written to disk.

- Checkpoint # of Node1 is incremented.

- A BWR is placed in the redo log of Node1 to indicate that the block has been written to disk and need not be recovered in case Node1 crashes.
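Using the values from the chart, the roll forward on the PI can be sketched as below. The `apply_redo` helper and the tuple layout of the redo records are assumptions made for illustration only; they are not Oracle structures.

```python
def apply_redo(block, redo):
    """Apply (row, old, new) change records to a copy of the block."""
    block = list(block)
    for row, old, new in redo:
        if block[row] != old:
            raise ValueError("redo record does not match the block image")
        block[row] = new
    return block


pi_on_node1 = [101, 200, 300, 400]          # PI kept by Node1 at SCN# 2
node2_redo = [(1, 200, 201)]                # Node2 : B1: 200 -> 201, SCN# 2
node1_redo_log = [("change", 0, 100, 101)]  # Node1 : B1: 100 -> 101, SCN# 1

# Redo of the crashed node is applied to the PI; the disk is not read.
recovered = apply_redo(pi_on_node1, node2_redo)
disk = recovered                            # recovered block is written to disk
node1_redo_log.append(("BWR", "B1"))        # block written record on Node1

print(disk)  # [101, 201, 300, 400]
```

Note that the recovered block does not contain the change made by Node3 (300 -> 301). That change is still only in the XCUR copy on Node3 and in the redo log of Node3.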

Here it can be readily seen that there are certain differences from instance recovery in a single instance database.

The role of the block is checked.

- If the role is local, the block is read from the disk and the changes from the redo logs of Node2 are applied, i.e. just like in a single instance database.

- If the role is global, it is checked whether a PI of the block at an SCN# earlier than the SCN# of the crash is available.

  - If a PI is available, the changes in the redo logs of Node2 are applied to the PI instead of reading the block from the disk.

  - If a PI is not available (it has been flushed to disk due to incremental checkpointing on the owner node of the PI, or on any of the nodes at an SCN# later than that of the PI holder), the block is read from the disk and the changes from the redo logs of Node2 are applied, just as in Oracle Parallel Server (OPS).

Hence, it can be inferred that a PI, if available, speeds up instance recovery because the need to read the block from disk is eliminated. If a PI is not available, the block is read from the disk just like in OPS.
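The decision described above fits in one small function. This is a sketch only; the function name `recovery_source` and the string values are chosen for this example.

```python
def recovery_source(role, pi_available):
    """Return where recovery takes the block image from: 'pi' or 'disk'."""
    if role == "local":
        # Same as a single instance database: read the block from disk.
        return "disk"
    if role == "global":
        # A usable past image saves the disk read.
        return "pi" if pi_available else "disk"
    raise ValueError("role must be 'local' or 'global'")


for role, pi in [("local", False), ("global", True), ("global", False)]:
    print(role, pi, "->", recovery_source(role, pi))
```

Only the combination of a global role and an available PI avoids the disk read.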

Multiple instance crash in RAC database

Let us consider a 4 node setup. We will consider a data block B1 with one column and 4 records in it. The column contains the values 100, 200, 300 and 400 in the 4 records. Initially the block is on disk. The sequence of operations can be represented as follows, with the mode of each cached copy (CR, PI or XCUR) shown in parentheses below it:

SCN#   Update operation on                       State of the block on
       Node1     Node2     Node3     Node4       Node1   Node2   Node3   Node4   Disk

1      100->101  -         -         -           101     -       -       -       100
                                                 200     -       -       -       200
                                                 300     -       -       -       300
                                                 400     -       -       -       400
                                                 (XCUR)

2      -         200->201  -         -           101     101     -       -       100
                                                 200     201     -       -       200
                                                 300     300     -       -       300
                                                 400     400     -       -       400
                                                 (PI)    (XCUR)

3      -         -         300->301  -           101     101     101     -       100
                                                 200     201     201     -       200
                                                 300     300     301     -       300
                                                 400     400     400     -       400
                                                 (PI)    (PI)    (XCUR)

4      -         CKPT      -         -           101     101     101     -       101
                                                 200     201     201     -       201
                                                 300     300     301     -       300
                                                 400     400     400     -       400
                                                 (CR)    (CR)    (XCUR)

5      -         -         -         400->401    101     101     101     101     101
                                                 200     201     201     201     201
                                                 300     300     301     301     300
                                                 400     400     400     401     400
                                                 (CR)    (CR)    (PI)    (XCUR)

6      401->402  -         -         -           101     101     101     101     101
                                                 200     201     201     201     201
                                                 300     300     301     301     300
                                                 400     400     400     401     400
                                                 (CR)    (CR)    (PI)    (PI)

                                                 101
                                                 201
                                                 301
                                                 402
                                                 (XCUR)

7      -         CRASH     CRASH     -           101     -       -       101     101
                                                 200     -       -       201     201
                                                 300     -       -       301     301
                                                 400     -       -       401     400
                                                 (CR)                    (PI)

                                                 101
                                                 201
                                                 301
                                                 402
                                                 (XCUR)

From SCN# 6 onwards Node1 holds two copies of the block, a CR copy and an XCUR copy. The Disk value in the SCN# 7 row matches the on-disk version of the block after instance recovery, given at the end of this section.

Explanation:

SCN# 1 – Node1 reads the block from disk and updates 100 to 101 in the first record. It holds the block in XCUR mode.

SCN# 2 – Node2 requests the same block for update. Node1 keeps the PI and Node2 holds the block in XCUR mode.

SCN# 3 – Node3 requests the same block for update. Node2 keeps the PI and Node3 holds the block in XCUR mode. Now we have two PIs:

– On Node1 with SCN# 2

– On Node2 with SCN# 3

SCN# 4 – Local checkpointing takes place on Node2. The PI on this node has SCN# 3.

– It is checked whether any of the other nodes has a PI at an earlier SCN# than this. Node1 has a PI at SCN# 2.

– Changes in the redo log of Node2 are applied to its PI and it is flushed to disk.

– A BWR is placed in the redo log of Node2 to indicate that the block has been written to disk and need not be recovered in case Node2 crashes.

– The PI on Node2 is discarded, i.e. its state changes to CR, which can’t be used to serve remote nodes.

– The PI on Node1 is discarded, i.e. its state changes to CR, which can’t be used to serve remote nodes.

– A BWR is placed in the redo log of Node1 to indicate that the block has been written to disk and need not be recovered in case Node1 crashes.

– Now the on-disk version of the block contains the changes of both Node1 and Node2.

SCN# 5 – Node4 requests the same block for update. Node3 keeps the PI and Node4 holds the block in XCUR mode. Node1 and Node2 have the CRs.

SCN# 6 – Node1 again requests the same block for update. Node4 keeps the PI and Node1 holds the block in XCUR mode. Node1 now has the same block in both CR and XCUR mode. Node3 has the PI at SCN# 5.

SCN# 7 – Node2 and Node3 crash.
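The sequence from SCN# 1 to SCN# 6 can be replayed with a small simulation. This is a toy model of the behaviour described in this post, not of Oracle internals: the `Cluster` class, its methods and the record layout are invented for the example, only one block is modelled, and every update is assumed to come from a node other than the current XCUR holder.

```python
class Cluster:
    def __init__(self, nodes, block):
        self.scn = 0
        self.disk = list(block)
        self.copies = {n: [] for n in nodes}  # node -> cached copies of B1
        self.redo = {n: [] for n in nodes}    # node -> redo records
        self.holder = None                    # node holding the XCUR copy

    def update(self, node, row, new):
        """Ship the current block to `node` and change one record there."""
        if node == self.holder:
            raise ValueError("sketch only models requests from another node")
        self.scn += 1
        if self.holder is None:
            values = list(self.disk)          # first access: read from disk
        else:
            xcur = next(c for c in self.copies[self.holder]
                        if c["state"] == "XCUR")
            values = list(xcur["values"])
            xcur["state"], xcur["scn"] = "PI", self.scn  # holder keeps a PI
        old = values[row]
        values[row] = new
        self.copies[node].append({"state": "XCUR", "values": values, "scn": None})
        self.redo[node].append(("change", self.scn, row, old, new))
        self.holder = node

    def checkpoint(self, node):
        """Flush the PI on `node`; it and all earlier PIs become CR."""
        self.scn += 1
        pi = next(c for c in self.copies[node] if c["state"] == "PI")
        self.disk = list(pi["values"])
        for n, copies in self.copies.items():
            for c in copies:
                if c["state"] == "PI" and c["scn"] <= pi["scn"]:
                    c["state"] = "CR"
                    self.redo[n].append(("BWR", self.scn))

    def summary(self):
        return {n: [(c["state"], c["scn"]) for c in cs]
                for n, cs in self.copies.items()}


c = Cluster(["Node1", "Node2", "Node3", "Node4"], [100, 200, 300, 400])
c.update("Node1", 0, 101)   # SCN# 1
c.update("Node2", 1, 201)   # SCN# 2
c.update("Node3", 2, 301)   # SCN# 3
c.checkpoint("Node2")       # SCN# 4
c.update("Node4", 3, 401)   # SCN# 5
c.update("Node1", 3, 402)   # SCN# 6

print(c.summary())
print(c.disk)
```

The summary shows a CR copy at SCN# 2 and an XCUR copy on Node1, a CR copy at SCN# 3 on Node2, a PI at SCN# 5 on Node3 and a PI at SCN# 6 on Node4, and the disk holds 101, 201, 300, 400.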

It is assumed that no incremental checkpointing has taken place on any of the nodes in the meanwhile.

Before the crash, the status of the block on the various nodes is as follows:

- CR at SCN# 2 on Node1, XCUR on Node1

- CR at SCN# 3 on Node2

- PI at SCN# 5 on Node3

- PI at SCN# 6 on Node4

Redo logs at various nodes are

Node1 : B1: 100 -> 101, SCN# 1, BWR for B1 , B1:401->402 at SCN#6

Node2 : B1:200 -> 201, SCN# 2, BWR for B1

Node3 : B1:300 -> 301, SCN# 3

Node4 : B1:400->401 at SCN# 5

After the crash,

- Redo logs of the crashed node (Node2) are analyzed and it is identified that block B1 was flushed to disk as of SCN# 4 and need not be recovered, as no changes have been made to it from Node2 since then.

- No redo log entry from Node2 needs to be applied.

- Redo logs of the crashed node (Node3) are analyzed and it is identified that block B1 needs to be recovered.

- It is also identified that the role of the block is global, as different versions of it were or are available on Node1 (XCUR), Node2 (crashed) and Node4 (PI).

- Changes from Node3 have to be applied. It is checked whether any PI is available that is earlier than the SCN# of the change on Node3 that needs to be applied, i.e. SCN# 3.

- It is identified that no PI is available whose SCN# is earlier than SCN# 3. Hence, the block is read from the disk.

- The redo log entry that needs to be applied is: B1: 300 -> 301, SCN# 3

- Redo is applied to the block read from the disk and the block is written to disk, so that the on-disk version also contains the changes made by Node3.

- Checkpoint # of Node2 and Node3 is incremented.

After instance recovery :

Node1 : holds CR and XCUR

Node2 :

Node3 :

Node4 : holds PI

On disk version of the block is:

101

201

301

400
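The on-disk result above can be reproduced with a short sketch of the recovery of the two crashed nodes. As before, this is a simplified model for one block: the helper names and the record layout are made up for illustration, and a BWR is taken to mean that all earlier changes to B1 in that log are already on disk.

```python
def redo_after_last_bwr(redo_log):
    """Return the change records that follow the last BWR, if any."""
    start = 0
    for i, rec in enumerate(redo_log):
        if rec[0] == "BWR":
            start = i + 1
    return [r for r in redo_log[start:] if r[0] == "change"]


def recover_block(disk_block, crashed_logs):
    """Apply the pending redo of all crashed nodes to the disk image."""
    block = list(disk_block)
    pending = []
    for log in crashed_logs:
        pending.extend(redo_after_last_bwr(log))
    # Merge the redo of the crashed nodes in SCN order before applying it.
    for _, scn, row, old, new in sorted(pending, key=lambda r: r[1]):
        block[row] = new
    return block


# Records are ("change", scn, row, old, new) or ("BWR", scn).
node2_log = [("change", 2, 1, 200, 201), ("BWR", 4)]
node3_log = [("change", 3, 2, 300, 301)]
disk = [101, 201, 300, 400]   # on-disk version before recovery

print(redo_after_last_bwr(node2_log))             # []
print(recover_block(disk, [node2_log, node3_log]))  # [101, 201, 301, 400]
```

Nothing from Node2 is applied because its only change lies before the BWR, while the single change from Node3 is applied to the block read from disk. The changes made on the surviving nodes (401 and 402) are not on disk yet.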

References:

https://rajat1205sharma.wordpress.com/2015/06/20/oracle-rac-instance-recovery/


——————-

9 thoughts on “INSTANCE RECOVERY IN RAC”

JAMSHER KHAN says:

April 10, 2013 at 1:55 am

Hi Maam,

Thanks for such a great post.
Can you please explain what BWR is here?
And I guess a commit is also executed after every DML.

Just want to add:

When the same dirty block is requested by some other instance for write or read purposes, an image of the block is created in the owning instance and then the block is shipped to the requesting instance. This image copy of the block is called a Past Image (PI).

XCUR – Exclusive current lock, which is required to update the block.

CR – A consistent version of the block.
+++++++++++++++++++++++++++++++++++++++++++++++++++++
A small request: can you please write a post on single instance recovery, or help me understand the questions below?

1) Can you please explain how the commit marker, which Oracle writes in the redo stream when a transaction is committed, helps in roll forward recovery?

2) After the db is opened, Oracle wants to roll back the uncommitted transactions that happened before the db was aborted, but how does Oracle or SMON determine
which blocks it needs to roll back after the db is open?

3) Suppose USER A runs a transaction for 7 mins, and after 5 mins a checkpoint happens in the database, so all the dirty buffers are flushed to the datafiles
by DBWR; before DBWR writes, LGWR writes the redo for these blocks to the redo log file. After 7 mins USER A commits the transaction, and Oracle
writes the commit marker in the redo stream and updates the undo header slot to show that the transaction is committed. This undo header slot is now free for use by other transactions,
as the previous transaction is committed. Now USER B has overwritten some of the undo entries of USER A's transaction. In between, USER C's process wants
to access all the blocks modified by USER A, so it reads the blocks from the datafile, and when it checks a block it sees an active ITL entry in it,
because the transaction was active when the block was written to the datafile. From the ITL entry it will try to access the undo header slot to determine whether the transaction
committed, but as the entry has been overwritten by USER B, what will happen to this block?

1. Anju Garg says:
  
  April 11, 2013 at 3:52 am
  
  Hi Jamsher,
  
  From whatever little I know, I will try to answer your questions.
  - PI is kept in the owning instance only if the block is requested for write operation by another instance.
  - BWR means Block Written Record. After DBWR of an instance has written some dirty blocks to disk, a BWR is placed in the redo stream of that instance to reflect it. At the time of recovery of that instance, only the redo beyond the BWR needs to be applied to the datafiles.
  - During roll forward phase of instance recovery, redo is applied after reading redo logs. In this process, undo is generated. This undo is used to rollback uncommitted changes.
  
  I hope your questions are answered.
  
  Regards
  Anju Garg
  
2. Anonymous says:
  
  April 11, 2013 at 7:27 am
  
  Hi Maam,
  
  Thanks for your reply. Please let me know if I have understood correctly.
  
  Suppose a transaction modifies blocks 1 to 10 during time T1 to T10. Suppose blocks 1 to 5 have committed changes and blocks 6 to 10 have uncommitted changes,
  and at T11 the instance crashes. As the instance crashes, all blocks that were present in the buffer cache are lost.
  
  Roll forward (mount state): SMON will apply the changes from the redo to all 10 blocks, which will cause undo to be generated.
  Rollback (open state): When transaction recovery starts, the first 5 blocks are committed, so they will remain untouched.
  But as blocks 6 to 10 are not committed, the undo generated in the mount state will be applied to them to restore the old values.
  
  Thanks
  Jamsher
  
santosh says:

August 7, 2014 at 8:21 am

It is the best site for all oracle dba’s to learn.

1. Anju Garg says:
  
  August 7, 2014 at 10:46 pm
  
  Thanks Santosh!
  
  Regards
  Anju Garg
  
rajat sharma says:

October 4, 2015 at 1:19 am

The thing is, there are many blogs by Oracle experts, but it is the simplicity of this blog that makes it the best one. It helps us understand the concepts easily, especially a complex subject like RAC, starting from the basics right through to the end. Great work.

1. Anju Garg says:
  
  October 4, 2015 at 5:03 am
  
  Thanks Rajat for your time and feedback.
  
  Keep visiting the blog.
  
  Your comments and suggestions are always welcome.
  
  Regards
  Anju
  
Prachi says:

October 21, 2015 at 1:10 pm

Super work Anju!!! You really know the art of giving knowledge.

1. Anju Garg says:
  
  October 22, 2015 at 2:53 am
  
  Thanks Prachi for your time and feedback.
  
  Keep visiting the blog.
  
  regards
  Anju