Monthly Archives: December 2012

WORKLOAD MANAGEMENT IN 11g R2 RAC : LOAD BALANCING

In this post, I will discuss the various types of load balancing that can be configured in 11g R2 RAC for workload management.
In RAC we have multiple instances of a database running on multiple servers.
Workload management involves :
   – Failover : If the connection to an instance fails, the client should automatically connect to another instance
   – Load balancing : The workload should be spread across the various instances to obtain maximum performance/throughput
Failover can be :
   – Connect time connection failover (CTCF) : while making the initial connection
   – Run time connection failover (RTCF), also called transparent application failover (TAF) : after the connection has been established
     FAILOVER_METHOD = NONE, BASIC, PRECONNECT
     FAILOVER_TYPE   = SESSION, SELECT
Load balancing can be :
  – Connect time load balancing (CTLB)
    . On client side
    . On server side
  – Run time load balancing (RTLB)
To configure failover/load balancing, tnsnames.ora should contain multiple listener addresses to connect to multiple instances, e.g.
Multiple listener addresses within a description, i.e. the user is trying to connect to a service which is supported by multiple instances:
RAC=
 (DESCRIPTION=
   (ADDRESS_LIST=
     (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
   )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
 )
In case SCAN is used, the SCAN name is used in the address, which resolves to the 3 SCAN listeners.
e.g.
RAC=
 (DESCRIPTION=
   (ADDRESS_LIST=
     (ADDRESS= (PROTOCOL=TCP) (HOST=cluster01-scan) (PORT=1521))
   )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
 )
In this post, I will discuss load balancing in detail. To know more about failover, please see the failover post below.
LOAD BALANCING 
————–
Load balancing in RAC implies distributing the workload over multiple instances accessing the same physical database. Two kinds of load balancing can be configured :
  – Connect time load balancing (CTLB)
    . On client side
    . On server side
  – Run time load balancing (RTLB) (on server side)
Connect time load balancing (CTLB) : This enables the user to connect to one of the instances supporting the service. The connection stays with the same instance until the user disconnects or the session is killed. It can be configured on the client side and/or the server side.
- Connect time load balancing on client side: When a user session attempts to connect to the database, Oracle Net chooses an address specified in tnsnames.ora in a random order rather than a sequential order. This has the effect of clients connecting through addresses that are picked at random, so that no one address is overloaded. Its configuration is quite simple: just set the parameter LOAD_BALANCE=ON in the client connection definition in tnsnames.ora. For example :
RAC=
 (DESCRIPTION=
   (ADDRESS_LIST=
     (LOAD_BALANCE=ON)
     (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
   )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
 )
The LOAD_BALANCE parameter is set to ON by default. When this parameter is set to ON, Oracle Net Services progresses through the list of listener addresses in a random sequence, balancing the load across the listeners. When it is set to OFF, the addresses are tried sequentially until one succeeds.
When using SCAN in the connection definition, the SCAN name resolves (via DNS, in a round-robin fashion) to one of the three SCAN listener addresses, so connections are balanced across the three SCAN listeners. Here is a sample tnsnames.ora entry for a RAC database using SCAN.
RAC=
 (DESCRIPTION=
   (LOAD_BALANCE=ON)
   (ADDRESS= (PROTOCOL=TCP) (HOST=SCAN-HOSTNAME) (PORT=1521))
   (CONNECT_DATA= (SERVICE_NAME= RAC))
 )
Limitation of connect time load balancing : The connection stays with the same instance for the life of a session. If the connection lasts a long time, the load on the current instance may increase and some other, less loaded instance might become preferable. In that case we would like the connection to switch to that more appropriate instance. This can be achieved by using run time load balancing (RTLB).
- Connect time load balancing on server side: After a listener receives the connection request, it can forward the request to another instance based on the connect time load balancing goal (CLB_GOAL) specified for the service. CLB_GOAL can be :
LONG (Default) - used for application connections that are connected for a long period, such as third-party connection pools and SQL*Forms applications. In this case,
   . the listener will load balance on number of sessions
   . Run time load balancing goal will not be used in this case
SHORT - used for application connections that are short in duration. This should be used with connection pools integrated with the load balancing advisory. In this case, listener uses Load Balancing Advisory (LBA) to make the connection based on CPU utilization on the node.
Limitation of connect time load balancing on client side : The listener has no idea whether the session has been assigned to an endpoint whose corresponding database server is already overloaded. Hence, timeouts can occur if the node is heavily loaded and unable to respond quickly. To overcome this problem, server side connect time load balancing needs to be configured. It is useful to spread the initial connection load among all the listeners in the cluster. The client may then be redirected based on server side load balancing.
Run Time (Server side) Load Balancing (RTLB): In this case, the listener routes incoming client connections according to policies and based on the current service level provided by the database instances. The listener determines the connection distribution depending upon profile statistics that are dynamically updated by PMON. The higher the load on a node, the more frequently PMON updates the load profile. Thus connections may be switched depending upon changes in cluster configuration, application workload, overworked nodes or hangs.
   The core of server side load balancing is dynamic service registration, so that services are registered with all the listeners. Since PMON on each node sends its load profile to all the listeners with which the service is registered, all the listeners come to know the load profile of all the instances, and the connection is forwarded to the most appropriate instance depending upon the goal of run time load balancing.
  Run time load balancing is achieved using connection pools. Work requests are automatically balanced across the pool of connections. The connection allocation is based on the current performance level provided by the database instances as indicated by the LBA FAN events. This provides load balancing at the transaction level instead of load balancing only at the time of the initial connection.
With server-side load balancing, the listener directs a connection request to the best instance currently providing the service by using the load balancing advisory.
Load Balancing Advisory
- is an advisory for balancing work across RAC instances
- Monitors workload activity for a service across all instances in the cluster
- Analyzes the service level for each instance based on defined metric goal
    Metric: service time (GOAL_SERVICE_TIME)
    Metric: throughput (GOAL_THROUGHPUT)
- Publishes FAN events recommending the amount of work to be sent to each instance, along with a data quality flag
- Default is Off.
-  Directs work to where services are executing well and resources are available
- Adjusts distribution for different power nodes, different priority and shape workloads, changing demand
- Stops sending work to slow, hung, failed nodes early
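The advisory is enabled simply by setting a run time goal (SERVICE_TIME or THROUGHPUT) on the service, as shown in step 3 below. Its per-instance recommendations can then be observed from the GV$SERVICEMETRIC view; as a sketch (the service name sjob is the illustrative one used later in this post):

select inst_id, service_name, goodness, delta, flags
from   gv$servicemetric
where  service_name = 'sjob'
order  by inst_id;

Lower GOODNESS values indicate an instance the advisory currently ranks as more attractive for new work.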
How to configure server side load balancing:
1. set parameters LOCAL_LISTENER and REMOTE_LISTENER
2. set CLB_GOAL = SHORT for the service
3. set RTLB_GOAL for the service
1. set parameters LOCAL_LISTENER and REMOTE_LISTENER
LOCAL_LISTENER parameter should be set to name of the listener defined in the same node
REMOTE_LISTENER should be set to names of the listeners running on other nodes
For example in a 3 node setup
Host01 running instance orcl1
Host02 running instance orcl2
Host03 running instance orcl3
For host01,
local_listener  – (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host01-vip)(PORT=1521))))
remote_listener – (DESCRIPTION=(ADDRESS_LIST=
                                 (ADDRESS=(PROTOCOL=TCP)(HOST=host02-vip)(PORT=1521))
                                 (ADDRESS=(PROTOCOL=TCP)(HOST=host03-vip)(PORT=1521))))
For host02,
local_listener  – (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host02-vip)(PORT=1521))))
remote_listener – (DESCRIPTION=(ADDRESS_LIST=
                                 (ADDRESS=(PROTOCOL=TCP)(HOST=host01-vip)(PORT=1521))
                                 (ADDRESS=(PROTOCOL=TCP)(HOST=host03-vip)(PORT=1521))))
For host03,
local_listener  – (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host03-vip)(PORT=1521))))
remote_listener – (DESCRIPTION=(ADDRESS_LIST=
                                 (ADDRESS=(PROTOCOL=TCP)(HOST=host01-vip)(PORT=1521))
                                 (ADDRESS=(PROTOCOL=TCP)(HOST=host02-vip)(PORT=1521))))
When we start the three instances, the corresponding PMON processes dynamically register with all the listeners and start feeding them load profile information. Now all the listeners know the load profile of all the instances, and the connection is forwarded to the listener of the least loaded node.
When SCAN is used, the REMOTE_LISTENER parameter should be set to the SCAN name on all the nodes, i.e.
remote_listener – myrac-cluster-scan:1521
so that the PMON process of each instance registers the database services with the default listener on the local node and with each SCAN listener specified by the REMOTE_LISTENER database parameter.
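For example, assuming an spfile is in use, the parameters could be set as follows (the SCAN name myrac-cluster-scan and the VIP/instance names match the illustrative setup above):

alter system set remote_listener='myrac-cluster-scan:1521' scope=both sid='*';
-- local_listener normally points at the node VIP and is set per instance, e.g.
alter system set local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host01-vip)(PORT=1521))))' scope=both sid='orcl1';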
2. set CLB_GOAL = SHORT for the service
   EXECUTE DBMS_SERVICE.MODIFY_SERVICE( -
      service_name => 'sjob', -
      clb_goal     => DBMS_SERVICE.CLB_GOAL_SHORT);
OR
   srvctl modify service -d orcl -s orcl_serv -j SHORT      // connection load balancing goal {LONG|SHORT}
3. set RTLB_GOAL for the service
A request for a connection is serviced by selecting a connection based on the service goal as determined by the Load Balancing Advisory. The service goal determines whether the connection provides best service quality, that is, how efficiently a single transaction completes, or best throughput, that is, how efficiently an entire job or long-running query completes.
For RTLB, we can define service level goal which will be used only if CLB_GOAL= SHORT
There are 3 options available
NONE – Default setting, you are not taking advantage of this feature
THROUGHPUT – Work requests are directed based on throughput. THROUGHPUT should be used when the work in a service completes at homogeneous rates, for example batch processing, or a trading system where work requests are of similar lengths and the next job starts when the last job completes. In this case the load balancing advisory analyzes the service level for each instance based on the rate at which work is completed in the service plus the available bandwidth to the service:
EXECUTE DBMS_SERVICE.MODIFY_SERVICE( -
   service_name => 'sjob', -
   goal         => DBMS_SERVICE.GOAL_THROUGHPUT);
OR
srvctl modify service -d orcl -s orcl_serv -B THROUGHPUT      // runtime connection load balancing goal {SERVICE_TIME|THROUGHPUT|NONE}
SERVICE_TIME – Work requests are directed based on response time. SERVICE_TIME should be used when the work in a service completes at varying rates. In this case, the load balancing advisory data is based on the elapsed time for work done in the service plus the available bandwidth to the service. An example of the use of SERVICE_TIME is a workload such as internet shopping, where the rate of demand changes and work requests are of differing lengths:
EXECUTE DBMS_SERVICE.MODIFY_SERVICE( -
   service_name => 'OE', -
   goal         => DBMS_SERVICE.GOAL_SERVICE_TIME);
OR
srvctl modify service -d orcl -s orcl_serv -B SERVICE_TIME      // runtime connection load balancing goal {SERVICE_TIME|THROUGHPUT|NONE}
You can see the goal settings for a service in the data dictionary, in the DBA_SERVICES, V$SERVICES, and V$ACTIVE_SERVICES views.
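As a quick sanity check of the goals configured for a service (using the illustrative service name sjob from above):

select name, goal, clb_goal from dba_services where name = 'sjob';
-- or, for services currently running:
select name, goal, clb_goal from v$active_services;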
SUMMARY:
Workload management involves :
   – Failover : If connection to an instance fails, client should automatically connect to another instance
   – Load balancing : Workload should spread across various instances to obtain maximum performance/throughput
Load balancing can be :
  – Connect time load balancing (CTLB)
    . On client side
    . On server side
  –  Run time load balancing (RTLB)
To configure failover/load balancing, tnsnames.ora should contain multiple listener addresses to connect to multiple instances.
Connect time load balancing (CTLB) : This enables the user to connect to one of the instances supporting the service. The connection stays with the same instance until the user disconnects or the session is killed. It can be configured on the client side and/or the server side.
- Connect time load balancing on client side: When a user session attempts to connect to the database, Oracle Net chooses an address specified in tnsnames.ora to connect to in a random order rather than sequential order. This has the effect of clients connecting through addresses which are picked up at random and no one address is overloaded. Its configuration is quite simple. You just need to set the parameter LOAD_BALANCE=ON in the client connection definition in tnsnames.ora.
Limitation of Connect time load balancing (CTLB) on client side : The listener has no idea if the session has been assigned to an endpoint whose corresponding database server is already overloaded.
- Connect time load balancing (CTLB) on server side: After a listener receives the connection request, it can forward the request to another instance based on the connect time load balancing goal (CLB_GOAL) specified for the service. CLB_GOAL can be : LONG or SHORT
LONG(Default) – In this case,
   . the listener will load balance on number of sessions
   . Run time load balancing goal will not be used in this case
SHORT - used for application connections that are short in duration. This should be used with connection pools integrated with the load balancing advisory. In this case, listener uses Load Balancing Advisory (LBA) to make the connection based on CPU utilization on the node.
Limitation of connect time load balancing : The connection stays with the same instance for the life of a session. If connection lasts a long time, it might be possible that load of current instance increases and some other less loaded instance might be preferable.
Run Time (Server side) load Balancing (RTLB): In this case, the listener routes incoming client connections according to policies and based on the current service level provided by the database instances. With server-side load balancing, the listener directs a connection request to the best instance currently providing the service by using the load balancing advisory.
To configure server side load balancing:
1. set parameters LOCAL_LISTENER and REMOTE_LISTENER
2. set CLB_GOAL = SHORT for the service
3. set RTLB_GOAL for the service
———————————————————————————

WORKLOAD MANAGEMENT IN 11g R2 RAC : FAILOVER

In this post, I will discuss the various types of failover that can be configured in 11g R2 RAC for workload management.
In RAC we have multiple instances of a database running on multiple servers.
Workload management involves :
   – Failover : If connection to an instance fails, client should automatically connect to another instance
   – Load balancing : Workload should spread across various instances to obtain maximum performance/throughput
Failover can be :
   – Connect time connection failover (CTCF) : while making the initial connection
   – Run time connection failover (RTCF), also called transparent application failover (TAF) : after the connection has been established
     FAILOVER_METHOD = NONE, BASIC, PRECONNECT
     FAILOVER_TYPE   = SESSION, SELECT
Load balancing can be :
  – Connect time load balancing (CTLB)
    . On client side
    . On server side
  – Run time load balancing (RTLB)
To configure failover/load balancing, tnsnames.ora should contain multiple listener addresses to connect to multiple instances, e.g.
Multiple listener addresses within a description, i.e. the user is trying to connect to a service which is supported by multiple instances:
RAC=
 (DESCRIPTION=
   (ADDRESS_LIST=
     (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
   )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
 )
In case SCAN is used, the SCAN name is used in the address, which resolves to the 3 SCAN listeners.
e.g.
RAC=
 (DESCRIPTION=
   (ADDRESS_LIST=
     (ADDRESS= (PROTOCOL=TCP) (HOST=cluster01-scan) (PORT=1521))
   )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
 )
In this post, I will discuss failover in detail. To know more about load balancing, please see the load balancing post above.
FAILOVER
——–
Failover refers to the mechanism of switching over to an alternate resource when the connection to the primary resource is terminated for any reason. In the context of RAC it implies that when a user tries to connect to one instance of the database and the connection fails for some reason, he should be able to switch to another instance. Connection failure may be broadly categorized as:
- Failure while making the initial connection to an instance (connect time connection failover (CTCF))
- Failure after a connection has been successfully established (transparent application failover (TAF) or run time connection failover (RTCF))
Failure while making the initial connection to an instance (CTCF) : It could occur, for example, when the listener on a node is not working or a node is not up. If your attempt to connect to an instance fails, you can make the connection attempt again, but this time to a backup instance. As long as you have backup instances configured, you can continue trying to connect until a successful connection is established. This process is termed connect time connection failover (CTCF).
Failure after a connection has been successfully established (TAF or RTCF) : It happens as a result of a node crash or an instance crash. In such cases the application has to reconnect to a backup instance, reestablish the session environment, and resubmit any work lost because of the break in the connection. The technical name for this type of failover mechanism is server side transparent application failover (TAF) or run time connection failover (RTCF).
How to configure Connect Time Connection Failover (CTCF)
Connect Time Failover can be achieved by the setting FAILOVER=ON/TRUE/YES (default) in DESCRIPTION_LIST, DESCRIPTION or ADDRESS_LIST in tnsnames.ora.
RAC=
 (DESCRIPTION=
   (ADDRESS_LIST=
     (FAILOVER=ON)
     (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
   )
   (CONNECT_DATA= (SERVICE_NAME= RAC))
 )
 The default value of parameter FAILOVER is set to ON so that Oracle Net Services tries the first address in the list and if that fails, it fails over to the next one in the address list. Without this, Oracle would try only one address from the list and report an error on connection failure.
Note: Connect Time Failover only works if you are using dynamic registration. This means that this feature won’t work if you have something like this configured in your listener.ora:
SID_LIST_MYLISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=sales.us.acme.com)
      (ORACLE_HOME=/s01/app/oracle/ora/prod)
      (SID_NAME=sghosh)
    )
  )
With any reference to GLOBAL_DBNAME in a static registration, you can forget about connect time failover.
How to configure Run Time Connection Failover (RTCF) or Server side TAF:
It can be achieved by creating a service for the database and defining a TAF policy for the service (METHOD = BASIC/PRECONNECT and TYPE = SELECT/SESSION). In this case, the connect descriptor in tnsnames.ora contains a FAILOVER_MODE portion, e.g.
RAC=
 (DESCRIPTION=
   (ADDRESS_LIST=
     (FAILOVER=ON)
     (ADDRESS= (PROTOCOL=TCP) (HOST=node1-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node2-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node3-vip) (PORT=1521))
     (ADDRESS= (PROTOCOL=TCP) (HOST=node4-vip) (PORT=1521))
   )
   (CONNECT_DATA=
     (SERVICE_NAME= RAC)
     (FAILOVER_MODE= (TYPE=select) (METHOD=basic))
   )
 )
Unlike connect time failover, which is invoked before the connection is made, TAF comes into play after the connection is made (and then, broken). If the connection is lost while the application is running, Oracle Net will transparently reconnect the application to another instance accessing the same database.
TAF supports two types of failover (TYPE)
- SESSION:
   . Any uncommitted transactions will be rolled back.
   . The session will be connected to another instance
- SELECT:
  . The session will be connected to another instance
  . A SELECT statement that was executing at the time of failure will be re-executed by the new session using the same SCN and fetched rows will be discarded up to the point that the original query failed.
  . Any uncommitted transactions will be rolled back
TAF supports two methods of failover (METHOD): The method determines which method of failover takes place on instance failure from the primary node to the secondary.
- BASIC: You connect to the backup instance when the primary connection fails. This option requires almost no work on the backup server until failover time.
- PRECONNECT: you connect to the backup instance at the same time you connect to the primary instance. This has the obvious benefit of having a backup connection available all of the time, thus reducing the time of ‘failover’. But the downside is that you have to pay the extra ‘cost’ in terms of resources spent, of having a backup connection open all the while.
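A TAF policy can also be attached to the service itself with srvctl. As a sketch (service name taf_serv, instance names and the retry/delay values are illustrative):

srvctl add service -d orcl -s taf_serv -r orcl1,orcl2 -P BASIC -e SELECT -m BASIC -z 180 -w 5
srvctl start service -d orcl -s taf_serv

Whether a session has actually failed over can then be verified from V$SESSION:

select username, failover_type, failover_method, failed_over
from   v$session
where  username = 'TST';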
SUMMARY:
Workload management involves :
   – Failover : If connection to an instance fails, client should automatically connect to another instance
   – Load balancing : Workload should spread across various instances to obtain maximum performance/throughput
Failover can be 
   – while making initial connection (Connect Time Connection Failover (CTCF))
   – after connection has been established (transparent application failover (TAF) or  Run time connection failover (RTCF))
     FAILOVER_METHOD = NONE, BASIC, PRECONNECT
     FAILOVER TYPE   = SESSION, SELECT
To configure failover/load balancing, tnsnames.ora should contain multiple listener addresses to connect to multiple instances.
Connect Time Connection Failover (CTCF) : If your attempt to connect to an instance fails, you can make the connection attempt again, but this time to a backup instance. Connect Time Failover can be achieved by the setting FAILOVER=ON/TRUE/YES (default) in DESCRIPTION_LIST, DESCRIPTION or ADDRESS_LIST in tnsnames.ora.
Run Time Connection Failover (RTCF) or Server side TAF: If the connection is lost while the application is running, Oracle Net will transparently reconnect the application to another instance accessing the same database. It can be achieved by creating a service for the database and defining a TAF policy for the service (METHOD = BASIC/PRECONNECT and TYPE = SELECT/SESSION). In this case, the connect descriptor in tnsnames.ora contains a FAILOVER_MODE portion.
—————————————————————————————————-


LOB STORAGE IN 11G: BASIC FILES AND SECUREFILES

In this post, I will discuss the two methods of storing LOBs in an Oracle database : BasicFiles and SecureFiles.
You may need to store information in a non-structured or semi-structured form. Examples include pictures, word processing documents, spreadsheets, XML files, and so on. How are these types of data stored?
HOW TO STORE NON-STRUCTURED OR SEMI-STRUCTURED INFO IN THE DATABASE?
There are usually two approaches: The data is stored in the database as LOB fields (BLOB for binary and CLOB for character data), or in OS files with the references to the files stored in the database.
- BLOB: BLOBs contain binary information that is not subject to character set conversion. This would be an appropriate type to store a spreadsheet, a word processing document, image files, and the like.
- CLOB: CLOBs contain information that is subject to character set conversion. This would be an appropriate type to store large amounts of textual information, such as XML or just plain text.
- BFILE: This datatype permits you to store an Oracle directory object (a pointer to an operating system directory) and a file name in a database column and to read this file. This effectively allows you to access operating system files available on the database server in a read-only fashion, as if they were stored in the database table itself. The file is maintained outside of the database and is not really part of the database at all. A BFILE provides read-only access to the contents of the file.
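As a quick sketch of the BFILE approach (the directory path, table and file names are illustrative):

create directory doc_dir as '/u01/app/oracle/docs';
create table doc_tab
( id   number,
  doc  bfile);
insert into doc_tab values (1, bfilename('DOC_DIR', 'report.pdf'));

The row merely stores a pointer; the file report.pdf must already exist in the OS directory, and the database never backs it up or versions it.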
Each approach has advantages and challenges. OS files can be cached by the OS, and journaled filesystems expedite recovery after crashes. They also generally consume less space than the same data stored in the database, since they can be compressed.
There are also tools that can intelligently identify patterns in the files and remove duplication for more efficient storage; however, they are external to the database, so the properties of the database do not apply to them. These files are not backed up with the database, fine-grained security does not apply to them, and such files are not part of a transaction, so concepts innate to the Oracle database, like read consistency, do not apply.
What if you could get the best of both worlds? In Oracle Database 11g, you have the answer with SecureFiles, a completely new infrastructure inside the database that gives you the best features of database-resident LOBs and OS files. Let’s see how.
Traditionally, prior to Oracle Database 11g, you would have defined the table as follows:
sql>
conn / as sysdba
create user tst identified by tst;
grant connect, resource to tst;

conn tst/tst

create table basic_lob
( id  number,
  txt clob)
tablespace users
lob (txt) store as
( tablespace users
  enable storage in row
  chunk 4096
  pctversion 20
  nocache
  nologging);

The various parameters indicate :

    – the tablespace where the lobsegment and lobindex will be stored, and this may be different from the tablespace where the table itself resides.
    – IN ROW Clause
      . disable storage in row : LOB data is always stored separately from the table, in the lobsegment
      . enable storage in row (default) : LOBs of up to 4,000 bytes will be stored in the table itself; LOBs exceeding 4,000 bytes will be moved out of line into the lobsegment
    – CHUNK SIZE : LOBs are stored in chunks; the index that points to the LOB data points to individual chunks of data. Chunks are logically contiguous sets of blocks and are the smallest unit of allocation for LOBs. The CHUNK size must be an integer multiple of your Oracle blocksize. Each LOB instance will consume at least one CHUNK.
       If you pick a CHUNK size that is larger than your expected LOB sizes, you could end up wasting an excessive amount of space. For example, if you have a table with 7KB LOBs on average and you use a CHUNK size of 32KB, you will be wasting approx. 25KB of space per LOB instance. On the other hand, if you use an 8KB CHUNK, you will minimize any sort of waste.
       If the chunk size is smaller than your expected LOB sizes, it will affect your retrieval performance. A lobindex is used to point to the individual chunks, and the more chunks you have, the larger this index is. If you have a 4MB LOB and use an 8KB CHUNK, you will need at least 512 CHUNKs to store that information. That means you need at least as many lobindex entries to point to these chunks. It takes longer to read and manage many small chunks than it does to read fewer, but larger, chunks. The ultimate goal is to use a CHUNK size that minimizes your waste, but also efficiently stores your data.
   – PCTVERSION : This is used to control the read consistency of the LOB. The lobsegment does not use undo to record its changes; rather, it versions the information directly in the lobsegment itself. When you modify a LOB, Oracle allocates a new CHUNK and leaves the old CHUNK in place. If you roll back your transaction, the changes to the lobindex are rolled back and the index will point to the old CHUNK again. PCTVERSION controls the percentage of allocated LOB space that should be used for versioning of LOB data. The default is 10 percent.
    – RETENTION : This is a mutually exclusive alternative to the PCTVERSION clause, and it is valid when you are using automatic undo management in the database. Rather than reserving some percentage of space in the lobsegment to version LOBs, the RETENTION clause uses the same time-based mechanism to retain data. The UNDO_RETENTION database parameter would also apply to LOB data in this case.
    – CACHING : This clause controls whether or not the lobsegment data is stored in the buffer cache. One of the advantages of storing unstructured data in OS files instead of database resident objects is the facility of caching. Files can be cached in the operating system's file buffers. A database resident object can also be cached, in the database buffer cache. However, in some cases the caching may actually be detrimental to performance. LOBs are usually very large (hence the term large objects), and if they come into the buffer cache, most other data blocks will need to be pushed out of the cache to make room for the incoming LOB. The LOB may never be used later, yet its entry into the buffer cache causes other needed blocks to be flushed out. Thus, in most cases you may want to disable caching for LOBs.
      The benefits of caching are very application dependent. In an application manipulating thumbnail images, performance may be improved with caching. However, for larger documents or images, it is better to turn off caching.
      NOCACHE (default) : implies that every access will be a direct read from disk, and every write/modification will likewise be a direct write to disk.
      CACHE READS : allows LOB data that is read from disk to be buffered, but writes of LOB data will be done directly to disk.
      CACHE : permits the caching of LOB data during both reads and writes. Can be specified for small- to medium-sized LOBs (used to store descriptive fields of just a couple of kilobytes). Caching refers to the LOB only; the rest of the table is placed into the buffer cache and follows the same logic as any other table, regardless of the LOB caching setting on that table.
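The caching mode of an existing LOB column can be changed afterwards with ALTER TABLE; as a sketch, using the basic_lob table created above:

alter table basic_lob modify lob (txt) (cache reads);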
A LOB column always results in what I call a multisegment object, meaning the table will use multiple physical segments.
SQL> conn tst/tst
SQL> select segment_name, segment_type from user_segments;

SEGMENT_NAME                  SEGMENT_TYPE
----------------------------- ------------
SYS_IL0000071432C00002$$      LOBINDEX
SYS_LOB0000071432C00002$$     LOBSEGMENT
BASIC_LOB                     TABLE
The LOB in the table really just points to the lobindex, and the lobindex in turn points to all of the pieces(chunks)  of the LOB itself.
As you didn’t specify it explicitly, the LOB is stored in the conventional format (BasicFiles).
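You can confirm the storage format from the USER_LOBS dictionary view, whose SECUREFILE column shows YES or NO for each LOB column. A sketch against the BASIC_LOB table created above:

```sql
-- SECUREFILE = NO indicates conventional BasicFiles storage
SELECT column_name, securefile, chunk, cache
  FROM user_lobs
 WHERE table_name = 'BASIC_LOB';
```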
Prior to 11g, we had certain assumptions about LOBs which led to the BasicFiles storage format:
- LOBs would never grow beyond a size of a few hundred kilobytes, or in extreme cases a megabyte or two. Based on this assumption, the largest chunk size (the maximum amount of space that Oracle would have to convert from its source file to a LOB at one time) could be kept relatively small as well, at just 32KB.
- LOBs would be loaded relatively infrequently (usually only once) and then updated very rarely. After all, how often does a standard mailing label format or form letter change?
- Likewise, only one application user session was expected to modify a physician's picture or a mailing label master document at one time. Therefore, concurrency and read consistency were not going to be major concerns, so there was no need to maintain additional rollback segments just for LOBs.
Now, the footprint of large objects has increased exponentially within our databases, driven by diverse business needs that span every industry: medical and pharmaceutical applications that capture and serve up patient images, manufacturing applications that store components' designs, and semi-structured data that business-to-business (B2B) applications use to order products to replenish a company's warehouse. However, these newer application requirements have several things in common:
- LOB sources are much more diverse. These sources include Extensible Markup Language (XML) documents, geospatial data, biometric information, video files, and other semi-structured or unstructured data.
- Simultaneous multiple-user updates are common. As semi-structured data like XML has proliferated, the need for more than one user to simultaneously update the same LOB has also exploded.
- Large objects are gigantic and getting larger. It is not uncommon for LOBs to extend into the gigabyte and even terabyte range, especially for audiovisual and medical imaging data, and transmission bandwidths have increased dramatically.
- Self-published user content has exploded. The YouTube phenomenon and other web sites that store self-produced audiovisual files for wide consumption have also contributed to the explosion of LOB sources.
- Security requirements have increased dramatically. Many of these LOBs require special security (especially personal medical data stored within Digital Imaging and Communications in Medicine (DICOM) sources) because of federal, state, and international regulations like the Health Insurance Portability and Accountability Act (HIPAA).
LOBs, the Next Generation: SecureFiles
———————————————————–
As a result of these new business requirements, the LOB data type has been redesigned completely in Oracle 11g Release 1. This new LOB datatype is called a SecureFile.
If you want to store the LOB as a SecureFile, all you have to do is add the clause store as securefile to the table creation, as shown below (note that PCTVERSION is a BasicFiles parameter; for SecureFiles, RETENTION is used instead):
create table secure_lob
( id  number,
  txt clob)
tablespace users
lob (txt) store as securefile
(
        tablespace users
        enable storage in row
        chunk 4096
        retention auto
        nocache
        nologging
);
- Performance and Manageability Improvements
  Since the sources of LOBs have increased in size by several orders of magnitude, the methods that load, store, and return LOB data have been reengineered.
- Bigger CHUNK size
   In prior releases, the maximum size of a chunk of LOB data that could be read or written at one time was limited to 32KB. Internal and external LOB sizes have grown exponentially, so this was a serious limitation to effective LOB I/O.
   Oracle 11g remedies this by automatically choosing an appropriate CHUNK size between the database's standard block size and a maximum of 64MB. Further, Oracle 11g helps limit the internal fragmentation of SecureFile LOBs by attempting to store extremely large SecureFile LOB segments on disk so that they are physically adjacent.
   Finally, the use of variable CHUNK sizes means that the corresponding UNDO segments generated during updates of SecureFile LOBs only need to be as large as the CHUNKs, and thus contain less wasted space.
- Improved REDO and UNDO Generation
  Before Oracle 11g, the Oracle DBA had relatively free rein when configuring CHUNK size as well as the retention of undo segments for LOBs, and that didn't always yield optimal results.
   Oracle 11g automatically chooses the best method for generating redo and undo for SecureFile LOBs; only two storage-related parameters are required: RETENTION and MAXSIZE. The RETENTION directive tells Oracle 11g how to handle the consistent-read data (stored in UNDO segments) for a SecureFile LOB. Here is how the setting affects read consistency:
Table 1-1. RETENTION Directive Settings for SecureFiles
Setting  Explanation
MAX      Starts reclaiming old versions once the segment reaches MAXSIZE
MIN n    Keeps old versions for at least the specified amount of time
AUTO     The default setting; a trade-off between space and time that is determined automatically
NONE     Instructs Oracle 11g to reuse older versions of the SecureFile LOB as much as possible
The new MAXSIZE storage setting works in concert with the setting for RETENTION, and it determines the maximum size of a SecureFile LOB. When RETENTION is set to MAX, the LOB is allowed to grow up to MAXSIZE, after which space can be reclaimed from UNDO segments; otherwise, the MAXSIZE setting is treated as a hard limit on the size of the SecureFile LOB's segment.
Note that the storage attributes CHUNK, PCTVERSION, FREEPOOLS, FREELISTS, and FREELIST GROUPS are simply ignored for SecureFile LOBs.
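The two directives can be combined in the SecureFile storage clause. A sketch (the table name and the 500M cap are purely illustrative):

```sql
-- Hypothetical: cap the LOB segment at 500M and allow old versions
-- to be reclaimed only once that limit is reached
create table archive_lob
( id  number,
  doc clob)
lob (doc) store as securefile
(
        retention max
        maxsize 500M
);
```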
- Automatic Deduplication
  One of the more intriguing SecureFile LOB features is the ability of Oracle 11g to automatically locate LOBs that store exactly the same data within the same table and column. Once detected, Oracle 11g stores only one copy of the LOB data. The advantage, of course, is that dramatically less disk space is needed to store the same amount of data; further, redo logging and UNDO are reduced significantly because an update to SecureFile LOB data doesn't require retaining duplicate versions of redo log entries and UNDO segments.
SQL> alter table secure_lob
        modify lob(txt)
        (deduplicate);
Once deduplication is enabled, the database calculates a secure hash of the LOB value in each row and compares the hashes. If two hash values match, only one copy of the LOB data is stored and the duplicate rows reference it. When a new record is inserted, its hash value is calculated; if it matches an existing value, only a reference to the stored LOB is kept; otherwise the real value is stored.
You can also reverse the deduplication process:
SQL> alter table secure_lob
         modify lob(txt)
        (keep_duplicates);
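Deduplication does not have to be retrofitted with ALTER TABLE; it can also be declared at creation time. A sketch using a hypothetical table of mail attachments (a column where identical payloads are common):

```sql
-- Hypothetical: duplicate attachments are detected and stored once
-- from the moment the table is created
create table mail_attachments
( msg_id     number,
  attachment blob)
lob (attachment) store as securefile
(
        deduplicate
);
```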
- Intelligent Compression
  This new feature allows Oracle 11g to compress and decompress SecureFile LOB data automatically and transparently, which means that disk space for these data is reduced dramatically. Whenever a query or DML statement retrieves or applies changes to a SecureFile LOB's data, Oracle 11g decompresses only those data blocks needed for the operation, and it automatically maintains the required mapping between the uncompressed data blocks in the Database Buffer Cache and the database blocks on disk. There are two compression levels: MEDIUM (the default) and HIGH; specifying HIGH will tend to increase latency during SecureFile LOB access, but will also result in the highest storage compression ratios.
SQL> alter table secure_lob
         modify lob(txt)
         (compress high);
Compression is not the same as deduplication. Compression happens inside a LOB column: within each row, the LOB value is compressed independently. In deduplication, all the rows are examined and duplicate values in the columns are removed and replaced with pointers. If you have two very different rows, deduplication will not reduce the size, but compression may optimize the space inside each LOB value. You can compress as well as deduplicate the same table.
Compression takes CPU cycles, so depending on how compressible the data is, it may not be worthwhile. For instance, JPEG pictures are already compressed, so further compression will not save any space. On the other hand, if you have an XML document stored as a CLOB, compression may produce a substantial reduction. SecureFiles compression automatically detects whether the data is compressible and only spends CPU cycles if compression yields gains.
Also note that the LOB compression is independent of table compression. If you compress the table SECURE_LOB, the LOBs are not compressed. The LOB compression occurs only when you issue the above SQL.
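Since compression and deduplication are independent, they can be combined on the same column in a single ALTER. A sketch against the SECURE_LOB table used above (option order is illustrative):

```sql
-- Hypothetical: both options on one SecureFile column -
-- duplicates stored once, and each stored copy compressed
alter table secure_lob
  modify lob(txt)
  (compress high deduplicate);
```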
In Oracle Database 11g Release 2, there is a third compression option in addition to HIGH and MEDIUM: LOW. As the name suggests, it compresses less but also consumes far less CPU and completes faster. This approach uses a block-based lossless compression similar to the fast Lempel-Ziv-Oberhumer (LZO) algorithm.
Let’s see an example of a table with the SecureFiles LOW compression:
create table docs
(
        doc_id  number,
        clearance       varchar2(20),
        doc_body        clob
)
LOB(doc_body) store as securefile
(
        compress low
);
Let’s see an example with the same table and column. First, we modify the column to uncompressed:
SQL> alter table docs
  2  modify LOB(doc_body)
  3  (
  4     nocompress
  5  );
Now, let's modify the column for low compression:
SQL> alter table docs
  2  modify LOB(doc_body)
  3  (
  4     compress low
  5  );
If you omit the LOW clause, the default is MEDIUM. The LOW compression is not just for table creation; you can use it to alter an existing column as well.
- Transparent Encryption
  Every organization has ethical and legal obligations to protect customer-sensitive information stored within LOBs. For example, a patient's medical records and radiology images must be protected. However, there are also enormous amounts of intellectual-property information that must be stored in semi-structured object forms (for example, molecular formulas and manufacturing blueprints) that are crucial to any company's competitive advantage.
  SecureFile LOBs fill this hitherto serious gap in database security because it is now possible to encrypt LOB data within an Oracle 11g database. SecureFile LOBs are encrypted using the same Transparent Data Encryption (TDE) algorithms that were introduced in Oracle 10g Release 2, which means it is possible to encrypt any or all SecureFile LOBs within the same table using the four standard TDE encryption algorithms (AES128, AES192, AES256 and 3DES168).
SQL> alter table secure_lob
       modify lob(txt)
       (encrypt using 'AES128');
Before enabling encryption you have to set up the encryption wallet. Here are the steps in summary:
   1. Set the parameter in sqlnet.ora, if not set already to specify the location of the wallet:
      ENCRYPTION_WALLET_LOCATION=
         (SOURCE=
             (METHOD=FILE)
             (METHOD_DATA=
                (DIRECTORY= /opt/oracle/orawall)
           )
      )
      The directory /opt/oracle/orawall should already exist; if not then you should create it.
   2. Create the wallet:
      alter system set encryption key identified by "mypass";
      This creates the wallet with the password mypass and opens it.
   3. The above steps are needed only once. After the wallet is created and open, it stays open as long as the database is up (unless it is explicitly closed). If the database is restarted, you have to open the wallet with:
      alter system set encryption wallet open identified by "mypass";
When a SecureFile LOB column is encrypted, the column values of all the rows of that table are encrypted.
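With the wallet open, encryption can also be declared when the table is created rather than added later with ALTER TABLE. A sketch (the table name and algorithm choice are illustrative):

```sql
-- Hypothetical: SecureFile column encrypted from the start
create table patient_records
( patient_id number,
  record     clob)
lob (record) store as securefile
(
        encrypt using 'AES192'
);
```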
Conclusion
SecureFiles are not merely the next generation of LOBs; they add much more value to them, especially the features that were formerly only in the domain of filesystems. SecureFiles can be encrypted for security, de-duplicated and compressed for more efficient storage, cached (or not) for faster access (or save the buffer pool space), and logged at several levels to reduce the mean time to recover after a crash. With the introduction of SecureFiles, you can store more unstructured documents in the database without incurring too much overhead or losing any critical functionality afforded by an OS filesystem.
PARENT AND CHILD CURSORS IN ORACLE
  In this post I will discuss parent and child cursors: what they contain, when they are created, and how and why we can ensure that they are shared.
(Please click here for the video of this post)
A cursor is a memory area in the library cache allocated to a SQL statement, which stores various information about the statement such as its text, execution plan, statistics etc.
 Each SQL statement has
- One Parent cursor
- One or more child cursors
PARENT CURSOR
- It stores the SQL text of the cursor. When two statements are textually identical, they will share the same parent cursor.
- Externalised by V$SQLAREA: Contains one row for each parent cursor
CHILD CURSOR
- Each parent requires at least one child cursor but can have more than one child cursor
- The child cursor holds the remaining required information, such as the identity of the objects referenced by the SQL cursor and the names, types and lengths of the bind variables used, etc.
- Child cursor contains
  . Environment
  . Statistics
  . Execution Plan
  . Bind variables
- Externalised by V$SQL : Contains one row for each child cursor
- A child cursor takes up less space in the cursor cache. A child cursor doesn’t contain all of the information stored in a parent cursor, for example, the SQL text is only stored in the parent cursor and not in each child.
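To see how parents and children line up, V$SQLAREA and V$SQL can be joined on SQL_ID. A sketch (in practice you would add a filter on SQL_TEXT or SQL_ID):

```sql
-- One row per child cursor, with the parent's version count alongside
SELECT a.sql_id,
       a.version_count,
       s.child_number,
       s.plan_hash_value
  FROM v$sqlarea a, v$sql s
 WHERE a.sql_id = s.sql_id
 ORDER BY a.sql_id, s.child_number;
```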
Since we want to economize on memory consumption, we would like equivalent SQL statements to use the same cursor. For example, select * from employees and SELECT * FROM EMPLOYEES achieve the same objective and have the same execution plan, so ideally only one cursor would be created and used when either statement is issued. But it is not so: two parent cursors (and hence two child cursors) will be created, since the two statements are textually different.
                 If we have two textually identical statements, only one parent cursor will be created, but multiple child cursors (and hence multiple execution plans) can be created if, for example, bind variables have different values/sizes for different executions of the same statement.
When the same statement has several versions (children), the view V$SQL_SHARED_CURSOR shows why the statement could not be shared: for each child cursor except the first one, it records why a previously created child cursor could not be reused. For each type of incompatibility there is a column that is set to either N (not a mismatch) or Y (mismatch).
The following table lists the various columns, each representing a type of incompatibility that can prevent sharing of child cursors:
ANYDATA_TRANSFORMATION
The criteria for opaque type transformation do not match
AUTH_CHECK_MISMATCH
Authorization/translation check failed for the existing child cursor
BIND_MISMATCH
The bind metadata does not match the existing child cursor. Likely a difference in bind variable definition.
BIND_PEEKED_PQ_MISMATCH
Cursor based around bind peeked values
BIND_UACS_DIFF
One cursor has bind UACs and one does not
BUFFERED_DML_MISMATCH
Buffered DML does not match the existing child cursor
CURSOR_PARTS_MISMATCH
Cursor was compiled with subexecution (cursor parts were executed)
DESCRIBE_MISMATCH
The typecheck heap is not present during the describe for the child cursor
DIFF_CALL_DURN
If Slave SQL cursor/single call
DIFFERENT_LONG_LENGTH
Value of LONG does not match
EXPLAIN_PLAN_CURSOR
The child cursor is an explain plan cursor and should not be shared
FLASHBACK_CURSOR
Cursor non-shareability due to flashback
FLASHBACK_TABLE_MISMATCH
Cursor cannot be shared because there is a mismatch with triggers being enabled and/or referential integrity constraints being deferred
INCOMP_LTRL_MISMATCH
Cursor might have some binds (literals) which may be unsafe/non-data. Value mismatch.
INCOMPLETE_CURSOR
Cursor is incomplete: typecheck heap came from call memory
INST_DRTLD_MISMATCH
Insert direct load does not match the existing child cursor
INSUFF_PRIVS
Insufficient privileges on objects referenced by the existing child cursor
INSUFF_PRIVS_REM
Insufficient privileges on remote objects referenced by the existing child cursor
LANGUAGE_MISMATCH
The language handle does not match the existing child cursor
LITERAL_MISMATCH
Non-data literal values do not match the existing child cursor
LITREP_COMP_MISMATCH
Mismatch in use of literal replacement
LOGICAL_STANDBY_APPLY
Logical standby apply context does not match
LOGMINER_SESSION_MISMATCH
LogMiner Session parameters mismatch
MULTI_PX_MISMATCH
Cursor has multiple parallelizers and is slave-compiled
MV_QUERY_GEN_MISMATCH
Internal, used to force a hard-parse when analyzing materialized view queries
MV_REWRITE_MISMATCH
Cursor needs recompilation because an SCN was used during compile time due to being rewritten by materialized view
MV_STALEOBJ_MISMATCH
Cursor cannot be shared because there is a mismatch in the list of materialized views which were stale at the time the cursor was built
NO_TRIGGER_MISMATCH
Cursor and child have no trigger mismatch
OPTIMIZER_MISMATCH
A change to any of 33 supported parameters such as SORT_AREA_SIZE or OPTIMIZER_INDEX_COST_ADJ, or of 151 unsupported parameters such as _unnest_subquery, that change the optimizer environment.
OPTIMIZER_MODE_MISMATCH
Optimizer mode has changed (for example, ALL_ROWS vs CHOOSE)
OUTLINE_MISMATCH
The outlines do not match the existing child cursor
OVERLAP_TIME_MISMATCH
Mismatch caused by setting session parameter ERROR_ON_OVERLAP_TIME
PDML_ENV_MISMATCH
PDML environment does not match the existing child cursor
PLSQL_CMP_SWITCHS_DIFF
PL/SQL anonymous block compiled with different PL/SQL compiler switches. See DBMS_WARNING page of the library.
PQ_SLAVE_MISMATCH
Top-level slave decides not to share cursor
PX_MISMATCH
Mismatch in one parameter affecting the parallelization of a SQL statement. For example, one cursor was compiled with parallel DML enabled while the other was not.
REMOTE_TRANS_MISMATCH
The remote base objects of the existing child cursor do not match
ROLL_INVALID_MISMATCH
Marked for rolling invalidation and invalidation window exceeded
ROW_LEVEL_SEC_MISMATCH
The row level security policies do not match
ROW_SHIP_MISMATCH
Session does not support row shipping, but cursor built in one that did
SEC_DEPTH_MISMATCH
Security level does not match the existing child cursor
SLAVE_QC_MISMATCH
The existing child cursor is a slave cursor and the new one was issued by the coordinator (or, the existing child cursor was issued by the coordinator and the new one is a slave)
SQL_REDIRECT_MISMATCH
SQL redirection mismatch
SQL_TYPE_MISMATCH
The SQL type does not match the existing child cursor
STATS_ROW_MISMATCH
The existing statistics do not match the existing child cursor. May be caused by tracing
STB_OBJECT_MISMATCH
STB has come into existence since cursor was compiled
TOP_LEVEL_DDL_MISMATCH
Is top-level DDL cursor
TOP_LEVEL_RPI_CURSOR
Is top level RPI cursor
TRANSLATION_MISMATCH
The base objects of the existing child cursor do not match. For example objects in different schemas with the same name.
TYPCHK_DEP_MISMATCH
Cursor has typecheck dependencies
TYPECHECK_MISMATCH
The existing child cursor is not fully optimized
UNBOUND_CURSOR
The existing child cursor was not fully built (in other words, it was not optimized)
USER_BIND_PEEK_MISMATCH
Cursor is not shared because value of one or more user binds is different and this has a potential to change the execution plan
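Rather than eyeballing all of these columns, you can pull the flags for one statement directly from V$SQL_SHARED_CURSOR. A sketch, selecting a few of the more common reasons (substitute the SQL_ID you are investigating; the one shown is from the examples later in this post):

```sql
-- Which sharing criteria failed for each child of a given statement?
SELECT child_number,
       optimizer_mode_mismatch,
       translation_mismatch,
       bind_mismatch,
       language_mismatch
  FROM v$sql_shared_cursor
 WHERE sql_id = '4ba5cdz25czfq'
 ORDER BY child_number;
```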
Let us see how multiple parent and child cursors can be created for SQL statements.
Multiple Parent Cursors
- Created because of differences in SQL statement text
- The following statements all require separate parent cursors:
  . SELECT COUNT(*) FROM employees;
  . Select count(*) from employees;
  . SELECT COUNT(*)  FROM EMPLOYEES;
SYS>ALTER SYSTEM FLUSH SHARED_POOL;
HR>SELECT COUNT(*) FROM employees;
   Select count(*) from employees;
   SELECT COUNT(*) FROM EMPLOYEES;
– Let's check V$SQLAREA for parent cursors for the above statements.
– Note that:
   . There is one record for each statement in V$SQLAREA, as one parent cursor is created per SQL statement, since the statements differ in their text
   . Each statement has a different SQL_ID/HASH_VALUE
   . There is one child per parent cursor (VERSION_COUNT=1)
SYS>col sql_text for a30 word_wrapped
    SELECT SQL_TEXT, SQL_ID, VERSION_COUNT, HASH_VALUE
    FROM V$SQLAREA
    WHERE LOWER(SQL_TEXT) LIKE '%select count(*) from employees'
      AND LOWER(SQL_TEXT) NOT LIKE '%hash%';
SQL_TEXT                       SQL_ID        VERSION_COUNT HASH_VALUE
------------------------------ ------------- ------------- ----------
SELECT COUNT(*) FROM           2ybtna401c2x2             1    1444770
employees
SELECT COUNT(*) FROM           4ba5cdz25czfq             1 3294002646
EMPLOYEES
Select count(*) from           99a1j2spyr81k             1  736862258
employees
– Let’s check v$SQL for child cursors for above statements.
- Note that
  . Each statement has one child (CHILD# = 0)
  . Each statement uses same execution plan (same PLAN_HASH_VALUE)
SYS>col child_number for 99
    SELECT SQL_TEXT, SQL_ID, CHILD_NUMBER CHILD#, HASH_VALUE, PLAN_HASH_VALUE
     FROM V$SQL
    WHERE LOWER(SQL_TEXT) LIKE '%select count(*) from employees'
      AND LOWER(SQL_TEXT) NOT LIKE '%hash%';
SQL_TEXT                       SQL_ID            CHILD# HASH_VALUE PLAN_HASH_VALUE
------------------------------ ------------- ---------- ---------- ---------------
SELECT COUNT(*) FROM           2ybtna401c2x2          0    1444770      3580537945
employees
SELECT COUNT(*) FROM           4ba5cdz25czfq          0 3294002646      3580537945
EMPLOYEES
Select count(*) from           99a1j2spyr81k          0  736862258      3580537945
employees
Although the same execution plan is used by the three statements, they were still individually parsed and their parent cursors are consuming memory.
Hence, to enable sharing of parent cursors, we should follow a standard format for our SQL statements.
Multiple Child Cursors for the same parent cursor
- Can be created for a number of reasons . In this post, I will discuss creation of multiple child cursors due to  differences in:
  . System / Session parameters
  . Object translation
  . Bind variables (Name and value)
  . NLS parameters
-- Difference in System / Session Parameters
Multiple child cursors may be created for the same SQL due to differences in system / session parameters.
- Let’s verify that different optimizer modes require separate child cursors
- Issue same SQL statement with different values of the parameter optimizer_mode
SYS>ALTER SYSTEM FLUSH SHARED_POOL;
HR>ALTER SESSION SET optimizer_mode = CHOOSE;
   SELECT COUNT(*) FROM EMPLOYEES;
- Change optimizer mode to ALL_ROWS
HR>ALTER SESSION SET optimizer_mode = ALL_ROWS;
   SELECT COUNT(*) FROM EMPLOYEES;
– Let's check V$SQLAREA for parent cursors for the above statements.
– Note that:
   . There is ONLY one record for both statements in V$SQLAREA, as one parent cursor serves both, since the statements are textually identical
   . Both statements have the same SQL_ID/HASH_VALUE
   . There are two children per parent cursor (VERSION_COUNT=2)
SYS>col sql_text for a30 word_wrapped
    SELECT SQL_TEXT, SQL_ID, VERSION_COUNT, HASH_VALUE
    FROM V$SQLAREA
    WHERE LOWER(SQL_TEXT) LIKE '%select count(*) from employees'
      AND LOWER(SQL_TEXT) NOT LIKE '%hash%';
SQL_TEXT                       SQL_ID        VERSION_COUNT HASH_VALUE
------------------------------ ------------- ------------- ----------
SELECT COUNT(*) FROM           4ba5cdz25czfq             2 3294002646
EMPLOYEES
- Let’s check v$SQL for child cursors for above statements.
- Note that
  . The statement has two children (CHILD# = 0,1)
  . OPTIMIZER_MODE is stored in V$SQL for each child cursor
SYS>col child_number for 99
    SELECT SQL_TEXT, SQL_ID, CHILD_NUMBER CHILD#, HASH_VALUE,
           CHILD_ADDRESS, OPTIMIZER_MODE OPT_MODE
     FROM V$SQL
    WHERE LOWER(SQL_TEXT) LIKE '%select count(*) from employees'
      AND LOWER(SQL_TEXT) NOT LIKE '%hash%';
SQL_TEXT                       SQL_ID            CHILD# HASH_VALUE CHILD_AD OPT_MODE
------------------------------ ------------- ---------- ---------- -------- --------
SELECT COUNT(*) FROM           4ba5cdz25czfq          0 3294002646 4C202DB0 CHOOSE
EMPLOYEES
SELECT COUNT(*) FROM           4ba5cdz25czfq          1 3294002646 4C1FEBF8 ALL_ROWS
EMPLOYEES
– Let's find out from V$SQL_SHARED_CURSOR why the child cursor was not shared.
   Note that the second child cursor was created because of a mismatch in optimizer mode
(OPTIMIZER_MODE_MISMATCH = Y)
SQL>col OPTIMIZER_MODE_MISMATCH for a25
    SELECT S.SQL_TEXT, S.CHILD_NUMBER, S.CHILD_ADDRESS, C.OPTIMIZER_MODE_MISMATCH
    FROM   V$SQL S, V$SQL_SHARED_CURSOR C
    WHERE LOWER(S.SQL_TEXT) LIKE '%select count(*) from employees'
      AND LOWER(S.SQL_TEXT) NOT LIKE '%hash%'
      AND S.CHILD_ADDRESS = C.CHILD_ADDRESS;
SQL_TEXT                       CHILD# CHILD_AD OPTIMIZER_MODE_MISMATCH
------------------------------ ------ -------- -----------------------
SELECT COUNT(*) FROM                0 4C202DB0 N
EMPLOYEES
SELECT COUNT(*) FROM                1 4C1FEBF8 Y
EMPLOYEES
OBJECT TRANSLATIONS
If a statement references different objects with the same name, multiple child cursors are generated.
For example, we will create two tables named t1 with the same structure in the HR and SCOTT schemas and issue identical SQL statements referring to the two tables.
SYS>ALTER SYSTEM FLUSH SHARED_POOL;
HR>CREATE TABLE t1 (c1 NUMBER);
SELECT c1 FROM t1;
SCOTT>CREATE TABLE t1 (c1 NUMBER);
SELECT c1 FROM t1;
The statement SELECT c1 FROM t1 will have a shared parent cursor, but multiple child cursors:
               Parent (select c1 from t1;)
                          |
            +-------------+-------------+
            |                           |
         Child I                     Child II
         (HR.t1)                    (SCOTT.t1)
– Let's check V$SQLAREA for parent cursors for the above statements.
It can be seen that only one parent cursor has been created, with two child cursors (VERSION_COUNT=2).
SYS>col sql_text for a30 word_wrapped
    SELECT SQL_TEXT, SQL_ID, VERSION_COUNT, HASH_VALUE
    FROM V$SQLAREA
    WHERE SQL_TEXT LIKE '%SELECT c1 FROM t1%'
      AND LOWER(SQL_TEXT) NOT LIKE '%hash%';
SQL_TEXT                       SQL_ID        VERSION_COUNT HASH_VALUE
------------------------------ ------------- ------------- ----------
SELECT c1 FROM t1              68buhgvjct1pg             2 3805054639
- Let's check V$SQL for child cursors for the above statements.
SYS>col child_number for 99
    SELECT SQL_TEXT, SQL_ID, CHILD_NUMBER CHILD#, HASH_VALUE,
           PLAN_HASH_VALUE, USERNAME
     FROM V$SQL S, DBA_USERS U
    WHERE SQL_TEXT LIKE '%SELECT c1 FROM t1%'
      AND LOWER(SQL_TEXT) NOT LIKE '%hash%'
      AND U.USER_ID = S.PARSING_USER_ID;
SQL_TEXT                       SQL_ID            CHILD# HASH_VALUE PLAN_HASH_VALUE USER
------------------------------ ------------- ---------- ---------- --------------- -----
SELECT c1 FROM t1              68buhgvjct1pg          0 3805054639      3617692013 HR
SELECT c1 FROM t1              68buhgvjct1pg          1 3805054639      3617692013 SCOTT
- Note that
  . The statement has two children (CHILD# = 0,1) because they refer to different objects
  . ID of the user executing the statement is stored in V$SQL (parsing_user_id)
    for each child cursor
– Find out why the child cursor was not shared
  SQL>col TRANSLATION_MISMATCH for a25
    SELECT S.SQL_TEXT, S.CHILD_NUMBER, S.CHILD_ADDRESS, C.TRANSLATION_MISMATCH
    FROM   V$SQL S, V$SQL_SHARED_CURSOR C
    WHERE SQL_TEXT LIKE '%SELECT c1 FROM t1%'
      AND LOWER(S.SQL_TEXT) NOT LIKE '%hash%'
      AND S.CHILD_ADDRESS = C.CHILD_ADDRESS;
SQL_TEXT                       CHILD_NUMBER CHILD_AD TRANSLATION_MISMATCH
------------------------------ ------------ -------- --------------------
SELECT c1 FROM t1                         0 4C3181C8 N
SELECT c1 FROM t1                         1 4C3EE02C Y
 Note that the second child cursor was created because of a translation mismatch
   (TRANSLATION_MISMATCH = Y)
BIND VARIABLES (SIZE AND VALUE)
Multiple child cursors are created for different lengths and values of bind variables.
For example, we will issue the same statement containing a bind variable twice, but with different sizes of the bind variable.
SYS>ALTER SYSTEM FLUSH SHARED_POOL;
HR>DROP TABLE T1 PURGE;
   CREATE TABLE t1 (c1 VARCHAR2(50),c2 NUMBER);
VARIABLE v1 VARCHAR2(30);
SELECT c2 FROM t1 WHERE c1 = :v1;
VARIABLE v1 VARCHAR2(40);
SELECT c2 FROM t1 WHERE c1 = :v1;
– Let's check V$SQLAREA for parent cursors for the above statements.
– Note that two child cursors are created (VERSION_COUNT = 2)
SYS>col sql_text for a35 word_wrapped
    SELECT SQL_TEXT, SQL_ID, VERSION_COUNT, HASH_VALUE
    FROM V$SQLAREA
    WHERE sql_text LIKE 'SELECT c2 FROM t1 WHERE c1 = %';
SQL_TEXT                       SQL_ID        VERSION_COUNT HASH_VALUE
------------------------------ ------------- ------------- ----------
SELECT c2 FROM t1              c0w72qhanq1zs             2  357238776
WHERE c1 = :v1
- Let's check V$SQL for child cursors for the above statements.
- Note that
  . The statement has two children (CHILD# = 0,1) because the size of the bind variable differs, although the execution plan is the same (same PLAN_HASH_VALUE)
SYS>col child_number for 99
    SELECT SQL_TEXT, SQL_ID, CHILD_NUMBER CHILD#, HASH_VALUE,
           PLAN_HASH_VALUE
     FROM V$SQL
    WHERE sql_text LIKE 'SELECT c2 FROM t1 WHERE c1 = %';
SQL_TEXT                       SQL_ID            CHILD# HASH_VALUE PLAN_HASH_VALUE
------------------------------ ------------- ---------- ---------- ---------------
SELECT c2 FROM t1              c0w72qhanq1zs          0  357238776      3617692013
WHERE c1 = :v1
SELECT c2 FROM t1              c0w72qhanq1zs          1  357238776      3617692013
WHERE c1 = :v1
– Find out why the child cursor was not shared.
   Note that the second child cursor was created because of a bind variable length mismatch
   (BIND_LENGTH_UPGRADEABLE = Y)
SQL>col BIND_LENGTH_UPGRADEABLE for a25
    SELECT S.SQL_TEXT, S.CHILD_NUMBER, S.CHILD_ADDRESS, BIND_LENGTH_UPGRADEABLE
    FROM   V$SQL S, V$SQL_SHARED_CURSOR C
    WHERE sql_text LIKE 'SELECT c2 FROM t1 WHERE c1 = %'
      AND S.CHILD_ADDRESS = C.CHILD_ADDRESS;
SQL_TEXT                       CHILD_NUMBER CHILD_AD BIND_LENGTH_UPGRADEABLE
------------------------------ ------------ -------- -----------------------
SELECT c2 FROM t1                         0 4C26D7E0 N
WHERE c1 = :v1
SELECT c2 FROM t1                         1 4C252500 Y
WHERE c1 = :v1
NLS PARAMETERS
Multiple child cursors are created for the same statement for different values of NLS parameters. Only a subset of SQL statements is affected – those which use
Dates
Currency
Ordering
For example, we will issue the same SQL statement with different values of NLS_LANGUAGE
HR>VAR b1 VARCHAR2(30);
EXECUTE :b1 := SYSDATE;
ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN';
SELECT TO_CHAR (TO_DATE (:b1,'DD-MON-YYYY'),'yyyymmdd') FROM dual;
ALTER SESSION SET NLS_LANGUAGE = 'GERMAN';
SELECT TO_CHAR (TO_DATE (:b1,'DD-MON-YYYY'),'yyyymmdd') FROM dual;
– Let’s check v$SQLAREA for parent cursors for above statements.
– Note that two child cursors are created (VERSION_COUNT = 2)
SYS>col sql_text for a35 word_wrapped
    SELECT SQL_TEXT , SQL_ID, VERSION_COUNT, HASH_VALUE
    FROM V$SQLAREA
    WHERE sql_text LIKE 'SELECT TO_CHAR (TO_DATE (:b1%';
SQL_TEXT                                SQL_ID        VERSION_COUNT HASH_VALUE
--------------------------------------- ------------- ------------- ----------
SELECT TO_CHAR (TO_DATE                 5588uyghqp14w             2 3781854364
(:b1,'DD-MON-YYYY'),'yyyymmdd')
FROM dual
- Let’s check v$SQL for child cursors for the above statements.
- Note that
  . The statement has two children (CHILD# = 0,1) because of the difference in the NLS_LANGUAGE parameter
SYS>col child_number for 99
    SELECT SQL_TEXT, SQL_ID, CHILD_NUMBER CHILD#, HASH_VALUE,
           PLAN_HASH_VALUE
     FROM V$SQL
    WHERE sql_text LIKE 'SELECT TO_CHAR (TO_DATE (:b1%';
SQL_TEXT                                SQL_ID            CHILD# HASH_VALUE PLAN_HASH_VALUE
--------------------------------------- ------------- ---------- ---------- ---------------
SELECT TO_CHAR (TO_DATE                 5588uyghqp14w          0 3781854364      1388734953
(:b1,'DD-MON-YYYY'),'yyyymmdd')
FROM dual
SELECT TO_CHAR (TO_DATE                 5588uyghqp14w          1 3781854364      1388734953
(:b1,'DD-MON-YYYY'),'yyyymmdd')
FROM dual
– Find out why the child cursor was not shared
   Note that the second child cursor was created because of a language mismatch
   (LANGUAGE_MISMATCH = Y)
SQL>col LANGUAGE_MISMATCH  for a25
    SELECT S.SQL_TEXT, S.CHILD_NUMBER, s.CHILD_ADDRESS,LANGUAGE_MISMATCH
    FROM   V$SQL S, V$SQL_SHARED_CURSOR C
    WHERE sql_text LIKE 'SELECT TO_CHAR (TO_DATE (:b1%'
      AND S.CHILD_ADDRESS = C.CHILD_ADDRESS;
SQL_TEXT                                CHILD_NUMBER CHILD_AD LANGUAGE_MISMATCH
--------------------------------------- ------------ -------- -----------------
SELECT TO_CHAR (TO_DATE                            0 4A0EFF20 N
(:b1,'DD-MON-YYYY'),'yyyymmdd')
FROM dual
SELECT TO_CHAR (TO_DATE                            1 4A0EFADC Y
(:b1,'DD-MON-YYYY'),'yyyymmdd')
FROM dual
Conclusion:
- To enable sharing of parent cursors, we should follow code formatting standards
- To enable sharing of child cursors, we should ensure that
  . The optimizer mode is not changed in sessions.
  . The lengths of bind variables used are the same.
  . The values of NLS parameters are the same.
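One practical way to avoid the bind-length-mismatch children shown above is to declare bind variables at a consistent, generous size so that repeated executions fall into the same bind-length bucket. A minimal sketch (the table t1 is the one used above; the VARCHAR2 bucket boundaries of 32/128/2000/4000 bytes are the commonly documented ones – verify on your version):

```sql
-- Declare the bind once, large enough for the longest expected value.
-- Both executions then stay within the same length bucket and share child 0.
VARIABLE v1 VARCHAR2(128);
EXECUTE :v1 := 'SHORT';
SELECT c2 FROM t1 WHERE c1 = :v1;

EXECUTE :v1 := 'A-MUCH-LONGER-VALUE';
SELECT c2 FROM t1 WHERE c1 = :v1;

-- VERSION_COUNT should now stay at 1 for this statement.
SELECT sql_id, version_count
FROM   v$sqlarea
WHERE  sql_text LIKE 'SELECT c2 FROM t1 WHERE c1 = %';
```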
————————————————————————
Related links :

Home

Database Index
Tuning Index
Cursor Sharing Demystified 
Library Cache Lock And Pin Demonstrated
Latches, Locks, Pins And Mutexes 
Shared Pool Architecture 

                                 

                                                   ————————-

                                 

LIBRARY CACHE LOCK AND PIN DEMONSTRATED

In this post, I will demonstrate library cache locks and pins
– Start 4 SYS sessions and one HR session
– Flush the shared pool
SYS1>alter system flush shared_pool;

– Check that there are no objects belonging to HR, or whose name is like employees,
currently in the library cache
Only the HR user is there

SYS1>select kglnaown, kglnaobj from x$kglob
 where kglnaown='HR'
 or kglnaobj like '%employees%'
 and kglnaobj not like '%kgl%'
 /

KGLNAOWN                                                         KGLNAOBJ
—————————————————————- ————————-
HR                                                               HR

– create a procedure tst_employees –

HR1>create or replace procedure tst_employees as
 begin
 for i in 1..100 loop
 dbms_lock.sleep(5);
 update employees set salary=salary;
 end loop;
 end;
 /

– Check that following objects belonging to HR or whose name is like employees   are loaded in  library cache
– HR (user)
– TST_EMPLOYEES procedure
– EMPLOYEES table (accessed in procedure)

SYS1>select kglnaown, kglnaobj from x$kglob
 where kglnaown='HR'
 or kglnaobj like '%employees%'
 and kglnaobj not like '%kgl%'
 /

KGLNAOWN                                                         KGLNAOBJ
—————————————————————- ————————-
HR                                                               TST_EMPLOYEES
BEGIN tst_employees; END;
BEGIN tst_employees; END;
HR                                                               HR
HR                                                               EMPLOYEES

– Check for locks on objects belonging to HR or whose name is like employees

   Only HR (the user) is there since the user’s session is open

SYS2>select  l.kglnaobj
 from x$kgllk l, x$kglob o
 where l.kgllkhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

KGLNAOBJ
——————————
HR

– Logout from HR session –

HR>Exit

– Check that lock on HR is also gone

SYS2>select  l.kglnaobj
 from x$kgllk l, x$kglob o
 where l.kgllkhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

– Login as HR user again –

$sqlplus hr/hr

– Check for pins on objects belonging to HR or whose name is like employees
Note that there are no pins

SYS3>select  o.kglnaobj
 from x$kglpn l, x$kglob o
 where l.kglpnhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

– Now, I will demonstrate that a pin is obtained while a procedure is being executed

– Execute the procedure tst_employees

HR>exec tst_employees;

– Check for locks on objects belonging to HR or whose name is like employees

SYS2>select  l.kglnaobj
 from x$kgllk l, x$kglob o
 where l.kgllkhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

KGLNAOBJ
——————————
TST_EMPLOYEES
BEGIN tst_employees; END;
BEGIN tst_employees; END;
HR

– Check for pins on objects belonging to HR or whose name is like employees repeatedly. Note that the procedure tst_employees is pinned as it is being executed

SYS3>select  o.kglnaobj
 from x$kglpn l, x$kglob o
 where l.kglpnhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

KGLNAOBJ
——————————————————————————–
TST_EMPLOYEES

– Flush the shared pool while the procedure is still executing

SYS4>alter system flush shared_pool;

– Check that pinned objects have not been flushed out

SYS1>select kglnaown, kglnaobj from x$kglob
 where kglnaown='HR'
 or kglnaobj like '%employees%'
 and kglnaobj not like '%kgl%'
 /

KGLNAOWN                                                         KGLNAOBJ
—————————————————————- ————————-
HR                                                               TST_EMPLOYEES
BEGIN tst_employees; END;
BEGIN tst_employees; END;
HR                                                               HR
HR                                                               EMPLOYEES

– Wait till the execution of the procedure is over, or abort it –

– Check for pins on objects belonging to HR or whose name is like employees

    Note that there are no pins as soon as the procedure stops executing.

SYS3>select  o.kglnaobj
 from x$kglpn l, x$kglob o
 where l.kglpnhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

no rows selected

– Check for locks on objects belonging to HR or whose name is like employees after the execution of the procedure is over
   Note that the lock (parse lock) is still there

SYS2>select  l.kglnaobj
 from x$kgllk l, x$kglob o
 where l.kgllkhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

KGLNAOBJ
——————————
TST_EMPLOYEES
BEGIN tst_employees; END;
BEGIN tst_employees; END;
HR

– Recompile the procedure –

HR>alter procedure tst_employees compile;

– Note that the lock on TST_EMPLOYEES is no longer there (parse lock broken)

SYS2>select  l.kglnaobj
 from x$kgllk l, x$kglob o
 where l.kgllkhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

KGLNAOBJ
——————————
HR
alter procedure tst_employees
compile

Hence,
– A procedure is pinned while it is executing
– Parse lock remains even after execution is over
– Parse lock is broken when the procedure is recompiled.
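In day-to-day diagnosis you rarely query x$kgllk and x$kglpn directly; sessions blocked on these structures surface as wait events. A hedged sketch using the standard V$SESSION view (run as a privileged user):

```sql
-- Sessions currently waiting on library cache locks or pins.
SELECT sid, serial#, event, seconds_in_wait, sql_id
FROM   v$session
WHERE  event IN ('library cache lock', 'library cache pin');
```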

– Now I will demonstrate that a library cache pin is also obtained when you gather statistics for an object

– Gather statistics for employees table in a loop

 HR>begin
 for i in 1..1000 loop
 execute immediate 'analyze table hr.employees compute statistics';
 end loop;
 end;
 /

- Check that pins are obtained on the employees table while the statistics are being gathered in the HR session
   As soon as the statistics gathering is over, the pins are also gone

SYS3>select  'Lib cache pin' Type, o.kglnaobj
 from x$kglpn l, x$kglob o
 where l.kglpnhdl=o.kglhdadr
 and (o.kglnaown='HR' or o.kglnaobj like '%employees%')
 and o.kglnaobj not like '%kgl%'
 /

TYPE          KGLNAOBJ
————- ——————————
Lib cache pin EMPLOYEES

– cleanup –

HR>drop procedure tst_employees;

Conclusion:

– The lock on a procedure is obtained when it is executed and remains there even after
the execution is over

– The locks obtained on a procedure in library cache (parse locks) are broken when the
procedure is recompiled

– Pins in library cache are obtained when a procedure is executing

– Objects being executed are not flushed out even after flushing the shared pool

– Pins are obtained on a table while its statistics are being gathered.
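Note that ANALYZE ... COMPUTE STATISTICS was used above only because it runs long enough to observe the pins; DBMS_STATS is the supported way to gather optimizer statistics, and it too takes library cache pins on the table while it runs. A sketch:

```sql
-- Supported alternative to ANALYZE for gathering optimizer statistics.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'HR', tabname => 'EMPLOYEES');
END;
/
```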

References:

https://rajat1205sharma.wordpress.com/2015/03/31/library-cache-wait-events-in-rac/

————————————————————————————–

Related links :

Home

Database Index
Tuning Index
Cursor Sharing Demystified 
Latches, Locks, Pins And Mutexes 
Parent And Child Cursors In Oracle
Shared Pool Architecture

                                    ———————

ORACLE CHECKPOINTS

In this post, I will explain about checkpoints – their purpose and different types of checkpoints.
PURPOSE OF CHECKPOINTS
   Database blocks are temporarily stored in the database buffer cache. As blocks are read, they are stored in the DB buffer cache so that if any user accesses them later, they are available in memory and need not be read from disk. When we update a row, the buffer in the DB buffer cache corresponding to the block containing that row is updated in memory, and a record of the change made is kept in the redo log buffer. On commit, the changes we made are written to disk, thereby making them permanent. But where are those changes written? To the datafiles containing the data blocks? No !!! The changes are recorded in the online redo log files by flushing the contents of the redo log buffer to them. This is called write-ahead logging. If the instance crashed right now, the DB buffer cache would be wiped out, but on restarting the database, Oracle will apply the changes recorded in the redo log files to the datafiles.
    Why doesn’t Oracle write the changes to the datafiles right away when we commit the transaction? The reason is simple. If it chose to write directly to the datafiles, it would have to physically locate the data block in the datafile first and then update it, which means that after committing, the user would have to wait until DBWR searches for the block and writes it before issuing the next command. Moreover, writing to datafiles is performed in units of Oracle data blocks. Each block may contain multiple rows, and modifying even one column in one row of a block necessitates writing the whole block. This would bring performance down drastically. That is where the redo logs come in. The writes to the redo logs are sequential writes – LGWR dumps the contents of the redo log buffer to the log files sequentially and synchronously, so the user does not have to wait for long. Moreover, in contrast to DBWR which writes data blocks, LGWR writes only the change vectors. Hence, write-ahead logging also improves performance by reducing the amount of data written synchronously. When will the changes be applied to the data blocks in the datafiles? The data blocks in the datafiles will be updated by DBWR asynchronously in response to certain triggers. These triggers are called checkpoints.
  Checkpoint is a synchronization event at a specific point in time which causes some / all dirty blocks to be written to disk thereby guaranteeing that blocks dirtied prior to that point in time get written.
  Whenever dirty blocks are written to datafiles, it allows oracle
- to reuse a redo log : A redo log can’t be reused until DBWR writes all the dirty blocks protected by that logfile to disk. If we attempt to reuse it before DBWR has finished its checkpoint, we get the following message in alert log : Checkpoint not complete.
- to reduce instance recovery time : As the memory available to a database instance increases, it is possible to have database buffer caches as large as several million buffers. It requires that the database checkpoint advance frequently to limit recovery time, since infrequent checkpoints and large buffer caches can exacerbate crash recovery times significantly.
- to free buffers for reads : Dirtied blocks can’t be used to read new data into them until they are written to disk. Thus DBWR writes dirty blocks from the buffer cache to make room in the cache.
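The “Checkpoint not complete” condition mentioned above can be spotted from V$LOG: a group in ACTIVE status has been switched out of but its checkpoint has not finished, so it cannot yet be reused. A hedged sketch:

```sql
-- ACTIVE   = still needed for instance recovery (checkpoint not complete)
-- INACTIVE = safe to reuse
-- CURRENT  = the log LGWR is writing to now
SELECT group#, status, first_change#
FROM   v$log
ORDER  BY first_change#;
```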

Various types of checkpoints  in Oracle :

– Full checkpoint
– Thread checkpoint
– File checkpoint
– Parallel query checkpoint
– Object checkpoint
– Log switch checkpoint
– Incremental checkpoint
Whenever a checkpoint is triggered :
- DBWR writes some/all dirty blocks to datafiles
- CKPT process updates the control file and datafile headers
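The header updates can be observed directly: forcing a checkpoint advances CHECKPOINT_CHANGE# in both the controlfile and the datafile headers. A sketch:

```sql
-- Controlfile checkpoint SCN vs. datafile header checkpoint SCNs.
SELECT checkpoint_change# FROM v$database;
SELECT file#, checkpoint_change# FROM v$datafile_header;

ALTER SYSTEM CHECKPOINT;

-- Re-run the queries above: the SCNs should have advanced.
SELECT checkpoint_change# FROM v$database;
```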
                      FULL CHECKPOINT
- Writes block images to the database for all dirty buffers from all instances.
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
  . DBWR thread checkpoint buffers written
- Caused by :
  . Alter system checkpoint [global]
  . ALter database begin backup
  . ALter database close
  . Shutdown [immediate]
- Controlfile and datafile headers are updated
  . Checkpoint_change#
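The statistics listed above can be sampled before and after issuing ALTER SYSTEM CHECKPOINT to watch the full checkpoint take effect. A hedged sketch against V$SYSSTAT:

```sql
-- Checkpoint-related statistics; the values grow as checkpoints occur.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('DBWR checkpoints',
                'DBWR checkpoint buffers written',
                'DBWR thread checkpoint buffers written');
```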
                   THREAD CHECKPOINT
 – Writes block images to the database for all dirty buffers from one instance
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
  . DBWR thread checkpoint buffers written
- Caused by :
  . Alter system checkpoint local
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                   FILE CHECKPOINT
   When a tablespace is put into backup mode or taken offline, Oracle writes all the dirty blocks from the tablespace to disk before changing the state of the tablespace.
- Writes block images to the database for all dirty buffers for all files of a tablespace from all instances
- Statistics updated
  . DBWR checkpoints
  . DBWR tablespace checkpoint buffers written
  . DBWR checkpoint buffers written
- Caused by :
  . Alter tablespace xxx offline
  . Alter tablespace xxx begin backup
  . Alter tablespace xxx read only
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                  PARALLEL QUERY CHECKPOINT
   Parallel query often results in direct path reads (Full tablescan or index fast full scan). This means that blocks are read straight into the session’s PGA, bypassing the data cache; but that means if there are dirty buffers in the data cache, the session won’t see the most recent versions of the blocks unless they are copied to disk before the query starts – so parallel queries start with a checkpoint.
- Writes block images to the database for all dirty buffers belonging to objects accessed by the query from all instances.
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
- Caused by :
  . Parallel Query
  . Parallel Query component of Parallel DML (PDML) or Parallel DDL (PDDL)
- Mandatory for consistency
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                 OBJECT CHECKPOINT
   When an object is dropped/truncated, the session initiates an object checkpoint telling DBWR to copy any dirty buffers for that object to disk and the state of those buffers is changed to free.
- Writes block images to the database for all dirty buffers belonging to an object from all instances.
- Statistics updated
  . DBWR checkpoints
  . DBWR object drop buffers written
- Caused by dropping or truncating a segment:
  . Drop table XXX
  . Drop table XXX Purge
  . Truncate table xxx
  . Drop index xxx
- Mandatory for media recovery purposes
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                LOG SWITCH CHECKPOINT
- Writes the contents of the  dirty buffers whose information is protected by a redo log to the database .
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
  . background checkpoints started
  . background checkpoints completed
- Caused by log switch
 – Controlfile and datafile headers are updated
  . Checkpoint_change#
                  INCREMENTAL CHECKPOINT
   Prior to Oracle 8i, the only well-known checkpoint was the log switch checkpoint. Whenever LGWR filled an online logfile, DBWR would go into a frenzy writing data blocks to disk, and when it had finished, Oracle would update each data file header block with the SCN to show that the file was updated up to that point in time.
   Oracle 8i introduced incremental checkpointing which triggered DBWR to write some dirty blocks from time to time so as to advance the checkpoint and reduce the instance recovery time.
Incremental checkpointing has been implemented using two algorithms :
 – Ageing algorithm
- LRU/TCH algorithm
                                               AGEING ALGORITHM
This strategy involves writing changed blocks that have been dirty for the longest time and  is called aging writes. This algorithm relies on the CKPT Q running thru the cache and buffers being linked to the end of this list the first time they are made dirty.
   .The LRU list contains all the buffers – free / pinned / dirty. Whenever a buffer in LRU list is dirtied, it is placed in CKPT Q as well i.e. a buffer can  simultaneously have pointers in both LRU list and CKPT Q but the buffers in CKPT Q are arranged in the order in which they were dirtied.Thus,  checkpoint queue contains dirty blocks in the order of SCN# in which they were dirtied
  Every 3 secs DBWR wakes up and checks whether there are enough dirty buffers in the CKPT Q that need to be written so as to satisfy the instance recovery requirement.
If those many or more dirty buffers are not found,
   DBWR goes to sleep
else (dirty buffers found)
  .CKPT target RBA is calculated based on
   – The most recent RBA
   – log_checkpoint_interval
   – log_checkpoint_timeout
   – fast_start_mttr_target
   – fast_start_io_target
   – 90% of the size of the smallest redo log file
   . DBWR walks the CKPT Q from the low end (buffers dirtied earliest), collecting buffers for writing to disk until it reaches a buffer that is more recent than the target RBA. These buffers are placed in the write list-main.
  . DBWR walks the write list-main and checks all the buffers
    – If changes made to the buffer have already been written to redo log files
        . Move those buffers to write-aux list
      else
        . Trigger LGWR to write changes to those buffers to redo logs
        . Move those buffers to write-aux list
  . Write buffers from write-aux list to disk
  . Update checkpoint RBA in SGA
  . Delink those buffers from CKPT Q
  . Delink those buffers from write-aux list
- Statistics Updated :
   . DBWR checkpoint buffers written
- Controlfile updated every 3 secs by CKPT
   . Checkpoint progress record
   As sessions link buffers to one end of the list, DBWR can effectively unlink buffers from the other end and copy them to disk. To reduce contention between DBWR and foreground sessions, there are two linked lists in each working set so that foreground sessions can link buffers to one while DBWR is unlinking them from the other.
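The progress of the incremental checkpoint against the recovery target can be monitored from the standard V$INSTANCE_RECOVERY view:

```sql
-- How far the incremental checkpoint must advance to meet the MTTR target:
-- ACTUAL_REDO_BLKS vs TARGET_REDO_BLKS, and estimated vs target MTTR.
SELECT target_mttr, estimated_mttr,
       actual_redo_blks, target_redo_blks
FROM   v$instance_recovery;
```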
                                                          LRU/TCH ALGORITHM
 LRU/TCH algorithm writes the cold dirty blocks to disk that are on the point of being pushed out of cache.
   As per ageing algorithm, DBWR will wake up every 3 seconds to flush dirty blocks to disk. But if blocks get dirtied at a fast pace during those 3 seconds and a server process needs some free buffers, some buffers need to be flushed to the disk to make room. That’s when LRU/TCH algorithm is used to write those dirty buffers which are on the cold end of the LRU list.
    Whenever a server process needs some free buffers to read data, it scans the LRU list from its cold end to look for free buffers.
While searching
  If unused buffers found
    Read blocks from disk into the buffers and link them to the corresponding hash bucket
   if it finds some clean buffers (contain data but were never dirtied, or were dirtied and have been flushed to disk),
      if they are candidates to be aged out (low touch count)
          Read blocks from disk into the buffers and link them to the corresponding hash bucket
      else (they have been accessed recently and should not be aged out)
         Move them to the MRU end depending upon their touch count.
   If it finds dirty buffers (they are already in CKPT Q),
     Delink them from LRU list
     Link them  to the write-main list (Now these buffers are in CKPT Q and write-main list)
   The server process scans a threshold number of buffers (_db_block_max_scan_pct = 40 by default). If it does not find the required number of free buffers,
    it triggers DBWR to write the dirty blocks in the write list-main to disk
 . DBWR walks the write list-main and checks all the buffers
    – If changes made to the buffer have already been written to redo log files
           . Move those buffers to write-aux list
       else
          . Trigger LGWR to write changes to those buffers to redo logs
          . Move those buffers to write-aux list
  . Write buffers from write-aux list to disk
  . Delink those buffers from CKPT Q and write-aux list
  . Link those buffers to LRU list as free buffers
 Note that
- In this algorithm, the dirty blocks are delinked from the LRU list before being linked to the write list-main, in contrast to the ageing algorithm where blocks can simultaneously be in both the CKPT Q and the LRU list.
 – In this algorithm, the checkpoint is not advanced because the dirty blocks on the LRU end may not actually be the ones which were dirtied earliest. They may be there because the server process did not move them to the MRU end earlier. There might be blocks present in the CKPT Q which were dirtied earlier than the blocks in question.
I hope the information was useful. Thanks for your time.
 Keep visiting the blog…
—————————————————————————————–
Related links :

11G R2 RMAN