CLUSTERING FACTOR DEMYSTIFIED : PART – III

 

How to resolve the performance issues due to high clustering factor?

In my earlier post, Clustering Factor Demystified : Part – I, I had discussed that to improve the Clustering Factor,  the table must be rebuilt (and reordered). The data retrieval can be considerably speeded up by physically sequencing the rows in the same order as the key column. If we can group together the rows for a key value,  we can get all of the row with a single block read because the rows are together.  To achieve this goal, various methods may be used. In the post Clustering Factor Demystified : Part -II, I had demonstrated Manual Row Re-sequenciung (CTAS with order by) which pre-orders data to avoid expensive disk sorts after retrieval. In this post, I will demonstrate the use of  Single table hash clusters  and  Single table index clusters which  clusters related rows together onto the same data block .

Overview: 

  •  Create a table organized which contains two columns  -  id(number) and txt (char)- Populate the table insert 34 records for each value of id where id ranges from 1 to 100- In this case as records are added sequentially,  records for a key value are stored together
  •  Create another table unorganized which is a replica of the ‘organized’ table but records are inserted in a random manner so that records for a key value may be scattered across different blocks .
  •  Create a single table index cluster  table from ‘unorganized’ table using CTAS.
  •  Create a single table hash cluster table from ‘unorganized’  table using CTAS
  •  Trace  the query using exact match  on three tables and verify that hash cluster table gives the best performance .
  •  Trace the query using range scan  on three tables and verify that index cluster table gives the best performance .
  •  Verify that index and hash cluster tables have better clustering factor .
 Implementation:
- Create a table organized which contains two columns  -  id(number) and txt (char)
- Populate the table insert 34 records for each value of id where id ranges from 1 to 100
- In this case as records are added sequentially,  records for a key value are stored together
SQL> drop table organized purge;
           create table organized (id number(3), txt char(900));
           begin
           for i in 1..100 loop
               insert into organized select i, lpad(‘x’, 900, ‘x’)
               from    dba_objects where rownum < 35;
           end loop;
          end;
          /
-  create another table unorganized which is a replica of the ‘organized’ table but records are inserted in a random manner so that records for a key value may be scattered across different blocks (order by dbms_random.random).
SQL> drop table unorganized purge;
           create table unorganized as select * from organized order by dbms_random.random;

           create index unorganized_idx on unorganized(id);

        exec dbms_stats.gather_table_stats(USER, ‘unorganized’, estimate_percent => 100, method_opt=> ‘for all indexed columns size 254′);

- Create a single table index cluster  table from ‘unorganized’ table using CTAS.

– Create a cluster with size = blocksize = 8k and index it

SQL> drop cluster index_cluster including tables;
           create cluster index_cluster
            ( id number(3) )
          size 8192;


          create index index_cluster_idx
           on cluster index_cluster;
        drop table index_cluster_tab purge;       
       create table index_cluster_tab 
           cluster index_cluster( id )
           as select * from unorganized ;
 - Create a single table hash cluster table from ‘unorganized’  table using CTAS 
 SQL>drop tablespace mssm including contents and datafiles;  
        Create tablespace mssm datafile ‘/u01/app/oracle/oradata/orcl/mssm01.dbf’ size 100m     segment space management manual;   
         drop cluster hash_cluster including tables;
           create cluster Hash_cluster
           ( id number(3) )
             size 8192 single table hash is id hashkeys 100 tablespace mssm; 
        drop table hash_cluster_tab purge;     
        create table hash_cluster_tab          cluster hash_cluster(id)
           as select * from unorganized;
begin
    dbms_stats.gather_table_stats
    ( user, ‘UNORGANIZED’, cascade=>true );    dbms_stats.gather_table_stats
    ( user, ‘INDEX_CLUSTER_TAB’, cascade=>true );    dbms_stats.gather_table_stats
    ( user, ‘HASH_CLUSTER_TAB’, cascade=>true );
End;
/
- Find out no. of blocks across which records of a key value are spread in the two tables.- Note that in ‘unorganized’  table,  records  for an id are scattered across more than 30 blocks whereas in index_cluster_tab and hash_cluster_tab tables, records for each id are clustered i.e. records for each key value are spread across 5 blocks only.
SQL> select unorg.id id, unorg.cnt unorganized_blocks,
           idx_tab.cnt index_cluster_blocks, hash_tab.cnt hash_cluster_blocks
         from
             ( select id,  count(distinct(dbms_rowid.ROWID_BLOCK_NUMBER(rowid))) cnt
             from unorganized     
              group by id) unorg,
           ( select id,  count(distinct(dbms_rowid.ROWID_BLOCK_NUMBER(rowid))) cnt
             from index_cluster_tab     
              group by id) idx_tab,
           ( select id,  count(distinct(dbms_rowid.ROWID_BLOCK_NUMBER(rowid))) cnt
             from hash_cluster_tab     
              group by id) hash_tab 
        where idx_tab.id = unorg.id
          and hash_tab.id = unorg.id
        order by id;
ID              UNORGANIZED_BLOCKS INDEX_CLUSTER_BLOCKS HASH_CLUSTER_BLOCKS
———-             ——————                 ——————–                                     ——————-

1                 33                    5                   5

2                 34                    5                   5

3                 33                    5                   5

4                 33                    5                   5

5                 33                    5                   5

6                 32                    5                   5

7                 34                    5                   5

8                 34                    5                   5

9                 33                    5                   5

10                 32                    5                   5

90                 34                    5                   5

91                 34                    5                   5

92                 30                    5                   5

93                 34                    5                   5

94                 34                    5                   5

95                 34                    5                   5

96                 34                    5                   5

97                 33                    5                   5

98                 33                    5                   5

99                 34                    5                   5

100               34                    5                   5

- Trace  the query using exact match  on three tables and verify that hash cluster table gives the best performance .
– Let’s compare the statistics when rows for all the id’s are retrieved in succession from the three tables
conn / as sysdba
alter session set tracefile_identifier = ‘cluster_factor’;
alter session set sql_trace=true;
declare
    type tab_row is table of unorganized%rowtype;
    tab_rows tab_row;
    type id_val is table of unorganized.id%type;
    id_vals id_val;   
begin
    select distinct id bulk collect into id_vals
      from unorganized;   for k in id_vals.first .. id_vals.last
    loop
        select * bulk collect into tab_rows
          from unorganized
         where id = k;
    end loop;
end;
/
declare
    type tab_row is table of index_cluster_tab%rowtype;
    tab_rows tab_row;
    type id_val is table of index_cluster_tab.id%type;
    id_vals id_val;    begin
    select distinct id bulk collect into id_vals
      from index_cluster_tab;    for k in id_vals.first .. id_vals.last
    loop
        select * bulk collect into tab_rows
          from index_cluster_tab
         where id = k;
    end loop;
end;
/declare
    type tab_row is table of hash_cluster_tab%rowtype;
    tab_rows tab_row;
    type id_val is table of hash_cluster_tab.id%type;
    id_vals id_val;    begin
    select distinct id bulk collect into id_vals
      from hash_cluster_tab;    for k in id_vals.first .. id_vals.last
    loop
        select * bulk collect into tab_rows
          from hash_cluster_tab
         where id = k;
    end loop;
end;
/
– Find out the name of trace file generatedSQL> col trace_file for a100
           select  value trace_file from v$diag_info
           where upper(name) like ‘%TRACE FILE%’;
TRACE_FILE
—————————————————————————————————
/u01/app/oracle/diag/rdbms/orcl1/orcl1/trace/orcl1_ora_24963_cluster_factor.trc– Run tkprof utility on the trace file generated $cd /u01/app/oracle/diag/rdbms/orcl/orcl/trace
   rm cluster_factor.out   tkprof /u01/app/oracle/diag/rdbms/orcl1/orcl1/trace/orcl1_ora_8403_cluster_factor.trc cluster_factor.out

   vi cluster_factor.out

********************************************************************************
Here are the contents of the trace file:

In case of unorganized table, it can be seen that no. of blocks visited (3517) is approaches the number of rows (3400) in the table as rows for an id  are scattered across a large no. of blocks.

SQL ID: 0npa78p7jkfa5

Plan Hash: 1120857569

SELECT *
FROM
UNORGANIZED WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.00       0.00          0          0          0           0
Fetch      100      0.01       0.02          0       3517          0        3400
——- ——  ——– ———- ———- ———- ———-  ———-
total      201      0.01       0.02          0       3517          0        3400

*******************************************************************************
In case of single table index cluster, 
Total  I/O’s  = I/O’s against the table + I/O’s against the table

i/O’s against the table = no. of table blocks across which various records for different id’s are stored

Since we saw earlier that records for each key value are scattered across 5 blocks,

I/O’s against the table =  no. of distinct key values (id’s) * no. of blocks occupied by records for an id
= 100 * 5
= 500

Rest 100 I/O’s are made against against the index ( one I/O for each key value)

Hence total I/O’s = 100 + 500 = 600
********************************************************************************

SQL ID: 6qy378ww4729s
Plan Hash: 3651720007
SELECT *
FROM
INDEX_CLUSTER_TAB WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.00       0.00          0          0          0           0
Fetch      100      0.00       0.00          0        600          0        3400
——- ——  ——– ———- ———- ———- ———-  ———-
total      201      0.00       0.00          0        600          0        3400

*******************************************************************************
In case of single table hash cluster, as index access is not needed,
Total  I/O’s  = I/O’s against the table
=  no. of distinct key values (id’s) * no. of blocks occupied by records for an id
= 100 * 5
= 500
********************************************************************************

SQL ID: ctnu91v20p2x2

Plan Hash: 3860562250

SELECT *
FROM
HASH_CLUSTER_TAB WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.00       0.00          0          0          0           0
Fetch      100      0.00       0.00          0        500          0        3400
——- ——  ——– ———- ———- ———- ———-  ———-
total      201      0.00       0.00          0        500          0        3400

*******************************************************************************

Summarizing the above results :

unorganized          index_cluster_tab      hash_cluster_tab
total  CPU       0.01                             0.00                   0.00

elapsed            0.01                              0.02                   0.00
time

I/O’s               3517                              600                    500

Hence, it can be concluded that for exact match queries hash clusters give the best performance since least no. of I/O’s are made.

- Trace the query using range scan  on three tables and verify that index cluster table gives the best performance .

– Let’s compare the statistics when rows for entire range of id’s are retrieved from the three tables

conn / as sysdba
alter session set
tracefile_identifier = ‘cluster_factor’;
alter session set
sql_trace=true;
declare
    type tab_row is table of
unorganized%rowtype;
    tab_rows tab_row;
    type id_val is table of
unorganized.id%type;
    id_vals id_val;
    begin
    select distinct id bulk collect into id_vals
     from unorganized;
    for j in id_vals.first..id_vals.first loop
     for k in id_vals.last .. id_vals.last loop
       select * bulk collect into tab_rows
       from unorganized
       where id >= j and id <= k;
     end loop;
    end loop;
end;
/
  
declare
    type tab_row is table of index_cluster_tab%rowtype;
    tab_rows tab_row;
    type id_val is table of index_cluster_tab.id%type;
    id_vals id_val;
    begin
    select distinct id bulk collect into id_vals
    from index_cluster_tab;
  
    for j in id_vals.first..id_vals.first loop
     for k in id_vals.last .. id_vals.last loop
        select * bulk collect into tab_rows
          from index_cluster_tab   
         where id >= j and id <= k;
      end loop;
    end loop;
end;
/
declare
    type tab_row is table of hash_cluster_tab%rowtype;
    tab_rows tab_row;
    type id_val is table of hash_cluster_tab.id%type;
    id_vals id_val;
    begin
    select distinct id bulk collect into id_vals
      from hash_cluster_tab;
    for j in id_vals.first..id_vals.first loop
     for k in id_vals.last .. id_vals.last loop
        select * bulk collect into tab_rows
          from hash_cluster_tab
      where id >= j and id <= k;
     end loop;
    end loop;
end;
/
  
– Find out the name of trace file generated
 SQL> col
trace_file for a100
           select  value trace_file from v$diag_info
           where upper(name) like ‘%TRACE
FILE%’;
TRACE_FILE
—————————————————————————————————-
/u01/app/oracle/diag/rdbms/orcl1/orcl1/trace/orcl1_ora_24963_cluster_factor.trc
– Run tkprof utility on the trace file generated
$cd /u01/app/oracle/diag/rdbms/orcl/orcl/trace
   rm cluster_factor.out
   tkprof /u01/app/oracle/diag/rdbms/orcl1/orcl1/trace  /orcl1_ora_8403_cluster_factor.trc  cluster_factor.out
   vi cluster_factor.out
 Here are the contents of the trace file:

In case of unorganized table, it can be seen that Full table scan is done and

total I/O’s = Physical I/O’s + logical I/O’s

= 486 + 489 = 975

CPU usage = 0.01

elapsed time = 0.01
cost             = 127
********************************************************************************
SQL ID:
dpg9s5v7jannv
Plan Hash:
3859503019
SELECT *
FROM
 UNORGANIZED WHERE ID >= :B2 AND ID <=
:B1
  
call     count
cpu    elapsed       disk
query    current        rows
——-
——  ——– ———- ———-
———- ———-  ———-
Parse        1
0.00       0.00          0          0          0           0
Execute      1
0.00       0.00          0          0          0           0
Fetch        1
0.01       0.01        486        489          0        3400
——-
——  ——– ———- ———-
———- ———-  ———-
total        3
0.01       0.01        486        489          0        3400
Rows     Row Source Operation
——-
—————————————————
   3400
FILTER  (cr=489 pr=486 pw=0
time=3524 us)
   3400
TABLE ACCESS FULL UNORGANIZED (cr=489 pr=486 pw=0 time=1888 us cost=127
size=3073600 card=3400)
********************************************************************************
 In case of index cluster table, it can be seen that index range scan is performed on cluster index followed by table access cluster.

total I/O’s = Physical I/O’s + logical I/O’s

= 385 + (501 + 1)

= 887

CPU usage = 0.01

elapsed time = 0.01
cost             =  101
*******************************************************************************
SQL ID:
22k91ut1b18nj
Plan Hash:
533030663
SELECT *
FROM
 INDEX_CLUSTER_TAB WHERE ID >= :B2 AND ID
<= :B1
call     count
cpu    elapsed       disk
query    current        rows
——-
——  ——– ———- ———-
———- ———-  ———-
Parse        1
0.00       0.00          0          0          0           0
Execute      1
0.00       0.00          0          0          0           0
Fetch        1
0.01       0.01        385        501          0        3400
——-
——  ——– ———- ———-
———- ———-  ———-
total        3      0.01
0.01        385        501          0        3400
Misses in library
cache during parse: 1
Optimizer mode:
ALL_ROWS
Parsing user id:
SYS   (recursive depth: 1)
Rows     Row Source Operation
——-  —————————————————
   3400
FILTER  (cr=501 pr=385 pw=0
time=9441 us)
   3400
TABLE ACCESS CLUSTER INDEX_CLUSTER_TAB (cr=501 pr=385 pw=0 time=7679 us
cost=101 size=3073600 card=3400)
    100
INDEX RANGE SCAN INDEX_CLUSTER_IDX (cr=1 pr=0 pw=0 time=99 us cost=1
size=0 card=1)(object id 80458)
********************************************************************************
 In case of hash cluster table, it can be seen that hash access is made to the table.

total I/O’s = Physical I/O’s + logical I/O’s

= 503 + 506

= 1009

CPU usage = 0.04

elapsed time = 0.04
cost = 132
*******************************************************************************
 SQL ID:
c2www0m7npkqp
Plan Hash:
4115468836
SELECT *
FROM
 HASH_CLUSTER_TAB WHERE ID >= :B2 AND ID
<= :B1
call     count
cpu    elapsed       disk
query    current        rows
——-
——  ——– ———- ———-
———- ———-  ———-
Parse        1
0.00       0.00          0          0          0           0
Execute      1
0.00       0.00          0          0          0           0
Fetch        1
0.04       0.04        503        506          0        3400
——-
——  ——– ———- ———-
———- ———-  ———-
total        3
0.04       0.04        503        506          0        3400
Misses in library
cache during parse: 1
Optimizer mode:
ALL_ROWS
Parsing user id:
SYS   (recursive depth: 1)
Rows     Row Source Operation
——-
—————————————————
   3400
FILTER  (cr=506 pr=503 pw=0
time=39906 us)
   3400
TABLE ACCESS FULL HASH_CLUSTER_TAB (cr=506 pr=503 pw=0 time=37389 us
cost=132 size=3073600 card=3400)
********************************************************************************
 Summarizing above results :
                                unorganized          index_cluster_tab      hash_cluster_tab

total I/O’s                      975                     887                        1009

CPU usage                    0.01                    0.01                        0.04

elapsed time                  0.01                     0.01                        0.04
cost                              127                      101                         132
 
Hence, to search a range of values,  single table index cluster is the best choice.
CPU usage, I/O’s  and cost is the maximum in case of single table hash cluster table.
 
- Verify that index and hash cluster tables have better clustering factor .

Let’s compare clustering factor of indexes on the three tables.

Tables unorganized and index_cluster_tab already have index.

– Let’s create index on hash_cluster_tab and gather statistics .

SQL>create index hash_cluster_idx on hash_cluster_tab(id);

          exec dbms_stats.gather_index_stats(USER, ‘HASH_CLUSTER_IDX’);

          exec dbms_stats.gather_index_stats(USER, ‘INDEX_CLUSTER_IDX’);   

– Find out clustering factor of the three tables.

SQL> select index_name, clustering_factor

           from  user_indexes

          where index_name in (‘UNORGANIZED_IDX’, ‘INDEX_CLUSTER_IDX’, ‘HASH_CLUSTER_IDX’);

INDEX_NAME                     CLUSTERING_FACTOR

—————————— —————–

HASH_CLUSTER_IDX                             500

INDEX_CLUSTER_IDX                            100

UNORGANIZED_IDX                             3311

– Note that

- clustering factor of index on unorganized table approaches no. of rows in the table (3400).

- clustering factor of index on hash_cluster_tab table  = 500 . As entries for each id are spread across 5 blocks, 500 blocks need to be accessed to get all the rows and index is aware of this information.

- clustering factor of index on index_cluster_tab table = 100 as there are 100 entries (one for each id) in the index. Here also 500 table blocks need to be accessed to get all the rows but index contains information about only the first(or may be  the last) data block for an id. Rest 4 blocks containing records for that id are chained to it and index does not have that information and clustering factor of an index is computed on the basis of the information available in the index. That’s why clustering factor in this case = no. of index entries.

SUMMARY:
  •  Clustered tables cannot be truncated.
  •  Choosing the Key :Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. The cluster key should be on the column against which queries are most commonly issued.
 
HASH CLUSTERS
A hash cluster stores related rows together in the same data blocks. Rows in a hash cluster are stored together based on their hash value.
  • – Hash clusters are a great way to reduce IO on some tables, but they have their downside.

   *If too little space is reserved for each
key (small SIZE value)
, or if the cluster is created with too few hash keys (small HASHKEYS), then each key will split across multiple blocks negating the benefits of the cluster.When creating a hash cluster, it is important to choose the cluster key correctly and set the HASH IS, SIZE, and HASHKEYS parameters so that performance and space use are optimal.

   * If too much space is reserved for each key (large SIZE value), or if the cluster is created with too many hash keys (large
HASHKEYS), then the cluster will contain thousands of empty blocks that slow down full table scans .
 A SIZE value much larger results in wasted space.

  •  Hash clusters reduce contention and I/O since index is not accessed .When you use an index range scan + table access by index rowid, the root index block becomes  a “hot block” causing  contention for the cache buffers chains (cbc) latch and hence an increase in CPU usage.
  • Hashing works best for exact match searches i.e. SELECT … WHERE cluster_key = …;  

              A  properly sized hash cluster for a lookup table gives pretty much a SINGLE IO for a keyed lookup.

  •  Hash clusters should only really be used for tables which are static in size so that you can determine the number of rows and amount of space required for the tables in the cluster. If tables in a hash cluster require more space than the initial allocation for the cluster, performance degradation can be substantial because overflow blocks are required.
  • Hash clusters should only really be used for tables which have mostly read-only data. The hash  cluster will take marginally longer
    to insert into since the data now has a “place” to go and maintaining this structure will take longer then maintaining a HEAP table .Updates do not provide much overhead unless the hashkey is being updated.
  •  Hash clusters should not be used in  applications where most queries on the table retrieve rows over a range of cluster key values where a hash function cannot be used to determine the location of specific hash keys and  instead, the equivalent of a full table scan must be done to fetch the rows for the query:
  •  Hash clusters should not be used in applications where hash key is updated. The hashing values can not be recalculated and thus serious overflow can result.
  •  Hash clusters should not be used for tables which are not static and continually growing. If a table grows without limit, the space required over the life of the table (its cluster) cannot be predetermined.
  •   Hash clusters should not be used for when you cannot afford to pre-allocate the space that the hash cluster will eventually
    need.
  •  Hash clusters allocate all the storage for all the hash buckets when the cluster is created, so they may waste space.
  •  Full scans on single table hash clusters will cost as much as they would in a heap table.
INDEX CLUSTERS
In an indexed cluster, Oracle stores together rows having the same cluster key value. Each distinct cluster key value is stored only once in each data block, regardless of the number of tables and rows in which it occurs. This saves disk space and improves performance for many operations.
  •   Index clusters should be used for the apllications where most queries on the table retrieve rows over a range of cluster key values. For example, in full table scans or queries such as the following:

            SELECT . . . WHERE cluster_key < . . . ;

  • With an index, key values are ordered in the index, so cluster key values that satisfy the WHERE clause of a query can be found with relatively few I/Os.
  •   Index clusters should be used for the tables which are not static, but instead are continually growing and the space required over the life of the table (its cluster) cannot be predetermined.
  •  Index clusters should be used for applications which frequently perform full-table scans on the table and the table is sparsely populated. A full-table scan in this situation takes longer under hashing.
  •  Cluster index has one entry per cluster key and not for each row. Therefore, the index is smaller and less costly to access for finding multiple rows.

 References:

————————————————————–

Related links:

Home

Tuning Index

Buffer Cache Wait Events

Clustering Factor Demystified Part-I 

Clustering Factor Demystified Part-II

Direct Read Enhancements in 11g

Single Table Index Clusters

SingleTable Hash Clusters 

SQL Profile Demystified : Part – I

SQL Profile Demystified Part – II

Undocumented Parameters in Oracle 11g

 

———————–

3 thoughts on “CLUSTERING FACTOR DEMYSTIFIED : PART – III

  1. Hi Anju Garg,

    Please explain about “Cache Buffer Chain”

    - How database buffer cache divide to CBCs ?
    - How many buffers in one CBC ?

    Thanks
    K.P

Your comments and suggestions are welcome!