SINGLE TABLE INDEX CLUSTERS DEMYSTIFIED

 

Clusters can be employed to physically co-locate records for specified key/keys on disk. Based on access mechanism, there can be two types of clusters :- Index or B-tree* clusters : Data is accessed using index.- Hash clusters : Data is accessed using hashing algorithm.We can store single/multiple tables in a cluster.   

                  

In this post, I will explain and demonstrate the use of single table index cluster tables .
 
What is a single table index cluster table?
 
In a regular heap table,, the rows for a key value are scattered across multiple blocks and hence multiple blocks have to be visited to get all the rows for a key value. Here is the pictorial representation of the data in a heap organized table with an index. Note that one data block can contain rows with different key values.
 
    Heap Table with Index         
  
Index:
 
    1
    1
    1
    1
    2
    2
    2
  3
  3
  3
  3
   4
   4
   4
   4
     
Data Blocks :
1
1
2
3
4
  
2
4
3
1
3
           
1
2
3
4
4
               
 
Let’s compare it with the way data is stored in single table index cluster table.  Note that all the rows with a key value are placed together in one block  so that they can be retrieved with minimum no. of block visits and  clustering factor of the index approaches the no. of blocks in the table which is desirable.
 
 single table index cluster table
 
  Index
 
    1
    1
    1
    1
    2
    2
    2
  3
  3
  3
  3
   4
   4
   4
   4
     
Data Blocks :
1
1
1
1
 
  
2
2
2
 
 
           
3
3
3
3
 
               
4
4
4
4
 
 
 
How to create a single table index cluster table?
 
Here are steps  to create a single table index cluster table :
 
- – Create a cluster 
SQL> drop cluster my_cluster including tables;
 create cluster my_cluster            (id number(2) )          size 1024;

Here, we have created an index cluster. The cluster key column for this cluster is id. The column in table in this cluster does have to be called ID, but it must be NUMBER(2), to match this definition. Also we have specified a SIZE 1024 option which means that we expect about 1,024 bytes of data to be associated with each cluster key value. Oracle will use it to compute the maximum number of cluster keys that could fit per block.

We also need to index the cluster before we can put data in it. The cluster index takes a cluster key value and returns the block address of the block containing that key. Eeach cluster key value points to a single block in the cluster itself. So, when we ask for the data for id = 1, Oracle will read the cluster key, determine the block address for that, and then read the data.
– Let’s create the cluster key index:

SQL>create index mycluster_idx on cluster my_cluster;

The index can have all of the storage parameters of an index and can also be stored in another tablespace. It is just like a regular index, and may also be on multiple columns; Unlike regular indexes,  it can include an entry for a completely null value. Note that we can not specify a list of columns in this CREATE INDEX statement as it is derived from the CLUSTER definition itself

– Let’s create a table in the cluster:

SQL> drop table t1 purge;
          create table t1 (id number(2), data char(500))
          cluster my_cluster;

Note that the only difference from a normal table is that we have used the CLUSTER keyword and specified the column of the base table that will map to the cluster key. As the cluster is the segment here, this table can never have segment attributes such as TABLESPACE, PCTFREE, and so on. These attributes belong to the cluster segment, and the table inherits them from the cluster. 


– Let’s populate our table:

Given that we have an 8KB block size, Oracle will fit up to seven cluster keys (but maybe less/more if the data is smaller/larger than expected) per database block. This is, the data for ids 1, 2, 3, 4, 5, 6 and 7 would tend to go onto one block, and as soon as we insert id 8, a new block will be used. That does not mean that the data is stored in a sorted manner; it just means that if we inserted the ids in that order, they would naturally tend to be placed together. If we inserted the ids in the order 1, 8, 3, 3, 6, 5 and then 7, the last id, 7 would be located on the newly added block. As we will see ,  both the size of the data and the order in which the data is inserted will affect the number of keys which can be stored per block.

– Let’s insert 8 records with id’s ranging from 1 to 8 
SQL>Begin
            for i in 1..8 loop
              insert into t1 values (i, ‘x’);
           end loop;
           commit;
          end;
          /

– Let’s check the blocks the records have gone to :

- Note that record for id = 8 has gone to a block different from the earlier one’s as we had specified the expected size of each row = 1024 = 1k

– block size = 8k
Considering some space needed for the block header,  oracle expects 7 rows and hence 7 key values to fit in one block.  Hence, row corresponding to the 8th key value is placed in the next block (93322). 

 

SQL>select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block 
           from t1 ;

        ID   T1_BLOCK

———- ———-

         1      93321

         2      93321

         3      93321

         4      93321

         5      93321

         6      93321

         7      93321

         8      93322  


 93321             93322

1, ‘x’

2,’x’

3, ‘x’

4, ‘x’

5, ‘x’

6, ‘x’

7, ‘x’

8,’x

  
– Let’s insert another record with id = 1
-
- Note that that record again goes the same block as earlier (93321). The reason is that there is space available in block 93321 since actual row size = 501 (number(2) + char(500)) is less than expected size (1024).  

 

SQL>  
insert into t1 values (1, ‘x’);
           commit;  
            select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block
             from t1
             where id=1;

            ID   T1_BLOCK

       ———- ———-

            1     93321

            1      93321

   93321     93322

1, ‘x’

 

2,’x’

 

3, ‘x’

 

4, ‘x’

 

5, ‘x’

 

6, ‘x’

 

7, ‘x’

 

1, ‘x’
8,’x’

 

– Let’s insert another record for id = 8

- Note than this record goes to block 93322 even though there is space available in block 93321 because rows with same key value will be placed in the same block. 

SQL> insert into t1 values (8, ‘x’);

           commit;

           select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block    from t1    where id=8;

        ID   T1_BLOCK

———- ———-

         8      93322

         8      93322

   93321     93322

1, ‘x’

 

2,’x’

 

3, ‘x’

 

4, ‘x’

 

5, ‘x’

 

6, ‘x’

 

7, ‘x’

 

1, ‘x’
8,’x’

 

8,’x’

 

– As per the actual size of the row, no. of rows that can fit in one block are slightly less than 16.35 ( (8*1024)/501) i.e. 15 or may be even lesser taking into account the space for block header.

– There are already 8 rows in block 93321. Let’s insert 8 more rows with id = 1 to 7.

– Note that 5 more  rows are inserted in the first block. After the  first block is fully occupied rest of the 2 rows are placed in the second block i.e. 93322. Now second block has rows with id’s = 6,7 and 8. 

SQL>Begin

            for i in 1..8 loop

              insert into t1 values (i, ‘x’);
           end loop;
           commit;
          end;
          /

       select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block              from t1;

      ID   T1_BLOCK

———- ———-

         1      93321

         2      93321

         3      93321

         4      93321

         5      93321

         6      93321

         7      93321

         1      93321

         1      93321

         2      93321

         3      93321

         4      93321

         5      93321

         8      93322

         8      93322

         6      93322

         7      93322

         8      93322 

    93321     93322

1, ‘x’

 

2,’x’

 

3, ‘x’

 

4, ‘x’

 

5, ‘x’

 

6, ‘x’

 

7, ‘x’

 

1, ‘x’

 

1,’x’

 

2,’x’

 

3,’x’

 

4,’x’

 

5,’x’
8,’x’

 

8,’x’

 

6,’x’

 

7,’x’

 

8,’x’
– Let’s again insert records with id’s 1 to 8.
– Note that rows with 3 id’s i.e. 6,7 and 8 are already in block 93322. Hence rows for 4 more id’s (1,2,3and 4) can go this block. 5th id (5) again goes to the next block. 

 

S
QL>
Begin
            for i in 1..8 loop
              insert into t1 values (i, ‘x’);
           end loop;
           commit;
          end;
          /
       select distinct id, dbms_rowid.rowid_block_number(t1.rowid) t1_block, count(*) cnt
             from t1
            group by id, dbms_rowid.rowid_block_number(t1.rowid)
            order by dbms_rowid.rowid_block_number(t1.rowid), id;   

        ID   T1_BLOCK        CNT

———- ———- ———-

         1      93322          1

         2      93322          1

         3      93322          1

         4      93322          1

         6      93322          2

         7      93322          2

         8      93322          4

         5      93323          1

 93321    93322   93323

1, ‘x’

2,’x’

3, ‘x’

4, ‘x’

5, ‘x’

6, ‘x’

7, ‘x’

1, ‘x’

1,’x’

2,’x’

3,’x’

4,’x’

5,’x’

 

8,’x’

8,’x’

6,’x’

7,’x’

8,’x’

1, ‘x’

2,’x’

3, ‘x’

4, ‘x

6, ‘x’

7, ‘x’

 

8,’x’

 

 5,’x’

-
Hence we could control the no. of distinct key values for which records are placed in a block. But notice that records for a key value are scattered across variouw blocks. How to cluster together the rows for a key value?
There are two ways of doing it :
– Insert data presorted on the cluster key to the table
– Specify size = block size in the create cluster statement

 Let’s demonstrate both the methods:
Insert data presorted on the cluster key to the table
- Let’s recreate the table in the cluster:
SQL> drop table t1 purge;         
 create table t1 ( id number(2), data char(500) )           cluster my_cluster( id );

– Let’s insert presorted data

SQL>Begin

          for i in 1..8 loop

             for j in 1..13 loop

               insert into t1 values (i, ‘x’);

             end loop;

           end loop;

           commit;

         end;

          /
         select distinct id, dbms_rowid.rowid_block_number(t1.rowid) t1_block, count(*) cnt

         from t1

         group by id, dbms_rowid.rowid_block_number(t1.rowid)

      ID   T1_BLOCK        CNT

——– ———- ———-     

       1      94561         13       

       2      94562         13

       3      94563         13

       4      94564         13

       5      94565         13

       6      94566         13

       7      94567         13

       8      94576         13 

 

    94561      94562    94563       94564        94565    94566      94567       94576                                                    

1, ‘x’
1,’x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
1,’x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’6,’x’
6,’x’
6,’x’
6,’x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’

It can be seen that if data presorted by the cluster key is added to the table, all the rows for a key value will continue to occupy the same block unless a row with a different key value arrives and the block is not yet full. . Hence to fetch all rows for a key, the no. block visits will be smaller if presorted data is insertde into the table.

- Let’s create an index on table t1 and check its clustering factor

 
– Note that the index a low (desirable) clustering factor = 8 which is same as the no. of blocks in the table

SQL> create index t1_idx on t1(id);

          select index_name, clustering_factor from user_indexes where index_name=’T1_IDX';

INDEX_NAME                     CLUSTERING_FACTOR
—————————— —————–
T1_IDX                                         8

Specify size = block size in the create cluster statement

Suppose we want a block to contain rows with one id only, all we need to do is to specify size equal to the blocksize in the create cluster statement. This effectively means that we are telling oracle we want it to reserve a complete block for  each distinct key value . Hence,  when a record for a different key value is inserted, it goes to the next block. 

– Let’s  recreate our cluster with size = blocksize = 8k;

– Create a cluster and index it 

SQL>drop cluster my_cluster including tables;       
   create cluster my_cluster            ( id number(2) )          size 8192;
    create index my_cluster_idx         on cluster my_cluster;
– Create and populate a table in the cluster
SQL>drop table t1 purge;       
           create table t1 ( id number(2), data char(500) )           cluster my_cluster( id );         
     Begin
            for i in 1..8 loop
                 insert into t1 values (i, ‘x’);
             end loop;
             commit;
            end;
              /  
 
– Let’s check the blocks the records have gone to :
– Note that record for each distinct id has gone to a different block .
 SQL>select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block            from t1 ;

          ID   T1_BLOCK

———- ———-
 1
     94561
2      94562
3      94563
4      94564
5      94565
6      94566
7      94567
8      94576

 94561       94562      94563      94564      94565  94566     94567       94576                      
1, ‘x’
2,’x’
3,’x’
4,’x’
5, ‘x’
6,’x’
7, ‘x’
8,’x’
– Now let’s again insert records for id’s 1 to 8 and verify
that row of an id goes to the block already containing  the row with same id. 
 
SQL>Begin
          for i in 1..8 loop
               insert into t1 values (i, ‘x’);
           end loop;
           commit;
        end;
          /         select distinct id, dbms_rowid.rowid_block_number(t1.rowid) t1_block, count(*) cnt
            from t1
            group by id, dbms_rowid.rowid_block_number(t1.rowid)
            order by dbms_rowid.rowid_block_number(t1.rowid), id;
        ID   T1_BLOCK        CNT
———- ———- ———-
         1      94561          2
         2      94562          2
         3      94563          2
         4      94564          2
         5      94565          2
         6      94566          2
         7      94567          2
         8      94576          2
   94561  94562   94563  94564   94565  94566     94567   94576                      
1, ‘x’
1, ‘x’
2,’x’
2, ‘x’
3,’x’
3, ‘x’
4,’x’
4, ‘x’
5, ‘x’
5, ‘x’
6,’x’
6, ‘x’
7, ‘x’
7, ‘x’
8,’x’
8, ‘x’ 
 

– Let’s create an index on table t1 and check its clustering factor

– Note that the index a low (desirable) clustering factor = 8 which is same as the no. of blocks in the table

SQL> create index t1_idx on t1(id);

           select index_name, clustering_factor from user_indexes where index_name=’T1_IDX';

INDEX_NAME                     CLUSTERING_FACTOR

—————————— —————–
T1_IDX                                         8

Hence, it can be seen that  SIZE parameter is the most important parameter for a cluster. It controls the maximum number of cluster keys per block. It influences  the space utilization of the cluster. If we set the size too high, we’ll have very few keys per block and will use more space then we need. If we set the size too low, and we’ll have data for a key value spread across multiple blocks  which defeats the very purpose of the cluster to store all of the data together on a single block. 

Summary:

 Size clause of create cluster statement is used to compute the maximum number of cluster keys that could fit per block.

- In the create cluster statement, if size < block size, records with different key values may be placed in the same block. 

- A block will contain  the records for one key value only  provided 

    . data presorted by the cluster key is added to the table OR

    . Size  = blocksize is specified in the create cluster statement 

- When any of the above options are used, the index on the cluster key column in the table has a low (desirable) clustering factor approaching the no. of blocks in the table.

– Cluster index has one entry per cluster key and not for each row. Therefore, the index is smaller and less costly to access for finding multiple rows.

Note

- An index cluster has negative performance impacts on DML performance, particularly insert statements. Since data has to be put in the right place, it takes longer to insert data.

- Neither the cluster nor the tables in  the cluster can be partitioned.

- Tables in cluster cannot be truncated. You must delete rows from the tables in the cluster.

- Index clusters are also called B*Tree clusters.

In my post Clustering Factor Demystified : Part – III, I have demonstrated the use of single table index and hash clusters to improve the clustering factor of an unorganized table.

I hope you found this post useful.  Thanx for your time !!References:http://docs.oracle.com/cd/E11882_01/server.112/e25789/tablecls.htm

———————————————————————————————————–

Related links :

Home

Database Index 
Tuning Index
SingleTable hash Clusters
Clustering Factor Demystified Part – I
Clustering Factor Demystified Part – II
Clustering FactorDemystified Part – III

——————————————————————————————————————–

 

                                             ———————-

One thought on “SINGLE TABLE INDEX CLUSTERS DEMYSTIFIED

Your comments and suggestions are welcome!