Clusters can be employed to physically co-locate records for specified key/keys on disk. Based on access mechanism, there can be two types of clusters :- Index or B-tree* clusters : Data is accessed using index.- Hash clusters : Data is accessed using hashing algorithm.We can store single/multiple tables in a cluster.
1
1
1
1
|
2
2
2
|
3
3
3
3
|
4
4
4
4
|
1
|
1
|
2
|
3
|
4
|
2
|
4
|
3
|
1
|
3
|
1
|
2
|
3
|
4
|
4
|
1
1
1
1
|
2
2
2
|
3
3
3
3
|
4
4
4
4
|
1
|
1
|
1
|
1
|
|
2
|
2
|
2
|
|
|
3
|
3
|
3
|
3
|
|
4
|
4
|
4
|
4
|
|
Here, we have created an index cluster. The cluster key column for this cluster is id. The column in table in this cluster does have to be called ID, but it must be NUMBER(2), to match this definition. Also we have specified a SIZE 1024 option which means that we expect about 1,024 bytes of data to be associated with each cluster key value. Oracle will use it to compute the maximum number of cluster keys that could fit per block.
We also need to index the cluster before we can put data in it. The cluster index takes a cluster key value and returns the block address of the block containing that key. Eeach cluster key value points to a single block in the cluster itself. So, when we ask for the data for id = 1, Oracle will read the cluster key, determine the block address for that, and then read the data.
– Let’s create the cluster key index:
SQL>create index mycluster_idx on cluster my_cluster;
The index can have all of the storage parameters of an index and can also be stored in another tablespace. It is just like a regular index, and may also be on multiple columns; Unlike regular indexes, it can include an entry for a completely null value. Note that we can not specify a list of columns in this CREATE INDEX statement as it is derived from the CLUSTER definition itself
– Let’s create a table in the cluster:
create table t1 (id number(2), data char(500))
cluster my_cluster;
Note that the only difference from a normal table is that we have used the CLUSTER keyword and specified the column of the base table that will map to the cluster key. As the cluster is the segment here, this table can never have segment attributes such as TABLESPACE, PCTFREE, and so on. These attributes belong to the cluster segment, and the table inherits them from the cluster.
– Let’s populate our table:
Given that we have an 8KB block size, Oracle will fit up to seven cluster keys (but maybe less/more if the data is smaller/larger than expected) per database block. This is, the data for ids 1, 2, 3, 4, 5, 6 and 7 would tend to go onto one block, and as soon as we insert id 8, a new block will be used. That does not mean that the data is stored in a sorted manner; it just means that if we inserted the ids in that order, they would naturally tend to be placed together. If we inserted the ids in the order 1, 8, 3, 3, 6, 5 and then 7, the last id, 7 would be located on the newly added block. As we will see , both the size of the data and the order in which the data is inserted will affect the number of keys which can be stored per block.
insert into t1 values (i, ‘x’);
end loop;
commit;
end;
/
– Let’s check the blocks the records have gone to :
- Note that record for id = 8 has gone to a block different from the earlier one’s as we had specified the expected size of each row = 1024 = 1k
Considering some space needed for the block header, oracle expects 7 rows and hence 7 key values to fit in one block. Hence, row corresponding to the 8th key value is placed in the next block (93322).
ID T1_BLOCK
———- ———-
1 93321
2 93321
3 93321
4 93321
5 93321
6 93321
7 93321
8 93322
93321 93322
1, ‘x’ 2,’x’ 3, ‘x’ 4, ‘x’ 5, ‘x’ 6, ‘x’ 7, ‘x’ |
8,’x‘ |
select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block
from t1
where id=1;
ID T1_BLOCK
———- ———-
1 93321
1 93321
93321 93322
1, ‘x’
2,’x’
3, ‘x’
4, ‘x’
5, ‘x’
6, ‘x’
7, ‘x’
1, ‘x’
|
8,’x’
|
– Let’s insert another record for id = 8
- Note than this record goes to block 93322 even though there is space available in block 93321 because rows with same key value will be placed in the same block.
SQL> insert into t1 values (8, ‘x’);
select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block from t1 where id=8;
ID T1_BLOCK
———- ———-
8 93322
8 93322
93321 93322
1, ‘x’
2,’x’
3, ‘x’
4, ‘x’
5, ‘x’
6, ‘x’
7, ‘x’
1, ‘x’
|
8,’x’
8,’x’
|
– There are already 8 rows in block 93321. Let’s insert 8 more rows with id = 1 to 7.
– Note that 5 more rows are inserted in the first block. After the first block is fully occupied rest of the 2 rows are placed in the second block i.e. 93322. Now second block has rows with id’s = 6,7 and 8.
SQL>Begin
for i in 1..8 loop
end loop;
commit;
end;
/
select id, dbms_rowid.rowid_block_number(t1.rowid) t1_block from t1;
ID T1_BLOCK
———- ———-
1 93321
2 93321
3 93321
4 93321
5 93321
6 93321
7 93321
1 93321
1 93321
2 93321
3 93321
4 93321
5 93321
8 93322
8 93322
6 93322
7 93322
8 93322
93321 93322
1, ‘x’
2,’x’
3, ‘x’
4, ‘x’
5, ‘x’
6, ‘x’
7, ‘x’
1, ‘x’
1,’x’
2,’x’
3,’x’
4,’x’
5,’x’
|
8,’x’
8,’x’
6,’x’
7,’x’
8,’x’
|
insert into t1 values (i, ‘x’);
end loop;
commit;
end;
/
group by id, dbms_rowid.rowid_block_number(t1.rowid)
order by dbms_rowid.rowid_block_number(t1.rowid), id;
ID T1_BLOCK CNT
———- ———- ———-
1 93322 1
2 93322 1
3 93322 1
4 93322 1
6 93322 2
7 93322 2
8 93322 4
5 93323 1
93321 93322 93323
1, ‘x’ 2,’x’ 3, ‘x’ 4, ‘x’ 5, ‘x’ 6, ‘x’ 7, ‘x’ 1, ‘x’ 1,’x’ 2,’x’ 3,’x’ 4,’x’ 5,’x’
|
8,’x’ 8,’x’ 6,’x’ 7,’x’ 8,’x’ 1, ‘x’ 2,’x’ 3, ‘x’ 4, ‘x 6, ‘x’ 7, ‘x’
8,’x’
|
5,’x’ |
There are two ways of doing it :
– Insert data presorted on the cluster key to the table
– Specify size = block size in the create cluster statement
Let’s demonstrate both the methods:
– Let’s insert presorted data
SQL>Begin
for i in 1..8 loop
for j in 1..13 loop
insert into t1 values (i, ‘x’);
end loop;
end loop;
commit;
end;
/
select distinct id, dbms_rowid.rowid_block_number(t1.rowid) t1_block, count(*) cnt
from t1
group by id, dbms_rowid.rowid_block_number(t1.rowid)
ID T1_BLOCK CNT
——– ———- ———-
1 94561 13
2 94562 13
3 94563 13
4 94564 13
5 94565 13
6 94566 13
7 94567 13
8 94576 13
94561 94562 94563 94564 94565 94566 94567 94576
1, ‘x’
1,’x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
1,’x’
1, ‘x’
1, ‘x’
1, ‘x’
1, ‘x’
|
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
2,’x’
|
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
3,’x’
|
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
4,’x’
|
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
5, ‘x’
|
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’
6,’x’6,’x’
6,’x’
6,’x’
6,’x’
|
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
7, ‘x’
|
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
8,’x’
|
It can be seen that if data presorted by the cluster key is added to the table, all the rows for a key value will continue to occupy the same block unless a row with a different key value arrives and the block is not yet full. . Hence to fetch all rows for a key, the no. block visits will be smaller if presorted data is insertde into the table.
- Let’s create an index on table t1 and check its clustering factor
SQL> create index t1_idx on t1(id);
select index_name, clustering_factor from user_indexes where index_name=’T1_IDX';
INDEX_NAME CLUSTERING_FACTOR
—————————— —————–
T1_IDX 8
Specify size = block size in the create cluster statement
Suppose we want a block to contain rows with one id only, all we need to do is to specify size equal to the blocksize in the create cluster statement. This effectively means that we are telling oracle we want it to reserve a complete block for each distinct key value . Hence, when a record for a different key value is inserted, it goes to the next block.
– Create a cluster and index it
insert into t1 values (i, ‘x’);
end loop;
commit;
end;
/
ID T1_BLOCK
1 94561
2 94562
3 94563
4 94564
5 94565
6 94566
7 94567
8 94576
94561 94562 94563 94564 94565 94566 94567 94576
1, ‘x’
|
2,’x’
|
3,’x’
|
4,’x’
|
5, ‘x’
|
6,’x’
|
7, ‘x’
|
8,’x’
|
that row of an id goes to the block already containing the row with same id.
for i in 1..8 loop
insert into t1 values (i, ‘x’);
end loop;
commit;
end;
/ select distinct id, dbms_rowid.rowid_block_number(t1.rowid) t1_block, count(*) cnt
from t1
group by id, dbms_rowid.rowid_block_number(t1.rowid)
order by dbms_rowid.rowid_block_number(t1.rowid), id;
1, ‘x’
1, ‘x’
|
2,’x’
2, ‘x’
|
3,’x’
3, ‘x’
|
4,’x’
4, ‘x’
|
5, ‘x’
5, ‘x’
|
6,’x’
6, ‘x’
|
7, ‘x’
7, ‘x’
|
8,’x’
8, ‘x’
|
– Let’s create an index on table t1 and check its clustering factor
SQL> create index t1_idx on t1(id);
select index_name, clustering_factor from user_indexes where index_name=’T1_IDX';
INDEX_NAME CLUSTERING_FACTOR
—————————— —————–
T1_IDX 8
Hence, it can be seen that SIZE parameter is the most important parameter for a cluster. It controls the maximum number of cluster keys per block. It influences the space utilization of the cluster. If we set the size too high, we’ll have very few keys per block and will use more space then we need. If we set the size too low, and we’ll have data for a key value spread across multiple blocks which defeats the very purpose of the cluster to store all of the data together on a single block.
Summary:
- Size clause of create cluster statement is used to compute the maximum number of cluster keys that could fit per block.
- In the create cluster statement, if size < block size, records with different key values may be placed in the same block.
- A block will contain the records for one key value only provided
. data presorted by the cluster key is added to the table OR
. Size = blocksize is specified in the create cluster statement
- When any of the above options are used, the index on the cluster key column in the table has a low (desirable) clustering factor approaching the no. of blocks in the table.
– Cluster index has one entry per cluster key and not for each row. Therefore, the index is smaller and less costly to access for finding multiple rows.
Note:
- An index cluster has negative performance impacts on DML performance, particularly insert statements. Since data has to be put in the right place, it takes longer to insert data.
- Neither the cluster nor the tables in the cluster can be partitioned.
- Tables in cluster cannot be truncated. You must delete rows from the tables in the cluster.
- Index clusters are also called B*Tree clusters.
In my post Clustering Factor Demystified : Part – III, I have demonstrated the use of single table index and hash clusters to improve the clustering factor of an unorganized table.
———————————————————————————————————–
Related links :
Database Index
Tuning Index
SingleTable hash Clusters
Clustering Factor Demystified Part – I
Clustering Factor Demystified Part – II
Clustering FactorDemystified Part – III
Cluster concept well broken down and demonstrated.