SINGLE TABLE HASH CLUSTERS DEMYSTIFIED


In this post, I will discuss single table hash clusters which we can fetch a row requiring only a single block get most of the time. Hash clusters are quite similar to index clusters, except that the cluster index is missing. The data in the cluster itself acts like an index.
First, let us see how is the data stored in a hash cluster.
Let’s say we have a hash cluster table with two columns id (number) and text (char) where  id is the key column. On  inserting a record with id = 1 into the table- The key column value (1 here) is hashed and  converted into a diskaddress.- The record is placed at the disk address returned above.Now, if  another record is inserted with id = 2, , the key column value 2 is hashed to return a different address. Thus records having different key values will be placed at different disk addresses whereas  all the records with same key value will go to the same address on the disk.
Now, let’s see how the data is retrieved from  a hash cluster as compared to a conventional indexed cluster  table.In a conventional indexed heap table , if there is a unique index on the column being queried, the algorithm followed is :- Perform unique scan of the index to locate the key and get the rowid (i.e. at least 2 I/O’s – one for index root block, one for index leaf block and may be more as blevel of the index increases)

– Using the rowid obtained above, access the table by rowid (i.e. one I/O to get the data)

whereas, the procedure followed in  a hash cluster is :

– Hash the primary key to get the physical location of the record

– Perform single I/O to read the block.

Thus, hash clusters can  reduce I/O on  tables.

How to create a single table hash cluster?

create cluster <cluster_name> (cluster_key <datatype>)

size <size_number> single table hashkeys <hash_keys_number> hash is
<expr>;

Let’s understand various clauses in create cluster command:

CLUSTER_KEY <datatype>

The cluster key should have a single column containing only integers.

Hash clusters having composite cluster keys or cluster keys made up of non integer columns use the internal hash function.

If a non-integer cluster key value is supplied and internal hash function is bypassed , the operation (INSERT or UPDATE statement) is rolled back and an error is returned.

SIZE <size_number>

Specifies the amount of space in bytes reserved in a block to store all rows having the same cluster key value or the same hash value. This space decides the maximum number of cluster or hash values stored in a data block.

SINGLE TABLE

SINGLE TABLE indicates that this  hash cluster can contain only one
table.. However, you may drop the table and create another  table in the
same cluster.

HASHKEYS <hash_keys_number>

Specify the HASHKEYS clause to create a hash cluster and specify the number of hash values for the hash cluster.The HASHKEYS value specifies and limits the number of unique hash values that can be generated by the hash function used by the cluster. (how many distinct values you anticipate for the cluster key over time)Oracle Database rounds up the HASHKEYS value to the nearest prime number to obtain the actual number of hash values. The minimum value for this parameter
is 2. If you omit the HASHKEYS parameter, the database creates an indexed cluster by default.

When you create a hash cluster, the database immediately allocates space for the cluster based on the values of the SIZE and HASHKEYS parameters.

It allocates a hash table to hold HASHKEY number of cluster keys of SIZE bytes each.

HASH IS <expr>

Specify an expression to be used as the hash function for the hash cluster. Must evaluate to a positive value

If you omit the HASH IS clause, then Oracle Database uses an internal hash function for the hash cluster.

—- Overview –

– create two single table hash clusters with
size = 8k , hashkeys = 4

cluster HSH_CLUSTER_ROWSIZ_8K and HSH_CLUSTER_ROWSIZ_4K

— Create  table hash_cluster_tab_ROWSIZE_8k in cluster HSH_CLUSTER_ROWSIZ_8K  with row size such that only one record fits one block

— Create  table hash_cluster_tab_ROWSIZE_4k in cluster HSH_CLUSTER_ROWSIZ_4K  with row size such that two records fit one block

— insert records for 5 distinct key values to both the tables

— Each record goes to a different block as hashkeys has been set to 5 (next prime no.) and each block can have records with only one hash key.

— Try to access record with a key value – single I/O

— Insert records for 6th key values although provision has made for 5 keys only and check that hash collision takes place as id = 6 is mapped to one of the already existing hash values (id=1)

— Add another record for id = 1 and check that multiple blocks containing rows with same hash value are chained together.

– Create 3 clusters with size = 2K, hashkeys = 4  

HSH_CLUSTER_SIZ_2K_ROW_1K,  HSH_CLUSTER_SIZ_2K_ROW_2K and HSH_CLUSTER_SIZ_2K_ROW_4K

— Create a table HSH_TAB_SIZ_2K_ROWSIZ_1K in cluster HSH_CLUSTER_SIZ_2K_ROW_1K with row size such that row size =1/2( specified size) = 1/2(2K) = 1K

— Create a table HSH_TAB_SIZ_2K_ROWSIZ_2K in cluster HSH_CLUSTER_SIZ_2K_ROW_2Kwith row size such that row size =  specified size 2K

— Create a table HSH_TAB_SIZ_2K_ROWSIZ_4K in cluster HSH_CLUSTER_SIZ_2K_ROW_4K with row size such that row size = 2( specified size) = 2(2K) = 4K

— Insert records for 5 distinct key values in the three tables

— Check that each block contains records for a maximum of 3 hashkeys.
— Insert 3 records with id = 1 in all the 3 tables

— Add records for id’s = 6 to 9 i.e. four keys more than what we have defined the cluster for (5).

— Note that it takes time to add these records

— Check that there is collision for hash keys – Multiple key values correspond to the same  hash key – the hash chain for a hash key becomes longer and contains records with different key values.

— Check that overallocation takes place i.e. a block holds rows for hashkeys more than it is expected to hold (3).
– Implementation –

– create two single table hash clusters with
size = 8k , hashkeys = 4

cluster HSH_CLUSTER_ROWSIZ_8K and HSH_CLUSTER_ROWSIZ_4K

– The cluster key column for the clusters is id. The column in table in this cluster does not have to be called ID, but it must be NUMBER(2), to match this definition.

– Also we have specified a SIZE 8K option which means that we expect about 8K bytes of data to be associated with each cluster key value. Oracle will use it to compute the maximum number of cluster keys that could fit per block. Here, maximum no. of hash keys per block = Block size/SIZE = 8K/8K = 1 i.e. each block can contain records having only one hash key.

– The value of HASHKEYS limits the number of unique hash values that can be generated by the hash function used for the cluster. Oracle rounds the number you specify for HASHKEYS to the nearest prime number (5 here as we have set  HASHKEYS to 4 and 5 is the next prime number) i.e.  for any cluster key value, the hash function generates a maximum of 5 values.

SQL>Create tablespace mssm datafile
‘/u01/app/oracle/oradata/orcl/mssm01.dbf’ size 100m

segment space management manual;

    alter system set deferred_segment_creation=FALSE;

    drop  cluster  HSH_CLUSTER_ROWSIZ_8K including tables;

  create cluster HSH_CLUSTER_ROWSIZ_8K     ( id number(2) )
size 8192 single table hash is id hashkeys 4 tablespace mssm;

      drop cluster HSH_CLUSTER_ROWSIZ_4K including tables;

create cluster HSH_CLUSTER_ROWSIZ_4K ( id number(2) )
size 8192 single table hash is id hashkeys 4 tablespace mssm;

Let’s create  tables in  these clusters ..

– Create  table
hash_cluster_tab_ROWSIZE_8k in cluster HSH_CLUSTER_ROWSIZ_8K  with row
size such that only one record fits one block

drop table hash_cluster_tab_rowsize_8k purge;

CREATE TABLE HASH_CLUSTER_TAB_ROWSIZE_8K
(id          number(2) ,
txt1        char(2000),
txt2        char(2000),
txt3        char(2000)
 ) CLUSTER HSH_CLUSTER_ROWSIZ_8K( id );

– Create  table
hash_cluster_tab_ROWSIZE_4k in cluster HSH_CLUSTER_ROWSIZ_4K  with row
size such that two records fit one block

drop table hash_cluster_tab_rowsize_4k purge; 

CREATE TABLE hash_cluster_tab_rowsize_4k
(id          number(2) ,
txt1        char(2000),
txt2        char(1000)CLUSTER HSH_CLUSTER_ROWSIZ_4K( id );

– check the blocks assigned to the clusters –

SQL> col segment_name for a30
select segment_name, extent_id, block_id, blocks  
 from dba_extents where segment_name like ‘%HASH_CLUSTER%';

SEGMENT_NAME    EXTENT_ID   BLOCK_ID     BLOCKS

—————————— ———- ———- ———-

HSH_CLUSTER_ROWSIZ_8K        0       128          8

HSH_CLUSTER_ROWSIZ_4K       0       136          8

– Insert 5 records each with a  distinct
key value to both the tables

SQL>Begin
    for i in 1..5 loop
     insert into hash_cluster_tab_rowsize_8k values (i,  ‘x’, ‘x’, ‘x’);
     insert into hash_cluster_tab_rowsize_4k values (i,  ‘x’, ‘x’);
       end loop;
     commit;
    end;
/

Let’s find out the blocks the records have
gone to ..

– Each record goes to a different block as hashkeys has been set to 5 (next prime no.)
First block assigned to the cluster(128, 136) is used to store header info.

Records occupy blocks second block onwards

Note that record for id=5 goes to the first block as mod (id, hashkeys) decides the block no. for a key and mod (5,5) = 0

0 denotes the first block to accommodate records for the table

Note that records for id= 1  to 4  go to the subsequent  blocks as mod (1,5) = 1, mod(2,5) = 2 …

In hash_cluster_tab_rowsize_4k, even though two rows can fit one block, still each row has gone to a different block   as one block has been assigned to a hash value. and

SQL>select ‘HASH_CLUSTER_TABLE_8K’
TAB_NAME , id,    dbms_rowid.rowid_block_number(t1.rowid) hash_cluster_block

from hash_cluster_tab_rowsize_8k t1 
    union
select ‘HASH_CLUSTER_TABLE_4K’, id,
dbms_rowid.rowid_block_number(t2.rowid) hash_cluster_block
from hash_cluster_tab_rowsize_4k t2
     order by 1,2,3;

TAB_NAME
ID HASH_CLUSTER_BLOCK

——————— ———- ——————

HASH_CLUSTER_TABLE_4K
1
138

HASH_CLUSTER_TABLE_4K
2
139

HASH_CLUSTER_TABLE_4K
3
140

HASH_CLUSTER_TABLE_4K
4
141

HASH_CLUSTER_TABLE_4K
5
137

HASH_CLUSTER_TABLE_8K
1
130

HASH_CLUSTER_TABLE_8K
2
131

HASH_CLUSTER_TABLE_8K
3
132

HASH_CLUSTER_TABLE_8K
4
133

HASH_CLUSTER_TABLE_8K
5
129

– Try to access record with a key value –
single I/O is needed (cr=1) as hash access is used

SQL>conn / as sysdba


alter session set tracefile_identifier=’hash_cluster';


alter session set sql_trace=true;

       


select * from hash_cluster_tab_rowsize_8k where id = 1;


select * from hash_cluster_tab_rowsize_4k where id = 1;


alter session set sql_trace=false;


col value for a75


select value from v$diag_info where upper(name) like ‘%TRACE FILE%';

VALUE

—————————————————————————

/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14452_hash_cluster.trc

ho
rm hash_cluster.out


ho tkprof
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14452_hash_cluster.trc
hash_cluster.out


ho vi hash_cluster.out

********************************************************************************

select * from

hash_cluster_tab_rowsize_8k where id = 1

call     count
cpu    elapsed
disk      query
current        rows

——- ——  ——– ———- ———- ———-
———-  ———-

Parse
1      0.00
0.00
0          2
0           0

Execute      1
0.00
0.00
0
0
0           0

Fetch
2      0.00
0.00
0         
0           1

——- ——  ——– ———- ———- ———-
———-  ———-

total
4      0.00
0.01
0          3
0           1

Rows     Row Source Operation

——-  —————————————————

1  TABLE ACCESS HASH
HASH_CLUSTER_TAB_ROWSIZE_8K (cr=1 pr=0
pw=0 time=0 us)

********************************************************************************

select * from

hash_cluster_tab_rowsize_4k where id = 1

call     count
cpu    elapsed
disk      query
current        rows

——- ——  ——– ———- ———- ———-
———-  ———-

Parse
1      0.00
0.00
0          2
0           0

Execute      1      0.00
0.00
0
0
0           0

Fetch
2      0.00
0.00
0          1  
0           1

——- ——  ——– ———- ———- ———-
———-  ———-

total
4      0.00       0.00
0
3
0           1

Rows     Row Source Operation

——-  —————————————————

1  TABLE ACCESS HASH
HASH_CLUSTER_TAB_ROWSIZE_4K (cr=1 pr=0
pw=0 time=0 us)

********************************************************************************

– Insert records for 6th key values although
provision has made for 5 keys only

SQL> insert into
hash_cluster_tab_rowsize_8k values (6,  ‘x’, ‘x’, ‘x’);


insert into hash_cluster_tab_rowsize_4k values (6,
‘x’, ‘x’);


commit;

– Let’s find out the blocks the records have
gone to .

–    In hash_cluster_tab_rowsize_4k, id = 6 is hashed
to one of the existing hash values and record is placed in the same block (138)
as the earlier containing record with same hash key as two records can fit one
block. Hence block 138 becomes overflow block as it contains more hashvalues
than it is configured for.

Here id = 6 has been mappped to same hash
value as id = 1 and has occupied the same block as id = 1

–    In hash_cluster_tab_rowsize_8k, since one row can
fit one block, Newly added rows have  gone to a new  block (147) as
the earlier blocks can’t accommodate new rows.

But it has again been mapped to one of the
already existing hash values and is chained to the blocks containing those key
values.

SQL>select ‘HASH_CLUSTER_TABLE_8K’
TAB_NAME , id, dbms_rowid.rowid_block_number(t1.rowid) hash_cluster_block


from hash_cluster_tab_rowsize_8k t1

    union

select ‘HASH_CLUSTER_TABLE_4K’, id,
dbms_rowid.rowid_block_number(t2.rowid) hash_cluster_block


from hash_cluster_tab_rowsize_4k t2

     order by 1,2,3;

TAB_NAME
ID HASH_CLUSTER_BLOCK

——————— ———- ——————

HASH_CLUSTER_TABLE_4K        
1
138

HASH_CLUSTER_TABLE_4K
2
139

HASH_CLUSTER_TABLE_4K
3
140

HASH_CLUSTER_TABLE_4K
4
141

HASH_CLUSTER_TABLE_4K
5
137

HASH_CLUSTER_TABLE_4K        
6
138

HASH_CLUSTER_TABLE_8K
1
130

HASH_CLUSTER_TABLE_8K
2
131

HASH_CLUSTER_TABLE_8K
3
132

HASH_CLUSTER_TABLE_8K
4
133

HASH_CLUSTER_TABLE_8K
5
129

HASH_CLUSTER_TABLE_8K        
6
147

– Let’s add another record for id = 1

In hash_cluster_tab_rowsize_2k,  although we have space for another row
in blocks containing id = 2,3,4,5 , the new record goes to a new block(142) as
one block can contain only one hashvalue.  From now onwards, a block which
contains id = 1 may also contain an entry for id=6 and vice versa since both of
them hash to the same value.

In hash_cluster_tab_rowsize_8k also , the new  record goes to a new
block (135) as a block can contain only one row. Presuming that id = 6 and 1
hash to the same value, blocks containing id = 1 (130 and 135) have been
chained to block containing id=6 (135).

From now onwards, when we search for id =1 or id = 6 we will have to scan a
larger no. of blocks.Hence, if actual no. of cluster keys exceeds the specified
value for HASHKEYS,  the likelihood of a collision (two cluster
key values having the same hash value) increases and performance degrades.

SQL>

insert into
hash_cluster_tab_rowsize_8k values (1,  ‘x’, ‘x’, ‘x’);

     insert into
hash_cluster_tab_rowsize_4k values (1,  ‘x’, ‘x’);

     commit;

   

  select ‘HASH_CLUSTER_TABLE_8K’
TAB_NAME , id, dbms_rowid.rowid_block_number(t1.rowid) hash_cluster_block


from hash_cluster_tab_rowsize_8k t1

    union all

select ‘HASH_CLUSTER_TABLE_4K’, id,
dbms_rowid.rowid_block_number(t2.rowid) hash_cluster_block


from hash_cluster_tab_rowsize_4k t2

     order by 1,2,3;

TAB_NAME
ID HASH_CLUSTER_BLOCK

——————— ———- ——————

HASH_CLUSTER_TABLE_4K          1
138

HASH_CLUSTER_TABLE_4K          1
142

HASH_CLUSTER_TABLE_4K
2
139

HASH_CLUSTER_TABLE_4K
3
140

HASH_CLUSTER_TABLE_4K
4
141

HASH_CLUSTER_TABLE_4K
5
137

HASH_CLUSTER_TABLE_4K          6
138

HASH_CLUSTER_TABLE_8K          1
130

HASH_CLUSTER_TABLE_8K          1
135

HASH_CLUSTER_TABLE_8K
2
131

HASH_CLUSTER_TABLE_8K
3
132

HASH_CLUSTER_TABLE_8K
4
133

HASH_CLUSTER_TABLE_8K
5
129

HASH_CLUSTER_TABLE_8K        
6
134

Therefore, the distribution of rows in a hash cluster is directly
controlled by the value set for the HASHKEYS parameter. With a larger number of
hash keys for a given number of rows, the likelihood of
a collision (two cluster key values having the same hash value)
decreases. Minimizing the number of collisions is important because overflow
blocks (thus extra I/O) might be necessary to store rows with hash values that
collide.

Now let’s play around with SIZE clause...

The maximum number of hash keys assigned per data block is determined by the
SIZE parameter of the CREATE CLUSTER command. SIZE is an estimate of the total
amount of space in bytes required to store the average number of rows associated
with each hash value. For example, if the available free space per data block
is 1700 bytes and SIZE is set to 500 bytes, three hash keys (round(1700/500))
are assigned per data block.

Note: The importance of the SIZE parameter of hash clusters is analogous
to that of the SIZE parameter for index clusters. However, with index clusters,
SIZE applies to rows with the same cluster key value instead of the same hash
value.

Although the maximum number of hash key values per data block is determined
by SIZE, Oracle does not actually reserve space for each hash key value in the
block. For example, if SIZE determines that three hash key values are allowed
per block, this does not prevent rows for one hash key value from taking up all
of the available space in the block. If there are more rows for a given hash
key value than can fit in a single block, the block is chained, as necessary.

– Create 3 clusters with size = 2K, hashkeys
= 4  

HSH_CLUSTER_SIZ_2K_ROW_1K,  HSH_CLUSTER_SIZ_2K_ROW_2Kand
HSH_CLUSTER_SIZ_2K_ROW_4K

— Available free space is slightly less that blocksize of 8K

— No. of hashkeys that can be stored in the block < 4 (blocksize/size =
8k/2K)

= 3

SQL>  drop cluster HSH_CLUSTER_SIZ_2K_ROW_1K
including tables;


create cluster HSH_CLUSTER_SIZ_2K_ROW_1K


( id number(2) )


size 2K single table hash is id hashkeys 4 tablespace mssm;

      drop cluster
HSH_CLUSTER_SIZ_2K_ROW_2Kincluding tables;


create cluster hash_cluster_size_2k_row_2k


( id number(2) )


size 2K single table hash is id hashkeys 4 tablespace mssm;

     drop cluster
HSH_CLUSTER_SIZ_2K_ROW_4K including tables;


create cluster HSH_CLUSTER_SIZ_2K_ROW_4K


( id number(2) )


size 2K single table hash is id hashkeys 4 tablespace mssm;

– Create a table HSH_TAB_SIZ_2K_ROWSIZ_1K in
cluster HSH_CLUSTER_SIZ_2K_ROW_1K with row size such that row size =1/2(
specified size) = 1/2(2K) = 1K 

SQL>drop table HSH_TAB_SIZ_2K_ROWSIZ_1K
purge;

CREATE TABLE HSH_TAB_SIZ_2K_ROWSIZ_1K

(
id          number(2) ,


txt1        char(400),


txt2        char(400)

)

CLUSTER HSH_CLUSTER_SIZ_2K_ROW_1K( id );

– Create a table HSH_TAB_SIZ_2K_ROWSIZ_2K in
cluster HSH_CLUSTER_SIZ_2K_ROW_2Kwith row size such that row size =
specified size 2K  

SQL>drop table HSH_TAB_SIZ_2K_ROWSIZ_2K
purge;

CREATE TABLE HSH_TAB_SIZ_2K_ROWSIZ_2K

(
id          number(2) ,


txt1        char(900),


txt2        char(900)

)

CLUSTER HASH_CLUSTER_SIZE_2K_row_2k( id );
– Create a table HSH_TAB_SIZ_2K_ROWSIZ_4K in
cluster HSH_CLUSTER_SIZ_2K_ROW_4K with row size such that row size = 2(
specified size) = 2(2K) = 4K 

SQL>drop table HSH_TAB_SIZ_2K_ROWSIZ_4K
purge;

CREATE TABLE HSH_TAB_SIZ_2K_ROWSIZ_4K

(
id          number(2) ,


txt1        char(1900),


txt2        char(1900)

)

CLUSTER HSH_CLUSTER_SIZ_2K_ROW_4K( id );

– Insert records for 5 distinct key values in
the three tables

SQL>Begin

    for i in 1..5 loop

      insert into
HSH_TAB_SIZ_2K_ROWSIZ_1K values (i,  ‘x’, ‘x’);

      insert into
HSH_TAB_SIZ_2K_ROWSIZ_2K values (i,  ‘x’, ‘x’);

      insert into
HSH_TAB_SIZ_2K_ROWSIZ_4K values (i,  ‘x’, ‘x’);

       end
loop;

     commit;

    end;

/

– Check that each block contains records for
a maximum of 3 hashkeys.   

For row size = 1K

8 rows can fit one block

Max no. of hash keys per block = 3

Max no. of records of distinct key values per block = 3

Records for id = 1,2 and 5 (3 hash keys) go to block 161

Records for id = 3 and 4 (only records left) go to block
162

 – For row size = 2K

3 rows can fit one block

Max no. of hash keys per block = 3

Max no. of records of distinct key values per block = 3

Records for id = 1,2 and 5 (3 hash keys) go to block 169

Records for id = 3 and 4 (only records left) go to block
170

 – For row size = 4K

1 row can fit one block

Max no. of hash keys per block = 3

Max no. of records possible per block = 1

Records for each id  go to different block

SQL>select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t1.rowid) hash_cluster_block


from HSH_TAB_SIZ_2K_ROWSIZ_1K t1

    union

    select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t2.rowid) hash_cluster_block


from HSH_TAB_SIZ_2K_ROWSIZ_2K t2

    union

    select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t3.rowid) hash_cluster_block


from HSH_TAB_SIZ_2K_ROWSIZ_4K t3  

     order by 1,2,3;

TABLE_NAME
ID HASH_CLUSTER_BLOCK

————————————- ———- ——————

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K          1
161

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K         
2
161

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
3
162

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K         
4
162

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K        
5
161

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
1
169

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K        
2
169

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K       
3
170

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
4
170

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K         
5
169

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
1
177

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
2
179

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
3
178

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
4
180

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
5
181

 — Let’s insert 3 records with id = 1 in
all the 3 tables 

- For row size = 1K

No. of rows already there = 3 in block 161 containing id = 1

Since 8 rows can fit one block, all the 3 rows inserted go to
block 161

Now block 161 has 6 rows

i.e. although the maximum number of hash key values per data
block as determined by SIZE is 3, Oracle does not actually reserve space for
each hash key value in the block. Rows for one hash key value(id=1) can take up
all of the available space in the block.

 – For row size = 2K

3 rows can fit one block

Block 169 containing id=1 already has 3 rows

Although 1 more row can fit in block 170 , newly added
rows for id=1 go to anothet block 172

i.e. when key value is inserted for the first time, its
blockmate keys are decided or let’s say that it is     decided
records for which hash values will stay together. From then onwards, records
for its blockmate keys will always reside on the same block. Hence, once hash
values occupying a block have been grouped (1,2,5 and 3,4) further records will
occupy only the blocks along with their blockmates.

Block 172 is chained to block 169 as both contain id=1.

 – For row size = 4K

1 row can fit one block

Each newly added record goes to new block

All the blocks containing id=1 (177, 184, 185, 186) will
be chained

SQL>select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t1.rowid) hash_cluster_block, count(*)


from HSH_TAB_SIZ_2K_ROWSIZ_1K t1

    group by id,
dbms_rowid.rowid_block_number(t1.rowid)

    union

    select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t2.rowid) hash_cluster_block, count(*)


from HSH_TAB_SIZ_2K_ROWSIZ_2K t2

    group by id,
dbms_rowid.rowid_block_number(t2.rowid)

    union

    select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t3.rowid) hash_cluster_block, count(*)


from HSH_TAB_SIZ_2K_ROWSIZ_4K t3 

    group by id,
dbms_rowid.rowid_block_number(t3.rowid)

     order by 1,3,2;

TABLE_NAME
ID HASH_CLUSTER_BLOCK   COUNT(*)

————————————- ———- ——————
———-

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
1
161          4

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
2
161          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K         
5
161          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
3
162          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
4
162          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
1
169          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
2
169          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
5
169          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
3
170          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
4
170          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K         
1
172          3

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
1
177          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
3
178          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
2
179          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
4
180          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
5
181          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K         
1
184          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K         
1
185          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K         
1
186          1

– Add records for id’s = 6 to 9 i.e. four
keys more than what we have defined the cluster for (5).

— Note that it takes time to add these records

SQL>Begin

    for i in 6..9 loop

      insert into
HSH_TAB_SIZ_2K_ROWSIZ_1K values (i,  ‘x’, ‘x’);

      insert into
HSH_TAB_SIZ_2K_ROWSIZ_2K values (i,  ‘x’, ‘x’);

      insert into
HSH_TAB_SIZ_2K_ROWSIZ_4K values (i,  ‘x’, ‘x’);

       end
loop;

     commit;

    end;

/

– Check that there is collision for hash keys
- Multiple key values correspond to the same  hash key – the
hash chain for a hash key becomes longer and contains records with different
key values.

– Check that overallocation takes place i.e.
a block holds rows for hashkeys more than it is expected to hold (3).

 – For row size = 1K

In block 161, no. of rows already there = 6  containing id
= 1

Since 8 rows can fit one block, rows for id = 6 and 7 go to block
161 i.e. now block 161 contains records for id = 1,2,5,6,7

i.e. it contains 2 hashkeys more than it has configured to hold
. This is called overallocation and block 161 is called overflow block.

From now onwards whenever a record is added with any of id =
1,2,5,6,7, it will be chained to block 161   .

All the blocks chained to block 161  will be searched whenever any of
these id’s is searched.

 – For row size = 2K

3 rows can fit one block

Block 169 containing id=1 already has 3 rows with id =
1,2,5

Block 172 has 3 rows with id = 1 – It is already full –
it is chained to block 169

Block 170 has 2 rows with id = 3,4

Record for id=8 goes to block 170 i.e. from now onwards all
the blocks containing of id=3,4,8 will be chained to block 170.

All the blocks chained to block 170  will be
searched whenever any of these id’s (3,4,8) is searched.

Rows with id’s 7 and 9  go to block 173 . This block
will be chained with block 172 or 170

 – For row size = 4K

1 row can fit one block

Each newly added record goes to new block

Blocks with id’s > 5 will be chained with any of the
earlier blocks with matching hash value.

e.g. mod(1,5) = mod (6,5) = 1

Hence block containing id = 6 i.e. block 183  will
chain with the all the blocks containing id = 1 (177, 184, 185, 186)

i.e. to search for id = 6, 5 blocks will be searched.

– With a larger number of hash keys for a given number of rows, the
likelihood of a collision (two cluster key values having the same
hash value) decreases.

SQL>select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t1.rowid) hash_cluster_block, count(*)


from HSH_TAB_SIZ_2K_ROWSIZ_1K t1

    group by id,
dbms_rowid.rowid_block_number(t1.rowid)

    union

    select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t2.rowid) hash_cluster_block, count(*)


from HSH_TAB_SIZ_2K_ROWSIZ_2K t2

    group by id,
dbms_rowid.rowid_block_number(t2.rowid)

    union

    select
‘HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K’ TABLE_NAME, id,
dbms_rowid.rowid_block_number(t3.rowid) hash_cluster_block, count(*)


from HSH_TAB_SIZ_2K_ROWSIZ_4K t3 

    group by id,
dbms_rowid.rowid_block_number(t3.rowid)

     order by 1,3,2;

TABLE_NAME
ID HASH_CLUSTER_BLOCK   COUNT(*)

————————————- ———- ——————
———-

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
1
161          4

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
2
161          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
5
161          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K         
6
161          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
7
161          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
3
162          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
4
162          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K
8
162          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_1K         
9
162          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
1
169          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
2
169          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
5
169          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
3
170          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
4
170          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K         
8
170          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
6
171          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
1
172          3

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
7
173          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_2K
9
173          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
1
177          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
3
178          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
2
179          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
4
180          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
5
181          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
9
182          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
6
183          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
1
184          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
1
185          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
1
186          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K
7
187          1

HASH_CLUSTER_TABLE_SIZE_2K_ROWSIZE_4K         
8
188          1

Specifying a User-Defined Hash Function

You can also specify any SQL expression as the hash function for a hash
cluster. If your cluster key values are not evenly distributed among the
cluster, you should consider creating your own hash function that more
efficiently distributes cluster rows among the hash values.

For example, if you have a hash cluster containing employee information and
the cluster key is the employee’s home area code, it is likely that many
employees will hash to the same hash value. To alleviate this problem, you can
place the following expression in the HASH IS clause of the CREATE CLUSTER
command:

MOD((emp.home_area_code + emp.home_prefix + emp.home_suffix), 101)

The expression takes the area code column and adds the phone prefix and
suffix columns, divides by the number of hash values (in this case 101), and
then uses the remainder as the hash value. The result is cluster rows more
evenly distributed among the various hash values.

CREATE CLUSTER address

(postal_code NUMBER, country_id CHAR(2))

HASHKEYS 20

HASH IS MOD(postal_code + country_id, 101);

Summary:

— Hash Cluster tables are appropriate for data that is read frequently via
an equality comparison on the key. If an index scan is used for a key
value, as more no. of users search for the same record, they hit the same index
block which becomes the “hot block” leading to more contention for
the cache buffers chains (cbc) latch. Replacing indexed tables with hash cluster
tables in this case can resolve the problem of contention for CBC latches.

— you cannot range scan a table in a hash cluster without adding a
conventional index to the table.  In an index cluster, the query for range of values will  be able to make use of the cluster key index to find these rows. In a hash cluster, this query would result in a full table scan unless you had an index on the key column. Only exact equality searches (including in lists and subqueries) may be made on the hash key without using an index that supports range scans.

— When you create a hash cluster  table, you must determine in advance,  the number of hash keys your table will ever have. The number of HASHKEYs in a hash cluster is a fixed size. You cannot change the size of the hash table without a rebuild of the cluster. This does not in any way limit the amount of data you can store in this cluster; it simply limits the number of unique hash keys that can be generated for this cluster. That may affect performance due to unintended hash collisions if the value was set too low. Getting the size of the HASHKEYs and SIZE parameters right is crucial to avoid a rebuild.

— With a hash cluster, the tables will start out big and will take longer
to create, as Oracle must initialize each block, an action that normally takes
place as data is added to the table. They have the potential to have data in
their first block and their last block, with nothing in between. Full scanning
a virtually empty hash cluster will take as long as full scanning a full hash cluster.

• The hash cluster is allocated right from the beginning. Oracle will take
your HASHKEYS/ trunc(blocksize/SIZE) and allocate and format that space right away. As soon as the first table is put in that cluster, any full scan will hit every allocated block. This is different from every other table in this respect.

• Updates to hash cluster tables do not introduce significant overhead,
unless you update the HASHKEY, which would not be a good idea, as it would
cause the row to migrate

. Hash clusters allocate all the storage for all the hash buckets when the cluster is created, so they may waste space.. Full scans on single table hash clusters will cost as much as they would in a heap table.

In my post Clustering Factor Demystified : Part – III, I have demonstrated the use of single table index and hash clusters to improve the clustering factor of an unorganized table.

——————————–

References :

http://www.relationaldbdesign.com/extended-database-features/module3/oracle-hash-cluster.php

http://docs.oracle.com/cd/B19306_01/server.102/b14231/hash.htm

http://simpleoracledba.blogspot.in/2009/05/oracle-single-table-hash-clusters-are.html

http://jonathanlewis.wordpress.com/2009/10/05/hash-clusters-2/

http://www.iselfschooling.com/mc4articles/mc4cluster.htm

http://dbaforums.org/oracle/index.php?showtopic=18947

http://www.orafaq.com/tuningguide/bad%20hash%20cluster.html

http://www.riddle.ru/mirrors/oracledocs/server/scn73/ch507.html

http://www.dba-oracle.com/t_hash_tables_clusters.htm

http://docs.oracle.com/cd/E11882_01/server.112/e25789/tablecls.htm

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707440500346653259

http://stackoverflow.com/questions/2714099/oracle-hash-cluster-overflow-blocks

http://www.orafaq.com/forum/t/141142/2/

https://forums.oracle.com/forums/thread.jspa?threadID=2512488&start=0&tstart=0

———————————————————————————————————–

Related links :

Home

Database Index

Tuning Index

Single Table Index Clusters

Clustering Factor Demystified : Part – I

Clustering Factor Demystified : Part – II

Clustering Factor Demystified : Part – III

 

 

——————-

 

One thought on “SINGLE TABLE HASH CLUSTERS DEMYSTIFIED

Your comments and suggestions are welcome!