Create Histograms On Columns That Already Have One

The default value of METHOD_OPT from  10g onwards is ‘FOR ALL COLUMNS SIZE AUTO’.

The definition of AUTO as per Oracle documentation is  :
AUTO: Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.

This basically implies that Oracle will automatically  create histograms on those  columns which have skewed data distribution and there are  SQL statements  referencing those columns.

However, this gives rise to the problem is that Oracle generates too many  unnecessary histograms .

Let’s demonstrate:

– Create a table with skewed data distribution in two columns

SQL>drop table hr.skewed purge;

create table hr.skewed
( empno number,
job_id varchar2(10),
salary number);

insert into hr.skewed select employee_id, job_id, salary
from hr.employees;

– On gathering statistics for the table using default options, it can be seen that histogram is not gathered on any column although data
distribution in columns JOB_ID and SALARY is skewed

SQL>exec dbms_stats.gather_table_stats('HR','SKEWED');

col table_name for a10
col column_name for a10
select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
SKEWED JOB_ID NONE
SKEWED EMPNO NONE

– Let’s now issue some queries querying the table based on  the  three columns in the table followed by statistics gathering to verify that histograms get automatically created only on columns with skewed data distribution.

– No histogram gets created if column EMPNO is queried which
has data distributed uniformly

SQL>select * from hr.skewed where empno = 100;
exec dbms_stats.gather_table_stats('HR','SKEWED');

col table_name for a10
col column_name for a10

select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
SKEWED JOB_ID NONE
SKEWED EMPNO NONE

– A histogram gets created on JOB_ID column as soon as we search  for records with a JOB_ID as data distribution is non-uniform in JOB_ID column

SQL>select * from hr.skewed where job_id = 'CLERK';
exec dbms_stats.gather_table_stats('HR','SKEWED');

col table_name for a10
col column_name for a10

select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
SKEWED JOB_ID FREQUENCY
SKEWED EMPNO NONE

– A histogram gets created on SALARY column when search is made for  employees drawing salary more than 10000 as data distribution is non-uniform in SALARY column.

SQL>select * from hr.skewed where salary < 10000;
exec dbms_stats.gather_table_stats('HR','SKEWED');

col table_name for a10
col column_name for a10
select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY FREQUENCY
SKEWED JOB_ID FREQUENCY
SKEWED EMPNO NONE

Thus gathering statistics using default options, manually or as part of the automatic maintenance task,  might lead to creation of histograms  on all such columns  which have  skewed data distribution and  had been  part of the search clause even once. That is, Oracle  makes even the histograms you didn’t ask for.  Some of the histograms might not be needed by the application and hence are undesirable as computing histograms is a resource intensive operation and moreover they might  degrade the performance as a result of their interaction with bind peeking.

Solution
Employ FOR ALL COLUMNS SIZE REPEAT option of METHOD_OPT parameter  which prevents deletion of existing histograms and collects histograms only on the columns that already have histograms.

First step is to eliminate unwanted histograms and have histograms only on the desired columns.

Well, there are two options:

OPTION-I: Delete histograms from unwanted columns and use REPEAT option henceforth which Collects histograms only on the columns that already have histograms.

– Delete unwanted histogram for SALARY column

SQL>exec dbms_stats.gather_table_stats('HR','SKEWED', -
METHOD_OPT => 'for columns salary size 1');

-- Verify that histogram for salary column has been deleted

col table_name for a10
col column_name for a10

select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
SKEWED JOB_ID FREQUENCY
SKEWED EMPNO NONE

– Issue a SQL with  salary column in where clause and verify that gathering  stats using repeat  option retains histogram on JOB_ID column and does not cause histogram to be created on salary column.

SQL>select * from hr.skewed where salary < 10000;

exec dbms_stats.gather_table_stats('HR','SKEWED',-
METHOD_OPT => 'for columns salary size REPEAT');

col table_name for a10
col column_name for a10

select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
 SKEWED JOB_ID FREQUENCY
SKEWED EMPNO NONE

OPTION-II:   Wipe out all histograms and manually add only the desired ones. Use REPEAT option henceforth which Collects histograms only on the columns that already have one.

– Delete histograms on all columns 

SQL>exec dbms_stats.gather_table_stats('HR','SKEWED',-
METHOD_OPT => 'for all columns size 1');

– Verify that histograms on all columns have been dropped

SQL>col table_name for a10
col column_name for a10

select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
 SKEWED JOB_ID NONE
 SKEWED EMPNO NONE

– Create histogram only on the desired JOB_ID column

SQL>exec dbms_stats.gather_table_stats('HR','SKEWED',-
METHOD_OPT => 'for columns JOB_ID size AUTO');

– Verify that histogram has been created on JOB_ID

SQL>col table_name for a10
col column_name for a10

select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
SKEWED JOB_ID FREQUENCY
SKEWED EMPNO NONE

- Verify that gathering  stats using repeat  option creates histogram only on JOB_ID column on which it already exists

SQL>exec dbms_stats.gather_table_stats('HR','SKEWED',-
METHOD_OPT => 'for columns salary size REPEAT');

SQL>col table_name for a10
col column_name for a10
select TABLE_NAME,COLUMN_NAME,HISTOGRAM
from dba_tab_columns where table_name = 'SKEWED';

TABLE_NAME COLUMN_NAM HISTOGRAM
---------- ---------- ---------------
SKEWED SALARY NONE
SKEWED JOB_ID FREQUENCY
SKEWED EMPNO NONE

That is, now Oracle will no longer make histograms you didn’t ask for.

– Finally, change the preference for METHOD_OPT parameter of automatic stats gathering job from default value of AUTO to REPEAT so that it will gather histograms only for the columns already having one.

–  Get Current value –

SQL> select dbms_stats.get_prefs ('METHOD_OPT') from dual;

DBMS_STATS.GET_PREFS('METHOD_OPT')
-----------------------------------------------------------------------
FOR ALL COLUMNS SIZE AUTO

– Set preference to REPEAT–

SQL> exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT');

– Verify –

SQL> select dbms_stats.get_prefs ('METHOD_OPT') from dual;

DBMS_STATS.GET_PREFS('METHOD_OPT')
-----------------------------------------------------------------------
FOR ALL COLUMNS SIZE REPEAT

From  now onwards, gathering  of statistics, manually or automatically will not create any new histograms while retaining  all the existing ones.

I hope this post is useful.

Happy reading….

References:

https://blogs.oracle.com/optimizer/entry/how_does_the_method_opt
http://www.pythian.com/blog/stabilize-oracle-10gs-bind-peeking-behaviour/
https://richardfoote.wordpress.com/2008/01/04/dbms_stats-method_opt-default-behaviour-changed-in-10g-be-careful/

———————————————————————————————–

Related Links:

Home

Database Index
Tuning Index

———————————————————————————————

8 thoughts on “Create Histograms On Columns That Already Have One

  1. Thanks Anju for this nicely detailed post.
    I was also reading a post from Maria (in Oracle Optimizer blog) , and she commented about repeat (https://blogs.oracle.com/optimizer/entry/how_does_the_method_opt) :
    ” – REPEAT ensures a histogram will only be created for any column that already has one. If the table is a partitioned table, repeat ensures a histogram will be created for a column that already has one on the global level. However, this is not a recommended setting, as the number of buckets currently in each histogram will limit the maximum number of buckets used for the newly created histograms. Lets assume there are 5 buckets currently in a histogram. When the histogram is re-gathered with SIZE REPEAT, the newly created histogram will use at most 5 buckets and may not been of good quality.

    Foued

    1. Thanks Foued for your valuable feedback and info. I did not know that the newly created histogram will have same number of buckets as the current histogram .

      Your comments and suggestions are always welcome!

      Regards
      Anju

Your comments and suggestions are welcome!