Monthly Archives: March 2015

Conditions Based On Inequalities Can’t Use Indexes – How To Resolve?

Conditions based on inequalities (!=, <>) cannot make use of index(es). I will illustrate this limitation and show you how to optimize SQL statements hitting it.

For the demonstration, I have  a table  students table having a column named result that  can contain the values – ‘Pass’, ‘Fail’, ‘To be evaluated’. The column is characterized by a very non-uniform distribution having most of the rows  set to value Passed (P). Here’s the example:

SQL>drop table students purge;
    create table students (id , result )
    as
    select rownum, decode (mod(rownum, 30), 0, 'F', 1, 'T',  'P')
    from  all_tables;

    create index students_idx on students (result);
    exec dbms_stats.gather_table_stats (USER, 'STUDENTS', cascade => TRUE);

     SELECT result , count(*)
     FROM students
     GROUP BY result;
RESULT COUNT(*)
---------- ----------
P              100
T                4
F                3

Let’s execute the  query to select all students who have not passed (result = ‘T’ or ‘F’). Even though the query has a very strong selectivity and the result column is indexed, the query optimizer chooses a full table scan for reading 7 rows as the predicate involves inequality.

SQL>select * from students where result <> 'P';
    select * from table(dbms_xplan.display_cursor);

ID RESULT
---------- ----------
1 T
30 F

....

7 rows selected.

PLAN_TABLE_OUTPUT
---------------------------------------------------------
SQL_ID f2wkxqy3b6b5h, child number 0
-------------------------------------
select * from students where result <> 'P'

Plan hash value: 4078133427
---------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | TABLE ACCESS FULL| STUDENTS | 71 | 355 | 3 (0)| 00:00:01 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RESULT"<>'P')

In a case like this, where the inequality condition has a strong selectivity, we can advantage of an index using following four techniques :

First, the inequality condition can be rewritten into an IN condition. This is an option only when the number of values to be selected is known and the number is limited. For example, if the query is modified as shown, index range scan is employed.

SQL>select * from students where result in ('F', 'T');
select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------
SQL_ID 672mnj9pggkq7, child number 0
-------------------------------------
select * from students where result in ('F', 'T')

Plan hash value: 2871222462
---------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 71 | 355 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | STUDENTS_IDX | 71 | | 1 (0)| 00:00:01 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(("RESULT"='F' OR "RESULT"='T'))

Second,   manually rewrite the query to make sure that both component queries can take advantage of an index range scan. This technique  can be applied if the values are unknown or the number of values to be specified is too high.   Hence, if  the query is rewritten as shown, it will be able to to take advantage of the or expansion query transformation:

SQL>select * from students where result < 'P'
    union all
    select * from students where result > 'P' ;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------- 
SQL_ID gqrp063y9c5a5, child number 0
-------------------------------------
select * from students where result < 'P' union all select * from
students where result > 'P'

Plan hash value: 2171568329
--------------------------------------------------------- 
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------- 
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | UNION-ALL | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 76 | 380 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | STUDENTS_IDX | 76 | | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 36 | 180 | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | STUDENTS_IDX | 36 | | 1 (0)| 00:00:01 |
--------------------------------------------------------- 

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("RESULT"<'P')
5 - access("RESULT">'P')

– Third technique: We can create function based index using decode / case and restructure our query to use decode / case 

In this technique, we will create a function based index using decode or case which will return two distinct values when the the condition is satisfied / not satisfied. Subsequently, we will restructure or query to use decode / case.

Using decode

SQL>create index students_decode_idx  
    on students (decode (result, 'P', 'Passed', 'Not Passed'));
    select * from students 
    where decode (result, 'P', 'Passed', 'Not Passed') = 'Not Passed';

select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 2ddr5p1t2q7ax, child number 0
-------------------------------------
select * from students where decode (result, 'P', 'Passed', 'Not
Passed') = 'Not Passed'

Plan hash value: 3076565877
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 1 | 12 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | STUDENTS_DECODE_IDX  | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("STUDENTS"."SYS_NC00004$"='Not Passed')

Using Case

SQL>create index students_case_IDX 
      on students (CASE result
                    WHEN 'P' THEN 'Passed'
                    ELSE 'Not Passed' END);

select * from students
where (CASE result
         WHEN 'P' THEN 'Passed'
          ELSE 'Not Passed' END) = 'Not Passed';

select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------SQL_ID 6pqgppzp47tzb, child number 0
-------------------------------------
select * from students where (CASE result WHEN 'P' THEN
'Passed' ELSE 'Not Passed' END) = 'Not Passed'

Plan hash value: 2622858639
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 1 | 12 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | STUDENTS_CASE_IDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STUDENTS"."SYS_NC00004$"='Not Passed')

The fourth technique simply forces an index full scan with, for example, the index hint. From a performance point of view, it’s not optimal,as, for a query with very strong selectivity, full index has to be scanned.

SQL>SELECT /*+ index(students) */ * FROM students where result != 'P';
select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------- 
SQL_ID 2hyrf6n7kb8pr, child number 0
-------------------------------------
SELECT /*+ index(students) */ * FROM students where result != 'P'

Plan hash value: 635752001
---------------------------------------------------------  
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------- 
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 71 | 355 | 2 (0)| 00:00:01|
|* 2 | INDEX FULL SCAN | STUDENTS_IDX | 71 | | 1 (0)| 00:00:01 |
--------------------------------------------------------- 

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("RESULT"<>'P')

Conclusion:

In cases where the inequality condition having a strong selectivity is notable to make use of an index, we can advantage of an index using following three techniques : 

  • First, the inequality condition can be rewritten into an IN condition. This is an option only when the number of values to be selected is known and the number is limited.
  • Second,   manually rewrite the query to make sure that both component queries can take advantage of an index range scan. This technique  can be applied if the values are unknown or the number of values to be specified is too high.
  • Third ,  create function based index using decode / case and restructure the query to use decode / case 
  • The fourth technique simply forces an index full scan with, for example, the index hint. From a performance point of view, it’s not optimal,as, for a query with very strong selectivity, full index has to be scanned.

References:

—————————————————————————————————————

Related links:

Home
Tuning Index

————————-

 

Not NULL Constraint Influences Access Path

The optimizer can make use of explicitly defined Not NULL constraints to take advantage
of an index in order to avoid a full table scan since a B-tree index stores only not NULL values .
When  count (constant) or count(*)  is queried,  we want to count no. of rows in the table. Hence , if there is a column which is defined as not NULL and has an index on it, the number of index entries  in the index are bound to be same as the number of rows. The query optimizer uses the index to count no. of rows in the table.

Similarly, when  a count (not-nullable-column) is queried,  we want to count the no. of rows having not null values in the column. Since the column  has a not NULL constraint on it, every row in the table will have a not null value in it and count(not-nullable-column) is  same as count(*). As a result, the query optimizer can use  the index on the column to process the query.
In fact, in both the cases above, any B-tree containing at least a not-nullable column can serve the purpose.

When a count (nullable-column) is queried, we want to count the no. of rows having not null values in the column. If we have an index on the column, the index will store only not NULL values and hence can be effectively used by  the query optimizer to give the result.
In fact, the optimizer can use any index containing the nullable column for this purpose.

To demonstrate the above functionality, I have created a  table HR.TEST with two columns – NOTNULL having not NULL constraint
NULLABLE
. having same data as column NOTNULL but has not been declared not NULL
. has a B-tree index on it

SQL>drop table hr.test purge;
    create table hr.test (notnull number not null, nullable number);
    insert into hr.test select rownum, rownum from all_tables;
    create index hr.test_idx on hr.test(nullable);
    exec dbms_stats.gather_table_stats ('HR','TEST', cascade => true);

Now I will query count for various arguments and check if optimizer can use the index on NULLABLE column.

Note that to process count(*),  count(1) and   count(notnull), the query optimizer uses Full Table Scan. Although the column NULLABLE has non-null values in all the rows but since it has not been explicitly declared not null , the  optimizer does not know that no. of entries in index reflect the count correctly and hence does not use the index .

SQL>select count(*) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 1mat065c25crk, child number 0
-------------------------------------
select count(*) from hr.test

Plan hash value: 1950795681
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST | 108 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------

SQL>select count(1) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID gzpsn7ff3ncmc, child number 0
-------------------------------------
select count(1) from hr.test

Plan hash value: 1950795681
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST | 108 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------

SQL>select count(notnull) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 6kxdzxbac62b4, child number 0
-------------------------------------
select count(notnull) from hr.test

Plan hash value: 1950795681
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST | 108 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------

To process count(nullable), the optimizer uses index on column NULLABLE because we want to count not null values in column nullable and Btree index stores only not null values.

SQL> select count(nullable) from hr.test;
     select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID bz8rxw5rmmv8g, child number 0
-------------------------------------
select count(nullable) from hr.test

Plan hash value: 2983066704
-------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | INDEX FAST FULL SCAN| TEST_IDX | 2784 | 11136 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------

Now I will declare not NULL constraint on  column NULLABLE.

SQL> alter table hr.test modify (nullable not null);

Now if query count(*), count(1), count(notnull) and count(nullable), the optimizer is able to avoid Full Table Index by making  use of the index  on NULLABLE column in all the cases . Since the column NULLABLE having index has been declared not null and optimizer knows that entries in the index represent all the rows of the table.

SQL>select count(*) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 1mat065c25crk, child number 0
-------------------------------------
select count(*) from hr.test

Plan hash value: 2983066704
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| TEST_IDX | 2784 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------
SQL>select count(1) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------- 
SQL_ID gzpsn7ff3ncmc, child number 0
-------------------------------------
select count(1) from hr.test

Plan hash value: 2983066704
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| TEST_IDX | 2784 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------

SQL>select count(notnull) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------- 
SQL_ID 6kxdzxbac62b4, child number 0
-------------------------------------
select count(notnull) from hr.test

Plan hash value: 2983066704
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| TEST_IDX | 2784 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------

SQL> select count(nullable) from hr.test;
     select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------- 
SQL_ID bz8rxw5rmmv8g, child number 0
-------------------------------------
select count(nullable) from hr.test

Plan hash value: 2983066704
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| TEST_IDX | 2784 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------

A naughty trick

Since null values are not stored in the single column indexes,  a single column index cannot be used to search NULL values. To demonstrate this, I will remove non NULL constraint from column NULLABLE .

SQL>alter table hr.test modify (nullable null);

Now when I  query the rows having NULL values in column NULLABLE, we can see that Full table scan is used as the index on column NULLABLE does not contain NULL values .

SQL>select * from hr.test where nullable is null;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 4w1kz5f6uh43x, child number 0
-------------------------------------
select * from hr.test where nullable is null

Plan hash value: 1357081020
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | TABLE ACCESS FULL| TEST | 1 | 8 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 1 - filter("NULLABLE" IS NULL)

Now I will use the fact that  even NULL values are  stored in a composite index when at least one of the columns isn’t nullable. I am creating a multi column index on column NULLABLE and a dummy second column containing a not null value,  so that the index will store even NULL values in column NULLABLE. To keep the size of the index  small, I will assign  the value zero to the dummy column.

SQL>drop index hr.test_idx;
    create index hr.test_idx on hr.test(nullable, 0);

Now when I  query the rows having NULL values in column NULLABLE, we can see that  the multi column index is used as it stores NULL values  in column   NULLABLE.

SQL>select * from hr.test where nullable is null;
    select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 4w1kz5f6uh43x, child number 0
-------------------------------------
select * from hr.test where nullable is null

Plan hash value: 1389866015
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 8 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_IDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 2 - access("NULLABLE" IS NULL)

Conclusion:
To enable the optimizer to choose index access in relevant cases,
–  declare NOT NULL constraint on relevant columns or
– create a multi column index on  nullable column  and a dummy second column

References:
Troubleshooting Oracle Performance (second edition ) by Christian Antognini
—————————————————————————————————————

Related links:

Home
Tuning Index

————————-