Conditions based on inequalities (!=, <>) cannot make use of index(es). I will illustrate this limitation and show you how to optimize SQL statements hitting it.
For the demonstration, I have a table students table having a column named result that can contain the values – ‘Pass’, ‘Fail’, ‘To be evaluated’. The column is characterized by a very non-uniform distribution having most of the rows set to value Passed (P). Here’s the example:
SQL>drop table students purge; create table students (id , result ) as select rownum, decode (mod(rownum, 30), 0, 'F', 1, 'T', 'P') from all_tables; create index students_idx on students (result); exec dbms_stats.gather_table_stats (USER, 'STUDENTS', cascade => TRUE); SELECT result , count(*) FROM students GROUP BY result; RESULT COUNT(*) ---------- ---------- P 100 T 4 F 3
Let’s execute the query to select all students who have not passed (result = ‘T’ or ‘F’). Even though the query has a very strong selectivity and the result column is indexed, the query optimizer chooses a full table scan for reading 7 rows as the predicate involves inequality.
SQL>select * from students where result <> 'P'; select * from table(dbms_xplan.display_cursor); ID RESULT ---------- ---------- 1 T 30 F .... 7 rows selected. PLAN_TABLE_OUTPUT --------------------------------------------------------- SQL_ID f2wkxqy3b6b5h, child number 0 ------------------------------------- select * from students where result <> 'P' Plan hash value: 4078133427 --------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 3 (100)| | |* 1 | TABLE ACCESS FULL| STUDENTS | 71 | 355 | 3 (0)| 00:00:01 | --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RESULT"<>'P')
In a case like this, where the inequality condition has a strong selectivity, we can advantage of an index using following four techniques :
First, the inequality condition can be rewritten into an IN condition. This is an option only when the number of values to be selected is known and the number is limited. For example, if the query is modified as shown, index range scan is employed.
SQL>select * from students where result in ('F', 'T'); select * from table(dbms_xplan.display_cursor); PLAN_TABLE_OUTPUT --------------------------------------------------------- SQL_ID 672mnj9pggkq7, child number 0 ------------------------------------- select * from students where result in ('F', 'T') Plan hash value: 2871222462 --------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 2 (100)| | | 1 | INLIST ITERATOR | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 71 | 355 | 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | STUDENTS_IDX | 71 | | 1 (0)| 00:00:01 | --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(("RESULT"='F' OR "RESULT"='T'))
Second, manually rewrite the query to make sure that both component queries can take advantage of an index range scan. This technique can be applied if the values are unknown or the number of values to be specified is too high. Hence, if the query is rewritten as shown, it will be able to to take advantage of the or expansion query transformation:
SQL>select * from students where result < 'P' union all select * from students where result > 'P' ; select * from table(dbms_xplan.display_cursor); PLAN_TABLE_OUTPUT --------------------------------------------------------- SQL_ID gqrp063y9c5a5, child number 0 ------------------------------------- select * from students where result < 'P' union all select * from students where result > 'P' Plan hash value: 2171568329 --------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 4 (100)| | | 1 | UNION-ALL | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 76 | 380 | 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | STUDENTS_IDX | 76 | | 1 (0)| 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 36 | 180 | 2 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | STUDENTS_IDX | 36 | | 1 (0)| 00:00:01 | --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("RESULT"<'P') 5 - access("RESULT">'P')
– Third technique: We can create function based index using decode / case and restructure our query to use decode / case
In this technique, we will create a function based index using decode or case which will return two distinct values when the the condition is satisfied / not satisfied. Subsequently, we will restructure or query to use decode / case.
Using decode
SQL>create index students_decode_idx
on students (decode (result, 'P', 'Passed', 'Not Passed'));
select * from students
where decode (result, 'P', 'Passed', 'Not Passed') = 'Not Passed';
select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 2ddr5p1t2q7ax, child number 0
-------------------------------------
select * from students where decode (result, 'P', 'Passed', 'Not
Passed') = 'Not Passed'
Plan hash value: 3076565877
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 1 | 12 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | STUDENTS_DECODE_IDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STUDENTS"."SYS_NC00004$"='Not Passed')
Using Case
SQL>create index students_case_IDX
on students (CASE result
WHEN 'P' THEN 'Passed'
ELSE 'Not Passed' END);
select * from students
where (CASE result
WHEN 'P' THEN 'Passed'
ELSE 'Not Passed' END) = 'Not Passed';
select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------SQL_ID 6pqgppzp47tzb, child number 0
-------------------------------------
select * from students where (CASE result WHEN 'P' THEN
'Passed' ELSE 'Not Passed' END) = 'Not Passed'
Plan hash value: 2622858639
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 1 | 12 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | STUDENTS_CASE_IDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STUDENTS"."SYS_NC00004$"='Not Passed')
The fourth technique simply forces an index full scan with, for example, the index hint. From a performance point of view, it’s not optimal,as, for a query with very strong selectivity, full index has to be scanned.
SQL>SELECT /*+ index(students) */ * FROM students where result != 'P'; select * from table(dbms_xplan.display_cursor); PLAN_TABLE_OUTPUT --------------------------------------------------------- SQL_ID 2hyrf6n7kb8pr, child number 0 ------------------------------------- SELECT /*+ index(students) */ * FROM students where result != 'P' Plan hash value: 635752001 --------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 2 (100)| | | 1 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 71 | 355 | 2 (0)| 00:00:01| |* 2 | INDEX FULL SCAN | STUDENTS_IDX | 71 | | 1 (0)| 00:00:01 | --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("RESULT"<>'P')
Conclusion:
In cases where the inequality condition having a strong selectivity is notable to make use of an index, we can advantage of an index using following three techniques :
- First, the inequality condition can be rewritten into an IN condition. This is an option only when the number of values to be selected is known and the number is limited.
- Second, manually rewrite the query to make sure that both component queries can take advantage of an index range scan. This technique can be applied if the values are unknown or the number of values to be specified is too high.
- Third , create function based index using decode / case and restructure the query to use decode / case
- The fourth technique simply forces an index full scan with, for example, the index hint. From a performance point of view, it’s not optimal,as, for a query with very strong selectivity, full index has to be scanned.
References:
—————————————————————————————————————
Related links:
————————-