Recenty the following question was posted on oracle-l (I paraphrase…):
With Oracle Database it is possible to create something similar to Teradata’s sparse indexes?
Since the question is an interesting one, I decided to write this short post.
First of all, I have to say that such a feature is not supported by the CREATE INDEX statement with Oracle Database. What a pity! I would really like to see this feature in Oracle Database 12x (I bet that “x” will be “f”; post you opinion as a comment if you want). Several database engines do so. Teradata was mentioned by the OP, two additional examples are PostgreSQL and SQL Server.
How to do something similar with Oracle Database?
As of Oracle Database 11g Release 2 it is possible to partially index a table by taking advantage of Zero-Size Unusable Indexes. The following SQL statements show an example where only the data of August 2010 is indexed.
SQL> CREATE TABLE t AS 2 SELECT rownum AS id, sysdate-mod(rownum,100) AS tim, rpad('*',50,'*') AS pad 3 FROM dual 4 CONNECT BY level <= 1000; SQL> CREATE INDEX i ON t (tim) 2 GLOBAL PARTITION BY RANGE (tim) ( 3 PARTITION i_201001 VALUES LESS THAN (to_date('2010-02-01','YYYY-MM-DD')), 4 PARTITION i_201002 VALUES LESS THAN (to_date('2010-03-01','YYYY-MM-DD')), 5 PARTITION i_201003 VALUES LESS THAN (to_date('2010-04-01','YYYY-MM-DD')), 6 PARTITION i_201004 VALUES LESS THAN (to_date('2010-05-01','YYYY-MM-DD')), 7 PARTITION i_201005 VALUES LESS THAN (to_date('2010-06-01','YYYY-MM-DD')), 8 PARTITION i_201006 VALUES LESS THAN (to_date('2010-07-01','YYYY-MM-DD')), 9 PARTITION i_201007 VALUES LESS THAN (to_date('2010-08-01','YYYY-MM-DD')), 10 PARTITION i_201008 VALUES LESS THAN (to_date('2010-09-01','YYYY-MM-DD')), 11 PARTITION i_201009 VALUES LESS THAN (to_date('2010-10-01','YYYY-MM-DD')), 12 PARTITION i_201010 VALUES LESS THAN (to_date('2010-11-01','YYYY-MM-DD')), 13 PARTITION i_201011 VALUES LESS THAN (to_date('2010-12-01','YYYY-MM-DD')), 14 PARTITION i_201012 VALUES LESS THAN (to_date('2011-01-01','YYYY-MM-DD')), 15 PARTITION i_maxvalue VALUES LESS THAN (MAXVALUE) 16 ) 17 UNUSABLE; SQL> ALTER INDEX i REBUILD PARTITION i_201008;
It goes without saying that you are not forced to have so many partitions in place. In fact, to index the data of August, the following CREATE INDEX is more appropriate.
SQL> CREATE INDEX i ON t (tim) 2 GLOBAL PARTITION BY RANGE (tim) ( 3 PARTITION i_201007 VALUES LESS THAN (to_date('2010-08-01','YYYY-MM-DD')), 4 PARTITION i_201008 VALUES LESS THAN (to_date('2010-09-01','YYYY-MM-DD')), 5 PARTITION i_maxvalue VALUES LESS THAN (MAXVALUE) 6 ) 7 UNUSABLE;
Then, to index the data of September, you have to execute some SQL statements like the following ones.
- Create a partition for the data of September and rebuild it:
SQL> ALTER INDEX i SPLIT PARTITION i_maxvalue AT (to_date('2010-10-01','YYYY-MM-DD')) INTO ( 2 PARTITION i_201009, 3 PARTITION i_maxvalue 4 ); SQL> ALTER INDEX i REBUILD PARTITION i_201009;
- Drop the oldest partition:
SQL> ALTER INDEX i DROP PARTITION i_201007;
- Make unusable the partition for the data of August:
SQL> ALTER INDEX i MODIFY PARTITION i_201008 UNUSABLE;
In this way you have at most three partitions available. And, for most of the time, only one of them is usable and, therefore, occupying space.
Even though in the example I provide in this post I use a global index, you can use the same technique with local indexes as well. That said, I see no problem in using a global index as the one shown in this post.
I tried this method with 10.2.0.4 test database, worked perfectly. Tried a simple query to count all rows from a date within the “active” partition, and partition was used.
But I have one question, if I understand right, the creation of the index with a UNUSABLE status do not populate the index, so that’s why you say that the other partitions do not occupy space ?
Really useful example.
Sorry I got the answer, it is a particularity of 11gR2. I check segment usage, space is allocated in 10gR2.. so i guess this is less interesting.
Depending on your requirement, couldn’t you also use a function based index to partially index a table?
That’s cool, but the optimizer doesn’t recognize the usable index partition if you query using bind-variables. If you use a literal which points to the usable index-partition, then it’s used.
I have tried Tony’s suggestion of function-based indexes before with success. Using a decode statement to index only the values you want.
Ciao a tutti
@Jean-Christophe: Before 11gR2 not only the segments occupy space, but also the query optimizer does not generate execution plans taking advantage of UNION ALL (see the post about Zero-Size Unusable Indexes that I referenced).
@Tony/Jimmy: Yes, a FBI can sometimes be used to solve such a problem. But, honestly, I don’t like it. My main concern is that the application should usually be modified to take advantage of such a possibility.
@Stefan: Of course when binds are used the pruning has to be postponed to the execution phase. In the current release it seems (meaning: I was not able to reproduce it) that the query optimizer is not able to use the execution plan containing the UNION ALL in such situations. But, technically, it is possible. They just have to implement this possibility.
Cheers,
Chris
The other point against function-based indexes in this context is illustrated very well by Christian’s example — that is becomes tricky and expensive to use for maintaining an index only on “recent” data. The “rolling usable index partitions” method is not trivial but at least it doesn’t require completely recreating an index to redefine the range of indexed values.
Being able to use interval-based range partitions for globally partitioned indexes would be nice of course.
I tried this on 10.2.0.4. It all works fine. However, as soon as I had rebuilt one of the partitions, I couldn’t gather statistics on the index, nor on just the usable partition. I got
.
Rebuilding the partition updated its statistics, so I got the expected execution plans that used only the rebuilt partition where available.
I can still perform DML on the table, but I cannot updates the statistics, which could be a problem as data changes over time.
Hi David
Mhmm… It should be possible to gather partition-level statistics for the usuable ones… Here an example:
Cheers,
Chris