Feb 17 2009

Virtual Column-Based Partitioning Might Lead to Wrong Results

Tag: 11gR1, Bug, PartitioningChristian Antognini @ 7:53 am

As of Oracle Database 11g it is possible to use a virtual column as partition key. In this post I do not want to discuss how it works and whether this is good or not… Instead, I would like to show you that the feature might lead to wrong results.

First of all, I would like to show you a test where everything works fine. For that purpose, let’s create a table (notice the virtual column n2), insert one row into it, and gather the object statistics:

SQL> CREATE TABLE t (
  2    n1 NUMBER,
  3    n2 AS (CASE n1 WHEN 1 THEN 1 WHEN 2 THEN 2 ELSE 0 END) VIRTUAL
  4  )
  5  PARTITION BY LIST (n2) (
  6    PARTITION zero VALUES (0),
  7    PARTITION one VALUES (1),
  8    PARTITION two VALUES (2)
  9  )
 10  ENABLE ROW MOVEMENT;

SQL> INSERT INTO t (n1) VALUES (1);

SQL> COMMIT;

SQL> execute dbms_stats.gather_table_stats(user,'t')

The aim of the following test is to check whether row movement works correctly. Hence, I update the column n1 to cause such a movement. To check whether row movement is performed or not, I display the content of the two involved partitions before and after the update statement. In addition, I also display the rowids (because of the movement the row should get a new rowid).

SQL> SELECT rowid, n1, n2 FROM t PARTITION (zero);

no rows selected

SQL> SELECT rowid, n1, n2 FROM t PARTITION (one);

ROWID                      N1         N2
------------------ ---------- ----------
AAAE89AAEAAAAGNAAA          1          1

SQL> UPDATE t SET n1 = 3;

SQL> COMMIT;

SQL> SELECT rowid, n1, n2 FROM t PARTITION (zero);

ROWID                      N1         N2
------------------ ---------- ----------
AAAE88AAEAAAAF9AAA          3          0

SQL> SELECT rowid, n1, n2 FROM t PARTITION (one);

no rows selected

The previous test was successful. Now, let me show you a situation that leads to wrong results :-(

To reproduce the bug I basically execute the same operations as before. The only difference is that seven columns are added before the columns n1 and n2 in the table. Hence, the test table is recreated with the following statements:

SQL> DROP TABLE t PURGE;

SQL> CREATE TABLE t (
  2    d1 NUMBER,
  3    d2 NUMBER,
  4    d3 NUMBER,
  5    d4 NUMBER,
  6    d5 NUMBER,
  7    d6 NUMBER,
  8    d7 NUMBER,
  9    n1 NUMBER,
 10    n2 AS (CASE n1 WHEN 1 THEN 1 WHEN 2 THEN 2 ELSE 0 END) VIRTUAL
 11  )
 12  PARTITION BY LIST (n2) (
 13    PARTITION zero VALUES (0),
 14    PARTITION one VALUES (1),
 15    PARTITION two VALUES (2)
 16  )
 17  ENABLE ROW MOVEMENT;

SQL> INSERT INTO t (n1) VALUES (1);

SQL> COMMIT;

SQL> execute dbms_stats.gather_table_stats(user,'t')

As before, I update the row to cause the movement and display the content of the two involved partitions before and after doing it.

SQL> SELECT rowid, n1, n2 FROM t PARTITION (zero);

no rows selected

SQL> SELECT rowid, n1, n2 FROM t PARTITION (one);

ROWID                      N1         N2
------------------ ---------- ----------
AAAE9BAAEAAAAGNAAA          1          1

SQL> UPDATE t SET n1 = 3;

SQL> COMMIT;

SQL> SELECT rowid, n1, n2 FROM t PARTITION (zero);

no rows selected

SQL> SELECT rowid, n1, n2 FROM t PARTITION (one);

ROWID                      N1         N2
------------------ ---------- ----------
AAAE9BAAEAAAAGNAAA          3          0

As you can see, the two queries after the update statement return wrong results. Also the rowid is the same. Hence, row movement was not performed. It goes without saying that also other queries might return wrong results. An example is the following:

SQL> SELECT rowid, n1, n2 FROM t WHERE n2 = 1;

ROWID                      N1         N2
------------------ ---------- ----------
AAAE9BAAEAAAAGNAAA          3          0

By playing around with the number of columns and position of the columns n1 and n2, I found out that depending on the situation you might have correct results or wrong results.

Since I was able to reproduce the problem with several databases (both 11.1.0.6 and 11.1.0.7), last Friday I opened a service request. Now the issue is tracked as bug# 8258501.


Oct 03 2008

Granularity ‘APPROX_GLOBAL AND PARTITION’

Tag: 11gR1, Object Statistics, PartitioningChristian Antognini @ 10:27 am

The patchset 11.1.0.7 introduces in the package DBMS_STATS a new value for the parameter GRANULARITY.

The description provided by the development team in $ORACLE_HOME/rdbms/admin/dbmsstat.sql is the following:

'APPROX_GLOBAL AND PARTITION' - This option is similar to
   'GLOBAL AND PARTITION'. But the global statistics are aggregated
   from partition level statistics. It will aggregate all statistics except number of
   distinct values for columns and number of distinct keys of indexes.
   The existing histograms of the columns at the table level
   are also aggregated.The global statistics are  gathered
   (i.e., going back to GLOBAL AND PARTITION behaviour)
   if partname argument is null or if the aggregation cannot be done
   e.g., statistics of one of the partitions is missing.

To illustrate how it works, let’s have a look to an example…

  • Create a range-partitioned table with a local index, load some data and gather object statistics.

SQL> CREATE TABLE t (
  2    id NUMBER,
  3    n NUMBER,
  4    d DATE,
  5    pad VARCHAR2(4000)
  6  )
  7  PARTITION BY RANGE (d) (
  8    PARTITION t_jan_2008 VALUES LESS THAN (to_date('2008-02-01','yyyy-mm-dd'))
  9  )
 10  NOLOGGING;

SQL> CREATE INDEX i ON t(n) LOCAL;

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 1000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,11) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-01-01','yyyy-mm-dd')+mod(rownum,31),
  9         rpad('*',100,'*')
 10  FROM t1000, t1000
 11  WHERE rownum <= 500000;

500000 rows created.

SQL> COMMIT;

SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      estimate_percent => 100,
  6      granularity => 'global and partition',
  7      method_opt => 'for olumns size 1 id, d, pad, n size 254',
  8      cascade => TRUE
  9    );
 10  END;
 11  /

  • Add a new partition and load data into it.

SQL> ALTER TABLE t
  2  ADD PARTITION t_feb_2008 VALUES LESS THAN (to_date('2008-03-01','yyyy-mm-dd'));

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 2000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,12) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-02-01','yyyy-mm-dd')+mod(rownum,29),
  9         rpad('*',200,'*')
 10  FROM t1000, t1000;

1000000 rows created.

SQL> COMMIT;

  • Gather object statistics (notice that the new value for the parameter GRANULARITY is used).

SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      partname => 't_feb_2008',
  6      estimate_percent => 100,
  7      granularity => 'approx_global and partition',
  8      method_opt => 'for columns size 1 id, d, pad, n size 254',
  9      cascade => TRUE
 10    );
 11  END;
 12  /

  • Compare the current object statistics with the previous ones.

SQL> SELECT *
  2  FROM table(dbms_stats.diff_table_stats_in_history(
  3               ownname => user,
  4               tabname => 't',
  5               time1 => localtimestamp,
  6               time2 => localtimestamp-to_dsinterval('0 00:00:15'),
  7               pctthreshold => 0
  8             ));

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : T
OWNER         : OPS$CHA
SOURCE A      : Statistics as of 21-SEP-08 06.39.37.597595 PM +02:00
SOURCE B      : Statistics as of 21-SEP-08 06.39.22.597595 PM +02:00
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

T                           T   A   1500000    40091      183        500000
                                B   500000     8614       116        500000
T_FEB_2008                  P   A   1000000    31477      216        1000000
                                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

D               A   31      .032258064 NO   0       8    786C0 786C0 1500000
                B   31      .032258064 NO   0       8    786C0 786C0 500000
ID              A   500000  .000002    NO   0       6    C4020 C404  1500000
                B   500000  .000002    NO   0       6    C4020 C4023 500000
N               A   11      .000001998 YES  749500  2    80    C10C  750500
                B   11      .000001998 YES  249750  2    80    C10B  250250
PAD             A   1       1          NO   0       168  2A2A2 2A2A2 1500000
                B   1       1          NO   0       101  2A2A2 2A2A2 500000

                              PARTITION: T_FEB_2008
                              .....................

D               A   29      .034482758 NO   0       8    786C0 786C0 1000000
                B   NO_STATS
ID              A   1000000 .000001    NO   0       6    C4030 C404  1000000
                B   NO_STATS
N               A   12      .000000999 YES  499750  2    80    C10C  500250
                B   NO_STATS
PAD             A   1       1          NO   0       201  2A2A2 2A2A2 1000000
                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

INDEX / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME      TYP SRC ROWS    LEAFBLK DISTKEY LF/KY DB/KY CLF     LVL SAMPSIZ
...............................................................................

                                    INDEX: I
                                    ........

I               I   A   750500  1221    12      101   3332  39991   2   750500
                    B   250250  407     11      37    778   8565    1   250250
T_FEB_2008      P   A   500250  814     12      67    2618  31426   2   500250
                    B   NO_STATS

A few remarks about the number of distinct keys…

As documented, the number of distinct values at table level has not been updated. I do understand that aggregating them is not possible. However, IMHO, it should be quite easy to do a kind of sanity check and to set the number of distinct values at the table level according to the highest number found at partition level. For example, in the previous case, the new partition contains 1M distinct values for the column ID. Hence, the number of distinct values for that column at the table level should be at least 1M.

Interestingly, the number of distinct values at the index level is correct. The reason is quite simple… they do gather them as usual (I checked that with SQL trace). In other words, the new value for the parameter GRANULARITY is only valid for tables. This is also the case when the procedure GATHER_INDEX_STATS is used. In other words, the new value for the parameter granularity is accepted but ignored!

What really puzzle me is why such a feature has been implemented in first place. In fact, as of Oracle Database 11g, with an incremental gathering is possible to have better statistics than with this feature.

Any suggestion/thought/opinion is welcome…


Sep 26 2008

Bloom Filters

Tag: 10gR2, 11gR1, Parallel Processing, Partitioning, Query OptimizerChristian Antognini @ 6:24 pm

Oracle Database, as of 10g Release 2, uses bloom filters in various situations. Unfortunately, no information about their usage is available in Oracle documentation. For this reason, I decided to write a paper to explain not only what bloom filters are, but also, and foremost, to describe how Oracle Database makes use of them. Specifically, to explain how the database engine uses bloom filters to reduce data communication between slave processes in parallel joins and to implement join-filter pruning.

Originally, I wrote this paper for the IOUG Select Journal. Even if I wrote it last June, I wanted to receive the printed copies before putting it online. Today a packet with five copies of the Q4/2008 issue and a polo shirt arrived… Hence, it is now available online as well.

If you don’t have access to Select Journal, you can download it from this page.


« Previous Page