Today, while tuning a fairly complex query experiencing wrong cardinality estimates, I noticed something I was not aware of. Hence, I thought to write this short post to illustrate how to reproduce the problem I experienced…

  • Create the test table (notice the correlation between the data of the two columns):
SQL> CREATE TABLE t
  2  AS
  3  SELECT mod(rownum,50) AS n1, mod(rownum,50) AS n2
  4  FROM dual
  5  CONNECT BY level <= 1000;
  • Gather the statistics and show whether histograms exist (since I didn’t change any default value of the dbms_stats package, no histograms were gathered):
SQL> execute dbms_stats.gather_table_stats(user,'t')

SQL> SELECT column_name, histogram
  2  FROM user_tab_col_statistics
  3  WHERE table_name = 'T';

COLUMN_NAME HISTOGRAM
----------- ---------------
N1          NONE
N2          NONE
  • Check how many rows are returned by a query that specifies the predicate “n1 = 42 AND n2 = 42” (it should be 20, i.e. 1000/50):
SQL> SELECT count(*)
  2  FROM t
  3  WHERE n1 = 42 and n2 = 42;

  COUNT(*)
----------
        20
  • Check whether the query optimizer is able to correctly estimate the cardinality of an operation applying the “n1 = 42 AND n2 = 42” predicate:
SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
          1

Unfortunately, the query optimizer estimation is wrong (notice that it estimates 1 instead of 20). This is because the data of the two columns is correlated. It’s exactly to solve this kind of issues that Oracle introduced extended statistics. Hence, let’s create an extension to see whether it solves the problem...

  • Create the extension (column group of the two columns):
SQL> SELECT dbms_stats.create_extended_stats(ownname=>user, tabname=>'t', extension=>'(n1,n2)')
  2  FROM dual;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>USER,TABNAME=>'T',EXTENSION=>'(N1,N2)')
-----------------------------------------------------------------------------------------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
  • Gather the object statistics and check whether a histogram for supporting the extension is created:
SQL> execute dbms_stats.gather_table_stats(user,'t')

SQL> SELECT column_name, histogram
  2  FROM user_tab_col_statistics
  3  WHERE table_name = 'T';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------
N1                             FREQUENCY
N2                             FREQUENCY
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS NONE
  • Now that the extension and the object statistics (except for the histogram for the extension) are in place, check whether the query optimizer does a better estimation:
SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
          1

As you can see the estimation is not better! Mhmm...

In this case to solve the problem you have to regather the object statistics. This is necessary because a histogram for the extension is needed. Let's try...

  • Gather the statistics and check whether a histogram exists on all columns:
SQL> execute dbms_stats.gather_table_stats(user,'t')

SQL> SELECT column_name, histogram
  2  FROM user_tab_col_statistics
  3  WHERE table_name = 'T';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
N1                             FREQUENCY
N2                             FREQUENCY
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS FREQUENCY
  • Check again whether the query optimizer, thanks to the extension, is able to come up with a better estimation:
SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
         20

Yes, it does!

In other words, it seems that the query optimizer bypasses if the extension has no histogram AND histograms exist on the columns on which the extension is based.

But what does it happen when no histograms at all exist? Let’s check it...

SQL> execute dbms_stats.gather_table_stats(user,'t',method_opt=>'for all columns size 1')

SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
         20

In such a case, it works! To me it seems buggy...

In conclusion, be careful of extensions without histograms. They might be bypassed by the query optimizer.

Update 2014-02-22

Oracle implemented a fix to avoid the problem described in this post. Unfortunately, by default it's disabled. To enabled it, you have to set "_fix_control"="6972291:on" either at the system or session level. For additional information refer to MOS, specifically to Bug 6972291 Column group selectivity is not used when there is a histogram on one column.