Extension Bypassed Because of Missing Histogram

Today, while tuning a fairly complex query experiencing wrong cardinality estimates, I noticed something I was not aware of. Hence, I thought to write this short post to illustrate how to reproduce the problem I experienced…

  • Create the test table (notice the correlation between the data of the two columns):
SQL> CREATE TABLE t
  2  AS
  3  SELECT mod(rownum,50) AS n1, mod(rownum,50) AS n2
  4  FROM dual
  5  CONNECT BY level <= 1000;
  • Gather the statistics and show whether histograms exist (since I didn’t change any default value of the dbms_stats package, no histograms were gathered):
SQL> execute dbms_stats.gather_table_stats(user,'t')

SQL> SELECT column_name, histogram
  2  FROM user_tab_col_statistics
  3  WHERE table_name = 'T';

COLUMN_NAME HISTOGRAM
----------- ---------------
N1          NONE
N2          NONE
  • Check how many rows are returned by a query that specifies the predicate “n1 = 42 AND n2 = 42” (it should be 20, i.e. 1000/50):
SQL> SELECT count(*)
  2  FROM t
  3  WHERE n1 = 42 and n2 = 42;

  COUNT(*)
----------
        20
  • Check whether the query optimizer is able to correctly estimate the cardinality of an operation applying the “n1 = 42 AND n2 = 42” predicate:
SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
          1

Unfortunately, the query optimizer estimation is wrong (notice that it estimates 1 instead of 20). This is because the data of the two columns is correlated. It’s exactly to solve this kind of issues that Oracle introduced extended statistics. Hence, let’s create an extension to see whether it solves the problem...

  • Create the extension (column group of the two columns):
SQL> SELECT dbms_stats.create_extended_stats(ownname=>user, tabname=>'t', extension=>'(n1,n2)')
  2  FROM dual;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>USER,TABNAME=>'T',EXTENSION=>'(N1,N2)')
-----------------------------------------------------------------------------------------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
  • Gather the object statistics and check whether a histogram for supporting the extension is created:
SQL> execute dbms_stats.gather_table_stats(user,'t')

SQL> SELECT column_name, histogram
  2  FROM user_tab_col_statistics
  3  WHERE table_name = 'T';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------
N1                             FREQUENCY
N2                             FREQUENCY
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS NONE
  • Now that the extension and the object statistics (except for the histogram for the extension) are in place, check whether the query optimizer does a better estimation:
SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
          1

As you can see the estimation is not better! Mhmm...

In this case to solve the problem you have to regather the object statistics. This is necessary because a histogram for the extension is needed. Let's try...

  • Gather the statistics and check whether a histogram exists on all columns:
SQL> execute dbms_stats.gather_table_stats(user,'t')

SQL> SELECT column_name, histogram
  2  FROM user_tab_col_statistics
  3  WHERE table_name = 'T';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
N1                             FREQUENCY
N2                             FREQUENCY
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS FREQUENCY
  • Check again whether the query optimizer, thanks to the extension, is able to come up with a better estimation:
SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
         20

Yes, it does!

In other words, it seems that the query optimizer bypasses if the extension has no histogram AND histograms exist on the columns on which the extension is based.

But what does it happen when no histograms at all exist? Let’s check it...

SQL> execute dbms_stats.gather_table_stats(user,'t',method_opt=>'for all columns size 1')

SQL> EXPLAIN PLAN FOR
  2  SELECT *
  3  FROM t
  4  WHERE n1 = 42 AND n2 = 42;

SQL> SELECT cardinality
  2  FROM plan_table
  3  WHERE id = 0
  4  AND plan_id = (SELECT max(plan_id) FROM plan_table);

CARDINALITY
-----------
         20

In such a case, it works! To me it seems buggy...

In conclusion, be careful of extensions without histograms. They might be bypassed by the query optimizer.

Update 2014-02-22

Oracle implemented a fix to avoid the problem described in this post. Unfortunately, by default it's disabled. To enabled it, you have to set "_fix_control"="6972291:on" either at the system or session level. For additional information refer to MOS, specifically to Bug 6972291 Column group selectivity is not used when there is a histogram on one column.

2 Comments

  1. antony's Gravatar antony
    December 1, 2014    

    Hi Chris,

    Curious to find out whether the mentioned fix works as expected.The cardinality estimate is fine when I had histogram on virtual column which is the result of extended stats on c1,c2. Now, i have recollected the stats without histogram on any columns, including virtual column.(method_opt=>’for all columns size 1′)

    $grep 69722 /u01/app/oracle/diag/rdbms/ora11204/ora11204/trace/ora11204_ora_27483_0004.trc
        fix  6972291 = enabled *
          OPT_PARAM('_fix_control' '6972291:1')
        fix  6972291 = enabled *
    
    COLUMN_NAME                          NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
    -------------------------------- ---------------- ---------- ---------- --------------------
    SYS_STUBZH0IHA7K$KEBJVXO5LOHAS                100          0        .01 NONE
    N4                                             25          0        .04 NONE
    N3                                             50          0        .02 NONE
    N2                                            100          0        .01 NONE
    N1                                            100          0        .01 NONE
    
    SQL> explain plan for select count(*) from t1 where n1=10 and n2=10;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    ---------------------------------------------------------------------------
    Plan hash value: 3724264953
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     6 |     9   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
    |*  2 |   TABLE ACCESS FULL| T1   |   100 |   600 |     9   (0)| 00:00:01 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("N1"=10 AND "N2"=10)
    
    14 rows selected

    cardinality estimates is correct even when there is no histogram on the extended column.

    • December 2, 2014    

      Hi Antony

      Hence, without histograms, the behavior doesn’t depend on the fix. In both cases (ON, OFF) the estimation is correct.

      Cheers,
      Chris

  1. Today’s activity « Oracle-dba on February 7, 2014 at 20:53
  2. Extended Stats | Oracle Scratchpad on February 23, 2014 at 10:24

Leave a Reply

Your email address will not be published. Required fields are marked *