Granularity 'APPROX_GLOBAL AND PARTITION'

The patchset 11.1.0.7 introduces in the package DBMS_STATS a new value for the parameter GRANULARITY.

The description provided by the development team in $ORACLE_HOME/rdbms/admin/dbmsstat.sql is the following:

'APPROX_GLOBAL AND PARTITION' - This option is similar to
   'GLOBAL AND PARTITION'. But the global statistics are aggregated
   from partition level statistics. It will aggregate all statistics except number of
   distinct values for columns and number of distinct keys of indexes.
   The existing histograms of the columns at the table level
   are also aggregated.The global statistics are  gathered
   (i.e., going back to GLOBAL AND PARTITION behaviour)
   if partname argument is null or if the aggregation cannot be done
   e.g., statistics of one of the partitions is missing.

   This option is useful when you collect statistics for a new partition added
   into a range partitioned table (for example, a table  partitioned by month).
   The new data in the partition makes the global statistics stale (especially
   the min/max values of the partitioning column). This stale global statistics
   may cause suboptimal plans.  In this scenario, users can collect statistics
   for the newly added partition with 'APPROX_GLOBAL AND PARTITION'
   option so that the global statistics will reflect the newly added range.
   This option will take less time than 'GLOBAL AND PARTITION' option since the
   global statistics are aggregated from underlying partition level statistics.
   Note that, if you are using APPROX_GLOBAL AND PARTITION,
   you still  need to collect global statistics (with granularity = 'GLOBAL' option)
   when there is substantial amount of change at the table level.
   For example you added 10% more data to the table.  This is needed to get the
   correct number of distinct values/keys statistic at table level.

To illustrate how it works, let’s have a look to an example…

Create a range-partitioned table with a local index, load some data and gather object statistics.

SQL> CREATE TABLE t (
  2    id NUMBER,
  3    n NUMBER,
  4    d DATE,
  5    pad VARCHAR2(4000)
  6  )
  7  PARTITION BY RANGE (d) (
  8    PARTITION t_jan_2008 VALUES LESS THAN (to_date('2008-02-01','yyyy-mm-dd'))
  9  )
 10  NOLOGGING;

SQL> CREATE INDEX i ON t(n) LOCAL;

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 1000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,11) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-01-01','yyyy-mm-dd')+mod(rownum,31),
  9         rpad('*',100,'*')
 10  FROM t1000, t1000
 11  WHERE rownum <= 500000;

500000 rows created.

SQL> COMMIT;

SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      estimate_percent => 100,
  6      granularity => 'global and partition',
  7      method_opt => 'for olumns size 1 id, d, pad, n size 254',
  8      cascade => TRUE
  9    );
 10  END;
 11  /

Add a new partition and load data into it.

SQL> ALTER TABLE t
  2  ADD PARTITION t_feb_2008 VALUES LESS THAN (to_date('2008-03-01','yyyy-mm-dd'));

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 2000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,12) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-02-01','yyyy-mm-dd')+mod(rownum,29),
  9         rpad('*',200,'*')
 10  FROM t1000, t1000;

1000000 rows created.

SQL> COMMIT;

Gather object statistics (notice that the new value for the parameter GRANULARITY is used).

SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      partname => 't_feb_2008',
  6      estimate_percent => 100,
  7      granularity => 'approx_global and partition',
  8      method_opt => 'for columns size 1 id, d, pad, n size 254',
  9      cascade => TRUE
 10    );
 11  END;
 12  /

Compare the current object statistics with the previous ones.

SQL> SELECT *
  2  FROM table(dbms_stats.diff_table_stats_in_history(
  3               ownname => user,
  4               tabname => 't',
  5               time1 => localtimestamp,
  6               time2 => localtimestamp-to_dsinterval('0 00:00:15'),
  7               pctthreshold => 0
  8             ));

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : T
OWNER         : OPS$CHA
SOURCE A      : Statistics as of 21-SEP-08 06.39.37.597595 PM +02:00
SOURCE B      : Statistics as of 21-SEP-08 06.39.22.597595 PM +02:00
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

T                           T   A   1500000    40091      183        500000
                                B   500000     8614       116        500000
T_FEB_2008                  P   A   1000000    31477      216        1000000
                                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

D               A   31      .032258064 NO   0       8    786C0 786C0 1500000
                B   31      .032258064 NO   0       8    786C0 786C0 500000
ID              A   500000  .000002    NO   0       6    C4020 C404  1500000
                B   500000  .000002    NO   0       6    C4020 C4023 500000
N               A   11      .000001998 YES  749500  2    80    C10C  750500
                B   11      .000001998 YES  249750  2    80    C10B  250250
PAD             A   1       1          NO   0       168  2A2A2 2A2A2 1500000
                B   1       1          NO   0       101  2A2A2 2A2A2 500000

                              PARTITION: T_FEB_2008
                              .....................

D               A   29      .034482758 NO   0       8    786C0 786C0 1000000
                B   NO_STATS
ID              A   1000000 .000001    NO   0       6    C4030 C404  1000000
                B   NO_STATS
N               A   12      .000000999 YES  499750  2    80    C10C  500250
                B   NO_STATS
PAD             A   1       1          NO   0       201  2A2A2 2A2A2 1000000
                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

INDEX / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME      TYP SRC ROWS    LEAFBLK DISTKEY LF/KY DB/KY CLF     LVL SAMPSIZ
...............................................................................


                                    INDEX: I
                                    ........

I               I   A   750500  1221    12      101   3332  39991   2   750500
                    B   250250  407     11      37    778   8565    1   250250
T_FEB_2008      P   A   500250  814     12      67    2618  31426   2   500250
                    B   NO_STATS

A few remarks about the number of distinct keys…

As documented, the number of distinct values at table level has not been updated. I do understand that aggregating them is not possible. However, IMHO, it should be quite easy to do a kind of sanity check and to set the number of distinct values at the table level according to the highest number found at partition level. For example, in the previous case, the new partition contains 1M distinct values for the column ID. Hence, the number of distinct values for that column at the table level should be at least 1M.

Interestingly, the number of distinct values at the index level is correct. The reason is quite simple… they do gather them as usual (I checked that with SQL trace). In other words, the new value for the parameter GRANULARITY is only valid for tables. This is also the case when the procedure GATHER_INDEX_STATS is used. In other words, the new value for the parameter granularity is accepted but ignored!

What really puzzle me is why such a feature has been implemented in first place. In fact, as of Oracle Database 11g, with an incremental gathering is possible to have better statistics than with this feature.

Any suggestion/thought/opinion is welcome…

4 Comments

Greg Rahn

7 October 2008

This was originally part of patch 6526370. 11.1.0.7 was the current open branch and thus it was checked in and then backported for 10g use. It really has little value in 11g which contains the new incremental stats for partitioned tables.

Christian Antognini

8 October 2008

Hi Greg

Thank you for the information. Somehow, in 10g it could make sense to have such a feature. Since the bug is not public in Metalink, I was not able to see what the original problem was and, therefore, I was puzzled.

Cheers,
Chris

Pratik Mehta

19 February 2014

Nice article Christian. BTW, you have a valid question. I was also amused that in our datawarehouse which uses partitioning features then which granularity am I supposed to use when?!

Once more question I have. I have heard several DBA comment on not take get the stats updated for certain tables/queries (i.e. stop gathering stats for selected tables). They have to say that this is for better performance of the query or else its execution plan will change and performance may get affected. Now what amazes me is that stats are to be updated for better decisions of optimizer then why should one be worried on updating it ?

Awaiting for your comment…

Regards,
Pratik Mehta

- Christian Antognini
  
  19 February 2014
  
  Hi
  
  Stopping gathering is an option. But, IMO, it’s a more difficult path to take. Hence, for simplicity, I advise to keep the statistics up to date.
  Note that execution plan can change also if the statistics doesn’t change. Hence, if you don’t know what you are doing (complexity…), not gathering them doesn’t solve any problem…
  
  HTH
  Chris

4 Comments

Leave a Reply Cancel reply