Aug 09 2010

Exadata Storage Server and the Query Optimizer – Part 4

Tag: 11gR2, Bug, Exadata, Query OptimizerChristian Antognini @ 4:37 pm

When I started writing the series of posts about Exadata Storage Server and the query optimizer, I didn’t expect to write more than three posts (part 1, part 2, part 3). Of course, things change. Hence, here is part 4 to cover a couple of things that I learned in the next couple of months.

In part 2 I pointed out that Oracle Database is not able to offload the processing of all datetime functions. This fact, to my surprise, was also referenced by Netezza in a recent paper entitled Oracle Exadata and Netezza TwinFin Compared. The essential thing to understand is that this limitation is due to bug 9682721. The fix is expected to be part of 11.2.0.2. According to my test cases (that Greg Rahn was so kind to execute against an early release of 11.2.0.2), offloading works correctly for all datetime functions but for the following three predicates.

  • months_between(d,sysdate) = 0
  • months_between(d,current_date) = 0
  • months_between(d,to_date(‘01-01-2010′,’DD-MM-YYYY’)) = 0

Note that the MONTHS_BETWEEN function can basically be offloaded. The problem in these cases is that the offloading does not work when, for example, SYSDATE is used as parameter.

To have a full list of the functions supporting offloading, the “official reference” is available through the V$SQLFN_METADATA view. Here is a simple query to summarize the current situation.

SQL> SELECT offloadable, count(DISTINCT name)
  2  FROM v$sqlfn_metadata
  3  GROUP BY offloadable;

OFF COUNT(DISTINCTNAME)
--- -------------------
NO                  511
YES                 319

Another thing I would like to point out about offloading is that the feature can be controlled through the CELL_OFFLOAD_PROCESSING initialization parameter. By default it is set to TRUE and, therefore, offloading is used whenever possible. It goes without saying that offloading is disabled when it is set to FALSE. Note that it can not only be set at the instance and session level, but also at the statement level. The following example illustrate this (notice that only the first query uses offloading).

SQL> ALTER SESSION SET cell_offload_plan_display = always;

SQL> ALTER SESSION SET cell_offload_processing = true;

SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE id = 101;

SQL> SELECT * FROM table(dbms_xplan.display(format=>'basic +predicate'));

PLAN_TABLE_OUTPUT
---------------------------------------------------
Plan hash value: 3557914527

-------------------------------------------
| Id  | Operation                  | Name |
-------------------------------------------
|   0 | SELECT STATEMENT           |      |
|   1 |  PARTITION RANGE ALL       |      |
|*  2 |   TABLE ACCESS STORAGE FULL| T    |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - storage("ID"=101)
       filter("ID"=101)

SQL> EXPLAIN PLAN FOR SELECT /*+ opt_param('cell_offload_processing' 'false') */  * FROM t WHERE id = 101;

SQL> SELECT * FROM table(dbms_xplan.display(format=>'basic +predicate'));

PLAN_TABLE_OUTPUT
---------------------------------------------------
Plan hash value: 3557914527

-------------------------------------------
| Id  | Operation                  | Name |
-------------------------------------------
|   0 | SELECT STATEMENT           |      |
|   1 |  PARTITION RANGE ALL       |      |
|*  2 |   TABLE ACCESS STORAGE FULL| T    |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"=101)

Another initialization parameter that controls offloading is CELL_OFFLOAD_DECRYPTION. This parameter is relevant for encrypted tablespaces only. With it you can specify whether the keys necessary to decrypt the data can be shipped to the cells. By default it is set to TRUE and, therefore, the keys are shipped. For security reasons you might want to set it to FALSE and disable offloading for encrypted tablespaces. Note that this parameter can only be changed at the instance level.


May 20 2010

How Good Are the Values Returned by DBMS_COMPRESSION.GET_COMPRESSION_RATIO?

Tag: 11gR2, Bug, ExadataChristian Antognini @ 1:00 am

According to the documentation the GET_COMPRESSION_RATIO procedure of the DBMS_COMPRESSION package can be used to assess the impact of different compression options for a given table. In other words, it allows us to find out the expected compression ratio for a given set of data without having to really create a compressed table. The question is: how good are the values it returns?

Before answering this question it is essential to point out two things. First, this package is available as of 11.2 only. Second, it can be used to find out (and compare) the expected compression ratio of OLTP compression as well as Exadata Hybrid Columnar Compression (EHCC). Note that for getting values about EHCC it is not required to actually have access to an Exadata Storage Server. How is that possible?

To answer this second question, let’s describe how such an analysis is performed…

The following PL/SQL block shows how to start an analysis for all uncompressed tables of the current user.

DECLARE
  l_blkcnt_cmp       BINARY_INTEGER;
  l_blkcnt_uncmp     BINARY_INTEGER;
  l_row_cmp          BINARY_INTEGER;
  l_row_uncmp        BINARY_INTEGER;
  l_cmp_ratio        NUMBER;
  l_comptype_str     VARCHAR2(100);
BEGIN
  FOR i IN (SELECT table_name
            FROM user_tables
            WHERE compression = 'DISABLED'
            ORDER BY table_name)
  LOOP
    FOR j IN 1..5
    LOOP
      dbms_compression.get_compression_ratio(
        -- input parameters
        scratchtbsname   => 'SCRATCH',       -- scratch tablespace
        ownname          => user,            -- owner of the table
        tabname          => i.table_name,    -- table name
        partname         => NULL,            -- partition name
        comptype         => power(2,j),      -- compression algorithm
        -- output parameters
        blkcnt_cmp       => l_blkcnt_cmp,    -- number of compressed blocks
        blkcnt_uncmp     => l_blkcnt_uncmp,  -- number of uncompressed blocks
        row_cmp          => l_row_cmp,       -- number of rows in a compressed block
        row_uncmp        => l_row_uncmp,     -- number of rows in an uncompressed block
        cmp_ratio        => l_cmp_ratio,     -- compression ratio
        comptype_str     => l_comptype_str   -- compression type
      );
      dbms_output.put_line(i.table_name||' - '||'type: '||l_comptype_str||' ratio: '||to_char(l_cmp_ratio,'99.999'));
    END LOOP;
  END LOOP;
END;

As you can see the idea is that we provide to the package a table and, based on its data, the package estimates the compression ratio that can be achieved with the specified algorithm. Notice that as input parameter a scratch tablespace is also specified. This is necessary because the package, to estimate the output parameters, actually creates a compressed table. If you trace an execution you would see two CREATE TABLE statements.

  • The first one creates a scratch table based on a sample of the content of the input table (the sampling percentage is chosen based on the size of the table; simply put the sampling percentage is inversely proportional to the table size):

CREATE TABLE dbms_tabcomp_temp_uncmp
TABLESPACE <tablespace name>
AS
SELECT * FROM <table name> SAMPLE BLOCK( <sampling percentage> )

  • The second one creates another scratch table where the data is compressed according to the specified compression algorithm (in this case COMPRESS FOR QUERY HIGH):

CREATE TABLE dbms_tabcomp_temp_uncmp
ORGANIZATION HEAP
TABLESPACE <tablespace name>
COMPRESS FOR QUERY HIGH
AS
SELECT * FROM dbms_tabcomp_temp_uncmp

In other words the estimations are based on data that is actually compressed. Not on some heuristics…

The essential thing to note is that the second CREATE TABLE statement cannot be directly executed without an Exadata Storage Server. In fact, if you try to run it, the database engine raises an ORA-64307 (hybrid columnar compression is only supported in tablespaces residing on Exadata storage). Based on that observation it is sensible to say that in the current implementation there is a software lock that prevents us from using EHCC without an Exadata Storage Server. But, as just described, that lock can be overridden by the DBMS_COMPRESSION package.

Now, back to the initial question: how good are the values it returns?

To answer this question I installed a TPC-H schema (scale factor 10) and compared the estimations of the package with the actual values. For that purpose I created one table with each type of compression.

  • The compression ratios based on the BLOCKS column of the USER_TABLES view are the following:

SQL> SELECT comp.table_name, round(uncomp.blocks/comp.blocks,3) AS ratio
  2  FROM user_tables comp, user_tables uncomp
  3  WHERE comp.compression = 'ENABLED'
  4  AND uncomp.compression = 'DISABLED'
  5  AND comp.table_name LIKE uncomp.table_name||'/_%' ESCAPE '/'
  6  ORDER BY uncomp.table_name, nullif(comp.compress_for,'OLTP') DESC;

TABLE_NAME                          RATIO
------------------------------ ----------
CUSTOMER_OLTP                       1.041
CUSTOMER_QUERY_LOW                  2.220
CUSTOMER_QUERY_HIGH                 3.708
CUSTOMER_ARCHIVE_LOW                3.837
CUSTOMER_ARCHIVE_HIGH               4.432
LINEITEM_OLTP                       1.487
LINEITEM_QUERY_LOW                  3.034
LINEITEM_QUERY_HIGH                 4.912
LINEITEM_ARCHIVE_LOW                5.144
LINEITEM_ARCHIVE_HIGH               6.704
ORDERS_OLTP                         1.149
ORDERS_QUERY_LOW                    2.887
ORDERS_QUERY_HIGH                   5.038
ORDERS_ARCHIVE_LOW                  5.305
ORDERS_ARCHIVE_HIGH                 6.704
PART_OLTP                           1.328
PART_QUERY_LOW                      3.307
PART_QUERY_HIGH                     6.718
PART_ARCHIVE_LOW                    7.296
PART_ARCHIVE_HIGH                  11.049
PARTSUPP_OLTP                       0.991
PARTSUPP_QUERY_LOW                  2.892
PARTSUPP_QUERY_HIGH                 5.428
PARTSUPP_ARCHIVE_LOW                5.659
PARTSUPP_ARCHIVE_HIGH               8.071

  • The output of the PL/SQL block shown above is the following (be careful that the order is slightly different):

CUSTOMER - type: "Compress For OLTP" ratio:   1.041
CUSTOMER - type: "Compress For Query High" ratio:   3.742
CUSTOMER - type: "Compress For Query Low" ratio:   2.228
CUSTOMER - type: "Compress For Archive High" ratio:   4.460
CUSTOMER - type: "Compress For Archive Low" ratio:   3.894
ORDERS - type: "Compress For OLTP" ratio:   1.149
ORDERS - type: "Compress For Query High" ratio:   5.048
ORDERS - type: "Compress For Query Low" ratio:   2.886
ORDERS - type: "Compress For Archive High" ratio:   6.651
ORDERS - type: "Compress For Archive Low" ratio:   5.325
PART - type: "Compress For OLTP" ratio:   1.328
PART - type: "Compress For Query High" ratio:   6.778
PART - type: "Compress For Query Low" ratio:   3.338
PART - type: "Compress For Archive High" ratio:  11.126
PART - type: "Compress For Archive Low" ratio:   7.343
PARTSUPP - type: "Compress For OLTP" ratio:    .990
PARTSUPP - type: "Compress For Query High" ratio:   5.451
PARTSUPP - type: "Compress For Query Low" ratio:   2.893
PARTSUPP - type: "Compress For Archive High" ratio:   8.051
PARTSUPP - type: "Compress For Archive Low" ratio:   5.657

By comparing the estimations with the actual values it is possible to say that the value returned by the DBMS_COMPRESSION package are very accurate. Having so accurate estimations is a good thing because you probably want to know how much space you can save before doing a major reorganization of large tables as well as before buying an Exadata Storage Server just to test how effective the compression of EHCC is.

You might ask why the second output does not provide information about the LINEITEM table. The problem is that the package was not able to process it. In fact, the following error was raised:

ORA-00942: table or view does not exist
ORA-06512: at "SYS.PRVT_COMPRESSION", line 459
ORA-30562: SAMPLE percentage must be in the range [0.000001,100)
ORA-06512: at "SYS.DBMS_COMPRESSION", line 214
ORA-06512: at line 16

The problem is (probably) due to a rounding performed during the selection of the sampling percentage. In fact, for the LINEITEM table (which is the bigger one), the following CREATE TABLE statement was generated (notice that the sampling percentage is set to 0):

CREATE TABLE dbms_tabcomp_temp_uncmp
TABLESPACE scratch
AS
SELECT *
FROM lineitem SAMPLE BLOCK( 0)

I was not able to find a bug in MOS, but it is definitely one.


May 05 2010

Exadata Storage Server and the Query Optimizer – Part 3

Tag: 11gR1, 11gR2, Exadata, Parallel Processing, Query OptimizerChristian Antognini @ 3:55 pm

In the first and second post of this series I shared with you some basics about smart scan and gave some details about projection and restriction. The aim of this post is to cover the third basic technique: join filtering.

Join filtering is not something specific to the Exadata Storage Server. In fact, it is an Enterprise Edition feature available since Oracle Database 10g Release 2. Simply put, it is used to reduce data communication between slave processes in parallel joins. For more information about it I suggest you to read a paper I published in June 2008 entitled Bloom Filters. In it I describe not only what bloom filters are, but also how Oracle Database uses them. And, one of the use cases is join filtering.

What I want to show here is how Exadata Storage Server is able to take advantage of join filtering. For that purpose let’s have a look to the following execution plan:

-----------------------------------------------------
| Id  | Operation                        | Name     |
-----------------------------------------------------
|   0 | SELECT STATEMENT                 |          |
|   1 |  PX COORDINATOR                  |          |
|   2 |   PX SEND QC (RANDOM)            | :TQ10002 |
|*  3 |    HASH JOIN BUFFERED            |          |
|   4 |     JOIN FILTER CREATE           | :BF0000  |
|   5 |      PX RECEIVE                  |          |
|   6 |       PX SEND HASH               | :TQ10000 |
|   7 |        PX BLOCK ITERATOR         |          |
|*  8 |         TABLE ACCESS STORAGE FULL| T1       |
|   9 |     PX RECEIVE                   |          |
|  10 |      PX SEND HASH                | :TQ10001 |
|  11 |       JOIN FILTER USE            | :BF0000  |
|  12 |        PX BLOCK ITERATOR         |          |
|* 13 |         TABLE ACCESS STORAGE FULL| T2       |
-----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"="T2"."ID")
   8 - storage("T1"."MOD"=42)
       filter("T1"."MOD"=42)
  13 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))

As you can see, join filtering is used. In fact, the operation 4 (JOIN FILTER CREATE) builds a bloom filter that, later on, is used by operation 11 (JOIN FILTER USE) to filter out the data that does not fulfill the join condition. However, the most important thing to notice in this execution plan is the STORAGE predicate applied by the operation 13. According to it the bloom filter is applied not only by the operation 11, but also by the operation 13. And, since the operation 13 is a smart scan operation, the STORAGE predicate is evaluated by the cells. This means that the reduction of data communication does not only take place between slave processes, but also between the cells and the database instances. Remarkable!


May 03 2010

Exadata Storage Server and the Query Optimizer – Part 2

Tag: 11gR1, 11gR2, Exadata, Query OptimizerChristian Antognini @ 6:48 am

In the first post of this series I shared with you some basics about smart scan and gave some details about projection. The aim of this post is to cover the second basic technique: restriction. Simply put, the aim of this technique is to offload to a cell the processing of predicates found in the WHERE clause. And, as a result, to reduce the amount of data that a cell has to send back to a database instance.

To check whether offloading is used for a given execution plan, you can have a look to the predicate information section of the DBMS_XPLAN output. If offloading takes place, a restriction based on a STORAGE predicate is displayed. Here an example (in this case the restriction “N=1” is offloaded to a cell):

------------------------------------------
| Id  | Operation                 | Name |
------------------------------------------
|   0 | SELECT STATEMENT          |      |
|*  1 |  TABLE ACCESS STORAGE FULL| T    |
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - storage("N"=1)
       filter("N"=1)

As you can see in the previous example, it is interesting to notice that the STORAGE predicate is used along with a FILTER predicate based on the same restriction. Honestly, I don’t know why this is necessary. And, I guess, it’s not because of false positives. In fact the cell should be able to correctly apply this kind of restrictions.

It is essential to point out that offloading is only applied to predicates that in a non-Exadata system would be processed as FILTER predicates. In fact, when an ACCESS predicate has to be applied (e.g. an index look-up is performed), smart scan cannot be used.

Now that we have seen how a simple predicate is offloaded to a cell, it is interesting to ask ourselves the following questions: Which predicates are supported by a cell? What happens when predicates contain functions or expressions?

To answer this question I tested about a hundred single-row functions to find out whether all of them can be offloaded. Here is a summary of what I observed:

  • Numeric functions: they can all be offloaded with a single exception, the WIDTH_BUCKET function. For example the predicate “width_bucket(n,1,10,100) = 1” is not offloaded.
  • Character functions returning character values: they can all be offloaded.
  • Character functions returning number values: they can all be offloaded.
  • Datetime functions: the offloading of these functions is, in my opinion, not very consistent. On the one hand, only the DATE datatype seems to be supported. In fact, when a TIMESTAMP datatype is involved, offloading almost never happens. And, be careful, this is also true for simple predicates like “t = systimestamp” (note that “t” is a column of TIMESTAMP datatype). On the other hand, while predicates like “d = sysdate” (note that “d” is a column of DATE datatype) and “add_months(d,1) = to_date(‘01-01-2010′,’DD-MM-YYYY’)” can be offloaded, something like “add_months(d,1) = sysdate” cannot. What I observed is that basically every datetime function can be offloaded provided that it is not used along with SYSDATE or CURRENT_DATE.
  • NULL-related functions: they can all be offloaded.

According to my tests, except for datetime functions, offloading can take place most of the time. This is a good thing. However, the limitations related to datetime functions are tough and, therefore, I expect that Oracle addresses this issue very soon.

Addenda 2010-08-09: have a look to this post for further discussion of the problem related to datetime functions.


Apr 28 2010

Exadata Storage Server and the Query Optimizer – Part 1

Tag: 11gR1, 11gR2, Exadata, Query OptimizerChristian Antognini @ 2:38 pm

Even though the utilization of an Exadata Storage Server should be transparent for the query optimizer, when you look at execution plans generated in an environment using it you might notice slight differences. The purpose of this series of post is to summarize the differences I was able to observe.

Disclaimer: I do not have access to an Exadata Storage Server (cell, from now on…). All tests I did, i.e. generation of execution plans, were performed with an 11gR2 database using “regular” storage. All I did is to set the CELL_OFFLOAD_PLAN_DISPLAY initialization parameter to ALWAYS. As a result, this post contains untested assumptions based on what I was able to observe in such an environment. It goes without saying that any correction is highly welcome!

Smart scan is one of the key performance features available only when a cell is used. Its purpose is to offload to the storage layer part of the work that would usually be performed by a database instance. As a result, not only the amount of work performed by a database instance might be (much) lower, but also the amount of data transferred from a cell to a database instance might be strongly reduced. When smart scan is used a cell, instead of sending back to the database engine “regular” database blocks (as a “regular” storage layer would do), sends back packets of data containing only the relevant information (I guess they are similar to the messages exchanged during PX operations).

Three are the basic smart scan techniques: projection, restriction and join filtering. As of Oracle Database 11g Release 2 three additional improvements are available: storage indexing, hybrid columnar compression and encryption. But, honestly, I consider them “simple” enhancements (that might have a major impact, though) of the previous ones. In other words, they add nothing to the basic concepts.

In this first post I cover projection. In the next posts I will cover the other two.

Whenever a query does not reference all columns belonging to the tables referenced in the FROM clause AND smart scan is used, it makes no sense that a cell sends back to a database instance the data belonging to the unreferenced columns. Since it makes no sense, it’s not done. Instead, a cell is able to extract only the data of the referenced columns. In this way, especially when queries reference few columns from wide tables, the amount of data received by a database instance is much smaller. As far as I know this technique is not explicitly externalized in execution plans (I mean, you don’t see which columns are actually sent back to the database instance). But, I guess, if you see one of the new operations added to support Exadata (they all contains the keyword “STORAGE” in them), you take advantage of this technique. Here is an example with a FTS (notice the keyword “STORAGE”):

------------------------------------------
| Id  | Operation                 | Name |
------------------------------------------
|   0 | SELECT STATEMENT          |      |
|   1 |  TABLE ACCESS STORAGE FULL| T    |
------------------------------------------

The full list of new “operations” that I was able to observe is the following:

  • STORAGE FULL
  • STORAGE SAMPLE
  • STORAGE SAMPLE BY ROWID RANGE
  • STORAGE BY ROWID RANGE
  • STORAGE SAMPLE FAST FULL SCAN
  • STORAGE FAST FULL SCAN
  • STORAGE FULL SCAN
  • STORAGE RANGE SCAN
  • STORAGE FULL SCAN (MIN/MAX)

So, it seems that for the most common access paths there is a “smart scan version”.