Striving for Optimal Performance
  • Blog
    • Archive
    • Categories
    • Search
  • Troubleshooting Oracle Performance
    • Description
    • Structure
    • Table of Contents
    • Forewords
    • Reviews
    • Downloadable Files
    • Addenda and Errata
    • First Edition
  • Public Appearances
    • Past Public Appearances
  • Contact
  • About

Does the Query Optimizer Cost PX Distribution Methods?

26 January 2010 13 Comments Written by Christian Antognini

The short answer to this question is “yes”, it does. Unfortunately, the distribution costs are not externalized through the execution plans and, as a result, this limitation (yes, it is really a limitation in the current implementation, not a bug) confuses everyone that carefully look at the information provided in an execution plan of a SQL statement executed in parallel. Hence, let’s remove some confusion…

To illustrate what the problem is, let’s have a look to a simple query that joins two tables:

SELECT * FROM master m JOIN detail d ON (m.id = d.id)

Now, let’s have a look at two parallel executions. If the two tables are equipartitioned, the following execution plan (which takes advantage of partition-wise join) is probably the most effective for such a query. Note that thanks to the partition-wise join not only there is a single set of parallel slaves (Q1,00), but, in addition, the parallel slaves do not communicate with each other (they only communicate with the query coordinator). As a result, the communication costs are equal to zero (this is because the query optimizer does not compute the costs of the communication towards the query coordinator).

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    16G| 162524  (1)|        |      |            |
|   1 |  PX COORDINATOR         |          |       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |    16G| 162524  (1)|  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL|          |    16G| 162524  (1)|  Q1,00 | PCWC |            |
|   4 |     HASH JOIN           |          |    16G| 162524  (1)|  Q1,00 | PCWP |            |
|   5 |      TABLE ACCESS FULL  | MASTER   |   125M|   1422  (1)|  Q1,00 | PCWP |            |
|   6 |      TABLE ACCESS FULL  | DETAIL   |    15G| 161052  (1)|  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------

If the two tables are not equipartitioned, the following execution plan might be chosen by the query optimizer. Since it does not take advantage of a partition-wise join, several set of parallel slaves are used. The first one (Q1,00) scans the MASTER table, the second one (Q1,01) scans the DETAIL table, and both of them send the data to the third one (Q1,02) that performs the join of the two tables and sends the data to the query coordinator. Since all data (about 15GB; yes, the estimations are good) is sent through the PX channels, the cost should not be zero. However, as you can see, the cost is exactly the same as the one of the previous execution plan.

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    16G| 162524  (1)|        |      |            |
|   1 |  PX COORDINATOR         |          |       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 |    16G| 162524  (1)|  Q1,02 | P->S | QC (RAND)  |
|   3 |    HASH JOIN BUFFERED   |          |    16G| 162524  (1)|  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE          |          |   125M|   1422  (1)|  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 |   125M|   1422  (1)|  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          |   125M|   1422  (1)|  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| MASTER   |   125M|   1422  (1)|  Q1,00 | PCWP |            |
|   8 |     PX RECEIVE          |          |    15G| 161052  (1)|  Q1,02 | PCWP |            |
|   9 |      PX SEND HASH       | :TQ10001 |    15G| 161052  (1)|  Q1,01 | P->P | HASH       |
|  10 |       PX BLOCK ITERATOR |          |    15G| 161052  (1)|  Q1,01 | PCWC |            |
|  11 |        TABLE ACCESS FULL| DETAIL   |    15G| 161052  (1)|  Q1,01 | PCWP |            |
----------------------------------------------------------------------------------------------

For completeness, let’s compare the cost of several distribution methods (“none-none” is the one of the first execution plan above, “hash-hash” of the second one). As you can see the cost is always the same!

SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-none' FOR SELECT /*+ pq_distribute(d none none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'hash-hash' FOR SELECT /*+ pq_distribute(d hash hash) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'broadcast-none' FOR SELECT /*+ pq_distribute(d broadcast none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-broadcast' FOR SELECT /*+ pq_distribute(d none broadcast) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'partition-none' FOR SELECT /*+ pq_distribute(d partition none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-partition' FOR SELECT /*+ pq_distribute(d none partition) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> SELECT statement_id, cost FROM plan_table WHERE id = 0;

STATEMENT_ID                         COST
------------------------------ ----------
none-none                          162524
hash-hash                          162524
broadcast-none                     162524
none-broadcast                     162524
partition-none                     162524
none-partition                     162524

As I wrote before, the problem is not that the costs are not computed. The problem is that they are not externalized. In fact, by giving a look to a trace file generated through the event 10053 the costs are available. Here’s the relevant part (the lines starting with “---- cost” contain the most important information). As you can see there are two costs associated with every distribution method: one with the distribution costs (w/ dist) and one without them (w/o dist).

Enumerating distribution method for join between M[MASTER] and D[DETAIL]
-- Using join method #Hash Join:
---- cost NONE = 0.00
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 513  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162524.05 (w/ dist)
---- cost VALUE = 278.52
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 2  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162802.57 (w/ dist)
---- cost PARTITION-RIGHT = 271.40
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 576  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162795.46 (w/ dist)
---- cost PARTITION-LEFT = 7.12
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 544  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162531.17 (w/ dist)
---- cost BROADCAST-RIGHT = 920.78
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 8  #groups: 4
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162755.25 (w/ dist)
---- cost BROADCAST-LEFT = 7.22
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 16  #groups: 4
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162526.86 (w/ dist)

Since the cost are (correctly) computed, the query optimizer is able to choose the optimal plan. However, it would be nice to have the actual costs in the execution plans.

10gR1, 10gR2, 11gR1, 11gR2, 9iR2, Parallel Processing, Query Optimizer
Edition-Based Redefinition
The Art of Application Performance Testing

13 Comments

3 Pings/Trackbacks

  1. Tony Hasler Tony Hasler
    1 February 2010    

    Christian, thanks for this. I did a test joining two equi-partitioned tables on the partition column and the highly efficient “none-none” distribution method was not selected without a hint. In fact the CBO used “hash-hash”. Unfortunately, I seem to have lost the test case and can no longer reproduce it to get at the 10053 trace. Is there any way that “hash-hash” could be deemed “better” than “none-none” for such a query?

    Reply
    • Christian Antognini Christian Antognini
      1 February 2010    

      Hi Tony

      IIRC, when I saw similar (?) cases, the problem was “due” to partition pruning. In other words, mixing FPWJ and partition pruning is not always successful.
      For a join I see no case where a “none-none” might be worse than a “hash-hash”.

      Cheers,
      Chris

      Reply
  2. Greg Rahn Greg Rahn
    1 February 2010    

    @Tony

    In all likelihood your plan was not a FPWJ even though you joined on the partition keys. There would be no row redistribution (none-none) for a FPWJ plan. The DOP influences whether or not you get a FPWJ or whether there is redistribution.

    So the answer to your question is yes; if the DOP does not match the number of partitions needed to be joined for a FPWJ plan then you will see a row redistribution (hash-hash) plan.

    Reply
  3. Tony Hasler Tony Hasler
    3 February 2010    

    Christian, your comment reminded me of my test case. The two tables that I joined were both range partitioned by date and subpartitioned by hash on the join column. The query I ran selected a specific date in the WHERE clause. The hint was required to get the “none none” distribution mechanism despite that being demonstrably faster.
    Can you explain this or do I need to look at the 10053 trace?

    Thanks again.

    Reply
  4. Christian Antognini Christian Antognini
    3 February 2010    

    Hi Tony

    Sorry, I have no real explanation for that. It just matches something that I already observed several times… i.e. partition pruning does not always match well with PWJ. To understand what’s going on, you should have a look to the 10053 trace.

    Cheers,
    Chris

    Reply
  5. Hans-Peter Hans-Peter
    4 February 2010    

    Hi Christian,

    Can you point my to some documentation about the px-distribution methods?
    I searched for it but most performance tuning books do not go into detail about it.

    regards Hans-Peter

    Reply
  6. Christian Antognini Christian Antognini
    4 February 2010    

    Hi Hans-Peter

    Sorry, but I’m also not aware of a detailed description of the distribution methods. In my book, at page 493, I added some remarks about them.
    http://books.google.com/books?id=b3DIkYO2gBQC&lpg=PP1&dq=troubleshooting%20oracle%20performance&pg=PA493#v=onepage&q=&f=false

    HTH
    Chris

    Reply
  7. Tony Hasler Tony Hasler
    4 February 2010    

    The only stuff I have been able to find is in the documentation for the PQ_DISTRIBUTE hint but it is still quite useful. I would buy Christian’s book anyway :-)

    Reply
  8. Blogroll Report 22/01/2009 – 29/01/2010 « Coskan’s Approach to Oracle Blogroll Report 22/01/2009 – 29/01/2010 « Coskan’s Approach to Oracle
    21 February 2010    

    […] 19-How does CBO behave with different PX Distribution Methods ? Christian Antognini-Does the Query Optimizer Cost PX Distribution Methods? […]

  9. Michael Michael
    3 March 2010    

    I’ve seen problems with combined partitions and PWJ as in Tony Hasler’s test. In my case it was caused by bug 6476257.
    Another requirement was that default DOP of a table should match the number of subpartitions.

    Reply
    • Christian Antognini Christian Antognini
      3 March 2010    

      Hi Michael

      Thank for the interesting information. Unfortunately the base bug, 6714199, is not public. Anyway, this is an addition evidence that there are problems in this area.

      Cheers,
      Chris

      Reply
  10. Parallel query distribution methods « Tony’s Oracle Tips Parallel query distribution methods « Tony’s Oracle Tips
    10 March 2010    

    […] asked Christian about this and his full reply came in this blog. Christian was able to explain why the estimated costs in the execution plan were misleading and […]

  11. Divide to conquer: Parallel load distribution and balance | ocpdba oracle weblog Divide to conquer: Parallel load distribution and balance | ocpdba oracle weblog
    12 July 2011    

    […] The Cost of PX Distribution Method by Christian Antognini […]

  1. Blogroll Report 22/01/2009 – 29/01/2010 « Coskan’s Approach to Oracle on 21 February 2010 at 04:26
  2. Parallel query distribution methods « Tony’s Oracle Tips on 10 March 2010 at 17:47
  3. Divide to conquer: Parallel load distribution and balance | ocpdba oracle weblog on 12 July 2011 at 17:28

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.