The short answer to this question is “yes”, it does. Unfortunately, the distribution costs are not externalized through the execution plans and, as a result, this limitation (yes, it is really a limitation in the current implementation, not a bug) confuses everyone that carefully look at the information provided in an execution plan of a SQL statement executed in parallel. Hence, let’s remove some confusion…

To illustrate what the problem is, let’s have a look to a simple query that joins two tables:

SELECT * FROM master m JOIN detail d ON (m.id = d.id)

Now, let’s have a look at two parallel executions. If the two tables are equipartitioned, the following execution plan (which takes advantage of partition-wise join) is probably the most effective for such a query. Note that thanks to the partition-wise join not only there is a single set of parallel slaves (Q1,00), but, in addition, the parallel slaves do not communicate with each other (they only communicate with the query coordinator). As a result, the communication costs are equal to zero (this is because the query optimizer does not compute the costs of the communication towards the query coordinator).

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    16G| 162524  (1)|        |      |            |
|   1 |  PX COORDINATOR         |          |       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |    16G| 162524  (1)|  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL|          |    16G| 162524  (1)|  Q1,00 | PCWC |            |
|   4 |     HASH JOIN           |          |    16G| 162524  (1)|  Q1,00 | PCWP |            |
|   5 |      TABLE ACCESS FULL  | MASTER   |   125M|   1422  (1)|  Q1,00 | PCWP |            |
|   6 |      TABLE ACCESS FULL  | DETAIL   |    15G| 161052  (1)|  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------

If the two tables are not equipartitioned, the following execution plan might be chosen by the query optimizer. Since it does not take advantage of a partition-wise join, several set of parallel slaves are used. The first one (Q1,00) scans the MASTER table, the second one (Q1,01) scans the DETAIL table, and both of them send the data to the third one (Q1,02) that performs the join of the two tables and sends the data to the query coordinator. Since all data (about 15GB; yes, the estimations are good) is sent through the PX channels, the cost should not be zero. However, as you can see, the cost is exactly the same as the one of the previous execution plan.

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    16G| 162524  (1)|        |      |            |
|   1 |  PX COORDINATOR         |          |       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 |    16G| 162524  (1)|  Q1,02 | P->S | QC (RAND)  |
|   3 |    HASH JOIN BUFFERED   |          |    16G| 162524  (1)|  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE          |          |   125M|   1422  (1)|  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 |   125M|   1422  (1)|  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          |   125M|   1422  (1)|  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| MASTER   |   125M|   1422  (1)|  Q1,00 | PCWP |            |
|   8 |     PX RECEIVE          |          |    15G| 161052  (1)|  Q1,02 | PCWP |            |
|   9 |      PX SEND HASH       | :TQ10001 |    15G| 161052  (1)|  Q1,01 | P->P | HASH       |
|  10 |       PX BLOCK ITERATOR |          |    15G| 161052  (1)|  Q1,01 | PCWC |            |
|  11 |        TABLE ACCESS FULL| DETAIL   |    15G| 161052  (1)|  Q1,01 | PCWP |            |
----------------------------------------------------------------------------------------------

For completeness, let’s compare the cost of several distribution methods (“none-none” is the one of the first execution plan above, “hash-hash” of the second one). As you can see the cost is always the same!

SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-none' FOR SELECT /*+ pq_distribute(d none none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'hash-hash' FOR SELECT /*+ pq_distribute(d hash hash) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'broadcast-none' FOR SELECT /*+ pq_distribute(d broadcast none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-broadcast' FOR SELECT /*+ pq_distribute(d none broadcast) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'partition-none' FOR SELECT /*+ pq_distribute(d partition none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-partition' FOR SELECT /*+ pq_distribute(d none partition) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> SELECT statement_id, cost FROM plan_table WHERE id = 0;

STATEMENT_ID                         COST
------------------------------ ----------
none-none                          162524
hash-hash                          162524
broadcast-none                     162524
none-broadcast                     162524
partition-none                     162524
none-partition                     162524

As I wrote before, the problem is not that the costs are not computed. The problem is that they are not externalized. In fact, by giving a look to a trace file generated through the event 10053 the costs are available. Here’s the relevant part (the lines starting with “---- cost” contain the most important information). As you can see there are two costs associated with every distribution method: one with the distribution costs (w/ dist) and one without them (w/o dist).

Enumerating distribution method for join between M[MASTER] and D[DETAIL]
-- Using join method #Hash Join:
---- cost NONE = 0.00
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 513  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162524.05 (w/ dist)
---- cost VALUE = 278.52
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 2  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162802.57 (w/ dist)
---- cost PARTITION-RIGHT = 271.40
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 576  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162795.46 (w/ dist)
---- cost PARTITION-LEFT = 7.12
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 544  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162531.17 (w/ dist)
---- cost BROADCAST-RIGHT = 920.78
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 8  #groups: 4
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162755.25 (w/ dist)
---- cost BROADCAST-LEFT = 7.22
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 16  #groups: 4
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162526.86 (w/ dist)

Since the cost are (correctly) computed, the query optimizer is able to choose the optimal plan. However, it would be nice to have the actual costs in the execution plans.