Striving for Optimal Performance
  • Blog
    • Archive
    • Categories
    • Search
  • Troubleshooting Oracle Performance
    • Description
    • Structure
    • Table of Contents
    • Forewords
    • Reviews
    • Downloadable Files
    • Addenda and Errata
    • First Edition
  • Public Appearances
    • Past Public Appearances
  • Contact
  • About

Operation CONNECT BY WITH FILTERING

17 June 2008 13 Comments Written by Christian Antognini

It happened to me several times to being asked about the mysterious full table scan in CONNECT BY operations. In this post I would like to share with you some of the information I wrote about it in my book (pages 233 to 236) .

The operation CONNECT BY WITH FILTERING is used to process hierarchical queries. It is characterized by two child operations. The first one is used to get the root of the hierarchy, and the second one is executed once for each level in the hierarchy.

Here is a sample query and its execution plan. Note that the execution plan was generated on Oracle Database 11g (the reason will be explained later).

SELECT level, rpad('-',level-1,'-')||ename AS ename, prior ename AS manager
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr

---------------------------------------------------------------------
| Id  | Operation                     | Name      | Starts | A-Rows |
---------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING    |           |      1 |     14 |
|*  2 |   TABLE ACCESS FULL           | EMP       |      1 |      1 |
|   3 |   NESTED LOOPS                |           |      4 |     13 |
|   4 |    CONNECT BY PUMP            |           |      4 |     14 |
|   5 |    TABLE ACCESS BY INDEX ROWID| EMP       |     14 |     13 |
|*  6 |     INDEX RANGE SCAN          | EMP_MGR_I |     14 |     13 |
---------------------------------------------------------------------

   1 - access("MGR"=PRIOR "EMPNO")
   2 - filter("MGR" IS NULL)
   6 - access("MGR"=PRIOR "EMPNO")

To help you understand the execution plan with a hierarchical query more easily, it is useful to look at the data returned by the query as well:

     LEVEL ENAME      MANAGER
---------- ---------- ----------
         1 KING
         2 -JONES     KING
         3 --SCOTT    JONES
         4 ---ADAMS   SCOTT
         3 --FORD     JONES
         4 ---SMITH   FORD
         2 -BLAKE     KING
         3 --ALLEN    BLAKE
         3 --WARD     BLAKE
         3 --MARTIN   BLAKE
         3 --TURNER   BLAKE
         3 --JAMES    BLAKE
         2 -CLARK     KING
         3 --MILLER   CLARK

The execution plan carries out the operations as follows:

  1. Operation 1 has two children (2 and 3), and operation 2 is the first of them in ascending order. Therefore, the execution starts with operation 2.
  2. Operation 2 scans the table EMP, applies the filter predicate “MGR” IS NULL, and returns the root of the hierarchy to its parent operation (1).
  3. Operation 3 is the second child of operation 1. It is therefore executed for each level of the hierarchy—in this case, four times. The first child, operation 4, is executed, and for each row it returns, the inner loop (composed of operation 5 and its child operation 6) is executed once. Notice, as expected, the match between the column A-Rows of operation 4 with the column Starts of operations 5 and 6.
  4. For the first execution, operation 4 gets the root of the hierarchy through the operation CONNECT BY PUMP. In this case, there is a single row (KING) at level 1. With the value in the column mgr, operation 6 does a scan of the index EMP_MGR_I by applying the access predicate “MGR”=PRIOR “EMPNO”, extracts the rowids, and returns them to its parent operation (5). Operation 5 accesses the table EMP with the rowids and returns the rows to its parent operation (3).
  5. For the second execution of operation 4, everything works the same as for the first execution. The only difference is that the data from level 2 (JONES, BLAKE, and CLARK) is passed to operation 4 for the processing.
  6. For the third execution of operation 4, everything works like in the first one. The only difference is that level 3 data (SCOTT, FORD, ALLEN, WARD, MARTIN, TURNER, JAMES, and MILLER) is passed to operation 4 for the processing.
  7. For the fourth and last execution of operation 4, everything works like in the first one. The only difference is that level 4 data (ADAMS and SMITH) is passed to operation 4 for the processing.
  8. Operation 3 gets the rows passed from its children and returns them to its parent operation (1).
  9. Operation 1 applies the access predicate “MGR”=PRIOR “EMPNO” and sends the 14 rows to the caller.

The execution plan generated on Oracle Database 10g is slightly different. As can be seen, the operation CONNECT BY WITH FILTERING has a third child (operation 8). In this case, it was not executed, however. The value in the column Starts for operation 8 confirms this. Actually, the third child is executed only when the CONNECT BY operation uses temporary space. When that happens, performance might degrade considerably. This problem, which is fixed as of version 10.2.0.4, is known as bug 5065418.

---------------------------------------------------------------------
| Id  | Operation                     | Name      | Starts | A-Rows |
---------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING    |           |      1 |     14 |
|*  2 |   TABLE ACCESS FULL           | EMP       |      1 |      1 |
|   3 |   NESTED LOOPS                |           |      4 |     13 |
|   4 |    BUFFER SORT                |           |      4 |     14 |
|   5 |     CONNECT BY PUMP           |           |      4 |     14 |
|   6 |    TABLE ACCESS BY INDEX ROWID| EMP       |     14 |     13 |
|*  7 |     INDEX RANGE SCAN          | EMP_MGR_I |     14 |     13 |
|   8 |   TABLE ACCESS FULL           | EMP       |      0 |      0 |
---------------------------------------------------------------------
10gR1, 10gR2, 11gR1, Bug, Query Optimizer, TOP
Never Say Never
TOP Is At Last Available

13 Comments

3 Pings/Trackbacks

  1. Alex Shea Alex Shea
    2 October 2008    

    Christian, thanks for this helpful page. I was at my wit’s end trying to speed up an 86 line
    connect-by query that takes 3 hour to finish on a 10.2.0.1. THANKS for mentioning that it is a bug.

    Reply
  2. Rudy Rudy
    22 June 2009    

    Thank you for reporting this. I very likely hit this bug and verified it has been fixed in patch set 3.
    Bye :-)

    Reply
  3. Connect By « Oracle Scratchpad Connect By « Oracle Scratchpad
    30 June 2009    

    […] an interesting post on this topic on Christian Antognini’s blog (which I’ve also referenced from the OTN posting) […]

  4. Alen Alen
    30 June 2009    

    I’ve been always wondering what is that third child operation which always shows “table access full” and never gets executed. Finally I got the answer!

    Reply
  5. Todor Botev Todor Botev
    2 July 2009    

    Thank you for this insightfull and very well written post!

    […]9. Operation 1 applies the access predicate “MGR”=PRIOR “EMPNO” and sends the 14 rows to the caller.[…]

    Why do we need the access predicate in operation 1 ? I think the whole algorithm ensures that only “valid” rows get reported to operation 1. Why is it needed to apply the predicate once more before sending them to the caller?

    Reply
  6. Christian Antognini Christian Antognini
    17 July 2009    

    Hi Todor

    Sorry, but I don’t know why this double check is needed. My best guess is the following: since it is an access predicate (not a filter predicate), this probably means that data is placed in a memory structure (kind-of hash table) and, to access it, the key is MGR.

    Cheers,
    Chris

    Reply
  7. Resolving chains of events with CONNECT_BY « Todor Botev's Blog Resolving chains of events with CONNECT_BY « Todor Botev's Blog
    23 August 2009    

    […] of data. So please test for performance – with a data volume comparable to the real one. Here is a very good detailed explanation by Christian Antognini of how CONNECT_BY is executed internally […]

  8. Leon Leon
    25 July 2011    

    Hi Chris,
    Thank you for your good article.But I have one question about ‘For the second execution of operation 4, everything works the same as for the first execution. The only difference is that the data from level 2 …is passed to operation 4 for the processing. ‘

    My question is which operation passes data from level 2 to operation 4?
    Can I understand like this,the first execution has generated the level 2 data from level 1 and CONNECT BY PUMP gets these data and then launches the subseque loop?

    Best regards,
    Leon

    Reply
    • Christian Antognini Christian Antognini
      25 July 2011    

      Hi Leon

      > My question is which operation passes data from level 2 to operation 4?

      Operation 1, CONNECT BY WITH FILTERING

      > Can I understand like this,the first execution has generated the level 2 data from
      > level 1 and CONNECT BY PUMP gets these data and then launches the subseque loop?

      There are 4 executions:
      – The first one extracts the data for level 1 and, therefore, provides the data to access level 2.
      – The second execution extract the data for level 2 and provides the data to access level 3
      – …

      In general: execution “n” extracts data for level “n” and provides the data to access level “n+1”.

      HTH
      Chris

      Reply
  9. Leon Leon
    26 July 2011    

    Hi Chris,
    Thank you for your reply.

    But since ‘CONNECT BY WITH FILTERING’ just starts once, how can this operation identify all the levels of data?

    And it seems that your answer aslo implies that operation for extracting data will start 4 times. That’s why I asked whether it’s operation 4 passes n level data to level n+1 instead of operation 1.

    Thank you very much.
    Leon

    Reply
  10. Christian Antognini Christian Antognini
    27 July 2011    

    Hi Leon

    > But since ‘CONNECT BY WITH FILTERING’ just starts once, how can this operation identify all the levels of data?

    The execution is not atomic. During the execution this operation does several things… like getting the data of level “n” and starting the search of level “n+1”.
    You can imagine that there is a kind of pipe between operation 1 (CONNECT BY WITH FILTERING) and operation 4 (CONNECT BY PUMP). So, every time operation 1 gets the data for level “n” it sends into it the ID to find the data for level “n+1”.

    > And it seems that your answer aslo implies that operation for extracting data will start 4 times.

    Also the Starts column of the dbms_xplan output confirms that the extraction of data is executed 4 times.

    HTH
    Chris

    Reply
  11. Leon Leon
    28 July 2011    

    Hi Chris,

    Thank you very much. I am now more clear.

    Best regards,
    Leon

    Reply
  12. WITH-Clause Reloaded: Hierarchie und Rekursion | Oraculix WITH-Clause Reloaded: Hierarchie und Rekursion | Oraculix
    26 February 2014    

    […] Die rekursive Variante benötigt zwei Full Scans auf “emp”, die alte Variante nur einen (das war nicht immer so) […]

  1. Connect By « Oracle Scratchpad on 30 June 2009 at 10:22
  2. Resolving chains of events with CONNECT_BY « Todor Botev's Blog on 23 August 2009 at 08:26
  3. WITH-Clause Reloaded: Hierarchie und Rekursion | Oraculix on 26 February 2014 at 15:30

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.