In the last post I discussed a test case generating lot of child cursors. Today I wanted to show you, for the very same test case, that in 11.2 the parse time might increases linearly with the number of child cursors per parent cursor. This is the expected behavior. In fact, to check whether an already available child cursor can be reused, the list of child cursors must be scanned. And, in case no one of the already available child cursors is compatible, every entry needs to be probed.
Note that to generate the previous chart I simply run, in 11.2.0.2, the PL/SQL block of the previous post while SQL trace was enabled. Then I extracted from the trace file the elapsed time for every parse operation and loaded the values in Excel.
In 11.2.0.3, because of the artificial limitation of the number of child cursors per parent cursor, the parse time is almost constant. Notice, however, that in this case the number of child cursors is spread across a lot of parent cursors, not a single one as in 11.2.0.2.
Looks like there is an index missing on the column(s) that define if a child cursor can be reused (optimizer environment). But on the other hand, the high number of child cursors that would justify the index is the actual problem.
The graph is linear up to the certain number of children, It would be interesting to see what happens with higher child counts. From my personal experience I know that the parse time goes up in a non-linear way once there are a few hundred of children.
I’ve noticed that in (very) busy systems bind aware cursors sometimes create children at a higher rate than Oracle is able to clean-up obsolete children of the same parent cursor (fi. when children are merged for reasons of adjacent selectivity ranges typically the oldest child becomes obsolete, but removing it in the library cache appears to happen with a certain delay).
The performance implications become noticeable for the applications around 500 children (depending on the processing resources of your system), On a medium-large platform it really starts affecting the overall performance of the database once you have a few 1000 of children, and eventually results in a complete hang or even crashes around 4000-5000 children. A full RAC database might survive because the Oracle RDBMS intervenes once a cursor has 8000 children (the value of the undocumented parameter _cursor_obsolete_threshold), but even on a full RAC cursors with a few 1000 children cause major performance problems.
I Guess that once the parse time becomes comparable with with the average time between requests for new children, there appears to be no way back unless you detect the upcoming disaster in time and purge the problematic cursor from the cache.
Notes:
1. I’m a consultant DBA specialized in performance tuning and troubleshooting. From the +/- 500 databases I’ve working with the last 10 years, I’ve seen the problem on 4 databases so far, so it certainly is not a frequent problem, but also not a rare one.
2. You might consider setting _cursor_obsolete_threshold value to a (much) lower value than the default. (Mike Dietrich recommends setting 1024 for all databases unless you’re creating 100s or 1000s of PDBs in the same container database). Personally I prefer the value 500, for almost all cases (most on premise databases have less than 10 PDBs).
3. Often bind selective cursors become bind aware in a unpredictable way, so in many (but not all) cases the NO_BIND_AWARE hint is a valid work-around (at least if you have access to the application code).