Exadata Storage Server and the Query Optimizer – Part 1

28 April 2010 12 Comments

Written by Christian Antognini

Even though the utilization of an Exadata Storage Server should be transparent for the query optimizer, when you look at execution plans generated in an environment using it you might notice slight differences. The purpose of this series of post is to summarize the differences I was able to observe.

Disclaimer: I do not have access to an Exadata Storage Server (cell, from now on…). All tests I did, i.e. generation of execution plans, were performed with an 11gR2 database using “regular” storage. All I did is to set the CELL_OFFLOAD_PLAN_DISPLAY initialization parameter to ALWAYS. As a result, this post contains untested assumptions based on what I was able to observe in such an environment. It goes without saying that any correction is highly welcome!

Smart scan is one of the key performance features available only when a cell is used. Its purpose is to offload to the storage layer part of the work that would usually be performed by a database instance. As a result, not only the amount of work performed by a database instance might be (much) lower, but also the amount of data transferred from a cell to a database instance might be strongly reduced. When smart scan is used a cell, instead of sending back to the database engine “regular” database blocks (as a “regular” storage layer would do), sends back packets of data containing only the relevant information (I guess they are similar to the messages exchanged during PX operations).

Three are the basic smart scan techniques: projection, restriction and join filtering. As of Oracle Database 11g Release 2 three additional improvements are available: storage indexing, hybrid columnar compression and encryption. But, honestly, I consider them “simple” enhancements (that might have a major impact, though) of the previous ones. In other words, they add nothing to the basic concepts.

In this first post I cover projection. In the next posts I will cover the other two.

Whenever a query does not reference all columns belonging to the tables referenced in the FROM clause AND smart scan is used, it makes no sense that a cell sends back to a database instance the data belonging to the unreferenced columns. Since it makes no sense, it’s not done. Instead, a cell is able to extract only the data of the referenced columns. In this way, especially when queries reference few columns from wide tables, the amount of data received by a database instance is much smaller. As far as I know this technique is not explicitly externalized in execution plans (I mean, you don’t see which columns are actually sent back to the database instance). But, I guess, if you see one of the new operations added to support Exadata (they all contains the keyword “STORAGE” in them), you take advantage of this technique. Here is an example with a FTS (notice the keyword “STORAGE”):

------------------------------------------
| Id  | Operation                 | Name |
------------------------------------------
|   0 | SELECT STATEMENT          |      |
|   1 |  TABLE ACCESS STORAGE FULL| T    |
------------------------------------------

The full list of new “operations” that I was able to observe is the following:

STORAGE FULL
STORAGE SAMPLE
STORAGE SAMPLE BY ROWID RANGE
STORAGE BY ROWID RANGE
STORAGE SAMPLE FAST FULL SCAN
STORAGE FAST FULL SCAN
STORAGE FULL SCAN
STORAGE RANGE SCAN
STORAGE FULL SCAN (MIN/MAX)

So, it seems that for the most common access paths there is a “smart scan version”.

11gR1, 11gR2, Exadata, Query Optimizer

12 Comments

Dani Rey

29 April 2010

Ciao Chris

Thanks for deconstructing the Performance Optimization Techniques of Exadata. One thing I had troubles with when trying to understand Exadata Smart Scans was, how do the results end up in the SGA? Because, if you return only part of the columns and or rows, you have to take the database blocks apart. The answer which Kevin Closson gave me was very simple, “Results from a Smart Scan are not “real” database blocks thus the results do not get cached in the SGA.” I hope this additional will help others to understand Exadata.

I’m looking forward to further post of this series.

Cheers
Dani

Reply
Ignacio Ruiz

30 April 2010

Chris,
I wonder why you say EHCC (enhanced hybrid columnar compression) is a simple “enhancement”. Achieve very high rates of compression with data de-duplication (I’m oversimplifying) algorithms, and on top of that, very efficiently deserve a much higher rank… cannot share figures or details, but I’ve seen EHCC sales promise 100% fulfilled on real life

Reply
- Christian Antognini
  
  30 April 2010
  
  Hi Ignacio
  
  First of all, I pointed out that even though it is a simple enhancement, its impact might be important. Now, to reply to your “question”… I consider it a simple enhancement because from an architectural point-of-view EHCC doesn’t change much in the environment. Just the physical arrangement of rows in blocks is different. As soon as the posts about the query optimizer are online, I’ll publish some internals on how they do that…
  
  Cheers,
  Chris
  
  Reply
Flado

17 May 2010

> you don’t see which columns are actually sent back to the database instance

Does the ‘projection’ format option not help? As in
select * from table(dbms_xplan.display(null,null,’projection’));

Flado

Reply
- Christian Antognini
  
  17 May 2010
  
  Hi Flado
  
  I was not able to notice any difference compared to the previous versions.
  That said, it is sensible to assume that a cell should be able to send back to an instance only the columns that are referenced in the projection section.
  
  Cheers,
  Chris
  
  Reply
Kerry Osborne

21 May 2010

Hi Chris,

We have been fortunate enough to play around with an Exadata V2 recently. From what we have observed, it appears that Smart Scans are only done via direct path reads and like regular direct path reads, they return data directly to PGA as opposed to returning blocks to the SGA. You may have noticed that 11gR2 is much more likely to do direct path reads. I’m guessing that’s specifically because of Exadata (but I may be reading too much into it). Doug Burns did a post on the that topic here:

http://oracledoug.com/serendipity/index.php?/archives/1321-11g-and-direct-path-reads.html

I haven’t actually tried to prove that the column projection claims are true, although I have no reason to suspect that it isn’t done. It does appear that additional data is returned along with the requested columns though, such as SCN so that read consistency can be maintained (we are still investigating this issue).

A couple of other quick comments: DBMS_XPLAN appears to show Exadata specific plan info without setting the CELL_OFFLOAD_PLAN_DISPLAY parameter. The common trait of plans that “may” have been offloaded” is the “storage” keyword in the predicate section. And by the way, the “storage” key word in the plan output is only an indicator that an operation could potentially be offloaded. A 10046 trace though will show you what’s going on as there are a number of specific new wait events. I plan to write up some more detailed info on my blog as I have time (I have done a couple of introductory posts already). One of the main things that took me a little while to get my head around was that there are actually two disk access methods – direct path, which is where all the smarts are (Smart Scann, Offloading), and conventional which returns blocks to the SGA. There may be many finer points that I am not aware of yet, but those are the main two categories.

Kerry

Reply
- Christian Antognini
  
  22 May 2010
  
  Hi Kerry
  
  First of all, thank you very much for the information you provide. I’m really looking for feedbacks provided by people that are actually using a DBM.
  
  > it appears that Smart Scans are only done via direct path reads
  
  Do you see particular events or the “regular” ones are used?
  
  > DBMS_XPLAN appears to show Exadata specific plan info without
  > setting the CELL_OFFLOAD_PLAN_DISPLAY parameter
  
  Yes, setting the parameter is not necessary in your situation. In fact 1) the default value is AUTO 2) you are using a DBM. Hence, by default, the query optimizer shows that information. In my case, however, I have to force it to do so.
  
  > the “storage” key word in the plan output is only an indicator that
  > an operation could potentially be offloaded
  
  This is also my understanding. In fact, it practice, things can be much more complex. For example when a partitioned table is only partially stored in an Exadata Storage Server. In such a situation I expect that the execution plan contains the smart scan operations but, when the SQL engine has to execute it, it can only do so for the partitions stored in an Exadata Storage Server.
  
  > I plan to write up some more detailed info on my blog as I have time
  
  Excellent!
  
  Cheers,
  Chris
  
  Reply

Kerry Osborne

24 May 2010

Yes – there are new wait events. Here’s a few lines from a trace file:

WAIT #5: nam='cell smart table scan' ela= 8873 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587252221
WAIT #5: nam='cell smart table scan' ela= 248 cellhash#=88802347 p2=0 p3=0 obj#=36406 tim=1272659587252672
WAIT #5: nam='cell smart table scan' ela= 6 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587252755
WAIT #5: nam='cell smart table scan' ela= 6343 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587259113
WAIT #5: nam='cell smart table scan' ela= 156 cellhash#=88802347 p2=0 p3=0 obj#=36406 tim=1272659587259414
WAIT #5: nam='cell smart table scan' ela= 23 cellhash#=88802347 p2=0 p3=0 obj#=36406 tim=1272659587259484
WAIT #5: nam='cell smart table scan' ela= 4 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587259512
WAIT #5: nam='cell smart table scan' ela= 8714 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587268239
WAIT #5: nam='cell smart table scan' ela= 226 cellhash#=88802347 p2=0 p3=0 obj#=36406 tim=1272659587268641
WAIT #5: nam='cell smart table scan' ela= 5 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587268769
WAIT #5: nam='cell smart table scan' ela= 17441 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587286221
WAIT #5: nam='cell smart table scan' ela= 157 cellhash#=88802347 p2=0 p3=0 obj#=36406 tim=1272659587286523
WAIT #5: nam='cell smart table scan' ela= 19 cellhash#=2520626383 p2=0 p3=0 obj#=36406 tim=1272659587286657
WAIT #5: nam='cell smart table scan' ela= 5 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587286790
WAIT #5: nam='cell smart table scan' ela= 3298 cellhash#=398250101 p2=0 p3=0 obj#=36406 tim=1272659587290103
WAIT #5: nam='cell smart table scan' ela= 634 cellhash#=88802347 p2=0 p3=0 obj#=36406 tim=1272659587290882

So we can see that a smart scan is being performed and which cell was returning the data. This was generated on a quarter rack (so 3 cells). I grabbed a section that had wait events for all 3 cells, but I was a bit surprised that one of the cells had significantly fewer events than the other 2. I was expecting a more even distribution. More to investigate.

There is also a “cell smart index scan” event (as you would expect).

Kerry

Christian Antognini

24 May 2010

Kerry, thank you very much for this information!

Cheers,
Chris

Reply
Kevin Closson

4 July 2010

There seems to be a log of speculation about the matter. Smart Scans are agents of plans with access method full when serviced by direct path reads (whether parallel or serial). Full stop.

Reply
Alexander Anokhin

22 March 2011

Hi Christian,

A little addition.

>> As far as I know this technique is not explicitly externalized in execution plans (I
>> mean, you don’t see which columns are actually sent back to the database instance)
There is possibility to see projection thanks to “projection” keyword in dbms_xplan package. E.g. (format => ‘+projection’)

Reply
Christian Antognini

22 March 2011

Hi Alexander

I’m aware of the possibility to list the columns referenced by a SQL statement. But, honestly, inferring that only those columns are returned is a bit too much for me ;-) It is a good starting point, though.

Cheers,
Chris

Reply

Exadata Storage Server and the Query Optimizer – Part 1

12 Comments

Leave a Reply Cancel reply