As of Oracle Database 11g, serial full table scans can use direct reads (see Metalink note 793845.1). This is basically done to improve the performance of large full table scans. In my opinion, the most important pros and cons of this feature are the following…
Pros (which are related to the fact that the buffer cache is not used):
- Reduced latch utilization
- The size of the physical I/Os does not depend on the blocks that are in the buffer cache
Cons:
- A segment checkpoint must precede the first direct read
- Repeated delayed block cleanouts
In this post, as the title suggests, I would like to focus on the last one. The reason is simple: I cannot remember having read anything about that topic before.
Let’s start by showing you an example where delayed block cleanouts are performed…
- Create a test table and insert some data into it (note that the table uses 10000 database blocks):
SQL> CREATE TABLE t (id NUMBER, pad VARCHAR2(500)) TABLESPACE test_8k;

SQL> INSERT INTO t
  2  SELECT rownum, rpad('*',500,'*')
  3  FROM dual
  4  CONNECT BY level <= 140000;

SQL> COMMIT;
- Generate some blocks that require delayed block cleanout:
SQL> UPDATE t SET id=id*10;

SQL> COMMIT;
- Flush the buffer cache (this is necessary because many blocks are in the buffer cache and, therefore, direct reads would not be used for reading the test table):
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;
- Execute two queries on the test table and store the session statistics in a helper table:
SQL> CREATE TABLE s AS
  2  SELECT 0 AS run, name, value
  3  FROM v$statname NATURAL JOIN v$mystat;

SQL> SELECT count(*) FROM t;

  COUNT(*)
----------
    140000

SQL> INSERT INTO s
  2  SELECT 1, name, value
  3  FROM v$statname NATURAL JOIN v$mystat;

SQL> SELECT count(*) FROM t;

  COUNT(*)
----------
    140000

SQL> INSERT INTO s
  2  SELECT 2, name, value
  3  FROM v$statname NATURAL JOIN v$mystat;
- Show some statistics for the two executions:
SQL> SELECT *
  2  FROM (
  3    SELECT run, name, value-lag(value) OVER (ORDER BY name, run) AS value
  4    FROM s
  5    WHERE name IN ('cleanouts only - consistent read gets',
  6                   'session logical reads',
  7                   'physical reads',
  8                   'physical reads direct')
  9  )
 10  PIVOT (sum(value) FOR run IN (1 AS RUN_1, 2 AS RUN_2))
 11  ORDER BY name;

NAME                                          RUN_1      RUN_2
---------------------------------------- ---------- ----------
cleanouts only - consistent read gets          9466       9399
physical reads                                10112      10004
physical reads direct                         10003      10003
session logical reads                         19850      19538
As you can see from the execution statistics, both runs did almost the same number of (direct) physical reads, logical reads and cleanouts. The last point is especially strange. Usually, a cleanout is performed only once for each block. In this case, however, the same cleanouts are repeated for every execution of the query.
This is clearly due to the direct reads.
To confirm that, let’s rerun the test without using serial direct reads. For that purpose, event 10949 is enabled.
SQL> TRUNCATE TABLE s;

SQL> ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER';

SQL> INSERT INTO s
  2  SELECT 0 AS run, name, value
  3  FROM v$statname NATURAL JOIN v$mystat;

SQL> SELECT count(*) FROM t;

  COUNT(*)
----------
    140000

SQL> INSERT INTO s
  2  SELECT 1, name, value
  3  FROM v$statname NATURAL JOIN v$mystat;

SQL> SELECT count(*) FROM t;

  COUNT(*)
----------
    140000

SQL> INSERT INTO s
  2  SELECT 2, name, value
  3  FROM v$statname NATURAL JOIN v$mystat;

SQL> SELECT *
  2  FROM (
  3    SELECT run, name, value-lag(value) OVER (ORDER BY name, run) AS value
  4    FROM s
  5    WHERE name IN ('cleanouts only - consistent read gets',
  6                   'session logical reads',
  7                   'physical reads',
  8                   'physical reads direct')
  9  )
 10  PIVOT (sum(value) FOR run IN (1 AS RUN_3, 2 AS RUN_4))
 11  ORDER BY name;

NAME                                          RUN_3      RUN_4
---------------------------------------- ---------- ----------
cleanouts only - consistent read gets          9399          0
physical reads                                 9919          0
physical reads direct                             0          0
session logical reads                         19592      10228
As you can see from the statistics, the major difference between the second and the third run is the number of direct reads. The interesting thing to notice is that the third run finally performed a “regular” cleanout and, therefore, the subsequent run, the fourth one, was able to read the (cleaned) blocks without doing any cleanout. Because of that, the number of logical reads of the fourth run was much lower.
Thanks for the great catch and the example code. I suspected something like this is happening, but was very confused since “everyone knows each block is only cleaned once”.
It’s interesting to note the difference in the metrics, but how would you describe the impact on a database system because of this new behavior? I’m trying to draw the line between difference in metrics and the difference in performance on a database system. What does it mean in terms of performance impact?
Chris. This is a very natural but hard-to-find restriction.
Here we have another reason to read the whole table with a non-direct scan after a big DML operation!
Let me check my understanding: direct read means that the block is stored in RAM inside the process’ private memory (not in the buffer cache), and hence any modifications to the block are made there and not persisted to disk. Correct?
I would guess that writing back to disk would be a bit complex; it would necessitate a synchronization with DBWR, checking that you are not writing back an older version of the block, etcetera …
Hi Greg, Dion
First of all, it is true that such a behavior will probably cause performance problems only in specific cases. Nevertheless, in my opinion it is always good to know that such a possibility exists. Because of this problem, I would, for example, execute a “silly” query against a table that I know I just modified with a massive DML operation. You know, just in case ;-) This is especially true for data that will be “archived” or put in a read-only tablespace.
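For example, based on the test case above, something as simple as the following takes care of the cleanouts once and for all (just a sketch; as shown earlier in the post, event 10949 disables serial direct reads):

-- force a conventional full scan so that the cleanouts are done (and written back) once
ALTER SESSION SET EVENTS '10949 trace name context forever';

SELECT count(*) FROM t;

ALTER SESSION SET EVENTS '10949 trace name context off';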
Now, about the actual performance impact… Of course, it depends! In this specific case, let me show you a couple of figures (to produce them, I used the same queries as in the post, but with a table storing about 5GB of data).
If the query is I/O bound (no data cached in the Linux buffer cache), I have measured an elapsed time of 51 seconds (8.3 seconds on CPU) with cleanouts and 46 seconds (6.1 seconds on CPU) without them. So, the execution with cleanouts is about 10% slower and uses about 30% more CPU.
If the query is CPU bound (data is cached in the Linux buffer cache), I have measured an elapsed time of 6.1 seconds (100% of the time on CPU) with cleanouts and 4.0 seconds (100% of the time on CPU) without them. So, the execution with cleanouts is about 50% slower and uses about 50% more CPU.
Cheers,
Chris
Ciao Alberto
> any modifications to the block are made there and not persisted to disk. Correct?
Yes. And, therefore, they have to be repeated every time the block is read.
Sa vedum,
Chris
A delayed block cleanout does require updating the block (and, at some point, writing it back to disk). Both operations would go through the SGA (the actual write, I expect, still has to be done by DBWR). A direct read bypassing the SGA cannot be expected (or allowed!) to update the block.
Another session doing a non-direct read must “take the burden” of doing the block cleanout.
Chris.
In your testcase, how do you define the exact meaning of “cleanouts only – consistent read gets”?
It seems to me that the cleanout done by a serial direct path read is not a real cleanout, but just a read of the undo information to confirm whether the block is dirty or not.
My test case shows a big difference in redo size between the two cases (direct read vs. conventional read after a big update).
-- serial direct read
redo size                4,156

-- conventional read
redo size              149,184
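(For reference, deltas like these can be collected by taking the 'redo size' statistic before and after the query; this is only a sketch, not the exact test case:)

-- snapshot 'redo size' before and after the full scan; the difference is the
-- redo generated by the query (same views as used in the post)
SELECT value FROM v$statname NATURAL JOIN v$mystat WHERE name = 'redo size';

SELECT count(*) FROM t;

SELECT value FROM v$statname NATURAL JOIN v$mystat WHERE name = 'redo size';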
If serial direct path read really cleans out the dirty blocks, wouldn’t it show similar redo size?
Hemant
To avoid misunderstandings, let me stress that I do not expect that a direct read updates the block on disk. Of course it cannot. With this post I just wanted to point out that behavior.
Cheers,
Chris
Dion
> how do you define the exact meaning of “cleanouts only – consistent read gets”?
I’m not sure I understand your question. Did you ask it because the statistic contains the word “only”? If so, please notice that there is another statistic named “cleanouts and rollbacks – consistent read gets”. Their definitions, according to the Reference Guide, are the following:
cleanouts only – consistent read gets: Number of consistent gets that require only block cleanouts, no rollbacks.
cleanouts and rollbacks – consistent read gets: Number of consistent gets that require both block rollbacks and block cleanouts.
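By the way, both statistics can be watched for the current session with the same views used in the post, for example:

SELECT name, value
  FROM v$statname NATURAL JOIN v$mystat
 WHERE name IN ('cleanouts only - consistent read gets',
                'cleanouts and rollbacks - consistent read gets');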
> My test case shows a big difference in redo size between the two cases (direct read vs. conventional read after a big update).
I noticed the same behavior. This is probably due to the fact that the block is not really modified. Honestly, I didn’t investigate further in this direction. But it could be interesting to extract the information stored in the redo logs…
Cheers,
Chris
The reference guide! What a shame I didn’t check it. :(
The redo generated by the serial direct path read seems to be the minimum workload necessary to activate the direct path read – maybe related to the object checkpoint. As you commented, the redo log will show the exact info.
I think that the serial direct path read does not clean out dirty blocks multiple times (as per your expression, the blocks wouldn’t really be modified multiple times). It would just read additional blocks to check the status of the blocks.
Dion
> maybe related to the object checkpoint
Yep. That should be the reason.
> I think that the serial direct path read does not clean out dirty blocks multiple times
The block on disk must be cleaned out. So, there is much more going on than just checking the transaction table in the undo segment. On that point I did several crosschecks between execution statistics, block dumps and event 10203… everything is consistent and leads to the conclusion that a cleanout of every block is performed. The only exception is the blocks that were fast cleaned out at commit time.
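For example, a single block of T can be dumped and inspected before and after a run with something like the following (just a sketch; the file and block numbers returned by the first query have to be plugged into the second statement):

-- find the file number and block number of one block of T
SELECT dbms_rowid.rowid_relative_fno(rowid) AS file_no,
       dbms_rowid.rowid_block_number(rowid) AS block_no
  FROM t
 WHERE rownum = 1;

-- dump that block to the session trace file (privileged operation)
ALTER SYSTEM DUMP DATAFILE &file_no BLOCK &block_no;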
Cheers,
Chris
I suppose if you select ora_rowscn’s in your test runs 1,2,3 they will be different in each run.
In test run 4 they should be the same as in run 3.
Am I right?
Hi Todor
For all runs the ora_rowscn is the same. Makes sense…
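Something like the following can be used to check it after each run (just a sketch):

-- if all rows show the same value, ORA_ROWSCN did not change
SELECT min(ora_rowscn) AS min_scn,
       max(ora_rowscn) AS max_scn,
       count(DISTINCT ora_rowscn) AS distinct_scn
  FROM t;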
Cheers,
Chris
Hi Christian, I read your interesting test and, in this connection, I would like to bring to your attention some doubts about the management of direct reads.
I did two tests, the first on DB version 10g, the other on 11g, to try to understand whether the principle is the same for the two database versions. I know that one condition for Oracle to decide to perform direct reads is that the number of blocks should be about 5 times the value of the hidden parameter _small_table_threshold.
Test on 10g:
With a number of blocks greater than 10650, direct reads should be used, but this doesn’t happen. For completeness, I also checked the number of blocks of the table.
I ran the same test on 11g:
So, I have the same situation that I had in 10g, but with a different value of _small_table_threshold.
I noticed that in 11g, under the same conditions as the test conducted in 10g, i.e. with _serial_direct_read = FALSE and a number of blocks (50437) that exceeds 5 times the value of _small_table_threshold (7120), the direct reads are used, which did not happen in 10g.
But then does this mean that in Oracle 11g, even with _serial_direct_read = FALSE, as long as the number of blocks is greater than 5 times the value of _small_table_threshold (and the other conditions I referred to are met), the direct reads are performed?
Is my interpretation correct?
Thanks a lot
Donatello
Ciao Donatello
Yes, your interpretation is correct. Just note that the initialization parameter _serial_direct_read has existed for a long time (8i?) and, probably for that reason, it is not (directly) related to the serial direct reads that are used in recent releases.
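By the way, the current values of the two hidden parameters can be checked with a query like the following (as SYS; just a sketch):

-- hidden parameters are exposed through X$KSPPI (names) and X$KSPPCV (values)
SELECT ksppinm AS name, ksppstvl AS value
  FROM x$ksppi i, x$ksppcv v
 WHERE i.indx = v.indx
   AND ksppinm IN ('_serial_direct_read', '_small_table_threshold');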
Cheers,
Chris
Ciao Christian,
I would like to ask you one last question, this time regarding direct reads and delayed block cleanouts in 10g.
I checked whether there are dirty blocks and, as one might legitimately expect:
This means that there are dirty blocks in the buffer cache; the blocks touched by the insert are dirty by definition. I ran some statistics checks before executing my query:
I ran my query:
Then I reran the statistics:
My interpretation is the following:
When I run my query (the select count(*) from t;) there are dirty blocks in the buffer cache. I see from the statistics that Oracle has answered my query using direct reads. To do that, Oracle must write those blocks from the SGA to disk and then read them into the PGA, since the engine has established that the circumstances require the use of direct reads.
1) Is what I say correct?
2) If my hypothesis is correct, does it mean that during the query (select count(*) from t) a delayed block cleanout was performed for a number of blocks (the ones on disk), plus a fast commit cleanout for a number of blocks (those that were in the SGA), before the engine executed the direct reads?
Kind Regards,
Donatello
Ciao Donatello
I’m not sure I fully understand what you mean… This is probably because you are confused by the dirty blocks in the SGA. The important thing you have to remember is that before doing direct reads a segment checkpoint has to be performed. Because of the segment checkpoint, all blocks are on disk and, therefore, every time they are read through a direct read a deferred cleanout has to be performed.
HTH
Chris
Ciao Christian,
I explained myself badly. I understood from what you had written that a key prerequisite is that a checkpoint has been executed, but what I meant is that after my insert (i.e. before the select count(*)…) I saw a number of dirty blocks in the SGA, and this means that the image of those blocks does not match the one on disk, also because some of those blocks have never been on disk, right?
I’d just like to have the following concept confirmed:
when I run “select count(*) from t” and I see that Oracle uses
“direct reads” …
… does this mean that Oracle has in the meantime done
a checkpoint and has therefore written to disk even the blocks that
were in the SGA? If not, I would not expect “direct reads”
but “conventional reads”.
Thanks
Ciao Donatello
> what I meant is that after my insert (i.e. before the
> select count(*)…) I saw a number of dirty blocks in the SGA
> the image of those blocks does not match the one on disk
> also because some of those blocks have never been on disk, right?
Yes, a regular insert writes into the buffer cache only.
> when I run “select count(*) from t” and I see that Oracle uses
> “direct reads” …
>
> … does this mean that Oracle has in the meantime done
> a checkpoint and has therefore written to disk even the blocks that
> were in the SGA?
Yes.
And now I understand what bothers you :-) It is indeed the case that DBWR has to write some blocks to disk and, just after that, the server process has to read them back.
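Just as an illustration (a sketch, not part of Donatello's original checks), the dirty buffers of T can be counted like this before running the query (access to V$BH is required):

-- count the dirty buffers belonging to T that are still in the buffer cache
SELECT count(*) AS dirty_buffers
  FROM v$bh
 WHERE objd = (SELECT data_object_id
                 FROM user_objects
                WHERE object_name = 'T')
   AND dirty = 'Y';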
HTH
Chris
Ciao Chris,
I tried to think about the impact on real applications, so I extracted a test case:
it seems that the "physical reads direct" on table t are repeated 100 times (100 * 10003)
because of the 100 nested-loop passes over t.
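(Ksun's test case is not reproduced here; a hypothetical query along the following lines would show the same pattern, i.e. one serial full scan of T, possibly done with direct reads, for each of the 100 driving rows:)

-- purely illustrative: a nested loops join that full scans T once per driving row
SELECT /*+ leading(d) use_nl(t) full(t) */ count(t.pad)
  FROM (SELECT rownum AS n FROM dual CONNECT BY level <= 100) d, t
 WHERE t.id >= d.n;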
Hi Ksun
I would not expect anything different in such a case…
Cheers,
Chris
I have found that “direct path read” waits have a several times greater response time than “db file scattered read” waits. Do you think that the time spent on these overheads mentioned above (repeated block cleanouts, segment checkpoint) is also counted in the response time of “direct path read”? If not, then what can be the reason that “direct path read” waits are slower? The MBRC should be the same for both.
Jan
Hi Jan
> Do you think that the time spent on these overheads mentioned above is also counted in the response time of “direct path read”?
No. You should see more disk I/O and CPU utilization for them.
> If not, then what can be the reason that “direct path read” waits are slower? The MBRC should be the same for both.
They are not implemented in the same way. E.g., the system calls (depending on port and release) might be different. In addition, the size of the disk I/Os can also be different. E.g., a FTS based on scattered reads does not read blocks that are already in the buffer cache. So, I would suggest carefully comparing the two kinds of disk I/O…
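As a starting point for such a comparison, the average wait times of the two events in the current session can be checked with something like this (just a sketch):

-- compare the average wait time of the two disk I/O wait events for this session
SELECT event, total_waits,
       round(time_waited_micro / total_waits) AS avg_wait_us
  FROM v$session_event
 WHERE sid = sys_context('userenv', 'sid')
   AND event IN ('direct path read', 'db file scattered read');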
HTH
Chris