With the INMEMORY clause you can specify four sub-clauses (a combined example follows the list):
- The MEMCOMPRESS clause specifies whether and how compression is used
- The PRIORITY clause specifies the priority (“order”) in which the segments are loaded when the IMCS is populated
- The DISTRIBUTE clause specifies how data is distributed across RAC instances
- The DUPLICATE clause specifies whether and how data is duplicated across RAC instances
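To make the syntax concrete, here is a minimal sketch that combines all four sub-clauses (the specific option values are illustrative only, not recommendations):

SQL> ALTER TABLE t INMEMORY
  2  MEMCOMPRESS FOR QUERY LOW
  3  PRIORITY HIGH
  4  DISTRIBUTE AUTO
  5  DUPLICATE;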
The aim of this post is not to describe these attributes in detail. Instead, this post shows you the impact of the DUPLICATE clause when a query is executed in a RAC environment. For a short description, have a look at this table in the Database Reference guide. For more details, have a look at the Oracle Database In-Memory white paper.
It is essential to understand that the DUPLICATE clause (also known as “Fault Tolerant In-Memory Column Store”), according to the Oracle Database Licensing Information guide, requires Exadata or SuperCluster. Also note that, according to this Twitter conversation, Oracle also plans to support the feature on ODA. Hence, the idea is to support it on all engineered systems.
In this post I would like to show you examples that make clear why the DUPLICATE clause is key to get good performance on a RAC system. To do so, I will execute a simple query like “SELECT sum(col) FROM tab” with different settings.
First of all, let me set the scene…
- Since the In-Memory option is required, I have to use version 12.1.0.2.0
SQL> SELECT banner
  2  FROM v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE    12.1.0.2.0    Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production
- What you cannot see from the previous output is that the database is hosted on a RAC/Exadata system
- The test table stores 12GB of data in about 1.5 million blocks (BTW, this is the same table I used in this post)
SQL> SELECT blocks, round(blocks*8192/1024/1024) AS size_mb
  2  FROM user_tables
  3  WHERE table_name = 'T';

    BLOCKS    SIZE_MB
---------- ----------
   1550061      12110
- Note that parallel processing is disabled at the table level
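A simple way to check it yourself (a sketch; for the test table the DEGREE column shows 1):

SQL> SELECT degree
  2  FROM user_tables
  3  WHERE table_name = 'T';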
- At the beginning, make sure that the In-Memory Column Store (IMCS) is not used for the test table
SQL> ALTER TABLE t NO INMEMORY;

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

no rows selected
TEST 1
Let me prepare the environment for the first test:
- One instance is up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         1
- Specify that the test table should use the IMCS and trigger the population
SQL> ALTER TABLE t INMEMORY;

SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
- After waiting a few minutes, check that the population is over (POPULATE_STATUS=COMPLETED) and that the test table is completely stored in the IMCS (BYTES_NOT_POPULATED=0).
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
- Finally, flush the buffer cache (note that the IMCS is not impacted by this action)
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;
Now that the environment is ready, I execute the test query and, at the same time, I activate extended SQL trace and gather runtime information:
- The execution took 2.75 seconds (UPDATE 2014-12-11: The execution isn’t as fast as you might expect because SQL trace was enabled during the test. Without SQL trace the execution is about 10 times faster.)
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:02.75

SQL> execute dbms_session.session_trace_disable
- A serial execution taking advantage of an IM scan (TABLE ACCESS INMEMORY FULL) was used
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:02.73 |       4 |     2 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:02.73 |       4 |     2 |
|   2 |   TABLE ACCESS INMEMORY FULL | T    |      1 |    100M|    100M|00:00:00.91 |       4 |     2 |
------------------------------------------------------------------------------------------------------
- The corresponding extended SQL trace information as shown by TKPROF is the following
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.72       2.73          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.72       2.73          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2730184 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=907260 us cost=17717 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00
In summary, the execution was, for a query that had to process 12GB of data, quite fast. This was possible because all data was extracted from the IMCS.
TEST 2
The second test uses two instances. As a result, after starting the second instance, the In-Memory attributes of the test table needed to be changed, and the IMCS had to be re-populated.
- Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         2
- Data is duplicated on both instances
SQL> ALTER TABLE t INMEMORY DUPLICATE;
- Trigger population
SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
- After waiting a few minutes, check that the population is over and that the test table is completely stored in the IMCS of both instances
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED    1.0921E+10 1.2707E+10                   0
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
- Flush the buffer cache of both instances
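This is the same statement as in TEST 1, run once per instance (a sketch; connect to each instance in turn and execute it there):

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;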
Of course the test is the same as for TEST 1. The only difference is that the test table is stored in the IMCS of both instances.
- The execution took 2.97 seconds (UPDATE 2014-12-11: The execution isn’t as fast as you might expect because SQL trace was enabled during the test. Without SQL trace the execution is about 10 times faster.)
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:02.97

SQL> execute dbms_session.session_trace_disable
- It goes without saying that the execution plan is the same as for TEST 1
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:02.97 |       4 |     2 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:02.97 |       4 |     2 |
|   2 |   TABLE ACCESS INMEMORY FULL | T    |      1 |    100M|    100M|00:00:01.02 |       4 |     2 |
------------------------------------------------------------------------------------------------------
- And here is the extended SQL trace information
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.95       2.96          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.95       2.97          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2967854 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=1023321 us cost=17717 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00
In summary, there is no real difference between TEST 1 and TEST 2.
TEST 3
The third test also uses two instances. But, this time, data is not fully duplicated on each instance. Instead, it is only distributed. This means that every row of the test table is stored in the IMCS of either one instance or the other.
Let me prepare the environment:
- Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         2
- Data is not duplicated on both instances
SQL> ALTER TABLE t INMEMORY NO DUPLICATE;
- Trigger population
SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
- After waiting a few minutes, check that the population is over. Notice that since no instance contains all data, the BYTES_NOT_POPULATED column remains greater than 0
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED    5726535680 1.2707E+10          6029524992
         1 COMPLETED    5194317824 1.2707E+10          6652174336
- Flush the buffer cache of both instances
Once more, the same SQL statements are used to run the test:
- The execution took 2 minutes and 40 seconds!!!!!
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:02:39.95

SQL> execute dbms_session.session_trace_disable
- The execution plan is the same as for TEST 1 and TEST 2. The difference is that a huge number of logical and physical I/O were performed
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:02:39.94 |    1086K|  1086K|
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:02:39.94 |    1086K|  1086K|
|   2 |   TABLE ACCESS INMEMORY FULL | T    |      1 |    100M|    100M|00:02:31.06 |    1086K|  1086K|
------------------------------------------------------------------------------------------------------
- The information provided by extended SQL trace confirms what dbms_xplan shows. In addition, it also shows that the poor performance is due to the high number of single-block physical reads that were performed (about 275K)
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     93.50     159.94    1086979    1086981          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     93.50     159.94    1086979    1086981          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=1086981 pr=1086979 pw=0 time=159942519 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=1086981 pr=1086979 pw=0 time=151063172 us cost=18478 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read            274946        0.03         73.01
  gc cr multi block request                    6368        0.03          4.19
  cell multiblock physical read                6368        0.02         10.00
  latch: cache buffers lru chain                  6        0.00          0.00
  latch: gcs resource hash                        2        0.00          0.00
  latch: gc element                             113        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00
Since the bad performance is caused by the number of physical I/Os, it goes without saying that when only logical I/Os take place, the performance is much better. In this case, an immediate re-execution of the query took 8.28 seconds.
SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:08.28

SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:08.28 |    1086K|
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:08.28 |    1086K|
|   2 |   TABLE ACCESS INMEMORY FULL | T    |      1 |    100M|    100M|00:00:08.26 |    1086K|
----------------------------------------------------------------------------------------------
In summary, only part of the data was accessed through the IMCS. In addition, despite the fact that an IM scan was used, the data not available in the local IMCS (or in the buffer cache) had to be read through many single-block physical reads. As a result, the performance was bad. This behaviour shows that a session is only able to access the IMCS managed by the instance it is connected to. This is also a good example showing that accessing data through the IMCS is faster than accessing the same data through the buffer cache.
TEST 4
The fourth and final test is a continuation of the previous one. Its aim is to check the performance of a parallel query. Hence, I added the PARALLEL hint to the test query. Also note that I flushed the buffer cache of both instances before executing the test query.
SQL> SELECT /*+ gather_plan_statistics parallel */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:01.49

SQL> SELECT * FROM table(dbms_xplan.display_cursor('br2as2yzy7gk4',2,'iostats last'));

SQL_ID  br2as2yzy7gk4, child number 2
-------------------------------------
SELECT /*+ gather_plan_statistics parallel */ sum(n_32) FROM t

Plan hash value: 3126468333

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |          |      1 |        |      1 |00:00:01.41 |       7 |     1 |
|   1 |  SORT AGGREGATE                  |          |      1 |      1 |      1 |00:00:01.41 |       7 |     1 |
|   2 |   PX COORDINATOR                 |          |      1 |        |     10 |00:00:01.34 |       7 |     1 |
|   3 |    PX SEND QC (RANDOM)           | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |     0 |
|   4 |     SORT AGGREGATE               |          |      0 |      1 |      0 |00:00:00.01 |       0 |     0 |
|   5 |      PX BLOCK ITERATOR           |          |      0 |   100M |      0 |00:00:00.01 |       0 |     0 |
|*  6 |       TABLE ACCESS INMEMORY FULL | T        |      0 |   100M |      0 |00:00:00.01 |       0 |     0 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - inmemory(:Z>=:Z AND :Z<=:Z)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 10
   - parallel scans affinitized for inmemory
The execution took 1.49 seconds. In other words, the performance of the parallel query is way better than that of TEST 3. The reason can be seen in the Note section of the dbms_xplan output: "parallel scans affinitized for inmemory". This means that parallel slaves were started on both instances and that every set of parallel slaves accessed only the data stored in the IMCS of the instance they were attached to. As a result, even though the data was distributed, the performance was good. The only "overhead" compared to the serial execution is that the parallel slaves had to communicate with the query coordinator.
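If you want to verify that slaves really ran on both instances, here is a possible check (a sketch, to be run from another session while the query executes; gv$px_session lists the active PX sessions per instance):

SQL> SELECT inst_id, count(*)
  2  FROM gv$px_session
  3  GROUP BY inst_id;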
WHAT WE LEARNED
- DUPLICATE is only applicable to engineered systems (as I write this blog, ODA does not yet support it)
- On a RAC system, without DUPLICATE, data is distributed across all instances
- A process can only access data stored in the IMCS of the instance it is connected to
- Even though an IM scan is used, serial executions on RAC systems might require many logical/physical I/O. (UPDATE 2014-12-11: Parallel executions using Auto DOP don't suffer from this problem.)
In summary, I would not advise using the In-Memory option on a RAC system that does not support the DUPLICATE clause. At least, not without Auto DOP. (UPDATE 2014-12-12: the previous sentence was "At least, not in general.")
Hi Christian,
Not every parallel query can access IM segments on other nodes. As explained in the related whitepaper (http://www.oracle.com/technetwork/database/in-memory/overview/twp-oracle-database-in-memory-2245633.html) Auto DOP is required for this. Your statement level hint enables Auto DOP which provides the affinity of scans. So you need Auto DOP enabled either with parallel_degree_policy=auto (or adaptive) or with a hint like you did.
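In other words, a sketch of the two alternatives:

SQL> ALTER SESSION SET parallel_degree_policy = AUTO;

or, at the statement level:

SQL> SELECT /*+ parallel */ sum(n_32)
  2  FROM t;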
Hi
You are absolutely right. Thank you for pointing it out.
Best,
Chris
Chris,
Was your tablespace locally managed with uniform extents of 1MB? If so, that could explain the 275,000 single block reads in Test 3.
Hi Jonathan
No uniform extent size. Here are the details:
Could you please explain why a uniform extent size of 1MB could explain that?
BTW, most multi-block reads (cell multiblock physical read) are of 1MB.
Thank you,
Chris
[…] Fellow Oaktable Network member Christian Antognini has produced a very important article regarding how Oracle Database 12c In-Memory Column Store with Real Application Clusters is questionable unless using Oracle storage (Exadata, SPARC SuperCluster). I found Christian’s article very interesting because, after all, the topic at hand is an in-memory cache product (a.k.a., In-Memory Column Store). I fail to see any technical reason why Oracle wouldn’t support an in-memory product with blocks from any and all storage. It is in-memory after all, isn’t it? Please visit Christian’s article here: The Importance of the In-Memory DUPLICATE Clause for a RAC System. […]
This all begs the question of why you are not allowed to control which instances are considered for the distribution of the non-duplicate in-memory column store. Perhaps this non-feature (surely it would be trivial to NOT distribute at all if only a single instance is listed for a table), which yields your only-one-instance-up results only if the correct instance is used for the query, is the continuance of the deprecation of application affinity. Perhaps Oracle will see the light and realize that allowing instance designation, dovetailed with application affinity, could allow customers to have a functional column store that is much larger in aggregate, for the operational cost of grouping analytical targets by instance. To me this seems like a worthwhile feature to add for non-engineered systems in particular, but for any RAC implementation where natural opportunities for application affinity exist.
Hi Chris,
I am surprised by all the single block reads in test 3. In my tests it generally uses multi-block reads. Do you have any ideas as to why that might be occurring?
Kerry
Hi Kerry
Unfortunately, I didn’t have time to look at it in detail. My present guess: it is because of the way data is distributed. For example, I would like to test whether I am able to reproduce the same issue with a partitioned table using DISTRIBUTE BY PARTITION.
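Such a test would look something like this (a sketch; TP is a hypothetical partitioned table):

SQL> ALTER TABLE tp INMEMORY DISTRIBUTE BY PARTITION;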
In any case, the problem was reproduced independently by a colleague of mine on a different system (a non-engineered one) with completely different data.
Best,
Chris
I find the DUPLICATE clause to be misleading in non-Engineered systems. You can specify it with no errors. Explain Plans on other instances show the InMemory option is used. Yet the segment is only in one instance’s IM store. I blogged my findings here: http://www.peasland.net/2014/11/13/inmemory-duplicate-confusion-in-oracle-rac/
Thanks,
Brian
Hi Brian
I fully agree with you. It should not be possible to set it when it's not used.
What I do not understand in what you posted is why data is not distributed. Did you wait long enough?
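A sketch of a query to monitor the population progress across instances (population is asynchronous and can take a while; the segment name is illustrative):

SQL> SELECT inst_id, populate_status, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';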
Best,
Chris
Hi Christian,
I tried to get rid of the default behaviour on non Engineered Systems (“Every row of the test table is stored in the IMCS of either one instance or the other”) for 2 reasons:
1) Your finding: It could lead to bad performance.
2) In case of service defined as preferred/available, it would make sense to have the IMCS populated with all the DATA on the “active” Instance.
To get rid of the default behaviour, we can set the “_inmemory_auto_distribute” to false.
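A sketch of the statement in question (as with any underscore parameter, set it only after checking with Oracle Support):

SQL> ALTER SYSTEM SET "_inmemory_auto_distribute" = FALSE;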
I tested it for:
RAC (non CDB) here: https://bdrouvot.wordpress.com/2014/11/20/in-memory-instance-distribution-with-rac-databases-i-want-at-least-one-instance-that-contains-all-the-data/
RAC with PDB here: https://bdrouvot.wordpress.com/2014/11/21/in-memory-instance-distribution-with-rac-and-multitenant-environment/
RAC with PDB and service defined as preferred/available here (Note that in that case FAN callouts can be enough): https://bdrouvot.wordpress.com/2014/12/04/fan-callouts-for-in-memory-instance-distribution-with-rac-pdb-and-preferredavailable-service/
Thx
Bertrand
Hi Chris,
It’s also worth knowing that, perhaps unsurprisingly, if parallel_force_local=true the IM segments stored on other instances cannot be accessed, and the local instance will read the rows from the cells/disk/buffer cache instead.
Just checked on our X5-2 12.1.2.2.1.160119.
Hence using DUPLICATE (DUPLICATE ALL, in the general case of 3+ RAC instances) becomes of paramount importance if you want to (or are forced to due to bugs) use this setting …
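A sketch of the corresponding setting (DUPLICATE ALL requests a full copy of the segment in the IMCS of every instance):

SQL> ALTER TABLE t INMEMORY DUPLICATE ALL;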
ciao
al
Ciao Alberto
Thank you for sharing the result of your test. Honestly, I’m not surprised. At all.
Best,
Chris
Hi Chris,
In addition to the single block physical reads, on Test 3 I noticed another “strange thing”.
The elapsed time of the statement is about 160 seconds, but if I sum the CPU time (93.5) and the wait time (73.01+4.19+10), the elapsed time should be 180.7 seconds.
How can this be possible?
What am I doing wrong?
Thanks and regards.
Giuseppe
Ciao Giuseppe
It ought to be an instrumentation problem. In fact, the total elapsed time matches the time shown by SQL*Plus.
Since the test was carried out in 12.1.0.2, it could be “Bug 16915985 : INCORRECT WAIT TIMING WITH SERIAL DIRECT PATH READS ON ASM WITH SYNCHRONOUS IO.”
Hence, it’s not you doing something wrong.
Best,
Chris