With the INMEMORY clause you can specify 4 sub-clauses:

  • The MEMCOMPRESS clause specifies whether and how compression is used
  • The PRIORITY clause specifies the priority (“order”) in which the segments are loaded when the IMCS is populated
  • The DISTRIBUTE clause specifies how data is distributed across RAC instances
  • The DUPLICATE clause specifies whether and how data is duplicated across RAC instances

The aim of this post is not to describe these attributes in detail. Instead, this post shows you the impact of the DUPLICATE clause when a query is executed in a RAC environment. For a short description, have a look to this table in the Database Reference guide. For more details, have a look to the Oracle Database In-Memory white paper.

It is essential to understand that the DUPLICATE clause (also known as “Fault Tolerant In-Memory Column Store”), according to the Oracle Database Licensing Information guide, requires Exadata or Supercluster. Also note that, according to this Twitter conversation, Oracle also plans to support the feature on ODA. Hence, the idea is to support it on all engineered systems.

In this post I would like to show you examples that make clear why the DUPLICATE clause is key to get good performance on a RAC system. To do so, I will execute a simple query like “SELECT sum(col) FROM tab” with different settings.

First of all, let me set the scene…

  • Since the In-Memory option is required, I have to use version 12.1.0.2.0
SQL> SELECT banner
  2  FROM v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE    12.1.0.2.0      Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production
  • What you cannot see from the previous output is that the database is hosted on RAC/Exadata system
  • The test table stores 12GB of data in about 1.5 million blocks (BTW, this is the same table I used in this post)
SQL> SELECT blocks, round(blocks*8192/1024/1024) AS size_mb
  2  FROM user_tables
  3  WHERE table_name = 'T';

    BLOCKS    SIZE_MB
---------- ----------
   1550061      12110
  • Note that parallel processing is disabled at the table level
  • At the beginning, make sure that the In-Memory Column Store (IMCS) is not used for the test table
SQL> ALTER TABLE t NO INMEMORY;

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

no rows selected

TEST 1

Let me prepare the environment for the first test:

  • One instance is up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         1
  • Specify to use the IMCS for the test table and trigger the population
SQL> ALTER TABLE t INMEMORY;

SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
  • After waiting for few minutes, check that the population is over (POPULATE_STATUS=COMPLETED) and that the test table is completely stored in the IMCS (BYTES_NO_POPULATED=0).
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
  • Finally, flush the buffer cache (note that the IMCS is not impacted by this action)
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

Now that the environment is ready, I execute the test query and, at the same time, I activate extended SQL trace and gather runtime information:

  • The execution took 2.75 seconds (UPDATE 2014-12-11: The execution isn’t as fast as you might expect because SQL trace was enabled during the test. Without SQL trace the execution is about 10 times faster.)
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:02.75

SQL> execute dbms_session.session_trace_disable
  • A serial execution taking advantage of an IM scan (TABLE ACCESS INMEMORY FULL) was used
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:02.73 |       4 |      2 |
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:02.73 |       4 |      2 |
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:00.91 |       4 |      2 |
-------------------------------------------------------------------------------------------------------
  • The corresponding extended SQL trace information as shown by TKPROF is the following
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.72       2.73          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.72       2.73          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2730184 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=907260 us cost=17717 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

In summary, the execution was, for a query that had to process 12GB of data, quite fast. This was possible because all data was extracted from the IMCS.

TEST 2

The second test uses two instances. As a result, after starting the second instance, the Im-Memory attributes of the test table needed to be changed, and the IMCS had to be re-populated.

  • Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         2
  • Data is duplicated on both instances
SQL> ALTER TABLE t INMEMORY DUPLICATE;
  • Trigger population
SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
  • After waiting for few minutes, check that the population is over and that test table is completely stored in both IMCS
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED    1.0921E+10 1.2707E+10                   0
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
  • Flush the buffer cache of both instances

Of course the test is the same as for TEST 1. The only difference is that the test table is stored in the IMCS of both instances.

  • The execution took 2.97 seconds (UPDATE 2014-12-11: The execution isn’t as fast as you might expect because SQL trace was enabled during the test. Without SQL trace the execution is about 10 times faster.)
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:02.97

SQL> execute dbms_session.session_trace_disable
  • It goes without saying that the execution plan is the same as for TEST 1
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:02.97 |       4 |      2 |
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:02.97 |       4 |      2 |
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:01.02 |       4 |      2 |
-------------------------------------------------------------------------------------------------------
  • And here is the extended SQL trace information
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.95       2.96          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.95       2.97          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2967854 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=1023321 us cost=17717 size=300000000 card=100000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

In summary, there is no real difference between TEST 1 and TEST 2.

TEST 3

The third test also uses two instances. But, this time, data is not fully duplicated on each instance. Instead, it is only distributed. This means that every row of the test table is stored in the IMCS of either one instance or the other.

Let me prepare the environment:

  • Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         2
  • Data is not duplicated on both instances
SQL> ALTER TABLE t INMEMORY NO DUPLICATE;
  • Trigger population
SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
  • After waiting for few minutes, check that the population is over. Notice that since no instance contains all data, the BYTES_NOT_POPULATED column remains greater than 0
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED    5726535680 1.2707E+10          6029524992
         1 COMPLETED    5194317824 1.2707E+10          6652174336
  • Flush the buffer cache of both instances

Once more, the same SQL statements are used to run the test:

  • The execution took 2 minutes and 40 seconds!!!!!
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:02:39.95

SQL> execute dbms_session.session_trace_disable
  • The execution plan is the same as for TEST 1 and TEST 2. The difference is that a huge number of logical and physical I/O were performed
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:02:39.94 |    1086K|   1086K|
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:02:39.94 |    1086K|   1086K|
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:02:31.06 |    1086K|   1086K|
-------------------------------------------------------------------------------------------------------
  • The information provided by extended SQL trace confirms what dbms_xplan shows. In addition, it also shows that the poor performance is due to the high number of single-block physical reads that were performed (about 275K)
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     93.50     159.94    1086979    1086981          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     93.50     159.94    1086979    1086981          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=1086981 pr=1086979 pw=0 time=159942519 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=1086981 pr=1086979 pw=0 time=151063172 us cost=18478 size=300000000 card=100000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read            274946        0.03         73.01
  gc cr multi block request                    6368        0.03          4.19
  cell multiblock physical read                6368        0.02         10.00
  latch: cache buffers lru chain                  6        0.00          0.00
  latch: gcs resource hash                        2        0.00          0.00
  latch: gc element                             113        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

Since the bad performance is caused by the number of physical I/O, it goes without saying that if only logical I/O are taking place, the performance is much better. In this case, an immediate re-execution of the query took 8.28 seconds.

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:08.28

SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

---------------------------------------------------------------------------------------
SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:08.28 |    1086K|
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:08.28 |    1086K|
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:08.26 |    1086K|
----------------------------------------------------------------------------------------------

In summary, only part of the data was accessed through the IMCS. In addition, despite the fact that an IM scan was processed, except if data was in the buffer cache, many single-block physical reads were carried out. As a result, the performance was bad. This behaviour shows that a session is only able to access the IMCS it is managed by the instance it is connected to. This is also a good example showing that accessing data through the IMCS is faster than accessing the same data through the buffer cache.

TEST 4
The fourth and final test is a continuation of the previous one. Its aim is to check the performance of a parallel query. Hence, I added the PARALLEL hint to the test query. Also note that I flushed the buffer cache of both instances before executing the test query.

SQL> SELECT /*+ gather_plan_statistics parallel */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:01.49

SQL> SELECT * FROM table(dbms_xplan.display_cursor('br2as2yzy7gk4',2,'iostats last'));

SQL_ID  br2as2yzy7gk4, child number 2
-------------------------------------
SELECT /*+ gather_plan_statistics parallel */ sum(n_32) FROM t

Plan hash value: 3126468333

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      1 |        |      1 |00:00:01.41 |       7 |      1 |
|   1 |  SORT AGGREGATE                 |          |      1 |      1 |      1 |00:00:01.41 |       7 |      1 |
|   2 |   PX COORDINATOR                |          |      1 |        |     10 |00:00:01.34 |       7 |      1 |
|   3 |    PX SEND QC (RANDOM)          | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE              |          |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   5 |      PX BLOCK ITERATOR          |          |      0 |    100M|      0 |00:00:00.01 |       0 |      0 |
|*  6 |       TABLE ACCESS INMEMORY FULL| T        |      0 |    100M|      0 |00:00:00.01 |       0 |      0 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - inmemory(:Z>=:Z AND :Z<=:Z)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 10
   - parallel scans affinitized for inmemory

The execution took 1.49 seconds. In other words, the performance of the parallel query is way better than the one of TEST 3. The reason can be seen in the Note section of the dbms_xplan output: "parallel scans affinitized for inmemory". This means that parallel slaves were started on both instances and that every set of parallel slaves accessed only the data stored in the IMCS of the instance they were attached to. As a result, even though the data was distributed, the performance was good. The only "overhead" compared to the serial execution is that the parallel slaves had to communicate with the query coordinator.

WHAT WE LEARNED

  • DUPLICATE is only applicable to engineered systems (as I write this blog, ODA does not yet support it)
  • On a RAC system, without DUPLICATE, data is distributed across all instances
  • A process can only access data stored in the IMCS of the instance it is connected to
  • Even though an IM scan is used, serial executions on RAC systems might require many logical/physical I/O. (UPDATE 2014-12-11: Parallel executions using Auto DOP don't suffer from this problem.)

In summary, I would not advise to use the In-Memory option on a RAC system that does not support the DUPLICATE clause. At least, not without Auto DOP. (UPDATE 2014-12-12: the previous sentence was "At least, not in general.")