May 29 2009

Troubleshooting Oracle Performance – Downloadable Files

Tag: TOPChristian Antognini @ 3:20 pm

This is just a short note to point out that I just uploaded a new version of the scripts related to TOP.The new ZIP is available through this page.

The change log is the following (as you can see there are only few changes compared to the last time):

chapter07/object_stats.sql New file
chapter07/outline_with_ffs.sql New file
chapter07/sqltext_to_signature.sql New file

May 05 2009

Wrong Information about Temporary Space Usage in V$SQL_PLAN_STATISTICS_ALL and DBMS_XPLAN Output

Tag: 10gR1, 10gR2, 11gR1, Bug, TOPChristian Antognini @ 1:00 am

As you can read in the documentation, the columns MAX_TEMPSEG_SIZE and LAST_TEMPSEG_SIZE in the dynamic performance view V$SQL_WORKAREA provide information about the size of the temporary segment used for a specific workarea. The values are given in bytes. Let’s perform a test to check this information…

  • Create a test table that contains about 1MB of data:

SQL> CREATE TABLE t AS
  2  SELECT rownum AS id, dbms_random.string('p',1000) AS pad
  3  FROM dual
  4  CONNECT BY level <= 1000;

SQL> execute dbms_stats.gather_table_stats(user, 't')

  • Setup the session to force the user process to spill into a temporary segment:

SQL> ALTER SESSION SET workarea_size_policy = manual;
SQL> ALTER SESSION SET sort_area_size = 524288;

  • Run test query including a sort operation (that spills to the temporary tablespace):

SQL> SELECT id FROM t ORDER BY pad;

  • Check the amount of used temporary space by querying V$SQL_WORKAREA:

SQL> SELECT max_tempseg_size, last_tempseg_size
  2  FROM v$sql_workarea
  3  WHERE (sql_id, child_number) IN (SELECT prev_sql_id, prev_child_number
  4                                   FROM v$session
  5                                   WHERE sid = sys_context('userenv','sid'));

MAX_TEMPSEG_SIZE LAST_TEMPSEG_SIZE
---------------- -----------------
         2097152           2097152

According to this information the size of the temporary space used to execute the query was 2MB. So far, so good.

Always according to the documentation another dynamic performance view, V$SQL_PLAN_STATISTICS_ALL, should provide the same information (remember, V$SQL_PLAN_STATISTICS_ALL shows in a single view all the information provided by the views V$SQL_PLAN, V$SQL_PLAN_STATISTICS, and V$SQL_WORKAREA). Let’s check it…

  • Run the same test query as before:

SQL> SELECT id FROM t ORDER BY pad;

  • Check the amount of used memory by querying V$SQL_PLAN_STATISTICS_ALL:

SQL> SELECT max_tempseg_size, last_tempseg_size
  2  FROM v$sql_plan_statistics_all
  3  WHERE (sql_id, child_number) IN (SELECT prev_sql_id, prev_child_number
  4                                   FROM v$session
  5                                   WHERE sid = sys_context('userenv','sid'))
  6  AND max_tempseg_size IS NOT NULL;

MAX_TEMPSEG_SIZE LAST_TEMPSEG_SIZE
---------------- -----------------
            2048              2048

Ups! According to this information the size of the temporary space used to execute the query was 2KB. Mhmm, something is not good… For this reason, at the end of 2007 I opened a service request about this issue. The support guy recognized the problem and opened a bug. Fine. For some unknown reasons (?) yesterday I was checking the status of few bugs. While doing so I noticed that this specific bug was closed few months ago with the status “Could Not Reproduce”! I don’t know you, but on my 64-bit Linux server I can reproduce it with at least 11.1.0.7.0, 11.1.0.6.0, 10.2.0.4.0, 10.2.0.3.0, 10.2.0.2.0, 10.2.0.1.0, 10.1.0.5.0, 10.1.0.4.0 and 10.1.0.3.0. Geez!

It is essential to note that also the package DBMS_XPLAN shows wrong information (here an example for the same query as before):

SQL> SELECT * FROM table(dbms_xplan.display_cursor(null,null,'memstats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
SQL_ID  ftb71b6926dtn, child number 0
-------------------------------------
SELECT id FROM t ORDER BY pad

Plan hash value: 961378228

---------------------------------------------------------------------------------
| Id  | Operation          | Name | E-Rows |  OMem |  1Mem | Used-Mem | Used-Tmp|
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |        |       |       |          |         |
|   1 |  SORT ORDER BY     |      |   1000 |  1152K|   562K|  529K (1)|    2048 |
|   2 |   TABLE ACCESS FULL| T    |   1000 |       |       |          |         |
---------------------------------------------------------------------------------

The only good thing about the fact that Oracle is not willing to fix the bug is that my book, Troubleshooting Oracle Performance, does not need to be updated. In fact, at page 210, while describing the output of the package DBMS_XPLAN I wrote the following information:

  • Used-Tmp: The amount of temporary space used by the operation during the last execution. This value must be multiplied by 1,024 to be consistent with the other memory utilization columns (for example, 32K means 32MB).
  • Max-Tmp: The maximum amount of temporary space used by the operation. This value has to be multiplied by 1,024 to be consistent with the other memory utilization columns (for example, 32K means 32MB).

ADDENDA (Mai 6, 2009): This post was noticed by an Oracle employee and, as a result, the bug was reopened. Thank you Greg!


May 01 2009

Report Information about CPU Activity in SQL*Plus

Tag: 10gR1, 10gR2, 11gR1Christian Antognini @ 1:26 am

Yesterday a colleague of mine asked the following question: “Is there a way to get information about free CPU resources within sqlplus?”

When I read it, I immediately thought to V$OSSTAT. In fact, as of 10g that dynamic performance view provides system utilization statistics from the operating system. For example, it provides the following statistics:

  • IDLE_TIME: Time (centi-secs) that CPUs have been in the idle state
  • USER_TIME: Time (centi-secs) spent in user code
  • SYS_TIME: Time (centi-secs) spent in the kernel
  • IOWAIT_TIME: Time (centi-secs) spent waiting for IO
  • NICE_TIME: Time (centi-secs) spend in low-priority user code

Note: In 10.1 the statistics are named IDLE_TICKS, USER_TICKS, SYS_TICKS and NICE_TICKS. In addition, IOWAIT_TIME is available as of 10.2.0.2 only.

Hence, the answer to the question is: “Yes, it’s possible”. But, how accurate it is? To answer this second question, I wrote a pipelined PL/SQL function to sample V$OSSTAT and compared the results with the values provided by sar(1). The necessary objects (two types and the function) are created with the following SQL statements:

CREATE OR REPLACE TYPE osstat_record IS OBJECT (
  date_time TIMESTAMP,
  idle_time NUMBER,
  user_time NUMBER,
  sys_time NUMBER,
  iowait_time NUMBER,
  nice_time NUMBER
);
/

CREATE OR REPLACE TYPE osstat_table AS TABLE OF osstat_record;
/
CREATE OR REPLACE FUNCTION osstat(p_interval IN NUMBER, p_count IN NUMBER)
   RETURN osstat_table
   PIPELINED
IS
  l_t1 osstat_record;
  l_t2 osstat_record;
  l_out osstat_record;
  l_num_cpus NUMBER;
  l_total NUMBER;
BEGIN
  l_t1 := osstat_record(NULL, NULL, NULL, NULL, NULL, NULL);
  l_t2 := osstat_record(NULL, NULL, NULL, NULL, NULL, NULL);
  
  SELECT value 
  INTO l_num_cpus
  FROM v$osstat 
  WHERE stat_name = 'NUM_CPUS';
  
  FOR i IN 1..p_count
  LOOP
    SELECT sum(decode(stat_name,'IDLE_TIME', value, NULL)) as idle_time,
           sum(decode(stat_name,'USER_TIME', value, NULL)) as user_time,
           sum(decode(stat_name,'SYS_TIME', value, NULL)) as sys_time,
           sum(decode(stat_name,'IOWAIT_TIME', value, NULL)) as iowait_time,
           sum(decode(stat_name,'NICE_TIME', value, NULL)) as nice_time
    INTO l_t2.idle_time, l_t2.user_time, l_t2.sys_time, l_t2.iowait_time, l_t2.nice_time
    FROM v$osstat 
    WHERE stat_name in ('IDLE_TIME','USER_TIME','SYS_TIME','IOWAIT_TIME','NICE_TIME');

    l_out := osstat_record(systimestamp,
                           (l_t2.idle_time-l_t1.idle_time)/l_num_cpus/p_interval,
                           (l_t2.user_time-l_t1.user_time)/l_num_cpus/p_interval,
                           (l_t2.sys_time-l_t1.sys_time)/l_num_cpus/p_interval,
                           (l_t2.iowait_time-l_t1.iowait_time)/l_num_cpus/p_interval,
                           (l_t2.nice_time-l_t1.nice_time)/l_num_cpus/p_interval);
    l_total := l_out.idle_time+l_out.user_time+l_out.sys_time+l_out.iowait_time+nvl(l_out.nice_time,0);
    PIPE ROW(osstat_record(systimestamp,
                           l_out.idle_time/l_total*100,
                           l_out.user_time/l_total*100,
                           l_out.sys_time/l_total*100,
                           l_out.iowait_time/l_total*100,
                           l_out.nice_time/l_total*100));
    l_t1 := l_t2;

    dbms_lock.sleep(p_interval);

  END LOOP;
  RETURN;
END;
/

The statistics are displayed with a query like the following one (notice that I set ARRAYSIZE to keep as short as possible the delay between the generation and the display of the statistics):

SQL> SET ARRAYSIZE 1

SQL> COLUMN user_time FORMAT 990.00
SQL> COLUMN nice_time FORMAT 990.00
SQL> COLUMN sys_time FORMAT 990.00
SQL> COLUMN iowait_time FORMAT 990.00
SQL> COLUMN idle_time FORMAT 990.00

SQL> SELECT to_char(date_time,'HH:MI:SS') as date_time, user_time, nice_time, sys_time, iowait_time, idle_time
  2  FROM table(osstat(5,100));

DATE_TIM USER_TIME NICE_TIME SYS_TIME IOWAIT_TIME IDLE_TIME
-------- --------- --------- -------- ----------- ---------
12:26:11
12:26:16      0.05      0.00     0.05        0.10     99.80
12:26:21      0.76      0.00     0.05        0.66     98.52
12:26:26      0.05      0.00     0.10        0.10     99.74
12:26:31      0.15      0.00     8.03        0.31     91.50
12:26:36      0.27      0.00    21.06       15.75     62.92
12:26:41      0.10      0.00     2.57        8.13     89.21
12:26:46      0.05      0.00     0.10        0.71     99.14
12:26:51      0.10      0.00     0.05        0.41     99.44
12:26:56     24.37      0.00     0.65        3.28     71.71
12:27:01     24.50      0.00     0.97        1.27     73.26
12:27:06     24.31      0.00     1.17        1.32     73.20
12:27:11     25.05      0.00     0.66        0.82     73.47
12:27:16     25.06      0.00     0.61        0.76     73.56
12:27:21     25.13      0.00     0.56        0.46     73.85
12:27:26     24.91      0.00     0.45        1.77     72.87
12:27:31     23.97      0.00     1.41        2.17     72.46
12:27:36     24.90      0.00     0.97        0.91     73.22
12:27:41     25.18      0.00     0.51        0.36     73.95
12:27:46     25.64      0.00     0.41        0.36     73.59
12:27:51     46.37      0.05     3.48        0.45     49.65
12:27:56     46.81      0.00     3.14        0.35     49.70
12:28:01     46.63      0.00     3.34        0.20     49.83
12:28:06     45.76      0.00     4.19        0.25     49.80
12:28:11     46.58      0.00     3.40        0.15     49.88
12:28:16     46.76      0.00     3.54        0.25     49.45
12:28:21     46.06      0.00     6.74        0.25     46.96
12:28:26     43.73      0.00     6.24        0.10     49.93
12:28:31     34.87      0.00     6.98        0.30     57.84
12:28:36     29.60      0.00     5.50        0.71     64.20
12:28:41     38.40      0.00     9.60        7.69     44.32
12:28:46     39.20      0.00     9.39        6.32     45.09
12:28:51     34.73      0.00     8.37       13.81     43.09
...

And here is the sar(1) output for the very same period of time:

oracle@helicon:~/ [rdbms11107] sar 5 100
Linux 2.6.9-42.ELsmp (helicon.antognini.ch)     05/01/2009

12:26:11 AM       CPU     %user     %nice   %system   %iowait     %idle
12:26:16 AM       all      0.05      0.00      0.05      0.10     99.80
12:26:21 AM       all      0.77      0.00      0.10      0.67     98.46
12:26:26 AM       all      0.05      0.00      0.10      0.10     99.74
12:26:31 AM       all      0.15      0.00      8.31      0.31     91.23
12:26:36 AM       all      0.31      0.00     26.54     18.63     54.52
12:26:41 AM       all      0.10      0.00      3.74      8.41     87.75
12:26:46 AM       all      0.05      0.00      0.20      0.72     99.03
12:26:51 AM       all      0.26      0.00      0.05      0.41     99.28
12:26:56 AM       all     25.15      0.00      0.72      3.39     70.74
12:27:01 AM       all     24.87      0.00      0.98      1.29     72.86
12:27:06 AM       all     24.64      0.00      1.23      1.34     72.79
12:27:11 AM       all     25.23      0.00      0.67      0.82     73.27
12:27:16 AM       all     25.26      0.00      0.72      0.77     73.25
12:27:21 AM       all     25.19      0.00      0.62      0.46     73.73
12:27:26 AM       all     25.40      0.00      0.51      1.80     72.29
12:27:31 AM       all     24.46      0.00      1.44      2.21     71.88
12:27:36 AM       all     25.13      0.00      1.03      0.92     72.92
12:27:41 AM       all     25.26      0.00      0.56      0.36     73.82
12:27:46 AM       all     25.73      0.00      0.46      0.36     73.44
12:27:51 AM       all     46.58      0.05      3.50      0.45     49.43
12:27:56 AM       all     46.95      0.00      3.15      0.35     49.55
12:28:01 AM       all     46.70      0.00      3.45      0.20     49.65
12:28:06 AM       all     45.85      0.00      4.25      0.25     49.65
12:28:11 AM       all     46.65      0.00      3.45      0.15     49.75
12:28:16 AM       all     46.80      0.00      3.55      0.25     49.40
12:28:21 AM       all     46.20      0.00      6.80      0.25     46.75
12:28:26 AM       all     43.80      0.00      6.25      0.10     49.85
12:28:31 AM       all     35.05      0.00      7.01      0.30     57.64
12:28:36 AM       all     29.70      0.00      5.58      0.71     64.01
12:28:41 AM       all     41.18      0.00     11.29      8.20     39.33
12:28:46 AM       all     41.57      0.00     10.66      6.75     41.02
12:28:51 AM       all     39.96      0.00     10.62     15.87     33.55
...

As you can verify, the values provided by the pipelined function are quite good! Also note that small difference are normal because the sampling interval is quite short (5 seconds) and the two gathering methods are not synchronized.