Yesterday a colleague of mine asked the following question: “Is there a way to get information about free CPU resources within sqlplus?”
When I read it, I immediately thought of V$OSSTAT. In fact, as of 10g that dynamic performance view provides system utilization statistics from the operating system. For example, it provides the following statistics:
- IDLE_TIME: Time (centi-secs) that CPUs have been in the idle state
- USER_TIME: Time (centi-secs) spent in user code
- SYS_TIME: Time (centi-secs) spent in the kernel
- IOWAIT_TIME: Time (centi-secs) spent waiting for IO
- NICE_TIME: Time (centi-secs) spent in low-priority user code
Note: In 10.1 the statistics are named IDLE_TICKS, USER_TICKS, SYS_TICKS and NICE_TICKS. In addition, IOWAIT_TIME is available as of 10.2.0.2 only.
Hence, the answer to the question is: “Yes, it’s possible”. But how accurate is it? To answer this second question, I wrote a pipelined PL/SQL function to sample V$OSSTAT and compared the results with the values provided by sar(1). The necessary objects (two types and the function) are created with the following SQL statements:
CREATE OR REPLACE TYPE osstat_record IS OBJECT (
  date_time TIMESTAMP,
  idle_time NUMBER,
  user_time NUMBER,
  sys_time NUMBER,
  iowait_time NUMBER,
  nice_time NUMBER
);
/
CREATE OR REPLACE TYPE osstat_table AS TABLE OF osstat_record;
/
CREATE OR REPLACE FUNCTION osstat(p_interval IN NUMBER, p_count IN NUMBER)
  RETURN osstat_table
  PIPELINED
IS
  l_t1 osstat_record;   -- previous snapshot
  l_t2 osstat_record;   -- current snapshot
  l_out osstat_record;  -- deltas between the two snapshots
  l_num_cpus NUMBER;
  l_total NUMBER;
BEGIN
  l_t1 := osstat_record(NULL, NULL, NULL, NULL, NULL, NULL);
  l_t2 := osstat_record(NULL, NULL, NULL, NULL, NULL, NULL);
  SELECT value
  INTO l_num_cpus
  FROM v$osstat
  WHERE stat_name = 'NUM_CPUS';
  FOR i IN 1..p_count
  LOOP
    -- take a snapshot of the cumulative statistics
    SELECT sum(decode(stat_name,'IDLE_TIME', value, NULL)) as idle_time,
           sum(decode(stat_name,'USER_TIME', value, NULL)) as user_time,
           sum(decode(stat_name,'SYS_TIME', value, NULL)) as sys_time,
           sum(decode(stat_name,'IOWAIT_TIME', value, NULL)) as iowait_time,
           sum(decode(stat_name,'NICE_TIME', value, NULL)) as nice_time
    INTO l_t2.idle_time, l_t2.user_time, l_t2.sys_time, l_t2.iowait_time, l_t2.nice_time
    FROM v$osstat
    WHERE stat_name in ('IDLE_TIME','USER_TIME','SYS_TIME','IOWAIT_TIME','NICE_TIME');
    -- compute the deltas; in the first iteration they are NULL because l_t1 is empty
    l_out := osstat_record(systimestamp,
                           (l_t2.idle_time-l_t1.idle_time)/l_num_cpus/p_interval,
                           (l_t2.user_time-l_t1.user_time)/l_num_cpus/p_interval,
                           (l_t2.sys_time-l_t1.sys_time)/l_num_cpus/p_interval,
                           (l_t2.iowait_time-l_t1.iowait_time)/l_num_cpus/p_interval,
                           (l_t2.nice_time-l_t1.nice_time)/l_num_cpus/p_interval);
    l_total := l_out.idle_time+l_out.user_time+l_out.sys_time+l_out.iowait_time+nvl(l_out.nice_time,0);
    -- return every statistic as a percentage of the total
    PIPE ROW(osstat_record(systimestamp,
                           l_out.idle_time/l_total*100,
                           l_out.user_time/l_total*100,
                           l_out.sys_time/l_total*100,
                           l_out.iowait_time/l_total*100,
                           l_out.nice_time/l_total*100));
    l_t1 := l_t2;
    dbms_lock.sleep(p_interval);
  END LOOP;
  RETURN;
END;
/
The statistics are displayed with a query like the following one (notice that I set ARRAYSIZE to 1 to keep the delay between the generation and the display of the statistics as short as possible):
SQL> SET ARRAYSIZE 1
SQL> COLUMN user_time FORMAT 990.00
SQL> COLUMN nice_time FORMAT 990.00
SQL> COLUMN sys_time FORMAT 990.00
SQL> COLUMN iowait_time FORMAT 990.00
SQL> COLUMN idle_time FORMAT 990.00
SQL> SELECT to_char(date_time,'HH:MI:SS') as date_time, user_time, nice_time, sys_time, iowait_time, idle_time
  2  FROM table(osstat(5,100));

DATE_TIM USER_TIME NICE_TIME SYS_TIME IOWAIT_TIME IDLE_TIME
-------- --------- --------- -------- ----------- ---------
12:26:11
12:26:16      0.05      0.00     0.05        0.10     99.80
12:26:21      0.76      0.00     0.05        0.66     98.52
12:26:26      0.05      0.00     0.10        0.10     99.74
12:26:31      0.15      0.00     8.03        0.31     91.50
12:26:36      0.27      0.00    21.06       15.75     62.92
12:26:41      0.10      0.00     2.57        8.13     89.21
12:26:46      0.05      0.00     0.10        0.71     99.14
12:26:51      0.10      0.00     0.05        0.41     99.44
12:26:56     24.37      0.00     0.65        3.28     71.71
12:27:01     24.50      0.00     0.97        1.27     73.26
12:27:06     24.31      0.00     1.17        1.32     73.20
12:27:11     25.05      0.00     0.66        0.82     73.47
12:27:16     25.06      0.00     0.61        0.76     73.56
12:27:21     25.13      0.00     0.56        0.46     73.85
12:27:26     24.91      0.00     0.45        1.77     72.87
12:27:31     23.97      0.00     1.41        2.17     72.46
12:27:36     24.90      0.00     0.97        0.91     73.22
12:27:41     25.18      0.00     0.51        0.36     73.95
12:27:46     25.64      0.00     0.41        0.36     73.59
12:27:51     46.37      0.05     3.48        0.45     49.65
12:27:56     46.81      0.00     3.14        0.35     49.70
12:28:01     46.63      0.00     3.34        0.20     49.83
12:28:06     45.76      0.00     4.19        0.25     49.80
12:28:11     46.58      0.00     3.40        0.15     49.88
12:28:16     46.76      0.00     3.54        0.25     49.45
12:28:21     46.06      0.00     6.74        0.25     46.96
12:28:26     43.73      0.00     6.24        0.10     49.93
12:28:31     34.87      0.00     6.98        0.30     57.84
12:28:36     29.60      0.00     5.50        0.71     64.20
12:28:41     38.40      0.00     9.60        7.69     44.32
12:28:46     39.20      0.00     9.39        6.32     45.09
12:28:51     34.73      0.00     8.37       13.81     43.09
...
And here is the sar(1) output for the very same period of time:
oracle@helicon:~/ [rdbms11107] sar 5 100
Linux 2.6.9-42.ELsmp (helicon.antognini.ch)     05/01/2009

12:26:11 AM       CPU     %user     %nice   %system   %iowait     %idle
12:26:16 AM       all      0.05      0.00      0.05      0.10     99.80
12:26:21 AM       all      0.77      0.00      0.10      0.67     98.46
12:26:26 AM       all      0.05      0.00      0.10      0.10     99.74
12:26:31 AM       all      0.15      0.00      8.31      0.31     91.23
12:26:36 AM       all      0.31      0.00     26.54     18.63     54.52
12:26:41 AM       all      0.10      0.00      3.74      8.41     87.75
12:26:46 AM       all      0.05      0.00      0.20      0.72     99.03
12:26:51 AM       all      0.26      0.00      0.05      0.41     99.28
12:26:56 AM       all     25.15      0.00      0.72      3.39     70.74
12:27:01 AM       all     24.87      0.00      0.98      1.29     72.86
12:27:06 AM       all     24.64      0.00      1.23      1.34     72.79
12:27:11 AM       all     25.23      0.00      0.67      0.82     73.27
12:27:16 AM       all     25.26      0.00      0.72      0.77     73.25
12:27:21 AM       all     25.19      0.00      0.62      0.46     73.73
12:27:26 AM       all     25.40      0.00      0.51      1.80     72.29
12:27:31 AM       all     24.46      0.00      1.44      2.21     71.88
12:27:36 AM       all     25.13      0.00      1.03      0.92     72.92
12:27:41 AM       all     25.26      0.00      0.56      0.36     73.82
12:27:46 AM       all     25.73      0.00      0.46      0.36     73.44
12:27:51 AM       all     46.58      0.05      3.50      0.45     49.43
12:27:56 AM       all     46.95      0.00      3.15      0.35     49.55
12:28:01 AM       all     46.70      0.00      3.45      0.20     49.65
12:28:06 AM       all     45.85      0.00      4.25      0.25     49.65
12:28:11 AM       all     46.65      0.00      3.45      0.15     49.75
12:28:16 AM       all     46.80      0.00      3.55      0.25     49.40
12:28:21 AM       all     46.20      0.00      6.80      0.25     46.75
12:28:26 AM       all     43.80      0.00      6.25      0.10     49.85
12:28:31 AM       all     35.05      0.00      7.01      0.30     57.64
12:28:36 AM       all     29.70      0.00      5.58      0.71     64.01
12:28:41 AM       all     41.18      0.00     11.29      8.20     39.33
12:28:46 AM       all     41.57      0.00     10.66      6.75     41.02
12:28:51 AM       all     39.96      0.00     10.62     15.87     33.55
...
As you can verify, the values provided by the pipelined function are quite good! Also note that small differences are normal because the sampling interval is quite short (5 seconds) and the two gathering methods are not synchronized.
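For readers who want to check the arithmetic outside the database, here is a minimal Python sketch of the calculation the function performs on two consecutive snapshots. The helper name and the counter values are made up for the illustration, and, unlike the PL/SQL function (which applies nvl to NICE_TIME only), the sketch treats any missing statistic as 0.

```python
# Hypothetical helper: mirrors the arithmetic of the osstat function on two
# cumulative snapshots (values in centiseconds, as reported by V$OSSTAT).
STATS = ("IDLE_TIME", "USER_TIME", "SYS_TIME", "IOWAIT_TIME", "NICE_TIME")


def cpu_percentages(prev, curr):
    """Return each statistic's share (in %) of the time elapsed between snapshots."""
    # Assumption: a statistic missing from a snapshot counts as 0.
    deltas = {s: curr.get(s, 0) - prev.get(s, 0) for s in STATS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return {s: 100.0 * d / total for s, d in deltas.items()}


# Made-up counters for a 4-CPU host sampled 5 seconds apart:
# 4 CPUs * 5 s * 100 centiseconds = 2000 centiseconds to distribute.
t1 = {"IDLE_TIME": 100000, "USER_TIME": 20000, "SYS_TIME": 5000,
      "IOWAIT_TIME": 1000, "NICE_TIME": 0}
t2 = {"IDLE_TIME": 101500, "USER_TIME": 20400, "SYS_TIME": 5080,
      "IOWAIT_TIME": 1020, "NICE_TIME": 0}

pct = cpu_percentages(t1, t2)
for name in STATS:
    print(f"{name:12s} {pct[name]:6.2f}")
```

Because every delta is divided by the sum of all deltas, the percentages always add up to 100, whatever the number of CPUs or the length of the interval.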
Cool :) Craig Shallahamer’s (OraPub) OSM toolkit also has that facility, which likewise queries v$osstat.
Christian, your result was to be expected. An Oracle instance can either use system calls like sysinfo and getrusage or read the standardized /proc file system to obtain the system usage statistics. The 10g reference manual describes the “VALUE” column of the V$OSSTAT view as the “instantaneous statistics value”, which means that some process, most likely smon or ckpt, is taking snapshots of the OS statistics at some predefined interval and storing the result in the SGA. You managed to prove it entirely in PL/SQL, which is a remarkable feat. Thanks for that!
Mladen Gogala
For AIX 64-bit (db 10.2.0.3) I had to modify
l_total := l_out.idle_time+l_out.user_time+l_out.sys_time+l_out.iowait_time+l_out.nice_time;
to
l_total := l_out.idle_time+l_out.user_time+l_out.sys_time+l_out.iowait_time+nvl(l_out.nice_time,0);
In addition, it seems that the statistics did not match the sar output as well as they do on Linux,
but the rough trend is visible and the idea is very good.
Hi Martin
Thank you very much for the input, I modified the code accordingly. Since I tested it only on one Linux server, it’s nice to know what the results on other systems are…
Cheers,
Chris
You’re not only showing that the statistics in v$osstat are correct, you’re also using an important and fundamental mathematical law, called the utilisation law, which is:
utilisation = busy time / total time
Lately, I got a statspack report with the following data:
Elapsed 1679 min (=100740 sec)
CPU TIME 19847
The OS stats say (in CPU seconds):
Busy 31087 sec (user + sys time)
Idle 775080 sec
Total 806167 sec (derived)
CPUCount 8
Therefore, the util = 31087/806167 = 3.8%
Dividing this total cpu time of 806167 sec. by 8 cpus (cpucount), we’ll get the elapsed time of 100770 seconds.
Using the same law, we can say how much of this 3.8% Host-Util was from the DB:
cpu_used_by_this_session / 8 cpucount / elapsed time
19487/8/100740 = 2.42%
So, your code is a good example to see the law in action ;-)
Anyway, the metric Host CPU Utilization (%) from v$sysmetric shows the value in % (needs licensing packs). Maybe this is only derived from v$osstat.
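The arithmetic above can be replayed in a few lines of Python. This is only a sketch: the variable names are made up for the illustration, and the database CPU figure is the 19487 used in the calculation, since that is the value that reproduces the 2.42%.

```python
def utilisation(busy, total):
    """Utilisation law: the fraction of the available time that was spent busy."""
    return busy / total


# Figures quoted from the statspack report above.
busy_sec = 31087       # user + sys time, summed over all CPUs
idle_sec = 775080      # idle time, summed over all CPUs
cpu_count = 8
elapsed_sec = 100740   # 1679 minutes
db_cpu_sec = 19487     # assumption: the value used in the calculation above

total_sec = busy_sec + idle_sec
host_util = utilisation(busy_sec, total_sec)

# The total CPU time divided by the CPU count should be close to the elapsed time.
derived_elapsed_sec = total_sec / cpu_count

# Share of the host capacity consumed by the database.
db_util = utilisation(db_cpu_sec, cpu_count * elapsed_sec)

print(f"host utilisation: {host_util:.2%}")
print(f"derived elapsed:  {derived_elapsed_sec:.0f} sec")
print(f"db utilisation:   {db_util:.2%}")
```

The derived elapsed time differs from the reported one by about 30 seconds, which is a quick sanity check that the OS statistics and the snapshot interval cover the same period.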
[…] of the Unix TOP utility to view busy database sessions. I never got around to it but a recent post by Christian Antognini gave me the inspiration to finally write my script and a few ideas on how best to go about it. […]
Thank you very much for your interesting comment, Peter.
Hi Bernard (?)
Thank you for that. I’ll give it a try as soon as I have few minutes…
Cheers,
Chris
I derived this for usage with AWR. ‘$NSNAP’ defaults to 1 unless you want more
Hello Peter,
yes, “Host CPU Utilization (%)” in v$sysmetric is calculated on the basis of
v$osstat. I have done a comparison between what v$sysmetric shows and what v$osstat is telling me and the error was <= 0.5% (mainly because in my script I have rounded the values many times).
So, v$sysmetric just presents the calculations/aggregations done automatically by the MMON process (I believe).
Best Regards. Milen
Hello, Antognini,
The interpretation of v$osstat is great.
A further question: do you have any idea about the ‘AVG_%_TIME’ items? Are they based on instance startup or on the current snapshot?
thank you.
Hi
The Reference guide clearly states that while %_TIME values are totalled over all processors, AVG_%_TIME values are averaged over all processors. Hence, AVG_%_TIME = %_TIME / NUM_CPUS.
Cheers,
Chris
Hi Chris
Wonderful tool. It works fine on my Solaris box but on 10.2.0.4/Win2003(64bit) I get all blanks. This can be remedied by replacing the NULLs with 0s in the select statement:
SELECT sum(decode(stat_name,'IDLE_TIME', value, 0)) as idle_time,
sum(decode(stat_name,'USER_TIME', value, 0)) as user_time,
sum(decode(stat_name,'SYS_TIME', value, 0)) as sys_time,
sum(decode(stat_name,'IOWAIT_TIME', value, 0)) as iowait_time,
sum(decode(stat_name,'NICE_TIME', value, 0)) as nice_time
INTO l_t2.idle_time, l_t2.user_time, l_t2.sys_time, l_t2.iowait_time, l_t2.nice_time
FROM sys.v_$osstat
WHERE stat_name in ('IDLE_TIME','USER_TIME','SYS_TIME','IOWAIT_TIME','NICE_TIME');
Best regards
Ivan Bajon
Hi Chris
There’s this one thing I can’t seem to wrap my head around: why do you divide by p_interval in the l_out := osstat_record(systimestamp,…) statement?
Best regards
Ivan Bajon
Hi Ivan
First of all, thank you very much for your feedback.
About your question… I divide by l_num_cpus and p_interval because I want to have a percentage that sums up to 100% and not the absolute values. Ok?
Cheers,
Chris
Hi Chris
Yes, I get that as far as l_num_cpus goes, but why p_interval? You can divide by 42 or a million or any other number and the script will still produce the correct result with any input values.
Cheers
Ivan
Ok, now I understand what you mean. Since I “reuse” the computed value to compute l_total, there is no difference… Honestly, I don’t remember exactly what I did ;-) But, I guess, I computed the right values just to check them.
Sorry to be an annoyance here, but I just tried not dividing by anything, neither l_num_cpus nor p_interval.
It makes sense since you’re dividing each value by the sum of the values later to get the percentage.
Cheers
Ivan
Probably I was not clear in my last reply… Anyway, I wanted to point out the same thing. So, we agree that it is not necessary.
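The point agreed on in this exchange, that a constant divisor cancels out once every value is divided by the sum of all values, can be checked with a small Python sketch. The deltas and the helper name are made up for the illustration.

```python
def to_percentages(values):
    """Express every value as a percentage of the sum of all values."""
    total = sum(values)
    return [100.0 * v / total for v in values]


# Made-up deltas (centiseconds) for idle, user, sys and iowait time.
deltas = [1500, 400, 80, 20]
num_cpus, interval = 4, 5

# Percentages computed from the raw deltas...
raw = to_percentages(deltas)
# ...and from deltas that are first divided by the CPU count and the interval.
scaled = to_percentages([d / num_cpus / interval for d in deltas])

print(raw)
print(scaled)
```

Both lists contain the same percentages (up to floating-point rounding), because the common divisor appears in the numerator and in the denominator.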
Superb! Thanks to you, I’ve learnt something new today…
Hi, this is a wonderful little thing that I have added to our procedure toolbox. We are outsourced, have multiple databases on the same AIX box, and do not have access to the AIX OS at all, so every tool is welcome :-)
I have tried to combine it with V$SYS_TIME_MODEL, but I’m not sure how to calculate the number for the CPU usage for the db I’m currently on:
Mette
> I’m not sure how to calculate the number for the CPU usage for the db I’m currently on:
Two corrections and then it should work:
1) You divided v$sys_time_model.value by 100000 instead of 1000000.
2) If you want a percentage based on the number of CPUs, you should use “(l_t2.instance_cpu_time-l_t1.instance_cpu_time)/l_num_cpus/p_interval” to initialize l_out.
HTH
Chris
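As an illustration of the two corrections, here is a minimal Python sketch of the calculation. It assumes that the CPU time comes from a V$SYS_TIME_MODEL statistic (such as DB CPU) whose value is reported in microseconds; the function name and the sample numbers are hypothetical.

```python
MICROSECONDS_PER_SECOND = 1_000_000  # not 100000


def instance_cpu_percent(cpu_us_t1, cpu_us_t2, num_cpus, interval_sec):
    """Percentage of the host CPU capacity used by the instance in the interval."""
    # Convert the microsecond delta into CPU seconds.
    cpu_seconds = (cpu_us_t2 - cpu_us_t1) / MICROSECONDS_PER_SECOND
    # Relate the CPU seconds to the capacity available: num_cpus * interval_sec.
    return 100.0 * cpu_seconds / num_cpus / interval_sec


# Made-up example: 2.5 CPU seconds consumed in 5 seconds on a 4-CPU host.
print(instance_cpu_percent(10_000_000, 12_500_000, num_cpus=4, interval_sec=5))
```

With the wrong divisor (100000) the result would be ten times too high, which is an easy symptom to spot when the value exceeds 100%.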
Great script. Do you have a similar script that shows the memory utilization per snap id?
No, I don’t have it.