Instance Caging

Instance caging is another small but useful feature of Oracle Database 11g Release 2. With it, the Database Resource Manager can, for the first time, limit the number of CPUs used by a given instance. (Note, by the way, that this limit has no impact on the number of CPUs that have to be licensed.)

Using instance caging requires only two simple configuration steps on our part:

  • Enable the resource manager by assigning a resource plan (e.g. DEFAULT_PLAN)
  • Set the number of CPUs through the CPU_COUNT initialization parameter
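A minimal sketch of these two steps (the plan and the value are just examples, of course):

```sql
-- Enable the Resource Manager by assigning a resource plan
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';

-- Cage the instance to, e.g., 2 CPUs (the parameter is dynamic)
ALTER SYSTEM SET cpu_count = 2;
```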

To illustrate how it works, let me show you what I observed on my small test server with 4 cores.

To burn all the available CPU resources, I started four sessions executing the following PL/SQL block:

DECLARE
  n NUMBER;
BEGIN
  -- spin forever to keep one CPU core busy
  WHILE (TRUE)
  LOOP
    n := dbms_random.random();
  END LOOP;
END;

With the four sessions up and running, the output of vmstat(8) looks like the following:

procs ------------memory------------ --swap-- ----io---- --system-- -----cpu-----
 r  b  swpd     free   buff    cache  si   so   bi    bo    in   cs   us sy id wa
 5  0     0  6037164  71568  5476620   0    0   66   198  1033  823  100  0  0  0
 5  0     0  6037164  71572  5476616   0    0    0   158  1021  801  100  0  0  0
 5  0     0  6037164  71572  5476616   0    0    0    48  1016  750  100  0  0  0

Notice that there is no idle time and that the run queue (the r column) holds 5 processes.
Now, to show what instance caging can do to limit the CPU utilization, I started the following PL/SQL block:

DECLARE
  l_sql VARCHAR2(100) := 'ALTER SYSTEM SET cpu_count = ';
BEGIN
  EXECUTE IMMEDIATE l_sql || '4';
  dbms_lock.sleep(10);
  EXECUTE IMMEDIATE l_sql || '3';
  dbms_lock.sleep(10);
  EXECUTE IMMEDIATE l_sql || '2';
  dbms_lock.sleep(10);
  EXECUTE IMMEDIATE l_sql || '1';
  dbms_lock.sleep(10);
  -- 0 restores the default, i.e. all available CPUs
  EXECUTE IMMEDIATE l_sql || '0';
END;
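While a block like this runs, the current settings can be checked from another session with a simple query against v$parameter:

```sql
SELECT name, value
FROM   v$parameter
WHERE  name IN ('cpu_count', 'resource_manager_plan');
```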

This time the output of vmstat(8) looks like the following:

procs ------------memory------------ --swap-- ----io---- --system-- -----cpu-----
 r  b  swpd     free   buff    cache  si   so   bi    bo    in   cs   us sy id wa
 5  0      0 3695176  80856  7780352   0    0    0    96  1019  783  100  0  0  0
 5  0      0 3695176  80856  7780352   0    0    0   170  1023  795  100  0  0  0
 5  0      0 3695184  80856  7780352   0    0    0   156  1028  795  100  0  0  0
 5  0      0 3695184  80860  7780348   0    0    0   120  1021  795  100  0  0  0
 5  0      0 3694928  80860  7780348   0    0    2   168  1029  810  100  0  0  0
 3  0      0 3694928  80860  7780348   0    0    0   214  1029  939   77  0 23  0
 3  0      0 3694928  80864  7780344   0    0    0   118  1020  961   75  0 25  0
 3  0      0 3694928  80864  7780344   0    0    2   152  1026  961   75  0 25  0
 3  0      0 3694596  80868  7780340   0    0    2   158  1027  975   75  0 25  0
 3  0      0 3694612  80868  7780340   0    0    0   142  1031  979   75  0 25  0
 3  0      0 3694612  80868  7780340   0    0    2   164  1024  963   75  0 25  0
 3  0      0 3694616  80872  7780336   0    0    0   358  1079  961   52  0 49  0
 2  0      0 3694616  80872  7780336   0    0    0   120  1021  909   50  0 50  0
 2  0      0 3697312  80872  7780336   0    0    0   162  1025  952   50  0 50  0
 2  0      0 3694744  80876  7780332   0    0    0   142  1027  948   50  0 50  0
 1  0      0 3694744  80876  7780332   0    0    0   120  1021  954   40  0 60  0
 1  0      0 3694748  80876  7780332   0    0    0   234  1034  953   26  0 74  0
 1  0      0 3694748  80876  7780332   0    0    0   134  1021  921   26  0 74  0
 1  0      0 3696484  80876  7780332   0    0    0   120  1020  954   26  0 74  0
 1  0      0 3696476  80880  7780328   0    0    2   196  1035  996   26  0 74  0
 5  0      0 3696476  80880  7780328   0    0    0   112  1020  643   96  0  4  0
 6  0      0 3696484  80880  7780328   0    0    0   216  1040  778  100  0  0  0
 5  0      0 3696484  80884  7780324   0    0    0   160  1021  763  100  0  0  0
 5  0      0 3696484  80884  7780324   0    0    0   112  1020  775  100  0  0  0
 5  0      0 3696468  80888  7780320   0    0    0   156  1026  785  100  0  0  0

Notice that not only the idle time but also the number of processes in the run queue changes dynamically according to the value of the CPU_COUNT initialization parameter. Great!
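As an aside, the throttling is also visible from inside the database: sessions held back by instance caging should accumulate waits on the "resmgr:cpu quantum" event. A minimal check:

```sql
-- System-wide waits caused by Resource Manager CPU throttling
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event = 'resmgr:cpu quantum';
```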

The only problem I faced during my tests is that the management does not work as expected when a lot of time is spent running kernel code. On my server I am able to reproduce the problem by running the following PL/SQL block:

BEGIN
  -- t is just a test table; scanning it over and over makes the
  -- sessions spend a significant part of their time in kernel code
  WHILE (TRUE)
  LOOP
    FOR i IN (SELECT * FROM t)
    LOOP
      NULL;
    END LOOP;
  END LOOP;
END;

When four sessions run that code, the output of vmstat(8) looks like the following:

procs ------------memory------------ --swap-- ----io---- --system-- -----cpu-----
 r  b  swpd     free   buff    cache  si   so   bi    bo    in   cs   us sy id wa
 4  0     0  3690112  81032  7780652   0    0    0   138  1025  786   65 35  0  0
 4  0     0  3690112  81036  7780648   0    0    0   182  1024  775   66 34  0  0
 4  0     0  3690112  81036  7780648   0    0    0    84  1018  779   66 34  0  0
 4  0     0  3690112  81036  7780648   0    0    0   138  1024  822   66 34  0  0
 4  0     0  3690112  81040  7780644   0    0    0   204  1032  809   66 34  0  0
 4  0     0  3690112  81044  7780640   0    0    0   120  1027  798   63 36  1  0
 4  0     0  3690112  81044  7780640   0    0    0   146  1021  833   61 37  2  0
 4  0     0  3690112  81044  7780640   0    0    0   160  1023  785   63 35  1  0
 5  0     0  3690112  81044  7780640   0    0    0    98  1021  782   62 36  1  0
 4  0     0  3690112  81044  7780640   0    0    0   146  1021  818   64 36  1  0
 4  0     0  3690048  81044  7780640   0    0    2   180  1031  803   63 36  1  0
 3  0     0  3690048  81044  7780640   0    0    0   126  1027  921   53 29 18  0
 3  0     0  3690048  81044  7780640   0    0    0   146  1022  988   46 25 29  0
 3  0     0  3690044  81048  7780636   0    0    0   180  1032  976   48 25 28  0
 3  0     0  3690044  81048  7780636   0    0    0   212  1058  974   48 25 27  0
 2  0     0  3690052  81048  7780636   0    0    0   144  1028  950   47 23 30  0
 2  0     0  3690052  81048  7780636   0    0    0   204  1035  971   34 14 51  0
 2  0     0  3691404  81048  7780636   0    0    0   106  1020  939   32 13 55  0
 2  0     0  3691404  81048  7780636   0    0    0   124  1021  972   34 14 53  0
 2  0     0  3691404  81048  7780636   0    0    0   202  1032  986   33 14 53  0
 4  0     0  3691404  81048  7780636   0    0    0    80  1024  850   51 29 21  0
 4  0     0  3691404  81048  7780636   0    0    0   172  1028  790   65 35  0  0
 4  0     0  3691404  81048  7780636   0    0    0   220  1033  785   63 37  0  0
 5  0     0  3691404  81048  7780636   0    0    0    62  1016  771   64 36  0  0
 5  0     0  3691404  81048  7780636   0    0    0   170  1022  785   62 38  0  0
 5  0     0  3691404  81048  7780636   0    0    0   18   1026  772   64 36  0  0

Notice that in this case about one third of the time is spent running kernel code. As a result, the CPU utilization (for CPU_COUNT values of 3, 2, and 1) is always higher than expected.

6 Comments

  1. Marcus Mönnig
    November 29, 2009    

    Christian, thanks for your update on instance caging.

    What leaves me clueless is that setting CPU_COUNT has no influence on the number of CPUs I have to license. This makes virtualization of Oracle DBs even more unfeasible. Say, e.g., I have one physical machine with four CPUs and four virtual machines, each running a DB, on it. Until now, I needed to get four 4-CPU licenses for that. Now, I can set CPU_COUNT to one for all instances and I still have to get the same licenses. This is ridiculous.
    IMHO the only reasonable way of setting the CPU_COUNT is to open the case and physically unplug CPUs from the machine.

  2. November 29, 2009    

    @Marcus

    This feature is meant to be used for resource management of multiple instances on a single server/platform. It allows one to “curb” instances to limit the maximum CPU. One can set the sum of CPU_COUNT for all instances to match the total physical CPU count (using instance caging to partition) or the sum of the CPU_COUNTs can be higher than the physical CPU count (using instance caging to over-provision, but limit). The latter potentially allows for better host utilization given that each individual instance is below its max CPU threshold.

    For licensing, you can use hard partitioning with Oracle VM to limit the CPUs that have to be licensed as shown by this wiki note: http://wiki.oracle.com/page/Hard+partitioning

  3. March 5, 2011    

    Nice post Christian. Very clear and helpful.
    I wonder if Oracle is contemplating moving the CBO to a % based mechanism rather than CPU_COUNT. That would allow us to cage instances to a percentage of CPU rather than a fixed number. No doubt that would be a big effort. If they did it would behave much like the DBRM manages CPU internally through the plan directives.

  4. Erwin Beckers
    June 10, 2011    

    Hello all,
    I’ve been doing some tests on the AIX platform with regards to instance caging.
    I’ve set it up on an LPAR with 2 logical CPUs, but have been unable to actually limit the CPU usage.
    It seems that no matter what the limit is set to, I can always manage to eat up all the CPU allocation at server level.
    This shouldn’t really happen, but I have no clue as to what I might have done wrong. I’ve got the Resource Manager set up and the cpu_count parameter set according to the importance of each instance (dividing down to 1 each).

  5. June 14, 2011    

    Hi Erwin

    I’m not aware of particular problems in this area. Since I do not have an AIX system handy, I cannot test it either. You might try to open an SR…

    Cheers,
    Chris

    • June 16, 2011    

      Hi Sigrid

      > first of all, let me congratulate you on your Berlin workshop regarding the
      > query optimizer – it was a real informative, nice and interesting event!

      Thank you!

      > For example: On one DSS system, with a mean mreadtim of 9.12, the standard deviation was 3.74,
      > resulting in a conclusion going “with a probability of 90%, mreadtim lies between 2.97 and 15.27”…
      > Even more impressing, on one OLTP system, the mean mbrc was 50, with a stddev of 32, giving a
      > confidence interval, at 90% probability, of minus 3 to plus 103 for mbrc…

      What about the standard deviation of sreadtim?

      > Now I wonder how to explain this extreme variation, given that the values are already taken
      > over a 10-hour period… the only source of variation would be the day of week, and AFAIK
      > there should not be such great variation in workload here, at least as regards the OLTP applications.

      IMO, the way multiblock reads work makes such variations more or less normal. In fact, the size of a multiblock read strongly depends on which data is already stored in the buffer cache.

      > Could it be that sampling over a 10-hour period is not a good thing to do, and if so,
      > for what reason (I just remember you having an example involving more fine-grained
      > sampling periods, which were aggregated somehow afterwards …)?

      I always worked with shorter periods. With them I had the opportunity to choose between an average, max, or whatever value I wanted. With long periods you can only take the average.

      > Also, I can’t help wondering whether this great standard deviation invalidates the result, somehow…?

      IMO no.

      Cheers,
      Chris

