Last week I had to analyze a strange performance problem. Since the cause/solution was somehow surprising, at least for me, I thought to share it with you.
Let me start by quickly describing the setup and what was done to reproduce the problem:
- Database version: Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 (64-bit)
- Operating system: Solaris 10 (SPARC)
- To simulate a load job, a simple SQL*Plus script that executes a COPY command is used. Its purpose is to load about 100,000 rows in a table. Let’s call this table T1.
- All modifications in T1 have to be logged into another table. Let’s call it T2. For this purpose, on T1 there are triggers that insert one row into T2 for each inserted, deleted and updated row.
The strange thing was that the rate of the inserts performed by the script decreased over time. In fact, while at the beginning of the processing about 500 rows per second were inserted into T1 (and, therefore, T2), at the end of the processing only about 50 rows per second were processed.
The first thing I did to find out what the problem was is to trace one run by enabling SQL trace. This analysis pointed out that two SQL statements (the ones inserting data into T1 and T2) were responsible for most of the elapsed time. This is not a surprise, of course. The interesting thing was that most of the time was spent on CPU.
Since the rate of the inserts decreased over time, I extracted from the trace file all the lines providing information about the executions of the INSERT statement on T1 and loaded that data into Excel. Then, I created one chart for each performance figure. From all of them the following, that shows the amount of CPU used for every single execution, was the most interesting. In fact, it shows that while at the beginning of the processing one insert uses about 30 milliseconds of CPU, at the end it uses about 300 milliseconds of CPU for doing the same work. Note that all other charts did not show such a behavior. For example, the number of PIO and LIO were exactly the same at the beginning and at the end of the processing.
Since the trace file was not able to provide further information to investigate the problem, I started looking at V$SESSTAT. The aim was to find another statistic experiencing a similar increase. The search pointed out that the statistic “session uga memory” was also increasing during the processing. In fact, while at the beginning of the processing the session was using about 5MB of UGA, at the end of the processing about 110MB were used. This is strange and, as far as I know, there is no good reason for such a behavior. Hence, it was time to review the code of the triggers. While doing so I noticed, by chance, that a trigger was also available on T2 (the table used to store the log about all modifications). The strange thing was its definition:
CREATE OR REPLACE TRIGGER t2 AFTER INSERT ON t2 FOR EACH ROW BEGIN /* execute the referential-integrity actions */ DECLARE NUMROWS INTEGER; BEGIN numrows:=1; END; END;
As you can see the trigger does nothing. Apparently, it exists just because triggers are used to implement integrity constraints (something you should avoid, by the way…) and, as a result, they were automatically created for each table. And, in case of T2, there is no constraint to check.
Since the trigger is pointless, I disabled it. After that, surprisingly, it was no longer possible to reproduce the problem! The following chart, created in the same way as the previous one, shows that without the trigger on T2 the CPU utilization is constant during the whole processing.
Therefore, for some unknown reasons, the pointless trigger was the cause of the problem.
By the way, once the trigger was disabled also the UGA memory was no longer increasing. Hence, to me it seems that the customer hit a bug…