Saturday, March 28, 2009

The Oracle Optimizer & Philosophical Discussions

Recently, on Oracle 10.2.x, I have very often found some of our systems slowing down. When we look at the specific SQL statements causing the problem, to our surprise we see poor execution plans, and when we look at the tables involved we find that they have not been analyzed for a while. That puzzled me.
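
A quick way to spot the stale statistics is to check LAST_ANALYZED in DBA_TABLES. This is just a sketch; APP_OWNER is a placeholder for your application schema, not a name from our systems:

-- Tables whose statistics are missing or old, oldest first
SELECT table_name,
       num_rows,
       last_analyzed
FROM   dba_tables
WHERE  owner = 'APP_OWNER'            -- placeholder schema name
ORDER  BY last_analyzed NULLS FIRST;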

Starting with 10g, Oracle ships a default job called GATHER_STATS_JOB that, through DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC, is supposed to decide intelligently whether or not to collect statistics for a table depending on how much that table has changed.
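
You can verify whether that job is enabled and when it last ran with a query like the following (a simple sketch against the scheduler views; run it as a DBA user):

SELECT job_name, enabled, state, last_start_date, next_run_date
FROM   dba_scheduler_jobs
WHERE  job_name = 'GATHER_STATS_JOB';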

I find that this job does not work as it is supposed to, so we have been replacing it with an explicit statistics gathering using the following command.

EXEC DBMS_STATS.gather_schema_stats (ownname => '', cascade => true, estimate_percent => dbms_stats.auto_sample_size);

When you specify the CASCADE parameter as TRUE explicitly, statistics are gathered for the indexes as well.

Oracle Corporation recommends setting the ESTIMATE_PERCENT parameter of the gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE. This maximizes performance gains while achieving good statistical accuracy.
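
For reference, this is roughly how the replacement can be wired up: disable the default job and schedule the explicit call instead. Treat it as a sketch only; the job name, schema and schedule (NIGHTLY_STATS_JOB, APP_OWNER, 22:00 daily) are placeholders, not values from our systems:

EXEC DBMS_SCHEDULER.DISABLE('SYS.GATHER_STATS_JOB');

BEGIN
  DBMS_SCHEDULER.CREATE_JOB (
    job_name        => 'NIGHTLY_STATS_JOB',      -- placeholder job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.gather_schema_stats (ownname => ''APP_OWNER'', cascade => TRUE, estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE); END;',
    repeat_interval => 'FREQ=DAILY; BYHOUR=22',  -- placeholder schedule
    enabled         => TRUE,
    comments        => 'Explicit schema statistics gathering');
END;
/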

I know I may be opening Pandora's box with the following topic, but I want to make my stance on it clear.

While making the change above to the statistics gathering on multiple systems (about 15 production databases), one database showed performance degradation on 2 tables. This particular application was written by a famous software house of billing systems called "A M * * C S", you fill in the stars. When we looked at what was causing the problem, we found that the query in question had a column in the predicate that was not indexed, and we also found that the vendor had added a tremendous number of hidden parameters that modify the optimizer behavior, without authorization. When questioned about why they introduced those parameters, they said that although they don't know what the parameters do, they have worked for some other customers (it is up to you to decide whether that is a valid argument; it is not for me).

The vendor told the client that the reason for the performance degradation was that the tables had been analyzed, and said verbally that we should never analyze these two tables because they do not change much. When we checked the number of daily changes, we found that in 8 days 11% of the table changes, and in a month 41.25% of the table changes (too much for a "non-changing" table, in my opinion).
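
For the record, this is the kind of query that exposes non-default hidden (underscore) parameters. It reads the X$ fixed tables, so it must be run as SYS; consider it a sketch rather than a polished script:

SELECT a.ksppinm  parameter_name,
       c.ksppstvl parameter_value,
       a.ksppdesc description
FROM   x$ksppi a, x$ksppcv c
WHERE  a.indx = c.indx
AND    a.ksppinm LIKE '\_%' ESCAPE '\'   -- underscore (hidden) parameters only
AND    c.ksppstdf = 'FALSE';             -- changed from their default value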

This is my stance: "tables should always be analyzed". If you don't want a particular table to be analyzed for one reason or another, you must lock the statistics on those tables and document it. The Oracle optimizer picks the right execution plan 90% to 95% of the time (this is only true if you have decent developers and you understand how the Oracle architecture works). For the other 5% to 10% of the code you need to use hints, SQL Profiles and the other tools that Oracle provides so you can get a good execution plan.
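
Locking statistics is straightforward with DBMS_STATS; a minimal example, assuming a hypothetical APP_OWNER.BILLING_DETAIL table:

-- Freeze the current statistics on the table (and its indexes) and document why
EXEC DBMS_STATS.LOCK_TABLE_STATS (ownname => 'APP_OWNER', tabname => 'BILLING_DETAIL');

-- Later, if the table ever needs to be re-analyzed, unlock it first
EXEC DBMS_STATS.UNLOCK_TABLE_STATS (ownname => 'APP_OWNER', tabname => 'BILLING_DETAIL');

The lock is visible in the STATTYPE_LOCKED column of DBA_TAB_STATISTICS, which makes the exception easy to document and audit.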

"Changing the optimizer behavior so 5% to 10% of your code runs properly is just not a good idea nor makes any sense."

I would advise changing the optimizer behavior only when you have exhausted every other option, you are dealing with a small set of tables that you know works well with the change, and you know you will not add more code or tables to that database.

As usual, I respect the opinion of those people who think that "if it is not broken, don't fix it". The fact is that if you are a DBA, you had better get your hands dirty and understand well how the application uses your database, what your heaviest tables are, what your problematic queries are, and so on. The vendor will leave after they finish the application code, the data will change, and you will have a ticking bomb in your hands.
