Saturday, May 12, 2018

-- How to defy the elasticity property --



Leadership in corporate America often struggles with the demands of doing more with less. This is especially true when you work for a public company, where results must either meet or exceed expectations, and those expectations increase with every quarter. The pressure forces each service unit in a public company, including IT, to face the same question: How do we comply with corporate objectives while maintaining the ability to service natural organic growth without sacrificing quality? Although each company's situation is unique, the answer always seems to require a combination of technology and efficiency.

They say necessity is the source of all innovation, and it is by necessity that I have found innovative ways to use technology and efficiency to accomplish more with fewer resources. With a team of just three full-time database administrators, I manage the 24x7 operation of more than 1,300 databases, spread over multiple platforms including Oracle, SQL Server, MySQL, Teradata, and MongoDB. For those of you who are not familiar with the database administration side of IT, the market standard recommends one DBA manage between 30 and 50 databases (depending on database size and complexity). Using the leanest end of that standard, my team should be composed of 26 database administrators, which doesn’t take into consideration the added complexity of having to support databases on five different platforms.

This is taking the “do more with less” mantra to the extreme. I have only 15% of the resources that industry standards say I need in order to successfully run my department. Luckily, I am a technical manager with a background in database administration, so I can count myself as a resource (if I ever find a way out of the back-to-back meetings). And I am especially grateful that I was allowed to hand-pick my team, selecting some of the best resources I’ve worked with over the last thirty years. Each of my three DBAs has 20+ years of experience, specializes in different technologies, and possesses different professional strengths. You may argue that since we need coverage for so many databases I should have hired more junior DBAs to stretch my payroll budget, rather than just a select few with substantial experience. My response would be that time is literally money; every minute an essential database is down, the company’s earnings potential is negatively impacted. It’s important to have the right tools for the job, and my experienced DBAs know what warning signs to watch for to prevent systems from breaking down, rather than just responding when things go wrong.

I am proud of what my team has been able to accomplish, but if someone had told me at the interview that we would have to support such a complex operation with such a lean team… well, my mind would have been screaming, “Danger Will Robinson, danger!” and I likely would not have taken the position. To be fair, my employer was unaware of the real situation, and it was only through several years of diligent investigation that we finally uncovered all the information systems we are responsible for. It didn’t take long for me to realize that in order to succeed, I would need to:
1) automate and standardize as much as possible,
2) focus on proactive support,
3) set multi-year objectives to bring the ecosystem to a supportable version, and, most importantly,
4) hire and retain the best talent my budget would allow in order to reduce the technical gap.
Having information systems properly supported is key to the success of any organization. While we have found ways to make a lean team work, I am not suggesting you reduce the size of your team: remember – we got to this formula out of necessity rather than choice. With the proper use of efficiencies and technologies, sometimes it really is possible to work smarter, not harder (or longer hours, or with more people). Some of the “smart” changes we have implemented are listed below.


Monitoring and diagnostics. We standardized our monitoring of different technologies with a single tool, Oracle Enterprise Manager, which offers multiple plug-ins for different flavors of databases. This provides a single, streamlined process for monitoring and diagnostics regardless of the technology behind it.

Seamless response to production support emergencies. We integrated Oracle Enterprise Manager’s alert system with a SaaS product (PagerDuty). The service receives the alert and contacts the on-call resource with the technical knowledge required to resolve the issue. PagerDuty has the entire team’s contact information, from the Database Administrators all the way up to the Senior Director of Database Engineering. If an alert does not receive a response within a set timeframe, it continues up the chain of command until a response is received. This ensures all alerts are responded to in a timely manner and encourages everyone to watch for alerts.

Night and weekend support. With such a lean team, having everyone present during business hours is essential in order to keep up with the changing demands of the business. It would be humanly impossible for the team to support night incidents and show up for work the next day, and working every weekend is a great way to burn out a good employee. And unfortunately, the systems aren’t considerate enough to only break during office hours. We solved this problem by outsourcing our off-hours support to RDX. They are a US company that assigned us named resources with skills and experience equivalent to the internal team. They are capable of working independently to solve problems, and they communicate what went wrong and what steps were taken to remedy the situation. We’ve even used their resources for overflow work during regular business hours. I am honestly impressed by the quality and capabilities of the resources at RDX.

Standardize as much as possible. Having clear and well-documented standard procedures is not only a best practice - it also makes the diagnostic process a lot easier. We try to make documentation as detailed as possible, and this allows us to request help from helpdesk engineers who have no database knowledge. No matter who is debugging an issue, every resource is familiar with the environment.

Encourage and promote proactive support. If a problem happened once in a single database, and you have a standardized ecosystem, then it is extremely likely that same problem will happen again in a different database. Proactively preventing problems from recurring in the rest of the enterprise is where I want my team to spend most of their time.

Communication. Having consistent and recurrent communication with the application support team is priceless. They are often challenged to make changes in order to meet the demands of the business, and we have avoided numerous incidents that would have created extended downtimes by having open communication and helping build the vision of what is to come in their world.

Keep up with innovation. Although we can’t afford to patch every quarter, we ensure that every time we perform a database upgrade we apply the latest and greatest patches. While it’s important to take advantage of the latest technology available, you want to be on the leading edge, not the bleeding edge. It’s important to have access to and understand the latest-and-greatest technology, but you don’t want the risk and downtime associated with becoming the debugging guinea pig.

Take ownership. We are often challenged with having to solve issues related to MS Access databases. These databases are usually created by smart business users as a necessary workaround to accommodate software limitations. However, when that smart user is no longer with the company, suddenly no one knows how to maintain a system that is vital to the business operation, and oftentimes no one knows the database exists until it breaks. Needless to say, it is a very bad idea to create this kind of dependency outside IT. Work with the business units, and provide tools under IT governance that allow them the freedom to customize their own reports and create basic applications. If you don’t take ownership of this problem, then you can expect a proliferation of Excel sheets and decentralized, unsupported information systems, which will turn into an information security and support nightmare.

Attempt to reduce the technical gap. Nowadays it is impossible to be an expert in every new technology. While it is true that database systems are becoming smarter, it is also true that additional features require additional skills, and the traditional role of the Database Administrator has evolved. A lot more is expected from a DBA today compared to 10 years ago. Having a small team precludes any possibility of having technical redundancy, so cross-training is essential. One way to do this is through periodic lunch-and-learn meetings and a training-of-trainers model (ToT). This often lifts the morale of participants, promotes teamwork, and works well if you have a scarce (or non-existent) training budget.


The list above is just a compilation of what has worked for us, and since every situation is unique these ideas may not work for you. When faced with a challenge, keep in mind that unconventional situations likely can’t be resolved with conventional solutions. They require thinking outside the box.


Rafael Orta

Wednesday, August 16, 2017

-- Some Teradata SQLs for newbies --

As I am learning Teradata database administration, I thought I would share some of the SQL statements that I have found useful for day-to-day operations. I will be adding to this list.


How to check if a user is locked

You connect to the database as sysdba; dbc won't have the required privileges.

Execute from Teradata SQL Assistant

SELECT * FROM DBC.USERS WHERE LockedCount > 0; 

If there is a locked user, you can unlock it using the following command

modify USER "tariqn" as release password lock; 


where tariqn is the name of the account that you are attempting to unlock.

How to modify your password

MODIFY USER "USERNAME" AS PASSWORD = "ParsiPPany12!";


How to assign a temporary password to a user and prompt the user to enter a new password.


MODIFY USER ORTAR as password=********  FOR USER;

Thursday, August 18, 2016

-- Do not forget about the Database recycle bin --

Hello Everybody

Just a quick reminder that, starting with Oracle 10g, when you drop objects they still reside in your recycle bin. So if you wonder why you do not see the space come back after dropping all those tables, there is your answer. This is especially important if you have a DW-like environment where you drop and recreate tables every day as part of your ETL.

Every schema in the database has its own recycle bin. If you want to know the contents of the entire database recycle bin, you can run a query like the one below.
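For example (DBA_RECYCLEBIN is a DBA view, so run it from a privileged account):

SQL> select owner, object_name, original_name, type, droptime from dba_recyclebin;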


If you only want to query your own schema's recycle bin, then you execute the following.
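For example:

SQL> select object_name, original_name, type, droptime from user_recyclebin;

or, from SQL*Plus, simply:

SQL> show recyclebin;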


In order to purge an object from the recycle bin, you need to either own the object or have the corresponding "drop any" system privilege assigned to you. 

If you want to purge objects in the recyclebin for the entire database, you need to have the sysdba privilege, be connected as sysdba and execute the following.

SQL> purge dba_recyclebin;

If you want to purge your own recycle bin you do the following.

SQL> purge recyclebin;

If you want to purge objects in the recycle bin stored in a specific tablespace, you execute the following:

SQL> purge tablespace <tablespace_name>;

If you want to be more specific and purge objects in a specific tablespace for a specific user you execute the following.

SQL> purge tablespace <tablespace_name> user <username>;

Last but not least, if you want to purge the recycle bin entry for a table you own and just dropped, you execute the following.

SQL> purge table <table_name>;

If you want to purge a specific index from the recycle bin, you execute the following.

SQL> purge index <index_name>;

Enjoy. 





Wednesday, June 1, 2016

-- A Quick handy guide to generating SQL Trace --

We have all done this at one time or another, and when we have to do it again we go back to our notes to remember the steps. This is well-known information, but I encourage you to bookmark it and keep it handy, as you will eventually need it too.

Tracing the execution of a SQL

alter session set tracefile_identifier='10046';
alter session set timed_statistics = true;
alter session set statistics_level=all;
alter session set max_dump_file_size = unlimited;
alter session set events '10046 trace name context forever,level 12';

/* now you execute the sql that you want to trace */

exit;

The trace file will be located in your diagnostic destination and will have the number 10046 as part of the file name; go there to format it using tkprof.  
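If you are not sure where that diagnostic destination is, the database itself can tell you (on 11g and later, where traces go to the ADR); for example:

SQL> select name, value from v$diag_info where name in ('Diag Trace', 'Default Trace File');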

tkprof input_file output_file waits=yes explain=username/password

Note: you can use the sort option to sort the output in many ways; for a complete description of the available options, please check the tkprof documentation.


Cheers!!!!


Saturday, February 13, 2016

-- Oracle 11g Optimizer and Incremental Statistics --

While doing some research on the 11g optimizer, I found out that in 11g you can now gather incremental statistics for partitioned tables. The idea behind this feature is to be able to capture global statistics on large partitioned tables at a lower resource cost, by minimizing the time required to collect statistics.

This makes complete sense. Think about a large range-partitioned table, for example SALES_DATA. Let's assume that your SALES_DATA table is partitioned by month; it is very unlikely that old sales data will change, especially data that corresponds to previous years. As you know, Oracle keeps statistics at the partition level as well as overall statistics for the table, and Oracle monitors DML operations at the table and (sub)partition levels. Normally, statistics are gathered only for those partitions that changed (by more than 10%); global statistics, however, are gathered by scanning the entire table, and that makes them a very expensive operation.

In order to enable this, you first need to set the table's INCREMENTAL preference to TRUE.

SQL> exec dbms_stats.set_table_prefs('TPM','SALES_DATA','INCREMENTAL','TRUE');

Next we gather the global statistics

SQL> exec dbms_stats.gather_table_stats('TPM','SALES_DATA', GRANULARITY=> 'GLOBAL');

If you query the last_analyzed column from user_tables you will see that the timestamp is updated; however, when you query the same column on user_tab_partitions you will see that not all the partitions got updated. Furthermore, you should see a difference in the time it took to gather the statistics.
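For example, something along these lines (assuming you are connected as the owner of the TPM.SALES_DATA table used above) shows which partitions were actually re-analyzed:

SQL> select table_name, last_analyzed from user_tables where table_name = 'SALES_DATA';

SQL> select partition_name, last_analyzed from user_tab_partitions where table_name = 'SALES_DATA' order by partition_position;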




Tuesday, January 26, 2016

-- Oracle 12c Grid Infrastructure Management Repository MGMTDB --

After installing Oracle 12c Grid Infrastructure on an ODA X3-2, we started getting a warning from Enterprise Manager alerting us that the SYSMGMTDATA tablespace of the target MGMTDB database was 85% full. As you can imagine, we were surprised, as we did not recognize that database. We reached out to a few people and realized that this must be something new.

Starting with 12.1.0.2, Oracle creates a new mandatory database called MGMTDB, in which it stores Cluster Health Monitor (CHM) data, among other things. This database fails over to a surviving node in case the node it runs on goes down; it needs to be up and running, and it starts up automatically.

Please note: there is a minus sign in front of the database name.

Database name: -MGMTDB
ORACLE_HOME: Same as the GRID_HOME

I am not sure about the password for SYS or SYSTEM on it, but you can connect as sysdba.
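One way to get in is to connect locally from the grid home on the node where the database is currently running; a rough sketch (the srvctl call tells you which node that is, and $GRID_HOME is assumed to point to your Grid Infrastructure home):

$ srvctl status mgmtdb
$ export ORACLE_SID=-MGMTDB
$ $GRID_HOME/bin/sqlplus / as sysdba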


There are 3 tablespaces on it (UNDO, SYSTEM and SYSAUX); you do not have to worry about space management since the datafiles are auto-extensible.
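For example, once connected you can confirm this with:

SQL> select tablespace_name, file_name, autoextensible from dba_data_files;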


MOS note 2065175.1 contains information about how to use the MDBUtil tool, which allows you to perform a variety of operations on the management repository. The best article I found with information related to this topic is from Amit Bansal, and you can access it on this link.


Thursday, March 5, 2015

-- Oracle Compression Offerings --

I was recently approached by a consultant who was confused about the different compression offerings from Oracle, so I decided to write a short, high-level summary to provide some clarity.

Oracle offers 3 types of Compression:

A) Basic Table and Index Compression (good for Data Warehouses): This feature has been on the market for many years (at least 10; it dates back to Oracle 9i) and is part of Oracle Enterprise Edition (no additional license cost for you). When Oracle designed this feature, the goal, as the name says, was to compress data to save space. What they found, however, is that by compressing the data you also reduce the I/O as a natural consequence, and since the compression/decompression algorithm is very light and simple (it consumes only a negligible amount of additional CPU), query performance often increases.
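For reference, a minimal syntax sketch (the table names here are made up): basic compression can be declared at creation time, or enabled on an existing table so that future direct path loads are compressed.

SQL> create table sales_archive compress as select * from sales;

SQL> alter table sales_hist compress;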

A.1) What are the disadvantages of it: With this type of compression, Oracle only compresses the data when you are inserting it into the table, and you need to insert it using “direct path insert”. The other disadvantage is that during updates Oracle needs to decompress the records and, after updating them, does not compress them again; therefore you will end up with a mix of compressed and uncompressed records. This type of compression is not good for OLTP databases, but it is ideal for Data Warehouses where you mostly run read-only queries and you populate the data through an ETL tool using direct path inserts.

A.2) What is direct path insert and what does it mean: Direct path insert is a special mode of insert in which Oracle ignores the available free space in the table; in other words, it will not reuse the free space within existing blocks, but will insert the data after the existing data in the table. It therefore uses more disk space, but it increases performance because it bypasses the buffer cache and writes directly to the data files.
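A minimal sketch of what a direct path insert looks like in practice (table names are made up); the APPEND hint is what requests direct path, and without it a plain insert will not compress the data under basic compression:

SQL> insert /*+ append */ into sales_data select * from sales_stage;

SQL> commit;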

A.3) What queries will see a performance advantage: Queries that fetch millions of records and do aggregations, where I/O is the bottleneck, over tables that have a high compression ratio. Tables with a small number of rows or of small size will see an adverse performance impact due to the CPU overhead.

A.4) What compression ratio can be achieved: It depends on the nature of the data; tables with duplicated data will achieve higher compression, while tables with almost no data duplication will achieve little to none.

B) Oracle Advanced Compression (available in 11g and above, a.k.a. OLTP Compression): This is a licensed feature that can be implemented on top of Oracle Enterprise Edition only. It works similarly to basic table compression, but it allows data to be compressed during all types of DML operations, such as inserts or updates, and the compression algorithm is enhanced to reduce the overhead of write operations. The other advantage is that Oracle is able to read the rows without uncompressing them; therefore there is almost no performance degradation when accessing compressed data, and in fact in many cases you may see a performance improvement due to the reduced I/O.
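A syntax sketch (names made up):

SQL> create table orders_hist compress for oltp as select * from orders;

In 12c the same option is spelled ROW STORE COMPRESS ADVANCED, for example:

SQL> alter table orders_hist row store compress advanced;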


C) Hybrid Columnar Compression (HCC): This is a feature available on Oracle Exadata, and recent rumors say it is also available on the ODA X5-2. HCC uses an orchestration of hardware and software capabilities and achieves higher compression rates than the other two methods. It has 4 different settings (Query High / Query Low / Archive High / Archive Low) that tell Oracle how you want to balance compression vs. performance impact; in other words, the more compression, the higher the CPU cost involved in performing operations, and the lower the compression, the lower the CPU cost. Unlike the other two methods, this one stores the data in columnar format, and that is the secret sauce for achieving higher compression. The disadvantage is that you could see adverse performance effects if you query more than one column at a time, although this is often offset by the use of the Exadata Smart Scan feature. You still need to bulk load the information to maximize the compression ratio. Oracle includes an advisor that tells you what kind of compression you could expect to achieve.
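For reference, a syntax sketch of one of the four settings (names made up; the storage requirement above still applies):

SQL> create table sales_2014 compress for query high as select * from sales;

The other variants are COMPRESS FOR QUERY LOW, COMPRESS FOR ARCHIVE LOW and COMPRESS FOR ARCHIVE HIGH. The advisor mentioned above is implemented in the DBMS_COMPRESSION package (GET_COMPRESSION_RATIO), which estimates the ratio each compression type would achieve against a sample of your data.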