Thursday, October 25, 2012

-- Basic knowledge about some Network Artifacts --

As it had happened to any other DBAs out there, our roles have been evolving and mine hasn't been the exception. I recently have been more involved with Network architectures, I am sharing this information with you hoping that it can be useful.

VIP (Virtual IP): It is an IP that is not bound to a single physical interface.

LB (Load Balancer): Artifact that allows you to share network requests (following some algorithm) across multiple resources.

GSLB (Global Server Load Balancing): Allows you to route network traffic to multiple locations. It's main objective is to route your connection to the most efficient server available to you. 

GSS (Global Site Selector): This is a Cisco GSLB Product.

GTM (Global Traffic Manager): This is a F5 GSLB Product.

Broadcast: When you need to have more than two nodes see the traffic on the same subnet.

Unicast: Unicast is used when two network nodes need to talk to each other.

Anycast: With anycast you advertise the same network in multiple spots of the Internet, and rely on shortest-path calculations to funnel clients to your multiple locations. As far the network nodes themselves are concerned, they're using a unicast connection to talk to your nodes

Multicast: Is like a broadcast that can cross subnets, but unlike broadcast does not touch all nodes. Nodes have to subscribe to a multicast group to receive information.

| TYPE      | ASSOCIATIONS     | SCOPE           | EXAMPLE |
| Unicast   | 1 to 1           | Whole network   | HTTP    | 
| Broadcast | 1 to Many        | Subnet          | ARP     |
| Multicast | One/Many to Many | Defined horizon | SLP     |
| Anycast   | Many to Few      | Whole network   | 6to4    |

Developers vs. DBAs who will win the battle?

In the last 9 months I have been involved in several battles with the development group, and before I continue let me be clear DBAs are humans and we do mistakes. I have tried several approaches to those issues, but only 1 has provided me the satisfaction and the benefits expected.

"A natural approach is to say, I didn't do anything, the problem is not in the database, it is in your code, and it is not my business since I am not a developer. "

As a DBA we tent to think this way. "I did not change anything in the database, I see no errors in the alert.log therefore is not my problem. "

My question is, for how long are we going to say the same, over and over, even that is technically true. I lose a tremendous amount of time in useless RCA meetings (Root Cause Analysis) where everybody points the finger to everybody else, and people worry more about who to blame it to, instead of how are we going to get it fix for good and avoid future recurrences.

Let me be clear, "No one in a organization, DBA nor Developers wants for outages or problems to happen", if there were no problems, we will have no jobs, therefore problems will happen, Focus on learning from the problems and avoid them in the future.

The approach that has worked for me, and have provided the satisfaction and the benefits expected is to research about where is the problem, even that it may not database related. Yes don't be afraid to put your nose into somebody's else business (I know it does not sound right but you know what I mean with it ;-) ), you are getting affected so they need to understand clearly your interest.

In the past I have found that some of the problems are:

- The developers are been push by the project managers into meeting unrealistic and
unnecessary deadlines, and they sacrifice quality for meeting the deadline, in other words
if it works is good enough, I don't care the resources it will consume or how long will take to

- Very inefficient spaghetti code produced for a constant change in the scope of the project.

- Lack of coding standards, and in this case the DBA produced coding standards for them and
the DBA is the watcher that those standards are met. (Yes, I know it is not in our job
description, but it works).

- The developer does not know how to code, they just know a reduced set of instructions and
try to program with whatever they know, therefore producing tons on inefficient code. For
this case we are providing the developers with a database architecture workshop and a SQL
and SQL tuning workshop. In other words I have to help them so they can help me. (Yes,
another one that is not in your job description).

Conclusion: Don't be afraid to think out of the box, rather than complaining about the problems
try to find out where are the problems (even if they are outside your
boundaries) and try to be part of the solution.

Friday, September 28, 2012

-- How to compare the data in two tables using hash values --

We recently had the need to verify if there has been changes on a table since a specific date (for data integrity purposes), here is an easy and effective way to accomplish that.

For this example we create a table called X with the names of my family.

SQL> create table x (x1 number, x2 varchar2(10));

Table created.

SQL> insert into x values (&1,'&2');
Enter value for 1: 1
Enter value for 2: Rafael
old   1: insert into x values (&1,'&2')
new   1: insert into x values (1,'Rafael')

1 row created.

SQL> /
Enter value for 1: 2
Enter value for 2: Kristina
old   1: insert into x values (&1,'&2')
new   1: insert into x values (2,'Kristina')

1 row created.

SQL> /
Enter value for 1: 3
Enter value for 2: Nathan
old   1: insert into x values (&1,'&2')
new   1: insert into x values (3,'Nathan')

1 row created.

SQL> /
Enter value for 1: 4
Enter value for 2: Andrea
old   1: insert into x values (&1,'&2')
new   1: insert into x values (4,'Andrea')

1 row created.

SQL> /
Enter value for 1: 5
Enter value for 2: Samantha
old   1: insert into x values (&1,'&2')
new   1: insert into x values (5,'Samantha')

1 row created.

SQL> commit;

Commit complete.

SQL> select * from x;

        X1 X2
---------- ----------
         1 Rafael
         2 Kristina
         3 Nathan
         4 Andrea
         5 Samantha

We are going to use the Oracle hash function along with the sum function to calculate the hash valued of the aggregated data on the table.

SQL> select sum(ora_hash(x1||'|'||x2) from x;


The number above represents the sum of the aggregation of all the column and all the rows for table x, now let's see what happen when we change one of the rows.

SQL> update x set x2='SAMANTHA' where x2 like 'Samantha';

1 row updated.

SQL> commit;

Commit complete.

Now we apply the same hash function.

SQL> /


It is also a good idea using something like CRC-16 (Cyclic Redundancy Check), but that is a matter of preference. As you can see above the value changed, you can use dynamic SQL to build the sql statement for all the tables in a schema and store the hash value on a table, that will make it easier at the time you need to compare the data on each table.


Thursday, September 27, 2012

-- More about Oracle 11gR2 Single Client Access Name (SCAN) --

The introduction of SCAN have caused a lot of confusion among network administrators, architects and DBAs, let's see if we can keep this to basis to make it less confusing.  

Why do we need this?

 It provides a single name for clients to access Oracle Databases, this is specially true for database clusters, the reason is because you do not have to update later on the client information if you change the cluster configuration. SCAN gets configured when you install Oracle Grid Infrastructure (New name for Clusterware and ASM together), During the installation you will be asked for a SCAN name, ideally you want to define the SCAN in your DNS, with this method your network administrator needs to create a virtual hostname that resolves to 3 IP addresses using a round robin algorithm. The reason behind why 3 IP Addresses is related to load balancing and HA requirements regardless of the number of servers in your cluster. Each IP must be on the same subnet as the public network in your database cluster.

 NOTE: The 3 IPs above should not be assigned to a network interface, Oracle Clusterware will take care of it. Your DNS entry should looks like   IN A 
                                                      IN A 
                                                      IN A 

 How it works?

Clients connecting to the database using the SCAN will receive 3 IP, the client will try to connect using the list of IPs it received, if the client receives an error , it will try the other addresses before it returns an error. When the SCAN listener received a connection request, it checks for the least loaded instance providing the requested service, then it re-directs the connection to the local listener on the node that have the service available. Now as you can see by the explanation above, your client connection information won't be affected by the configuration of your cluster.

Wednesday, July 18, 2012

-- 10gR2 & SCAN IPs --

There has been some controversy with 10gR2 about the use of SCAN IPs, I have seen a lot of confusion about it which motivated me to write this article. Let's refresh our memory a little bit with the basis.
  • Single client access name (SCAN) is the virtual hostname to provide for all clients connecting to the cluster (as opposed to the vip hostnames in 10g and 11gR1).  
  • SCAN is a domain name registered to at least one and up to three IP addresses, either in the domain name service (DNS) or the Grid Naming Service (GNS).
  • The real reason behind SCAN is that because the SCAN is associated with the cluster as a whole, rather than to a particular node, the SCAN makes it possible to add or remove nodes from the cluster without needing to reconfigure clients. It also adds location independence for the databases, so that client configuration does not have to depend on which nodes are running a particular database.
  • Grid Infrastructure will start local listener LISTENER on all nodes to listen on local VIP, and SCAN listener LISTENER_SCAN1 (up to three cluster wide) to listen on SCAN VIP(s); 11gR2 database by default will set local_listener to local LISTENER, and remote_listener to SCAN listener.
  • SCAN listener will be running off GRID_HOME, and by default, in 11gR2 local listener will be running off GRID_HOME as well.
With the SCAN the client sends a request to the SCAN listener, all services in the cluster are registered with the SCAN Listener, the SCAN Listener sends back to the client the address of the local listener on the least loaded node where the service is currently been offered, then the client establishes the connection to the service through the local listener on the node where the service is been offered. 

While 10g/11gR1 SQL*Net does not support certain aspects of this new feature, it is still possible to configure 10g/11gR1 clients and database instances to integrate with 11g R2 SCAN.  In order to make it work you won't will need to have the SCAN VIPs fix, this will not work in the case where the VIP are dynamically allocated via GNS.

 The problem arises because client and database instances with versions < 11g R2 do not support multiple IP addresses associated with the same host name, therefore you can't use the 11gR2 syntax for the remote_listener or the client connect string.  

You still need the SCAN VIPs, SCAN can have between 1 to 3 IPs associated with it, I always recommend 3 because 3 is the maximum number of VIP you will need regardless of the number of nodes in your cluster. It is ok to allocate 2 in the case of you having only 2 nodes as long as you are aware than when you add more nodes you will need an to reconfigure and an additional VIP on the same sub-net as the previous SCAN IPs.   

Please note that as you create the 'local_listener' and remote_listener instance parameters, local_listener should use the node VIP, while the alias for the 'remote_listener' uses the SCAN VIPs same as the setup for 11gR2

         (ADDRESS = (PROTOCOL = TCP) (HOST = node1-vip) (PORT=1521))

         (ADDRESS = (PROTOCOL = TCP) (HOST = node2-vip) (PORT=1521))

          (ADDRESS = (PROTOCOL = TCP) (HOST = X.X.X.1) (PORT = 1521))
         (ADDRESS = (PROTOCOL = TCP) (HOST = X.X.X.2) (PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP) (HOST = X.X.X.3) (PORT = 1521))

NOTE: Before 11.2 the Oracle Client was configured using ADDRESS_LIST listing all the nodes VIP. When you have a client with this configuration the REMOTE_LISTENER parameter for the pre 11gR2 database needs to list every node VIP plus every SCAN VIP. 

alter system set local_listener='LISTENER_10g1' scope=spfile sid='10G1';
alter system set local_listener='LISTENER_10g2' scope=spfile sid='10G2';
alter system set remote_listener='LISTENER_GRID' scope=spfile sid='*';

Check that you have a CRS service configured in OCR for clients to connect to this database. The client connection string should be similar to.
       (ADDRESS = (PROTOCOL = TCP)(HOST = x.x.x.1)(PORT = 1521))
       (ADDRESS = (PROTOCOL = TCP)(HOST = x.x.x.2)(PORT = 1521))
       (ADDRESS = (PROTOCOL = TCP)(HOST = x.x.x.3)(PORT = 1521))
    (LOAD_BALANCE = yes)

Tuesday, July 17, 2012

-- Utilization of Flashback Database as a back out mechanism for Application Deployments (Best Practices) --

Performance observations

 As you have to write to the flashback logs when the database is in flashback mode, you usually have around 5% performance impact especially for OLTP transactions. The impact for large inserts (e.g. batch inserts) or direct loads operations goes from 5% to 40%, this is especially true for 10g, in 11g Oracle introduced “block-new optimization” which significantly relieves that situation.

Best Practices

1. - Set the DB_FLASHBACK_RETENTION_TARGET to a value equal to the farthest time in the past that you wish to be able to recover to.

2.- size the flashback recovery area (FRA), ensure that there is sufficient space allocated to accommodate the flashback database, the flashback logs for the target  retention size and for peak batch rates. A general rule of thumb is that flashback logs get generated approximately at the same rate as redo log.

3. - Configure sufficient I/O bandwidth for FRA, failure to do that will result in high “FLASBACK BUF FREE BY RVWR” wait event and “FLASBACK LOG WRITE” latency of more than 30ms. The easiest way to accomplish this is by placing the flashback logs in the outer portion of the disks or LUNS.

4. - Set the LOG_BUFFER to at least 8MB in order to give the flashback database more buffer space in memory. For large databases consider a range between 32-64 MB.

5. – Set _DB_FLASBACK_LOG_MIN = ( and above only).

6. - Monitor closely your “Recovery area free space (%).

7. - In Oracle 11.2 onward, you can enable flashback database while the database is open.

8. - It is highly recommended that you use Guarantee Restore Point (GRP) before your deployment.

9. - Enable database flashback when needed and disable it after you are sure the deployment was successful. Please note that every single time you enable or disable flashback database a reset log operation occurs therefor you have a different incarnation of the database, once you get the database out of flashback mode there is no way to move back or forth in time.

Thursday, February 2, 2012

-- Big Data - Don't miss this event --

Wednesday, January 18, 2012

-- Fundamental Oracle flaw revealed --

Very interesting InfoWorld article worth to share, please be sure you apply the latest January 2012 CPU patch. You can access the article by clicking here.