Friday, September 28, 2012

-- How to compare the data in two tables using hash values --

We recently had the need to verify if there has been changes on a table since a specific date (for data integrity purposes), here is an easy and effective way to accomplish that.

For this example we create a table called X with the names of my family.


SQL> create table x (x1 number, x2 varchar2(10));

Table created.

SQL> insert into x values (&1,'&2');
Enter value for 1: 1
Enter value for 2: Rafael
old   1: insert into x values (&1,'&2')
new   1: insert into x values (1,'Rafael')

1 row created.

SQL> /
Enter value for 1: 2
Enter value for 2: Kristina
old   1: insert into x values (&1,'&2')
new   1: insert into x values (2,'Kristina')

1 row created.

SQL> /
Enter value for 1: 3
Enter value for 2: Nathan
old   1: insert into x values (&1,'&2')
new   1: insert into x values (3,'Nathan')

1 row created.

SQL> /
Enter value for 1: 4
Enter value for 2: Andrea
old   1: insert into x values (&1,'&2')
new   1: insert into x values (4,'Andrea')

1 row created.

SQL> /
Enter value for 1: 5
Enter value for 2: Samantha
old   1: insert into x values (&1,'&2')
new   1: insert into x values (5,'Samantha')

1 row created.

SQL> commit;

Commit complete.

SQL> select * from x;

        X1 X2
---------- ----------
         1 Rafael
         2 Kristina
         3 Nathan
         4 Andrea
         5 Samantha


We are going to use the Oracle hash function along with the sum function to calculate the hash valued of the aggregated data on the table.


SQL> select sum(ora_hash(x1||'|'||x2) from x;

SUM(ORA_HASH(X1||'|'||X2)
----------------------------------------
                                  108834





The number above represents the sum of the aggregation of all the column and all the rows for table x, now let's see what happen when we change one of the rows.

SQL> update x set x2='SAMANTHA' where x2 like 'Samantha';

1 row updated.

SQL> commit;

Commit complete.





Now we apply the same hash function.

SQL> /

SUM(ORA_HASH(X1||'|'||X2)
----------------------------------------
                                  112024


It is also a good idea using something like CRC-16 (Cyclic Redundancy Check), but that is a matter of preference. As you can see above the value changed, you can use dynamic SQL to build the sql statement for all the tables in a schema and store the hash value on a table, that will make it easier at the time you need to compare the data on each table.

Cheers!.


1 comment:

Mai Thanh Hai said...

How about the deleted row?