Wednesday, January 30, 2008

PHOUG Presentation by Tim Gorman

I had the opportunity tonight to attend the PHOUG meeting, which included a presentation by Tim Gorman titled "Scaling to Infinity: Partitioning Data Warehousing in Oracle".

I have to be honest: this is the first time I had heard of Tim, but he really surprised me. What a great presentation and presenter. I can see that Tim has a lot of data warehousing experience and enjoys sharing his knowledge. Below I will try to summarize the points that Tim covered.

Tim sees three major errors in DW design:

A) Ignoring the basic requirements for a DW and designing what is familiar.
B) Failing to portray data changes over time.
C) Failing to utilize partitioning from the beginning.

- Reporting and analysis applications are responsible for presenting data in the format that works best for end users and their query / analysis tools (very often what end users want is a simple spreadsheet, hundreds of columns wide).

- Reporting and analysis applications do not enforce business rules.

- Do not build a data model to enforce referential integrity and / or business rules. Normalization is intended for process-oriented operational systems, not for a DW.

- Provide a very simple data representation, with one degree of normalization for flexibility.

Time-variant data, who cares?

There are two major types of queries from business intelligence applications to the data warehouse database: point in time (what is the present situation?) and trend analysis (how do things look now versus 3 months ago?). Dimension tables are usually designed for point-in-time queries, while fact tables are geared more toward trend analysis.

Every data warehouse has at least one slowly-changing dimension, usually involving people.
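
To make the idea of time-variant data concrete, here is a minimal sketch of a slowly-changing dimension that keeps one row per version of a customer, each with a validity range. This is my own illustration and not something taken from Tim's slides: the table `customer_dim`, its columns, and the helper `region_as_of` are hypothetical names, and SQLite is used only so the example runs anywhere (it is not Oracle).

```python
import sqlite3

# In-memory database; SQLite stands in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical dimension table: one row per version of a customer.
# valid_to = '9999-12-31' marks the row that is current today.
conn.execute("""
    CREATE TABLE customer_dim (
        customer_id INTEGER,
        region      TEXT,
        valid_from  TEXT,
        valid_to    TEXT
    )
""")
conn.executemany(
    "INSERT INTO customer_dim VALUES (?, ?, ?, ?)",
    [
        (1, "East", "2007-01-01", "2007-10-31"),  # old version
        (1, "West", "2007-11-01", "9999-12-31"),  # current version
    ],
)


def region_as_of(conn, customer_id, as_of_date):
    """Return the region that was in effect on as_of_date, or None."""
    row = conn.execute(
        "SELECT region FROM customer_dim "
        "WHERE customer_id = ? AND ? BETWEEN valid_from AND valid_to",
        (customer_id, as_of_date),
    ).fetchone()
    return row[0] if row else None


# Point-in-time question: where is the customer now?
current = region_as_of(conn, 1, "2008-01-30")
# Trend question: where was the customer three months ago?
three_months_ago = region_as_of(conn, 1, "2007-10-30")
```

Because the dates are stored as ISO `YYYY-MM-DD` strings, plain string comparison sorts them chronologically, which is what lets `BETWEEN` work here. Keeping the old row instead of overwriting it is what allows the same table to answer both kinds of question.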

The Virtuous Cycle

Using EXCHANGE PARTITION for loads enables:
- Elimination of ETL load windows and 24x7 availability for queries.
- Direct path loads
- Bitmap indexes and bitmap-join indexes.

Exchange partition is the basic technique of bulk-loading new data into a temporary "load table", which is then indexed, analyzed, and "published" all at once to end users with the EXCHANGE PARTITION operation.
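
The load-then-publish flow described above can be sketched with a toy model. This is only an illustration under my own assumptions, not Oracle and not code from the presentation: `PartitionedTable`, `LoadTable`, and the partition names are hypothetical, and the swap is modeled as exchanging two Python lists. In Oracle the publish step would be a statement along the lines of `ALTER TABLE sales EXCHANGE PARTITION p_2008_01 WITH TABLE sales_load INCLUDING INDEXES WITHOUT VALIDATION`, again with made-up table names.

```python
class LoadTable:
    """Toy stand-in for the temporary load table (hypothetical)."""

    def __init__(self):
        self.rows = []
        self.indexed = False

    def bulk_load(self, rows):
        # Stands in for a bulk load into the load table.
        self.rows.extend(rows)

    def build_indexes(self):
        # Stands in for indexing/analyzing; here we simply sort the rows.
        self.rows.sort()
        self.indexed = True


class PartitionedTable:
    """Toy stand-in for a partitioned fact table (hypothetical)."""

    def __init__(self, partition_names):
        self.partitions = {name: [] for name in partition_names}

    def exchange_partition(self, name, load_table):
        # The "publish" step: swap the partition's rows with the load
        # table's rows in one step, without copying any data.
        self.partitions[name], load_table.rows = (
            load_table.rows,
            self.partitions[name],
        )

    def query(self):
        # End users only ever see what is inside the partitions.
        return [row for rows in self.partitions.values() for row in rows]


sales = PartitionedTable(["p_2007_12", "p_2008_01"])
sales.partitions["p_2007_12"] = [("2007-12-15", 100)]

load = LoadTable()
load.bulk_load([("2008-01-30", 250), ("2008-01-02", 75)])
load.build_indexes()

visible_before = sales.query()               # new rows not visible yet
sales.exchange_partition("p_2008_01", load)  # publish all at once
visible_after = sales.query()                # new rows now visible
```

The point of the sketch is the ordering: all the expensive work (loading, indexing) happens on a table that queries never touch, and the only operation on the table users query is a quick swap.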

Final Recommendations

1.- Use dimensional data models for the "presentation" to end users.
2.- Base the database design on time-variant data structures.
3.- Use partitioning
