Tuesday, October 19, 2010

-- Questions about the Oracle GG Roadmap --

By Oracle Corp.

1.- Will Oracle fix the recyclebin off issue when capturing DDL? If so, what OGG release and what RDBMS versions are expected to support a functioning recyclebin in the source?

We now support the recyclebin on/true for Oracle 11g along with our DDL replication support, Oracle 10g still requires the recyclebin to be off/false.

   2.- Does OGG support active-active in a 3+ node (multi master) configuration? Documentation shows 3 nodes in an active-active configuration, is 3+ supported?  If so, any restrictions for DML or DDL for a 3+ active-active configuration?
  
      Our active-active capabilities apply to any number of nodes in the configuration, in other words, we are not limited to only two. Obviously you need to make special considerations for other factors such as the replication of tables that use sequences, replication of truncate or other DDL operations, for example. For best practices, I recommend for multi-node active-active environments, that user privileges/grants are set so that DDL operations can only originate from one node and not the others. Configure OGG to replicat those DDL operations to all downstream nodes in one direction only. Of course that means that other considerations need to plan in event that the node that allows DDL operations goes down or has a planned downtime for maintenance activities.
·          
      3.- When will OGG support source tables with Advanced Compression?  10.4.x docs says compressed tables are not supported for extract.
     
       It is on the road-map to support advanced compression. In fact, a release defining feature is to integrate our transaction log reading functionality with the XStreamOut API. This integration will give us support for the variety of database compression features. This is scheduled to be part of the next major release. 
     
      4.- Will the SUPPRESSTRIGGERS param available for RDBMS 10.2.0.5 and 11.2.+ for OGG 11.1.x be supported in other RDBMS versions, mainly 10.2.0.4 and all 11G R1 RDBMS versions.
                   
      SUPPRESSTRIGGERS parameter  support is the ultimate goal, but I am not clear which future release this is targeted to be included. Each database has different functionality so it might be a phased approach depending if it is possible and the development effort required to support it.

-- CRS will not start on one node (Fixed on 11.2.0.2 and Workaround below) --


: Hdr: 9469133 11.2.0.1 PCW 11.2.0.1 CSS PRODID-5 PORTID-197
Abstract: CORE DUMP OF OCSSD.BIN WHEN VOTING DISK IS NOT ACTIVATED.

Some explanation from development:
==================================
 In 11.2, voting files are discovered, not hard-wired, so we look through a
 list of files that are specified in the 'discovery string', e.g.
 /dev/vdisk/*, and use all files that appear to be legitimate voting files,
i.e. they have a TOC (Table Of Contents), volume info block, etc.  Since the
VG with the voting files is not online, the discovery does not see them at
 all, so does not consider them as voting files and fails as a result of an
inability to find enough voting files.

=====================================

The workaround is to wait the disks are completely ONLINE after machine boot before cluster is started
(note 459169.1 CRS Does Not Startup Automatically After Node Reboot,
                   Manual Start is OK - but this node is up to 11.1 version)

or apply  patch, backport is available for Solaris x86-64(64 bit) - please confirm the Operating System

please  upload the patch inventory - I may need to raise backport for you

We're getting these errors on the other node when rebooting:

I did see in the boot messages on both hosts these items:
Johnston, Nathaniel [10:39 AM]:
Sep 24 16:05:15 mhddb-nb-2p.philadelphia.pa.bo.comcast.net root: Oracle HA daemon is enabled for autostart.
Johnston, Nathaniel [10:39 AM]:
Sep 24 16:05:18 mhddb-nb-2p.philadelphia.pa.bo.comcast.net unix: vn_rdwr failed with error 0x15
Sep 24 16:05:18 mhddb-nb-2p.philadelphia.pa.bo.comcast.net unix: kobj_load_module: read header failed
Johnston, Nathaniel [10:39 AM]:
Sep 24 16:05:19 mhddb-nb-2p.philadelphia.pa.bo.comcast.net root: exec /u01/app/grid/perl/bin/perl -I/u01/app/grid/perl/lib /u01/app/grid/bin/crswrapexece.pl /u01/app/grid/crs/install/s_crsconfig_mhddb-nb-2p_env.txt /u01/app/grid/bin/ohasd.bin "reboot"
Sep 24 16:05:19 mhddb-nb-2p.philadelphia.pa.bo.comcast.net root: exec /u01/app/grid/perl/bin/perl -I/u01/app/grid/perl/lib /u01/app/grid/bin/crswrap
Johnston, Nathaniel [10:40 AM]:
Sep 24 16:05:36 mhddb-nb-2p.philadelphia.pa.bo.comcast.net mDNSResponder (Engineering Build) (Nov 2 2009 05:02:07) [5272]: starting
Sep 24 16:05:37 mhddb-nb-2p.philadelphia.pa.bo.comcast.net mDNSResponder: Oracle mDNSResponder starting

======================================
                               WORKAROUND
======================================

11gR2 CRS doesn't startup after node reboot [ID 1050164.1]


Modified 31-JAN-2010     Type PROBLEM     Status PUBLISHED

In this Document
  Symptoms
  Changes
  Cause
  Solution




Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 to 11.2.0.1.0 - Release: 11.2 to 11.2
Generic Linux

Symptoms

  • Installation of the 11gR2 Grid Infrastructure on a Linux cluster completed successfully
  • OCR & Voting files located in ASM diskgroup
  • using ASMLIB driver
  • ASM disks are located on multipath devices (/dev/mapper/)
  • following a node reboot CRS does not startup
  • CSS daemon log shows the following message:
    2010-01-13 09:04:15.075: [ CSSD][1150449984]clssnmvDDiscThread: using discovery string for initial discovery
    2010-01-13 09:04:15.075: [ SKGFD][1150449984]Discovery with str::
    2010-01-13 09:04:15.075: [ SKGFD][1150449984]UFS discovery with ::
    2010-01-13 09:04:15.075: [ SKGFD][1150449984]OSS discovery with ::
    2010-01-13 09:04:15.076: [ SKGFD][1150449984]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str ::
    2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA1:
    2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA2:
    2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA3:
    2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA4:
    2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
    2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
    2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
    2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
    2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvDiskVerify: Successful discovery of 0 disks
    2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
    2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvFindInitialConfigs: No voting files found
    2010-01-13 09:04:15.077: [ CSSD][1150449984]###################################
    2010-01-13 09:04:15.077: [ CSSD][1150449984]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
    2010-01-13 09:04:15.077: [ CSSD][1150449984]###################################
    2010-01-13 09:04:15.077: [ CSSD][1139960128]clssgmClientShutdown: total iocapables 0
    2010-01-13 09:04:15.077: [ CSSD][1139960128]clssgmClientShutdown: graceful shutdown completed.
    2010-01-13 09:04:15.077: [ CSSD][1150449984]
  • running the cluster verification utility returns the following messages:
    /cluvfy stage -post crsinst -n racnode1

    Performing post-checks for cluster services setup
    Checking node reachability...
    Node reachability check passed from node "racnode1"
    Checking user equivalence...
    User equivalence check passed for user "grid"
    Checking time zone consistency...
    Time zone consistency check passed.
    ERROR:
    Cluster manager integrity check failed
    PRVF-5434 : Cannot identify the current CRS software version
    UDev attributes check for OCR locations started...
    UDev attributes check passed for OCR locations
    UDev attributes check for Voting Disk locations started...
    ERROR:
    PRVF-5197 : Failed to retrieve voting disk locations
    UDev attributes check failed for Voting Disk locations
    Default user file creation mask check passed
    Checking cluster integrity...
    Cluster integrity check failed This check did not run on the following node(s):
    racnode1
    Checking OCR integrity...
    Checking the absence of a non-clustered configuration...
    All nodes free of non-clustered, local-only configurations
    ERROR:
    PRVF-5300 : Failed to retrieve active version for CRS on this node
    OCR integrity check failed
    Checking CRS integrity...
    ERROR:
    PRVF-5300 : Failed to retrieve active version for CRS on this node
    CRS integrity check failed
    OCR detected on ASM. Running ACFS Integrity checks...
    Starting check to see if ASM is running on all cluster nodes...
    PRVF-5137 : Failure while checking ASM status on node "racnode1"
    Starting Disk Groups check to see if at least one Disk Group configured...
    PRVF-5112 : An Exception occurred while checking for Disk Groups
    PRVF-5114 : Disk Group check failed. No Disk Groups configured
    Task ACFS Integrity check failed
    Checking Oracle Cluster Voting Disk configuration...
    ERROR:
    PRVF-5434 : Cannot identify the current CRS software version
    PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
    User "grid" is not part of "root" group. Check passed
    Post-check for cluster services setup was unsuccessful on all the nodes.


Changes

Node was rebooted after install.

Cause


The CSS daemon crashes because it cannot locate any Voting files in any of the discovered ASM disks, which is indicated by the following message in the CSS daemon log (/log//cssd/ocssd.log):

2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvFindInitialConfigs: No voting files found


This error is preceded by the following ASMLIB error:
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
suggesting that ASMLIB has problem accessing the ASM disk.


Solution

1. either edit the file /etc/sysconfig/oracleasm-_dev_oracleasm    and change the lines:
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
to
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"
or alternatively run the following command (as user root)
/usr/sbin/oracleasm configure -i -e -u user -g group -o "dm" -x "sd"

2. stop & restart ASMLIB as user root using:
/usr/sbin/oracleasm exit
/usr/sbin/oracleasm init

3. restart CRS or reboot node

The above steps need to be executed on all nodes

-- ASM unresponsive on 1 cluster node of 4 (Fixed on 11.2.0.2) --

By Rafael Orta


=======================================================
Bug 9276348  BugDB see Bug 9276348.-P                Base Bug 9254569
Related Bugs : BugMatrix 9276348    DDR: BugDesc 9276348
Customer: INTERNAL - RACSYSTST                             Created: 11-JAN-10
Component: PCW          Comp Ver: 11.2.0.2     Rel St: D   Updated: 03-FEB-10
Sub Comp:              RDBMS Ver: 11.2.0.2                      By: HFLAKE
Status: 96,Closed, Duplicate Bug
Severity: 2,Severe Loss of Service                          Fixed In Ver:
O/S: 226 Linux x86-64
PL Group:   Gen/Port: G  Error #: -  Pub: N
Hdr: 9276348 11.2.0.2 PCW 11.2.0.2 PRODID-5 PORTID-226 9254569
Abstract: LX64: CRS STACK FAILS TO COME UP ON FIRST NODE, CSS STARTS OK
=======================================================

3) Bug 9276348  is closed as duplicated of Bug 9254569  :
=======================================================
Bug 9254569  BugDB see Bug 9254569.-P                No Base Bug
Related Bugs : BugMatrix 9254569    DDR: BugDesc 9254569
Customer: INTERNAL-RACSYSTST                               Created: 04-JAN-10
Component: PCW          Comp Ver: 11.2         Rel St: D   Updated: 13-SEP-10
Sub Comp: CRS          RDBMS Ver: 11.2                          By: ARU
Status: 80,Development to Q/A
Severity: 2,Severe Loss of Service                          Fixed In Ver: 11.2.
0.2
O/S: 233 Microsoft Windows Server 2003 (64-bit
PL Group:   Gen/Port: G  Error #: -  Pub: N

Hdr: 9254569 11.2 PCW 11.2 CRS PRODID-5 PORTID-233
Abstract: MISSING DEPENDENCY IN OHASD: GIPC HA REQUIRES GIPCD
=======================================================

4) Bug 9254569  is fixed on release 11.2.0.2.


ACTION PLAN
===========

5) In order to fix the bug 9254569 please apply the patchset 11.2.0.2 on the Grid Infrastructure Oracle Home. Patchset 11.2.0.2 also contains another very important fixes for the Grid Infrastructure CRS.

6) For additional information please check the next note:

=)> Important Changes to Oracle Database Patch Sets Starting With 11.2.0.2 (Doc ID 1189783.1)

-- Backup Issue encountered after DB failover --


By Raja Ankathi


We encountered the backup issue on  db. As per the metalink note 566635.1, there is an bug in 10.2. and it is fixed in 11.2


==================
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.2.0 This problem can occur on any platform.
Symptoms

Resync of the catalog for the Primary database fails with:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 05/06/2008
10:29:09
ORA-00001: unique constraint (RMAN.TF_P) violated Cause You are hitting bug 6653570

The root cause scenario in the bug is:
1) a primary and standby is created
2) on primary customer added a tempfile. This doesn't get created on standby.
    It gets resynced to recovery catalog
3) Then when switchover happens, they add the tempfile on new primary and
    resync it to recovery catalog. This resync marks the tempfile on primary (at
    step 2) as dropped.
4) Then when the switchover happens again, the temp file that was dropped is
    again resynced which is causing the problem.
Solution

Bug 6653570 is fixed in Oracle Database version 11.2

Please check metalink if the patch is available for your platform and version. If not and a one off is required, please contact Oracle support

The workaround for this issue is to recreate the temporary tablespace on the primary database.

-- 11.2.0.2 upgrade error --


By Gary Oprendek
 
When upgrading grid 11.2.0.1.0 to 11.2.0.2.0, you must make sure that patch 9413827 is installed on the 11.2.0.1.0 grid and database binaries PRIOR to starting the upgrade. The upgrade will continue without it until you get to the end when you should run root.sh and it will bomb out stating that that patch is not installed. In order to install it you will have to shut down the grid and asm home and apply the patch. This will of course mean that you have to kill the upgrade process. This causes a problem because after you apply the patch and restart the upgrade process, the installer states that there is no grid or asm installation on the machine. This is because it has wiped out a parameter in the oracle inventory for the grid installation. It removes the “  CRS=”true”  “ parameter from the end of the line listing the grid home in the ~oraInventory/ContentsXML/inventory.xml file. The entry should look similar to this:

CRS="true">

If you run into this problem then manually edit the file and add the parameter back in and the installer will find the cluster and continue normally.

Wednesday, October 13, 2010

- Changing Private IP Addresses 11G R2 OCR / VD on ASM --

By Bill Toranto

I recently encountered this issue and thought everyone should be aware of it.


If any re IP-ing is to take place in your 11G R2 RAC clusters (mainly the Private IP), the Grid / CRS OCR needs to be updated prior to anything being done on the host. The ASM instance needs to be available in order for the OCR to be accessible.

IF the host is re IP-ed first, the Grid will need to be re installed using the new private IP address, since ASM will not mount & the OCR inaccessible.

Oracle does expect to have a fix for this very soon since they are getting a large number of reported issues related to this, and ASM OCR’s a relatively new implementation.

This may seem fundamental, but putting the OCR on ASM is a fairly new practice for most of us and previously could be modified after the host ip’s were modified since we used RAW devices.

Monday, October 11, 2010

-- Oracle Debuts Unbreakable Enterprise Kernel for Oracle Linux --


I don’t know what is in Larry’s mind, but seen this been announced all around makes me think Oracle is reaffirming their commitment to this Linux based platform, does it makes sense for them to keep the expensive Solaris platform alive or will there be a merge path down the road?, this is just something to think about to position ourselves properly in the future.

"Oracle Debuts Unbreakable Enterprise Kernel for Oracle Linux




Oracle has announced the availability of the Unbreakable Enterprise Kernel for Oracle Linux, a fast, modern, reliable kernel that is optimized for Oracle software and hardware. "Today's hardware innovations are fast and frequent, making it very important that the Linux distributions evolve quickly to leverage the latest hardware," said Wim Coekaerts, senior vice president of Linux and Virtualization Engineering at Oracle.
The new Unbreakable Enterprise Kernel is the result of the combined efforts of Oracle's Linux, database, middleware, and hardware engineering teams, and is:
  • Fast—More than 75 percent performance gain demonstrated in OLTP performance tests over a Red Hat Compatible Kernel; 200 percent speedup of Infiniband messaging; 137 percent faster solid state disk access
  • Modern—Provides optimizations for large NUMA servers; improved power management and energy efficiency; fine-grained CPU and memory resource control
  • Reliable—Supports the Data Integrity Extensions and T10 Protection Information Model, to stop corrupt data from being written to storage; hardware fault management improves application uptime; low overhead performance counters for tracing
  • Optimized for Oracle—Built and tested to run Oracle hardware, databases, and middleware with the best Linux performance and reliability available
"With the combination of Oracle Linux and the Unbreakable Enterprise Kernel, customers gain fast access to optimizations, new enhancements, and bug fixes," says Coekaerts. "This new offering is a result of Oracle Linux kernel development efforts, on top of the current mainline kernel, and further distinguishes Oracle Linux as the best choice for enterprise customers.""

Wednesday, October 6, 2010

-- Real Application Testing for Earlier Releases --

Metalink Note: 560977.1: Real Application Testing for Earlier Releases

What is being announced?

Oracle Database 11g introduced Database Replay and SQL Performance Analyzer as part of the Real Application Testing option to enable businesses identify issues with system changes before production deployment. This note provides information on the Real Application Testing functionality available
for pre-11g database releases. The main goal of making Real Application Testing functionality available in earlier releases is to enable customers to upgrade to higher database releases.

SQL Performance Analyzer allows users to identify SQL performance regressions caused by system changes such as adding an index, gathering optimizer statistics, implementing partitioning, RDBMS upgrades, etc. SQLPerformance Analyzer assesses the impact of change on SQL response times by executing each SQL serially with production context (bind variables, optimizer environment, etc.) before and after a change and then provides a report highlighting any performance divergences.

Database Replay allows users to perform comprehensive testing of database and infrastructure changes using real application workloads. It allows capture of production workload including concurrency, think time, and transactionsdependencies, and allows users to replay the workload on a test system with the exact same production characteristics so that all problems can be identified and remediate in test before deploying the change to production.

Please check the “Licensing Information” manual on OTN for details regarding licensing of Real Application Testing Option.

What do you need to do?

What you need to do to use Real Application Testing functionality in earlier releases?

Real Application Testing functionality for pre-11g database releases is installed using the “opatch” utility and following instructions for the generic and platform specific “README” for the patches. If you want to only use Database Replay or SQL Performance Analyzer, then only the patch for that particular functionality
needs to be applied. If you want to use both Database Replay and SQL Performance Analyzer, the patches for both the functionalities need to be applied.

Database Replay

To help customers upgrade to Oracle Database 11g, the “capture” functionality of Database Replay has been made available in previous releases as shown in thetable below. The replay of the captured workload can only be done on Oracle Database 11g and higher.

Please refer to Oracle documentation on OTN for information on the capture part of Database Replay available for the earlier releases.


Enabling Workload Capture


This step is required for 10.2.0.4.0 only. By default, the workload capture is disabled on the pre-11g database releases. Database workload capture functionality is enabled on the system by specifying
the PRE_11G_ENABLE_CAPTURE initialization parameter. To enable workload capture, run the wrrenbl.sql script at the SQL prompt as sys or system: This step is NOT required for version lower then 10.2.0.4.0.

@$ORACLE_HOME/rdbms/admin/wrrenbl.sql

The wrrenbl.sql script calls the ALTER SYSTEM SQL statement to set the PRE_11G_ENABLE_CAPTURE initialization parameter to TRUE. Please check the OTN documentation mentioned above for more details.

SQL Performance Analyzer

SQL Performance Analyzer functionality has been enhanced and made available in earlier releases to help customers upgrade their databases from versions Oracle 9.x, 10.1.x, and 10.2.x to higher release. Applying the one-off patches as shown below actually installs the SQL Performance Analyzer functionality for that
release. After the functionality is installed, customers can then leverage Oracle Database 11g SQL Performance Analyzer functionality to upgrade from any database releases 9.x, 10.1.x, 10.2.x to 10.2 or higher release. The one-off patches that need to be applied depend on the version of the source and the
destination databases as given in the table below.

Further information on this topic is available in the technical white paper “Oracle Real Application Testing: Testing the SQL Performance Impact of Oracle 9i/10g Release 1 to Oracle Database 10g Release 2 upgrade with SQL Performance Analyzer”. It is recommended to review this document before using SQL Performance Analyzer functionality for earlier releases.

Patch 6865809 should be applied on the Oracle Database Release 11.1.0.6 test database that is used to orchestrate SPA on Oracle Database 10.2. The patch enables Oracle 11g test database to consume SQL trace files from previous database releases and generate a SQL Tuning Set (STS). The STS can then be
used as input for the SQL Performance Analyzer task.

Patches applied on Oracle Database 10.2 source or destination databases enable “Remote SQL Test Execution” functionality on that release for more accurate analysis of performance data.

Table 3: SQL Performance Analyzer Availability Information. For Windows
platform, see additional notes below:





Note for Windows Platforms:


* For Windows 32-bit Platform patch number on top of 11.1.0.6.0 database is 7044721.
* For Windows 64-bit AMD64 patch number on top of 11.1.0.6.0 database is 7044728.

Note for 11.1.0.7.0 Database

If you're using 11.1.0.7.0 database for SPA testing, you do not need to apply any of the above-mentioned patches on the 11g SPA system that is used for analysis. All the fixes are already in there. On 10.X database side, you do need the patch as mentioned above.

Who to contact for more information?

Please create a Service Request with Oracle Support

References

Note 466181.1 - 10g Upgrade Companion
Note 562899.1 - TESTING SQL PERFORMANCE IMPACT OF AN ORACLE 9i
TO ORACLE DATABASE 10g RELEASE 2 UPGRADE WITH SQL
PERFORMANCE ANALYZER

Note 601807.1 - Oracle 11g Upgrade Companion

Keywords

SQL~PERFORMANCE~ANALYZER ; UPGRADE ; SQL~TUNING ; UPGRADE~
TO~10.2.0.3.0 ; MIGRATE~DATABASE ; PATCH~UPGRADE~TO10.sp;SQL~T
UNING~ADVISOR ; TUNING~PACK ; UPGRADE~FROM~9.2.0 ; UPGRADE~T
O~9.2.0 ;

-- Failed login attempts updates the account status column as LOCKED in 11.2.0.1 --

There is a known bug under 11.2.0.1 that displays the incorrect account status when a user has accidently locked an account via too many invalid passwords attempts. In a nutshell, too many password attempts should result in a “LOCKED(TIMED)” status, but instead result in a “LOCKED” status. The “LOCKED” status is what an account is set to when a DBA intentionally locks it (i.e. when somebody leaves the company or team).


There is a work around which would require updating all profiles to ensure password_lock_time is set to a number, rather than unlimited. Alternatively we can apply the 11.2.0.2 patchset.

Wednesday, July 21, 2010

-- Warning while upgrading to 11g RAC --

Recently while upgrading to 11gR2 one of our RAC databases it failed while running ASMCA which is now part of grid infrastructure, we found that we were hitting the Bug: 9818657 (11.2 UPGRADE FROM 10.2 FAILED WHEN RUNNING ASMCA), the issue was that we had set in the bash_profile the ORA_CRS_HOME environment variable pointing to the old 10g environment, we removed the parameter from the bash_profile and that solved the issue.

Wednesday, June 23, 2010

-- How to manage AUDIT trails automatically in 11g before it grows out-of-control overtime --

By Ken Patel

11.2 has AUDIT trail turned on for lot of SYSTEM privileges. But there is no automatic purge job defined out-of-box, not only that SYS.AUD$ is located in SYSTEM tablespace which can cause space management problem unnecessarily.  
11.2 has new package DBMS_AUDIT_MGMT to manage this. one of good thing of this package, is that it allows DBA to setup automatic purge job (not to say, Oracle should have done this out-of-box with 7 days or some good interval just like AWR snapshots). this package also helps moving SYS.AUD$ to SYSAUX or any other tablespace DBA wants.

I think, DBA should manage this as soon as 11.2 Database is setup. Purging takes time if done after few weeks. I would do atleast this much after deploying 11.2 database for controlling audit trails ( Documentation has more details on various other package calls as well as different argument and their significance).

1.
-- this block initializes the clean up as well as moves SYS.AUD$ to SYSAUX tablespace

BEGIN
DBMS_AUDIT_MGMT.INIT_CLEANUP(AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
DEFAULT_CLEANUP_INTERVAL => 12 );
END;
/

2.
-- manual clean up
BEGIN
  DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
   AUDIT_TRAIL_TYPE           =>  DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
   USE_LAST_ARCH_TIMESTAMP    =>  TRUE );
END;
/

2.
-- creating automatic purge job

BEGIN
  DBMS_AUDIT_MGMT.CREATE_PURGE_JOB (
   AUDIT_TRAIL_TYPE            => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
   AUDIT_TRAIL_PURGE_INTERVAL  => 12,
   AUDIT_TRAIL_PURGE_NAME      => 'Sys_Audit_Purge_JOB',
   USE_LAST_ARCH_TIMESTAMP     => TRUE );
END;

Ken is a Senior Database Engineer at Comcast NETO DBE&O Solutions Engineering Team.

Wednesday, May 26, 2010

-- Sometimes you have no option but to use Multithreaded Servers --

A customer of mine was faced with the situation of having poor memory management even that ASMM was configured, I noticed poor buffer cache utilization 46%, low hit ratio 84% and the sqls were constantly reloading, when faced with the question of why we do not pin objects in the buffers the answer was that their Windows System is 32 bits and was maxed out due to the number of connections to the 3GB limit.

Although we all know that dedicated servers are faster that shared servers, we also know that the most expensive operation in a transaction is I/O, in the case of my customer I know his service time is about 20ms per I/O which is not the best (They use EMC Clarion). So in this case we need to sacrifice a bit of CPU to reduce I/O and do more cache utilization.

The most important factor when configuring MTS is to determinate the number of servers and dispatchers that you will need and this is a good rule of thumb.

For dispatchers of 1 per every 250 connections works well for typical systems, Being too aggressive in your estimates is not beneficial; configuring too many dispatchers can degrade performance.

For shared servers typical systems seem to stabilize at a ratio of 1 shared server for every 10 connections. On NT, exercise care when setting MTS_MAX_SERVERS to too a high value because as mentioned, each server is a thread in a common process. The optimal values for these settings can change based on your configuration; these are just estimates of what seems to work for typical configurations.

Example OLTP Application: If you expect to require 2,000 concurrent connections, begin with 200 shared servers or 1 shared server for every 10 connections. Set MTS_MAX_SERVERS to 400.

Because MTS uses your large_pool for storing MTS related UGA information , we recommend to increase and explicitly set the minimum large_pool_size. If you do not set a value for LARGE_POOL_SIZE, Oracle uses the shared pool for MTS user session memory.

Some important views that will help you to tune your servers / dispatchers ratio:

-  V$DISPATCHER  This view gives information about dispatcher processes.

-  V_$SHARED_SERVER This view gives information about shared server processes.  

-  V$CIRCUITS  This view gives information about the virtual circuits. Virtual circuits are state objects that act as repositories of all the user related state information that needs to be accessed during a database session. There is one virtual circuit per client connection.

-  V$QUEUE This view gives information about messages in the common message queue and the dispatcher message queues. 

- V$DISPATCHER_RATE This view gives information and statistics about the rate at which each dispatcher is receiving and handling messages, events, and so on.

The following initialization parameters control shared server operation:
  • SHARED_SERVERS: Specifies the initial number of shared servers to start and the minimum number of shared servers to keep. This is the only required parameter for using shared servers.
  • MAX_SHARED_SERVERS: Specifies the maximum number of shared servers that can run simultaneously.
  • SHARED_SERVER_SESSIONS: Specifies the total number of shared server user sessions that can run simultaneously. Setting this parameter enables you to reserve user sessions for dedicated servers.
  • DISPATCHERS: Configures dispatcher processes in the shared server architecture.
  • MAX_DISPATCHERS: Specifies the maximum number of dispatcher processes that can run simultaneously. This parameter can be ignored for now. It will only be useful in a future release when the number of dispatchers is auto-tuned according to the number of concurrent connections.
  • CIRCUITS: Specifies the total number of virtual circuits that are available for inbound and outbound network sessions.

=======================================================

One of  my readers posted the followintg comment and I decided to share it, thanks Mitra for the tip

" I used to administer quite a few 32 bit oracle boxes on windows and faced the same problem. Its annoying but the solution I found is to resuce the stack size of the "oracle.exe" and the "listener.exe" with the orastack utility. I believe that the default is 1 MB and for oltp applications 128KB works great. Please try this next time and I bet it will get some extra millage for you"

Yes, reducing the Oracle stack size is definitively an option, reducing it too much will produce the error “ORA-03113: end-of-file on communication channel” errors. Oracle supplies the ORASTACK utility to allow customers to modify the default stack size of a thread / session when created in the Oracle executable.

When ORASTACK is run against specific executables it alters the part of the binary header that defines the default stack size used by the create thread API.

Below are examples of setting the stack to 500K for the main executables :



orastack oracle.exe 500000
orastack tnslsnr.exe 500000

orastack svrmgrl.exe 500000

orastack sqlplus.exe 500000