Hdr: 9469133 11.2.0.1 PCW 11.2.0.1 CSS PRODID-5 PORTID-197
Abstract: CORE DUMP OF OCSSD.BIN WHEN VOTING DISK IS NOT ACTIVATED.
Some explanation from development:
==================================
In 11.2, voting files are discovered, not hard-wired, so we look through a
list of files that are specified in the 'discovery string', e.g.
/dev/vdisk/*, and use all files that appear to be legitimate voting files,
i.e. they have a TOC (Table Of Contents), volume info block, etc. Since the
VG with the voting files is not online, the discovery does not see them at
all, so does not consider them as voting files and fails as a result of an
inability to find enough voting files.
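The discovery behaviour described above can be pictured with a small sketch. This is not the CSS implementation: the function name, the `looks_like_voting_file` predicate (a stand-in for the TOC and volume info block checks) and the `minimum` threshold are assumptions made purely for illustration.

```python
import glob
from typing import Callable, List


def discover_voting_files(
    discovery_string: str,
    looks_like_voting_file: Callable[[str], bool],
    minimum: int = 1,
) -> List[str]:
    """Expand the discovery string and keep candidates that pass the check.

    Hypothetical sketch: the real daemon inspects on-disk structures; here
    that test is delegated to a caller-supplied predicate.
    """
    # A device that is not visible yet (for example because its volume group
    # is still offline) never appears in the glob expansion at all.
    candidates = sorted(glob.glob(discovery_string))
    found = [path for path in candidates if looks_like_voting_file(path)]
    if len(found) < minimum:
        raise RuntimeError(
            f"found {len(found)} voting file(s) under {discovery_string!r}, "
            f"need at least {minimum}"
        )
    return found
```

The point of the sketch is that an offline volume group and "no voting files configured" look identical to discovery: in both cases the candidate list is simply too short.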
=====================================
The workaround is to wait until the disks are completely ONLINE after machine boot before the cluster is started
(see note 459169.1, CRS Does Not Startup Automatically After Node Reboot,
Manual Start is OK; that note only covers versions up to 11.1),
or to apply the patch. A backport is available for Solaris x86-64 (64-bit); please confirm the operating system.
Please upload the patch inventory; I may need to raise a backport request for you.
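One way to picture the "wait until the disks are online" workaround is a polling helper that a boot script could call before the cluster stack is started. This is only a sketch under assumptions: the function name, the timeout values and the idea of counting paths that match a pattern such as the example discovery string /dev/vdisk/* are not taken from any Oracle tool, and a device node existing does not by itself prove that the disk is usable.

```python
import glob
import time


def wait_for_devices(pattern: str, expected: int,
                     timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Return True once at least `expected` paths match `pattern`.

    Returns False if the timeout expires first. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if len(glob.glob(pattern)) >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A caller would start the cluster only when the function returns True, and log the failure otherwise; how the stack is started is deliberately left out of the sketch.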
We're getting these errors on the other node when rebooting:
I did see in the boot messages on both hosts these items:
Johnston, Nathaniel [10:39 AM]:
Sep 24 16:05:15 mhddb-nb-2p.philadelphia.pa.bo.comcast.net root: Oracle HA daemon is enabled for autostart.
Johnston, Nathaniel [10:39 AM]:
Sep 24 16:05:18 mhddb-nb-2p.philadelphia.pa.bo.comcast.net unix: vn_rdwr failed with error 0x15
Sep 24 16:05:18 mhddb-nb-2p.philadelphia.pa.bo.comcast.net unix: kobj_load_module: read header failed
Johnston, Nathaniel [10:39 AM]:
Sep 24 16:05:19 mhddb-nb-2p.philadelphia.pa.bo.comcast.net root: exec /u01/app/grid/perl/bin/perl -I/u01/app/grid/perl/lib /u01/app/grid/bin/crswrapexece.pl /u01/app/grid/crs/install/s_crsconfig_mhddb-nb-2p_env.txt /u01/app/grid/bin/ohasd.bin "reboot"
Sep 24 16:05:19 mhddb-nb-2p.philadelphia.pa.bo.comcast.net root: exec /u01/app/grid/perl/bin/perl -I/u01/app/grid/perl/lib /u01/app/grid/bin/crswrap
Johnston, Nathaniel [10:40 AM]:
Sep 24 16:05:36 mhddb-nb-2p.philadelphia.pa.bo.comcast.net mDNSResponder (Engineering Build) (Nov 2 2009 05:02:07) [5272]: starting
Sep 24 16:05:37 mhddb-nb-2p.philadelphia.pa.bo.comcast.net mDNSResponder: Oracle mDNSResponder starting
======================================
WORKAROUND
======================================
11gR2 CRS doesn't startup after node reboot [ID 1050164.1]
Modified 31-JAN-2010 Type PROBLEM Status PUBLISHED
In this Document
Symptoms
Changes
Cause
Solution
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 to 11.2.0.1.0 - Release: 11.2 to 11.2
Generic Linux
Symptoms
- Installation of the 11gR2 Grid Infrastructure on a Linux cluster completed successfully
- OCR & Voting files located in ASM diskgroup
- using ASMLIB driver
- ASM disks are located on multipath devices (/dev/mapper/)
- following a node reboot CRS does not startup
- CSS daemon log shows the following message:
2010-01-13 09:04:15.075: [ CSSD][1150449984]clssnmvDDiscThread: using discovery string for initial discovery
2010-01-13 09:04:15.075: [ SKGFD][1150449984]Discovery with str::
2010-01-13 09:04:15.075: [ SKGFD][1150449984]UFS discovery with ::
2010-01-13 09:04:15.075: [ SKGFD][1150449984]OSS discovery with ::
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str ::
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA1:
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA2:
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA3:
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA4:
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvDiskVerify: Successful discovery of 0 disks
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvFindInitialConfigs: No voting files found
2010-01-13 09:04:15.077: [ CSSD][1150449984]###################################
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
2010-01-13 09:04:15.077: [ CSSD][1150449984]###################################
2010-01-13 09:04:15.077: [ CSSD][1139960128]clssgmClientShutdown: total iocapables 0
2010-01-13 09:04:15.077: [ CSSD][1139960128]clssgmClientShutdown: graceful shutdown completed.
2010-01-13 09:04:15.077: [ CSSD][1150449984]
- running the cluster verification utility returns the following messages:
/cluvfy stage -post crsinst -n racnode1
Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "racnode1"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking time zone consistency...
Time zone consistency check passed.
ERROR:
Cluster manager integrity check failed
PRVF-5434 : Cannot identify the current CRS software version
UDev attributes check for OCR locations started...
UDev attributes check passed for OCR locations
UDev attributes check for Voting Disk locations started...
ERROR:
PRVF-5197 : Failed to retrieve voting disk locations
UDev attributes check failed for Voting Disk locations
Default user file creation mask check passed
Checking cluster integrity...
Cluster integrity check failed This check did not run on the following node(s):
racnode1
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR:
PRVF-5300 : Failed to retrieve active version for CRS on this node
OCR integrity check failed
Checking CRS integrity...
ERROR:
PRVF-5300 : Failed to retrieve active version for CRS on this node
CRS integrity check failed
OCR detected on ASM. Running ACFS Integrity checks...
Starting check to see if ASM is running on all cluster nodes...
PRVF-5137 : Failure while checking ASM status on node "racnode1"
Starting Disk Groups check to see if at least one Disk Group configured...
PRVF-5112 : An Exception occurred while checking for Disk Groups
PRVF-5114 : Disk Group check failed. No Disk Groups configured
Task ACFS Integrity check failed
Checking Oracle Cluster Voting Disk configuration...
ERROR:
PRVF-5434 : Cannot identify the current CRS software version
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
User "grid" is not part of "root" group. Check passed
Post-check for cluster services setup was unsuccessful on all the nodes.
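Because the cluvfy output above is long and repeats some codes, it can help to reduce it to the distinct PRVF messages. The following is a minimal sketch that works on captured output text; the function name is hypothetical and the pattern only assumes the `PRVF-nnnn : message` layout visible in the listing.

```python
import re
from typing import Dict

# Matches lines such as "PRVF-5197 : Failed to retrieve voting disk locations".
PRVF_LINE = re.compile(r"^\s*(PRVF-\d+)\s*:\s*(.+?)\s*$")


def collect_prvf_errors(output: str) -> Dict[str, str]:
    """Map each distinct PRVF code to its first message, in order of appearance."""
    errors: Dict[str, str] = {}
    for line in output.splitlines():
        match = PRVF_LINE.match(line)
        if match and match.group(1) not in errors:
            errors[match.group(1)] = match.group(2)
    return errors
```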
Changes
Node was rebooted after install.
Cause
The CSS daemon crashes because it cannot locate any Voting files in any of the discovered ASM disks, which is indicated by the following message in the CSS daemon log (/log//cssd/ocssd.log):
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvFindInitialConfigs: No voting files found
This error is preceded by the following ASMLIB error:
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
suggesting that ASMLIB has a problem accessing the ASM disks.
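To check whether a given CSS daemon log shows this signature, the two messages can be searched for together. The sketch below works on the log text only; the function name and the returned keys are hypothetical, and the regular expression assumes the line layout shown in the excerpts of this note.

```python
import re
from typing import Dict, List, Union

# Matches the SKGFD asm_open failures, capturing the error code and reason.
ASM_OPEN_ERROR = re.compile(
    r"\[\s*SKGFD\]\[\d+\]ERROR: (-?\d+)\(asmlib .* op asm_open error (.+)\)"
)
NO_VOTING_FILES = "clssnmvFindInitialConfigs: No voting files found"


def summarize_cssd_log(text: str) -> Dict[str, Union[int, bool, List[str]]]:
    """Count asm_open errors and report whether 'No voting files found' appears."""
    asm_open_errors = 0
    reasons = set()
    no_voting_files = False
    for line in text.splitlines():
        match = ASM_OPEN_ERROR.search(line)
        if match:
            asm_open_errors += 1
            reasons.add(match.group(2))
        elif NO_VOTING_FILES in line:
            no_voting_files = True
    return {
        "asm_open_errors": asm_open_errors,
        "reasons": sorted(reasons),
        "no_voting_files": no_voting_files,
    }
```

A result with a non-zero `asm_open_errors` count and `no_voting_files` set to True is consistent with the cause described here, although it does not rule out other reasons for the failure.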
Solution
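1. Configure ASMLIB to scan the multipath (dm) devices first and to exclude the single-path (sd) devices. Either edit the file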
/etc/sysconfig/oracleasm-_dev_oracleasm and change the lines:
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
to
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"
or alternatively run the following command (as user root)
/usr/sbin/oracleasm configure -i -e -u user -g group -o "dm" -x "sd"
2. stop & restart ASMLIB as user root using:
/usr/sbin/oracleasm exit
/usr/sbin/oracleasm init
3. Restart CRS or reboot the node.
The above steps must be executed on all nodes.
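Since the file edit in step 1 has to be repeated on every node, it may be convenient to script it. The following sketch only transforms the configuration text and leaves reading the file, writing it back and restarting ASMLIB to the caller. The function name is hypothetical, and appending a setting that is missing from the file is an assumption of this sketch rather than something the note prescribes.

```python
import re

SCAN_KEYS = re.compile(r"^\s*(ORACLEASM_SCANORDER|ORACLEASM_SCANEXCLUDE)=")


def set_scan_options(config_text: str, scan_order: str = "dm",
                     scan_exclude: str = "sd") -> str:
    """Return config_text with the two scan settings set to the given values."""
    wanted = {
        "ORACLEASM_SCANORDER": scan_order,
        "ORACLEASM_SCANEXCLUDE": scan_exclude,
    }
    seen = set()
    lines = []
    for line in config_text.splitlines():
        match = SCAN_KEYS.match(line)
        if match:
            key = match.group(1)
            lines.append(f'{key}="{wanted[key]}"')
            seen.add(key)
        else:
            # Every other line is passed through unchanged.
            lines.append(line)
    # Assumption: a setting absent from the file is appended at the end.
    for key, value in wanted.items():
        if key not in seen:
            lines.append(f'{key}="{value}"')
    return "\n".join(lines) + "\n"
```

Running the function twice on the same text gives the same result, so it is safe to apply on a node that has already been fixed.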