Wednesday, September 16, 2009

WARNING::lib=/opt/oracle/extapi/64/asm err:9 rc:Directory does not exist

We were receiving this warning in the alert logs of both our 11g RAC nodes. After some research I found that it comes from an Oracle bug: the instance is warning you that ASMLIB was not found under /opt!

error snap:

WARNING::lib=/opt/oracle/extapi/64/asm err:9 rc:Directory does not exist
location:skgdllOpenDi
errbuf=2
msgbuf=No such file or directory

Solution:

It is a warning, not an error, so you can safely ignore it!

Reference:

[1] Metalink Note: 727204.1

cheers

error, Switch to short timeout for ipc polling

If you are as unlucky as I was, you may be hit by another bug!

symptoms:

You see messages like the following in the trace files under /u01/app/diag/asm/+asm/+ASMSID/trace, e.g.

Switch to short timeout for ipc polling
a session (kjzhi) is registered
session (kjzhi) is about to end
Registered session (kjzhi)[11][4][0][1] is cleaned up
Switch to long timeout for ipc polling

cause:

It is not your fault :) You need to apply patch 6678289.

If you are on SPARC, you may have to install Sun patch 123908-01 (or later) before applying patch 6678289.

Reference:
[1] Metalink Note: 750773.1
[2] Metalink Note: 353150.1

Cheers

CRS-0184: Cannot communicate with the CRS daemon, Hunted!

After rebooting my RAC nodes, I found that one of them had a problem starting up CRS, i.e.

Error:

CRS-0184: Cannot communicate with the CRS daemon

Logs:

$ less /u01/app/crs/log/nodename/crsd/crsd.log

output:

2009-09-09 09:52:49.154: [ COMMCRS][2]clsc_connect: (10076b610) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodename_))

2009-09-09 09:52:49.154: [ CSSCLNT][1]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodename_)), rc 9

2009-09-09 09:52:49.156: [ CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

Reason:

Sometimes, when a CRS server reboots, it tries to create its sockets under /tmp/.oracle or /var/tmp/.oracle, but socket files left over from the previous run are still there and prevent the new sockets from being created.

Solution:

As root, remove all files under /tmp/.oracle or /var/tmp/.oracle.

Then restart CRS on the faulty node, or even reboot that machine!
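
The cleanup step can be sketched as a small shell helper. This is only a sketch under assumptions: clean_oracle_sockets is a made-up name (not an Oracle tool), it has to run as root, and CRS should be stopped on the node first so that no sockets in use are removed.

```shell
#!/bin/sh
# Hypothetical helper: empty the given socket directories but keep the
# directories themselves. Assumes root and that CRS is stopped on this node.
clean_oracle_sockets() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Remove every entry directly under $dir, hidden ones included.
            ( cd "$dir" && find . ! -name . -prune -exec rm -rf {} + )
            echo "cleaned $dir"
        else
            echo "skipped $dir (no such directory)"
        fi
    done
    return 0
}
```

On the faulty node this would be used roughly as: crsctl stop crs, then clean_oracle_sockets /tmp/.oracle /var/tmp/.oracle, then crsctl start crs (assuming crsctl from the CRS home is on root's PATH).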

cheers

Tuesday, September 15, 2009

IPMP Error, All Interfaces in group ipmpsync have failed

We were having a problem with our IPMP-configured virtual interface: it was flapping frequently, and once it even failed over to the other node in the RAC. After investigation we found that the following kind of errors were reported in the IPMP logs, i.e.

error snap:

Cannot meet requested failure detection time of 10000 ms on (inet ce0) new failure detection time for group "ipmp0" is 188510 ms
Improved failure detection time 47127 ms on (inet ce1) for group "ipmp0"
All Interfaces in group ipmpsync have failed

After a little googling I found a post where this kind of behavior is explained. In fact, it is expected when there is network overhead. IPMP tests its configured interfaces regularly at a short interval. If for any reason it is unable to test availability, it reports the interface in question as down and increases the interval by a predefined amount of time. If this behavior continues, it declares the interface down and possibly fails that interface over to any other available node.

We can manually increase or decrease the IPMP testing interval by setting FAILURE_DETECTION_TIME to a value in milliseconds in the /etc/default/mpathd file.

Then you need to make in.mpathd re-read the file by running

pkill -HUP in.mpathd
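
The edit itself can be sketched as a small shell function. Treat it as a sketch, not a tested procedure: set_failure_detection_time is a made-up name, the 20000 ms used in the example afterwards is only an illustrative value, and it assumes the file already contains a FAILURE_DETECTION_TIME= line.

```shell
#!/bin/sh
# Hypothetical helper: set FAILURE_DETECTION_TIME (milliseconds) in an
# mpathd-style config file. The second argument defaults to /etc/default/mpathd.
set_failure_detection_time() {
    ms=$1
    conf=${2:-/etc/default/mpathd}
    # Accept only a plain non-negative integer.
    case $ms in
        ''|*[!0-9]*) echo "value must be a number of milliseconds" >&2; return 1 ;;
    esac
    tmp="$conf.new.$$"
    # Rewrite only the FAILURE_DETECTION_TIME line; every other line passes through.
    sed "s/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=$ms/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf" &&   # cat keeps the original file's owner and mode
        rm -f "$tmp"
}
```

For example, as root: set_failure_detection_time 20000, followed by the pkill -HUP in.mpathd shown above.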

You can keep increasing this value until you reach a reasonable setting at which the error stops appearing. After that, you can continue working on network/communication tuning to find out where the network overhead comes from!

Cheers

Monday, September 7, 2009

Error in Voting/OCR disk during CRS installation

My CRS installation story continues: as soon as I hit an error, a new blog post is ready :)

During CRS installation, on the screen where you have to supply the Voting/OCR disks, you might get the following error, i.e.

The specified shared raw partition /dev/rdsk/ora_ocr_raw_280m may not have correct permission. Verify that the partition is owned by Oracle User.

But when you check the disk's permissions, they tell you something else!

$ ls -ltr /dev/rdsk/ora_ocr_raw_280m
crw-rw---- 1 oracle oinstall 85, 8194 Sep 6 08:51 /dev/rdsk/ora_ocr_raw_280m

Does that mean the permissions are set correctly? NO, NOT YET ;)

You need to check the permissions of the actual device, i.e.

# ls -ltr ../../devices/scsi_vhci/ssd@g50060e800000000000005ba500000020:a,raw
crw-r----- 1 root sys 118, 504 Aug 28 16:35 ../../devices/scsi_vhci/ssd@g50060e800000000000005ba500000020:a,raw

From here you can see that the actual device is owned by root:sys, which is what causes this permission error. You need to change the ownership of the actual device to oracle:oinstall, i.e.

# chown -R -h oracle:oinstall ../../devices/scsi_vhci/ssd@g50060e800000000000005ba500000020:a,raw

# ls -ltr ../../devices/scsi_vhci/ssd@g50060e800000000000005ba500000020:a,raw

crw-r----- 1 oracle oinstall 118, 504 Sep 4 14:55 ../../devices/scsi_vhci/ssd@g50060e800000000000005ba500000020:a,raw
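
If the entry under /dev/rdsk is a symbolic link to the node under /devices (an assumption here; standard Solaris /dev/rdsk entries are, but a custom entry may not be), ls can show both sides in one go. A minimal sketch, with show_link_and_target being a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: list a path as-is, then list what it points to.
show_link_and_target() {
    # Without -L, ls describes the entry itself (the link, if it is one).
    ls -l "$1"
    # With -L, ls follows a symbolic link and reports the owner, group
    # and mode of the target it resolves to.
    ls -lL "$1"
}
```

For example: show_link_and_target /dev/rdsk/ora_ocr_raw_280m. If both lines show the same owner and mode, the entry is probably not a link and the underlying device has to be located some other way.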


NOTE: for both the voting and OCR disks you should use slice 0 of the disk.

cheers

Thursday, September 3, 2009

Concept behind IP Network Addressing in crs installation, rac

Yesterday I was installing CRS and got stuck on the Network Configuration screen, where I had to fill in the Public, Virtual & Private network IPs. I was getting the following errors, i.e.

You must enter unique values for the public node name, the private node name and the virtual hostname for all nodes in the cluster. The name, YOUR-HOSTNAME , that you entered is being used by more than once for the same node.

AND

The virtual hostname(s), YOUR-HOSTNAME, you have specified appears to be already assigned to another system on the network. Please ensure that the virtual hostname(s) that you use for each of the nodes in the cluster are not in use currently.

After reading the official Oracle documentation and testing the configuration of the machines I was given, I found that my installation server had a name resolution problem. At first our DNS was resolving the hostname to the wrong IP; later, when it was changed to resolve from /etc/hosts, the name could not be resolved from that file either. Anyhow, after some investigation I was able to put together a brief idea of what the CRS IP network configuration should look like. It's simple! Here it is:

Concept:

You should have three IPs/hostnames per node, i.e.

1. Public Hostname/IP (Physical Interface):
The public hostname should be registered in DNS or in the /etc/hosts file and should be reachable, i.e. one can ping it.

2. VIP Hostname/IP(Logical Interface):
The VIP hostname/IP should be registered in DNS or in the /etc/hosts file and should NOT be reachable at installation time, i.e. one CANNOT ping it.

NOTE: when IPMP is used, the VIP should be a LOGICAL interface.

3. Private Hostname/IP(Physical Interface):
The private hostname should be registered in DNS or in the /etc/hosts file and should be reachable, i.e. one can ping it.
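
When the names live in /etc/hosts, a quick way to confirm that all three are registered is sketched below. It is only a sketch: hosts_has_name is a made-up helper, and the hostnames rac1, rac1-vip and rac1-priv in the example afterwards are hypothetical. On Solaris you may need nawk in place of awk.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME appears as a hostname or alias in a
# hosts-format file. The second argument defaults to /etc/hosts.
hosts_has_name() {
    name=$1
    hosts=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                        # drop comments
        { for (i = 2; i <= NF; i++)               # field 1 is the IP address
              if ($i == n) found = 1 }
        END { exit !found }                       # exit 0 only when matched
    ' "$hosts"
}
```

For example: for h in rac1 rac1-vip rac1-priv; do hosts_has_name "$h" || echo "$h missing"; done. After that, ping the public and private names (they should answer) and the VIP name (it should not answer before the installation).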

cheers