Tuesday, September 15, 2009

IPMP Error, All Interfaces in group ipmpsync have failed

We were having a problem with our IPMP-configured virtual interface: it was fluctuating frequently, and once it even failed over to the other node in the RAC. After investigation we found the following kind of errors reported in the IPMP logs.

Error snapshot:

Cannot meet requested failure detection time of 10000 ms on (inet ce0) new failure detection time for group "ipmp0" is 188510 ms
Improved failure detection time 47127 ms on (inet ce1) for group "ipmp0"
All Interfaces in group ipmpsync have failed

After a little googling I found this post, where this kind of behavior is explained. In fact, it is expected when there is network overhead: IPMP tests its configured interfaces at short, regular intervals, and if for any reason it cannot confirm availability, it reports the interface in question as down and increases the test interval by a predefined amount of time. If this behavior continues, it declares the interface down and may fail that interface over to another available node.

We can manually increase or decrease the IPMP testing interval by setting "FAILURE_DETECTION_TIME" to any value, in milliseconds, inside the /etc/default/mpathd file.

Then you need to signal in.mpathd so that it re-reads the file:

pkill -HUP in.mpathd
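
The two steps above can be sketched as a small shell helper. This is only a sketch: the function name set_fdt and the 20000 ms value in the usage comment are made up for illustration, and it assumes the file already contains an uncommented FAILURE_DETECTION_TIME= line. It writes via a backup copy instead of editing in place, since in-place sed options may not be available on the host.

```shell
# Hypothetical helper (not part of Solaris): set FAILURE_DETECTION_TIME,
# in milliseconds, in an mpathd configuration file.
# Assumes the file has an uncommented FAILURE_DETECTION_TIME=... line.
set_fdt() {
    conf="$1"   # path to the config file, e.g. /etc/default/mpathd
    ms="$2"     # new failure detection time in milliseconds

    # Keep a backup copy, then rewrite the file from the backup.
    cp "$conf" "$conf.bak" || return 1
    sed "s/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=$ms/" \
        "$conf.bak" > "$conf"
}

# Usage, as root on the affected host (20000 is only an example value):
#   set_fdt /etc/default/mpathd 20000 && pkill -HUP in.mpathd
```

The previous version of the file stays next to it with a .bak suffix, so the change is easy to roll back.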

You can keep increasing this value until you reach a reasonable amount of time at which the error stops appearing. After that, you can continue to work on network/communication tuning and find out where the network overhead comes from!
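
To judge whether a new value has made the error go away, it can help to count the relevant messages in the log after each change. The helper below is a rough sketch: the name count_ipmp_errors is made up, and the log path in the usage comment is an assumption about where in.mpathd messages end up on your system. The two patterns are taken from the error snapshot above.

```shell
# Hypothetical helper: count log lines that match the two IPMP error
# messages shown in the error snapshot above.
count_ipmp_errors() {
    logfile="$1"
    # grep -c prints the number of lines matching either pattern.
    grep -c \
        -e 'Cannot meet requested failure detection time' \
        -e 'All Interfaces in group' \
        "$logfile"
}

# Usage (the log path is an assumption; adjust it to your syslog setup):
#   count_ipmp_errors /var/adm/messages
```

If the count stops growing after a change, the new detection time is probably large enough for the current network conditions.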

Cheers
