Cluster Monitor

Recently I had another customer with a very reliable HACMP environment suffer an outage. It goes like this:

The systems run in a blacked-out computer room with no operators messing about, and the backup node failed for some reason. That in itself is not a problem, but no one noticed for a month. When the primary system then hit a hardware fault, the resources had nowhere to fail over to, and the result was a total system outage.

So I put together a little script that checks whether this node is the last one active in the cluster and, if it is, sends email to someone to complain about being lonely. We run it from root's crontab every couple of hours. Just for fun, here is the script. Please change the email address; I don't want lots of clusters complaining to me that they are lonely.

Cheers, Lee


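#!/usr/bin/ksh
# ClusterMon: send mail when this is the only active node in the cluster.
# Meant to be run from root's crontab every couple of hours.

HADIR=/usr/es/sbin/cluster/utilities

NODE=`$HADIR/get_local_nodename`
LOG=/tmp/ClusterMon.log
MAILTO="lee@matildasystems.com"    # change this to your own address

# Active cluster nodes, with this node filtered out of the list.
OTHERNODE=`$HADIR/clgetactivenodes -n "$NODE" | grep -v "$NODE"`

if [[ -z $OTHERNODE ]]
then
    # No other active node: nothing to fail over to, so complain by mail.
    # The here-document body and its closing "data" must stay unindented.
    mail -s "Message from ClusterMon" "$MAILTO" <<data
Help!
I seem to be all alone.
My other node is missing.
Please investigate.

Thanks ClusterMon
data
else
    # Another node is up: just note it in the log.
    # $OTHERNODE is left unquoted so several names end up on one line.
    TS=`date`
    echo "$0 $TS I seem to have the company of" $OTHERNODE >> "$LOG"
fi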
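
One caveat with the grep -v filter in the script: it drops every line that merely contains the local node name, so a local node called node1 would also hide a partner called node10 and trigger a false "all alone" mail. Below is a minimal sketch of an exact-match filter. It assumes clgetactivenodes prints node names separated by whitespace or newlines, which is not confirmed here, and other_nodes is a made-up helper name, not an HACMP utility.

```shell
# Hypothetical helper (not part of HACMP): print every node name read from
# stdin except the one given as $1, comparing whole names only.
other_nodes() {
    # Appending "" forces awk to compare as strings, so names that look
    # numeric are not compared as numbers.
    awk -v me="$1" '{
        for (i = 1; i <= NF; i++)
            if (($i "") != (me ""))
                print $i
    }'
}

# Demo with made-up node names: node10 survives even though it contains "node1".
printf 'node1\nnode10\nnode2\n' | other_nodes node1   # prints node10 and node2, one per line
```

In the script this would replace the grep stage, along the lines of OTHERNODE=`$HADIR/clgetactivenodes -n "$NODE" | other_nodes "$NODE"`, with the function defined near the top.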