Highly Available Crontab
Disclaimer: The information here discusses making changes to the HACMP configuration. No warranty, suitability, or functionality is guaranteed. Proceed at your own risk.
Lee Elston, Matilda Systems
Purpose:
This is in response to a customer request for a modification to HACMP to add crontab control. There are two separate issues: first, a cron job being launched on a node while the application is present on another node, and second, keeping the crontab entries synchronized.
Thoughts & Ideas:
Cron uses cron.allow and cron.deny files. These are checked before a crontab entry is processed, so altering the allow and deny files takes effect immediately. Adding or changing the actual cron files outside of the crontab command will not update the cron processing (files renamed to denied users still run). If changes are made to the files, the cron daemon needs an incentive to re-read them.
In large shops many contractors come and go giving assistance to the customer. Although skilled in their specific areas, they tend not to be HA aware, and cron entries are often added or changed without considering HACMP. Requiring that cron jobs be made HA aware is too large a request, as it would involve educating too many vendors; it is better to make HACMP more aware of its environment.
To address the first issue, cron jobs being attempted while the resource group is absent, two new event scripts and a control file have been added. The first event script scans the control file for user IDs associated with HACMP application servers. It then adds these users to the “/var/adm/cron/cron.deny” file, effectively stopping those users' cron jobs from running. When an application server is started, the control file is scanned; if the starting application server matches an entry in the control file, the associated users are removed from the cron.deny file, allowing their cron processes to run.
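The deny and allow steps described above can be sketched roughly as follows. This is only an illustration of the logic, not the actual event scripts (which are not shown in this document and would normally be shell scripts). The function names are hypothetical; the cron.deny path is the one quoted above. The rewrite is not atomic, so a real script would want to guard against concurrent edits.

```python
from pathlib import Path

CRON_DENY = Path("/var/adm/cron/cron.deny")  # path quoted in the text above


def deny_users(deny_lines, users):
    """Return the deny list with each user appended once (order preserved)."""
    result = [line.strip() for line in deny_lines if line.strip()]
    for user in users:
        if user not in result:
            result.append(user)
    return result


def allow_users(deny_lines, users):
    """Return the deny list with every given user removed."""
    blocked = set(users)
    return [line.strip() for line in deny_lines
            if line.strip() and line.strip() not in blocked]


def rewrite_deny_file(update, users, path=CRON_DENY):
    """Apply update (deny_users or allow_users) to the file at path.

    Hypothetical helper: reads the current entries, applies the update,
    and writes the result back with one user name per line.
    """
    current = path.read_text().splitlines() if path.exists() else []
    path.write_text("".join(name + "\n" for name in update(current, users)))
```

For example, `rewrite_deny_file(deny_users, ["oracle"])` would block the hypothetical user `oracle`, and `rewrite_deny_file(allow_users, ["oracle"])` would release it again once its application server has started.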
Crontab synchronization. HACMP version 5.2 provides a capability for keeping files synchronized: file collections. Entries are added on a file name basis and HACMP keeps the files the same. There may be a time lag between the two systems, but that is adjustable. The only catch is that cron may not notice that a file has been altered. To compensate, the script that allows cron jobs for specific users to run will also give the cron daemon a kick so that it restarts.
A planned feature of HACMP 5.3 is to perform synchronization to oncoming nodes if required. This can enhance HA crontabs by ensuring consistency between the crontab entries.
The task at hand
For the first iteration of HA crontabs we will have to set things up carefully. If this becomes a popular feature I will consider making it more robust and using HACMP-type ODM structures. Who knows, maybe IBM will add this to HACMP.
A pre-event must be added to node_up_local to reset the crontabs. The logic here is that if we are running node up, then this node was not running and had no resources, so no crontab entries related to HACMP application servers should be present.
A post-event must be added to the application_start event to allow the crontab files for the specific application that just started. Putting the cron handling in the post-event allows us to check whether the application start was successful. Note that an unsuccessful application start is application monitoring's problem, not ours. If application monitoring decides to move the application server, we will comply.
A pre-event to application_stop must be added to disable the current HA crontab entries.
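The three hooks described so far can be summarized as a small dispatch table. This is a sketch under assumptions: the action names and the `run_event` wrapper are hypothetical, and the event names are simply the ones used in this document. The wrapper reports errors but always returns zero, since HACMP expects a zero exit code from event scripts.

```python
import sys

# Hypothetical action names; event names are as written in this document.
ACTIONS = {
    ("pre", "node_up_local"): "deny_all",       # reset: block every HA cron user
    ("post", "application_start"): "allow_app",  # release users of the started app
    ("pre", "application_stop"): "deny_app",    # block users of the stopping app
}


def choose_action(phase, event):
    """Return the crontab action for an event hook, or "ignore"."""
    return ACTIONS.get((phase, event), "ignore")


def run_event(phase, event, handlers):
    """Run the handler for this hook and always return 0.

    handlers maps an action name to a zero-argument callable. Failures
    are written to stderr rather than propagated to the caller.
    """
    handler = handlers.get(choose_action(phase, event))
    if handler is None:
        return 0
    try:
        handler()
    except Exception as exc:  # report the failure, but never fail the event
        print(f"HA crontab {phase}-event for {event} failed: {exc}",
              file=sys.stderr)
    return 0
```

A script built this way would end with something like `sys.exit(run_event(phase, event, handlers))`, so a crontab problem never blocks cluster event processing.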
The most time-consuming part of this installation is creating a file collections group for the crontab entries. Unfortunately, at this time we must separately list each file we wish to have synchronized to the other cluster nodes. ** Note: there is an option to have a cron job added to the system to automatically keep the HACMP file collections complete for the files in the crontab directory. As the author, I'm not sure that is a good idea.
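Since each file has to be listed by name, a short helper can at least produce the list to enter. The sketch below assumes the crontabs live in /var/spool/cron/crontabs, one file per user (the usual AIX location, but check your system). The function name is hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: default AIX crontab directory, one file per user.
CRONTAB_DIR = PurePosixPath("/var/spool/cron/crontabs")


def crontab_files(users, directory=CRONTAB_DIR):
    """Return the full crontab path for each distinct user, sorted by name."""
    return [str(directory / user) for user in sorted(set(users))]
```

The resulting paths are what would be added, one by one, to the file collection.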
Lastly, create the control file that links cron.deny user IDs to the application server name.
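The layout of the control file is not specified here, so the following sketch assumes a hypothetical format: one line per application server, written as `server_name:user1,user2`, with `#` starting a comment. The parser and its function names are illustrative only.

```python
def parse_control_file(text):
    """Map each application server name to its list of cron users.

    Assumed (hypothetical) format: "server_name:user1,user2" per line,
    with "#" starting a comment and blank lines ignored.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        server, sep, users = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in control file line: {raw!r}")
        names = [u.strip() for u in users.split(",") if u.strip()]
        mapping.setdefault(server.strip(), []).extend(names)
    return mapping


def all_users(mapping):
    """Return every user in the mapping once, in first-seen order."""
    seen = []
    for users in mapping.values():
        for user in users:
            if user not in seen:
                seen.append(user)
    return seen
```

With this layout, the node-up reset would deny `all_users(mapping)`, while an application start or stop would only touch `mapping[server_name]`.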
Copy the files to all nodes (or use file collections), synchronize the cluster, and test thoroughly.
FAQ
Q: Does this customization cause support issues with IBM?
A: No. The facility for adding pre- and post-events to HACMP has existed for many years, though few customers actually use it. The most important thing to remember is to make sure your event scripts handle the necessary error conditions and return a zero exit code to HACMP. As with any customization, document it and give support the information up front when reporting problems. Consider additions to clverify to document and verify the customizations (we can discuss that another day).