What is HAGEO?
This page is part of the Matilda Team's HACMP Resources
Collection. The home page of the collection is located here.
IMPORTANT: read the disclaimer BEFORE
you use any information provided in this collection.
Introduction
IBM's High Availability Geographic Clustering software (HAGEO
for short) is an extension to IBM's High Availability Cluster Multi-Processing
software (HACMP) allowing a single HACMP cluster to be dispersed across
two (2) sites, unlimited by distance. HAGEO provides three key additional
facilities:
- Support for any wide area network that supports IP.
- Remote data replication at the logical volume level, in three
modes of operation:
  - Synchronous
  - Synchronous with Mirror Write Consistency
  - Asynchronous
- Management and integration tools to assist in the planning, configuration
and administration of HAGEO.
Note: IBM's HAGEO product was replaced by HACMP XD as part of the introduction
of HACMP 5.1. This page's description of HAGEO is still accurate in
the sense that it describes the "geographically distributed clustering" capabilities
of HACMP XD. There are HACMP XD features which are not discussed on
this page.
Why might an organisation invest in HAGEO?
HAGEO is aimed primarily at organisations that require a live
disaster recovery solution for their business critical applications and
data. This might include companies with data centres in locations prone
to prolonged power failure or extreme weather conditions, or simply companies
whose future operations would be severely affected by data loss.
HAGEO can provide a swift and effective means of recovering access
to data and applications on one site following the partial or complete
loss of data processing facilities at another owing to site-wide
events as diverse as fire, flood, power blackouts, terrorist or
criminal action or innocent human error.
Unlike other disaster recovery and remote data replication solutions,
HAGEO offers complete independence of chosen disk, network and application.
How does HAGEO work (the short version)?
HAGEO works in conjunction with HACMP 'classic' or HACMP/ES
to extend a single HACMP cluster across a geography. HAGEO does not replace
or replicate any functionality of HACMP; rather, it adds support for
heartbeat and messaging across wide area networks and adds facilities
for mirroring of logical volumes across an IP network.
HAGEO relies on HACMP to perform the following key functions:
- Event detection
- Diagnosis
- Recovery
- Reintegration
Therefore, HAGEO does not actually monitor any element of the geographic
cluster, nor does it diagnose or react to a status change
in the geographic cluster. These functions remain the responsibility
of HACMP. What HAGEO does add is support for mirroring logical volumes
from one site to another. This is achieved by means of a pseudo
logical volume, known as a geomirror device.
Geomirror devices
Each logical volume to be mirrored between the two sites
in a geographic cluster must have an associated geomirror device
(GMD). The geomirror device is a pseudo logical volume that has a
local and remote component. The existence and behaviour of the geomirror
device is transparent to the application and logical volume manager.
Geomirror devices can operate in any one of three different
modes of mirroring. The mode is chosen per device, so each geomirror
device in a cluster can be configured to mirror in any of the three modes.
These modes are:
- Synchronous
- Synchronous with Mirror Write Consistency
- Asynchronous
These three modes of mirroring represent a trade-off between data
integrity and performance.
Geographic Mirroring modes explained
Each mode of geographic mirroring offers different availability
and performance characteristics. Any single logical volume may be
geographically mirrored across two widely dispersed sites using a
geomirror device.
In synchronous mirroring mode, the data is written to the remote
site's disks and then written to the local site's disks. Transaction-oriented
writes (e.g. database transactions and writes using the AIX/UNIX
synchronous write option) aren't reported as complete until after
the data has been written to both sets of disks.
In synchronous with mirror write consistency mode, data is written
to both local and remote disks in no particular order and a state
map device is used to track the progress of the operations. This
state map can be used to recover from various failure scenarios.
Transaction-oriented writes aren't reported as complete until
the data has been written to both sets of disks.
In asynchronous mode, data is first written to the local disks
and then queued for transmission to the remote site. The data
will be written to the remote disks in the same order that it
was written to the local disks. If the out-bound queue reaches
a pre-defined limit, the GMD reverts to synchronous with MWC
mode until the queue is empty. Transaction-oriented
writes are reported as complete as soon as the data is on the
local disks.
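The ordering rules for the three modes can be summarised in a small Python sketch. This is a toy model written for illustration, not HAGEO code: the names `write_steps` and `AsyncQueueState`, the step labels and the queue limit of 3 are all assumptions, and the model tracks only the order of steps, not the data itself.

```python
def write_steps(mode):
    """Return the order of steps for one transaction-oriented write."""
    if mode == "sync":
        # Remote disks first, then local disks, then report completion.
        return ["remote", "local", "complete"]
    if mode == "sync_mwc":
        # Both copies are written in no particular order while the state
        # map tracks the operation; completion follows both writes.
        # (Placing the state map step first is an assumption of this toy.)
        return ["statemap", "local+remote", "complete"]
    if mode == "async":
        # Completion is reported once the data is on the local disks;
        # the remote write happens later, in the original order.
        return ["local", "complete", "queue", "remote"]
    raise ValueError(f"unknown mode: {mode}")


class AsyncQueueState:
    """Tracks when an asynchronous GMD falls back to synchronous with MWC."""

    def __init__(self, limit=3):
        self.limit = limit      # assumed value for the pre-defined limit
        self.queued = 0         # writes waiting for transmission
        self.fallback = False   # True while acting as synchronous with MWC

    def effective_mode(self):
        return "sync_mwc" if self.fallback else "async"

    def on_write(self):
        """Account for one write and return the mode it was handled in."""
        mode = self.effective_mode()
        if mode == "async":
            self.queued += 1
            if self.queued >= self.limit:
                self.fallback = True   # limit reached: stop queueing
        return mode

    def on_transmit(self):
        """One queued write has reached the remote disks."""
        if self.queued:
            self.queued -= 1
        if self.queued == 0:
            self.fallback = False      # queue empty: back to asynchronous


state = AsyncQueueState(limit=3)
modes = [state.on_write() for _ in range(4)]
print(modes)  # ['async', 'async', 'async', 'sync_mwc']
```

In this sketch the fourth write is handled synchronously because the third one filled the queue. Calling `on_transmit()` three times empties the queue and returns the device to asynchronous operation.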
Note that the contents of this out-bound queue are lost
if the node containing the out-bound queue (i.e. the node where
the data is being written to by the application) crashes! IMPORTANT:
This means that GMDs operating in asynchronous mirroring mode WILL
ALMOST CERTAINLY LOSE DATA which the application believes
has been written to disk (i.e. committed database transactions
are lost and/or corrupted) if the application's node crashes.
Use of asynchronous mirroring is a VERY BAD IDEA in practically
all applications.
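The failure described above can be reproduced with a few lines of Python. This is a toy simulation under stated assumptions, not a model of HAGEO internals: the dictionaries standing in for disks, the function names and the five sample transactions are all hypothetical.

```python
from collections import deque

local_disk = {}      # block -> data on the application's site
remote_disk = {}     # block -> data on the other site
outbound = deque()   # writes awaiting transmission (lost if this node crashes)


def async_write(block, data):
    """Write locally, queue for the remote site, report completion."""
    local_disk[block] = data
    outbound.append((block, data))
    return "complete"


def transmit_one():
    """Deliver the oldest queued write to the remote site."""
    block, data = outbound.popleft()
    remote_disk[block] = data


def crash():
    """The node fails: everything still in the out-bound queue is gone."""
    lost = [block for block, _ in outbound]
    outbound.clear()
    return lost


for n in range(5):
    async_write(n, f"txn-{n}")   # the application sees five completed writes
transmit_one()
transmit_one()                   # only two reach the remote site in time
lost_blocks = crash()

print("remote site has:", sorted(remote_disk))  # remote site has: [0, 1]
print("lost in the queue:", lost_blocks)        # lost in the queue: [2, 3, 4]
```

All five writes were reported as complete, yet the remote site holds only the first two. A takeover at the remote site would therefore start from data that is missing three writes the application believed were safely on disk.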
GMDs configured in either of the synchronous modes can be read
and written from both sites simultaneously although care needs
to be taken to ensure that a particular disk block's data isn't
in transit in both directions at once. GMDs configured in asynchronous
mode can only be read and/or written from one site and must be turned
around before they can be read/written from the other site.
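The access rule in the previous paragraph can be expressed as a short check. This is a sketch only: the class name, the site labels "A" and "B" and the `turn_around` method are illustrative assumptions, and the sketch does not model the care needed to keep a disk block from being in transit in both directions at once.

```python
SYNC_MODES = {"sync", "sync_mwc"}


class ToySiteAccess:
    """Which site may read or write a GMD, by mirroring mode (toy model)."""

    def __init__(self, mode, active_site="A"):
        self.mode = mode
        self.active_site = active_site  # only meaningful in asynchronous mode

    def may_access(self, site):
        if self.mode in SYNC_MODES:
            return True                  # both sites, simultaneously
        return site == self.active_site  # asynchronous: one site only

    def turn_around(self):
        """Switch the site that may access an asynchronous GMD."""
        self.active_site = "B" if self.active_site == "A" else "A"


gmd = ToySiteAccess("async", active_site="A")
print(gmd.may_access("B"))  # False
gmd.turn_around()
print(gmd.may_access("B"))  # True
```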
GMD Mirroring and LVM Mirroring combined
HAGEO can be used in combination with the LVM to improve
data availability by virtue of the LVM's capability to mirror a logical
volume up to three ways. Whilst HAGEO does not impose a requirement
for local mirroring of disk, it is sensible to implement either
LVM mirroring or RAID 1/5 on each site, thereby allowing local disk
failure to be handled locally rather than being escalated to a site
failure.
How it works...
As discussed earlier in this document, the geographic mirroring
device (GMD) appears to AIX as a pseudo logical volume, albeit with
two components, one local and one remote. The geographic mirroring
device sits above a standard AIX logical volume and does not interfere
with the local operation of the LVM. This separation of geographic and local
components allows for RAID or LVM mirroring of disks for availability
within a single site, or on both sites in the geographic cluster.
It is therefore possible to use LVM in combination with a geographic
mirroring device to have up to 6 copies of a logical volume, 3 local
and 3 remote!
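The arithmetic behind the figure of 6 copies is simply the LVM copy count on each site added together. The helper below is a hypothetical illustration of that sum, not a HAGEO or LVM command.

```python
MAX_LVM_COPIES = 3   # the LVM can mirror a logical volume up to three ways


def total_copies(local_copies, remote_copies):
    """Total copies of a geographically mirrored logical volume."""
    for n in (local_copies, remote_copies):
        if not 1 <= n <= MAX_LVM_COPIES:
            raise ValueError(f"copies per site must be 1..{MAX_LVM_COPIES}")
    return local_copies + remote_copies


print(total_copies(1, 1))  # 2: geographic mirroring only
print(total_copies(3, 3))  # 6: three-way LVM mirroring on both sites
```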
IMPORTANT: If you lack the appropriate skills, experience and/or
competency, are unwilling to take responsibility for your actions,
or if you don't like these disclaimers then
don't use this information.