Matilda Systems Corporation | High Availability Resources
Mon, December 6, 2004   

 
What is HAGEO?


This page is part of the Matilda Team's HACMP Resources Collection. The home page of the collection is located here.

IMPORTANT: read the disclaimer BEFORE you use any information provided in this collection.


Introduction

IBM's High Availability Geographic Clustering software (HAGEO for short) is an extension to IBM's High Availability Cluster Multi-Processing software (HACMP) allowing a single HACMP cluster to be dispersed across two (2) sites, unlimited by distance. HAGEO provides three key additional facilities:
  • Support for any wide area network that supports IP.

  • Remote data replication at the logical volume level, in three modes of operation:

    1. Synchronous
    2. Synchronous with Mirror Write Consistency
    3. Asynchronous

  • Management and integration tools to assist in the planning, configuration and administration of HAGEO.
Note: IBM's HAGEO product was replaced by HACMP XD as part of the introduction of HACMP 5.1. This page's description of HAGEO is still accurate in the sense that it describes the "geographically distributed clustering" capabilities of HACMP XD. There are HACMP XD features which are not discussed on this page.

Why might an organisation invest in HAGEO?

HAGEO is aimed primarily at organisations that require a live disaster recovery solution for their business critical applications and data. This might include companies with data centres in locations prone to prolonged power failures or extreme weather conditions, or simply organisations for which data loss would severely affect future operations.

HAGEO can provide a swift and effective means of recovering access to data and applications on one site following the partial or complete loss of data processing facilities at another, owing to site wide events as diverse as fire, flood, power blackouts, terrorist or criminal action, or innocent human error.

Unlike other disaster recovery and remote data replication solutions, HAGEO offers complete independence of chosen disk, network and application.

How does HAGEO work (the short version)?

HAGEO works in conjunction with HACMP 'classic' or HACMP/ES to extend a single HACMP cluster across a geography. HAGEO does not replace or replicate any functionality of HACMP; rather, it adds support for heartbeat and messaging across wide area networks and adds facilities for mirroring logical volumes across an IP network.

HAGEO relies on HACMP to perform the following key functions:

  1. Event detection
  2. Diagnosis
  3. Recovery
  4. Reintegration

Therefore, HAGEO does not actually monitor any element of the geographic cluster, nor does it diagnose or react to a status change in the geographic cluster. These functions remain the responsibility of HACMP. What HAGEO does add is support for mirroring logical volumes from one site to another. This is achieved by means of a pseudo logical volume known as a geomirror device.

Geomirror devices

Each logical volume to be mirrored between the two sites in a geographic cluster must have an associated geomirror device (GMD). The geomirror device is a pseudo logical volume that has a local and remote component. The existence and behaviour of the geomirror device is transparent to the application and logical volume manager.

Each geomirror device operates in one of three mirroring modes. The mode is set per device, so the geomirror devices configured in a single cluster need not all use the same mode. These modes are:

  1. Synchronous
  2. Synchronous with Mirror Write Consistency
  3. Asynchronous
These three modes of mirroring represent a trade-off between data integrity and performance.

Geographic Mirroring modes explained

Each mode of geographic mirroring offers different availability and performance characteristics. Any single logical volume may be geographically mirrored across two widely dispersed sites using a geomirror device.

In synchronous mirroring mode, the data is written to the remote site's disks and then written to the local site's disks. Transaction-oriented writes (e.g. database transactions and writes using the AIX/UNIX synchronous write option) aren't reported as complete until after the data has been written to both sets of disks.
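The ordering described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `SyncGeoMirror`, its dictionaries and its `log` list are hypothetical and are not part of HAGEO or AIX; real geomirror devices work at the device driver level, not in Python.

```python
class SyncGeoMirror:
    """Toy model (hypothetical, not HAGEO code) of synchronous mirroring."""

    def __init__(self):
        self.local = {}    # block number -> data on the local site's disks
        self.remote = {}   # block number -> data on the remote site's disks
        self.log = []      # order in which the two copies were written

    def write(self, block, data):
        # The remote copy is written first, as the text describes.
        self.remote[block] = data
        self.log.append(("remote", block))
        # The local copy is written second.
        self.local[block] = data
        self.log.append(("local", block))
        # Only now is the write reported to the caller as complete.
        return True


gmd = SyncGeoMirror()
acknowledged = gmd.write(7, b"payload")
```

The point of the sketch is that `write` cannot return before both dictionaries hold the data, which is why an acknowledged write in this mode exists on both sites.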

In synchronous with mirror write consistency mode, data is written to both local and remote disks in no particular order and a state map device is used to track the progress of the operations. This state map can be used to recover from various failure scenarios. Transaction-oriented writes aren't reported as complete until the data has been written to both sets of disks.
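As a rough illustration of how a state map can support recovery, the sketch below tracks which blocks have a write in flight. Everything in it is an assumption made for illustration: the names `MwcGeoMirror`, `state_map`, `fail_before_remote` and `recover` are hypothetical, the local copy is written first purely for simplicity (the text says the order is not fixed), and copying from local to remote during recovery is just one possible policy, not a description of what HAGEO actually does.

```python
class MwcGeoMirror:
    """Toy model (hypothetical) of synchronous mirroring with a state map."""

    def __init__(self):
        self.local = {}
        self.remote = {}
        self.state_map = set()   # blocks whose two copies may differ

    def write(self, block, data, fail_before_remote=False):
        self.state_map.add(block)        # mark the write as in progress
        self.local[block] = data
        if fail_before_remote:
            # Simulated failure: the remote copy is stale and the
            # write is never reported as complete.
            return False
        self.remote[block] = data
        self.state_map.discard(block)    # both copies are now identical
        return True

    def recover(self):
        # Resynchronise only the blocks flagged in the state map
        # (assumed policy: local copy overwrites remote copy).
        resynced = sorted(self.state_map)
        for block in resynced:
            self.remote[block] = self.local[block]
        self.state_map.clear()
        return resynced


gmd = MwcGeoMirror()
gmd.write(1, b"a")
gmd.write(2, b"b", fail_before_remote=True)
resynced = gmd.recover()
```

In this model only block 2 needs resynchronising after the simulated failure, which is the benefit of tracking in-flight writes rather than comparing the whole volume.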

In asynchronous mode, data is first written to the local disks and then queued for transmission to the remote site. The data will be written to the remote disks in the same order that it was written to the local disks. If the out-bound queue reaches a pre-defined limit then the GMD reverts to synchronous with MWC mode until the queue is empty. Transaction-oriented writes are reported as complete as soon as the data is on the local disks.

Note that the contents of this out-bound queue are lost if the node containing the out-bound queue (i.e. the node where the data is being written to by the application) crashes! IMPORTANT: This means that GMDs operating in asynchronous mirroring mode WILL ALMOST CERTAINLY LOSE DATA which the application believes has been written to disk (i.e. committed database transactions are lost and/or corrupted) if the application's node crashes.
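The data loss window can be made concrete with a small simulation. This is a sketch under stated assumptions, not HAGEO's implementation: `AsyncGeoMirror`, `queue_limit`, `drain` and `crash` are hypothetical names, and the handling of a full queue is simplified to draining the queue before the write proceeds.

```python
from collections import deque


class AsyncGeoMirror:
    """Toy model (hypothetical) of asynchronous mirroring with an out-bound queue."""

    def __init__(self, queue_limit):
        self.local = {}
        self.remote = {}
        self.queue = deque()          # writes waiting to reach the remote site
        self.queue_limit = queue_limit

    def drain(self):
        # Apply queued writes remotely in the order they were made locally.
        while self.queue:
            block, data = self.queue.popleft()
            self.remote[block] = data

    def write(self, block, data):
        self.local[block] = data
        if len(self.queue) >= self.queue_limit:
            # Simplification: empty the queue, then write the remote copy
            # before acknowledging, mimicking the fall back to a
            # synchronous mode.
            self.drain()
            self.remote[block] = data
        else:
            self.queue.append((block, data))
        # Acknowledged as soon as the local copy exists.
        return True

    def crash(self):
        # The queue lives only on this node, so a crash discards it.
        lost = [block for block, _ in self.queue]
        self.queue.clear()
        return lost


gmd = AsyncGeoMirror(queue_limit=8)
acks = [gmd.write(block, b"committed") for block in range(3)]
lost = gmd.crash()
```

All three writes are acknowledged to the application, yet after the simulated crash none of them has reached the remote site: the blocks listed in `lost` are exactly the "committed" data the warning above refers to.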

Use of asynchronous mirroring is a VERY BAD IDEA in practically all applications.

GMDs configured in either of the synchronous modes can be read and written from both sites simultaneously, although care needs to be taken to ensure that a particular disk block's data isn't in transit in both directions at once. GMDs configured in asynchronous mode can only be read and/or written from one site and must be turned around before they can be read/written from the other site.

GMD Mirroring and LVM Mirroring combined

HAGEO can be used in combination with the LVM to improve data availability by virtue of the LVM's capability to mirror a logical volume up to three ways. Whilst HAGEO does not impose a requirement for local mirroring of disk, it is sensible to implement either LVM mirroring or RAID 1/5 on each site, thereby allowing local disk failure to be handled locally rather than being escalated to a site failure.

How it works...

As discussed earlier in this document, the geographic mirroring device (GMD) appears to AIX as a pseudo logical volume, albeit with two components, one local and one remote. The geographic mirroring device sits above a standard AIX logical volume and does not interfere with the local operation of the LVM. This separation of geographic and local components allows for RAID or LVM mirroring of disks for availability within a single site, or on both sites in the geographic cluster. It is therefore possible to use LVM in combination with a geographic mirroring device to have up to 6 copies of a logical volume, 3 local and 3 remote!
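The copy count follows from simple arithmetic: each site's logical volume can hold between one and three LVM copies, and the GMD pairs one such volume per site. The helper below is a hypothetical illustration only; `total_copies` and `LVM_MAX_COPIES` are names made up for this sketch.

```python
LVM_MAX_COPIES = 3  # per the text: the LVM mirrors a logical volume up to three ways


def total_copies(local_copies, remote_copies):
    """Return the number of copies of the data across both sites."""
    for name, count in (("local", local_copies), ("remote", remote_copies)):
        if not 1 <= count <= LVM_MAX_COPIES:
            raise ValueError(f"{name} copies must be between 1 and {LVM_MAX_COPIES}")
    return local_copies + remote_copies


maximum = total_copies(LVM_MAX_COPIES, LVM_MAX_COPIES)  # 3 local + 3 remote
minimum = total_copies(1, 1)                            # no LVM mirroring on either site
```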

 

 

IMPORTANT: If you lack the appropriate skills, experience and/or competency, are unwilling to take responsibility for your actions, or if you don't like these disclaimers then don't use this information.