Discussion: Availability Resource Management
Marcelo Tosatti
2000-10-14 18:13:22 UTC
I'm moving this discussion to linux-ha at muc.de, as suggested by Lars.
On 2000-10-13T11:35:05,
Where I've gotten lost is in the compare/contrast of the
HACMP vs SA models for resource management. Alex broadly
mentioned that HACMP and failsafe's model is shared nothing,
and that SA is more 'shared everything'. I'm personally
of the shared-storage religion, but I haven't yet seen what
about SA that is particularly more friendly towards that
than HACMP/Failsafe.
Hi David, what in particular are you missing in FailSafe?
Lars,

IMHO, a good thing in SA is its mechanism to deal with resource
dependencies.

Do you know how FailSafe handles resource dependencies?

If so, could you please comment a bit about it?

Thanks!
wombat
2000-10-15 00:55:46 UTC
Marcelo wrote:
Lars,

IMHO, a good thing in SA is its mechanism to deal with resource
dependencies.

Do you know how FailSafe handles resource dependencies?

If so, could you please comment a bit about it?

Thanks!
=========

I'm not Lars ;-) but I'll try to provide an overall comparison here,
although I haven't used FailSafe, just read about it. I'll also provide
information from HACMP, since I know that reasonably well. Of course, only
FailSafe is currently available on Linux, but if the goal is to provide a
comprehensive HA platform on Linux, input is good! (HACMP documentation is
at:
http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixgen/hacmp_index.html
primarily the 'Concepts and Facilities' book to begin with.)

I might get bits about FailSafe or SA wrong, as I don't have extensive
experience with either, but I'm sure someone will correct me ;-)

Both FailSafe and HACMP provide 'Resource Groups' and in both they mean the
exact same thing - they encapsulate a set of related resources and they are
the failover unit. HACMP only recognizes a defined set of resource types,
e.g., nodes, IP addresses, disk volumes, file systems, applications.
FailSafe provides the ability to define new resource types, assuming you
provide it with the scripts to control that resource. SA can deal with
individual resources and also provides 'application groups' which collect
together resources.

SA uses 'agents' to manage resources, and has a model for how these work.
Phoenix (RSCT) likewise has a component that provides an API that can be
used to build such agents (the RMC component, a follow-on to the Event
Management work I described.) Unfortunately, RMC is only on AIX; its
release on Linux and/or as OSS is not currently announced. Send requests
to the usual place...

HACMP and FailSafe both use scripts to manage resources (most provided by
the system, but users can add their own, or, if they're very brave, modify
the system-provided ones.)
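
(To make the script model concrete, here is a rough sketch in Python of
the start/stop/monitor contract such resource scripts follow. The class
names and commands are mine, for illustration only - they are not HACMP's,
FailSafe's, or SA's actual interfaces.)

    # Hypothetical sketch of an HA resource 'agent'; names are invented,
    # not taken from HACMP, FailSafe, or SA.
    import subprocess

    class Resource:
        def __init__(self, name):
            self.name = name
        def start(self):   raise NotImplementedError
        def stop(self):    raise NotImplementedError
        def monitor(self): raise NotImplementedError  # True if healthy

    class ServiceAddress(Resource):
        """A floating IP address, the classic failover resource."""
        def __init__(self, name, address, interface):
            super().__init__(name)
            self.address, self.interface = address, interface
        def start(self):
            # plumb the service address onto a local adapter
            subprocess.run(["ip", "addr", "add", self.address,
                            "dev", self.interface], check=True)
        def stop(self):
            subprocess.run(["ip", "addr", "del", self.address,
                            "dev", self.interface], check=True)
        def monitor(self):
            out = subprocess.run(["ip", "-o", "addr", "show", self.interface],
                                 capture_output=True, text=True)
            return self.address.split("/")[0] in out.stdout

The same trio of verbs is what the products' shell scripts implement;
user-added resource types just supply their own versions.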

Note that for HACMP and FailSafe a resource can appear in only one resource
group (nodes and communication adapters aren't really resources, so, aren't
limited this way, but, applications, file systems, IP addresses, etc. are
so limited.) SA doesn't have this restriction: resources can be contained
in multiple resource groups if desired, and application groups can contain
other application groups.

As to resource dependencies, HACMP recognizes only pre-defined
relationships, i.e., an application depends on its file system(s) which in
turn depend on volume group(s) [disks], the application also depends on an
IP address. These relationships are imputed from the fact that these
resources are collected in a resource group; thus HACMP will use this info
when starting a resource group:
- ensure the volume group (disks) is varied on (i.e., available for use)
- mount the file system(s)
- set the IP address up on an adapter
- start the application
HACMP also understands NFS file systems, and imputes them to be dependent
upon an IP address, thus sets up the IP address then mounts the file
system. Although HACMP doesn't strictly allow you to define 'new' resource
types, it does provide many 'user exit' points where users can add in their
own scripts, and this provides a rough way to manage 'new' resource types
in a limited fashion.
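
(The fixed ordering lends itself to a very small sketch - walked forwards
on startup and backwards on shutdown. This is my own toy illustration in
Python, not HACMP code; the resource type names are just placeholders.)

    # Toy sketch of HACMP-style fixed bringup/teardown for a resource
    # group: start in dependency order, stop in the reverse order.
    BRINGUP_ORDER = ["volume_group", "filesystem", "ip_address", "application"]

    def start_group(resources):
        # 'resources' maps a type name to an object with start()/stop()
        for rtype in BRINGUP_ORDER:
            resources[rtype].start()

    def stop_group(resources):
        for rtype in reversed(BRINGUP_ORDER):
            resources[rtype].stop()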

FailSafe appears to offer some ability to manipulate resource dependencies,
or, at least resource type dependencies, although the documentation is a
bit unclear. But, it does clearly describe that there are 'levels' for
resources, and these levels are used to order the bring up and shutdown of
resources. By default, I'd assume it works very much like HACMP, but it
does appear to provide a bit more flexibility and customisability here by
allowing new resource types to be fit into the framework.

SA uses the user-configured resource dependency relationships to understand
which resources need to work together, and to determine the order of
bringup and shutdown. The user can configure these dependencies as
desired, thus making this extremely flexible. Having never tried to
configure SA, I can't rationally comment on how many defaults are provided,
i.e., how many resource types SA recognises by default and is able to impute
default dependency relationships among them. As far as I can tell, if you
are willing to write an agent for the resource type, SA can manage it.
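
(Put differently, SA treats the configuration as a dependency graph and
derives the ordering from it rather than hard-coding one. A sketch of the
idea - a plain topological sort, in no way SA's actual interface:)

    # Sketch: derive bringup order from user-configured dependencies.
    # 'deps' maps each resource to the resources it depends on;
    # shutdown order is simply the reverse of the returned list.
    def bringup_order(deps):
        order, visiting, done = [], set(), set()
        def visit(res):
            if res in done:
                return
            if res in visiting:
                raise ValueError("dependency cycle at " + res)
            visiting.add(res)
            for d in deps.get(res, []):
                visit(d)
            visiting.discard(res)
            done.add(res)
            order.append(res)
        for res in deps:
            visit(res)
        return order  # dependencies come before their dependents

    # e.g. an app that needs an IP address and a filesystem on a volume group:
    deps = {"app": ["ip", "fs"], "fs": ["vg"], "ip": [], "vg": []}
    print(bringup_order(deps))  # -> ['ip', 'vg', 'fs', 'app'] (one valid order)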

HACMP and FailSafe both provide 'fine grain' failover in that a single
resource group can be moved or failed-over to a different node, leaving
other resource groups alone on the node. SA allows you to specify a single
resource, and the dependency information will be used to determine the set
of resources to move, or you can direct an application group to be moved. I know
with HACMP (and I think it's the same with FailSafe) that all resource
groups are 'equal' to each other in priority.

To determine where a resource group is placed, HACMP provides two policies,
'cascading' and 'rotating'. Cascading is essentially the same as
FailSafe's 'ordered' default policy: the first node in the list that is a
member of the cluster is chosen. Rotating also uses a list of nodes, but
treats all nodes as equals and picks one based on which one has available
network adapters to use. For both HACMP policies, the list of nodes can be
all nodes in the cluster, or a subset. FailSafe has a 'round-robin' but (I
think) that uses all nodes in the cluster. In addition, FailSafe provides
a user exit here, where the user can provide a script that is allowed to
return a list of nodes dynamically to place the resource group (although
I'm a bit unclear how dynamic it is.)
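
(Both HACMP policies reduce to simple selection functions over the
configured node list. A hedged sketch in Python - the real products drive
this through their own configuration, not code like this:)

    # Sketch of the two HACMP placement policies.  'node_list' is the
    # configured, priority-ordered list for the group; 'up' is the set
    # of current cluster members.  Details are deliberately simplified.
    def cascading(node_list, up):
        # first configured node that is currently a cluster member
        for node in node_list:
            if node in up:
                return node
        return None

    def rotating(node_list, up, has_free_adapter):
        # all nodes are equal; pick one with an available network adapter
        for node in node_list:
            if node in up and has_free_adapter(node):
                return node
        return None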

HACMP and FailSafe both support various 'modifiers' such as auto-failback
and in-place recovery and such.

Here, SA goes somewhat over the top. It uses a combination of weights on
resources, load information, time of day constraints, dependencies and
where competing resources have been placed to determine the best home for a
resource. It also appears to have various goal-based performance setups to
allow 'less important' resources to be moved if a system is getting too
loaded, so the user can define which resources are more important than
others, and (I think) these settings can change automatically based on the
time of day. You can model the behaviours of HACMP and FailSafe,
as well as going well beyond them if desired.

I could sum this point up that HACMP and FailSafe are happy so long as all
resource groups are running, whereas SA isn't happy unless a whole range of
dependency, performance, load and time constraints are satisfied.
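
(As a crude sketch of the kind of scoring such goal-based placement
implies - the factors come from the description above, but the arithmetic
is invented and certainly not SA's real algorithm:)

    # Crude sketch of goal-based placement: score every eligible node,
    # pick the best.  The scoring formula is invented for illustration.
    def place(resource, nodes, load, weight, in_time_window):
        best, best_score = None, None
        for node in nodes:
            if not in_time_window(resource, node):
                continue  # e.g. this node is reserved for batch at night
            # prefer lightly loaded nodes, scaled by the resource's weight
            score = weight(resource, node) - load[node]
            if best_score is None or score > best_score:
                best, best_score = node, score
        return best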

Some subjective comments. SA is probably overkill when looking at the vast
bulk of Linux clusters likely to be built. I don't see any limitation in
any of these tools supporting common cluster usage, e.g., shared-disk DB
(e.g., Oracle), shared-nothing DB (DB2 UDB), web servers with failover,
etc., since FailSafe provides a proof point, being the only one currently
on Linux. IMHO SA would be quite useful, but for an administrator with a
small number of nodes and no background in OS/390, it would appear to be a
VERY steep learning curve to get it working. On the other hand, FailSafe
follows a relatively common model for UNIX-based commercial
recovery/failover tools (per its many similarities with HACMP) so would be
familiar to admins with UNIX HA experience. Plus, it's a bit less
overwhelming to a rookie first approaching it.

In a heterogeneous environment centered around a 390 (oops, I mean zServer)
running OS/390 (oops, z/OS) and one or more Linux images, SA may be quite
useful to maintain a consistent model across the whole installation. On a
cluster of Intel workstations running Linux, it will be a harder sell.

These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
Clusters and High Availability, Beaverton, OR
wombat at us.ibm.com
and in no way should be construed as official opinion of IBM, Corp., my
email id notwithstanding.
Alan Robertson
2000-10-17 11:47:20 UTC
Hi,
Post by wombat
I might get bits about FailSafe or SA wrong, as I don't have extensive
experience with either, but I'm sure someone will correct me ;-)
Lars and I spent the week at ALS, and are now at the Linux Storage
Symposium. We've both been very busy boys ;-) Generally, it looks like you
got most of it right though...
Post by wombat
As to resource dependencies, HACMP recognizes only pre-defined
relationships, i.e., an application depends on its file system(s) which in
turn depend on volume group(s) [disks], the application also depends on an
IP address. These relationships are imputed from the fact that these
resources are collected in a resource group; thus HACMP will use this info
when starting a resource group:
- ensure the volume group (disks) is varied on (i.e., available for use)
- mount the file system(s)
- set the IP address up on an adapter
- start the application
HACMP also understands NFS file systems, and imputes them to be dependent
upon an IP address, thus sets up the IP address then mounts the file
system. Although HACMP doesn't strictly allow you to define 'new' resource
types, it does provide many 'user exit' points where users can add in their
own scripts, and this provides a rough way to manage 'new' resource types
in a limited fashion.
This is actually done pretty well in FailSafe. You can define resource
types which need certain parameters to make them run right. You can define
the parameters, and they then get passed to the resource group to
instantiate it correctly. It's pretty nicely done and very general.
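
(As a rough illustration of the idea - the type and attribute names below
are invented, not FailSafe's actual schema - a resource type declares the
parameters its instances need, and an instance supplies values that the
control scripts then receive:)

    # Invented illustration of a parameterised resource type.
    RESOURCE_TYPES = {
        "NFS_export": {"required_params": ["export_point", "export_options"]},
    }

    def define_resource(rtype, name, **params):
        spec = RESOURCE_TYPES[rtype]
        missing = [p for p in spec["required_params"] if p not in params]
        if missing:
            raise ValueError("missing parameters: " + ", ".join(missing))
        # the parameters get handed to the type's start/stop scripts
        return {"type": rtype, "name": name, "params": params}

    exp = define_resource("NFS_export", "docs",
                          export_point="/export/docs", export_options="rw")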
Post by wombat
FailSafe appears to offer some ability to manipulate resource dependencies,
or, at least resource type dependencies, although the documentation is a
bit unclear. But, it does clearly describe that there are 'levels' for
resources, and these levels are used to order the bring up and shutdown of
resources. By default, I'd assume it works very much like HACMP, but it
does appear to provide a bit more flexibility and customisability here by
allowing new resource types to be fit into the framework.
And it's reasonably easy to do, although the documentation isn't at all
clear in some instances, and even some FS people sometimes don't seem to
completely understand things (though some of those folks are new). We're
working on documenting it better.
Post by wombat
To determine where a resource group is placed, HACMP provides two policies,
'cascading' and 'rotating'. Cascading is essentially the same as
FailSafe's 'ordered' default policy: the first node in the list that is a
member of the cluster is chosen. Rotating also uses a list of nodes, but
treats all nodes as equals and picks one based on which one has available
network adapters to use. For both HACMP policies, the list of nodes can be
all nodes in the cluster, or a subset. FailSafe has a 'round-robin' but (I
think) that uses all nodes in the cluster. In addition, FailSafe provides
a user exit here, where the user can provide a script that is allowed to
return a list of nodes dynamically to place the resource group (although
I'm a bit unclear how dynamic it is.)
This is how the basic services are done. It's done pretty nicely. See
below.
Post by wombat
HACMP and FailSafe both support various 'modifiers' such as auto-failback
and in-place recovery and such.
Here, SA goes somewhat over the top. It uses a combination of weights on
resources, load information, time of day constraints, dependencies and
where competing resources have been placed to determine the best home for a
resource. It also appears to have various goal-based performance setups to
allow 'less important' resources to be moved if a system is getting too
loaded, so the user can define which resources are more important than
others, and (I think) these settings can change automatically based on the
time of day. You can model the behaviours of HACMP and FailSafe,
as well as going well beyond them if desired.
resource groups are running, whereas SA isn't happy unless a whole range of
dependency, performance, load and time constraints are satisfied.
It's worth mentioning that you've described the *default* failover policy
module for FailSafe. One could write one which behaves very much as you
describe SA's zServer modules working.

A failover policy for a given resource type takes a list of nodes which are
up and which have been declared as being in the "failover domain" for that
particular resource group, and then the script/program decides which
node the particular resource will fail over to.
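
(So the policy's whole contract is: given the up nodes in the failover
domain, pick the target. A sketch - FailSafe actually drives this through
an external script, so the Python function below is only an analogy:)

    # Analogy for a custom failover policy: given the nodes that are up
    # and in the group's failover domain, rank where the group may go.
    def failover_policy(failover_domain, up_nodes, current_load):
        candidates = [n for n in failover_domain if n in up_nodes]
        # example policy: least-loaded eligible node first
        return sorted(candidates, key=lambda n: current_load[n])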

This is *very* flexible, although I have the impression that no one uses
this flexibility much.
Post by wombat
Some subjective comments. SA is probably overkill when looking at the vast
bulk of Linux clusters likely to be built.
Like 99.999% :-) Of course, since FailSafe has the capabilities I've
described AND it's open source, it can (potentially) do anything SA can do,
and then some things that no one has thought of yet ;-)

My impression of FailSafe's model is that it is quite good, and very
flexible, but that things aren't as clearly documented as they could be.
We're trying to fix this, and in particular to provide better and clearer
examples. It's much easier to set up than the docs imply once you
understand what you really have to do.


-- Alan Robertson
alanr at suse.com
TEREKHOV
2000-10-16 11:26:53 UTC
David,
Where I've gotten lost is in the compare/contrast of the
HACMP vs SA models for resource management. Alex broadly
mentioned that HACMP and failsafe's model is
shared nothing, and that SA is more 'shared everything'.
I'm personally of the shared-storage religion, but I haven't
yet seen what about SA that is particularly more friendly
towards that than HACMP/Failsafe.
shared-nothing: in terms of workload processing there is only
one instance of the application (a hierarchy of dependent resources)
per cluster running to process a particular workload, and the
complete hierarchy of dependent resources runs on the same node.
note that the cluster may be able to process multiple workloads,
and in this case it will have multiple workload processing
applications running (maybe spread among different nodes -
active/active), but still only one instance per workload type.
there is no parallelism (working on different nodes in parallel)
in processing a workload and, because of that, no workload balancing
either. if the workload processing application fails (due to
application failure, or failure of the node on which it runs),
workload processing is moved to another node. in order to be able
to process the workload on another node, the data required to
process it (and maybe other resources as well) need to be made
available on the backup node. for data, that could be done in
multiple ways, e.g. replication or a change of data storage (disk)
ownership. note that in both cases the cluster needs a shared
storage bus or storage switch. things like fencing are used in
order to cut off lost nodes and in order to ensure mutually
exclusive resource (storage) ownership. in terms of cluster data
storage - it is partitioned among the cluster nodes. The role of HA
software is in reassigning resource ownership, starting/activating
workload processing applications on other nodes (failover) and, in
case of application failures (or planned shutdowns), stopping the
rest (all) of the application's resources.
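
(a skeleton of that failover sequence - fence, take ownership, restart
the hierarchy. every function body below is a stub standing in for
product-specific machinery, not any real API:)

    # Stub skeleton of a shared-nothing failover.
    def fence(node):
        print("fencing", node, "off the storage bus")

    def reassign_ownership(disk, node):
        print("giving", node, "ownership of", disk)

    def start_group_on(group, node):
        print("starting", group["name"], "on", node)

    def fail_over(group, failed_node, backup_node):
        fence(failed_node)                        # cut the lost node off
        for disk in group["disks"]:
            reassign_ownership(disk, backup_node) # exclusive storage ownership
        start_group_on(group, backup_node)        # bring up the whole hierarchy

    fail_over({"name": "payroll", "disks": ["vg01"]}, "node1", "node2")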

For that architecture SA offers the following things
(which I am not sure other products can do - please correct me):

1. Resource dependency model beyond child->parent with symmetrical
start and stop;

2. Resource status model beyond UP/DOWN/IN_TRANSITION

3. Resources on backup nodes are still visible and manageable.
It means that hardware resources (and links to shared resources)
can still be continuously monitored on all backup nodes, which
can prevent failovers to nodes not able to process the workload,
without any delay (to run a script as part of failover). It also means
that rolling upgrades are possible for inactive applications on
backup nodes, even if that changes the hierarchy of their dependent
resources, because SA allows you to have different hierarchies of
dependent resources for the same application on different nodes
(and locally too - local move group).

4. You can have two applications which share some of a node's local
resources and still be able to control the availability of these
applications independently. One application could be made
available and another not.

5. Failover policy allows you to put weights on every candidate
(to control where applications should be initially started and to
support various failover/failback strategies).

--

shared-disk: in terms of workload processing there are multiple
instances of the application (hierarchy of dependent resources) per
cluster running to process a particular workload, but the complete
hierarchy of dependent resources still runs on the same node.
this requires cluster-wide locking and allows workload balancing. the
additional (to shared-nothing) role of HA software is in managing the
capacity (number and placement of instances) of workload processing
applications. SA offers a concept of a SERVER group for doing
that. I am not sure that other products have any support for the
shared-disk architecture (please correct me). Could you have
multiple instances of a failover group be active at the same time
on different nodes????

--

shared-everything: well, your SMP box is a shared-everything
cluster (CPU - node, 'everything' - memory, IO). the idea of a
shared-everything cluster (sysplex) is that it behaves like
a single BIG box built from multiple smaller boxes (systems for
sysplex, CPUs for SMP). in a shared-everything cluster you are
able to spread parts of your applications among different
cluster nodes in order to take advantage of additional CPUs,
memory and IO (available on other nodes). in terms of workload
processing there are multiple instances of the application
(hierarchy of dependent resources) per cluster running to
process a particular workload AND resources from that hierarchy of
dependent resources RUN ON DIFFERENT NODES. SA supports the
cross-node dependencies needed for a shared-everything
architecture, and i am not sure that other products have
any support for shared-everything (please correct me).
Could you have PARTS of a failover group be active at the same
time on different nodes????

regards,
alexander.
wombat
2000-10-17 01:59:04 UTC
I'll try to comment on some of Alexander's descriptions; please also refer
to my previous posting attempting to compare/contrast here.

==========
Alexander Terekhov:
shared-nothing: in terms of workload processing there is only
one instance of the application (a hierarchy of dependent resources)
per cluster running to process a particular workload, and the
complete hierarchy of dependent resources runs on the same node.
note that the cluster may be able to process multiple workloads,
and in this case it will have multiple workload processing
applications running (maybe spread among different nodes -
active/active), but still only one instance per workload type.
there is no parallelism (working on different nodes in parallel)
in processing a workload and, because of that, no workload balancing
either.
==========

HACMP and FailSafe's 'Resource Groups' fit this model very naturally. SA
is quite a bit more flexible in the policies it can apply,
however, the basic idea of keeping the application up and running so long
as at least one node is available for it is innate to all of these. The
weights and some other controls in SA are more granular than the
'auto-failback vs. manual failback' in HACMP and FailSafe.

HACMP and FailSafe also monitor backup nodes, and can recognise that one
may not be capable of accepting a resource group at some point in time.

==========
Alexander Terekhov:
shared-disk: in terms of workload processing there are multiple
instances of the application (hierarchy of dependent resources) per
cluster running to process a particular workload, but the complete
hierarchy of dependent resources still runs on the same node.
this requires cluster-wide locking and allows workload balancing. the
additional (to shared-nothing) role of HA software is in managing the
capacity (number and placement of instances) of workload processing
applications. SA offers a concept of a SERVER group for doing
that. I am not sure that other products have any support for the
shared-disk architecture (please correct me). Could you have
multiple instances of a failover group be active at the same time
on different nodes????
==========

HACMP offers what are called 'concurrent resource groups' specifically to
support this model. Here each node is expected to have access to the
shared disks, and to run an instance of the application (e.g., Oracle
Parallel Server.) If a node fails, HACMP does not failover anything, since
each node is already running the application. HACMP performs no load
balancing, it simply starts the resources up on every node where the
concurrent resource group is defined once that node joins the cluster. I
haven't seen this specific feature in FailSafe, although you can certainly
simulate it by using multiple resource groups, each one tied to a specific
node.

==========
Alexander Terekhov:
shared-everything: well, your SMP box is a shared-everything
cluster (CPU - node, 'everything' - memory, IO). the idea of a
shared-everything cluster (sysplex) is that it behaves like
a single BIG box built from multiple smaller boxes (systems for
sysplex, CPUs for SMP).
<snip>
SA supports the
cross-node dependencies needed for a shared-everything
architecture, and i am not sure that other products have
any support for shared-everything (please correct me).
Could you have PARTS of a failover group be active at the same
time on different nodes????
==========

Ah, the single-system image (SSI) grail. Neither HACMP nor FailSafe
understand this concept, beyond what I've described above. Both do support
running on a single machine, and can start and restart resource groups
locally, but this is a single PHYSICAL machine. Once you have two
machines, the cluster is either shared-nothing or shared-disk, at least
according to current common usage.

And, this point in the UNIX space brings up questions beyond HA. SSI on
UNIX also pulls in questions of a clustered file system, shared process
space, and all of the other shared devices. There exist many clustered
file system views - ranging from good old NFS, to AFS, DFS, GFS (Global
File System), and more, and these are amenable to being managed in an HA
cluster. Not too many useful and effective full UNIX shared process space
implementations exist (Compaq's Non-Stop Clusters is one example, though),
and this has not been a common commercial HA UNIX cluster feature.
I think the Mosix project is looking at this for Linux, but I haven't paid
attention for a while. These questions exist for a sysplex too, but there
you've solved the problems - though only on 390.

In any case, no, HACMP and FailSafe don't handle this, because the full
structure necessary for it doesn't exist in general usage in the UNIX space
(definitely not in AIX.) IRIX has NUMA/SSI support, but I don't know to
what degree, if any, FailSafe supports or exploits it. I haven't seen much
in the docs, so I assume you need to be running in NUMA mode; that's not
yet commonly done with Linux, and I don't know how well
FailSafe would deal with it. A key issue is that although it looks like
one machine, the HA driver must still understand that there are multiple
machines under the covers.

<The remainder of Alexander's note snipped to save bandwidth.>

These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
Clusters and High Availability, Beaverton, OR
wombat at us.ibm.com
and in no way should be construed as official opinion of IBM, Corp., my
email id notwithstanding.
wombat
2000-10-18 00:44:42 UTC
Welcome back!

=====
Post by wombat
HACMP and FailSafe both support various 'modifiers' such as auto-failback
and in-place recovery and such.
Here, SA goes somewhat over the top. It uses a combination of weights on
resources, load information, time of day constraints, dependencies and
where competing resources have been placed to determine the best home for a
resource. It also appears to have various goal-based performance setups to
allow 'less important' resources to be moved if a system is getting too
loaded, so the user can define which resources are more important than
others, and (I think) these settings can change automatically based on the
time of day. You can model the behaviours of HACMP and FailSafe,
as well as going well beyond them if desired.
I could sum this point up that HACMP and FailSafe are happy so long as all
resource groups are running, whereas SA isn't happy unless a whole range of
dependency, performance, load and time constraints are satisfied.

alanr:
It's worth mentioning that you've described the *default* failover policy
module for FailSafe. One could write one which behaves very much as you
describe SA's zServer modules working.

A failover policy for a given resource type takes a list of nodes which are
up and which have been declared as being in the "failover domain" for that
particular resource group, and then the script/program decides which
node the particular resource will fail over to.

This is *very* flexible, although I have the impression that no one uses
this flexibility much.
=====

Yes, I didn't mean to shortchange FailSafe. I mentioned this capability
somewhere, although possibly in my other posting. In any case, purely
subjective comment time. This feature is very flexible and very powerful
and very hard to get right in this model. Of course, this depends on how
complex the cluster is. Although HACMP doesn't offer an exit for this
specific feature, it does offer many other places where scripts/programs
can be inserted. Experience there is that delving into custom scripts is
not something pursued by the faint of heart, and that even for sometimes
simple aims, nasty surprises can crop up (and recovery is not a good time
to get those surprises ;-) So I would tend to feel that it would likewise
be daunting to do 'too much' in a FailSafe failover policy script and have
it work correctly all of the time. HA programming is, of course, a haven
for the paranoid programmers who like to think "what can possibly go wrong
next?" but still, even we often manage to miss something in all of the
bizarre permutations.

=====
alanr:
Like 99.999% :-) Of course, since FailSafe has the capabilities I've
described AND it's open source, it can (potentially) do anything SA can do,
and then some things that no one has thought of yet ;-)
=====

IMHO then, many of the decisions made by SA are best made as part of the
supported features of the recovery subsystem, rather than through ad-hoc
scripts. That is, where these more expansive features are needed, the
mainframe world from whence SA comes has traditionally expected the kinds
of control it embodies (no, no, memories returning, no :-O ) But these are
rather less expected in most Linux environments - though with Linux/zServer
included in sysplex (z/OS) clusters, the compatibility makes sense. In any
case, as for building this into something such as FailSafe: as you have
said, patches are gratefully accepted.

<Remainder of Alan's note snipped to save bandwidth.>

These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
Clusters and High Availability, Beaverton, OR
wombat at us.ibm.com
and in no way should be construed as official opinion of IBM, Corp., my
email id notwithstanding.
