Everything else is working great (I have yet to add NFS to the pot).
All of mine (eventually 24) are SMP boxes also. I guess it is safe to
Send Linux-HA mailing list submissions to
linux-ha at lists.linux-ha.org
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.linux-ha.org/mailman/listinfo/linux-ha
or, via email, send a message with subject or body 'help' to
linux-ha-request at lists.linux-ha.org
You can reach the person managing the list at
linux-ha-owner at lists.linux-ha.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-HA digest..."
1. Re: Clock jumped backwards (Ryan Taylor)
2. Re: Clock jumped backwards (Ryan Taylor)
3. monitoring apache (Sweet, Larry D)
4. Re: Serge's script for apache (Alan Robertson)
5. Re: haresources question (Alan Robertson)
6. Re: multipath support? (Alan Robertson)
7. Re: Clock jumped backwards (Alan Robertson)
8. Re: Re: tengine (was Re: Re: Linux-HA Digest, Vol 24, Issue
45) (Alan Robertson)
----------------------------------------------------------------------
Message: 1
Date: Wed, 16 Nov 2005 16:21:32 -0500
From: Ryan Taylor <rtaylor82 at gmail.com>
Subject: [Linux-HA] Re: Clock jumped backwards
To: linux-ha at lists.linux-ha.org
Message-ID: <611ae4000511161321l89b8ba5nf6a6aa591df38a6e at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
I have the firewall set up to allow all traffic...
I am using ntp because I thought it was necessary. If the time changes
or the clocks differ, can't that affect heartbeat, rsync,
etc.?
Is there a better way to handle the time issue? I have been ssh'd
into the cluster IP and noticed that it has been flipping randomly back
and forth between massive/bdc... I'm guessing these symptoms are
related to my time issues.
Thank you again for the quick responses!
Ryan Taylor
rtaylor82 at gmail.com
------------------------------
Message: 2
Date: Wed, 16 Nov 2005 17:58:03 -0500
From: Ryan Taylor <rtaylor82 at gmail.com>
Subject: [Linux-HA] Re: Clock jumped backwards
To: linux-ha at lists.linux-ha.org
Message-ID: <611ae4000511161458m47bbb641j9a058a2dcd53ffa3 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
I have taken your advice and turned off ntp. Even without ntp I still
get the following:
heartbeat[18620]: 2005/11/16_17:53:24 info: **************************
heartbeat[18620]: 2005/11/16_17:53:24 info: Configuration validated.
Starting heartbeat 2.0.2
heartbeat[18621]: 2005/11/16_17:53:24 info: heartbeat: version 2.0.2
heartbeat[18621]: 2005/11/16_17:53:24 info: Heartbeat generation: 8
heartbeat[18621]: 2005/11/16_17:53:24 info: Removing
/var/run/heartbeat/rsctmp failed, recreating.
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: bound send
socket to device: eth1
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: bound receive
socket to device: eth1
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: started on
port 694 interface eth1 to 192.168.0.89
/dev/watchdog
Added signal handler for signal 17
heartbeat[18621]: 2005/11/16_17:53:24 info: pid 18621 locked in memory.
heartbeat[18621]: 2005/11/16_17:53:24 info: Local status now set to: 'up'
heartbeat[18624]: 2005/11/16_17:53:25 info: pid 18624 locked in memory.
heartbeat[18626]: 2005/11/16_17:53:27 info: pid 18626 locked in memory.
heartbeat[18625]: 2005/11/16_17:53:25 info: pid 18625 locked in memory.
heartbeat[18621]: 2005/11/16_17:53:28 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:29 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:30 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:31 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:48 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:47 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:47 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:48 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:53:50 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:10 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:11 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:14 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:24 WARN: node bdc.beefylinux.com: is dead
heartbeat[18621]: 2005/11/16_17:54:24 info: Local status now set to: 'active'
heartbeat[18621]: 2005/11/16_17:54:24 WARN: No STONITH device configured.
heartbeat[18621]: 2005/11/16_17:54:24 WARN: Shared disks are not protected.
heartbeat[18621]: 2005/11/16_17:54:24 info: Resources being acquired
from bdc.beefylinux.com.
harc[18637]: 2005/11/16_17:54:24 info: Running /etc/ha.d/rc.d/status status
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources
acquired
heartbeat[18621]: 2005/11/16_17:54:24 info: Initial resource
acquisition complete (T_RESOURCES(us))
heartbeat[18621]: 2005/11/16_17:54:24 info: mach_down takeover complete.
mach_down[18656]: 2005/11/16_17:54:24 info: mach_down takeover
complete for node bdc.beefylinux.com.
heartbeat[18621]: 2005/11/16_17:54:25 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:26 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:27 info: Clock jumped backwards.
Compensating.
heartbeat[18638]: 2005/11/16_17:54:27 info: Local Resource acquisition
completed.
heartbeat[18621]: 2005/11/16_17:54:27 info: Clock jumped backwards.
Compensating.
harc[18702]: 2005/11/16_17:54:27 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[18702]: 2005/11/16_17:54:27 received ip-request-resp
sleep::3 OK yes
ResourceManager[18715]: 2005/11/16_17:54:27 info: Acquiring resource
group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99
heartbeat[18621]: 2005/11/16_17:54:30 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:29 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:29 info: Clock jumped backwards.
Compensating.
heartbeat[18621]: 2005/11/16_17:54:30 info: Clock jumped backwards.
Compensating.
ResourceManager[18715]: 2005/11/16_17:54:30 info: Running
/etc/ha.d/resource.d/sleep 3 start
ResourceManager[18715]: 2005/11/16_17:54:33 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.0.99 start
IPaddr[18805]: 2005/11/16_17:54:33 info: /sbin/ifconfig eth0:0
192.168.0.99 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[18805]: 2005/11/16_17:54:35 info: Sending Gratuitous Arp for
192.168.0.99 on eth0:0 [eth0]
IPaddr[18805]: 2005/11/16_17:54:35 /usr/lib/heartbeat/send_arp -i 500
-r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.99 eth0
192.168.0.99 auto 192.168.0.99 ffffffffffff
heartbeat[18621]: 2005/11/16_17:54:35 info: Local Resource acquisition
completed. (none)
heartbeat[18621]: 2005/11/16_17:54:35 info: local resource transition completed.
This is with the bdc (node-b) heartbeat off.
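To see how often the logged time actually steps backwards, and by how much, something like the following could be run over the log. This is only a rough Python sketch and not part of heartbeat: the function name, the regular expression, and the choice to filter on the "heartbeat[18621]" tag are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Matches the timestamp in lines such as
# "heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards."
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})")


def backward_steps(lines, tag="heartbeat[18621]"):
    """Return (previous, current) timestamp pairs where the logged time decreased.

    Only lines starting with `tag` are compared, so output interleaved from
    other processes is not mistaken for a backward step.
    """
    steps = []
    previous = None
    for line in lines:
        if not line.startswith(tag):
            continue
        match = STAMP.search(line)
        if match is None:
            continue
        current = datetime.strptime(match.group(1), "%Y/%m/%d_%H:%M:%S")
        if previous is not None and current < previous:
            steps.append((previous, current))
        previous = current
    return steps


# A few lines taken from the log above.
sample = [
    "heartbeat[18621]: 2005/11/16_17:53:31 info: Clock jumped backwards.",
    "heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards.",
    "heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards.",
    "heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards.",
]

for before, after in backward_steps(sample):
    print(before.time(), "->", after.time())
```

The log only has one-second resolution, so this catches steps of a second or more; smaller jumps would not show up this way.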
------------------------------
Message: 3
Date: Wed, 16 Nov 2005 20:33:21 -0600
From: "Sweet, Larry D" <ldsweet at midsouth.ualr.edu>
Subject: [Linux-HA] monitoring apache
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <12629A044DD7974EA207D5BD468741FB8AAB0E at msca-exchange.facstaff.eden.ualr.edu>
Content-Type: text/plain; charset="utf-8"
Hello, All,
Please don't give up on me now! I have almost everything working except the monitor function. I just can't get the apache script to locate the variables it needs in httpd.conf. I think :o)
Is it possible that the /usr/lib/ocf/resources.d/heartbeat/apache script/RA is looking for a LoadModule mod_status statement in httpd.conf?
mod_status is a built-in module and is not added using "LoadModule", so "apache monitor" produces the error "Monitoring is not supported by /usr/local/apache2/conf/httpd.conf". AddModule has been discontinued in apache2. What can I do to get the apache script to load the variables for the server-status URL and port?
Thanks
Larry
------------------------------
Message: 4
Date: Tue, 15 Nov 2005 19:22:17 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] Serge's script for apache
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437A97D9.6060700 at unix.sh>
Content-Type: text/plain; charset=iso-8859-1; format=flowed
> Hello, again,
> Alan, how do you "map to a URL?" I have httpd.conf set to allow server-status in the <Location /server-status>
That's how you do it.
I think my materials for the San Francisco class have working apache
configurations in them.
See the http://linux-ha.org/Talks page for the tar ball.
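One way to confirm, independently of the resource agent, that the status page is reachable is to fetch it by hand and look at what comes back. The Python sketch below assumes that mod_status is mapped at /server-status on localhost and that its machine-readable "?auto" form is enabled. The URL, the function names, and the sample text are illustrative assumptions, and none of this is taken from the apache script itself.

```python
from urllib.request import urlopen


def parse_status(text):
    """Parse the 'Key: value' lines of mod_status machine-readable output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def fetch_status(url="http://localhost/server-status?auto", timeout=5):
    """Fetch and parse the status page; raises URLError if it is unreachable."""
    with urlopen(url, timeout=timeout) as response:
        return parse_status(response.read().decode("ascii", "replace"))


# Hypothetical sample of what the "?auto" page might return.
sample = "Total Accesses: 12\nBusyWorkers: 1\nIdleWorkers: 7\n"
print(parse_status(sample))
```

If fetch_status() fails from the node itself, the problem is more likely in the Location mapping or its access rules than in heartbeat.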
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
------------------------------
Message: 5
Date: Tue, 15 Nov 2005 19:19:52 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] haresources question
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437A9748.80406 at unix.sh>
Content-Type: text/plain; charset=us-ascii; format=flowed
> Hi,
> I have a sequence problem in a customer's HA cluster. We've got two
> kmscom01 IPaddr::10.214.16.110/24/eth0 IPaddr::10.214.16.113/24/eth0
> Filesystem::/dev/sdb1::/REGVOL::ext3 Filesystem::/dev/sdc1::/KMSVOL::ext3
> Filesystem::/dev/sde1::/KMHVOL::ext3 egate_reg egate_kms egate_kmh
> kmbcom01 IPaddr::10.214.16.111/24/eth0 IPaddr::10.214.16.112/24/eth0
> Filesystem::/dev/sdd1::/KMBVOL::ext3 Filesystem::/dev/sdf1::/KMNVOL::ext3
> egate_kmb egate_kmn
> Here, the cluster is active-active and runs two SAP R/3 gateway
> applications each (egate_kms, egate_kmh, egate_kmb, and egate_kmn), plus a
> registry application which must always be started first and stopped last
> (egate_reg). During a failover it's okay if egate_reg is restarted with the
> local gateway apps running - they can reconnect just fine. Eventually the
> customer tried to do a "heartbeat stop" on the machine which held all 5
> applications (after a successful failover after shutting one node off), and
> heartbeat stopped egate_reg in the middle of everything else. The two
> remaining egate_km* processes then refused to shut down because the
> registry was missing. This is application specific, but we need to figure
> out how to help the stuff shut down cleanly.
> The sequence of resource groups in haresources is as above.
> How do I make sure that heartbeat stops egate_reg last? Change the sequence
> of the resource groups in haresources? Are the resource groups processed
> top-down and left-to-right (acquire resource group) or right-to-left
> (release r.g.)?
> We can't currently figure this out by simply trying because the cluster is
> in production and we get no downtime...
Make them a single resource group. Then you get the ordering you want.
I don't think there's any guaranteed ordering between groups - at
least not in R1-style clusters.
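To illustrate the ordering within a single group: the Python sketch below assumes the usual R1 behaviour, where the resources on one haresources line are acquired left to right and released right to left. The shortened resource line and the helper names are hypothetical, and this only models the ordering; it is not how heartbeat itself is implemented.

```python
def parse_group(line):
    """Split one haresources line into (preferred node, resources as listed)."""
    node, *resources = line.split()
    return node, resources


def stop_order(resources):
    """Release order, assumed to be the reverse of the acquisition order."""
    return list(reversed(resources))


# Hypothetical, shortened group with egate_reg ahead of the gateway apps.
line = ("kmscom01 IPaddr::10.214.16.110/24/eth0 "
        "Filesystem::/dev/sdb1::/REGVOL::ext3 egate_reg egate_kms egate_kmh")

node, resources = parse_group(line)
print("start:", resources)
print("stop: ", stop_order(resources))
```

With egate_reg listed before the gateway applications in the same group, it comes up before them and goes down after them.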
------------------------------
Message: 6
Date: Tue, 15 Nov 2005 19:31:49 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] multipath support?
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437A9A15.4030303 at unix.sh>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>> Keep in mind that from our perspective - as an open source community -
>> that almost anything goes ;-).
> That reads "it will most probably work but don't ever attempt to open a
> problem call for it e.g. with Red Hat or Novell" ;-))
Harald - he's from my company (IBM) and I was answering the other half
of his question - that yes, we expect for his particular configuration
to work - and that if it gets escalated from IBM support line to my
team, we _will_ handle it, and not send him packing.
But, that's not quite what he asked, hence my more general comment about
"anything goes".
So, if he wasn't confused before, he probably is now :-(
------------------------------
Message: 7
Date: Wed, 16 Nov 2005 19:01:56 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] Clock jumped backwards
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437BE494.7040303 at unix.sh>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Do you have other programs running that may adjust the time automatically,
> e.g., ntpd?
> Another, more important problem is that the two nodes massive/bdc cannot see
> each other, since they both claim the other is dead.
> This is usually caused by a firewall, so you may want to check that.
I have one computer which does this at my house. It's an SMP box with
Pentium II or III processors.
I got rid of this message in R2 for EXACTLY this reason.
It's real. It's not caused by NTP, and it's very annoying.
My guess is it's related to it being an SMP box.
------------------------------
Message: 8
Date: Tue, 15 Nov 2005 01:51:45 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] Re: tengine (was Re: Re: Linux-HA Digest, Vol
24, Issue 45)
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <4379A1A1.6030803 at unix.sh>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Hi!
> I run heartbeat-2.0.1-1 on a 3-node linux 2.6 cluster having configured
> 2 resource groups (I've attached the cibadmin -Q to this email)
> 1. is that the tengine process sometimes goes, probably, into a loop
> using as much CPU as it gets.
> snip...
> Can you use gdb to attach to the process and post a backtrace when this
> happens?
> thanks
> -Guochun
> Sorry for the subject 8-(
> Some additional info: I did a failover node3 -> node2 by stopping the
> heartbeat on node3. After node3:/../heartbeat start the resource moved
> back to node3 but there is no "tengine" process running there!?
tengine and pengine only run on the DC (designated controller)
------------------------------
See also: http://linux-ha.org/ReportingProblems
End of Linux-HA Digest, Vol 24, Issue 72
****************************************