[Linux-HA] Clock jumped backwards

Discussion:

Ryan Taylor

2005-11-16 18:32:50 UTC

I have a new install of drbd and heartbeat. I have drbd functioning,
and I want heartbeat to control drbd and nfs, but I haven't gotten that
far yet. When I start heartbeat with just a single haresources entry for
an IP address on node-a, it says node-a fails right off the bat and node-b
takes over, all while my logs are being filled with "info: Clock
jumped backwards. Compensating."

For time, I have node-a broadcasting and node-b syncing with node-a.
I have checked this, and the time is the same on the two boxes.
Any help would be greatly appreciated! The good news is that if I stop
heartbeat on node-b, then node-a does take over the IP.

Thank you for your time,

Ryan
rtaylor82 at gmail.com

Guochun Shi

2005-11-16 18:51:02 UTC

please post logs from both nodes

-Guochun

_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Ryan Taylor

2005-11-16 20:17:21 UTC

I sent a message with the logs attached, but it is pending approval
because of its length. Until then, I have copied and pasted the end of
each log below. My node-a is named massive.beefylinux.com
and node-b is named bdc.beefylinux.com.

Thanks again,

Node A:
...
heartbeat[15853]: 2005/11/16_12:17:35 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[15853]: 2005/11/16_12:17:35 info: **************************
heartbeat[15853]: 2005/11/16_12:17:35 info: Configuration validated.
Starting heartbeat 2.0.2
heartbeat[15854]: 2005/11/16_12:17:35 info: heartbeat: version 2.0.2
heartbeat[15854]: 2005/11/16_12:17:35 info: Heartbeat generation: 6
heartbeat[15854]: 2005/11/16_12:17:35 info: Removing
/var/run/heartbeat/rsctmp failed, recreating.
heartbeat[15854]: 2005/11/16_12:17:35 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
heartbeat[15854]: 2005/11/16_12:17:35 info: glib: ucast: bound send
socket to device: eth1
heartbeat[15854]: 2005/11/16_12:17:35 info: glib: ucast: bound receive
socket to device: eth1
heartbeat[15854]: 2005/11/16_12:17:35 info: glib: ucast: started on
port 694 interface eth1 to 192.168.0.89
heartbeat[15854]: 2005/11/16_12:17:35 notice: Using watchdog device:
/dev/watchdog
heartbeat[15854]: 2005/11/16_12:17:35 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[15854]: 2005/11/16_12:17:35 info: pid 15854 locked in memory.
heartbeat[15854]: 2005/11/16_12:17:35 info: Local status now set to: 'up'
heartbeat[15857]: 2005/11/16_12:17:36 info: pid 15857 locked in memory.
heartbeat[15858]: 2005/11/16_12:17:36 info: pid 15858 locked in memory.
heartbeat[15859]: 2005/11/16_12:17:36 info: pid 15859 locked in memory.
heartbeat[15854]: 2005/11/16_12:17:47 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:17:48 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:17:49 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:17:50 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:17:51 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:17:51 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:17:51 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:02 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:03 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:07 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:05 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:05 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:06 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:29 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:29 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:30 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:31 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:32 info: Clock jumped backwards.
Compensating.
heartbeat[15854]: 2005/11/16_12:18:35 WARN: node bdc.beefylinux.com: is dead
heartbeat[15854]: 2005/11/16_12:18:35 info: Local status now set to: 'active'
heartbeat[15854]: 2005/11/16_12:18:35 WARN: No STONITH device configured.
heartbeat[15854]: 2005/11/16_12:18:35 WARN: Shared disks are not protected.
heartbeat[15854]: 2005/11/16_12:18:35 info: Resources being acquired
from bdc.beefylinux.com.
harc[15867]: 2005/11/16_12:18:35 info: Running /etc/ha.d/rc.d/status status
mach_down[15886]: 2005/11/16_12:18:35 info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources
acquired
heartbeat[15854]: 2005/11/16_12:18:35 info: Initial resource
acquisition complete (T_RESOURCES(us))
heartbeat[15854]: 2005/11/16_12:18:35 info: mach_down takeover complete.
mach_down[15886]: 2005/11/16_12:18:35 info: mach_down takeover
complete for node bdc.beefylinux.com.
heartbeat[15868]: 2005/11/16_12:18:38 info: Local Resource acquisition
completed.
harc[15932]: 2005/11/16_12:18:38 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[15932]: 2005/11/16_12:18:38 received ip-request-resp
sleep::3 OK yes
ResourceManager[15945]: 2005/11/16_12:18:38 info: Acquiring resource
group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99
ResourceManager[15945]: 2005/11/16_12:18:41 info: Running
/etc/ha.d/resource.d/sleep 3 start
ResourceManager[15945]: 2005/11/16_12:18:44 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.0.99 start
IPaddr[16036]: 2005/11/16_12:18:44 info: /sbin/ifconfig eth0:0
192.168.0.99 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[16036]: 2005/11/16_12:18:46 info: Sending Gratuitous Arp for
192.168.0.99 on eth0:0 [eth0]
IPaddr[16036]: 2005/11/16_12:18:44 /usr/lib/heartbeat/send_arp -i 500
-r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.99 eth0
192.168.0.99 auto 192.168.0.99 ffffffffffff
heartbeat[15854]: 2005/11/16_12:18:46 info: Local Resource acquisition
completed. (none)
heartbeat[15854]: 2005/11/16_12:18:46 info: local resource transition completed.

Node B:
...
heartbeat[1987]: 2005/11/16_13:02:14 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:15 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:18 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:28 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:29 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:32 info: Heartbeat shutdown in
progress. (1987)
heartbeat[2297]: 2005/11/16_13:02:33 info: Giving up all HA resources.
heartbeat[1987]: 2005/11/16_13:02:32 info: Clock jumped backwards. Compensating.
ResourceManager[2307]: 2005/11/16_13:02:34 info: Releasing resource
group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99
ResourceManager[2307]: 2005/11/16_13:02:32 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.0.99 stop
IPaddr[2343]: 2005/11/16_13:02:32 info: /sbin/route -n del -host 192.168.0.99
IPaddr[2343]: 2005/11/16_13:02:32 info: /sbin/ifconfig dev5083:0 down
IPaddr[2343]: 2005/11/16_13:02:34 info: IP Address 192.168.0.99 released
ResourceManager[2307]: 2005/11/16_13:02:32 info: Running
/etc/ha.d/resource.d/sleep 3 stop
heartbeat[1987]: 2005/11/16_13:02:32 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:36 info: Clock jumped backwards. Compensating.
heartbeat[2297]: 2005/11/16_13:02:37 info: All HA resources relinquished.
heartbeat[1987]: 2005/11/16_13:02:35 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:35 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:36 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:36 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:37 info: killing HBWRITE process
1991 with signal 15
heartbeat[1987]: 2005/11/16_13:02:37 info: killing HBREAD process 1992
with signal 15
heartbeat[1987]: 2005/11/16_13:02:37 info: killing HBFIFO process 1990
with signal 15
heartbeat[1987]: 2005/11/16_13:02:39 info: Core process 1992 exited. 3 remaining
heartbeat[1987]: 2005/11/16_13:02:39 info: Core process 1990 exited. 2 remaining
heartbeat[1987]: 2005/11/16_13:02:37 info: Clock jumped backwards. Compensating.
heartbeat[1987]: 2005/11/16_13:02:37 info: Core process 1991 exited. 1 remaining
heartbeat[1987]: 2005/11/16_13:02:37 info: Heartbeat shutdown complete.
heartbeat[2460]: 2005/11/16_13:51:56 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[2460]: 2005/11/16_13:51:56 info: **************************
heartbeat[2460]: 2005/11/16_13:51:56 info: Configuration validated.
Starting heartbeat 2.0.2
heartbeat[2461]: 2005/11/16_13:51:55 info: heartbeat: version 2.0.2
heartbeat[2461]: 2005/11/16_13:51:55 info: Heartbeat generation: 5
heartbeat[2461]: 2005/11/16_13:51:55 info: Removing
/var/run/heartbeat/rsctmp failed, recreating.
heartbeat[2461]: 2005/11/16_13:51:55 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
heartbeat[2461]: 2005/11/16_13:51:55 info: glib: ucast: bound send
socket to device: eth0
heartbeat[2461]: 2005/11/16_13:51:55 info: glib: ucast: bound receive
socket to device: eth0
heartbeat[2461]: 2005/11/16_13:51:55 info: glib: ucast: started on
port 694 interface eth0 to 192.168.0.70
heartbeat[2461]: 2005/11/16_13:51:55 notice: Using watchdog device:
/dev/watchdog
heartbeat[2461]: 2005/11/16_13:51:55 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[2461]: 2005/11/16_13:51:55 info: pid 2461 locked in memory.
heartbeat[2461]: 2005/11/16_13:51:55 info: Local status now set to: 'up'
heartbeat[2464]: 2005/11/16_13:51:56 info: pid 2464 locked in memory.
heartbeat[2466]: 2005/11/16_13:51:57 info: pid 2466 locked in memory.
heartbeat[2465]: 2005/11/16_13:51:56 info: pid 2465 locked in memory.
heartbeat[2461]: 2005/11/16_13:51:58 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:51:57 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:51:57 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:51:58 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:12 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:11 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:11 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:12 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:17 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:18 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:19 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:31 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:31 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:32 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:33 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:40 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:39 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:49 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:54 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:53 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:53 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:54 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:55 WARN: node massive.beefylinux.com: is dead
heartbeat[2461]: 2005/11/16_13:52:57 info: Local status now set to: 'active'
heartbeat[2461]: 2005/11/16_13:52:57 WARN: No STONITH device configured.
heartbeat[2461]: 2005/11/16_13:52:57 WARN: Shared disks are not protected.
heartbeat[2461]: 2005/11/16_13:52:57 info: Resources being acquired
from massive.beefylinux.com.
harc[2475]: 2005/11/16_13:52:55 info: Running /etc/ha.d/rc.d/status status
heartbeat[2476]: 2005/11/16_13:52:57 info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys bdc.beefylinux.com] to
acquire.
heartbeat[2461]: 2005/11/16_13:52:55 info: Clock jumped backwards. Compensating.
heartbeat[2461]: 2005/11/16_13:52:55 info: Initial resource
acquisition complete (T_RESOURCES(us))
mach_down[2496]: 2005/11/16_13:52:57 info: Taking over resource
group sleep::3
ResourceManager[2517]: 2005/11/16_13:52:55 info: Acquiring resource
group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99
ResourceManager[2517]: 2005/11/16_13:52:58 info: Running
/etc/ha.d/resource.d/sleep 3 start
ResourceManager[2517]: 2005/11/16_13:53:01 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.0.99 start
IPaddr[2619]: 2005/11/16_13:53:01 info: /sbin/ifconfig dev5083:0
192.168.0.99 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[2619]: 2005/11/16_13:53:01 info: Sending Gratuitous Arp for
192.168.0.99 on dev5083:0 [dev5083]
IPaddr[2619]: 2005/11/16_13:53:01 /usr/lib/heartbeat/send_arp -i 500
-r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.99
dev5083 192.168.0.99 auto 192.168.0.99 ffffffffffff
mach_down[2496]: 2005/11/16_13:53:01 info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources
acquired
heartbeat[2461]: 2005/11/16_13:53:01 info: mach_down takeover complete.
mach_down[2496]: 2005/11/16_13:53:01 info: mach_down takeover
complete for node massive.beefylinux.com.
heartbeat[2461]: 2005/11/16_13:53:05 info: Local Resource acquisition
completed. (none)
heartbeat[2461]: 2005/11/16_13:53:05 info: local resource transition completed.
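
Editorial sketch: in the excerpts above, the timestamps printed by a single heartbeat process sometimes step backwards (for example 12:18:07 followed by 12:18:05 on node A). A minimal Python sketch for listing such steps in a log follows. It assumes only the "name[pid]: YYYY/MM/DD_HH:MM:SS" prefix seen in these logs; the function name `backward_steps` is made up for illustration, and the script says nothing about the cause of the steps.

```python
import re
from datetime import datetime

# Matches the "name[pid]: YYYY/MM/DD_HH:MM:SS" prefix seen in the logs above.
STAMP = re.compile(r"^([\w-]+)\[(\d+)\]: (\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})")


def backward_steps(lines):
    """Yield (pid, previous, current) whenever one pid's timestamp goes back."""
    last_seen = {}
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # mail-wrapped continuation lines carry no timestamp
        pid = int(match.group(2))
        stamp = datetime.strptime(match.group(3), "%Y/%m/%d_%H:%M:%S")
        previous = last_seen.get(pid)
        if previous is not None and stamp < previous:
            yield pid, previous, stamp
        last_seen[pid] = stamp


# Three records copied from the node A excerpt, including one wrapped line.
sample = [
    "heartbeat[15854]: 2005/11/16_12:18:03 info: Clock jumped backwards.",
    "Compensating.",
    "heartbeat[15854]: 2005/11/16_12:18:07 info: Clock jumped backwards.",
    "heartbeat[15854]: 2005/11/16_12:18:05 info: Clock jumped backwards.",
]

for pid, previous, current in backward_steps(sample):
    print(pid, previous.time(), "->", current.time())
```

Timestamps are compared per pid because several processes write to the same log and their lines may interleave slightly out of order.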

Guochun Shi

2005-11-16 20:36:35 UTC

Do you have other programs running that may adjust the time automatically,
e.g., ntpd?

Another, more important problem is that the two nodes massive/bdc cannot
see each other, since they both claim the other is dead.
This is usually caused by a firewall, so you may want to check that.

-Guochun
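
Editorial sketch: one way to test the firewall theory is to check whether a plain UDP datagram gets from one node to the other on the heartbeat port. The Python sketch below assumes UDP port 694, as reported by the "ucast: started on port 694" log lines. The helper names and the payload are made up for illustration, and the probe is not a real heartbeat packet.

```python
import socket

HEARTBEAT_PORT = 694  # taken from the "ucast: started on port 694" log lines


def send_probe(host, port, payload=b"probe"):
    """Send one UDP datagram; the payload is arbitrary, not a heartbeat packet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def open_listener(bind_addr, port, timeout=5.0):
    """Bind a UDP socket that gives up after `timeout` seconds of silence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((bind_addr, port))
    return sock


def wait_for_probe(sock):
    """Return (payload, sender_ip), or None if nothing arrives in time."""
    try:
        data, (sender, _port) = sock.recvfrom(1024)
    except socket.timeout:
        return None
    return data, sender


# Loopback demonstration on an ephemeral port (port 0 lets the OS pick one).
listener = open_listener("127.0.0.1", 0)
port = listener.getsockname()[1]
send_probe("127.0.0.1", port)
result = wait_for_probe(listener)
listener.close()
print(result)
```

On the real cluster the listener would be bound to HEARTBEAT_PORT on one node, which needs root and requires heartbeat to be stopped there, and the probe would be sent from the other node to the peer address shown in its ucast log line (192.168.0.89 in the node A log, 192.168.0.70 in the node B log). A result of None would suggest that datagrams are being dropped somewhere between the nodes.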


Alan Robertson

2005-11-17 02:01:56 UTC

Post by Guochun Shi
Do you have other programs running that may adjust the time automatically,
e.g., ntpd?
Another, more important problem is that the two nodes massive/bdc cannot
see each other, since they both claim the other is dead.
This is usually caused by a firewall, so you may want to check that.

--
Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce

Ryan Taylor

2005-11-16 21:21:32 UTC

I have the firewall set up to allow all traffic...
I am using ntp because I thought it was necessary. If the time changes
or the clocks differ, can't that affect heartbeat, rsync,
etc.?
Is there a better way to handle the time issue? I have been ssh'd
into the cluster IP and noticed that it has been flipping randomly back
and forth between massive/bdc... I'm guessing these symptoms are
related to my time issues.

Thank you again for the quick responses!

Ryan Taylor
rtaylor82 at gmail.com

Alan Robertson

2005-11-17 02:03:13 UTC

Post by Ryan Taylor
I have the firewall set up to allow all traffic...
I am using ntp because I thought it was necessary. If the time changes
or the clocks differ, can't that affect heartbeat, rsync,
etc.?
Is there a better way to handle the time issue? I have been ssh'd
into the cluster IP and noticed that it has been flipping randomly back
and forth between massive/bdc... I'm guessing these symptoms are
related to my time issues.

No. I doubt it's related to NTP.

I would guess that your cluster IP is on both nodes. Check your logs.

This has various causes - the main one being firewalls.
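
Editorial sketch: the log check suggested above can be approximated with a short Python script that looks at each node's log and reports whether the last resource-group event mentioning the cluster IP was an acquire or a release. It assumes the "Acquiring resource group" and "Releasing resource group" wording seen in the logs posted earlier in this thread; the function names `unwrap` and `holds_ip` are made up for illustration.

```python
import re

# A record starts with "name[pid]: "; anything else is a mail-wrapped tail.
PREFIX = re.compile(r"^[\w-]+\[\d+\]: ")


def unwrap(lines):
    """Rejoin wrapped log lines into one string per log record."""
    records = []
    for line in lines:
        line = line.strip()
        if PREFIX.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def holds_ip(lines, ip):
    """Return True if the last acquire/release record for `ip` is an acquire."""
    held = False
    for record in unwrap(lines):
        if f"IPaddr::{ip}" not in record:
            continue
        if "Acquiring resource group" in record:
            held = True
        elif "Releasing resource group" in record:
            held = False
    return held


# Records copied from the node A and node B excerpts earlier in the thread.
node_a = [
    "ResourceManager[15945]: 2005/11/16_12:18:38 info: Acquiring resource",
    "group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99",
]
node_b = [
    "ResourceManager[2307]: 2005/11/16_13:02:34 info: Releasing resource",
    "group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99",
    "ResourceManager[2517]: 2005/11/16_13:52:55 info: Acquiring resource",
    "group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99",
]

logs = (("massive", node_a), ("bdc", node_b))
holders = [name for name, log in logs if holds_ip(log, "192.168.0.99")]
print(holders)
```

If both node names come back, each node last logged an acquire for 192.168.0.99 with no later release. Whether the two nodes held the address at the same moment still has to be judged from the timestamps in the full logs.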

--
Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce

Ryan Taylor

2005-11-16 22:58:03 UTC

I have taken your advice and turned off ntp. Even without ntp, I still
get the errors:

heartbeat[18620]: 2005/11/16_17:53:24 info: **************************
heartbeat[18620]: 2005/11/16_17:53:24 info: Configuration validated.
Starting heartbeat 2.0.2
heartbeat[18621]: 2005/11/16_17:53:24 info: heartbeat: version 2.0.2
heartbeat[18621]: 2005/11/16_17:53:24 info: Heartbeat generation: 8
heartbeat[18621]: 2005/11/16_17:53:24 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: bound send socket to device: eth1
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: bound receive socket to device: eth1
heartbeat[18621]: 2005/11/16_17:53:24 info: glib: ucast: started on port 694 interface eth1 to 192.168.0.89
heartbeat[18621]: 2005/11/16_17:53:24 notice: Using watchdog device: /dev/watchdog
heartbeat[18621]: 2005/11/16_17:53:24 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[18621]: 2005/11/16_17:53:24 info: pid 18621 locked in memory.
heartbeat[18621]: 2005/11/16_17:53:24 info: Local status now set to: 'up'
heartbeat[18624]: 2005/11/16_17:53:25 info: pid 18624 locked in memory.
heartbeat[18626]: 2005/11/16_17:53:27 info: pid 18626 locked in memory.
heartbeat[18625]: 2005/11/16_17:53:25 info: pid 18625 locked in memory.
heartbeat[18621]: 2005/11/16_17:53:28 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:29 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:30 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:31 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:48 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:47 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:47 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:48 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:53:50 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:10 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:11 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:14 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:24 WARN: node bdc.beefylinux.com: is dead
heartbeat[18621]: 2005/11/16_17:54:24 info: Local status now set to: 'active'
heartbeat[18621]: 2005/11/16_17:54:24 WARN: No STONITH device configured.
heartbeat[18621]: 2005/11/16_17:54:24 WARN: Shared disks are not protected.
heartbeat[18621]: 2005/11/16_17:54:24 info: Resources being acquired from bdc.beefylinux.com.
harc[18637]: 2005/11/16_17:54:24 info: Running /etc/ha.d/rc.d/status status
mach_down[18656]: 2005/11/16_17:54:24 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
heartbeat[18621]: 2005/11/16_17:54:24 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[18621]: 2005/11/16_17:54:24 info: mach_down takeover complete.
mach_down[18656]: 2005/11/16_17:54:24 info: mach_down takeover complete for node bdc.beefylinux.com.
heartbeat[18621]: 2005/11/16_17:54:25 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:26 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:27 info: Clock jumped backwards. Compensating.
heartbeat[18638]: 2005/11/16_17:54:27 info: Local Resource acquisition completed.
heartbeat[18621]: 2005/11/16_17:54:27 info: Clock jumped backwards. Compensating.
harc[18702]: 2005/11/16_17:54:27 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[18702]: 2005/11/16_17:54:27 received ip-request-resp sleep::3 OK yes
ResourceManager[18715]: 2005/11/16_17:54:27 info: Acquiring resource group: massive.beefylinux.com sleep::3 IPaddr::192.168.0.99
heartbeat[18621]: 2005/11/16_17:54:30 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:29 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:29 info: Clock jumped backwards. Compensating.
heartbeat[18621]: 2005/11/16_17:54:30 info: Clock jumped backwards. Compensating.
ResourceManager[18715]: 2005/11/16_17:54:30 info: Running /etc/ha.d/resource.d/sleep 3 start
ResourceManager[18715]: 2005/11/16_17:54:33 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.99 start
IPaddr[18805]: 2005/11/16_17:54:33 info: /sbin/ifconfig eth0:0 192.168.0.99 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[18805]: 2005/11/16_17:54:35 info: Sending Gratuitous Arp for 192.168.0.99 on eth0:0 [eth0]
IPaddr[18805]: 2005/11/16_17:54:35 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.99 eth0 192.168.0.99 auto 192.168.0.99 ffffffffffff
heartbeat[18621]: 2005/11/16_17:54:35 info: Local Resource acquisition completed. (none)
heartbeat[18621]: 2005/11/16_17:54:35 info: local resource transition completed.

This is with the bdc (node-b) heartbeat off.
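
For anyone reading along: the timestamps in the log above are themselves out of order in places (17:53:34 is followed by 17:53:33, for example). A rough sketch of how one might list those backward steps from a saved log follows. The function name `backward_steps` and the `sample` list are made up for illustration; only the `name[pid]: YYYY/MM/DD_HH:MM:SS` prefix is taken from the log itself.

```python
import re
from datetime import datetime

# Matches the "name[pid]: YYYY/MM/DD_HH:MM:SS " prefix seen in the log above.
LINE_RE = re.compile(r"^(\S+\[\d+\]): (\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) ")


def backward_steps(lines, tag=None):
    """Return (previous, current) timestamp pairs where logged time goes backwards.

    Lines without the prefix (e.g. wrapped continuation lines) are skipped.
    If `tag` is given, only lines from that process are considered,
    e.g. tag="heartbeat[18621]".
    """
    steps = []
    prev = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m or (tag is not None and m.group(1) != tag):
            continue
        now = datetime.strptime(m.group(2), "%Y/%m/%d_%H:%M:%S")
        if prev is not None and now < prev:
            steps.append((prev, now))
        prev = now
    return steps


# Hypothetical excerpt, copied from the log above (wrapped form).
sample = [
    "heartbeat[18621]: 2005/11/16_17:53:31 info: Clock jumped backwards.",
    "Compensating.",
    "heartbeat[18621]: 2005/11/16_17:53:34 info: Clock jumped backwards.",
    "Compensating.",
    "heartbeat[18621]: 2005/11/16_17:53:33 info: Clock jumped backwards.",
    "Compensating.",
]

for before, after in backward_steps(sample, tag="heartbeat[18621]"):
    print(before.time(), "->", after.time())
```

Filtering on one process tag matters here, because the child processes (18624, 18625, 18626) log with their own interleaved timestamps.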

Ryan Taylor

2005-11-16 19:20:07 UTC

Attached are the two logs, my node-a is named massive.beefylinux.com
and node-b is named bdc.beefylinux.com

Thanks again,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node-a.log
Type: text/x-log
Size: 96999 bytes
Desc: not available
URL: <http://lists.linux-ha.org/pipermail/linux-ha/attachments/20051116/fe12ce2a/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node-b.log
Type: text/x-log
Size: 146368 bytes
Desc: not available
URL: <http://lists.linux-ha.org/pipermail/linux-ha/attachments/20051116/fe12ce2a/attachment-0001.bin>

Ryan Taylor

2005-11-17 14:38:32 UTC

Everything else is working great (have yet to add nfs to the pot)..
All of mine (eventually 24) are SMP boxes also. I guess it is safe to
just ignore the log...?

Thank you for help,

Ryan

On 11/16/05, linux-ha-request at lists.linux-ha.org

1. Re: Clock jumped backwards (Ryan Taylor)
2. Re: Clock jumped backwards (Ryan Taylor)
3. monitoring apache (Sweet, Larry D)
4. Re: Serge's script for apache (Alan Robertson)
5. Re: haresources question (Alan Robertson)
6. Re: multipath support? (Alan Robertson)
7. Re: Clock jumped backwards (Alan Robertson)
8. Re: Re: tengine (was Re: Re: Linux-HA Digest, Vol 24, Issue
45) (Alan Robertson)
----------------------------------------------------------------------
Message: 1
Date: Wed, 16 Nov 2005 16:21:32 -0500
From: Ryan Taylor <rtaylor82 at gmail.com>
Subject: [Linux-HA] Re: Clock jumped backwards
To: linux-ha at lists.linux-ha.org
<611ae4000511161321l89b8ba5nf6a6aa591df38a6e at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
I have the firewall set up to allow all traffic...
I am using ntp because I thought it was necessary. If the time changes
or the clocks differ, can't that affect heartbeat, rsync,
etc.?
Is there a better way to handle the time issue? I have been ssh'd
into the cluster IP and noticed that it has been flipping randomly back
and forth between massive/bdc... I'm guessing these symptoms are
related to my time issues.
Thank you again for the quick responses!
Ryan Taylor
rtaylor82 at gmail.com
------------------------------
Message: 2
Date: Wed, 16 Nov 2005 17:58:03 -0500
From: Ryan Taylor <rtaylor82 at gmail.com>
Subject: [Linux-HA] Re: Clock jumped backwards
To: linux-ha at lists.linux-ha.org
<611ae4000511161458m47bbb641j9a058a2dcd53ffa3 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
I have taken your advice and turned off ntp.. Even without ntp I still
This is with the bdc (node-b) heartbeat off.
------------------------------
Message: 3
Date: Wed, 16 Nov 2005 20:33:21 -0600
From: "Sweet, Larry D" <ldsweet at midsouth.ualr.edu>
Subject: [Linux-HA] monitoring apache
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
<12629A044DD7974EA207D5BD468741FB8AAB0E at msca-exchange.facstaff.eden.ualr.edu>
Content-Type: text/plain; charset="utf-8"
Hello, All,
Please don't give up on me now! I have almost everything working except the monitor function. I just can't get the variables that the apache script needs to be found in httpd.conf. I think :o)
Is it possible that the /usr/lib/ocf/resources.d/heartbeat/apache script/RA is looking for a LoadModule mod_status statement in httpd.conf?
mod_status is a built-in module and is not added using "LoadModule", so "apache monitor" produces the error "Monitoring is not supported by /usr/local/apache2/conf/httpd.conf". AddModule has been discontinued in apache2. What can I do to get the apache script to load the variables for the server-status URL and port?
Thanks
Larry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.community.tummy.com/pipermail/linux-ha/attachments/20051116/a02d776b/attachment.htm
------------------------------
Message: 4
Date: Tue, 15 Nov 2005 19:22:17 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] Serge's script for apache
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437A97D9.6060700 at unix.sh>
Content-Type: text/plain; charset=iso-8859-1; format=flowed

Hello, again,
Alan, how do you "map to a URL?" I have httpd.conf set to allow server-status in the <Location /server-status>

That's how you do it.
I think my materials for the San Francisco class have working apache
configurations in them.
See the http://linux-ha.org/Talks page for the tar ball.
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
------------------------------
Message: 5
Date: Tue, 15 Nov 2005 19:19:52 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] haresources question
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437A9748.80406 at unix.sh>
Content-Type: text/plain; charset=us-ascii; format=flowed

Hi,
I have a sequence problem in a customer's HA cluster. We've got two
kmscom01 IPaddr::10.214.16.110/24/eth0 IPaddr::10.214.16.113/24/eth0
Filesystem::/dev/sdb1::/REGVOL::ext3 Filesystem::/dev/sdc1::/KMSVOL::ext3
Filesystem::/dev/sde1::/KMHVOL::ext3 egate_reg egate_kms egate_kmh
kmbcom01 IPaddr::10.214.16.111/24/eth0 IPaddr::10.214.16.112/24/eth0
Filesystem::/dev/sdd1::/KMBVOL::ext3 Filesystem::/dev/sdf1::/KMNVOL::ext3
egate_kmb egate_kmn
Here, the cluster is active-active and runs two SAP R/3 gateway
applications each (egate_kms, egate_kmh, egate_kmb, and egate_kmn), plus a
registry application which must always be started first and stopped last (
egate_reg). During a failover it's okay if egate_reg is restarted with the
local gateway apps running - they can reconnect just fine. Eventually the
customer tried to do a "heartbeat stop" on the machine which held all 5
applications (after a successful failover after shutting one node off), and
heartbeat stopped egate_reg in the middle of everything else. The two
remaining egate_km* processes then refused to shut down because the
registry was missing. This is application specific but we need to figure
out how to help the stuff to shut down cleanly.
The sequence of resource groups in haresources is as above.
How do I make sure that heartbeat stops egate_reg last? Change the sequence
of the resource groups in haresources? Are the resource groups processed
top-to-down and left-to-right (acquire resource group) or right-to-left
(release r.g.)?
We can't currently figure this out by simply trying because the cluster is
productive and we get no downtime...

Make them a single resource group. Then you get the ordering you want.
I don't think there's any guaranteed ordering between groups - at
least not in R1-style clusters.
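
A toy illustration of why a single group gives the ordering asked about, assuming the usual haresources behaviour that resources in one group are acquired left to right and released right to left. The helper names `start_order` and `stop_order` are hypothetical and are not part of heartbeat; the resource names come from the question above.

```python
def start_order(group):
    """Resources of one group in the order they would be started (as listed)."""
    return list(group)


def stop_order(group):
    """Resources of one group in the order they would be stopped (reverse of listing)."""
    return list(reversed(group))


# Hypothetical merged group: the registry is listed first, so it is
# started first and stopped last.
group = ["egate_reg", "egate_kms", "egate_kmh", "egate_kmb", "egate_kmn"]

print("start:", " ".join(start_order(group)))
print("stop: ", " ".join(stop_order(group)))
```

With two separate groups there is no such relation between a resource in one group and a resource in the other, which is the situation described in the question.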
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
------------------------------
Message: 6
Date: Tue, 15 Nov 2005 19:31:49 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] multipath support?
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437A9A15.4030303 at unix.sh>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Keep in mind that from our perspective - as an open source community -
almost anything goes ;-).

That reads "it will most probably work but don't ever attempt to open a
problem call for it e.g. with Red Hat or Novell" ;-))

Harald - he's from my company (IBM) and I was answering the other half
of his question - that yes, we expect for his particular configuration
to work - and that if it gets escalated from IBM support line to my
team, we _will_ handle it, and not send him packing.
But, that's not quite what he asked, hence my more general comment about
"anything goes".
So, if he wasn't confused before, he probably is now :-(
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
------------------------------
Message: 7
Date: Wed, 16 Nov 2005 19:01:56 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] Clock jumped backwards
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <437BE494.7040303 at unix.sh>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Do you have other programs running that may adjust the time automatically,
e.g., ntpd?
Another, more important problem is that the two nodes massive/bdc cannot see
each other, since they both claim the other is dead.
It is usually caused by a firewall, so you may want to check that.

I have one computer which does this at my house. It's an SMP box with
Pentium II or III processors.
I got rid of this message in R2 for EXACTLY this reason.
It's real. It's not caused by NTP, and it's very annoying.
My guess is it's related to it being an SMP box.
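
For illustration only, here is a minimal sketch of what "compensating" for a backward clock step can look like in general. It is not heartbeat's actual code; the class name `BackwardJumpCompensator` and the fake readings are made up. The idea is to remember the last raw reading and, whenever a new reading is smaller, fold the difference into an offset so callers never see time run backwards.

```python
import time


class BackwardJumpCompensator:
    """Wraps a time source that may step backwards and hides the steps."""

    def __init__(self, source=time.time):
        self._source = source
        self._last_raw = source()
        self._offset = 0.0
        self.jumps = 0  # how many backward steps were seen

    def now(self):
        raw = self._source()
        if raw < self._last_raw:
            # The source went backwards: absorb the step into the offset.
            self._offset += self._last_raw - raw
            self.jumps += 1
        self._last_raw = raw
        return raw + self._offset


# Fake readings standing in for a clock that steps back by 0.5 s once.
readings = iter([100.0, 101.0, 100.5, 101.5])
clock = BackwardJumpCompensator(source=lambda: next(readings))
values = [clock.now() for _ in range(3)]
print(values, "jumps:", clock.jumps)
```

In new Python code `time.monotonic()` sidesteps the problem for interval timing, since it is documented never to go backwards.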
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
------------------------------
Message: 8
Date: Tue, 15 Nov 2005 01:51:45 -0700
From: Alan Robertson <alanr at unix.sh>
Subject: Re: [Linux-HA] Re: tengine (was Re: Re: Linux-HA Digest, Vol
24, Issue 45)
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Message-ID: <4379A1A1.6030803 at unix.sh>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi!
I run heartbeat-2.0.1-1 on a 3node linux 2.6 cluster having configured
2 resource groups (I've attached the cibadmin -Q to this email)
1. is that the tengine process goes sometimes probably in to a loop
using as much CPU as it gets.

snip...

Can you use gdb to attach to the process and post a backtrace when this
happens?
thanks
-Guochun

Sorry for the subject 8-(
Some additional info: I did a failover node3 -> node2 by stopping the
heartbeat on node3. After node3:/../heartbeat start the resource moved
back to node3 but there is no "tengine" process there running!?

tengine and pengine only run on the DC (designated controller)
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
------------------------------
End of Linux-HA Digest, Vol 24, Issue 72
****************************************
