[Linux-HA] heartbeat thinks other node is dead


Howard Yuan

2007-03-06 22:56:32 UTC

Hi all,

I'm fairly new to Linux. I know a little bit about it. I'm currently running SLES 10 and I'm trying to set up DRBD and Heartbeat. I got the DRBD portion working and now I'm trying to get Heartbeat working. I currently have Heartbeat 2.0.8 installed on my two SLES 10 systems. I followed a few guides that I found online and I believe I have it configured right. However, every time I start up heartbeat (whether on both servers simultaneously or one after the other), I always see in the log that it doesn't see the other node. Node A declares Node B dead and binds my test IP address, and Node B declares Node A dead and binds the test IP address as well. I can't figure out why it's doing this. Maybe I messed something up in my configuration file?

Any advice, comments, or suggestions are greatly appreciated. Thanks in advance.
Howard

Node A's ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 80
udpport 694
bcast eth1
auto_failback on
node LU3-US
node LU4-US
ping 192.168.15.1 192.168.15.2
#ping_group group1 192.168.15.1 192.168.15.2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
apiauth ping gid=haclient uid=root
compression bz2
compression_threshold 2
crm yes

Node B's ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 80
udpport 694
bcast eth0
auto_failback on
node LU3-US
node LU4-US
ping 192.168.15.1 192.168.15.2
#ping_group group1 192.168.15.1 192.168.15.2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
apiauth ping gid=haclient uid=root
compression bz2
compression_threshold 2
crm yes

The only difference between Node A's and Node B's configs is the broadcast interface. Node A and B both have two network cards. The second cards are connected via a crossover cable while their first cards are connected to the main switch. The 192.168.x.x address is bound to eth1 on Node A and to eth0 on Node B. They are able to ping each other using just the node name (e.g. LU4-US). Any help is greatly appreciated.

Yan Fitterer

2007-03-06 23:35:30 UTC

Firewall? Heartbeat by default will communicate through UDP on port 694 (IIRC). Try with firewall disabled (it is
enabled by default on SLES10).

What IP is on each of the interfaces on both hosts?

What IP do the LU*-US hostnames resolve to?

Maybe post the output of "ifconfig", "ping LU3-US" "ping LU4-US" "ping 192.168.x.x" where x.x is the IP address of the
other host over the Xover cable.

"other node dead" simply means that HB cannot communicate with the other node in the cluster.

Why do you use eth0 on one node and eth1 on the other? You say "the secondary cards are connected via a cross-over
cable". If that is true, then I would expect both hosts to have "bcast eth1" in the ha.cf

HTH
Yan

PS - don't forget - you _really_ need redundant communication paths between the hosts.

_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Howard Yuan

2007-03-07 00:27:50 UTC

Hi Yan,

Thanx for the reply. Here's the information you asked for:

Node A = LU3-US
eth0: 10.0.0.2
eth1: 12.29.124.31

Node B = LU4-US
eth0: 12.29.124.33
eth1: 10.0.0.3

The floating address I'm trying to pass around is 10.0.0.4.

LU3-US' eth0 is connected to LU4-US' eth1 via a crossover cable.

Pinging LU*-US resolves to the 12.29.124.x address. Both systems can ping LU*-US, 192.168.15.x, and 10.0.0.2(3).

Let me explain a little further. The 12 address is our public address and allows the outside world to communicate directly with them. The 192 address is the address we use internally. Our router has an address on each range, so machines from both ranges can talk to each other. The 10 address is unique to these two systems, as they are connected by the crossover cable.

Also, from further browsing of the mailing list, I found some articles mentioning that IPFail doesn't work with CRM. Do you know if that is still the case?

I tried disabling the firewall and the same thing still happened.

Hope this will help you understand my problem better. Thank you.

--
Howard Yuan
I.T. Department
Valence Technology, Inc.
http://www.valence.com/

Yan Fitterer

2007-03-07 13:33:58 UTC

I _think_ your problem is because the interfaces you've configured hb to use don't hold the IPs that the node names
resolve to.

So - either you make LU*-US resolve to 10.0.0.x, or you change ha.cf so that LU3-US is "bcast eth1", and the other is
"bcast eth0", or you add both "bcast eth0" and "bcast eth1" to both nodes.

The latter would give you redundant paths for hb communications, so would be a good idea anyway.

Yan
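As a rough illustration of the redundancy point, the Python sketch below scans the text of an ha.cf file for `bcast` lines and reports whether heartbeat has been given at least two interfaces. The function names and the trimmed sample configs are made up for this example, and the parser only looks at the `bcast` directive as it is quoted in this thread; it is not part of heartbeat.

```python
def bcast_interfaces(ha_cf_text):
    """Return the interface names found on `bcast` lines, without duplicates."""
    interfaces = []
    for raw_line in ha_cf_text.splitlines():
        # Drop anything after '#', so commented-out directives are ignored.
        line = raw_line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "bcast":
            # Treat every field after `bcast` as an interface name.
            for name in fields[1:]:
                if name not in interfaces:
                    interfaces.append(name)
    return interfaces


def has_redundant_paths(ha_cf_text):
    """True when the config names at least two broadcast interfaces."""
    return len(bcast_interfaces(ha_cf_text)) >= 2


# Trimmed-down version of Node A's config from the first post.
NODE_A = """\
udpport 694
bcast eth1
node LU3-US
node LU4-US
"""

# Hypothetical variant with both interfaces listed, as suggested above.
REDUNDANT = """\
udpport 694
bcast eth0
bcast eth1
node LU3-US
node LU4-US
"""

if __name__ == "__main__":
    print(bcast_interfaces(NODE_A), has_redundant_paths(NODE_A))
    print(bcast_interfaces(REDUNDANT), has_redundant_paths(REDUNDANT))
```

A check like this only says what the config asks for. Whether the packets actually get through on each interface still has to be confirmed on the wire.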


Alan Robertson

2007-03-07 15:40:09 UTC

Post by Yan Fitterer
I _think_ your problem is because the interfaces you've configured hb
to use don't hold the IPs that the node names resolve to.
So - either you make LU*-US resolve to 10.0.0.x, or you change ha.cf
so that LU3-US is "bcast eth1", and the other is "bcast eth0", or you
add both "bcast eth0" and "bcast eth1" to both nodes.

There is no necessity for host names to match IP addresses.
Broadcasting doesn't involve knowing IP addresses. And, even in the
case of ucast addresses, the DNS/hosts names of the IP addresses don't
have to match the uname -n host names.

When a packet is received, the source IP address (if any) is ignored, as
is the destination IP address. It just has to be received somehow.
Make it show up in one of the media input processes, and if it's
digitally signed correctly, it will be paid attention to.

I hope that eth1 on node A is the same subnet as eth0 on node B.

If that isn't the case, then that's going to be a serious handicap ;-)
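One way to sanity-check the subnet question is with Python's standard `ipaddress` module. The sketch below uses the addresses Howard posted earlier in the thread. The /24 prefix length is an assumption, since no netmask is given anywhere in the discussion, and `same_subnet` is just a helper name chosen for this example.

```python
import ipaddress


def same_subnet(addr_a, addr_b, prefix_len=24):
    """True when both IPv4 addresses fall in the same network.

    prefix_len is assumed to be 24 here; use the real netmask of the
    interfaces being compared.
    """
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


if __name__ == "__main__":
    # Node A eth1 and Node B eth0, as posted in the thread.
    print(same_subnet("12.29.124.31", "12.29.124.33"))
    # Node A eth0 (crossover side) against Node B eth0 (public side).
    print(same_subnet("10.0.0.2", "12.29.124.33"))
```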

--
Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce

Howard Yuan

2007-03-07 17:28:53 UTC


Node A's eth1 is on the same subnet as eth0 on node B. For the sake of
simplicity, I will rearrange the IP configurations on the systems. The
reason it's set up like it is now is that the live systems I will be
using heartbeat for are set up the same way. So...the closer the
better. But first, I gotta get it working so I can understand it.
It's a good thing I'm doing this in a test environment. Hee hee hee. :)

I'll report back and let you guys know what happens. If you need to see
a log, I can provide that too.

Thanks!
Howard

Lars Marowsky-Bree

2007-03-07 20:39:40 UTC

Post by Howard Yuan
Node A's eth1 is on the same subnet as eth0 on node B. For the sake of
simplicity, I will rearrange the IP configurations on the systems.

That's totally confusing and unnecessary. Just rename the interfaces to
match names everywhere, you'll be much happier that way.

Howard Yuan

2007-03-13 18:52:05 UTC

What do you mean by that? As in...well...what I don't get is...is there a way to rename an interface (as in eth1 -> LU4) in Linux?

Howard


Howard Yuan

2007-03-13 20:50:17 UTC

Well, after messing around with it some more, it turns out that it was the firewall blocking the traffic (I also had to have broadcast on both eth0 and eth1 as Yan suggested).

So, it's working now, but my question is...doesn't it use port 694 UDP? I allowed that, but it didn't work until I turned off the firewall. Does it use another port as well? Or is it 694 TCP? Anybody know? Thanx for all the help, Yan, Alan.


Alan Robertson

2007-03-13 21:03:27 UTC

Post by Howard Yuan
Well, after messing around with it some more, turns out that it was
the firewall blocking the traffic (I also had to have broadcast on
both eth0 and eth1 as Yan suggested).
So, it's working now, but my question is...doesn't it use port 694 UDP? I
allowed that and it doesn't work until I turn off the firewall. Does it
use another port as well? Or is it 694 TCP? Anybody know? Thanx for all
the help, Yan, Alan.

By default it uses UDP port 694.

The usual root cause for this kind of occurrence is that the person
managing the firewall didn't do what they thought they did. It happens
pretty frequently.

You can also tcpdump to figure out exactly which port it's really using.
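If tcpdump isn't handy, a few lines of Python can serve as a crude stand-in: bind a UDP socket and see whether anything arrives. This is only a sketch under assumptions. `open_listener` and `wait_for_datagram` are names invented here, the 5 second timeout is arbitrary, and it just reports that a datagram showed up on the port you chose. It does not decode heartbeat's packets, and unlike tcpdump it cannot tell you about traffic on other ports.

```python
import socket


def open_listener(port, bind_addr=""):
    """Bind a UDP socket on `port`; an empty bind_addr means all interfaces."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    return sock


def wait_for_datagram(sock, timeout=5.0):
    """Return (payload, (source_ip, source_port)), or None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(65535)
    except socket.timeout:
        return None
```

To watch for heartbeat traffic you would call something like `wait_for_datagram(open_listener(694), timeout=10.0)` on one node while heartbeat runs on the other. Port 694 is below 1024, so this needs root, and it is safest to stop heartbeat on the listening node first so the port is free. A `None` result with the firewall up and a packet with it down would point at the firewall.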

You shouldn't _have_ to broadcast on both networks in order to get it to
work, but you _should_ have redundant media.

Post by Howard Yuan
What do you mean by that? As in...well...what I don't get is...is
there a rename an interface (as in eth1 -> LU4) in Linux?

There are many ways to do it -- but if you make eth1 on one machine
match eth1 on the other, you will indeed be much happier.

A simple way is to swap cables ;-).

Another is to reverse which slots the PCI cards are in.

There are other ways as well...


Yan Fitterer

2007-03-14 08:51:07 UTC

Post by Alan Robertson
By default uses port 694 UDP.
The usual root cause for this kind of occurrence is that the person
managing the firewall didn't do what they thought they did. It happens
pretty frequently.

Another likely candidate is that the firewall blocks broadcasts - that's not uncommon. So although you had explicitly
allowed UDP 694, you may have had another rule that was blocking broadcasts regardless of port.

If I look at my standard SUSE firewall (SLES10), I see a DROP rule that matches "PKTTYPE = broadcast"

Maybe you have the same in your distro.

Howard Yuan

2007-03-14 17:23:13 UTC

Ahh...I am using SLES10's firewall...and...well...I can't find where any of the rules are kept. I went through all the options on the left... :(


Yan Fitterer

2007-03-14 17:28:41 UTC

/etc/sysconfig/SuSEfirewall2


Howard Yuan

2007-03-14 22:17:42 UTC

Thanx, Yan! That did it. I added 694 to the broadcast and it fixed the problem. :)
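For anyone hitting the same thing, here is a small Python sketch that reads the text of /etc/sysconfig/SuSEfirewall2 and reports which broadcast settings would let UDP port 694 through. It assumes the relevant variables are named `FW_ALLOW_FW_BROADCAST_EXT`, `FW_ALLOW_FW_BROADCAST_INT` and `FW_ALLOW_FW_BROADCAST_DMZ` and take "yes", "no" or a space-separated port list. The thread only says 694 was added "to the broadcast", so check those names and the format against the comments in the file on your own system. The helper names are invented for this example.

```python
import re

# Assumed variable naming; verify against your own SuSEfirewall2 file.
ASSIGNMENT = re.compile(r'^\s*(FW_ALLOW_FW_BROADCAST_[A-Z]+)\s*=\s*"?([^"#]*)"?')


def broadcast_settings(sysconfig_text):
    """Map each FW_ALLOW_FW_BROADCAST_* variable to its raw value."""
    settings = {}
    for line in sysconfig_text.splitlines():
        match = ASSIGNMENT.match(line)  # commented-out lines do not match
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def zones_allowing(sysconfig_text, port=694):
    """Return the variables whose value is "yes" or lists `port`."""
    allowed = []
    for name, value in broadcast_settings(sysconfig_text).items():
        if value == "yes" or str(port) in value.split():
            allowed.append(name)
    return allowed


# Hypothetical excerpt, not copied from a real system.
SAMPLE = (
    'FW_ALLOW_FW_BROADCAST_EXT="no"\n'
    'FW_ALLOW_FW_BROADCAST_INT="694"\n'
    '# FW_ALLOW_FW_BROADCAST_DMZ="yes"\n'
)

if __name__ == "__main__":
    print(zones_allowing(SAMPLE))
```

The port is compared as a whole word, so a value such as "6940" does not count as allowing 694.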

Howard Yuan

2007-03-20 19:17:35 UTC

Just wondering if my system is set up correctly or not. I have two servers set up for Heartbeat, and both servers have two Ethernet cards: eth1 on each is connected to the other server and configured with a 10 network address, and eth0 is connected to the main hub and configured with a 192 network address.

Do I have a redundant communication path set up here? Or no?

Just want to make sure I'm understanding this correctly. :)

Thanks!

Howard

Yan Fitterer

2007-03-20 19:45:53 UTC

You do - at least physically.

To make sure heartbeat uses both, you'll need the correct configuration
in ha.cf though.

Yan


Howard Yuan

2007-03-20 20:02:18 UTC

Okie, I think I got it configured to use both, since I can break one connection and the nodes won't declare each other dead. :) YAAAY~! :)

Alan Robertson

2007-03-07 03:05:07 UTC


Howard:

Yan has it right. There's a 90% chance your problem is an enabled firewall. By
default most Linuxes start firewalls which would DEFINITELY block our port.

