Discussion:
[Linux-HA] multipath sbd stonith device recommended configuration
Muhammad Sharfuddin
2015-01-15 15:33:38 UTC
I have to put this two-node active/passive cluster into production very soon,
and I have tested that resource migration works perfectly when the node
running the resource goes down (abruptly/forcefully).

I have always read and heard that one should increase the msgwait and
watchdog timeouts when the sbd device is a multipath disk, but in my case
I have just created the disk via
sbd -d /dev/mapper/mpathe create

and I have the following resource for sbd:
primitive sbd_stonith stonith:external/sbd \
op monitor interval="3000" timeout="120" start-delay="21" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
params sbd_device="/dev/mapper/mpathe"

As of now I am quite satisfied, but should I increase the msgwait and
watchdog timeouts?
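If I do need larger values, I assume I would have to re-create the header and
pass the timeouts at create time, something like this (example values only,
and please correct me if these options are wrong):

sbd -d /dev/mapper/mpathe -1 60 -4 120 create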

Also, I am using start-delay=21 for the monitor op; should I also use
start-delay=11 for the start op?

Please recommend
--
Regards,

Muhammad Sharfuddin
Cell: +92-3332144823 | UAN: +92(21) 111-111-142 ext: 113 | NDS.COM.PK
<http://www.nds.com.pk>
Muhammad Sharfuddin
2015-01-15 15:40:52 UTC
Oh I forgot to mention:

cat /etc/sysconfig/sbd
SBD_DEVICE="/dev/mapper/mpathe"
SBD_OPTS="-W"

sbd -d /dev/mapper/mpathe dump
==Dumping header on disk /dev/mapper/mpathe
Header version : 2.1
UUID : 505dc5b5-5da0-463e-a4fa-1ce55384542a
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/mapper/mpathe is dumped

sbd -d /dev/mapper/mpathe list
0 node2 clear
1 node1 clear
--
Regards,

Muhammad Sharfuddin
Ulrich Windl
2015-01-16 07:11:48 UTC
Hi!

IMHO: The correct time to wait lies in an interval bounded by these two values:
1: An I/O delay that may occur during normal operation and must never be allowed to trigger fencing
2: The maximum time you are willing to wait for fencing to occur

Many people think that making 1 close to zero and 2 as small as possible is the best solution.

But imagine one of your SBD disks has some read problem and the operation has to be retried a few times. Or think about an "online" upgrade of your disk firmware, etc.: usually I/O is stopped for a short time (typically less than one minute).


So once you have determined your timeout value for your environment, you can configure SBD. We have a rather long timeout, so SBD fencing can take some time. That means fencing usually takes place within a few seconds, but the cluster waits the longer time to make sure the node must have processed the SBD fencing command. Fencing is not confirmed at the SBD level: you send the fencing command via SBD, then you expect that every node reads the command after some delay (and thus performs it).
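As a purely illustrative example (the numbers are assumptions, not a recommendation, and have to match your storage): if I/O can stall for up to a minute during normal operation, one might re-create the header with a watchdog timeout above that, msgwait at roughly twice the watchdog timeout, and the cluster's stonith-timeout somewhat larger than msgwait:

sbd -d /dev/mapper/mpathe -1 90 -4 180 create
crm configure property stonith-timeout="220s"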

Unfortunately the SBD syntax is a real mess, and there is no manual page (AFAIK) for SBD.
You can change the SBD parameters (on disk) online, but to be effective, the SBD daemon has to be restarted.
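For example (assuming a SLE 11-style setup, where sbd is started together with the cluster stack by the openais init script), after changing the on-disk values you would verify them and then restart the stack on each node:

sbd -d /dev/mapper/mpathe dump
rcopenais restart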

I hope this helps.

Regards,
Ulrich
Lars Marowsky-Bree
2015-01-16 10:33:03 UTC
Post by Ulrich Windl
Hi!
1: An I/O delay that may occur during normal operation and must never be allowed to trigger fencing
2: The maximum time you are willing to wait for fencing to occur
Many people think that making 1 close to zero and 2 as small as possible is the best solution.
But imagine one of your SBD disks has some read problem and the operation has to be retried a few times. Or think about an "online" upgrade of your disk firmware, etc.: usually I/O is stopped for a short time (typically less than one minute).
Newer versions of SBD are less affected by this (and by newer, I mean
"about 2 years ago"). sbd uses async IO and every IO request is timed
out individually; so IO no longer "gets stuck". As long as one read gets
through within the watchdog period, you're going to be OK.

In addition to that, I'd strongly recommend enabling the pacemaker
integration, which (since it was new functionality) couldn't be
auto-enabled on SLE HA 11, but is standard on SLE HA 12. On SLE HA 11,
it needs to be enabled using the -P switch.
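On SLE HA 11 that would mean something along these lines in /etc/sysconfig/sbd (followed by a restart of the sbd daemon on each node):

SBD_DEVICE="/dev/mapper/mpathe"
SBD_OPTS="-W -P"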

Then you can enjoy shorter timeouts for SBD and thus lower fail-over
latencies even on a single MPIO device.
Post by Ulrich Windl
Unfortunately the SBD syntax is a real mess, and there is no manual page (AFAIK) for SBD.
... because "man sbd" isn't obvious enough, I guess. ;-)
Post by Ulrich Windl
You can change the SBD parameters (on disk) online, but to be
effective, the SBD daemon has to be restarted.
Yes. Online change of parameters is not supported. You specify them at
create time; not all of them can be overridden by command-line arguments.
You should not need to tune them in newer versions.


Regards,
Lars
--
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Ulrich Windl
2015-01-16 11:22:57 UTC
[...]
Post by Lars Marowsky-Bree
Post by Ulrich Windl
Unfortunately the SBD syntax is a real mess, and there is no manual page
(AFAIK) for SBD.
... because "man sbd" isn't obvious enough, I guess. ;-)
OK, I haven't re-checked recently: You added one!

[...]

Ulrich
Lars Marowsky-Bree
2015-01-16 11:53:54 UTC
Post by Ulrich Windl
Post by Lars Marowsky-Bree
Post by Ulrich Windl
Unfortunately the SBD syntax is a real mess, and there is no manual page
(AFAIK) for SBD.
... because "man sbd" isn't obvious enough, I guess. ;-)
OK, I haven't re-checked recently: You added one!
Yes, we 'recently' added one - June 2012 ... ;-)
--
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde