Mailing List Archive

Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
Hello,

I am new to HA setups, and my first attempt was to set up an HA cluster
(using SLES 11 SP2 and the SLES 11 SP2 HA extension) that simply offers
an OCFS2 filesystem. I did the setup according to the SLES 11 SP2 HA
manual, which describes the required steps quite precisely.

Basically it works.

At the moment I have two nodes. When I stop one node with halt -f, or by
suspending its virtual machine, access to the cluster filesystem on the
remaining machine hangs until the halted machine comes up again, which
is of course not what I want.

When I run a clean shutdown on one of the nodes the remaining node can
still access the cluster filesystem without problems.

Here is the current cluster configuration (crm configure show):

node clusternode1
node clusternode2
primitive dlm ocf:pacemaker:controld \
op monitor interval="60" timeout="60"
primitive o2cb ocf:ocfs2:o2cb \
op monitor interval="60" timeout="60"
primitive ocfs2-1 ocf:heartbeat:Filesystem \
params device="/dev/disk/by-id/scsi-259316a7265713551-part2"
directory="/shared/cluster" fstype="ocfs2" \
op monitor interval="20" timeout="40" \
meta target-role="Started"
primitive stonith_sbd stonith:external/sbd \
op monitor interval="15" timeout="15" start-delay="15" \
params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"
group base-group dlm o2cb ocfs2-1
clone base-clone base-group \
meta interleave="true"
property $id="cib-bootstrap-options" \
dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-timeout="30s" \
no-quorum-policy="ignore" \
stonith-enabled="false"
op_defaults $id="op_defaults-options" \
record-pending="false"

Any ideas what might cause this strange effect?

Thank you very much
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html,Fax: +49261287
1001312
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On Tue, Apr 3, 2012 at 10:32 AM, Rainer Krienke <krienke@uni-koblenz.de> wrote:
> Hello,
>
> I am new to HA setups, and my first attempt was to set up an HA cluster
> (using SLES 11 SP2 and the SLES 11 SP2 HA extension) that simply offers
> an OCFS2 filesystem. I did the setup according to the SLES 11 SP2 HA
> manual, which describes the required steps quite precisely.
>
> Basically it works.
>
> At the moment I have two nodes. When I stop one node with halt -f, or by
> suspending its virtual machine, access to the cluster filesystem on the
> remaining machine hangs until the halted machine comes up again, which
> is of course not what I want.

Working as designed. Enable fencing.

> When I run a clean shutdown on one of the nodes the remaining node can
> still access the cluster filesystem without problems.
>
> Here is the current cluster configuration (crm configure show):
>
> node clusternode1
> node clusternode2
> primitive dlm ocf:pacemaker:controld \
>        op monitor interval="60" timeout="60"
> primitive o2cb ocf:ocfs2:o2cb \
>        op monitor interval="60" timeout="60"
> primitive ocfs2-1 ocf:heartbeat:Filesystem \
>        params device="/dev/disk/by-id/scsi-259316a7265713551-part2"
> directory="/shared/cluster" fstype="ocfs2" \
>        op monitor interval="20" timeout="40" \
>        meta target-role="Started"
> primitive stonith_sbd stonith:external/sbd \
>        op monitor interval="15" timeout="15" start-delay="15" \
>        params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"
> group base-group dlm o2cb ocfs2-1
> clone base-clone base-group \
>        meta interleave="true"
> property $id="cib-bootstrap-options" \
>        dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-timeout="30s" \
>        no-quorum-policy="ignore" \
>        stonith-enabled="false"
> op_defaults $id="op_defaults-options" \
>        record-pending="false"
>
> Any ideas what might cause this strange effect?

It's not strange at all. It's working exactly as it's supposed to.
Enable STONITH, test your fencing, and you're good to go.
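In crm shell terms that boils down to something like the following (a
sketch only; the manual fence test mirrors the external/sbd call that
stonith-ng runs internally, the device path is the one from your
stonith_sbd primitive, and passing sbd_device on the command line is
just one way to point the plugin at it):

# enable fencing cluster-wide
crm configure property stonith-enabled=true
# confirm the property change
crm configure show | grep stonith-enabled
# test fencing by manually resetting the peer node via the sbd plugin
# (careful: this really reboots clusternode2)
stonith -t external/sbd sbd_device=/dev/disk/by-id/scsi-259316a7265713551-part1 -T reset clusternode2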

Florian

--
Need help with High Availability?
http://www.hastexo.com/now
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 2012-04-03T10:32:48, Rainer Krienke <krienke@uni-koblenz.de> wrote:

Hi Rainer,

> I am new to HA setups, and my first attempt was to set up an HA cluster
> (using SLES 11 SP2 and the SLES 11 SP2 HA extension) that simply offers
> an OCFS2 filesystem. I did the setup according to the SLES 11 SP2 HA
> manual, which describes the required steps quite precisely.

Not quite: you disabled fencing, which I hope our documentation is
quite clear about. As Florian says, this is the problem.


> primitive stonith_sbd stonith:external/sbd \
> op monitor interval="15" timeout="15" start-delay="15" \
> params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"

You do have a fencing device configured, so all you need to do is to
set:

> property $id="cib-bootstrap-options" \
> dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-timeout="30s" \
> no-quorum-policy="ignore" \
> stonith-enabled="false"

stonith-enabled="true" and all shall be well.

(Though you may want to use multiple SBD devices to protect against loss
of a single device.)
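
A multi-device setup would look roughly like this (a sketch; the second
and third device paths are placeholders, and re-running create rewrites
the SBD headers, so do it before the cluster uses them):

# initialize the header on up to three devices in one go
# (DEVICE2/DEVICE3 are placeholders for your additional LUNs)
sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 -d /dev/disk/by-id/DEVICE2 -d /dev/disk/by-id/DEVICE3 create
# then list all of them, semicolon-separated, in /etc/sysconfig/sbd:
SBD_DEVICE="/dev/disk/by-id/scsi-259316a7265713551-part1;/dev/disk/by-id/DEVICE2;/dev/disk/by-id/DEVICE3"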


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 03.04.2012 11:44, Lars Marowsky-Bree wrote:

>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>> cluster-infrastructure="openais" \
>> expected-quorum-votes="2" \
>> stonith-timeout="30s" \
>> no-quorum-policy="ignore" \
>> stonith-enabled="false"
>
> stonith-enabled="true" and all shall be well.
>
> (Though you may want to use multiple SBD devices to protect against loss
> of a single device.)

Hi to all,

thanks for the hint to enable the stonith resource. I did, and checked
that it is set to true now, but the behaviour of the cluster is still
the same if I do a halt -f on one node:
access to the cluster filesystem on the still-running node simply hangs.

crm_mon -1 in this case shows this (note: the node names are rzinstal4
and rzinstal5):

Last updated: Tue Apr 3 13:58:10 2012
Last change: Tue Apr 3 13:41:56 2012 by root via cibadmin on rzinstal4
Stack: openais
Current DC: rzinstal4 - partition WITHOUT quorum
Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e
2 Nodes configured, 2 expected votes
7 Resources configured.
============

Node rzinstal5: UNCLEAN (offline)
Online: [ rzinstal4 ]

Clone Set: base-clone [base-group]
Started: [ rzinstal4 ]
Stopped: [ base-group:1 ]
stonith_sbd (stonith:external/sbd): Started rzinstal4


crm_verify -V -L says this:

crm_verify[10218]: 2012/04/03_14:00:00 WARN: pe_fence_node: Node
rzinstal5 will be fenced because it is un-expectedly down
crm_verify[10218]: 2012/04/03_14:00:00 WARN: determine_online_status:
Node rzinstal5 is unclean
crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Action
dlm:1_stop_0 on rzinstal5 is unrunnable (offline)
crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Marking node
rzinstal5 unclean
crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Action
o2cb:1_stop_0 on rzinstal5 is unrunnable (offline)
crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Marking node
rzinstal5 unclean
crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Action
ocfs2-1:1_stop_0 on rzinstal5 is unrunnable (offline)
crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Marking node
rzinstal5 unclean
crm_verify[10218]: 2012/04/03_14:00:00 WARN: stage6: Scheduling Node
rzinstal5 for STONITH
Warnings found during check: config may not be valid


No idea what the reason might be ...

Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html,Fax: +49261287
1001312
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On Tue, Apr 3, 2012 at 2:06 PM, Rainer Krienke <krienke@uni-koblenz.de> wrote:
> On 03.04.2012 11:44, Lars Marowsky-Bree wrote:
>
>>> property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>>>         cluster-infrastructure="openais" \
>>>         expected-quorum-votes="2" \
>>>         stonith-timeout="30s" \
>>>         no-quorum-policy="ignore" \
>>>         stonith-enabled="false"
>>
>> stonith-enabled="true" and all shall be well.
>>
>> (Though you may want to use multiple SBD devices to protect against loss
>> of a single device.)
>
> Hi to all,
>
> thanks for the hint to enable the stonith resource. I did, and checked
> that it is set to true now, but the behaviour of the cluster is still
> the same if I do a halt -f on one node:
> access to the cluster filesystem on the still-running node simply hangs.
>
> crm_mon -1 in this case shows this (note: the node names are rzinstal4
> and rzinstal5):
>
> Last updated: Tue Apr  3 13:58:10 2012
> Last change: Tue Apr  3 13:41:56 2012 by root via cibadmin on rzinstal4
> Stack: openais
> Current DC: rzinstal4 - partition WITHOUT quorum
> Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> Node rzinstal5: UNCLEAN (offline)
> Online: [ rzinstal4 ]
>
>  Clone Set: base-clone [base-group]
>     Started: [ rzinstal4 ]
>     Stopped: [ base-group:1 ]
>  stonith_sbd    (stonith:external/sbd): Started rzinstal4
>
>
> crm_verify -V -L says this:
>
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: pe_fence_node: Node
> rzinstal5 will be fenced because it is un-expectedly down
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: determine_online_status:
> Node rzinstal5 is unclean
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Action
> dlm:1_stop_0 on rzinstal5 is unrunnable (offline)
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Marking node
> rzinstal5 unclean
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Action
> o2cb:1_stop_0 on rzinstal5 is unrunnable (offline)
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Marking node
> rzinstal5 unclean
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Action
> ocfs2-1:1_stop_0 on rzinstal5 is unrunnable (offline)
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: custom_action: Marking node
> rzinstal5 unclean
> crm_verify[10218]: 2012/04/03_14:00:00 WARN: stage6: Scheduling Node
> rzinstal5 for STONITH
> Warnings found during check: config may not be valid
>
>
> No idea what the reason might be ...

Ahem. A Google search for "Pacemaker unclean node" or even just
"Pacemaker unclean" would have turned up the answer in about one
second.

Although your STONITH is now configured, and your node is correctly
being scheduled for fencing, the fence operation is not succeeding.
You need to troubleshoot the root cause of your failed fencing action.
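
For an sbd setup, a quick manual check from the surviving node would be
something like this (a sketch, reusing the device path from your
configuration):

# does the surviving node still see the device and the peer's slot?
sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 list
# can a message be delivered to the peer's slot at all?
sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 message rzinstal5 test
# and what timeouts are written in the header?
sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump

If those work, compare the header timeouts with your stonith-timeout.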

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 2012-04-03T14:06:44, Rainer Krienke <krienke@uni-koblenz.de> wrote:

> thanks for the hint to enable the stonith resource. I did, and checked
> that it is set to true now, but the behaviour of the cluster is still
> the same if I do a halt -f on one node:
> access to the cluster filesystem on the still-running node simply hangs.

It'll pause until SBD has completed the fence.

This is caused either by a misconfigured sbd setup or by a
stonith-timeout that is too short for your sbd configuration. Diagnosing
it would take an hb_report, or you could try actively reading the log files.
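
For example, something along these lines (a sketch; the start time is
only illustrative, pick one shortly before the failed fence):

# collect logs, CIB and related data from all nodes into a report tarball
hb_report -f "2012/04/03 14:40" /tmp/fence-report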


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 03.04.2012 14:51, Lars Marowsky-Bree wrote:
> On 2012-04-03T14:06:44, Rainer Krienke <krienke@uni-koblenz.de> wrote:
>
>> thanks for the hint to enable the stonith resource. I did, and checked
>> that it is set to true now, but the behaviour of the cluster is still
>> the same if I do a halt -f on one node:
>> access to the cluster filesystem on the still-running node simply hangs.
>
> It'll pause until SBD has completed the fence.
>
> This is caused either by a misconfigured sbd setup or by a
> stonith-timeout that is too short for your sbd configuration. Diagnosing
> it would take an hb_report, or you could try actively reading the log files.
>
>
> Regards,
> Lars
>

Hello

@Lars: thanks for the hint that UNCLEAN is not an acceptable state.
I thought that "unclean" was simply the natural result of the halt -f I
did on this host, just like a filesystem is unclean after a reset of
the host.

My SBD device is on an external iSCSI RAID, and after I halted the other
node rzinstal5 manually with halt -f, the SBD disk is still accessible
from the running node rzinstal4 without any problem.
I ran this dump after I had halted rzinstal5:

rzinstal4:~ # sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
==Dumping header on disk /dev/disk/by-id/scsi-259316a7265713551-part1
Header version : 2
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 90
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 180
==Header on disk /dev/disk/by-id/scsi-259316a7265713551-part1 is dumped

In /var/log/messages I can see that rzinstal4 tries several times to
fence the dead host rzinstal5, but I cannot see why this is
unsuccessful. Of course the node is already dead, so it cannot respond
to any messages written into the SBD device, but this should not be a problem.

Perhaps someone with more experience in clustering can spot the problem
in the short log below, or can point me to how to narrow down the search.

The log (/var/log/messages) I posted below starts about one minute after
I halted rzinstal5.

....
Apr 3 14:45:56 rzinstal4 crmd: [3910]: info: te_fence_node: Executing
reboot fencing operation (33) on rzinstal5 (timeout=30000)
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info:
can_fence_host_with_device: Refreshing port list for stonith_sbd
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: WARN: parse_host_line:
Could not parse (0 0):
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info:
can_fence_host_with_device: stonith_sbd can fence rzinstal5: dynamic-list
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info: call_remote_stonith:
Requesting that rzinstal4 perform op reboot rzinstal5
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info: stonith_fence: Exec
<stonith_command t="stonith-ng"
st_async_id="64c8badf-0753-42a6-9ab6-8ff778a3a4e3" st_op="st_fence"
st_callid="0" st_
callopt="0" st_remote_op="64c8badf-0753-42a6-9ab6-8ff778a3a4e3"
st_target="rzinstal5" st_device_action="reboot" st_timeout="27000"
src="rzinstal4" seq="3" />
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info:
can_fence_host_with_device: stonith_sbd can fence rzinstal5: dynamic-list
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info: stonith_fence: Found
1 matching devices for 'rzinstal5'
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info: stonith_command:
Processed st_fence from rzinstal4: rc=-1
Apr 3 14:45:57 rzinstal4 stonith-ng: [3904]: info: make_args:
reboot-ing node 'rzinstal5' as 'port=rzinstal5'
Apr 3 14:45:57 rzinstal4 sbd: [5268]: info: rzinstal5 owns slot 1
Apr 3 14:45:57 rzinstal4 sbd: [5268]: info: Writing reset to node slot
rzinstal5
Apr 3 14:46:29 rzinstal4 crmd: [3910]: info: tengine_stonith_callback:
Stonith operation 2/33:0:0:0409029b-02e7-498e-b4d1-650f9f7cad08:
Operation timed out (-8)
Apr 3 14:46:29 rzinstal4 crmd: [3910]: ERROR: tengine_stonith_callback:
Stonith of rzinstal5 failed (-8)... aborting transition.
Apr 3 14:46:29 rzinstal4 crmd: [3910]: info: abort_transition_graph:
tengine_stonith_callback:454 - Triggered transition abort (complete=0) :
Stonith failed
Apr 3 14:46:29 rzinstal4 crmd: [3910]: info: update_abort_priority:
Abort priority upgraded from 0 to 1000000
Apr 3 14:46:29 rzinstal4 crmd: [3910]: info: update_abort_priority:
Abort action done superceeded by restart
Apr 3 14:46:32 rzinstal4 stonith-ng: [3904]: ERROR: remote_op_timeout:
Action reboot (64c8badf-0753-42a6-9ab6-8ff778a3a4e3) for rzinstal5 timed out
Apr 3 14:46:32 rzinstal4 crmd: [3910]: WARN: stonith_perform_callback:
STONITH command failed: Operation timed out
Apr 3 14:46:32 rzinstal4 stonith-ng: [3904]: info: remote_op_done:
Notifing clients of 64c8badf-0753-42a6-9ab6-8ff778a3a4e3 (reboot of
rzinstal5 from 32d17687-11f1-4e83-b776-c86aae03b54b b
y (null)): 1, rc=-8
Apr 3 14:46:32 rzinstal4 crmd: [3910]: ERROR: tengine_stonith_notify:
Peer rzinstal5 could not be terminated (reboot) by <anyone> for
rzinstal4 (ref=64c8badf-0753-42a6-9ab6-8ff778a3a4e3):
Operation timed out
Apr 3 14:46:32 rzinstal4 stonith-ng: [3904]: info:
stonith_notify_client: Sending st_fence-notification to client
3910/0db1b22a-e55a-4375-84f3-2e6d2f98ec85
Apr 3 14:48:57 rzinstal4 sbd: [5268]: info: reset successfully
delivered to rzinstal5
Apr 3 14:48:58 rzinstal4 stonith-ng: [3904]: info: log_operation:
Operation 'reboot' [5262] (call 0 from (null)) for host 'rzinstal5' with
device 'stonith_sbd' returned: 0
Apr 3 14:48:58 rzinstal4 stonith-ng: [3904]: info: log_operation:
stonith_sbd: Performing: stonith -t external/sbd -T reset rzinstal5
Apr 3 14:48:58 rzinstal4 stonith-ng: [3904]: info: log_operation:
stonith_sbd: success: rzinstal5 0
Apr 3 14:48:58 rzinstal4 stonith-ng: [3904]: info:
process_remote_stonith_exec: ExecResult <st-reply
st_origin="stonith_construct_async_reply" t="stonith-ng"
st_op="st_notify" st_remote_op
="64c8badf-0753-42a6-9ab6-8ff778a3a4e3" st_callid="0" st_callopt="0"
st_rc="0" st_output="Performing: stonith -t external/sbd -T reset
rzinstal5 success: rzinstal5 0 " src="rzinstal4" seq="
4" />
Apr 3 14:48:58 rzinstal4 stonith-ng: [3904]: ERROR: remote_op_done:
We've already notified clients of 64c8badf-0753-42a6-9ab6-8ff778a3a4e3
(reboot of rzinstal5 from 32d17687-11f1-4e83-b776
-c86aae03b54b by rzinstal4): 2, rc=0

Thanks a lot
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html,Fax: +49261287
1001312
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 2012-04-03T15:50:29, Rainer Krienke <krienke@uni-koblenz.de> wrote:

> rzinstal4:~ # sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
> ==Dumping header on disk /dev/disk/by-id/scsi-259316a7265713551-part1
> Header version : 2
> Number of slots : 255
> Sector size : 512
> Timeout (watchdog) : 90
> Timeout (allocate) : 2
> Timeout (loop) : 1
> Timeout (msgwait) : 180

You have configured a msgwait of 180s, i.e., the message will be
considered delivered after 180s at the earliest; your stonith-timeout is
set to 30s, so this can *never* result in a successful fence.

I suggest increasing stonith-timeout to 300s.
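
In crm shell terms that is a single property change, e.g.:

crm configure property stonith-timeout=300s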


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 03.04.2012 15:51, Lars Marowsky-Bree wrote:
> On 2012-04-03T15:50:29, Rainer Krienke <krienke@uni-koblenz.de> wrote:
>
>> rzinstal4:~ # sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
>> ==Dumping header on disk /dev/disk/by-id/scsi-259316a7265713551-part1
>> Header version : 2
>> Number of slots : 255
>> Sector size : 512
>> Timeout (watchdog) : 90
>> Timeout (allocate) : 2
>> Timeout (loop) : 1
>> Timeout (msgwait) : 180
>
> You have configured a msgwait of 180s, i.e., the message will be
> considered delivered after 180s at the earliest; your stonith-timeout is
> set to 30s, so this can *never* result in a successful fence.
>
> I suggest increasing stonith-timeout to 300s.
>
>
> Regards,
> Lars
>
Hi Lars,

this was something I had already noticed, and I changed the timeout in
the cluster configuration to 200 seconds. So the log I posted was the
result of the configuration below (200 sec). Is this still too small?

$ crm configure show
...
primitive stonith_sbd stonith:external/sbd \
op monitor interval="200" timeout="200" start-delay="200" \
params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"
...

Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html,Fax: +49261287
1001312
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 2012-04-03T15:59:00, Rainer Krienke <krienke@uni-koblenz.de> wrote:

> Hi Lars,
>
> this was something I had already noticed, and I changed the timeout in
> the cluster configuration to 200 seconds. So the log I posted was the
> result of the configuration below (200 sec). Is this still too small?
>
> $ crm configure show
> ...
> primitive stonith_sbd stonith:external/sbd \
> op monitor interval="200" timeout="200" start-delay="200" \
> params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"

This is not what I meant. I meant to change the setting stonith-timeout,
not the settings on the primitive ;-) In fact, monitoring sbd is quite
unnecessary, and you actually don't need to specify sbd_device anymore,
you can just do:

primitive stonith_sbd stonith:external/sbd

and leave it at this. But, back to your timeout! Run this:

crm configure property stonith-timeout=240s

(And yes, it needs to be over 10% higher than the msgwait timeout,
because of how stonith-ng internally allocates the stonith-timeout value
to various stages in the stonith process. Sorry about that, that's a
pacemaker issue.)

You will still see IO freeze for approx. 3 minutes until the fence
completes. That's a side-effect of the sbd values you have configured,
in particular watchdog and msgwait.
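
If you want a shorter freeze, you would re-initialize the SBD header
with smaller timeouts and keep stonith-timeout above the new msgwait. A
rough sketch (the 10s/20s values are only an example; create rewrites
the header, so do this while the cluster is down):

# re-create the header with watchdog=10s and msgwait=20s
# (msgwait is commonly set to roughly twice the watchdog timeout)
sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 -1 10 -4 20 create
# verify the new values
sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
# and keep stonith-timeout comfortably above msgwait (the >10% rule above)
crm configure property stonith-timeout=30s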


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 03.04.2012 17:06, Lars Marowsky-Bree wrote:
> On 2012-04-03T15:59:00, Rainer Krienke <krienke@uni-koblenz.de> wrote:
>
>> Hi Lars,
>>
>> this was something I had already noticed, and I changed the timeout in
>> the cluster configuration to 200 seconds. So the log I posted was the
>> result of the configuration below (200 sec). Is this still too small?
>>
>> $ crm configure show
>> ...
>> primitive stonith_sbd stonith:external/sbd \
>> op monitor interval="200" timeout="200" start-delay="200" \
>> params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"
>
> This is not what I meant. I meant to change the setting stonith-timeout,
> not the settings on the primitive ;-) In fact, monitoring sbd is quite
> unnecessary, and you actually don't need to specify sbd_device anymore,
> you can just do:
>
> primitive stonith_sbd stonith:external/sbd
>
> and leave it at this. But, back to your timeout! Run this:
>
> crm configure property stonith-timeout=240s
>
> (And yes, it needs to be over 10% higher than the msgwait timeout,
> because of how stonith-ng internally allocates the stonith-timeout value
> to various stages in the stonith process. Sorry about that, that's a
> pacemaker issue.)
>
> You will still see IO freeze for approx. 3 minutes until the fence
> completes. That's a side-effect of the sbd values you have configured,
> in particular watchdog and msgwait.

Hi Lars,

thanks a lot for finding the problem. The wrongly set timeout value was
really what was causing the trouble. Now it works. I lowered the timeout
values to avoid freezing the clustered filesystem for too long, and it
works fine.

There is one basic thing, however, that I do not understand: my setup
involves only a clustered filesystem. Why is a stonith resource needed
at all in this case, when it is exactly what causes the cluster
filesystem to freeze for a time that depends on the timeout values?

Basically, with a cluster fs it should not matter if a node dies. It's
the nature of a cluster fs that many nodes can access it. If one dies,
that should be of no consequence to the other nodes, which can still
access the filesystem.

So my question comes down to this: why do I have to fence a node (in
case it fails) in a cluster that has nothing but a cluster
filesystem? What could go wrong without fencing in this case?

Thanks a lot
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html,Fax: +49261287
1001312
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies?
On 2012-04-04T11:28:31, Rainer Krienke <krienke@uni-koblenz.de> wrote:

> There is one basic thing, however, that I do not understand: my setup
> involves only a clustered filesystem. Why is a stonith resource needed
> at all in this case, when it is exactly what causes the cluster
> filesystem to freeze for a time that depends on the timeout values?
>
> Basically, with a cluster fs it should not matter if a node dies. It's
> the nature of a cluster fs that many nodes can access it. If one dies,
> that should be of no consequence to the other nodes, which can still
> access the filesystem.

You're getting that exactly the wrong way around ;-)

In a cluster file system, concurrent access from multiple nodes is
expected. Hence, the file system needs to coordinate writes (but also
some reads), metadata updates, assigning new blocks, etc. across the
nodes to remain consistent.

If the two nodes can no longer communicate, they could potentially write
to the same block. Hence, if they can no longer communicate (network
failure or node down), the cluster needs to reach a consistent state
first, recover the file system portion accessed by the removed node,
etc.

> So my question comes down to this: why do I have to fence a node (in
> case it fails) in a cluster that has nothing but a cluster
> filesystem? What could go wrong without fencing in this case?

As long as you have reasonably up-to-date backups to recover from the
data corruption that will ensue, not all that much ;-)

A non-clustered fs would continue to operate (so no freeze), but you'd
still see a fence to make sure the node is truly down before a fail-over
was done.



Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
