Mailing List Archive

Failback problem with active/active cluster
Hello,

I set up a two-node cluster (active/active) to build an HTTP reverse
proxy/firewall. There is one VIP shared by both nodes and an Apache
instance running on each node.

Here is the configuration:

node lpa \
        attributes standby="off"
node lpb \
        attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.1.52.3" cidr_netmask="16" clusterip_hash="sourceip" \
        op monitor interval="30s"
primitive HttpProxy ocf:heartbeat:apache \
        params configfile="/etc/apache2/apache2.conf" \
        op monitor interval="1min"
clone HttpProxyClone HttpProxy
clone ProxyIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
colocation HttpProxy-with-ClusterIP inf: HttpProxyClone ProxyIP
order HttpProxyClone-after-ProxyIP inf: ProxyIP HttpProxyClone
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"


Everything works fine at the beginning:


Online: [ lpa lpb ]

 Clone Set: ProxyIP (unique)
     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started lpa
     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started lpb
 Clone Set: HttpProxyClone
     Started: [ lpa lpb ]


But after simulating an outage of one of the nodes with "crm node
standby" and a recovery with "crm node online", all resources stay on
the same node:


Online: [ lpa lpb ]

 Clone Set: ProxyIP (unique)
     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started lpa
     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started lpa
 Clone Set: HttpProxyClone
     Started: [ lpa ]
     Stopped: [ HttpProxy:1 ]


Can you tell me if something is wrong in my configuration?

crm_verify gives me the following output:

crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
interleave clone ProxyIP and HttpProxyClone because they do not support
the same number of resources per node
crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
interleave clone HttpProxyClone and ProxyIP because they do not support
the same number of resources per node


Many thanks,

Regards,

--
Charles KOPROWSKI
Re: Failback problem with active/active cluster
On Thu, Mar 10, 2011 at 1:50 PM, Charles KOPROWSKI <cko@audaxis.com> wrote:
> Hello,
>
> I set up a 2 nodes cluster (active/active) to build an http reverse
> proxy/firewall. There is one vip shared by both nodes and an apache instance
> running on each node.
>
> Here is the configuration :
>
> node lpa \
>        attributes standby="off"
> node lpb \
>        attributes standby="off"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>        params ip="10.1.52.3" cidr_netmask="16" clusterip_hash="sourceip" \
>        op monitor interval="30s"
> primitive HttpProxy ocf:heartbeat:apache \
>        params configfile="/etc/apache2/apache2.conf" \
>        op monitor interval="1min"
> clone HttpProxyClone HttpProxy
> clone ProxyIP ClusterIP \
>        meta globally-unique="true" clone-max="2" clone-node-max="2"
> colocation HttpProxy-with-ClusterIP inf: HttpProxyClone ProxyIP
> order HttpProxyClone-after-ProxyIP inf: ProxyIP HttpProxyClone
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore"
>
>
> Everything works fine at the beginning :
>
>
> Online: [ lpa lpb ]
>
>  Clone Set: ProxyIP (unique)
>     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started lpa
>     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started lpb
>  Clone Set: HttpProxyClone
>     Started: [ lpa lpb ]
>
>
> But after simulating an outage of one of the nodes with "crm node standby"
> and a recovery with "crm node online", all resources stay on the same node :
>
>
> Online: [ lpa lpb ]
>
>  Clone Set: ProxyIP (unique)
>     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started lpa
>     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started lpa
>  Clone Set: HttpProxyClone
>     Started: [ lpa ]
>     Stopped: [ HttpProxy:1 ]
>
>
> Can you tell me if something is wrong in my configuration ?

Essentially you have encountered a limitation in the allocation
algorithm for clones in 1.0.x.
The recently released 1.1.5 has the behavior you're looking for, but
the patch is far too invasive to consider back-porting to 1.0.

>
> crm_verify give me the following output :
>
> crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
> interleave clone ProxyIP and HttpProxyClone because they do not support the
> same number of resources per node
> crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
> interleave clone HttpProxyClone and ProxyIP because they do not support the
> same number of resources per node
>
>
> Many thanks,
>
> Regards,
>
> --
> Charles KOPROWSKI

Re: Failback problem with active/active cluster
On 11/03/2011 11:47, Andrew Beekhof wrote:
> On Thu, Mar 10, 2011 at 1:50 PM, Charles KOPROWSKI<cko@audaxis.com> wrote:
>> Hello,
>>
>> I set up a 2 nodes cluster (active/active) to build an http reverse
>> proxy/firewall. There is one vip shared by both nodes and an apache instance
>> running on each node.
>>
>> Here is the configuration :
>>
>> node lpa \
>> attributes standby="off"
>> node lpb \
>> attributes standby="off"
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>> params ip="10.1.52.3" cidr_netmask="16" clusterip_hash="sourceip" \
>> op monitor interval="30s"
>> primitive HttpProxy ocf:heartbeat:apache \
>> params configfile="/etc/apache2/apache2.conf" \
>> op monitor interval="1min"
>> clone HttpProxyClone HttpProxy
>> clone ProxyIP ClusterIP \
>> meta globally-unique="true" clone-max="2" clone-node-max="2"
>> colocation HttpProxy-with-ClusterIP inf: HttpProxyClone ProxyIP
>> order HttpProxyClone-after-ProxyIP inf: ProxyIP HttpProxyClone
>> property $id="cib-bootstrap-options" \
>> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>> cluster-infrastructure="openais" \
>> expected-quorum-votes="2" \
>> stonith-enabled="false" \
>> no-quorum-policy="ignore"
>>
>>
>> Everything works fine at the beginning :
>>
>>
>> Online: [ lpa lpb ]
>>
>> Clone Set: ProxyIP (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started lpa
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started lpb
>> Clone Set: HttpProxyClone
>> Started: [ lpa lpb ]
>>
>>
>> But after simulating an outage of one of the nodes with "crm node standby"
>> and a recovery with "crm node online", all resources stay on the same node :
>>
>>
>> Online: [ lpa lpb ]
>>
>> Clone Set: ProxyIP (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started lpa
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started lpa
>> Clone Set: HttpProxyClone
>> Started: [ lpa ]
>> Stopped: [ HttpProxy:1 ]
>>
>>
>> Can you tell me if something is wrong in my configuration ?
>
> Essentially you have encountered a limitation in the allocation
> algorithm for clones in 1.0.x
> The recently released 1.1.5 has the behavior you're looking for, but
> the patch is far too invasive to consider back-porting to 1.0.

Thanks Andrew,

Is there any possibility to manually move part of the ClusterIP
resource (for example ClusterIP:1) back to the other node? Or is it
just impossible with this version?

>>
>> crm_verify give me the following output :
>>
>> crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
>> interleave clone ProxyIP and HttpProxyClone because they do not support the
>> same number of resources per node
>> crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
>> interleave clone HttpProxyClone and ProxyIP because they do not support the
>> same number of resources per node
>>
>>
>> Many thanks,
>>
>> Regards,
>>
>> --
>> Charles KOPROWSKI


--
Charles KOPROWSKI
Systems and Network Administrator
Audaxis

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: Failback problem with active/active cluster
On Fri, Mar 11, 2011 at 2:19 PM, Charles KOPROWSKI <cko@audaxis.com> wrote:
> Le 11/03/2011 11:47, Andrew Beekhof a écrit :

>> Essentially you have encountered a limitation in the allocation
>> algorithm for clones in 1.0.x
>> The recently released 1.1.5 has the behavior you're looking for, but
>> the patch is far too invasive to consider back-porting to 1.0.
>
> Thanks Andrew,
>
> Is there any possibility to move back manualy a part of the ClusterIP
> resource (for example ClusterIP:1) to the other node ? Or is it just
> impossible with this version ?

I _think_ it's impossible - which is certainly not terribly useful behavior.

Re: Failback problem with active/active cluster
11.03.2011 16:27, Andrew Beekhof:
> On Fri, Mar 11, 2011 at 2:19 PM, Charles KOPROWSKI<cko@audaxis.com> wrote:
>
>> Is there any possibility to move back manualy a part of the ClusterIP
>> resource (for example ClusterIP:1) to the other node ? Or is it just
>> impossible with this version ?
> I _think_ its impossible - which is certainly not terribly useful behavior.
>

What if you set clone-node-max=1 for the resource?


--
Pavel Levshin

Re: Failback problem with active/active cluster
On Sat, Mar 12, 2011 at 9:53 AM, Pavel Levshin <pavel@levshin.spb.ru> wrote:
> 11.03.2011 16:27, Andrew Beekhof:
>>
>> On Fri, Mar 11, 2011 at 2:19 PM, Charles KOPROWSKI<cko@audaxis.com>
>>  wrote:
>>
>>> Is there any possibility to move back manualy a part of the ClusterIP
>>> resource (for example ClusterIP:1) to the other node ? Or is it just
>>> impossible with this version ?
>>
>> I _think_ its impossible - which is certainly not terribly useful
>> behavior.
>>
>
> What if you set clone-node-max=1 for the resource?

Good point. Temporarily setting that should allow them to move back.
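
Something along these lines should do it from the command line (an untested
sketch - "ProxyIP" is the clone name from Charles's configuration, and the
crm_resource options are the 1.0/1.1 ones):

   # drop to one instance per node, forcing the second copy off the node
   # that currently holds both
   crm_resource --meta -r ProxyIP -p clone-node-max -v 1
   # once ClusterIP:1 is back on the other node, allow two per node again
   crm_resource --meta -r ProxyIP -p clone-node-max -v 2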

Re: Failback problem with active/active cluster
On 14/03/2011 09:43, Andrew Beekhof wrote:
> On Sat, Mar 12, 2011 at 9:53 AM, Pavel Levshin<pavel@levshin.spb.ru> wrote:
>> 11.03.2011 16:27, Andrew Beekhof:
>>>
>>> On Fri, Mar 11, 2011 at 2:19 PM, Charles KOPROWSKI<cko@audaxis.com>
>>> wrote:
>>>
>>>> Is there any possibility to move back manualy a part of the ClusterIP
>>>> resource (for example ClusterIP:1) to the other node ? Or is it just
>>>> impossible with this version ?
>>>
>>> I _think_ its impossible - which is certainly not terribly useful
>>> behavior.
>>>
>>
>> What if you set clone-node-max=1 for the resource?
>
> Good point. Temporarily setting that should allow them to move back.

Thank you for the tip, Pavel!

Unfortunately the cluster is now in production in active/passive mode,
so I cannot play with it anymore.

Anyway, I'll give it a try on the next install.

--
Charles KOPROWSKI
Re: Failback problem with active/active cluster
Charles KOPROWSKI <cko@...> writes:

>
> Le 14/03/2011 09:43, Andrew Beekhof a écrit :
> > On Sat, Mar 12, 2011 at 9:53 AM, Pavel Levshin<pavel@...> wrote:
> >> 11.03.2011 16:27, Andrew Beekhof:
> >>>
> >>> On Fri, Mar 11, 2011 at 2:19 PM, Charles KOPROWSKI<cko@...>
> >>> wrote:
> >>>
> >>>> Is there any possibility to move back manualy a part of the ClusterIP
> >>>> resource (for example ClusterIP:1) to the other node ? Or is it just
> >>>> impossible with this version ?
> >>>
> >>> I _think_ its impossible - which is certainly not terribly useful
> >>> behavior.
> >>>
> >>
> >> What if you set clone-node-max=1 for the resource?
> >
> > Good point. Temporarily setting that should allow them to move back.

It works indeed:

1. crm configure
2. edit ClusterIP-clone
3. change clone-node-max from "2" to "1" and save
4. commit
5. undo the change, commit again, and the service is started on both hosts
(as before the node failure) - see the sketch below
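
For reference, here is the same procedure as a rough crm shell transcript.
It is only a sketch: "ClusterIP-clone" is the clone name in my setup (in
Charles's configuration the clone is called "ProxyIP"), and "2" is simply
the value it had before.

   crm configure
     edit ClusterIP-clone   # change clone-node-max="2" to "1" in the editor, save
     commit                 # the cluster now allows only one instance per node
     edit ClusterIP-clone   # change clone-node-max back to "2", save
     commit                 # original limit restored, one instance runs on each node
     quit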

@Pavel: good catch!

The other clones like apache2, mysql, ocfs2, o2cb and dlm come up again as
expected, without interference (Debian Squeeze).

Nice work, guys - I really love it!

Best regards

Robert
Re: Failback problem with active/active cluster
Hi,

On Fri, Mar 18, 2011 at 10:38:08PM +0000, Robert Schumann wrote:
> Charles KOPROWSKI <cko@...> writes:
>
> >
> > Le 14/03/2011 09:43, Andrew Beekhof a écrit :
> > > On Sat, Mar 12, 2011 at 9:53 AM, Pavel Levshin<pavel@...> wrote:
> > >> 11.03.2011 16:27, Andrew Beekhof:
> > >>>
> > >>> On Fri, Mar 11, 2011 at 2:19 PM, Charles KOPROWSKI<cko@...>
> > >>> wrote:
> > >>>
> > >>>> Is there any possibility to move back manualy a part of the ClusterIP
> > >>>> resource (for example ClusterIP:1) to the other node ? Or is it just
> > >>>> impossible with this version ?
> > >>>
> > >>> I _think_ its impossible - which is certainly not terribly useful
> > >>> behavior.
> > >>>
> > >>
> > >> What if you set clone-node-max=1 for the resource?
> > >
> > > Good point. Temporarily setting that should allow them to move back.
>
> It works indeed:
>
> 1. crm configure
> 2. edit ClusterIP-clone
> 3. change clone-node-max from "2" to "1" and save
> 4. commit

There's a more direct way to edit meta attributes:

crm resource meta
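
For example (just a sketch - "ProxyIP" is the clone name from the original
configuration, and the exact subcommand syntax may differ a bit between crm
shell versions):

   # show the current value
   crm resource meta ProxyIP show clone-node-max
   # temporarily cap the clone at one instance per node
   crm resource meta ProxyIP set clone-node-max 1
   # restore it once the instances have been redistributed
   crm resource meta ProxyIP set clone-node-max 2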

Thanks,

Dejan

> 5. undo the change, commit again and service is started on both hosts (like
> before fail of one node)
>
> @Pavel: good catch!
>
> The other clones like apache2, mysql, ocfs2, o2cb, dlm are coming up again as
> expected without interference (debian squeeze).
>
> Nice work, guys - I really love it!
>
> Best regards
>
> Robert

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker