Mailing List Archive

Heartbeat IPAddr and ARP flushing
I am in the testing stage of my two-node HA cluster. I am running
heartbeat 2.1.3_3 and DRBD 8.0.8. My highly available resources are:

1 IP address
sshd (I have a secondary admin sshd process running on a different port)
a custom Java application

We are also running rsync over ssh as in rsync -av --rsh="ssh ..."

When a client is connected and rsyncing data I issue an hb_takeover
from the secondary node. Everything swaps over to the new machine
just fine. We rerun the client and we get a connection timeout
message. Then I run hb_takeover from the new secondary node (initial
primary) and again all resources swap over successfully. We try the
client again and it works.

We have a Watchguard firewall between the client and the cluster.
Behind the firewall I am able to ssh from the secondary node to the
primary node on the internal IP address that is a resource. I have
full connectivity between the machines on all IP addresses.

I feel this is an ARP cache issue on the firewall.

My question to the masses is this.

Does/Can heartbeat do any upstream ARP management at its router?
If not, how can one programmatically flush the ARP cache on a firewall
from another machine? Is this possible?

regards,

Doug



_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: Heartbeat IPAddr and ARP flushing
Hi Doug,

We had the same problem. Between the heartbeat cluster and the clients there is
a gateway which does not receive broadcasts, so it doesn't notice the
change when the virtual IPs move to another host. The only way we found to
solve this problem is to send an arping directly to this switch.

I have added a function notify_switches to the script
/usr/lib/ocf/resource.d/heartbeat/IPaddr:

notify_switches() {
    # -n with a quoted variable: the unquoted test fails when nic is empty
    if [ -n "$OCF_RESKEY_nic" ]
    then
        INTERFACE=$OCF_RESKEY_nic
        IP=$OCF_RESKEY_ip
        # notify switches about IP<=>MAC change
        # -f : quit on first reply
        # -q : be quiet
        # -c count : how many packets to send
        # -w timeout : how long to wait for a reply
        # -I device : which ethernet device to use (eth0)
        # -s source : source ip address
        # -U : Unsolicited ARP mode, update your neighbours
        # replace <switch ip or name> with the device that must be updated
        /sbin/arping -f -q -c 5 -w 5 -I "$INTERFACE" -s "$IP" -U <switch ip or name>
    fi
}

This function is called from the function ip_start (at that point the interface
name is known). This works very well for our cluster.

Kind regards,
Christof


Re: Re: Heartbeat IPAddr and ARP flushing
Hi,

On Fri, Mar 14, 2008 at 08:04:43AM +0000, Christof Wiltschek wrote:
> I have added a function notify_switches to the script
> /usr/lib/ocf/resource.d/heartbeat/IPaddr:

Doug: Has this helped you?

Perhaps then this should be added to IPaddr. We could add an
attribute to contain a list of network devices which need
explicit ARP cache updates. Does anybody have an opinion on this? I
recall that the issue of ARP caches comes up from time to time.
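
To make the proposal concrete, here is a rough sketch of what such an attribute could look like inside the agent. The attribute name OCF_RESKEY_arp_targets (a space-separated list) and the function name notify_arp_targets are hypothetical, as is the ARPING override; this is one possible shape, not an existing IPaddr feature. A failure for one device is logged and counted, but the remaining devices are still notified.

```shell
#!/bin/sh
# Sketch only: OCF_RESKEY_arp_targets is a hypothetical attribute name and
# ARPING a hypothetical override; neither exists in the stock IPaddr agent.
ARPING="${ARPING:-/sbin/arping}"

# Send unsolicited ARP to every device listed in OCF_RESKEY_arp_targets.
# The return status is the number of devices for which arping failed.
notify_arp_targets() {
    failed=0
    # Intentionally unquoted so the list splits on whitespace.
    for target in ${OCF_RESKEY_arp_targets:-}; do
        if ! "$ARPING" -f -q -c 5 -w 5 \
                -I "${OCF_RESKEY_nic:-}" -s "${OCF_RESKEY_ip:-}" -U "$target"; then
            echo "ARP cache update for $target failed" >&2
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

An empty or unset list makes the function a no-op, so existing configurations would behave as before.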

Cheers,

Dejan
