Mailing List Archive

I suppose introductions are in order now...
'allo 'gain.

So the list and project are pretty much dead 'eh? ;-)

I've got this funny urge to introduce myself. My name is David Storey.
Currently, I'm a senior in Computer Science at George Mason University
in Fairfax, Virginia. I worked for the admissions department there for
three years and gained most of my Linux/Unix experience while employed
there. I admit, I used to hate just about everything Unix related,
finding it outdated, unfriendly, arcane, and very ugly. It's taken a
while, but I now feel the empowerment surging through my arteries.
There's this taste in my mouth which becomes more bitter with each
passing day. I call it Microsoft, but they're inconsequential, and it's
not my life's goal to ruin a company, but rather to improve an ideal. So
now, I find myself advocating Linux by example. (practice what you
preach!) I run Linux on anything and everything I own or can get my
hands on, including laptops, PPro 200 SMP machines, DEC Alphas, and Sun
SPARCs. I don't consider myself an incredible programmer, but I love
to program on occasion, especially when I feel inspired. I get into
these bouts and churn out 4,200 lines of C at a time. (so it crashed a
couple times. CS440 was much better for me. My programs were more like
artwork in that class.) Now, I work from my house doing Internet
Development for a guy I know in D.C. It works out much better now, and
my grades reflect that.

Lately, I've been growing an interest in clustering. I suppose it began
with Linux HA last year, but it died off with the beginning of a
semester, a change in jobs and a relocation of a server. It got
resurrected again with issue 45 of the Linux Journal on Network
Clustering. It was further cultivated by the discovery that a
significant portion of the movie Titanic was rendered using Linux on 200
500MHz DEC Alphas (cover of LJ issue 46). I know that this has less
to do with High Availability and more with Computational matters, but,
like the classic Venn diagram, I feel that there's a definite overlap
between the two.

My interest in Linux-HA is more or less curiosity and proof-of-concept
rather than for the pure need of it. I feel that most of the people on
this list are just waiting for something to happen so that they can try
it out and use it rather than to start developing it. I feel that
Harald Milz did a great service starting the HOWTO and it's unfortunate
that he's no longer involved. (gotta find out what it was he was
smoking!) It's just a guess, but I think that most Linux'ers don't have
the kind of hardware that's REALLY needed for REAL HA applications. On
the flip side, one advantage to using Linux is that you pay nothing for
the software, so most of your investment can go into the hardware to
make it dynamite. (Unless, of course, you went with Linux because it was
cheap and your boss did "not want to pay a lot for that supercomputer!")

I now have two machines connected to the internet that I call a
cluster. I don't have the proper authentication stuff installed yet,
but I'm leaning towards NIS/NYS over Kerberos and PAM. (BTW, any
thoughts on this? I haven't received any recommendations _for_ or
_against_ anything, other than with Kerberos, where one has to recompile
every application one wants to use for authentication. I don't think I'm into
doing that quite yet, just because it would be a real pain. But I
digress.) Both machines share almost identical configurations. Both
have at least two ethernet cards, one which connects to the internet and
the second which connects to the "other" node via crossover cable. It's
not much, but it's a start. For testing purposes, I want to be able to
try this stuff out on all kinds of hardware with all kinds of
distributions. (perhaps not limiting ourselves to Linux only, but
again, that becomes a limitation.)

I don't have access to any expensive equipment designed for high
availability, but I am willing to start writing the cluster manager
daemon. I've been going over the HA HOWTO and like what I see quite a
bit. (then again, I've never gotten into anything else like it, so the
new stuff always seems pretty cool.) I found a very little book at
Borders that talked about HP HA-based systems. It looked like it could
get mildly interesting, but I wasn't interested in paying $35+ for
something that looked more like a pamphlet. If anyone knows of any
other resources that could help us in the project, I'd love to hear 'em.
I found a document recently on multicast programming, which was one of
the recipes recommended in the HOWTO for sending heartbeats between
systems. I'm also willing to set up a web site dedicated to HA related
stuff.

-ds
I suppose introductions are in order now...
I'm Jiva DeVoe, a software developer and a network engineer. I work
a lot with Microsoft products and a bit with Linux. I'd like to see an
alternative to Microsoft's Wolfpack come out of this. Something cheap,
and done right. A couple of notes:

Correct me if I'm wrong, but isn't high availability mainly Server B
taking over if Server A goes down? And if so, can't that already be
accomplished with IP Aliasing and a bit of hardware english? Could it
be that Linux-HA need be nothing more than a set of instructions and a
spec? Or am I smoking something I shouldn't be? It just looks to me
like Linux-HA is practically there already. I can elaborate if needed.

David E. Storey wrote:
>
> 'allo 'gain.
>
> So the list and project are pretty much dead 'eh? ;-)

-- snipped for brevity's sake --

--
Jiva DeVoe
MCSE
Devware Systems
mailto:jiva@devware.com
I suppose introductions are in order now...
> Correct me if I'm wrong, but isn't high availability mainly Server B
> taking over if Server A goes down? And if so, can't that already be
> accomplished with IP Aliasing and a bit of hardware english? Could it
> be that Linux-HA need be nothing more than a set of instructions and a
> spec? Or am I smoking something I shouldn't be? It just looks to me
> like Linux-HA is practically there already. I can elaborate if needed.

I think that's only a part of it, but you're right. That part could be
done fairly easily. You could probably do it with shell scripts using
ping! A better implementation would require a little bit of work, but I
think I'm up to it. The HOWTO recommends we use multicast UDP for
communication between the nodes. Each UDP message gets signed using public
key crypto (probably the tough part) to deter spoofing. I feel this is
very important for security. These messages would also carry other pieces
of info. Personally, I'd like to see a load average in there, in case
someone wants to do load balancing. (like me)
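
Just to make the heartbeat idea concrete, here's a rough, untested sketch of
the kind of sender I have in mind. The multicast group, port, and message
format below are placeholders I made up, and there's no signing yet; that
would come later.

/*
 * hbsend.c - rough sketch of a multicast heartbeat sender.
 * The group, port and message layout below are made up for
 * illustration; there is no signing or authentication yet.
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define HB_GROUP "224.0.1.200"   /* placeholder multicast group */
#define HB_PORT  7777            /* placeholder UDP port        */

int main(void)
{
    int s;
    struct sockaddr_in to;
    unsigned char ttl = 1;       /* keep heartbeats on the local net */
    char host[64], msg[256];
    float load;
    FILE *fp;

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) {
        perror("socket");
        exit(1);
    }
    setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = inet_addr(HB_GROUP);
    to.sin_port = htons(HB_PORT);

    gethostname(host, sizeof(host));

    for (;;) {
        load = 0.0;
        fp = fopen("/proc/loadavg", "r");   /* include load for balancing */
        if (fp) {
            fscanf(fp, "%f", &load);
            fclose(fp);
        }
        sprintf(msg, "HB %s load=%.2f", host, load);
        sendto(s, msg, strlen(msg), 0, (struct sockaddr *)&to, sizeof(to));
        sleep(2);                /* one heartbeat every two seconds */
    }
    return 0;
}

The receiving side would join the group with IP_ADD_MEMBERSHIP, verify the
signature once we have one, and keep track of who it has heard from lately.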

Here's an idea. Most of the HA implementations I've seen have a primary
and a backup. Why not load balance? Why let silicon sit idly? =) I was
thinking about also setting up the cluster manager to handle updates with
DNS per a certain recent RFC dealing with Dynamic Record Updates. (Does
anyone know if BIND v8.1.1 supports this?)

There are other thoughts I have, but I'm happy to get the ball rolling.
Glad the list is becoming active again.

-ds
I suppose introductions are in order now...
> Here's an idea. Most of the HA implementations I've seen have a primary
> and a backup. Why not load balance? Why let silicon sit idly? =) I was
> thinking about also setting up the cluster manager to handle updates with
> DNS per a certain recent RFC dealing with Dynamic Record Updates. (Does
> anyone know if BIND v8.1.1 supports this?)


BIND v8.1.1 does support it, but I haven't figured out an easy way to use it
from a shell script. Anybody know of a utility that'll do it? There's
probably some way to do it with one of the utils in the BIND distribution,
but the docs don't mention it.

--Bill
I suppose introductions are in order now...
Hi,

There appear to be several flavors of HA; the most basic form
is to share a "common" disk between two systems. The more elaborate
forms share not only the disk, but also up-to-date process information,
among other things. You are correct in the sense that Linux has most of
the resources available.

I wrote a series of scripts four years ago to implement the simplest form
between two HP's. Each system had two network cards and shared a
single dual-ported RAID box. I ran a heartbeat program; if the heartbeat
went missing, that hinted that something might be wrong and that it was
time to start a "failover" process.

The failover process would then change the second network controller's IP
address to be that of the failed box. (ARP caching at the router was not
allowed for these two networks; of course, this can be handled in several
other ways.) The next step was to take over the common disk; with a RAID
system it was quite easy: just issue a command to drop the failed system's
SCSI port, and then assign the disk(s) to the active system.

To do this with Linux is just as easy; the first step is to make sure
that you cross-connect your LAN cards. The primary system's first card
is connected to the secondary system's second card, and the primary system's
second card is connected to the second system's first card. Having achieved
this, it is a simple matter to change the IP addresses on either system. Then
update or advertise to the other systems on the appropriate segment so they
update their ARP cache for the failed system's IP address.
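
For what it's worth, the address change itself doesn't even need an external
tool; an untested fragment like the following (interface name and address
made up, and it needs root) does it with the SIOCSIFADDR ioctl. The ARP
advertisement is a separate step.

/*
 * ip_takeover.c - untested fragment: give the second card the failed
 * box's IP address via SIOCSIFADDR.  Interface name and address are
 * made up; the ARP advertisement is left to arp(8) or a gratuitous ARP.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct ifreq ifr;
    struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr;

    if (s < 0) {
        perror("socket");
        return 1;
    }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth1", IFNAMSIZ - 1);       /* the spare card    */
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = inet_addr("192.168.1.10");  /* failed box's IP   */

    if (ioctl(s, SIOCSIFADDR, &ifr) < 0) {
        perror("SIOCSIFADDR");
        return 1;
    }
    close(s);
    return 0;
}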

The "shared" or common drive implementation is a little more difficult to
do in Linux. In tests, I used a single external scsi drive with two
connectors, one for the primary system, and one for the secondary. The
primary's scsi adapter was set for an address of 7, the secondary's
was set to an address of 6, and the drive was set to an address of 1. As
to be expected there were some problems with the setup, scsi termination,
and scsi termination power source. However it did allow me to conduct a
proof of concept evaluation. (it worked)

Now it is much easier to implement, as many disk drives support active
termination. I just recently acquired a Quantum drive that does active
termination, and I am anxious to test it out. I intend to have the Quantum
provide termination power. The theory is, if either the primary or secondary
system fails, the Quantum drive will automatically sense the loss of
termination, and correctly supply it.

The only other major item is a heartbeat program. This can be sent across
either the network boards or the serial ports. There are a lot of pros
and cons for both. At this stage I would just continue to use the
networks (both of them) as the link. Also keep in mind that Linux
does not like it when a file system gets yanked out from underneath
it; however, unlike many of the commercial systems, you can recover from
this without rebooting.
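
To give an idea of how little the listening side needs, here is a rough,
untested sketch: a plain UDP listener on the cross-connected link that
assumes trouble after a stretch of silence and kicks off a takeover script.
The port, timeout, and script path are just placeholders.

/*
 * hbwatch.c - rough sketch of the listening side of a heartbeat.
 * If no heartbeat datagram arrives within the timeout, assume the
 * peer is gone and run a takeover script.  The port, timeout and
 * script path are placeholders.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>

#define HB_PORT    7777               /* whatever port the peer sends on */
#define HB_TIMEOUT 10                 /* seconds of silence = trouble    */

int main(void)
{
    int s;
    struct sockaddr_in me;
    char buf[256];
    fd_set rd;
    struct timeval tv;

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) {
        perror("socket");
        exit(1);
    }

    memset(&me, 0, sizeof(me));
    me.sin_family = AF_INET;
    me.sin_addr.s_addr = htonl(INADDR_ANY);
    me.sin_port = htons(HB_PORT);
    if (bind(s, (struct sockaddr *)&me, sizeof(me)) < 0) {
        perror("bind");
        exit(1);
    }

    for (;;) {
        FD_ZERO(&rd);
        FD_SET(s, &rd);
        tv.tv_sec = HB_TIMEOUT;
        tv.tv_usec = 0;

        if (select(s + 1, &rd, NULL, NULL, &tv) <= 0) {
            /* silence: assume the peer died and take over its
             * address/disk; the script name is hypothetical */
            system("/usr/local/sbin/ha-takeover");
            break;
        }
        recv(s, buf, sizeof(buf), 0);   /* heartbeat seen, all is well */
    }
    return 0;
}

The takeover script would then do the address change, the ARP update, and
the disk reassignment described above.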

I believe that we should approach this in stages: first, get a common
setup that works (the one I used might not work for everyone). This
would then allow us to develop a robust heartbeat/failover detection
scheme. Once the failover software is complete, we could then focus
on a more elaborate hardware/software setup (like a logical volume
manager).

Andy Thomson
Leesburg, Virginia USA
andy.thomson@worldnet.att.net

>I'm Jiva DeVoe, a software developer and a network engineer. I work
>a lot with Microsoft products and a bit with Linux. I'd like to see an
>alternative to Microsoft's Wolfpack come out of this. Something cheap,
>and done right. A couple of notes:
>
>Correct me if I'm wrong, but isn't high availability mainly Server B
>taking over if Server A goes down? And if so, can't that already be
>accomplished with IP Aliasing and a bit of hardware english? Could it
>be that Linux-HA need be nothing more than a set of instructions and a
>spec? Or am I smoking something I shouldn't be? It just looks to me
>like Linux-HA is practically there already. I can elaborate if needed.
>
>David E. Storey wrote:
>>
>> 'allo 'gain.
>>
>> So the list and project are pretty much dead 'eh? ;-)
>
>-- snipped for brevity's sake --
>
>--
>Jiva DeVoe
>MCSE
>Devware Systems
>mailto:jiva@devware.com
>
I suppose introductions are in order now...
On Tue, 27 Jan 1998, Andy Thomson wrote:
> The "shared" or common drive implementation is a little more difficult to
> do in Linux. In tests, I used a single external scsi drive with two
> connectors, one for the primary system, and one for the secondary. The
> primary's scsi adapter was set for an address of 7, the secondary's
> was set to an address of 6, and the drive was set to an address of 1. As
> to be expected there were some problems with the setup, scsi termination,
> and scsi termination power source. However it did allow me to conduct a
> proof of concept evaluation. (it worked)
>
> Now it is much easier to implement, as many disk drives support active
> termination. I just recently acquired a Quantum drive that does active
> termination, and I am anxious to test it out. I intend to have the Quantum
> provide termination power. The theory is, if either the primary or secondary
> system fails, the Quantum drive will automatically sense the loss of
> termination, and correctly supply it.

Wouldn't external termination (rather than the system SCSI controllers
providing termination internally) avoid this problem altogether?

-Andy

Global Auctions
http://www.globalauctions.com
I suppose introductions are in order now...
Here's my interest in all of this...

I ran a network of about 20 Linux machines with 2,000 users on it with
almost no budget. The problems we had weren't only that we had slow
hardware, but also that we had flakey hardware.

The big challenge for me is to create some sort of distributed NFS server,
which would let a cluster of machines of questionable quality service the
NFS requests.

Supposing you had X GB of data that needed to be stored (things like home
directories and incoming mail), I'd like to be able to spread little disks
across the cluster of a dozen machines, for a total of something like 2 X
GB. Not too long ago I came across a system which would let you store
something like 10 diskettes of data on 16 disks. If 4 of them died, you'd
still be able to recover the data from the remaining 12. So, clustered
disks. You could just slap four 100MB disks in 12 486's and have 4.8GB of
total disk space, with maybe 3GB of that being reliable.
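
To illustrate the principle with something much simpler than the 4-out-of-16
scheme, here's a tiny, untested parity demo: one extra block of XOR parity
lets you rebuild any single lost block, RAID-5 style. The real codes that
survive four losses out of sixteen need more parity and cleverer math, but
the space-for-redundancy trade is the same idea.

/*
 * parity_demo.c - untested demo of the simplest redundancy scheme:
 * one XOR parity block rebuilds any single lost data block.
 */
#include <stdio.h>
#include <string.h>

#define NBLK  4      /* data blocks (think: disks)     */
#define BSIZE 8      /* bytes per block, tiny for demo */

int main(void)
{
    unsigned char data[NBLK][BSIZE];
    unsigned char parity[BSIZE];
    unsigned char rebuilt[BSIZE];
    int i, j, lost = 2;          /* pretend disk 2 died */

    /* make up some data */
    for (i = 0; i < NBLK; i++)
        for (j = 0; j < BSIZE; j++)
            data[i][j] = (unsigned char)(i * 16 + j);

    /* parity = XOR of all data blocks */
    memset(parity, 0, BSIZE);
    for (i = 0; i < NBLK; i++)
        for (j = 0; j < BSIZE; j++)
            parity[j] ^= data[i][j];

    /* rebuild the lost block from the survivors plus parity */
    memcpy(rebuilt, parity, BSIZE);
    for (i = 0; i < NBLK; i++)
        if (i != lost)
            for (j = 0; j < BSIZE; j++)
                rebuilt[j] ^= data[i][j];

    printf("lost block recovered: %s\n",
           memcmp(rebuilt, data[lost], BSIZE) == 0 ? "yes" : "no");
    return 0;
}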

Somehow magically you'd be able to handle all your replication in
realtime, so that in theory a user could also log onto any two machines in
the cluster and have an active filesystem. I've done some reading into
the CODA fs, and that looks neat, but it requires strict checking in and
out for file locking, and the manual mechanism for recreating broken files
looks inconvenient.

Now, to add to all of this... I'd like to do a shared NFS server for other
clients, which would be able to do redundancy for a) performance, and b)
stability. This has been brought up in the group before.

Here's the way that I'd worked it out; the servers magically have cohesive
filesystems among them so that any disk caching is synchronized. I have
no idea how to do this. They're on the same wire, and they do a
round-robin style arbitration to determine which host handles the UDP
requests. They all have heartbeats, so they know when one machine has
died. The IP address used by the NFS clients is one that through some ARP
magic is mapped to the ethernet broadcast address so all the NFS clients
pick up on it.

The flaw in all of this is that it assumes that the two hosts are on the
same network. That would mean that both hosts are using the same router,
which is unhealthy. The best is if this could be done with multicast.
But then I don't know how you'd do the round robin; that is, how would the
servers know that another machine had had its turn?

And I'm at the point of thinking that updating all the siblings
dynamically is pretty much going to kill any performance increases, since
you'd have to have virtually no local caching.

And I suppose somehow you could extend this further to other tasks, like
distributed SMB file sharing (where the actual TCP stack would have to be
shared among multiple machines).

All these things sound like pure torture, and I'm not sure there are any
solutions to them.

- Alex

--
Alex deVries Run Linux on everything,
run everything on Linux.
I suppose introductions are in order now...
> The failover process would then change the second network controller's IP
> address to be that of the failed box. (ARP caching at the router was not
> allowed for these two networks; of course, this can be handled in several
> other ways.) The next step was to take over the common disk; with a RAID
> system it was quite easy: just issue a command to drop the failed system's
> SCSI port, and then assign the disk(s) to the active system.

Maybe I'm paranoid (not that my name is Bill), but there is something
about sharing disks between HA machines that makes me restless: if
one of the machines fails due to a hardware breakdown in the disk
controller(s), the other machine(s) could get hung too. I remember a
broken parallel port (!) hanging a server with spurious errors either
during startup or during operation. Maybe from an engineer's
standpoint the chances of this happening are comparably low, so how
do the others on this list assess this? Which concepts for HA are
available that don't need hardware sharing (like disks), but instead
share information at a higher level of abstraction?

A side note: I'm just one of the "fence audience" interested in
concepts and implementations of HA, because it is a technology
waiting to be awakened (or partly awake already) and used in industrial
projects. Here at the Chair of Process Control Engineering we're
working in heterogeneous environments using various Unices, OpenVMS
(on Alpha hardware) and Windows NT/95. Interestingly enough, most of
our information servers (based on our own freely available protocol)
are running on Unix systems. When I started work, the main workhorses
were HP-UX systems, but my boss has accepted Linux as a full-blown
server system: I think the main reason is that we're running a
function block system server on a 486/33MHz laptop with 8MB of RAM;
try that with WindowsXX! It would be very interesting to implement
HA in such systems, as they're running 24 hours a day, 366 days a
year in production.

Harald
Harald Albrecht
Chair of Process Control Engineering
Aachen University of Technology
Turmstrasse 46, D-52064 Aachen, Germany
Tel.: +49 241 80-7703, Fax: +49 241 8888-238
email: harald@plt.rwth-aachen.de
I suppose introductions are in order now...
On Tue, 27 Jan 1998, T's Mailing Lists wrote:
> 3) IP fallover.

Oh, well you can just use NT for that. I hear it falls over all the time! :-)

Sorry - that was just toooo good to let go...

-Andy

Global Auctions
http://www.globalauctions.com
I suppose introductions are in order now...
> And on top >there still are bugs and DoS< in linux. So sometimes machines
> just >hang<.

There are in every OS. It's pretty close to right, but your network wires
themselves are even subject to DoS attacks like smurf. It doesn't matter
how reliable your OS is; HA has big value. Even if your OS is perfect, someone
will still trip over the power lead.

> > Here's the way that I'd worked it out; the servers magically have cohesive
> > filesystems among them so that any disk caching is synchronized. I have
> > no idea how to do this.

Make the client commit the write to all servers before returning. Reads
you can share around. Coping with coherency loss requires major brainwork
though - that's not a minor chunk of the CODA fs, for example.

> can do that! The host running rinetd becomes a single point of failure, though,
> and it has to be made sure that this one is redundant too.

You make the two hosts running rinetd sit one on each of your two incoming
internet feeds, doing the route adverts - if you lose the box or you lose
the link, you get all your traffic on the other box (and vice versa).

Alan
I suppose introductions are in order now...
> > > Here's the way that I'd worked it out; the servers magically have cohesive
> > > filesystems among them so that any disk caching is synchronized. I have
> > > no idea how to do this.

On Tue, 27 Jan 1998, Alan Cox wrote:

> Make the client commit the write to all servers before returning. Reads
> you can share around. Coping with coherency loss requires major
> brainwork though - that's not a minor chunk of the CODA fs, for example.

Just an idea. If you could commit to some constant number of servers
instead of all, you would probably gain considerable performance since you
wouldn't have to wait for the slow ones. You could throw in slow servers
for extra redundancy without hurting performance.
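
Something like this is what I'm thinking of: an untested sketch that fires
the write at every replica but returns as soon as k of them have
acknowledged it. The one-byte 'A' ack and the fd array are made up for
illustration.

/*
 * quorum_write.c - untested sketch: send a write to n replica
 * sockets and return once k of them have acknowledged it.  The
 * wire format (raw buffer out, single 'A' byte back) is made up
 * purely for illustration.
 */
#include <poll.h>
#include <unistd.h>
#include <sys/types.h>

#define MAX_REPLICAS 64

int quorum_write(int *fds, int n, int k, const void *buf, size_t len)
{
    struct pollfd pfd[MAX_REPLICAS];
    int acks = 0, i;

    if (n > MAX_REPLICAS || k > n || k < 1)
        return -1;

    /* Fire the write at every replica... */
    for (i = 0; i < n; i++) {
        if (write(fds[i], buf, len) != (ssize_t)len)
            return -1;
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
    }

    /* ...then wait for the first k acknowledgements.  The slow
     * replicas keep working in the background; we just don't
     * wait for them here. */
    while (acks < k) {
        if (poll(pfd, n, 5000) <= 0)     /* 5s of silence: give up */
            return -1;
        for (i = 0; i < n && acks < k; i++) {
            if (pfd[i].revents & POLLIN) {
                char c;
                if (read(pfd[i].fd, &c, 1) == 1 && c == 'A')
                    acks++;
                pfd[i].events = 0;       /* counted; ignore from now on */
            }
        }
    }
    return 0;
}

The catch, of course, is that the k replicas you wrote to and the ones
somebody later reads from might not overlap, which is where the coherency
brainwork Alan mentioned comes in.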

astor

--
Alexander Kjeldaas, Guardian Networks AS, Trondheim, Norway
http://www.guardian.no/
I suppose introductions are in order now...
On Tue, 27 Jan 1998, Alan Cox wrote:
> There are in every OS. It's pretty close to right, but your network wires
> themselves are even subject to DoS attacks like smurf. It doesn't matter
> how reliable your OS is; HA has big value. Even if your OS is perfect, someone
> will still trip over the power lead.

Agreed. And lots of things can go wrong: power failures, year 2000
problems, hardware blowouts, floods, nuclear war...

> > > Here's the way that I'd worked it out; the servers magically have cohesive
> > > filesystems among them so that any disk caching is synchronized. I have
> > > no idea how to do this.
> Make the client commit the write to all servers before returning. Reads
> you can share around. Coping with coherency loss requires major brainwork
> though - that's not a minor chunk of the CODA fs, for example.

The problem with getting the client to commit everything to the servers is
that it takes a huge performance hit. Nobody wants their file changes
to take a couple of seconds to flush all the local cache buffers.

The one thing that I don't like about coda is that the file writes to the
server are only done on a close(), which means that you can easily have a
reader on another client that's not reading the most recent version.

I know, I'm asking for the world. I want multiple servers all nicely
synchronized and perfectly redundant. I want it now, and I want it free.

The solution to this problem is to figure out what you want to trade off:
synchronization, performance or redundancy.

- Alex
I suppose introductions are in order now...
On Tue, 27 Jan 1998, Alex deVries wrote:

> > > > Here's the way that I'd worked it out; the servers magically have cohesive
> > > > filesystems among them so that any disk caching is synchronized. I have
> > > > no idea how to do this.
> > Make the client commit the write to all servers before returning. Reads
> > you can share around. Coping with coherency loss requires major brainwork
> > though - that's not a minor chunk of the CODA fs, for example.
>
> The problem with getting the client to commit everything to the servers is
> that it takes a huge performance hit. Nobody wants their file changes
> to take a couple of seconds to flush all the local cache buffers.
>
> The one thing that I don't like about coda is that the file writes to the
> server are only done on a close(), which means that you can easily have a
> reader on another client that's not reading the most recent version.
>
> I know, I'm asking for the world. I want multiple servers all nicely
> synchronized and perfectly redundant. I want it now, and I want it free.
>
> The solution to this problem is to figure out what you want to trade off:
> synchronization, performance or redundancy.

Well, I don't really think you have to trade off. Let's say, for example, you
have a LAN and set up a second 100Mbit LAN just for synchronisation. How big
will the delays be? Not really relevant, I think, compared to other stuff
like Win95 client speed, slow internet connections, etc. Again, I'm
talking about "real world" applications and not supercomputer clusters
serving a mission-critical nuclear station.

And although I haven't had a look at CODA, I think it's not that
stupid to wait longer than necessary: set up >one< backup box connected
through a >fast< connection, and have it synchronized.

*
t

"This Perl language is wonderfull. Where can I get it - from Microsoft?"
--------------------------------------------------------------------------------
Tomas Pospisek's mailing-lists mailbox
www.SPIN.ch - Internet Services in Graubuenden/Switzerland
--------------------------------------------------------------------------------