Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpl-opus!hpcc05!hpyhde4!hpycla!hpcuhc!dhepner
From: dhepner@hpcuhc.cup.hp.com (Dan Hepner)
Newsgroups: comp.databases
Subject: Re: "High Availability" Sun/Sybase systems.
Message-ID: <2060019@hpcuhc.cup.hp.com>
Date: 21 Jun 91 01:16:12 GMT
References: <1991Jun19.201044.814@st-andy.uucp>
Organization: Hewlett Packard, Cupertino
Lines: 50

From: sqr@st-andy.uucp

>Has anyone had any experience setting up a "High Availability" Sybase
>system on Suns similar to the VAX Sybase Companion Server?
>
>The System would be set up so that if the production server's machine
>went down there would be an automatic switching to a backup machine
>running the backup server. I think mirrored disks are also involved.
>Apparently the VAX software/hardware does all this.
>Ric

What I think you've seen is that such products, in order to work
well and be guaranteed a future, are typically sold and supported by 
the hardware vendor.

Have you asked Sun directly?

Hewlett-Packard sells such a product on our 800 Unix series; so does Pyramid.

Things you might consider are:

1. (at the top of the list on purpose)  How transparent is the failover
   to the application software?  Nobody wants to develop special code 
   to be able to exploit such a feature.

2. What happens at the user screen?  Does the user have to recognize
   the failover and know to login to a different machine?

3. What happens to NFS mounted filesystems?

4. What about network data cached at remote sites?

5. Does the vendor credibly claim that no single failure will make
   the system unavailable?

6. Some gotchas which are likely to be solved as part of a "real" product,
   but complicate the hell out of something cobbled together:
   a. Core dumps & savecore.
   b. Where the configuration makes it possible, getting two systems
      booting concurrently from the same disk is a far more likely
      procedural error than might be expected. At best both sides
      panic; at worst the data is corrupted.
   c. What if the "failed" system doesn't have a clue and continues
      along modifying the disks? 
   d. Naive mirrored disk implementations are vulnerable to a system
      crash after one mirror is written but before the second, which
      leaves the two copies different. Recopying the entire disk farm
      is an effective solution, but may take a long time.
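
To make 6d concrete, here is a toy sketch of the failure and of the
brute-force recovery. It is not how any vendor's product works; the
MirroredDisk class, its method names, and the in-memory "disks" are all
made up for illustration.

```python
class MirroredDisk:
    """Toy two-way mirror (hypothetical); each side is a list of blocks."""

    def __init__(self, nblocks):
        self.a = [0] * nblocks
        self.b = [0] * nblocks

    def write(self, block, value, crash_between=False):
        # Naive mirroring: write side A, then side B, with nothing
        # recording that a write is in flight.
        self.a[block] = value
        if crash_between:
            return  # simulated crash: side B never receives the write
        self.b[block] = value

    def consistent(self):
        return self.a == self.b

    def full_resync(self):
        # Recover by copying all of side A over side B. The cost grows
        # with the size of the disk, not with the number of writes that
        # were actually interrupted.
        copied = 0
        for i, value in enumerate(self.a):
            self.b[i] = value
            copied += 1
        return copied


disk = MirroredDisk(nblocks=1000)
disk.write(7, 42)                          # normal write, both sides updated
disk.write(8, 99, crash_between=True)      # crash after the first mirror only
in_sync_after_crash = disk.consistent()
blocks_copied = disk.full_resync()
in_sync_after_resync = disk.consistent()
```

One interrupted write forces all 1000 blocks to be copied, since after
the crash nothing says which block differs. That is the "may take a long
time" part when the 1000 blocks are a real disk farm.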

Dan Hepner
Not a statement of the Hewlett-Packard Company.