Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpl-opus!hpcc05!hpyhde4!hpycla!hpcuhc!dhepner
From: dhepner@hpcuhc.cup.hp.com (Dan Hepner)
Newsgroups: comp.databases
Subject: Re: "High Availability" Sun/Sybase systems.
Message-ID: <2060019@hpcuhc.cup.hp.com>
Date: 21 Jun 91 01:16:12 GMT
References: <1991Jun19.201044.814@st-andy.uucp>
Organization: Hewlett Packard, Cupertino
Lines: 50

From: sqr@st-andy.uucp
>Has anyone had any experience setting up a "High Availability" Sybase
>system on Suns similar to the VAX Sybase Companion Server?
>
>The System would be set up so that if the production server's machine
>went down there would be an automatic switching to a backup machine
>running the backup server. I think mirrored disks are also involved.
>Apparently the VAX software/hardware does all this.
>Ric

What I think you've seen is that such products, in order to work
well and be guaranteed a future, are typically sold and supported
by the hardware vendor. Have you asked Sun directly?
Hewlett-Packard sells such a product on our 800 Unix series; so
does Pyramid.

Things you might consider are:

1. (at the top of the list on purpose) How transparent is the
   failover to the application software? Nobody wants to develop
   special code to be able to exploit such a feature.
2. What happens at the user screen? Does the user have to recognize
   the failover and know to log in to a different machine?
3. What happens to NFS-mounted filesystems?
4. What about networking data cached at remote sites?
5. Does the vendor credibly claim that no single failure will make
   the system unavailable?
6. Some gotchas which are likely to be solved as part of a "real"
   product, but complicate the hell out of something cobbled together:
   a. Core dumps & savecore.
   b. Where it is possible at all, having two systems boot
      concurrently from the same disk is a far more likely
      procedural error than might be expected. At best both sides
      panic; at worst data is corrupted.
   c. What if the "failed" system doesn't have a clue and continues
      along modifying the disks?
   d. Naive mirrored disk implementations are vulnerable to a system
      crash that lands after one mirror has been written but before
      the second write completes. Copying an entire disk farm to
      bring the mirrors back in step is an effective solution, but
      may take a long time.

Dan Hepner
Not a statement of the Hewlett-Packard Company.
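
To make gotcha 6d concrete, here is a minimal sketch of one way to avoid recopying the whole disk farm after a crash: record which region is about to be written before touching either mirror, then resynchronize only the regions still marked after the crash. This write-intent (dirty region) log is my own illustration and is not something the post above describes or attributes to any vendor. The class, method names, and `REGION_SIZE` are all hypothetical, and the "disks" are just Python lists.

```python
# Hypothetical sketch of a two-way mirror with a write-intent log.
# Not any vendor's implementation; every name here is invented.
# Single-threaded on purpose: one write is in flight at a time.

REGION_SIZE = 4  # blocks per region (arbitrary value for the example)


class CrashBetweenWrites(Exception):
    """Simulates the system dying after the first mirror write."""


class MirroredDisk:
    def __init__(self, nblocks):
        self.primary = [0] * nblocks
        self.secondary = [0] * nblocks
        # Assumption: in a real system this set would live on stable
        # storage and be flushed before the data writes begin.
        self.dirty_regions = set()

    def write(self, block, value, crash_after_first=False):
        region = block // REGION_SIZE
        self.dirty_regions.add(region)      # 1. record intent
        self.primary[block] = value         # 2. write first mirror
        if crash_after_first:
            raise CrashBetweenWrites(block)
        self.secondary[block] = value       # 3. write second mirror
        self.dirty_regions.discard(region)  # 4. clear intent

    def recover(self):
        """Copy only the regions that were in flight at the crash.

        Returns the number of blocks copied.
        """
        copied = 0
        for region in sorted(self.dirty_regions):
            start = region * REGION_SIZE
            end = min(start + REGION_SIZE, len(self.primary))
            self.secondary[start:end] = self.primary[start:end]
            copied += end - start
        self.dirty_regions.clear()
        return copied
```

With a 16-block disk, a crash during a write to block 9 leaves only region 2 marked, so `recover()` copies 4 blocks rather than all 16. The sketch treats the primary as the authoritative copy during recovery; which side to trust is itself a design decision.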