Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!sdcsvax!darrell
From: darrell@sdcsvax.UCSD.EDU (Darrell Long)
Newsgroups: comp.os.research
Subject: How do you tell if a remote site is alive?
Message-ID: <3290@sdcsvax.UCSD.EDU>
Date: Tue, 9-Jun-87 02:35:08 EDT
Article-I.D.: sdcsvax.3290
Posted: Tue Jun  9 02:35:08 1987
Date-Received: Thu, 11-Jun-87 06:34:08 EDT
Organization: University of California, San Diego
Lines: 23
Keywords: networks, delay, distributed systems
Approved: mod-os@sdcsvax.uucp

Here's a question for you: in a distributed system, how do you tell whether a
remote site is alive?  A time-out could be used, but it's not reliable (a slow
site looks just like a dead one) and it's also very slow.  I can think of many
approximate solutions, but reliability is important.
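For concreteness, the time-out approach amounts to a heartbeat detector: each
site periodically announces itself, and a site is suspected dead once nothing
has arrived from it within a fixed interval.  Below is a minimal sketch in
Python, assuming periodic heartbeats.  The class name, the site name and the
5-second timeout are hypothetical, and the current time is passed in explicitly
so the behaviour is easy to follow (a real system would read a monotonic clock
such as time.monotonic()).  Note that it can only report that a site is
suspected, never that it is dead, which is the reliability problem above.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector (illustrative sketch).

    A site is only *suspected* dead once no heartbeat has arrived for more
    than `timeout` seconds; the detector cannot tell a crashed site from a
    slow one, or from one whose heartbeats were lost in the network.
    """
    timeout: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, site: str, now: float) -> None:
        # Record the local time at which a heartbeat from `site` arrived.
        self.last_seen[site] = now

    def suspected(self, site: str, now: float) -> bool:
        # A site we have never heard from is treated as suspected.
        last = self.last_seen.get(site)
        return last is None or now - last > self.timeout


if __name__ == "__main__":
    d = HeartbeatDetector(timeout=5.0)
    d.heartbeat("beowulf", now=0.0)
    print(d.suspected("beowulf", now=3.0))  # False: heard from it recently
    print(d.suspected("beowulf", now=6.0))  # True, though it may only be slow
```

Shrinking the timeout detects a crash sooner but makes it more likely that a
site which is merely slow, or behind a congested link, is wrongly suspected;
lengthening it does the reverse.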

When constructing a distributed system, the network is the slowest component
and the bottleneck.  From what I've read, most folks just assume that there is a
way to tell if a remote site is dead.  But this information is very important to
many algorithms.

How is it done in practice?  For us university-types, time-out is the usual
approximation to a solution since we're usually just out to prove a concept
and not to build a product.

How do the folks in industry do it?  Performance is critical there, unlike in a
university prototype.

DL
-- 
Darrell Long
Department of Computer Science & Engineering, UC San Diego, La Jolla CA 92093
ARPA: Darrell@Beowulf.UCSD.EDU  UUCP: darrell@sdcsvax.uucp
Operating Systems submissions to: mod-os@sdcsvax.uucp