Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Single user vs. shared
Message-ID: <1720@aber-cs.UUCP>
Date: 11 Apr 90 21:48:59 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 105

In article <1990Apr10.225542.13662@world.std.com> bzs@world.std.com (Barry Shein) writes:
  
  From: pcg@aber-cs.UUCP (Piercarlo Grandi)
  >In article <8840010@hpfcso.HP.COM> dgr@hpfcso.HP.COM (Dave Roberts) writes:
  >  In school we had a lab full of Sun 3/50s which were all diskless (via NFS)
  >  to a server.  There were about 50 machines on an ethernet which worked
  >
  >Note that 50 machines to a single server is *crazy*. I would not go over a
  >dozen; and even with multiple servers I think that 50+ hosts doing heavy
  >traffic on a single Ethernet requires some careful analysis.
  
  Gee, Piercarlo, do you ever work from facts forward rather than the
  other way around? He said the 50 workstations worked fine except
  during peak load (finals), what else is new? Every utility on earth is
  set up this way. So you say the set-up is crazy?  Why? Because it
  worked? Because it offends your intuitive sensibilities?

The setup is crazy because it collapses ungracefully under load. Almost
anything works well if it is used at a fraction of its nominal capacity; the
system engineer is the guy who makes things work even under load.

The problem with a 50 workstation Ethernet is that its knee is reached very
quickly as more workstations become significantly active. There are three
possible alternatives that do not guarantee a meltdown:

1) A single large 50-user machine with fast local discs, as it would have
neither wire contention nor network overheads.

2) 5 segments, each with 10 diskless workstations and a small server, would
not have wire contention, because we expect cross-segment transactions to be
very rare.

3) A wire with 50 diskful workstations would experience neither network
contention nor network overheads.

It is a damn interesting research problem to find a performance profile of
each of these solutions for various loads, and a cost profile, and compare
them. It is not an interesting research problem to discuss configurations
with built-in narrow bottlenecks.
  
  I thought compiling hasn't been disk intensive for years, it's CPU
  intensive.

Tell that to Borland! Their compilers are neither... :-). Or maybe :-(.

It depends on how inefficiently and stupidly the compiler is built. Based on
my impressions, I'd say that pcc derived or inspired compilers tend to be
disc traffic intensive, while those with global optimizers tend to be memory
intensive, and thus again usually disc traffic (paging!) intensive.

If you have infinite memory, either for caching disc blocks, or for avoiding
paging, then both types of compilers obviously tend to become CPU *bound*,
rather than intensive. Of course, if you have infinite resources, any
solution will do.

Yet compiles are often fairly "short" and dominated by IO instead,
especially in development environments where you don't optimize but do
generate large symbol tables.

  Does anyone have measurements?

Precious few, for the distributed case. For the local case we have more data,
from which some inferences, however haphazard, can be extrapolated: the
landmark paper on disc caching by J Smith, and a few others on the
performance characterization of Unix disc access. We also have some
interesting timings for network communications (the CACM one on efficient
RPC on Ethernet, even if old, the one on the galloping bits syndrome, the
Amoeba ones, etc...). All these papers are well known, I assume.

  That doesn't stop you from running with this lead and drawing conclusions
  based on it.  To be frank, I don't trust your intuitions. I'd rather see
  some data. Perhaps that's rude.

I'd like to see it as well. I know people are working on that. On the other
hand I think good arguments can be built out of known facts:

1) Ethernet has a well known problem (understatement of the decade) as soon
as average utilization gets over 30-50%.

2) The total conceivable bandwidth of an Ethernet is just over 1MB/sec, but
only when just two stations are using it, and if the receiving one can
accept full size back to back packets without overruns.

3) Each network transaction takes about 3-5ms. on your typical UNIX machine
(from kernel buffer to kernel buffer); it may take much more, depending on
various misdesigns, and on whether you are instead measuring program to
program times.

4) A diskless workstation being actively used generates about 10-20KB/sec of
network traffic, and about 10-20 packets/second.

5) Many Ethernet boards and their interface software cannot sustain *input*
rates anywhere near the theoretical maximum. In particular there is a limit
to the number of packets/sec. that can be read by many machines.

6) On average, if users are doing mostly editing, one user in 10 has an
active process (but then, why ever give them a workstation each?). If they
are mostly compiling this ratio worsens substantially.


I will let the interested readers draw their own conclusions based on back of
the envelope arithmetic everybody can do.
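
For what it is worth, one such calculation can be sketched as below, in
Python. It only multiplies out the figures in points 1-6. Several things in
it are assumptions and not measurements: the wire capacity is rounded down to
1000KB/sec, every packet is treated as one transaction handled serially by a
single server, and the "everybody busy" ratio of 1.0 is a guessed worst case,
since no figure is given for the compiling case. The function and constant
names are made up for illustration.

```python
# Back-of-the-envelope load estimate using the figures from points 1-6 above.
# Anything not stated in the post is marked as an assumption.

WIRE_KB_PER_SEC = 1000.0         # point 2: "just over 1MB/sec", rounded down
KNEE_UTILIZATION = (0.30, 0.50)  # point 1: trouble starts at 30-50% utilization
KB_PER_ACTIVE = (10.0, 20.0)     # point 4: KB/sec per active diskless station
PKTS_PER_ACTIVE = (10.0, 20.0)   # point 4: packets/sec per active station
MS_PER_TRANSACTION = (3.0, 5.0)  # point 3: kernel buffer to kernel buffer


def wire_utilization(active, kb_per_active):
    """Fraction of the wire's bandwidth consumed by `active` stations."""
    return active * kb_per_active / WIRE_KB_PER_SEC


def server_busy_fraction(active, pkts_per_active, ms_per_transaction):
    """Fraction of each second a single server spends on network transactions.

    Assumption: every packet is one transaction, handled serially.
    A value above 1.0 means the server cannot keep up.
    """
    return active * pkts_per_active * ms_per_transaction / 1000.0


def estimate(stations, active_ratio):
    """Return (low, high) estimates for wire and server load."""
    active = stations * active_ratio
    wire = (wire_utilization(active, KB_PER_ACTIVE[0]),
            wire_utilization(active, KB_PER_ACTIVE[1]))
    server = (server_busy_fraction(active, PKTS_PER_ACTIVE[0],
                                   MS_PER_TRANSACTION[0]),
              server_busy_fraction(active, PKTS_PER_ACTIVE[1],
                                   MS_PER_TRANSACTION[1]))
    return {"active": active, "wire": wire, "server": server}


if __name__ == "__main__":
    # active_ratio 0.1 is point 6 (mostly editing); 1.0 is an assumed worst
    # case (everybody compiling), since the post gives no figure for it.
    cases = [
        ("50 stations, one wire, editing", 50, 0.1),
        ("50 stations, one wire, all busy", 50, 1.0),
        ("10 stations per segment, all busy", 10, 1.0),
    ]
    for label, stations, ratio in cases:
        r = estimate(stations, ratio)
        # Compare the high-end wire estimate with the low end of the knee.
        over_knee = r["wire"][1] >= KNEE_UTILIZATION[0]
        print("%-34s wire %3.0f-%3.0f%%  server %3.0f-%3.0f%%  %s" % (
            label,
            100 * r["wire"][0], 100 * r["wire"][1],
            100 * r["server"][0], 100 * r["server"][1],
            "past the knee" if over_knee else "below the knee"))
```

Under these assumptions, 50 stations all busy on one wire would want 50-100%
of the wire and 150-500% of a single server, while 10 busy stations on their
own segment would want 10-20% of the wire and 30-100% of their server. Change
the assumptions and the numbers move, but that is the shape of the argument.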
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk