Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uunet!dev!dgis!jkrueger
From: jkrueger@dgis.dtic.dla.mil (Jon)
Newsgroups: comp.databases
Subject: Re: Performance Data (was Re: Client/Server processes and implementations)
Message-ID: <684@dgis.dtic.dla.mil>
Date: 1 Dec 89 20:17:02 GMT
References: <7169@sybase.sybase.com> <13520006@hpisod2.HP.COM>
Organization: Defense Technical Information Center (DTIC), Alexandria VA
Lines: 112

dhepner@hpisod2.HP.COM (Dan Hepner) writes:

>From: jkrueger@dgis.dtic.dla.mil (Jon)
>>
>> >1. Is it your experience that more than 10% of the work is done by
>> >   the clients?
>>
>> Sometimes.  If it's only 10%, we may then assign 10 clients per server,
>> thus balancing the load.  Yes, the server load increases too, but not
>> proportionately; balance might be 12 or 15 clients per server.

>In the example, if one moved 10 clients taking 10% of a 100% used CPU,
>we would simplistically end up with the client CPU 10% used, and
>the server CPU still 90%.

Perhaps I'm not making myself clear.  That's 10% per client.  10% of the
work is done by the client, and this client serves a single user.  Each
additional concurrent user gets another client, which consumes another
10%, in this example.
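To make the arithmetic concrete, here's a back-of-envelope sketch.  The
numbers are invented for illustration only: 10% of a client machine per
concurrent user, as above, plus an assumed 7% of the server machine per
user.  Neither figure is a measurement of any real system.

    /* Toy model of the load split under discussion.  ASSUMED costs:
     * each concurrent user eats a fixed fraction of the client machine
     * and a fixed fraction of the server machine.
     */
    #include <stdio.h>

    int main(void)
    {
        double per_user_client = 0.10;  /* assumed client-side cost per user */
        double per_user_server = 0.07;  /* assumed server-side cost per user */
        int users;

        for (users = 1; users <= 14; users++)
            printf("%2d users: client machine %3.0f%%, server machine %3.0f%%\n",
                   users, 100.0 * users * per_user_client,
                   100.0 * users * per_user_server);
        return 0;
    }

With those made-up numbers the client box saturates at 10 users while
the server box doesn't saturate until about 14: that's the kind of
balance I mean.  Plug in your own measured per-user costs and the same
loop tells you which box hits the wall first.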
>Adding one more client, we would end up with
>a saturated system with 11 Clients on an 11% utilized client machine,
>while the server was now 99% used.  If this were so, it wouldn't
>seem either all that balanced, and probably an economically
>unjustifiable move.

All you're saying is that a two-process model doesn't scale well if
we're already bottlenecked on either process.  This is a tautology.

>100+% increase in hardware cost yielding a 10% increase in
>throughput.

Indeed, it's worse than that: the interconnects aren't free.  One
doesn't win by distributing inherently sequential problems that one
doesn't know how to decompose.  Again, a tautology.

>> >2. Is it your experience that remote communication costs don't end
>> >   up chewing into the savings attained by moving the clients
>> >   somewhere else?
>>
>> No, the lower bandwidth is more than offset by multiprocessing.

>Let's assume you have plenty of bandwidth, but not plenty of CPU
>cycles at the server.  Remote communication, especially reliable remote
>comm, being more expensive than local communication.

In exactly the same way that reading bytes off disks costs more cycles
than referencing memory, yes.  But compelling cases can be made for not
requiring databases to reside in main memory, no?

>The extreme of my
>concern would be illustrated if the remote communication costs at the
>server end exceeded the processing/terminal handling done by the client,
>in which case one would actually lose by adding a remote machine
>for the clients.

A valid concern.  Got any data?  Measured degradation in latencies?
Throughput?  I don't deny it can happen, just asking how often it does.

And again, you're simply saying that sometimes the costs of distributing
the load exceed the benefits.  How true: sometimes the problem is
intractable, or you don't know enough to decompose it, or your tools are
poor, or the implementation is poor.  Then you get the biggest
monoprocessor you can afford, indeed.  You've admitted you can't work
smarter, so you'd better work harder.

>> >>(and in the extreme (and not at all impractical) case, you run each
>> >> client and each server on its own machine).  This model is simple,
>> >> elegant, and fundamentally right.
>>
>> This isn't the extreme case.  Multiple processors can divide work
>> with better granularity than client and server processes.

>Maybe you can clarify.  The case in question was how frequently it would
>be practical to put each client and each server on its own machine, with
>the assertion that if the client/server workload split weren't near
>50-50, it wouldn't be practical.

The usual assumption is that each client can get its own machine, but
the server has to share a single machine.  This makes the server the
bottleneck, in general.  It's also a bad assumption: multithreaded
servers can use multiprocessors to scale up, distributed DBMSs can use
distributed hosts to execute queries, and parallel servers can apply
processors to each component of each query.  The first two animals
exist now.

>The points of confusion:
> 1) "Multiple processors" can be ambiguous as to remoteness, but given
>    the context I'll assume remoteness. (right?)

Wrong, as in the previous paragraph.

> 2) Granularity.  Are you postulating a flexible division of the work
>    between client and server?  A server which is flexibly divisible
>    over both machines?

Nope, a flexible approach to designing database engines.  Remember,
your query language can't tell the difference anyway.

>I think all of these questions are facets of the same underlying question:
>how much of the typical application can be done at the client?

Fair question, but needlessly special.  The general question is: how can
we divide up the work, what tools do we need, and how many of them exist
yet?
-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil
                    uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your color
bitmapped window system on a network is emulate an ASR33?