Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!unmvax!ncar!tank!mimsy!haven!vrdxhq!daitc!daitc.daitc.mil
From: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Newsgroups: comp.databases
Subject: Re: Single Server Bottlenecks (was RE:ORACLE REL 6, INGRES REL 6)
Message-ID: <523@daitc.daitc.mil>
Date: 17 May 89 00:19:45 GMT
References: <3241@tank.uchicago.edu>
Sender: jkrueger@daitc.daitc.mil
Reply-To: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Distribution: usa
Organization: DTIC Special Projects Office (DTIC-SPO), Alexandria VA
Lines: 203
In-reply-to: cs_bob@gsbacd.uchicago.edu

In article <3241@tank.uchicago.edu>, cs_bob@gsbacd (R. Kohout?) writes:
>My favorite of Mr. Krueger's suggestions:
>
>>	If you want to support multiple fully runnable (COMputable)
>>	jobs without interference, you need a multiprocessor.  Buy one.
>>	Configure your servers as you find optimal for best throughput
>>	and fair for INGRES versus non-INGRES applications.
>> 
>Got that? If you want your Ingres 6.1 performance to equal your Ingres 5.0
>performance, Jonathan Krueger recommends that you buy yourself a
>multiprocessor.

I think your criticisms would have something worthwhile to contribute
if you read what I wrote.  Material quoted above doesn't lead me to
believe that you have.

At no time, in the part you cite or at any other point, have I said
anything about INGRES 6.x performance, relative to 5.x performance,
some absolute standard, or other vendor product.  In particular,
having no data on relative performance, I have no opinions on it, have
not expressed any, and in fact, am not overly concerned with the
topic.  I take it that you are, and you're unhappy with 6.x
performance.  That's fine.  INGRES 6.x performance issues are
appropriate to this group.  But they're distinct from the
multiprocessor issues.  Let me be boringly clear: at no time did I say
that you need, want, or should get a multiprocessor to run 6.x, nor do
I believe this, for performance reasons or anything else.

>I think that you're missing the point. My posting was an attempt to point
>out that single server bottlenecks can and do exist. I'm not attacking Ingres,
>but the fact remains that under VMS, with its retarded scheduling strategy,
>the Ingres 6.1 server can become a true bottleneck.

Of course they can, of course it can.  Again, if you want to be
helpful, now that you've explored the nature of the bottleneck, could
you give us estimates of its extent?  How often, how bad, how
pathological?

>If any 5.0 users want to determine whether or not this could happen to
>them, I suggest the following. 
>
>a) when you suspect the database activity is heavy, run MONITOR PROCESS/TOPCPU
>and look for the ING_BACK_* processes. You should see the busiest backends,
>and if you sum the percentage of CPU they're getting, you should get a
>rough estimate of the total CPU being provided the backends.
>
>b) run MONITOR STATE/AVE for a typical working day to get an idea
>of the typical CPU load (i.e. the average number of processes in the COM
>state throughout the day.)
>
>If we call the result of a) CPU_USED and remember that it is a percentage
>strictly between 0 and 1, and if we call the result of b) CPU_LOAD then
>IF CPU_USED > 1 / CPU_LOAD YOU WILL PROBABLY EXPERIENCE A BOTTLENECK 
>RUNNING A SINGLE INGRES 6.1 SERVER. 

Well, your units aren't comparable, and that provokes some doubt about
the validity of conclusions drawn from the metric.  Specifically, what
you call CPU_USED, or INGRES processor share, has units

	sum of processor time used by INGRES processes
	----------------------------------------------
			clock time

over an undefined sampling period (and recall that MONITOR returns
snapshots, not sums over a time window: the INTERVAL parameter merely
sets sampling resolution, not reporting or updating resolution.  Over
short time windows this can add significant error when the underlying
unit is continuous, as time is).  The time units cancel, yielding a
ratio; although not a pure unit, it's a time share, as in timesharing.

To be boringly clear, this isn't what you said, it's what I assume you
meant.  To "sum the percentages of cpu they're getting" is to arrive
at a meaningless number; I assume you meant instead to average them.
This can be measured and expressed in useful ways, such as deriving it
from cpu time over clock time as shown.
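
To illustrate the snapshot point, here is a small Python sketch under
made-up assumptions: the INGRES backends are lumped into one hypothetical
busy pattern (on the processor 3 ticks of every 10), and the tick counts
are arbitrary.  It contrasts processor time over clock time with an
estimate built only from instantaneous samples.  Over a short window the
samples can land badly and miss by a wide margin; over a long window,
with an interval that doesn't line up with the pattern, they come close.

```python
# Hypothetical workload: the INGRES backends, taken together, hold the
# processor for the first 3 ticks of every 10-tick cycle (true share 0.3).
def is_busy(tick: int) -> bool:
    return tick % 10 < 3


def share_from_cpu_time(window: int) -> float:
    """Processor time used over clock time: every busy tick is counted."""
    cpu_ticks = sum(1 for t in range(window) if is_busy(t))
    return cpu_ticks / window


def share_from_snapshots(window: int, interval: int) -> float:
    """Snapshot estimate: only the instants at which a sample is taken count."""
    samples = range(0, window, interval)
    busy = sum(1 for t in samples if is_busy(t))
    return busy / len(samples)


print(share_from_cpu_time(20))        # 0.3
print(share_from_snapshots(20, 5))    # 0.5: four samples, two land on busy ticks
print(share_from_snapshots(1000, 7))  # about 0.3007 (43 of 143 samples)
```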

Now, what you call CPU_LOAD has units

	sum of number of processes in COM state
	---------------------------------------
	    	number of samples

over a sampling period, in your example a day.  This is long enough
that the snapshots collected by MONITOR should approach load averages,
so we can neglect quantization effects: the underlying unit is
discrete and we collect many samples.  The numbers cancel, yielding a
ratio which is a pure unit, the average number of COM processes over
the time measured.
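
The load side is just a mean of discrete counts, one per snapshot.  A
minimal Python sketch, with hypothetical sample values standing in for a
day of MONITOR snapshots (the function name is illustrative, not part of
any VMS tool):

```python
def average_com_count(com_counts: list[int]) -> float:
    """CPU_LOAD: mean number of processes found in COM state per snapshot."""
    if not com_counts:
        raise ValueError("need at least one snapshot")
    return sum(com_counts) / len(com_counts)


# Hypothetical COM-state counts from snapshots taken at a fixed interval.
day_samples = [0, 1, 3, 2, 2, 4, 1, 3]
print(average_com_count(day_samples))  # 2.0
```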

So the two numbers CPU_USED and CPU_LOAD aren't just different
measurements, they're not in comparable units.  Neither one can be
expressed in terms of the other.  Worse, experience shows that real
measurements of the two are not well related; that is, neither is very
predictive of the other.  Each varies inversely with the other, but
not in a well-behaved manner.  They somewhat complement each other for
performance analysis, but not in the way you suggest:

	CPU_USED > 1 / CPU_LOAD

which may be re-expressed as

	CPU_USED * CPU_LOAD > 1

which, following my notation, means

	INGRES processor share * load average > 1

Substituting in pure units, and specifying a common time window for
data collection, this reads

	processor time used by INGRES	  number of COM processes
	-----------------------------  *  -----------------------  > 1
		total clock time	     number of samples
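
A minimal Python sketch of the proposed test in the product form above,
with the two inputs taken as given; the function names and sample inputs
are hypothetical.  It also makes the threshold explicit: the test fires
once the INGRES share exceeds 1 / CPU_LOAD.  Note that only the two
averages reach it; nothing about how i/o and computation overlap, or
how priorities are set, does.

```python
def predicts_bottleneck(cpu_used: float, cpu_load: float) -> bool:
    """The proposed test, CPU_USED > 1 / CPU_LOAD, in product form.

    The two forms agree whenever cpu_load > 0.
    """
    return cpu_used * cpu_load > 1


def threshold_share(cpu_load: float) -> float:
    """INGRES processor share above which the test fires."""
    return 1 / cpu_load


# Hypothetical inputs: an average of 2 COM processes.
print(threshold_share(2.0))            # 0.5
print(predicts_bottleneck(0.6, 2.0))   # True
print(predicts_bottleneck(0.4, 2.0))   # False
```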


Take again the simple case of one fully computable INGRES process and
one fully computable non-INGRES process, at equal priorities in a
round-robin scheduler.  By this metric, "you will probably experience
a bottleneck running a single INGRES 6.1 server."  In fact this makes
sense; you probably will, although "probably" and "bottleneck" have
yet to be expressed quantitatively: how probable and how bad.

Now take the case of the two processes staying about half compute
bound and half i/o bound.  If their i/o and computation overlap well,
they'll never see each other; if they don't, they'll interfere with
each other exactly as much as in the previous case.  But the metric
doesn't distinguish between these two sets of conditions.  Thus it's
easy to show beta error: the metric falsely predicts no problem.

Alpha error is even easier to show: consider the case of ten INGRES
users and two non-INGRES.  The number of COM processes can average 2 or
more with 30% idle time.  With good overlapping this again means that
the next pending INGRES job (a process that goes from LEF to COM) will
have a processor available.  VMS priority promotion favors that job over
the more recently served, more compute-bound ones.  Thus there's no
problem, but any time INGRES processes get more than half the used
processor time, the metric falsely predicts one.

Alpha error points out a deeper problem with this analysis: the metric
complains about bottlenecks hurting INGRES when INGRES processes are
getting the best of things!  For an extreme example, consider what
happens if we modify priorities so that INGRES processes always
pre-empt the non-INGRES.  Now let INGRES loads increase to approach
100% use of the processor.  The other jobs remain COMputable; in fact
they'll wait for processor share indefinitely.  It's trivial to put 10
or more other jobs into the run queue; they just pile up waiting for the
processor because they're not waiting for memory or i/o.  The metric
now says

	processor time used by INGRES	  number of COM processes
	-----------------------------  *  -----------------------
		total clock time	  number of samples


	   n seconds			   ~11 * (interval / seconds)
=	----------------		*  --------------------------
	n + delta seconds		   (interval / seconds)

which, for small delta (as INGRES loads increase to 100%),

=	11 >> 1
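
The same arithmetic as a Python check, using hypothetical values
n = 100 and delta = 1 seconds and a run queue holding ~11 COM processes
at every snapshot, as in the text; the sample count of 500 is arbitrary:

```python
# Hypothetical values for the pre-emption example: INGRES holds the processor
# for n of every n + delta seconds, and each snapshot finds ~11 COM processes.
n, delta = 100.0, 1.0
cpu_used = n / (n + delta)             # processor time used by INGRES / clock time

snapshots = [11] * 500                 # COM count at each of 500 samples
cpu_load = sum(snapshots) / len(snapshots)

product = cpu_used * cpu_load
print(cpu_load)                        # 11.0
print(round(product, 2))               # 10.89
```

As delta shrinks, the product climbs toward 11, no matter how thoroughly
the INGRES jobs are winning.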

The metric predicts bottlenecks, imposed by "single server
architecture", where non-INGRES jobs get an unfair boost over INGRES.
In point of fact it's exactly the other way around: we've set it up so
that the INGRES jobs are beating the stuffing out of the other jobs,
but the metric doesn't know this.  Of course, all metrics have
contexts, and we could say this is outside the context in which this
metric is useful.  If we went into this further, we'd probably agree
that part of the context is that the other jobs have to proceed too.
That makes us wonder whether the real context isn't getting and
giving fair shares of a shared uniprocessor.

So my point is, yes, you have a bottleneck; it's just not one of INGRES
competing at a disadvantage with other jobs, as you suggest.  If
you're compute bound on a single shared processor, your bottleneck is
that processor.  There are different methods of attaining higher
performance, but running multiple servers isn't one of them.  Greg
Pavlov points out one solution: move INGRES to dedicated resources.
Another solution is to increase the shared resources, such as more or
faster processors.  Software can't work miracles and create more
resources.  All it can do is ration existing resources more fairly,
flexibly, or optimally, or more closely in accordance with local
policies and needs.

>most of Mr. Krueger's posting is a smokescreen. He asks for hard data, then
>himself proceeds with a thoroughly general, theoretical treatment of
>scheduling problems. Most of what he says applies equally to all DBMS
>systems, Ingres 5.0 as well as 6.1.

Thanks, I couldn't ask for a better review.  It was my intention to
discuss a more general set of issues.  Since I'm not making any
specific claims, hard data from specific systems would not be
appropriate.  You, on the other hand, are: do you have any?

>My point is that in this respect,
>Ingres 6.1 can provide inferior performance to Ingres 5.0.

And your point is well taken.  Okay, it can.  Does it?  How often?
How inferior?  If you want to be helpful, please tell us not only the
what and where of bottlenecks but also when and how much.

-- Jon
--