Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!purdue!haven!vrdxhq!daitc!daitc.daitc.mil
From: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Newsgroups: comp.databases
Subject: Re: Single Server Bottlenecks (was RE:ORACLE REL 6, INGRES REL 6)
Message-ID: <518@daitc.daitc.mil>
Date: 13 May 89 09:10:19 GMT
References: <3176@tank.uchicago.edu>
Sender: jkrueger@daitc.daitc.mil
Reply-To: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Distribution: usa
Organization: DTIC Special Projects Office (DTIC-SPO), Alexandria VA
Lines: 198

cs_bob writes:

>it has become fairly common for us to see an Ingres server running
>on a VAX 8650 with only 4 or 5 Ingres users to become CPU bound while only
>getting 15-20% of the available CPU. Moreover, this happens when the Ingres
>users are competing with only 4 or 5 other processes for the processor.

Can you give us some DATA?  How common?  Under what conditions?  What
are the users doing?  What performance degradation is measured for the
individual user?  For system throughput?

>Imagine a small interactive, multi-user system where a typical mix of users
>includes 5 interactive CPU bound non-DBMS jobs and 10 Ingres users. 
>Under Ingres 5.0, each of the ten Ingres users had their own backend which
>competed with other processes for CPU. Thus, in the extreme case where all
>ten Ingres users become CPU bound, roughly 2/3 of the processor will be
>available to the backends (that is, 10 out of 15 CPU bound jobs will be Ingres
>backends). Under version 6.1, only 1/6 of the processor will be available,
>since there is only one backend.

Not that simple.  Each user has his own front end, too.  And processes
don't get time slices in proportion to their relative number on the
processor.  Has a lot to do with the scheduler and the other processes'
behavior.  Also depends on memory management and i/o.  For instance,
it's common to have significant idle time (unused processor time) even
when system load is high (many jobs in the run queue at any given
instant, or many processes in COMputable state for you VMS users).
And database applications are biased toward interactive workloads,
where each user cycles think time==>input==>wait for the system
output==>look at output (think time again).  This means that 10
database users may be added before they "compete" for CPU in any real
sense.  And this is just scratching the surface: scheduling and
prioritization for mixed workloads is a hard problem in realtime
allocation of resources in more or less optimal ways.  For instance,
it's been shown that fairness must be traded off against optimality:
class schedulers versus round-robin schedulers are a case in point.
Another case in point is your observation:

>It is not feasible to raise the Ingres server to a higher priority, since
>large reports/sorts do consume large amounts of CPU and can starve interactive
>users.

Clearly, you can be more optimal or you can be more fair.  There are
also tradeoffs that meet some needs better than others.  But this is
not a problem specific to INGRES; all allocations of finite resources
suffer from this problem.  Consider for example how VMS sets
priorities for SWAPPER, OPCOM, JOB_CONTROL, or the simpler UNIX
solution of just requiring certain system code and data structures
always to reside in physical memory -- clearly this is neither a fair
nor optimal use of memory resources, it just happens that it seldom
makes a critical difference in overall fairness or performance.

>The only practical solution in this case is to start several backends,

No, there are several other solutions:

	If you want to support multiple fully runnable (COMputable)
	jobs without interference, you need a multiprocessor.  Buy one.
	Configure your servers as you find optimal for best throughput
	and fair for INGRES versus non-INGRES applications.

	If you want to support multiple fully runnable jobs but accept
	some interference, decide how much and how often, buy the
	minimum sized processor (and balanced config) to support this,
	and limit system load by classic mechanisms such as
	restricting access, shifting usage to off-peak hours, etc.

	If you want to support different applications but they need
	not share a common system image, offload the compute intensive
	ones to cheaper systems (dedicated systems are always cheaper
	than shared and general purpose ones).

	You could implement (or buy) a class scheduler for VMS, which
	guarantees the INGRES server a certain percentage of the
	processor and more if available.  This prevents the "high
	priority" problem of the round robin scheduler: to wit, either
	INGRES or other highly computable processes starving the
	others indefinitely.

	There are others, these are just four examples.
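
To make the class scheduler idea concrete, here is a minimal sketch in
Python.  It is not anything VMS or any vendor ships.  The function name
class_shares, the guarantee table, and the even split of spare capacity
are all assumptions for illustration; a real class scheduler works on
quanta and priorities, not on fractions handed out in one pass.

```python
def class_shares(guarantees, demands):
    """Split one CPU (1.0) among scheduling classes.

    guarantees: class -> fraction of CPU promised (sum <= 1.0)
    demands:    class -> fraction of CPU the class could use right now
    """
    eps = 1e-12
    # Step 1: each class gets its guarantee, capped by what it can use.
    shares = {c: min(guarantees.get(c, 0.0), demands[c]) for c in demands}
    spare = 1.0 - sum(shares.values())
    # Step 2: hand spare capacity to classes that still want more,
    # splitting it evenly until demand or capacity runs out.
    hungry = {c for c in demands if demands[c] - shares[c] > eps}
    while spare > eps and hungry:
        piece = spare / len(hungry)
        for c in list(hungry):
            extra = min(piece, demands[c] - shares[c])
            shares[c] += extra
            spare -= extra
            if demands[c] - shares[c] <= eps:
                hungry.discard(c)
    return shares


# Hypothetical numbers: the INGRES class is promised 40% of the processor.
both_busy = class_shares({"ingres": 0.4}, {"ingres": 1.0, "other": 1.0})
other_light = class_shares({"ingres": 0.4}, {"ingres": 1.0, "other": 0.2})
```

With both classes fully computable, the server class ends up with about
0.7 and the rest with about 0.3 under these assumptions, so neither side
starves the other.  When the other class only wants 0.2, the server
class picks up the remainder.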

>but there are a couple of problems with this approach (multiple servers):

Yes, for one you're just giving the scheduler more mouths to feed and
then expecting better or fairer treatment because more of the mouths
are those of your people.

>a) it cannot be done dynamically. That is, one cannot improve a bad situation
>post hoc by starting new servers, because the current DBMS jobs running under
>one server cannot be off-loaded to a new one.

As pointed out above, if all you have is a single processor, nothing
gets off-loaded anyway, you just increase the granularity of spreading
things thinner.  If you have multiple processors, you can use them for
multiple servers and let the system software do the offloading in a
transparent and flexible way.

Thus one doesn't improve the situation just by finding more mouths to
feed and dividing them up in ways that favor one group over another.
You need to ship some of those mouths over to where there really are
more resources, and if that's possible, why not allocate resources to
mouths in the first place?

>Even worse, in the true single
>server environment, RTI provides FAST COMMIT and GROUP COMMIT options which
>can only be disabled by taking down the server. If a server is running
>with these features, designed to dramatically improve OLTP in particular,
>no new servers can be started until it is taken down.

This is exactly the tradeoff of SYSGEN options under VMS: some are
dynamic and can be changed on running systems, some require a reboot.
The cost for making them all dynamic is higher cost development and
lower performance execution.  Clearly some proper subset should be
dynamic, we can argue about which should be members of that set.

But again it comes back to an invalid assumption that more mouths is a
way to get more resources or a good way to allocate existing
resources.  Look, consider the generic case of a single fully
computable non-INGRES process competing with a single fully computable
INGRES server, whether from one INGRES user's requests or a hundred.
They both sink to base priority under VMS priority promotion.  They
then compete.  Your point is that the one non-INGRES user gets half
the pie, and the remaining possibly one hundred divide up the other
half among them.  This is absolutely true.  This remains true as long
as neither process page faults, reads or writes to disk or other
devices, or sleeps (LEF state) pending user input.  This is simply
uncharacteristic of database queries: they constantly read from disk
and write to networks.  Every time they do they get priority promotion
over the other process.
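
The arithmetic for this idealized case, where every process is always
runnable at equal priority, can be written out as a short sketch.  The
helper equal_share is hypothetical and models nothing about page faults,
i/o, or priority promotion, which is exactly why its numbers are a
worst case rather than a prediction.

```python
from fractions import Fraction


def equal_share(n_runnable):
    """CPU share of each of n always-runnable processes at equal priority."""
    return Fraction(1, n_runnable)


# Ten 5.0 backends plus five other compute-bound jobs:
# the backends together hold 10 of 15 equal shares.
backends_5_0 = 10 * equal_share(15)

# One 6.1 server plus the same five jobs: the server holds 1 of 6 shares.
server_6_1 = equal_share(6)

# One server against one other job, with the server carrying 100 users:
# the server's half is divided a hundred ways.
per_user = equal_share(2) / 100

print(backends_5_0, server_6_1, per_user)
```

These are the 2/3, 1/6, and half-the-pie figures from the discussion;
they hold only while nothing ever blocks.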

>b) a corollary to this is that any multi-server configuration loses the
>advantages of FAST COMMIT (including VAX clusters with one server per node).

No, this is unrelated to your point.  How many mouths to feed has
nothing to do with the tradeoffs of removing computation bottlenecks
versus removing disk i/o bottlenecks.  In point of fact the VMS
scheduler only allocates processor time, not working sets or i/o queue
ordering.  You can play with priorities all you want and get no
advantage if you were i/o bound; in that case you need to work harder
or smarter on i/o.  Harder might be faster disks, such as the CDC
Wren.  Smarter might be fast commit.  If, however, you were compute
bound, priorities might be the answer, or other system management
tools and practices such as the ones I list above, including multiple
servers if you have multiple processors.

>An obvious solution to this would be to run the Ingres server at priority 5,
>but have it monitor its own usage vis a vis other processes in the system
>and periodically give up the processor if it is starving other processes.
>While obvious, this solution is not exactly simple, and at the present time
>the Ingres 6.1 server can very definitely become a system bottleneck.

You're about to re-invent the class scheduler without teeth, also
known as TSX-11.  It's dealt with above, I don't think anything need
be added here.  Instead, consider your use of terms: a "system
bottleneck" is a system resource that critically limits some
application or workload.  Therefore the server isn't a system
bottleneck, it's something which bottlenecks affect.  In this case
the system bottleneck is that schedulers don't know that some applications
serve more users than others, and thus allocate processor time via an
equally sized quantum.
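
As an illustration only, a scheduler that did know how many users each
process serves could size shares in proportion.  The sketch below
assumes a hypothetical weighted_shares helper and made-up process
names; no such feature is claimed for VMS.

```python
from fractions import Fraction


def weighted_shares(users_served):
    """CPU share per process when shares scale with users served."""
    total = sum(users_served.values())
    return {name: Fraction(n, total) for name, n in users_served.items()}


# Hypothetical mix: one server fronting ten users, five single-user jobs.
procs = {"ingres_server": 10}
procs.update({f"job{i}": 1 for i in range(1, 6)})

shares = weighted_shares(procs)
server_share = shares["ingres_server"]
```

Under this assumption the single server with ten users gets 2/3 of the
processor, the same figure quoted for ten separate 5.0 backends, and
each single-user job gets 1/15.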

In other words, a valid point related to the one you were making is
that servers pool the identities and thus quotas of individual
processes.  This is true, but again hardly unique to INGRES.  For
instance, memory managers, device drivers, and network processes all
serve multiple users without being able to charge back the costs of
each operation to the correct user served.  Multithreaded i/o allows
originating processes to queue multiple requests, which prevents bad
citizens from slowing down other processes just by filling up queue
slots, but other resources can still be unfairly and/or suboptimally
allocated.

For instance, consider VMS memory management: per-process quotas for
working sets were designed to prevent bad citizens from hurting anyone
but themselves.  This succeeds to the extent that other processes are
allocated the processor time while the bad citizen is waiting for its
pages to be faulted in.  But it fails when the paging causes disk i/o
whose seeks now compete with other processes' seeks.  Clearly the fair
thing to do is allocate equal seeks per user, but we can't do that
because we don't know how many users each seek represents.  Thus, just
as in the case you cite, the needs of the many may be forced to
compete on an equal basis with the needs of the few or the one.

This is a fact of life.  The cost of getting VMS to become more fair
about memory management, including its disk i/o ramifications, is more
complexity in the operating system, higher resulting cost to the user,
and poorer performance for the usual and expected case.  Sure, we
could add internal accounting to support per-user quotas on seeks, but
it isn't worth it, as far as we can tell at this time.  The cost of
getting the scheduler to give some processes quotas proportional to the
number of users they serve may or may not be worth the increased
fairness, but this is a question to be settled by measurement.  Do you
have any data to support your contention that it's currently highly
suboptimal or unfair?   How suboptimal?  How unfair?  How often does
this come up?

-- Jon
--