Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!purdue!haven!vrdxhq!daitc!daitc.daitc.mil
From: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Newsgroups: comp.databases
Subject: Re: Single Server Bottlenecks (was RE:ORACLE REL 6, INGRES REL 6)
Message-ID: <518@daitc.daitc.mil>
Date: 13 May 89 09:10:19 GMT
References: <3176@tank.uchicago.edu>
Sender: jkrueger@daitc.daitc.mil
Reply-To: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Distribution: usa
Organization: DTIC Special Projects Office (DTIC-SPO), Alexandria VA
Lines: 198

cs_bob writes:
>it has become fairly common for us to see an Ingres server running
>on a VAX 8650 with only 4 or 5 Ingres users to become CPU bound while only
>getting 15-20% of the available CPU.  Moreover, this happens when the Ingres
>users are competing with only 4 or 5 other processes for the processor.

Can you give us some DATA?  How common?  Under what conditions?  What
are the users doing?  What performance degradation is measured for the
individual user?  For system throughput?

>Imagine a small interactive, multi-user system where a typical mix of users
>includes 5 interactive CPU bound non-DBMS jobs and 10 Ingres users.
>Under Ingres 5.0, each of the ten Ingres users had their own backend which
>competed with other processes for CPU.  Thus, in the extreme case where all
>ten Ingres users become CPU bound, roughly 2/3 of the processor will be
>available to the backends (that is, 10 out of 15 CPU bound jobs will be Ingres
>backends).  Under version 6.1, only 1/6 of the processor will be available,
>since there is only one backend.

Not that simple.  Each user has his own front end, too.  And processes
don't get time slices in proportion to their relative number on the
processor.  It has a lot to do with the scheduler and the other
processes' behavior.  It also depends on memory management and i/o.
For instance, it's common to have significant idle time (unused
processor time) even when system load is high (many jobs in the run
queue at any given instant, or many processes in COMputable state for
you VMS users).  And database applications are biased toward
interactive workloads, where each user cycles think time ==> input ==>
wait for the system output ==> look at output (think time again).
This means that 10 database users may be added before they "compete"
for CPU in any real sense.

And this is just scratching the surface: scheduling and prioritization
for mixed workloads is a hard problem in realtime allocation of
resources in more or less optimal ways.  For instance, it's been shown
that fairness must be traded off against optimality: class schedulers
versus round-robin schedulers are a case in point.  Another case in
point is your observation:

>It is not feasible to raise the Ingres server to a higher priority, since
>large reports/sorts do consume large amounts of CPU and can starve interactive
>users.

Clearly, you can be more optimal or you can be more fair.  There are
also tradeoffs that meet some needs better than others.  But this is
not a problem specific to INGRES; all allocations of finite resources
suffer from it.  Consider for example how VMS sets priorities for
SWAPPER, OPCOM, JOB_CONTROL, or the simpler UNIX solution of just
requiring certain system code and data structures always to reside in
physical memory.  Clearly this is neither a fair nor an optimal use of
memory resources; it just happens that it seldom makes a critical
difference in overall fairness or performance.

>The only practical solution in this case is to start several backends,

No, there are several other solutions:

If you want to support multiple fully runnable (COMputable) jobs
without interference, you need a multiprocessor.  Buy one.  Configure
your servers as you find optimal for best throughput and fair for
INGRES versus non-INGRES applications.

If you want to support multiple fully runnable jobs but accept some
interference, decide how much and how often, buy the minimum sized
processor (and balanced config) to support this, and limit system load
by classic mechanisms such as restricting access, shifting usage to
off-peak hours, etc.

If you want to support different applications but they need not share
a common system image, offload the compute intensive ones to cheaper
systems (dedicated systems are always cheaper than shared and general
purpose ones).

You could implement (or buy) a class scheduler for VMS, which
guarantees the INGRES server a certain percentage of the processor and
more if available.  This prevents the "high priority" problem of the
round robin scheduler: to wit, either INGRES or other highly
computable processes starving the others indefinitely.

There are others; these are just four examples.

>but there are a couple of problems with this approach (multiple servers):

Yes, for one you're just giving the scheduler more mouths to feed and
then expecting better or fairer treatment because more of the mouths
are those of your people.

>a) it cannot be done dynamically.  That is, one cannot improve a bad situation
>post hoc by starting new servers, because the current DBMS jobs running under
>one server cannot be off-loaded to a new one.

As pointed out above, if all you have is a single processor, nothing
gets off-loaded anyway; you just increase the granularity of spreading
things thinner.  If you have multiple processors, you can use them for
multiple servers and let the system software do the offloading in a
transparent and flexible way.  Thus one doesn't improve the situation
just by finding more mouths to feed and dividing them up in ways that
favor one group over another.  You need to ship some of those mouths
over to where there really are more resources, and if that's possible,
why not allocate resources to mouths in the first place?
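
The share arithmetic in this exchange can be sketched in a few lines of
Python.  This is only an illustration under simplifying assumptions that
are not from the thread: every process is permanently CPU bound, the
naive model gives each runnable process an equal slice (the reply above
argues that real schedulers do not behave this way), and a class
scheduler is modeled as a guaranteed floor plus whatever the other class
leaves unused.  The names equal_share and class_share are hypothetical.

```python
from fractions import Fraction


def equal_share(group_procs, total_procs):
    """Naive model: each CPU-bound process gets an equal slice, so a
    group's share is simply its fraction of the runnable processes."""
    return Fraction(group_procs, total_procs)


def class_share(guaranteed, other_demand):
    """Toy class scheduler (an assumption, not any real VMS product):
    the server class gets at least `guaranteed` of the processor, and
    more when the other class demands less than the remainder."""
    # Clamp the other class's demand to the range [0, 1].
    other_demand = min(max(other_demand, Fraction(0)), Fraction(1))
    return max(guaranteed, 1 - other_demand)


if __name__ == "__main__":
    # Ten backends among fifteen CPU-bound jobs (the Ingres 5.0 case).
    print(equal_share(10, 15))
    # One server among six CPU-bound jobs (the Ingres 6.1 case).
    print(equal_share(1, 6))
    # Server guaranteed half the processor, other jobs want all of it.
    print(class_share(Fraction(1, 2), Fraction(1)))
    # Same guarantee, but the other jobs only want a quarter.
    print(class_share(Fraction(1, 2), Fraction(1, 4)))
```

In this toy model the server's share under the class scheduler no longer
depends on how many processes each side runs, which is the point of the
"more mouths to feed" objection.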

>Even worse, in the true single
>server environment, RTI provides FAST COMMIT and GROUP COMMIT options which
>can only be disabled by taking down the server.  If a server is running
>with these features, designed to dramatically improve OLTP in particular,
>no new servers can be started until it is taken down.

This is exactly the tradeoff of SYSGEN options under VMS: some are
dynamic and can be changed on running systems, some require a reboot.
The price of making them all dynamic is higher development cost and
lower execution performance.  Clearly some proper subset should be
dynamic; we can argue about which should be members of that set.  But
again it comes back to an invalid assumption that more mouths is a way
to get more resources or a good way to allocate existing resources.

Look, consider the generic case of a single fully computable
non-INGRES process competing with a single fully computable INGRES
server, whether from one INGRES user's requests or a hundred.  They
both sink to base priority under VMS priority promotion.  They then
compete.  Your point is that the one non-INGRES user gets half the
pie, and the remaining possibly one hundred divide up the other half
among them.  This is absolutely true, but only as long as neither
process page faults, reads or writes to disk or other devices, or
sleeps (LEF state) pending user input.  This is simply uncharacteristic
of database queries: they constantly read from disk and write to
networks.  Every time they do they get priority promotion over the
other process.

>b) a corollary to this is that any multi-server configuration loses the
>advantages of FAST COMMIT (including VAX clusters with one server per node).

No, this is unrelated to your point.  How many mouths to feed has
nothing to do with the tradeoffs of removing computation bottlenecks
versus removing disk i/o bottlenecks.  In point of fact the VMS
scheduler only allocates processor time, not working sets or i/o queue
ordering.
You can play with priorities all you want and get no advantage if you
were i/o bound; in that case you need to work harder or smarter on
i/o.  Harder might be faster disks, such as the CDC Wren.  Smarter
might be fast commit.  If, however, you were compute bound, priorities
might be the answer, or other system management tools and practices
such as the ones I list above, including multiple servers if you have
multiple processors.

>An obvious solution to this would be to run the Ingres server at priority 5,
>but have it monitor its own usage vis a vis other processes in the system
>and periodically give up the processor if it is starving other processes.
>While obvious, this solution is not exactly simple, and at the present time
>the Ingres 6.1 server can very definitely become a system bottleneck.

You're about to re-invent the class scheduler without teeth, also
known as TSX-11.  It's dealt with above; I don't think anything need be
added here.

Instead, consider your use of terms: a "system bottleneck" is a system
resource that critically limits some application or workload.
Therefore the server isn't a system bottleneck, it's something that
bottlenecks affect.  In this case the system bottleneck is that
schedulers don't know that some applications serve more users than
others, and thus allocate processor time via an equally sized quantum.

In other words, a valid point related to the one you were making is
that servers pool the identities and thus quotas of individual
processes.  This is true, but again hardly unique to INGRES.  For
instance, memory managers, device drivers, and network processes all
serve multiple users without being able to charge back the costs of
each operation to the correct user served.  Multithreaded i/o allows
originating processes to queue multiple requests, which prevents bad
citizens from slowing down other processes just by filling up queue
slots, but other resources can still be unfairly and/or suboptimally
allocated.
For instance, consider VMS memory management: per-process quotas for
working sets were designed to prevent bad citizens from hurting anyone
but themselves.  This succeeds to the extent that other processes are
allocated the processor time while the bad citizen is waiting for its
pages to be faulted in.  But it fails when the paging causes disk i/o
whose seeks now compete with other processes' seeks.  Clearly the fair
thing to do is allocate equal seeks per user, but we can't do that
because we don't know how many users each seek represents.  Thus, just
as in the case you cite, the needs of the many may be forced to
compete on an equal basis with the needs of the few or the one.

This is a fact of life.  The cost of getting VMS to become more fair
about memory management, including its disk i/o ramifications, is more
complexity in the operating system, higher resulting cost to the user,
and poorer performance for the usual and expected case.  Sure, we
could add internal accounting to support per-user quotas on seeks, but
it isn't worth it, as far as we can tell at this time.

The cost of getting the scheduler to give some processes quotas
proportional to the number of users they serve may or may not be worth
the increased fairness, but this is a question to be settled by
measurement.  Do you have any data to support your contention that
it's currently highly suboptimal or unfair?  How suboptimal?  How
unfair?  How often does this come up?

-- Jon
--