Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!mailrus!uunet!aplcen!haven!udel!bunnell
From: bunnell@udel.edu (H Timothy Bunnell)
Newsgroups: comp.os.minix
Subject: Performance Tuning
Message-ID: <26628@nigel.ee.udel.edu>
Date: 4 Aug 90 16:53:01 GMT
Sender: usenet@ee.udel.edu
Reply-To: bunnell@udel.edu ()
Organization: University of Delaware
Lines: 81
Nntp-Posting-Host: dewey.udel.edu

On my 16 MHz '286 system, Minix 1.5.10 performance is generally acceptable, especially if just one job is running at a time. However, while one CPU-intensive job is running, the system runs other jobs, especially the more I/O-intensive ones, in a balky, start-stop fashion unlike most of the other multitasking systems I've used (e.g., an old PDP-11/34 running TSX+, and Sun 386i systems). I wondered whether this poor performance was due to the much-discussed FS bottleneck, or whether a little system tuning could help. I remembered once seeing a comment (by Bruce Evans, I think) to the effect that the default scheduler quantum was too long for his 386 system.

The following is a longish discussion of the effects of fiddling with the scheduler quantum, and a request for others who have experimented with similar simple performance changes to post their results or suggestions.

In clock.c under Minix 1.5.10, MILLISEC is set so that a CPU-bound job runs for at least 100 msec before it may be shoved to the bottom of the user queue (if anything else is ready to run). I looked at the performance effects of several shorter quantum values. Specifically, I compared what happens when two jobs run simultaneously with what happens when each runs on an otherwise idle system. The two jobs were (a) doing a "cat *.c" in /usr/src/kernel (call this the I/O-job) and (b) training a neural network simulator (call this the CPU-job).
Here are the results of timing how long a single pass of the I/O-job takes with and without the CPU-job running simultaneously (times are in seconds):

                     system loaded      system idle
    Quantum (sec)    real/user/sys      real/user/sys
    0.100            85.0/0.0/6.7       15.0/0.1/8.3
    0.050            45.0/0.0/6.0       16.0/0.2/8.3
    0.036            33.0/0.0/5.9       15.0/0.2/8.2
    0.018            30.0/0.0/5.4       16.0/0.1/8.3

The complementary case is how long a sample of the CPU-job takes with and without the I/O-job running continuously:

                     system loaded      system idle
    Quantum (sec)    real/user/sys      real/user/sys
    0.100            37.0/30.4/0.2      30.0/28.7/0.2
    0.050            43.0/32.0/0.9      30.0/28.7/0.2
    0.036            49.0/33.9/1.9      30.0/28.8/0.2
    0.018            50.0/34.1/4.0      30.0/28.9/0.3

Note that changing the quantum has little effect in either case when the system is otherwise idle. With long quanta, the I/O-job showed a huge performance drop when the CPU-job was running simultaneously. The drop is less severe with smaller quanta: with 100 msec quanta, real time increased by a factor of 5.67 (85/15), but with 18 msec quanta the increase was only a factor of 1.875 (30/16).

From the perspective of the CPU-job, the effects of changing quantum size are reversed: smaller quanta produce larger performance drops when the system is loaded with an I/O-job. However, the differences are not as great as they were for the I/O-job. In the worst case (quantum = 18 msec) execution time increased by a factor of 1.67 (50/30), versus a factor of 1.23 (37/30) in the best case (quantum = 100 msec).

For my machine, changing the quantum from 100 msec to 18 msec (that's one 60 Hz clock tick) greatly improves the performance of I/O-bound jobs while causing only moderate degradation of CPU-bound job performance. On a slower machine the results might be different: the system might spend too much time switching jobs and not enough time running them. But on faster CPUs the overall performance improvement is real.
The improvement occurs because I/O-bound jobs execute for a very short time before blocking, at which point another process runs. If that other process is CPU-bound, the I/O-job may then have to sit for up to a full quantum before it gets to run again, even if its I/O completes much sooner. With a smaller time slice, I/O-bound jobs get to run sooner after they unblock, and they quickly block again, giving the CPU back to another process.

Subjectively, reducing the quantum size is fairly important, because I/O-intensive jobs tend to be the ones users interact with directly. How much of the perceived slowness of Minix (from a more casual user's standpoint) is really due to small things like quantum size that can be adjusted quite easily?

If anyone else has results of similar experiments, or has other ideas, I would certainly like to hear about them or see them posted. In fact, a little tutorial on system tuning (things that do not require redesigning the operating system :-) from one of the real gurus would be just wonderful.

--
Tim Bunnell