Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 (Fortune) 6/7/84; site dmsd.UUCP
Path: utzoo!dciem!nrcaero!pesnta!hplabs!hpda!dmsd!bass
From: bass@dmsd.UUCP (John Bass)
Newsgroups: net.unix-wizards
Subject: Re: Is 4.2BSD a failure?
Message-ID: <158@dmsd.UUCP>
Date: Sat, 26-Jan-85 06:11:19 EST
Article-I.D.: dmsd.158
Posted: Sat Jan 26 06:11:19 1985
Date-Received: Mon, 28-Jan-85 02:15:29 EST
References: <7552@brl-tgr.UUCP>
Lines: 143

Is 4.x a failure ... NO
Is 4.x a high performance system ... MAYBE YES and MAYBE NO

People forget that 4.1 and 4.2 were paid for and tuned for AI Vision and CAD/CAM projects sponsored by ARPA and various companies. For the job mix that 4.x systems are tuned for, it is the ONLY way to do those jobs on a UNIX machine with any cost effectiveness. Many tradeoffs that go into 4.x systems run directly counter to the best ways to tune UNIX systems for development environments ... but they were the only way to make things work for the target applications.

The converse is also true to a large extent ... V7/SIII/S5 kernels don't handle large applications well, or at all -- try running a 6mb application on an older Bell system with swapping ... it takes many seconds for a single swap in/out.

But exactly the same problem arises out of blindly using a paging system for the typical mail/news/editor/compiler type support or development machine. In this environment the typical job size is 5-50 pages with a 75-95% working set ... when the total working set for the virtual address space approaches the real memory size on a 4.x system running these small jobs, the system goes unstable, causing any program doing disk I/O to lose one or more critical pages from its working set while it waits for its disk I/O (either requested via a system call ... or just from page faulting).
The result is a step degradation in system throughput and an interesting non-linear load curve with LOTS of hysteresis AND a sharp load limit at about 150-250% of real memory.

In comparison, a swap based system degrades smoothly and linearly at 1/#users for most systems ... up to a limit. On swap based systems, if the swap in + swap out time exceeds the scheduling quantum (several seconds on most UNIX systems), then even a swap based system can thrash and show a similar step degradation, non-linear load curve, hysteresis, and load limit.

This was evident on ONYX because the burst disk throughput was limited by the Z80 controller to a 3:1 interleave, or about 180kb/sec ... memory was relatively cheap compared to fast disks in 1980, so we sold lots of memory.

It was also evident on the Fortune VAX running 4.1, after several months of intensive load analysis tracking load factor results and instrumenting the disk subsystem. 4.1's favorite trick is a step increase in load factors from between 1-4 to 10-20, with little time spent anywhere in between. On the Fortune VAX this was caused by an interaction between paging and filesystem traffic on the root spindle when the average service time in the disk queue exceeded the memory reaper's quantum. A careful policy of re-laying out the filesystems, plus regular dump/restore of filesystems to keep them sequential and optimally packed, kept the filesystem (read: disk subsystem) throughput high enough that the step degradation (step increase in load factors) would not occur. We then seldom saw load factors of 10-20, and only then with a linear rise in load.

I have seen the same problem on most other 4.1 systems ... particularly those with a single spindle and small memory configurations (less than 2mb).

Most VAX systems run 35-50 transactions per second average to the entire disk subsystem ... a swap system handling a 40k process will typically take one or two transactions ... a paging system 40 or more, depending on the thrashing level.
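To put rough numbers on the swap vs. paging comparison just above, here is a minimal sketch in Python. It assumes (my simplification, not something the measurements above establish) that every disk transaction costs the same share of the 35-50 transactions/sec budget regardless of transfer size, and it takes the paging figure at its floor of 40. The function name `load_time_range` is made up for illustration.

```python
# Rough disk-time budget for bringing in one 40k process.
# ASSUMPTION: each transaction uses an equal slice of the disk subsystem's
# 35-50 transactions/sec budget, whatever its transfer size.

DISK_RATES = (35, 50)        # transactions per second, from the text
SWAP_TRANSACTIONS = (1, 2)   # "one or two transactions" for a 40k process
PAGING_TRANSACTIONS = 40     # "40 or more"; we use the floor


def load_time_range(min_tx, max_tx, rates=DISK_RATES):
    """Return (best, worst) seconds of disk time for a transaction count range."""
    slow, fast = min(rates), max(rates)
    # Best case: fewest transactions on the fastest disk subsystem.
    # Worst case: most transactions on the slowest one.
    return (min_tx / fast, max_tx / slow)


swap_best, swap_worst = load_time_range(*SWAP_TRANSACTIONS)
page_best, page_worst = load_time_range(PAGING_TRANSACTIONS, PAGING_TRANSACTIONS)

print(f"swap:   {swap_best:.3f} to {swap_worst:.3f} sec of disk time")
print(f"paging: {page_best:.3f} to {page_worst:.3f} sec of disk time")
print(f"paging costs at least {page_best / swap_worst:.0f}x the swap case")
```

Under these assumptions even the best paging case (0.8 sec) costs about 14 times the worst swap case, and that is before any thrashing pushes the page count past 40.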
The working set theory CORRECTLY predicts such poor behavior for such small programs with large percentages of active pages. If you are required to run several very large images (CAD/CAM, vision or other high-res graphical applications) with 2-8 mbyte arrays ... then the working set theory, combined with processor speed/memory size predictors, makes paging a clear choice.

Much of the speed difference of 4.1 over V7 and SIII/S5 was simply the 1k filesystem. For older PDP11's the per-block processing time for most 512 byte sectors was several times the transaction period ... i.e. it took some 6-10 milliseconds of CPU time to digest a block, which was traded off against filesystem throughput and the memory constraints of a 256kb max system size. The advent of much faster processors and much larger system memory made using 1k blocks both necessary and practical, where your system didn't have the cycles or space before.

For those of us mothering 11/45's in the 70's this was a very difficult tradeoff ... we had kernels of about 70kb, leaving less than 180kb to support 2-6 in-core processes/users ... or in today's terms ... no more than 2-3 happy vi users. Increasing the filesystem block size to 1k would increase memory overhead by 6-10k in the kernel and 2-4k in each process ... or ONE LESS VI DATA/STACK segment -- a major reduction (30-50%) in the number of in-core jobs, and much more swapping and response time delay.

Today, with relatively cheap RAM ... only the smallest systems need worry about this problem ... and then a mix of swapping (for jobs less than 150-500kb) and paging (for jobs greater than 150-500kb) will make most of these problems go away.

As for the 4.2 "fast filesystem" ... it was again tuned to make large file transactions run at an acceptable rate ... try to load/process a 4mb vision or CAD/CAM file at 30-50 1k block transactions per second -- it will run SLOW compared to a production system with contiguous files.
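For a feel of what "SLOW" means in that last example, here is a small back-of-envelope sketch in Python. It assumes that "4mb" means 4 x 1024 x 1024 bytes and that each 1k block costs exactly one disk transaction with no read-ahead or clustering; both are simplifications of mine, and the helper name `load_seconds` is hypothetical.

```python
# Time to read a large file one 1k block per disk transaction.
# ASSUMPTIONS: 4mb = 4 * 1024 * 1024 bytes; one block per transaction;
# no read-ahead, clustering, or CPU overlap.

BLOCK_SIZE = 1024                 # bytes per filesystem block
FILE_BYTES = 4 * 1024 * 1024      # the 4mb vision or CAD/CAM file


def load_seconds(file_bytes, blocks_per_sec, block_size=BLOCK_SIZE):
    """Seconds needed to read file_bytes at blocks_per_sec transactions/sec."""
    blocks = -(-file_bytes // block_size)   # ceiling division
    return blocks / blocks_per_sec


fastest = load_seconds(FILE_BYTES, 50)   # at 50 transactions/sec
slowest = load_seconds(FILE_BYTES, 30)   # at 30 transactions/sec

print(f"4mb file: {fastest:.0f} to {slowest:.0f} seconds at 30-50 blocks/sec")
```

With these assumptions the file takes roughly 82 to 137 seconds to read, which is the kind of delay a contiguous-file system avoids by moving many blocks per transaction.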
A number of tradeoffs were made to help large file I/O and improve the transaction rates on very loaded systems (LIKE ucb ernie ... the slowest UNIX system I have ever used ... even my 11/23 running on floppies was faster).

But for most of us -- particularly us small machine types ... PDP11/23's, 11/73's, ONYX's, Fortune's, Tandy 16B's ... and a number of other commercial systems (including VAX 11/730, 11/750, and MicroVAXes) which run 1-8 users ... the 4.2 filesystem is VERY SLOW, and gets SLOWER much faster over time than a V7/4.1 filesystem.

The tradeoff here is that "locality of reference" is much smaller and better defined on smaller systems ... on larger systems (like ernie) the disk queue has a large number of requests spread across the entire disk, with a much broader locality of reference. The 4.2 filesystem attempts to remove certain bimodal or n-modal access patterns, based on the FACT that it doesn't much matter where the data is for reading ... but it is better to write it without generating a seek ... for systems with large disk request queues. This doesn't hold up on small systems, where much of the time there is a single active reader using the disk subsystem. On the small system, locality of reference is the entire key to throughput, so randomly allocating files wherever space happens to be is a great loss.

I have spent most of my 10 years of UNIX systems hacking, porting and tuning on smaller systems. Other than CAD markets I don't see much use for paging systems, and as a result view 4.1/4.2 as only a hindrance, due to the tendency of some firms to put all the bells and toys into their system. This has been a disaster for several firms who got sidetracked by Berkeley grads and hangers-on.

But in the big system markets ... particularly CAD/CAM, high-res graphics, large multiuser systems (30-200 users), and AI/Lisp markets, 4.2 may be the only alternative ... it would be a mistake to drag the standard UNIX blindly down the 4.2 path ...
99.99% of the UNIX systems either delivered today or built in the next couple of years would be hurt badly by it. It would leave MSDOS as the number one, and ONLY, alternative to UNIX on smaller systems -- not such a bad system ... but let's keep it in its place too.

I have a lot of interesting numbers and recommendations in performance areas ... I was going to give them in a talk at Dallas, but they saw fit to cancel it after requiring a formal paper for the unplanned proceedings without any notice ... and then having a two page draft lost in the mail. I don't feel too bad about it, since apparently 8 other speakers were also accepted and dropped because they couldn't get papers written and approved in the several-day to 2-week window.

I hope that next time they put out a call for presentations, USENIX lets people know in advance that papers are required, and doesn't change the rules at the 11th hour if they say they are not. Most of us can't write a GOOD 5-10 page paper with a 24 hour deadline, which is basically what they asked of speakers this time -- other than those who had already done a paper for some reason.

Good nite ... have fun

John Bass
Systems Consultant
(408)996-0557