Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!bellcore!faline!hammond
From: hammond@faline.bellcore.com (Rich A. Hammond)
Newsgroups: comp.arch,comp.os.misc
Subject: Re: Unix File System Performance
Message-ID: <1397@faline.bellcore.com>
Date: Wed, 16-Sep-87 15:36:19 EDT
Article-I.D.: faline.1397
Posted: Wed Sep 16 15:36:19 1987
Date-Received: Sat, 19-Sep-87 02:10:59 EDT
References: <1384@faline.bellcore.com>
Reply-To: hammond@faline.UUCP (Rich A. Hammond)
Organization: Bellcore MRE
Lines: 96
Xref: mnetor comp.arch:2202 comp.os.misc:182

Martin@felix.UUCP (Martin McKendry) responded to my comments on his
original posting about file-system performance.

First, I must apologize for the tone of my article; I didn't mean to
offend Martin.

>>> Martin claimed a high proportion of jobs were I/O bound.
>>I asked: ... Where do you get the idea that a high proportion are I/O bound?
>>
>From extensive workload analysis.  Try putting in a make.  Look at
>your CPU utilization.  If its not 100%, you are waiting on I/O when you
>could be processing.  Depending on how much you like to wait on I/O,
>you are I/O bound.

This isn't a realistic standard.  Because disks have both seek and
rotational delays, the only way to get rid of ALL disk I/O time for a
single job is to prefetch the data into main memory.  Only if you can
predict what file I want before I ask for it can you have 100% CPU
utilization.  If you can do that, you can make a lot of money in the
stock market. :-)

>To look at a single 780 is hardly representative of the world.  Most
>of the world's data processing is production commercial data processing.
>We do image processing.  Don't assume that your load is everyone's.

I agree, but I thought we were talking about UNIX file system performance.

>In a previous life, I worked with extensive analyses of commercial
>customer workloads taken from real customers sites.
>Based on simulation
>results and real benchmarks, we found that you could make changes
>by large factors (2-5) in either direction in CPU performance
>without seeing anything like the same change in throughput (total
>time to run benchmark).  Like a factor of 4 or 5 faster in CPU
>for only a factor of 2 change in throughput.  Idle time on the faster
>CPU goes up as expected.  This on batch processing
>with no terminal I/O.  If that's not I/O bound, I don't know what is.
>Since CPU speed/$ is improving at a faster rate than the corresponding
>figure for disk, I'd expect the class of problems for which this
>occurs to increase.

No terminal I/O - were these UNIX systems?  I am quite willing to concede
that UNIX and "real, commercial data processing" aren't the same.  I'm
not sure that we want them to become the same.  I can show you UNIX
systems where doubling the CPU speed doubles throughput.  But neither of
our anecdotes should be generalized to the whole world.

Regarding my claims that larger main memory will help, Martin replies:

>What if I want to support 400 users from one server, each of whom
>wants 50Kb of data every 15 seconds.  Or if I have to process/merge two
>or three 60Mb data files?  What if I don't want to ship 32 M on all
>machines?

I'll agree, those could benefit from more I/O bandwidth.  But do they
need to be done under UNIX?  Wouldn't a dedicated OS to handle the disks
and communications work better in the first case, even if the clients
were UNIX systems?  Might merging files be better left to IBM MVS?
As for shipping 32 M on all systems, this is a tradeoff between your
development time and the total added memory cost (incremental memory
cost per system * number of systems shipped).  If you only ship a few
systems and the 32 M solves the problem adequately, you'd be better off
sticking it in.  Is the first case a real situation?

Regarding my claim that improving the use of main memory for buffering
would help, Martin pointed out that he could do both that and disk layout.
I said that improving the buffering would have a better payoff, so that's
what I would look at first.  It wasn't clear from Martin's note that he
had already considered it.

Regarding my asking for long-term measurements of I/O demand rather than
peak measurements, Martin said:

>Why not?  Often its the peaks I want to handle.  I can already handle the
>regular loads.

I'm looking for the largest average payoff, which I perceive as coming
from the regular loads.  Working to alleviate the peaks may not gain you
much if the result has little effect on regular loads.  For example, if
the regular load runs pretty much out of the in-core buffer pool and the
only large amounts of I/O are the peaks, then spending man-months or
man-years reworking the I/O may not save your customer much per $ of
development time.

Engineering is a tradeoff; I was saying that you have to know your
workload and tailor your efforts to extract the greatest gain.
Martin made a claim "that the vast majority of jobs were I/O bound",
which I didn't think was justified in the UNIX environment.  His reply to
my comments indicated that he wasn't thinking of the UNIX environment and
that he had specific applications in mind.  Fine, but I thought that we
were interested in improving UNIX in general, not Martin's product in
particular.  In that context I claimed that there are other things that
might give a better payoff.  Don't take this to mean that I'm against
file system I/O improvements; I'll welcome any that come along.

In summary, we were talking at cross purposes and I apologize for the
resulting bad feelings.

Rich Hammond	hammond@bellcore