Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!uunet!telesoft!choll
From: choll@telesoft.com (Chris Holl @adonna)
Newsgroups: comp.benchmarks
Subject: Re: bc benchmark [sigh]
Summary: QBM - Boeing Capacity Benchmark.
Message-ID: <1148@telesoft.com>
Date: 31 Dec 90 20:44:35 GMT
References: <44342@mips.mips.COM> <15379@ogicse.ogi.edu> <1142@telesoft.com> <44371@mips.mips.COM>
Organization: TeleSoft, San Diego, CA.
Lines: 170

From: borasky@ogicse.ogi.edu (M. Edward Borasky)
> I was hoping for a response like this.  There are two types of computer
> users -- those like you and me who realize that computing costs money,
> is a resource that must and can be managed, and those like students,
> computer science faculty, dreamer/architects who think that computing
> should be, can be and often is essentially free.

Boeing Computer Services occasionally received criticism of their "high"
bills for computing services.  In response they produced a paper called
"The Real Cost of Computing" to educate their customers.  It described
many of the things you mentioned, including support of the hardware,
software, configuration, backups, etc.  I don't know if the paper is
available, but I could find out if there is interest.

> I'll bet that the COST difference between the two machines was such
> that you could afford to give away the extra speed on the X-MP from
> the hardware scatter/gather -- bill the X-MP as if it were strictly
> 1.32 times the Cray 1.

That's exactly what we wound up doing.  We couldn't justify a larger
figure because a job that ran on the X without using scatter/gather
would have gotten a higher bill than it would have on the S, and that
wouldn't do.  Using 1.32, a job that used scatter/gather simply got a
better deal on the X.  We made a point of telling users this (so they
got the proper perspective :-) and to encourage them to take advantage
of the hardware.  If their jobs became more efficient, throughput
would go up.

The algorithm for billing was slightly different, however.  The 1-S had
a single processor and 2 Meg, while the X-MP had 2 processors and 4 Meg.
We billed for CPU seconds and memory residency.  When a job started
using more than 2 Meg, it started to pay a percentage of the other
processor, even if it wasn't using it.  If a job used the entire 4 Meg,
it paid for both processors, because no one else could use the other
processor without occupying memory.

BCS took the 1-S out of the configuration after users had migrated to
the X, so after the overlap period everyone got a better deal.  Now
they have a Y-MP.  I wasn't there for that transition, so I don't know
exactly how they managed it.
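To make that billing rule concrete, it amounts to something like the
little routine below.  This is just a sketch in Python-style pseudocode
with made-up rates, and the exact proration BCS used was more involved,
so take the flavor, not the numbers:

    TOTAL_MEM = 4.0             # X-MP memory, in Meg
    HALF_MEM  = TOTAL_MEM / 2   # one processor's "fair share"
    CPU_RATE  = 1.00            # $ per CPU second (made up)
    MEM_RATE  = 0.01            # $ per Meg-second resident (made up)

    def xmp_charge(cpu_seconds, mem_meg, residency_seconds):
        """Charge for one job on the 2-processor, 4 Meg X-MP."""
        charge = cpu_seconds * CPU_RATE
        charge += mem_meg * residency_seconds * MEM_RATE
        if mem_meg > HALF_MEM:
            # The job is squeezing the second processor out of memory,
            # so it pays a matching fraction of that processor for as
            # long as it stays resident.
            locked_out = (mem_meg - HALF_MEM) / HALF_MEM  # 0.0 to 1.0
            charge += locked_out * residency_seconds * CPU_RATE
        return charge

A job that burns 100 CPU seconds, sits resident for 600 seconds, and
uses all 4 Meg pays for the idle processor for its entire residency --
which is exactly the "no one else could use it" case above.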
> You just said the secret word -- "capacity planning"!  I wish the duck
> were still around to drop down and give you fifty dollars!

Duck?  $50?

> >(Dr. Howard "Doc" Schemising - wonderful guy) developed a capacity
> >test that would precisely model the current workload on a machine.
> >This was used with great accuracy to measure the capacity of
> >different machines (for that workload).
>
> Is this published?  Could you post it?  The guys here and in
> "comp.arch" would LOVE to see it!

In article <44371@mips.mips.COM>, mash@mips.COM (John Mashey) writes:
> Yes, it would be good to see this.  Note the important fact that there
> are two steps:
>    a) Characterizing the workload
>    b) Predicting the performance on that workload
> Also, I've seen some pretty good benchmarks, with workloads tailored
> to different departments within a company ... unfortunately, the best
> ones I've seen were all proprietary...

Yes, step a) is critical.  And yes, Doc Schmeising's benchmark is
proprietary.  (Sorry I typo-ed his name the first time.)  I have had a
few requests for more information on this capacity benchmark, and I
don't think the following violates any of BCS' rights.

Doc Schmeising's benchmark was called QBM (Quick BenchMark) and is
owned by Boeing Computer Services (BCS).  There was talk at one time
of marketing it, but they never did.  A shame, because it is a great
tool.  Doc has retired and I'm not there any more, so I don't even
know if they are still using it.

The basic premise is to take a "slice" of your system's workload that
runs in some fixed period of time (we used 10 minutes -- hence Quick)
and dump it onto another system to see how long it takes.  If the work
can't complete in 10 minutes, the target system has less throughput
than the base system.  If it can complete in 10 minutes, the target
system has equal or greater throughput.

The tricky part is to quantify throughput, and this was QBM's real
strength.  BCS collected data on a variety of resources used by jobs.
This included things like

    . CPU seconds burned
    . Amount of memory used
    . Duration of memory residency
    . Disk blocks transferred
    . Disk accesses

and a few others.  This data was stored on tapes and went back years.
It was collected for the CDC Cybers and Crays (the two main workhorses
at BCS).

You, as the performance guru and benchmarker, would have to select a
10 minute window where your machine was "full."  Full does not mean
the system was on its knees and response time was horrible.  It means
processing a reasonable workload with acceptable response time, good
CPU usage, no thrashing, etc.  I would pick a period of a busy day --
say a busy hour, or 2 hours, or 30 minutes, or whatever -- and feed it
to QBM.  QBM would filter out noise and select 10 minute samples from
the time period.  It would select as many as you asked for (since the
10 minute slices could start on any fraction of a second) and print
out a variety of statistics about each sample.  A GREAT DEAL OF CARE
WAS TAKEN TO PICK GOOD DATA.  A lot of accounting and performance data
was reviewed; eventually one sample was picked, and that was called
your baseline.  It represented the "full" capacity of your base
machine.

This selection process was done only once or twice a year, when you
felt the work profile (the type of work being done) had changed enough
to affect your results.  This would happen.  For example, after users
migrated from a Cyber to a Cray, they would eventually start writing
code to take better advantage of vectorization.

QBM would then take the data from that sample and produce a synthetic
workload consisting of the same number of jobs, starting and stopping
at the same times during the 10 minute window (very important) and
using the same resources.  Some assumptions were made about how the
CPU seconds were used (matrix reductions?  straight-line code?  etc.)
and how they were distributed during the job.  You can't use all the
CPU seconds and then do all the I/O; on the other hand, an even
distribution is not representative either.  A variety of CPU kernels
were used (the 10 to 14 I mentioned in my first posting) and given
different weightings depending on what we thought our machines were
being used for.

Now here's the real beauty of QBM: not only could it create a workload
that was the same as your real-life sample, it could create a workload
that was 1.5 times that sample.  Or 5 times, or 0.5 times.  You could
now create any multiple of that workload.
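For flavor, here is roughly what that generation step looks like, again
as a Python-style sketch.  The field names, the two kernels, and the
scale-by-replication scheme are all my inventions for illustration --
QBM was proprietary and considerably more careful about all of this:

    import random

    # One record per job in the baseline slice, off the accounting
    # tapes.  Times are seconds into the 10 minute window.
    SAMPLE = [
        {"start":  0.0, "stop": 600.0, "cpu": 240.0, "mem": 1.5, "blocks": 8000},
        {"start": 37.5, "stop": 312.0, "cpu":  90.0, "mem": 0.5, "blocks": 2500},
        # ... one entry per job in the sample
    ]

    # Weightings for the CPU kernels (matrix work, straight-line code,
    # etc.).  The real thing used 10 to 14 kernels; two will do here.
    KERNELS = {"matrix": 0.6, "scalar": 0.4}

    def synthesize(sample, multiplier):
        """Build synthetic jobs at some multiple of the sampled load.
        This sketch scales by replicating jobs; I'm not claiming
        that's how QBM did it."""
        jobs = []
        for i in range(round(len(sample) * multiplier)):
            rec = sample[i % len(sample)]
            # Split the job's CPU seconds across the weighted kernels
            # and interleave the bursts with slices of its I/O, so the
            # job doesn't burn all its CPU and then do all its I/O.
            steps = []
            for kernel, weight in KERNELS.items():
                steps.append(("cpu", kernel, rec["cpu"] * weight))
                steps.append(("io", rec["blocks"] // len(KERNELS)))
            random.shuffle(steps)  # crude stand-in for a realistic mix
            jobs.append({"start": rec["start"], "stop": rec["stop"],
                         "mem": rec["mem"], "steps": steps})
        return jobs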
A lot of testing went into QBM to be sure that the jobs it created
would actually run in exactly 10 minutes on the base machine.  It is
much to Doc's credit that they did.  A QBM workload of 1.0 ran in 10
minutes.  If 1.1 also ran in 10 minutes, then the sample you picked
wasn't really from a time when your machine was full.  After some
experience with it, we had samples where 0.9 would complete in 10
minutes, 1.0 would also, but 1.1 would not.

We would then take these samples to a new machine and scale the load
up or down until the jobs JUST ran in 10 minutes.  In this way we
could report that "For our current workload AND configuration, the
X-MP will provide 3.85 times the throughput of the 1-S."  (This was
our actual result.)  Of course the configurations of both base and
target systems had to be taken into account.

This is significant also, because we could benchmark different
configurations on the same machine.  Suppose we had more channels?
Fewer channels and faster disks?  More disks?  All these questions
could be answered by setting up test configurations and measuring the
throughput.

No papers were written about QBM (unfortunately), although I did
present benchmarking results at a couple of CUG (Cray Users Group)
conferences, and those papers are available.

Hope this has helped,
Chris.

Christopher Holl
TeleSoft
5959 Cornerstone Ct. W.
San Diego, CA 92121
(619) 457-2700
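P.S.  The "scale until the jobs JUST run in 10 minutes" step is really
just a search over the workload multiplier.  Sketched in the same
Python-style pseudocode (in practice we did this by hand, one run at a
time, and nothing below is QBM code):

    def capacity_ratio(fits, lo=0.1, hi=20.0, tol=0.01):
        """Largest workload multiple the target machine can finish in
        the 10 minute window, found by bisection.  fits(m) builds the
        synthetic workload at multiple m, runs it, and reports whether
        everything finished in time -- that call is the expensive
        part."""
        assert fits(lo) and not fits(hi)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if fits(mid):
                lo = mid      # still fits; push the load up
            else:
                hi = mid      # too much; back off
        return lo

With the baseline machine pinned at 1.0 by construction, the number
this returns for the target machine is the throughput ratio we
reported -- 3.85 for the X-MP over the 1-S.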