Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!rochester!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: LINPACK 1000x1000 MFLOPS per $$$
Message-ID:
Date: 21 Jul 90 02:02:34 GMT
References: <2349@crdos1.crd.ge.COM> <37683@ucbvax.BERKELEY.EDU>
Sender: usenet@ee.udel.EDU
Organization: College of Marine Studies, U. Del.
Lines: 84
In-reply-to: elm@sprite.Berkeley.EDU's message of 20 Jul 90 22:53:15 GMT

In article , mccalpin@perelandra.cms.udel.edu (John D. McCalpin) I wrote:
> [on the topic of KILLER MICROS]
> % (1) The cost must be within the available budget.
> %     This includes the cost of porting the code as well.

In article <37683@ucbvax.BERKELEY.EDU> elm@sprite.Berkeley.EDU
(ethan miller) asks:
> Is it any harder to port code to the Cray than to other machines?  How
> about other supercomputers, such as Convex?  There will certainly be
> porting costs, but I don't think they'll be much worse for a supercomputer
> than for any other computer.  Please correct me if I'm wrong on this,
> though.

Actually, this was a veiled reference to another "Killer Micro", the
Intel i860, which is already notorious for the considerable programming
effort required to obtain good performance (relative to its own
theoretical peak performance).  The i860 system benchmarks that I have
seen to date suggest that at 33 MHz it is slower on compiled Fortran
than the IBM 320.  These configurations are also more expensive than
the IBM.

> % (2) The wall-clock turnaround must be within the limits
> %     of the research project.
>
> If you suffer a 25 to 1 slowdown of CPU time, that will change turnaround
> times from overnight to one month.  That's a big difference.

I have already addressed this at some length in other postings.  Of
course, if you gave me a whole Cray I could get my answers faster.
But in the real world, lots of people share these big machines and most
users receive allocations that come out on the order of one cpu hour
per day.  So it is the *total productivity* summed over some months
that is the appropriate metric for this part of the problem.  With
respect to this metric, a machine which is 1/25 as fast as the Cray,
but dedicated to my job, is competitive.

> [my comments on running the equivalent of a 200 hour Cray job
>  on an IBM 320 deleted]

> Really?  That's 5000 hours on a PowerStation (using the 25/1 ratio from
> the table).  That's about 200 days, assuming you use every single CPU
> cycle on the machine.  Since you'll be doing some I/O, though, I'd be
> surprised to see better than 50-75% utilization, which brings total
> running time to close to a year.  Granted, it's cheaper than Cray time,
> but is it practical to wait a year for a single simulation to finish?

200 days is a good starting estimate.  I actually get more like 95% cpu
utilization on this job, so figure 7 months calendar time.

The interesting aspect is that the whole process of proposal review and
funding takes about 6 months from the time the proposal is submitted to
the time that the money is available.  If you include the time spent
actually writing the proposal, then 7 months on the calendar is not at
all unreasonable an estimate.

Then once the proposal is funded, I have to make another proposal to
the NSF supercomputer centers and once that is funded I have to fight
the queues to get my job through.  Each of these phases will take
several months as well.  So call it 13 months from start to finish ---
assuming that everything goes well and that proposals (remember that
there are two of them now) do not have to get re-written, etc.

In light of this, waiting 7 months for a dedicated server to finish is
not so silly as it first sounded.

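The turnaround arithmetic above can be sketched as follows.  This is
only a minimal illustration: the function names, the 24-hour day and
the 30.4-day average month are assumptions made for the example, while
the 25:1 ratio, the 95% utilization, the 200 Cray hours and the
allocation of one cpu hour per day are the figures quoted above.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30.4  # assumed average calendar month, for illustration only


def cray_equivalent_hours_per_day(slowdown, utilization=1.0):
    """Cray-equivalent cpu hours a dedicated machine delivers per calendar day."""
    return HOURS_PER_DAY * utilization / slowdown


def dedicated_turnaround_days(cray_hours, slowdown, utilization=1.0):
    """Calendar days for a dedicated machine to finish a job of `cray_hours`."""
    workstation_cpu_hours = cray_hours * slowdown
    return workstation_cpu_hours / (HOURS_PER_DAY * utilization)


def shared_turnaround_days(cray_hours, allocation_hours_per_day):
    """Calendar days on a shared machine, counting only the cpu allocation
    (no proposal, review, or queue delays)."""
    return cray_hours / allocation_hours_per_day


if __name__ == "__main__":
    rate = cray_equivalent_hours_per_day(slowdown=25, utilization=0.95)
    dedicated = dedicated_turnaround_days(200, slowdown=25, utilization=0.95)
    shared = shared_turnaround_days(200, allocation_hours_per_day=1.0)
    print(f"dedicated machine: {rate:.2f} Cray-equivalent hours per day")
    print(f"dedicated turnaround: {dedicated:.0f} days "
          f"({dedicated / DAYS_PER_MONTH:.1f} months)")
    print(f"shared Cray at 1 cpu hour/day: {shared:.0f} days")
```

Under these assumptions the dedicated machine delivers a little under
one Cray-equivalent cpu hour per day, which is why it comes out
competitive with a typical shared allocation.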
Another point is that the time on the Cray has a real world cost of
several hundred dollars per hour, split roughly evenly between
depreciation and operations/maintenance/utilities.  At $500/hour, the
200 hour Cray calculation is costing us taxpayers about $100,000 ---
not counting my salary while I write the proposals, the time spent by
the mail reviewers and the panels reviewing the proposals, the salaries
of the administrators and paper-pushers, etc, etc, etc....
This compares to the use of about $20,000 of hardware for about 1/4 of
its useful life span.

And as a final point, recall that the codes in question are fairly well
optimized for the Cray, running at >120 MFLOPS sustained speeds.  The
Cray Y/MP's advantage over the IBM 320 is much smaller than 25:1 for
less well-vectorized codes (my example from the earlier posting was 3:1
for another of my FP-intensive applications with a scalar bottleneck).
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET