Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!rex!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: LINPACK 1000x1000 MFLOPS per $$$
Message-ID:
Date: 25 Jul 90 19:05:32 GMT
References: <2349@crdos1.crd.ge.COM> <37683@ucbvax.BERKELEY.EDU> <8576@ur-cc.UUCP>
Sender: usenet@ee.udel.EDU
Organization: College of Marine Studies, U. Del.
Lines: 72
In-reply-to: leadley@uhura.cc.rochester.edu's message of 25 Jul 90 15:51:20 GMT

In article mccalpin@perelandra.cms.udel.edu (John D. McCalpin) I wrote
about running a simulation on a "Killer Micro" for 7 months instead of
waiting even longer for equivalent time to become available on a Cray.

In response, in article <8576@ur-cc.UUCP> leadley@uhura.cc.rochester.edu
(Scott Leadley) asks:
>
>	What is the mean time between crashes on your PowerStation?  What is
> the mean time between power glitches?  (Maybe a UPS would be a good
> investment.)  I agree that having a personal PowerStation is the most cost
> effective per MFLOP for your purposes.  However, I don't think that is the only
> measure of worth in this debate.  Here are some other measures of worth:
>
>	- is the simulation amenable to checkpointing?  This is important if
>	  you have to worry about the mean time between failure of the
>	  system as a whole.

==> As with most computational fluid dynamics codes, checkpointing is
trivial.  All that is required is writing out all the fields at the
current and previous time levels, and then restarting using those
fields.

>	- how much effort does it take on the part of the researcher to restart
>	  from a checkpoint?  This can be a significant time sink with
>	  some programs.

==> It is trivial to make the saving of checkpoint/restart data
automatic.  It is not difficult to make a 'cron' entry that will
automagically restart the code from the most recent complete
checkpoint if it is not running.
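
The checkpoint/restart scheme described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not the
actual simulation code: the function names, the `ckpt_*.pkl` file naming,
the use of Python's `pickle`, and the toy update rule are all hypothetical.
It shows the two ideas from the text: saving the fields at both the current
and previous time levels, and writing each checkpoint under a temporary name
first so that only complete checkpoints are ever picked up on restart.

```python
import glob
import os
import pickle
import tempfile


def save_checkpoint(directory, step, current, previous):
    """Write both time levels to one file, atomically, and return its path."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"ckpt_{step:08d}.pkl")
    # Write to a temporary file first so a crash mid-write never leaves a
    # truncated file under the final checkpoint name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"step": step, "current": current, "previous": previous}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)  # atomic rename on POSIX filesystems
    return final_path


def load_latest_checkpoint(directory):
    """Return the most recent complete checkpoint, or None if there is none."""
    # Zero-padded step numbers make lexicographic order match step order.
    paths = sorted(glob.glob(os.path.join(directory, "ckpt_*.pkl")))
    if not paths:
        return None
    with open(paths[-1], "rb") as f:
        return pickle.load(f)


def run(directory, n_steps, checkpoint_every=10):
    """Toy three-time-level loop that resumes from the latest checkpoint."""
    state = load_latest_checkpoint(directory)
    if state is None:
        # Hypothetical initial conditions for the two time levels.
        step, previous, current = 0, [0.0, 0.0], [1.0, 1.0]
    else:
        step, previous, current = state["step"], state["previous"], state["current"]
    while step < n_steps:
        # Placeholder update: the new level depends on the two older levels.
        new = [p + 0.1 * c for p, c in zip(previous, current)]
        previous, current = current, new
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(directory, step, current, previous)
    return step, current
```

Because `run` always begins by looking for the latest complete checkpoint,
a 'cron' entry of the kind mentioned above would only need to start the
program whenever no instance is running; no manual restart step is needed.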

>	- does the work require substantial support by a human?  Mounting
>	  and unmounting tapes, for example.

==> Archiving results to tape is necessary, but with sufficient local
disk, this can be done once every week or so.

>	- is the system hardware and software reliable when used for your
>	  purposes?  All systems require some administrative effort.
>	  However, having just one user and a well-behaved task can
>	  reduce this effort to close to nothing.

==> Since I already have a personal graphics workstation, the added
administration required is minimal.

>	- does use of the system require interacting with other people?  Having
>	  to coordinate your work with other people is sometimes a
>	  hassle (depending on the people or the situation).  Having to
>	  conform to the procedures of a bureaucracy is always a hassle.

==> That's why it is so nice that the entry cost is only about
$10,000.  I buy it for myself and don't have to share it with anyone.
Other "Killer Micros" that have been proposed as competitive with
supercomputers have been *much* more expensive -- typically $50,000 or
more for Stardent 3010, SGI 4D/2x0, or other systems.

>	- how easy is it to recover from a system failure?  If the researcher
>	  is not familiar with the system and doesn't have access to the
>	  services of a knowledgeable systems support person, system
>	  failures (due to crashes, file corruption, inept system
>	  administration, etc.) can be catastrophic.

==> Well, I am an experienced system administrator from my graduate
student days, so I don't expect any trouble there.  The IBM AIX
filesystem is supposed to be much less corruptible than the standard
System V or BSD filesystems, though I have not looked into that in
detail yet.

>	I'm playing devil's advocate here; in a similar situation I'd make the
> same choice you did (and probably quicker too since I am a fan of decentralized
> computing).
> --
> Scott Leadley - leadley@cc.rochester.edu
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET