Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!ucsd!ucbvax!PHY.DUKE.EDU!rgb
From: rgb@PHY.DUKE.EDU ("Robert G. Brown")
Newsgroups: comp.sys.sgi
Subject: Processor efficiency
Message-ID: <9006150334.AA03405@physics.phy.duke.edu>
Date: 15 Jun 90 03:34:12 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 33

We have a Power Series 220S in our department as a compute server. It
has 24 MB of RAM, no graphics console, and two processors.

My question is this: we have empirically observed that small, optimized
jobs written in C or F77 for a single processor run at around 3.5 MFLOPS
(as advertised). The problem is that if one takes these jobs (typically
a loop containing just one equation with a multiply, a divide, an add,
and a subtract) and scales them up, by making the loop set every element
of a vector and then increasing the size of the vector and the loop,
there is a point (which I have not yet tried to pinpoint precisely)
where the speed degrades substantially -- by more than a factor of two.

This point is >>far<< short of saturating the available RAM, and seems
independent of "normal" system load (which is usually carried by one
processor while the other is running a numerical task like this).

My current hypothesis is that this phenomenon is caused by saturation of
some internal cache on the R3000. Has anyone else noticed or documented
this? Is there a technical explanation that someone could post?

Since we (of course) want to use the SG machine for fairly large jobs,
it is important for us to learn where these performance cutoffs lie so
that we can plan around them. On the other hand, if there is something
wrong with our SG-220, we'd like to learn that too...

Thanks,

Dr. Robert G. Brown
System Administrator
Duke University Physics Dept.
Durham, NC 27706
(919)-684-8130    Fax (24hr) (919)-684-8101
rgb@phy.duke.edu   rgb@physics.phy.duke.edu
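
P.S. In case anyone wants to try to reproduce the cutoff, here is a minimal sketch of the kind of test described above. It is an illustration only: the function names (kernel, mflops_for_size, sweep), the particular expression in the loop, and the use of clock() for timing are assumptions made for this sketch, not the code actually run on the 220S. The idea is to hold the total amount of arithmetic fixed while the vector length grows, so any drop in the reported rate reflects the size of the working set rather than the amount of work.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Keeps the final result live so the timed loop is not optimized away. */
volatile double sink;

/* One pass over the vectors: 4 floating-point operations per element
   (a multiply, a divide, an add, and a subtract). The result depends on
   the previous a[i], so repeated passes cannot be collapsed into one. */
void kernel(double *a, const double *b, const double *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = (b[i] * c[i]) / (b[i] + c[i]) - a[i];
}

/* Perform roughly total_elems element updates on vectors of length n.
   Returns the measured rate in MFLOPS, 0.0 if the run was too short for
   the clock to resolve, or -1.0 if allocation failed. */
double mflops_for_size(size_t n, size_t total_elems)
{
    double *a, *b, *c, secs;
    size_t i, r, reps;
    clock_t t0, t1;

    assert(n > 0);
    a = malloc(n * sizeof *a);
    b = malloc(n * sizeof *b);
    c = malloc(n * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (i = 0; i < n; i++) {
        a[i] = 0.0;
        b[i] = 1.0 + (double)i;   /* b + c is never zero */
        c[i] = 2.0;
    }
    reps = total_elems / n;
    if (reps == 0)
        reps = 1;

    t0 = clock();
    for (r = 0; r < reps; r++)
        kernel(a, b, c, n);
    t1 = clock();
    sink = a[n - 1];

    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    free(a); free(b); free(c);
    if (secs <= 0.0)
        return 0.0;
    return 4.0 * (double)reps * (double)n / secs / 1.0e6;
}

/* Print the rate for vector lengths lo, 2*lo, 4*lo, ... up to hi. */
void sweep(size_t lo, size_t hi, size_t total_elems)
{
    size_t n;
    if (lo == 0)
        lo = 1;
    printf("%12s %12s %10s\n", "n", "KB touched", "MFLOPS");
    for (n = lo; n <= hi; n *= 2) {
        printf("%12lu %12lu %10.2f\n",
               (unsigned long)n,
               (unsigned long)(3 * n * sizeof(double) / 1024),
               mflops_for_size(n, total_elems));
        if (n > hi / 2)           /* stop before n *= 2 can overflow */
            break;
    }
}
```

Calling something like sweep(1024, 1UL << 20, 1UL << 24) from main and looking for the vector length at which the MFLOPS column drops should bracket the cutoff. Comparing the "KB touched" column at that point with the cache size would be one way to test the cache hypothesis.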