Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!sun!chiba!khb
From: khb%chiba@Sun.COM (Keith Bierman - SPD Languages Marketing -- MTS)
Newsgroups: comp.arch
Subject: Re: Register Scoreboarding
Message-ID: <104999@sun.Eng.Sun.COM>
Date: 16 May 89 04:48:34 GMT
References: <24821@lll-winken.LLNL.GOV> <3288@orca.WV.TEK.COM> <19463@winchester.mips.COM> <170@dg.dg.com> <19661@winchester.mips.COM>
Sender: news@sun.Eng.Sun.COM
Reply-To: khb@sun.UUCP (Keith Bierman - SPD Languages Marketing -- MTS)
Organization: Sun Microsystems, Mountain View
Lines: 74

In article <19661@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>
>ASSERTION 3: well, if interlocking works after all, then scoreboarding
>is better for performance reasons.
>	3A: in supercomputers
>	3B: in current microprocessors
>------
....
>ASSERTION 3
>	3A: supercomputers
>		Probably true

??? I can't think of a supercomputer which _does_ use scoreboarding,
at least not as I understand the term. Supercomputers tend not to have
data caches ... instead they have very fast, high-bandwidth memory
systems (using many banks of memory) and rely on the fact that real
scientific programs tend to have key loops which are well behaved in
some sense (i.e. sort of vectorizable). The fact that these are
typically FORTRAN machines alters the program statistics somewhat.

As has been demonstrated by many vendors, lots of registers, software
and sometimes hardware pipelines, loop transformations (percolation
scheduling, etc.) and other unnatural acts are key to getting good
performance.

Note that these remarks apply to machines like the CDC6600, the
various Crays, misc. Japanese vector machines, the Cydra 5 and the
Multiflow machines (which is why I said "sort of vectorizable"). These
machines typically have interlocks, but nothing like the scoreboard
scheme of the 88K (unless my memory is very leaky this week).

>	3B: current microprocessors
>		Seems unlikely until they start having memory systems
>		like 3A.
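
For readers who have not met the term, here is a toy sketch of the
general idea behind a per-register scoreboard: each register carries a
"busy" mark while a result is outstanding, and an instruction waits at
issue only if it touches a busy register. This is an illustration under
stated assumptions, not the 88K's actual logic: the function name
issue_cycles, the (dest, srcs, latency) tuple layout, the single-issue
in-order model and the latencies are all made up for the example.

```python
def issue_cycles(program, num_regs=32):
    """Return the cycle at which each instruction issues.

    Toy model (all assumptions): in-order, at most one issue per cycle,
    each instruction is (dest_reg, src_regs, latency_in_cycles).
    """
    # ready_at[r] is the cycle when register r's pending result lands;
    # r counts as "busy" on the scoreboard while ready_at[r] > cycle.
    ready_at = [0] * num_regs
    cycle = 0
    issued = []
    for dest, srcs, latency in program:
        # Wait until no source and not the destination is still busy.
        touched = list(srcs) + [dest]
        cycle = max([cycle] + [ready_at[r] for r in touched])
        issued.append(cycle)
        ready_at[dest] = cycle + latency  # mark dest busy until then
        cycle += 1  # the next instruction issues no earlier than this
    return issued


# Hypothetical three-instruction sequence:
#   r1 <- load via r2   (3 cycles)
#   r4 <- op on r5      (1 cycle, independent of the load)
#   r6 <- op on r1      (1 cycle, needs the loaded value)
demo = [(1, (2,), 3), (4, (5,), 1), (6, (1,), 1)]
# issue_cycles(demo) -> [0, 1, 3]
```

In this model the independent second instruction issues right behind
the load, and only the instruction that actually needs r1 waits.
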

Scoreboarding probably works, but there seems to be a certain lack of
evidence that it is necessary. Seems overly complex to me .... but
what do I know ... I studied math and grew up working in Kalman
filtering applications ... :>

>
>
>Like both 88K and MIPS, SPARC is defined to allow different-latency
>FP implementations, and in fact, 3 different ones are already extant.
>Perhaps the SPARC guys would care to join the fun and talk about
>differences in latencies, overlap, etc. [If you haven't noticed it,
>Sun-4s recently got the FPU2 that raised FP performance in the same
>systems.]

I don't know what to say ... other than that it works just fine. All
the binaries I tested (and it was more than a few) worked up and down
the line. It is true that it is possible to tickle the compiler into
generating code which is better for one chip or another, and this is
likely to continue. There will always be a compatible mode (which most
people will use exclusively) which will have good performance for the
most popular implementations, and different implementations will have
some special code ordering/other implementor neat stuff ...

As John pointed out in an earlier posting, Sun has chosen to give
implementors more leeway (the N-design teams notion), and folks at
Prisma probably will have lots of neat stories to tell around Jan 1990
(4nsec SPARC ... design goal of 100Mflops) about how well SPARC really
scales, and how the cleverly picked interlocks were just right (or
caused them endless grief :>).

Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     | Marketing Technical Specialist    ! kbierman@sun.com
I Voted for Bill &    | Languages and Performance Tools.
Opus  (* strange as it may seem, I do more engineering now     *)