Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!portal!cup.portal.com!bcase
From: bcase@cup.portal.com (Brian bcase Case)
Newsgroups: comp.arch
Subject: Re: EXACTLY what is Superscalar?
Message-ID: <16155@cup.portal.com>
Date: 23 Mar 89 19:12:13 GMT
References: <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM> <355@bnr-fos.UUCP> <27600@apple.Apple.COM> <16080@cup.portal.com> <22975@ames.arc.nasa.gov>
Organization: The Portal System (TM)
Lines: 28

>For quite a while, I have heard superscalar used, and I think the term was
>defined in a paper in IEEE Computer a while back, but I am still a little
>fuzzy on it. Is "superscalar" an exact concept, or is it a buzzword like
>"RISC"? Is a Multiflow machine a superscalar machine, or the i860, or
>the Weitek XL-8064?

Well, this is a good question. Since I have been using the "buzz word"
superscalar, maybe I should give the definition I use.

To me, superscalar is simply an implementation that executes multiple
instructions per cycle (at least for a RISC architecture) when dependencies
permit. It accomplishes this multiple-instruction-per-cycle rate *without
any help from the instruction stream itself.* That is, take an instruction
stream that executes just fine on the 29000; if the same instruction stream
were presented to the S-29000, the superscalar 29000, more than one
instruction would be executed per cycle when dependencies permit. To
accomplish this to any reasonable degree, two or more (nearly?) identical
pipelines must be present (I think).

Note that this is significantly different from VLIW or the i860. For these
implementations, multiple operations can execute in one cycle, but that is
because the instruction says, in an explicit way, to do so. Said another
way, these machines will not execute multiple operations per cycle *unless*
the instruction stream says to do so. A superscalar machine needs no such
help.

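To make the definition above a bit more concrete, here is a rough sketch in Python of hardware deciding, with no hints from the instruction stream, which instructions can issue together. Everything in it is an assumption made for illustration: the names `Instr`, `conflicts`, and `issue_groups` are invented, instructions issue strictly in order, every instruction takes one cycle, and there are no branches or memory operations. The dependency check is deliberately conservative. It is not a model of the 29000 or of any real machine.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def conflicts(later: Instr, earlier: Instr) -> bool:
    """True if `later` may not issue in the same cycle as `earlier`."""
    return (earlier.dest in later.srcs      # read-after-write
            or later.dest == earlier.dest   # write-after-write
            or later.dest in earlier.srcs)  # write-after-read (conservative)


def issue_groups(stream: List[Instr], width: int) -> List[List[Instr]]:
    """Split an in-order stream into per-cycle issue groups of at most `width`."""
    groups = []
    i = 0
    while i < len(stream):
        group = [stream[i]]
        i += 1
        # Keep adding the next instruction while a slot is free and it is
        # independent of everything already issuing this cycle.
        while (i < len(stream) and len(group) < width
               and not any(conflicts(stream[i], g) for g in group)):
            group.append(stream[i])
            i += 1
        groups.append(group)
    return groups


# The same unmodified stream is given to both "implementations".
stream = [
    Instr("r1", ("r2", "r3")),   # add r1, r2, r3
    Instr("r4", ("r5", "r6")),   # add r4, r5, r6  (independent of the first)
    Instr("r7", ("r1", "r4")),   # add r7, r1, r4  (needs both results)
    Instr("r8", ("r7", "r2")),   # add r8, r7, r2  (needs r7)
]

scalar_cycles = len(issue_groups(stream, width=1))
dual_cycles = len(issue_groups(stream, width=2))
print(scalar_cycles, dual_cycles)
```

With `width=1` every instruction gets its own cycle; with `width=2` the first two instructions pair up and the dependent ones do not, so the stream takes fewer cycles without a single bit of it changing.
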
*However*, to squeeze the most from a superscalar design, one would like to
have the compiler arrange things so that dependencies are minimized. *But
note*, even the compiler-arranged instruction stream will still execute just
fine on a non-superscalar implementation of the architecture.
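The point about compiler scheduling can be sketched the same way. The Python below is only an illustration under stated assumptions: `run_scalar` and `dual_issue_cycles` are made-up names, every instruction is a one-cycle register add, and the "superscalar" machine pairs at most two adjacent independent instructions. The reordered stream is written by hand here; a real compiler would derive it from the dependency graph.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str]   # (dest, src1, src2), meaning dest = src1 + src2


def run_scalar(stream: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute one instruction at a time, as a non-superscalar machine would."""
    regs = dict(regs)          # work on a copy so the caller's state is kept
    for dest, a, b in stream:
        regs[dest] = regs[a] + regs[b]
    return regs


def dual_issue_cycles(stream: List[Instr]) -> int:
    """Count cycles when up to two adjacent, independent instructions pair up."""
    cycles = 0
    i = 0
    while i < len(stream):
        cycles += 1
        if i + 1 < len(stream):
            d0, a0, b0 = stream[i]
            d1, a1, b1 = stream[i + 1]
            independent = (d0 not in (a1, b1)
                           and d1 not in (a0, b0)
                           and d0 != d1)
            if independent:
                i += 1         # the second instruction issues in the same cycle
        i += 1
    return cycles


original = [
    ("r1", "r2", "r3"),   # r1 = r2 + r3
    ("r4", "r1", "r1"),   # r4 = r1 + r1  (depends on the line above)
    ("r5", "r6", "r7"),   # r5 = r6 + r7
    ("r8", "r5", "r5"),   # r8 = r5 + r5  (depends on the line above)
]
# Same instructions, reordered so that adjacent ones are independent.
scheduled = [original[0], original[2], original[1], original[3]]

start = {"r2": 1, "r3": 2, "r6": 3, "r7": 4}
same_result = run_scalar(original, start) == run_scalar(scheduled, start)
print(same_result, dual_issue_cycles(original), dual_issue_cycles(scheduled))
```

Both orderings leave the one-at-a-time machine with identical register contents, so the scheduled stream is still an ordinary program for the architecture; only the dual-issue cycle count improves.
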