Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!uunet!mcsun!ukc!acorn!armltd!abaum
From: abaum (Allen Baum)
Newsgroups: comp.arch
Subject: Re: endian etc
Message-ID: <173@armltd.uucp>
Date: 16 May 91 09:19:20 GMT
References: <186@titccy.cc.titech.ac.jp>
Sender: abaum@armltd.uucp
Distribution: comp
Organization: A.R.M. Ltd, Swaffham Bulbeck, Cambs, UK
Lines: 36

In article <186@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <166@armltd.uucp> abaum (Allen Baum) writes:
>
>>As a sweeping generalization, the path from cache to registers/forwarding
>>path is THE critical path (if it isn't, you've probably done the design
>>wrong, or have a CISC architecture)
>
>No. THE critical path is on register/ALU loop, which determines the maximum
>clock speed (see Jouppi). Transfer between cache and a register involves
>TLB look up and cache access requiring a little more time. Thus, if it
>requires 2.5 (or 1.5 or 2.3 or...) clock cycles, it is not on the critical
>path.

Hmm, we have a difference of opinion here. In a sense, you can take as
many cycles for each 'step' in the pipe as you like. In fact, the R4000
did just as you suggested: an ALU loop is a single cycle, and the cache
access has a 2-cycle latency (but 1-cycle initiation, i.e. it's
pipelined). They have stated that they worked real hard to make sure that
their ALU cycle fit into one clock. They didn't have to do that, of
course; it could have taken 2 as well (in which case they would be
superpipelined in the now 'classical' sense of Jouppi).

Some SPARC implementations have gone in the other direction: cache access
is a single cycle, but they've merged the ALU + reg. writeback stages (or
access & ALU stages, or both, I don't quite remember). In effect, they've
lengthened the ALU cycle.

So, another way of saying what I meant to say is: if both ALU and cache
access are going to fit into a cycle, the cache access will be the
limiting case.
That is perhaps not true with a trivial cache, but it is true for sizes
that are useful.

Again, assuming that you intend everything to run in a single cycle,
anything that you put into the cache access path will lengthen your
minimum cycle time. That was the point I was trying to make about sign
extension.
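
To put rough numbers on the argument, here is a minimal sketch of the cycle-time reasoning. The delay figures and the names (ALU_NS, CACHE_NS, SIGN_EXTEND_NS, min_cycle_ns) are made up for illustration and do not come from any real part; the model also ignores latch overhead and clock skew. It only shows that the slowest single-cycle stage sets the clock, and that pipelining the cache access over two cycles hands the limit back to the ALU loop.

```python
# Hypothetical stage delays in nanoseconds. These are illustrative
# assumptions, not measurements of the R4000, a SPARC, or any real design.
ALU_NS = 8.0          # register read + ALU + forwarding
CACHE_NS = 10.0       # TLB lookup + cache access + alignment
SIGN_EXTEND_NS = 1.0  # extra logic placed on the load path


def min_cycle_ns(alu_ns, cache_ns, cache_cycles=1):
    """Return the minimum clock period under a simplified model.

    The ALU loop must complete in one cycle. The cache access is split
    evenly over `cache_cycles` pipeline stages (1 = single-cycle access).
    Latch overhead and clock skew are deliberately ignored.
    """
    return max(alu_ns, cache_ns / cache_cycles)


# Single-cycle cache access: the cache path is the limiting case.
single = min_cycle_ns(ALU_NS, CACHE_NS)

# Adding sign extension to that path lengthens the minimum cycle directly.
single_ext = min_cycle_ns(ALU_NS, CACHE_NS + SIGN_EXTEND_NS)

# Cache access pipelined over two cycles: the ALU loop is now the limit,
# and the added sign-extension delay no longer shows up in the clock period.
pipelined_ext = min_cycle_ns(ALU_NS, CACHE_NS + SIGN_EXTEND_NS, cache_cycles=2)

print(single, single_ext, pipelined_ext)
```

With these assumed figures, the single-cycle design pays for the sign extension in every clock, while the two-cycle design absorbs it, at the price of an extra cycle of load latency.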