Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!sun-barr!ccut!wnoc-tyo-news!cs.titech!titccy.cc.titech!necom830!mohta
From: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Newsgroups: comp.arch
Subject: Re: endian etc
Message-ID: <186@titccy.cc.titech.ac.jp>
Date: 14 May 91 14:16:59 GMT
References: <3407@spim.mips.COM> <166@armltd.uucp>
Sender: news@titccy.cc.titech.ac.jp
Distribution: comp
Organization: Tokyo Institute of Technology
Lines: 41

In article <166@armltd.uucp> abaum (Allen Baum) writes:

>I've thought about this a bit, and I'm not sure its true. All the byte lane
>switching hardware already exists for the low order byte;

Yes.

>The upper halfword
>doesn't get mucked with at all, except for getting sign extended or cleared.

And, thus, multiplexers already exist for the high order bytes.

>So, the critical path already exists. The muxing isn't terribly symmetrical-
>all the work goes into the LS byte, almost none into the MS byte. That means
>the layout probably has holes in it, which could be filled with the rest of
>the byte lane logic, at (here I'm speculating some) no extra cost in space, or
>time.

The possible problem is that, if things go to 64 bits, it will become
a little more complex.

>By the way, note that while the buffering, etc. can go on in parallel, if
>sign extension is required, it can't - you can have muxes set up for sign
>selection, but you have to wait until it gets their, buffer it so it can
>drive 24 loads, and then stick it into all those upper bit positions.

With a 32-bit word, a 4-to-1 MUX is enough for the upper bit positions,
including sign-extended/non-sign-extended half-word/byte loading, which
is not so different from the current 3-to-1 MUXing.

>As a sweeping generalization, the path from cache to registers/forwarding
>path is THE critical path (if it isn't, you've probably done the design
>wrong, or have a CISC architecture)

No.
THE critical path is on the register/ALU loop, which determines the
maximum clock speed (see Jouppi). A transfer between the cache and a
register involves a TLB lookup and a cache access, and so requires
a little more time. Thus, if it requires 2.5 (or 1.5 or 2.3 or...)
clock cycles, it is not on the critical path.

						Masataka Ohta
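
As an illustration of the upper-bit multiplexing discussed above, here is
a minimal C sketch of load formatting for a 32-bit word. It is only one
possible reading of the "4 to 1 MUX" remark, not a description of any
particular chip: the four inputs assumed here for the upper halfword are
pass-through (word load), zero fill, the sign of the selected byte, and
the sign of the selected halfword. The names format_load and upper_sel,
and the little-endian byte numbering, are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical select codes for the assumed 4-to-1 upper-halfword mux. */
enum upper_sel { UP_PASS, UP_ZERO, UP_SIGN_BYTE, UP_SIGN_HALF };

/*
 * Turn a 32-bit word fetched from the cache into a register value.
 *   offset    : byte offset within the word (little-endian numbering assumed)
 *   size      : 1, 2 or 4 bytes
 *   is_signed : nonzero for sign extension, zero for zero extension
 */
uint32_t format_load(uint32_t word, unsigned offset, unsigned size, int is_signed)
{
    assert(size == 1 || size == 2 || size == 4);

    /* Byte-lane switching: bring the addressed byte/halfword to the low lanes. */
    uint32_t shifted = word >> (8u * (offset & 3u));
    uint32_t sign_byte = (shifted >> 7) & 1u;   /* sign of the selected byte */
    uint32_t sign_half = (shifted >> 15) & 1u;  /* sign of the selected halfword */

    /* Control for the upper halfword (bits 31..16). */
    enum upper_sel sel;
    if (size == 4)
        sel = UP_PASS;
    else if (!is_signed)
        sel = UP_ZERO;
    else
        sel = (size == 1) ? UP_SIGN_BYTE : UP_SIGN_HALF;

    /* The 4-to-1 selection itself. */
    uint32_t upper;
    switch (sel) {
    case UP_PASS:      upper = shifted & 0xFFFF0000u;           break;
    case UP_ZERO:      upper = 0;                               break;
    case UP_SIGN_BYTE: upper = sign_byte ? 0xFFFF0000u : 0;     break;
    default:           upper = sign_half ? 0xFFFF0000u : 0;     break;
    }

    /* Bits 15..8 carry data for halfword/word loads and fill for byte loads. */
    uint32_t mid;
    if (size == 1)
        mid = (is_signed && sign_byte) ? 0x0000FF00u : 0;
    else
        mid = shifted & 0x0000FF00u;

    return upper | mid | (shifted & 0x000000FFu);
}
```

For example, format_load(word, 2, 2, 1) models a signed halfword load from
the upper half of the word. Note that the sign inputs depend on the shifted
data, which is the serialization the quoted article points out; the sketch
says nothing about timing.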