Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!ames!amdcad!neptune!david
From: david@neptune.AMD.COM
Newsgroups: comp.arch
Subject: Re: register windows
Message-ID: <479@neptune.AMD.COM>
Date: Tue, 10-Nov-87 10:57:34 EST
Article-I.D.: neptune.479
Posted: Tue Nov 10 10:57:34 1987
Date-Received: Thu, 12-Nov-87 21:35:43 EST
References: <230@usl-pc.UUCP> <6681@apple.UUCP>
Sender: david@neptune.AMD.COM
Reply-To: david@neptune.AMD.COM (David Witt)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 37

In article <6681@apple.UUCP> bcase@apple.UUCP (Brian Case) writes:
>Ok, so to address future speed advantages, yes there might be some speed
>advantages for those with simple register files. However, for the Am29000,
>the critical paths were quite balanced (Dave Witt, are you out there?)
>with, I believe, the TLB and/or instruction cache being the limiting
>factor. Next came the ALU, and then the register file. Unless you want
>to do things like spread the ALU cost over two pipestages (possible to do),
>I don't think the register file is going to be the limiting factor.

Well, since my friend bcase requested a response from me: on the 29k
design, the stack-relative add was one of the speed paths encountered
on the part, but it was certainly no worse than the 64-32 funnel shift,
worst-case ALU adds, TLB translation, or conditional jump and read from
the branch target cache.

Specifically, for that particular path, in one half clock phase the
internal pipe was required to discharge the instruction bus and
statically add the stack pointer to the a, b, c offsets in three
separate 7-bit adders. In parallel, a zero detect and a check on the
msb of the a, b, c values determined the selection in a 3:1 multiplexor
to enable the stack-relative local register, the global registers, or
the indirect pointers.
The output of the multiplexor was the address for the row/column decode
of the 3-port register file, which would be locally decoded and
accessed in the next half clock phase for a double read. (The write is
delayed by the internal pipe and is therefore not a speed path.) The
total for this path, including a small amount of lookahead for the
adder, was 12 gate delays. In initial silicon at nominal temperature,
this worst-case path was passing in excess of 35 MHz. In my opinion,
within our internal pipe, designing in this functionality was not a
major concern.
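
For readers who want the operand selection spelled out, here is a rough behavioral sketch in Python of the decode described above: one 7-bit add per operand field, with a zero detect and an msb check steering a 3:1 choice. It is a software model only, not the circuit. The function names, the treatment of the stack pointer as a plain 7-bit register index, the placement of locals at absolute numbers 128 to 255, and the use of the indirect pointer value directly as a register number are all assumptions made for illustration.

```python
from typing import Tuple

LOCAL_BASE = 0x80   # assumption: locals occupy absolute register numbers 128..255
LOCAL_MASK = 0x7F   # 7-bit adder result, so the sum wraps modulo 128


def select_register(field: int, stack_pointer: int,
                    indirect_pointer: int) -> Tuple[str, int]:
    """Map one 8-bit a/b/c field to (source, absolute register number).

    Hypothetical helper: models the zero detect, the msb check, and the
    7-bit stack-relative add as three cases of a 3:1 selection.
    """
    if not 0 <= field <= 0xFF:
        raise ValueError("operand field must fit in 8 bits")
    if field == 0:
        # Zero detect: the register number comes from the indirect pointer.
        return ("indirect", indirect_pointer & 0xFF)
    if field & 0x80:
        # msb set: stack-relative local register. Only the low 7 bits of
        # the sum are kept, which is what a 7-bit adder would produce.
        offset = (stack_pointer + field) & LOCAL_MASK
        return ("local", LOCAL_BASE | offset)
    # msb clear and nonzero: global register, addressed directly.
    return ("global", field)


def decode_operands(a: int, b: int, c: int, stack_pointer: int,
                    ipa: int, ipb: int, ipc: int):
    """Decode all three fields; in hardware these are three parallel adders."""
    return (
        select_register(a, stack_pointer, ipa),
        select_register(b, stack_pointer, ipb),
        select_register(c, stack_pointer, ipc),
    )


if __name__ == "__main__":
    # Example values are arbitrary and chosen only to hit all three cases.
    for source, regnum in decode_operands(0x82, 0x40, 0x00, 0x7F, 0, 0, 0x90):
        print(source, regnum)
```

In the sketch the add is done only in the local case, whereas the path described above computes the sum and the zero/msb checks in parallel and lets the multiplexor pick afterwards. The result is the same; only the timing differs, and timing is the part a Python model cannot show.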