Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!hao!boulder!sunybcs!bingvaxu!leah!itsgw!imagine!pawl21.pawl.rpi.edu!jesup
From: jesup@pawl21.pawl.rpi.edu (Randell E. Jesup)
Newsgroups: comp.arch
Subject: Re: RPM-40 [really forwarding]
Message-ID: <499@imagine.PAWL.RPI.EDU>
Date: 9 Mar 88 08:24:06 GMT
References: <9758@steinmetz.steinmetz.UUCP> <9799@steinmetz.steinmetz.UUCP> <1800@gumby.mips.COM>
Sender: news@imagine.PAWL.RPI.EDU
Reply-To: beowulf!lunge!jesup@steinmetz.UUCP
Organization: RPI Public Access Workstation Lab - Troy, NY
Lines: 48

In article <1800@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
>In article <475@imagine.PAWL.RPI.EDU> jesup@pawl23.pawl.rpi.edu (Randell E. Jesup) writes:
>>        1) Slows down critical path.  Any finely tuned risc CPU will most
>>        probably have its cycle time determined by the latency through the
>>        ALU.  Using a loopback of ALU results might result (depending on
>>        layout, tech, etc) in up to a 20% slowdown in the ALU, plus
>>        increase the chip area and layout problems.  This doesn't mean a
...
>To answer these questions I reran a local analysis program on the
>results of 13 program runs.

[data indicating 20% loss on Mips R2000 by removing loopback AND increasing load delay to 3]

>I.e. the lack of bypassing is equivalent to a cycle time increase of
>20%.  I.e. 5ns @ 40MHz.  The effect was as low as 2.4% and as high as
>41%, which simply proves you can prove anything you like by looking at
>single data points.

        Thanks for the data!  Sounds like a nice piece of software for
playing with architectures.  Two points:

        1) The RPM-40 does have bypass on loads; you can use the result of
        a load in the cycle it's going into the register file.  Bypass is
        only missing on ALU ops.  I'd appreciate it if you'd re-run using
        just an increased ALU latency.

        2) I suspect that the software is assuming that it can't store the
        result of an ALU op in the next cycle.
        In the RPM-40, you can store it in the next cycle, as the store
        accesses the register in its WB phase; it's using its ALU phase
        for address calculation.  Also, we have a smaller number of GP
        registers, which causes more modify-store and load-modify-store
        operations.

        It looks like my 20% figure (off the top of my head) was
'interesting'.  Of course, the match with yours was just chance.  I agree
that there is a cost due to not having ALU bypassing, but I think your 20%
figure is an upper limit for the average loss.  I suspect the real figure
will be more like 5-15%, given the factors above.

>Anyway, I hope the hard data helps the discussion.

        Most certainly!  Thank you.

     //  Randell Jesup                          Lunge Software Development
    //   Dedicated Amiga Programmer             13 Frear Ave, Troy, NY 12180
 \\//    beowulf!lunge!jesup@steinmetz.UUCP     (518) 272-2942
  \/     (uunet!steinmetz!beowulf!lunge!jesup)  BIX: rjesup
(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)
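The bypass trade-off argued over in this thread can be illustrated with a toy model. Everything in the sketch below is hypothetical: the five-stage numbering (IF, RD, ALU, MEM, WB), the split-cycle register file, and the names `Instr`, `min_gap`, `issue_cycles` and `equivalent_cycle_ns` are assumptions for illustration. They do not describe the real RPM-40 or R2000 pipelines, or Earl Killian's analysis program. The model counts issue cycles for a short trace with and without ALU bypass. It keeps load bypass on, as described for the RPM-40, and lets a store read its data register either in WB (the RPM-40 behavior described above) or in RD. It also converts a fractional slowdown into nanoseconds per cycle. A 40 MHz clock has a 25 ns cycle, so a 20% loss corresponds to the 5 ns quoted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical 5-stage in-order pipeline (IF=0, RD=1, ALU=2, MEM=3, WB=4).
# The stage numbers are illustrative assumptions, not RPM-40 or R2000 data.
RD, ALU, WB = 1, 2, 4


@dataclass(frozen=True)
class Instr:
    op: str                # "alu", "load" or "store"
    dest: Optional[str]    # register written; None for stores
    srcs: Tuple[str, ...]  # for stores: (address_reg, data_reg)


def min_gap(producer_op: str, use: str, alu_bypass: bool, store_data_in_wb: bool) -> int:
    """Minimum issue distance in cycles from a producer to one consumer operand."""
    # Register-file path: the producer writes in WB; assume a split-cycle
    # register file, so a read in that same cycle already sees the new value.
    read_stage = WB if (use == "store_data" and store_data_in_wb) else RD
    gap = WB - read_stage
    # Bypass paths deliver the value to the consumer's ALU stage.
    if producer_op == "load":
        gap = min(gap, WB - ALU)  # load result usable in the load's WB cycle
    elif producer_op == "alu" and alu_bypass:
        gap = min(gap, 1)         # ALU result forwarded to the next instruction
    return max(1, gap)            # instructions issue in order, one per cycle


def issue_cycles(trace, alu_bypass: bool, store_data_in_wb: bool = True) -> int:
    """Cycles to issue the trace: one instruction per cycle plus interlock stalls."""
    writer = {}  # register -> (issue cycle, op) of its most recent producer
    t = -1
    for ins in trace:
        if ins.op == "store":
            addr, data = ins.srcs
            uses = [("operand", addr), ("store_data", data)]
        else:
            uses = [("operand", r) for r in ins.srcs]
        earliest = t + 1
        for use, reg in uses:
            if reg in writer:
                tp, pop = writer[reg]
                earliest = max(earliest, tp + min_gap(pop, use, alu_bypass, store_data_in_wb))
        t = earliest
        if ins.dest is not None:
            writer[ins.dest] = (t, ins.op)
    return t + 1


def equivalent_cycle_ns(slowdown: float, clock_mhz: float) -> float:
    """Express a fractional slowdown as extra nanoseconds per clock cycle."""
    return slowdown * 1000.0 / clock_mhz


# Load-modify-store, the pattern a small register file makes more common.
lms = [
    Instr("load", "r1", ("r2",)),        # r1 = mem[r2]
    Instr("alu", "r1", ("r1", "r3")),    # r1 = r1 + r3
    Instr("store", None, ("r2", "r1")),  # mem[r2] = r1
]

if __name__ == "__main__":
    for in_wb in (True, False):
        fast = issue_cycles(lms, alu_bypass=True, store_data_in_wb=in_wb)
        slow = issue_cycles(lms, alu_bypass=False, store_data_in_wb=in_wb)
        print(f"store data read in WB={in_wb}: {fast} vs {slow} cycles")
    print(equivalent_cycle_ns(0.20, 40))  # 20% at 40 MHz -> 5.0 ns
```

In this toy model the load-modify-store sequence takes 4 cycles with or without ALU bypass when the store reads its data in WB, but 4 versus 6 cycles when the store data must be read in RD. That is the kind of effect point 2 suggests the analysis may be missing; real traces and real stage timings would give different numbers.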