Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!hao!boulder!sunybcs!bingvaxu!leah!itsgw!imagine!pawl21.pawl.rpi.edu!jesup
From: jesup@pawl21.pawl.rpi.edu (Randell E. Jesup)
Newsgroups: comp.arch
Subject: Re: RPM-40 [really forwarding]
Message-ID: <499@imagine.PAWL.RPI.EDU>
Date: 9 Mar 88 08:24:06 GMT
References: <9758@steinmetz.steinmetz.UUCP> <9799@steinmetz.steinmetz.UUCP> <1800@gumby.mips.COM>
Sender: news@imagine.PAWL.RPI.EDU
Reply-To: beowulf!lunge!jesup@steinmetz.UUCP
Organization: RPI Public Access Workstation Lab - Troy, NY
Lines: 48

In article <1800@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
>In article <475@imagine.PAWL.RPI.EDU> jesup@pawl23.pawl.rpi.edu (Randell E. Jesup) writes:
>
>   1) Slows down critical path.  Any finely tuned risc CPU will most
>   probably have its cycle time determined by the latency through the
>   ALU.  Using a loopback of ALU results might result (depending on
>   layout, tech, etc) in up to a 20% slowdown in the ALU, plus
>   increase the chip area and layout problems.  This doesn't mean a
...
>To answer these questions I reran a local analysis program on the
>results of 13 program runs.

[data indicating 20% loss on MIPS R2000 by removing loopback AND increasing
 load delay to 3]

>I.e. the lack of bypassing is equivalent to a cycle time increase of
>20%.  I.e. 5ns @ 40MHz.  The effect was as low as 2.4% and as high as
>41%, which simply proves you can prove anything you like by looking at
>single data points.

	Thanks for the data!  Sounds like a nice piece of software for
playing with architectures.

	Two points:  1)  The RPM-40 does have bypass on loads: you can use the
result of a load in the cycle it's going into the register file.  Bypass is
only missing on ALU ops.  I'd appreciate it if you'd re-run using just an
increased ALU latency.  2)  I suspect that the software is assuming that it
can't store the result of an ALU op in the next cycle.  In the RPM-40, you can
store it in the next cycle, as the store accesses the register in its WB
phase; it's using its ALU phase for address calculation.  Also, we have a
smaller number of GP registers, which causes more modify-store and
load-modify-store operations.
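
	The second point can be sketched with a toy in-order issue model.
This is only an illustration under stated assumptions: the penalty values,
the register names and the four-instruction sequence are made up, and none
of it is taken from the RPM-40, the R2000, or the analysis program mentioned
above.  The model distinguishes a register needed in a consumer's ALU phase
(operands, store address) from a store's data register, which is read later
in the WB phase.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Op:
    kind: str                       # "alu", "load", or "store"
    dest: Optional[str]             # register written (None for a store)
    srcs: Tuple[str, ...]           # registers needed in the ALU phase
    data_src: Optional[str] = None  # store only: data register, read in WB


def count_cycles(ops, alu_penalty, load_delay, store_sees_alu_next_cycle=True):
    """Issue cycles for a straight-line sequence, one instruction per cycle.

    alu_penalty: extra cycles an ALU-phase consumer waits after an ALU op
                 (0 models ALU bypass; a value > 0 models its absence).
    load_delay:  extra cycles any consumer waits after a load.
    store_sees_alu_next_cycle: if True, a store may use an ALU result as
                 its data in the very next cycle (hypothetical toggle).
    """
    ready_alu = {}   # reg -> earliest issue cycle for ALU-phase consumers
    ready_data = {}  # reg -> earliest issue cycle for store-data consumers
    cycle = 0
    for op in ops:
        issue = cycle
        for reg in op.srcs:
            issue = max(issue, ready_alu.get(reg, 0))
        if op.data_src is not None:
            issue = max(issue, ready_data.get(op.data_src, 0))
        if op.kind == "alu":
            ready_alu[op.dest] = issue + 1 + alu_penalty
            extra = 0 if store_sees_alu_next_cycle else alu_penalty
            ready_data[op.dest] = issue + 1 + extra
        elif op.kind == "load":
            ready_alu[op.dest] = issue + 1 + load_delay
            ready_data[op.dest] = issue + 1 + load_delay
        cycle = issue + 1
    return cycle


# Made-up load-modify-store sequence followed by a dependent ALU op.
PROGRAM = [
    Op("load", "r1", ("r6",)),
    Op("alu", "r2", ("r1", "r3")),
    Op("store", None, ("r4",), data_src="r2"),
    Op("alu", "r5", ("r2", "r1")),
]

with_bypass = count_cycles(PROGRAM, alu_penalty=0, load_delay=1)
no_bypass_late_store_read = count_cycles(PROGRAM, alu_penalty=1, load_delay=1)
no_bypass_pessimistic = count_cycles(
    PROGRAM, alu_penalty=1, load_delay=1, store_sees_alu_next_cycle=False
)
```

	In this particular sequence the store fills the slot that the
missing ALU bypass would otherwise leave empty, so the first two counts
come out equal and only the pessimistic assumption costs an extra cycle.
Other sequences will of course behave differently.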

	It looks like my 20% figure (off the top of my head) was 'interesting'.
Of course that was just chance.  I agree that there is a cost due to not having
ALU bypassing, but I think your 20% figure is an upper limit for the average
loss.  I suspect maybe more like 5-15% will be the case, given the factors
above.
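
	To put those percentages in the same units as the quoted "5ns @
40MHz", here is a small sketch of the arithmetic.  It assumes the simple
model that run time is cycle count times cycle time, so a fractional loss
in cycles is treated as the same fractional increase in cycle time; the
function name is made up for illustration.

```python
def equivalent_cycle_time_cost_ns(clock_mhz, cycle_loss_fraction):
    """Cycle-time increase (ns) equivalent to a given fractional cycle loss.

    Assumes run time = cycle count * cycle time, so losing a fraction f of
    cycles is treated as stretching each cycle by the same fraction f.
    """
    cycle_ns = 1000.0 / clock_mhz  # 40 MHz gives a 25 ns cycle
    return cycle_ns * cycle_loss_fraction


# The quoted 20% figure and the 5-15% range suggested above, at 40 MHz.
costs_ns = {
    loss: equivalent_cycle_time_cost_ns(40, loss)
    for loss in (0.05, 0.15, 0.20)
}
```

	Under that model, bypassing pays off whenever it stretches the cycle
by less than the corresponding amount.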

>Anyway, I hope the hard data helps the discussion.

	Most certainly!  Thank you.

     //	Randell Jesup			      Lunge Software Development
    //	Dedicated Amiga Programmer            13 Frear Ave, Troy, NY 12180
 \\//	beowulf!lunge!jesup@steinmetz.UUCP    (518) 272-2942
  \/    (uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup

(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)