Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!voder!apple!bcase
From: bcase@Apple.COM (Brian Case)
Newsgroups: comp.arch
Subject: Re: RPM-40 microprocessor @ 40 MHz; dat
Message-ID: <7575@apple.Apple.Com>
Date: 7 Mar 88 01:58:55 GMT
References: <9758@steinmetz.steinmetz.UUCP> <9800@steinmetz.steinmetz.UUCP> <476@imagine.PAWL.RPI.EDU>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Ungermann-Bass Enterprises
Lines: 44

In article <476@imagine.PAWL.RPI.EDU> beowulf!lunge!jesup@steinmetz.UUCP writes:
>In article <9800@steinmetz.steinmetz.UUCP> oconnor%sungod@steinmetz.UUCP writes:
>	Another point: even without bypassing, if you're using the ALU for
>address computation you can store the results of an ALU op in the next
>instruction.  This removes a lot of whatever loss you have in not having
>bypassing, since <modify;store> and <load;modify;store> are fairly
>frequent operations.

Yes, this works if the TLB or address bus is in the pipe stage following
the ALU.  In the Am29000, this is not the case:  the TLB is in the same
stage as the ALU.  Thus, without bypassing, things would be more difficult.
(The TLB is alongside the ALU to make simple pointer dereferences go fast.)
Also, it may not be the case that <modify;store> and <load;modify;store>
are as frequent as you might like (with fewer registers, they are more
frequent).
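
To make the stage-placement argument concrete, here is a toy issue model in
Python.  It is only a sketch under assumptions of my own, not a description
of the Am29000 or the RPM-40: count_stalls, store_data_delay, and
writeback_latency are hypothetical names, a result is assumed readable one
cycle after issue with bypassing and writeback_latency cycles after issue
without it, and loads are treated like any other producer.

```python
def count_stalls(program, bypass, store_data_delay, writeback_latency=2):
    """Count stall cycles for a straight-line program in a toy in-order pipe.

    program: list of (dest, alu_srcs, store_src) tuples, where dest and
             store_src are register names or None.
    store_data_delay: how many stages after the ALU stage the store data is
             read (1 = memory stage follows the ALU, 0 = same stage as ALU).
    All of these parameters are illustrative assumptions.
    """
    ready = {}   # register -> earliest cycle at which its value can be read
    cycle = 0
    stalls = 0
    for dest, alu_srcs, store_src in program:
        earliest = cycle
        # ALU (and address) operands are needed in the issue cycle itself.
        for reg in alu_srcs:
            earliest = max(earliest, ready.get(reg, 0))
        # Store data is needed store_data_delay cycles after issue.
        if store_src is not None:
            earliest = max(earliest, ready.get(store_src, 0) - store_data_delay)
        stalls += earliest - cycle
        cycle = earliest
        if dest is not None:
            ready[dest] = cycle + (1 if bypass else writeback_latency)
        cycle += 1
    return stalls


# <modify;store>: r1 = r2 op r3, then store r1 at the address held in r4.
modify_store = [("r1", ["r2", "r3"], None), (None, ["r4"], "r1")]
```

In this model, with no bypassing, the <modify;store> pair issues without a
stall when store_data_delay is 1 and picks up one stall when it is 0; turning
bypassing on removes that stall.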

>Once again, you must look at your assumptions with
>skepticism in RISC design: calculate what it will cost you to implement
>a feature, then how much you gain.  Also, remember that other people's
>figures/assumptions may not match yours, especially if they are focusing
>on a specific part of performance (like integer-only, or FP-only, etc).

A very good point:  features/organizations are usually very interdependent
so that changing one thing can have significant effects on others.  Trivial
example:  change the instruction size on the Am29000 to 16 bits.

Re:  no bypassing.  Probably the most important thing is to get your compiler
to produce great code for inner loops.  If the lack of bypassing adds
a cycle to a 10-cycle loop, then you are hurt unless omitting bypassing
buys you a clock at least 10% faster.  I looked at one inner loop (in sieve,
so this is probably not representative of everything else :-) and it
seemed that omitting bypassing was OK, i.e. it didn't force no-ops to be
added.  Gosh, there really ought to be some data somewhere on this....
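
For what it's worth, the break-even arithmetic can be written down in a few
lines of Python.  This is just a sketch of the reasoning above; the function
names are made up for illustration, and the only assumption is that loop time
is cycles times cycle time.

```python
from fractions import Fraction


def breakeven_clock_speedup(base_cycles, extra_cycles):
    """Fractional clock-rate increase needed to hide the extra cycles.

    Loop time is cycles * cycle_time, so (N + e) * T' = N * T gives a
    frequency ratio of (N + e) / N, i.e. a speedup of e / N.
    """
    return Fraction(extra_cycles, base_cycles)


def breakeven_cycle_time_cut(base_cycles, extra_cycles):
    """Fractional cycle-time reduction needed to hide the extra cycles.

    From the same equation, T' / T = N / (N + e), i.e. a cut of e / (N + e).
    """
    return Fraction(extra_cycles, base_cycles + extra_cycles)
```

For one extra cycle on a 10-cycle loop this gives a clock that must be 1/10
faster, which is the same thing as a cycle time 1/11 (about 9.1%) shorter.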

>>But I'm not sure the details of the TIB
>>have been released. I'll expand on it if it has been.
>
>	Dennis, I think the title of the ISSCC talk was "40 Mhz CMOS CPU with
>instruction cache", so I think it's ok.  Not like it's a patentable idea,
>anyway. :-)

(SIGH.  Yet another example of my foot in my mouth.  A patent can still be
issued for the implementation, I think.  And I didn't mention the patent
application to be antagonistic in any way; I was just trying to point out
that there were earlier incarnations.)