Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!voder!apple!bcase
From: bcase@Apple.COM (Brian Case)
Newsgroups: comp.arch
Subject: Re: RPM-40 microprocessor @ 40 MHz; dat
Message-ID: <7575@apple.Apple.Com>
Date: 7 Mar 88 01:58:55 GMT
References: <9758@steinmetz.steinmetz.UUCP> <9800@steinmetz.steinmetz.UUCP> <476@imagine.PAWL.RPI.EDU>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Ungermann-Bass Enterprises
Lines: 44

In article <476@imagine.PAWL.RPI.EDU> beowulf!lunge!jesup@steinmetz.UUCP writes:
>In article <9800@steinmetz.steinmetz.UUCP> oconnor%sungod@steinmetz.UUCP writes:
>	Another point: even without bypassing, if you're using the ALU for
>address computation you can store the results of a ALU op in the next
>instruction.  This removes a lot of whatever loss you have in not having
>bypassing, since and are fairly
>frequent operations.

Yes, this works if the TLB or address bus is in the pipe stage following
the ALU.  In the Am29000, this is not the case:  the TLB is in the same
stage as the ALU.  Thus, without bypassing, things would be more
difficult.  (The TLB is alongside the ALU to make simple pointer
dereferences go fast.)  Also, it may not be the case that and are as
frequent as you might like (with fewer registers, they are more
frequent).

>Once again, you must look at your assumptions with
>skepticism in RISC design: calculate what it will cost you to implement
>a feature, then how much you gain.  Also, remember that other peoples
>figures/assumptions may not match yours, especially if they are focusing
>on a specific part of performance (like integer-only, or FP-only, etc).

A very good point:  features/organizations are usually very
interdependent so that changing one thing can have significant effects
on others.  Trivial example:  change the instruction size on the Am29000
to 16 bits.

Re: no bypassing.  Probably the most important thing is to get your
compiler to produce great code for inner loops.
If the lack of bypassing adds a cycle to a 10 cycle loop, then you are
hurt unless omitting bypassing buys you a clock that is at least 10%
faster.  I looked at one inner loop (in sieve, so this is probably not
representative of everything else :-) and it seemed that omitting
bypassing was OK, i.e. it didn't force no-ops to be added.  Gosh, there
really ought to be some data somewhere on this....

>>But I'm not sure the details of the TIB
>>have been released. I'll expand on it if it has been.
>
>	Dennis, I think the title of the ISSCC talk was "40 Mhz CMOS CPU with
>instruction cache", so I think it's ok.  Not like it's a patentable idea,
>anyway. :-)

(SIGH.  Yet another example of my foot in my mouth.  A patent can still
be issued for the implementation, I think.  And I didn't mention the
patent application to be antagonistic in any way; I was just trying to
point out that there were earlier incarnations.)
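
P.S.  For anyone who wants to check the break-even point for the 10 cycle
loop above, here is a minimal Python sketch of the arithmetic.  It is only
an illustration: the function names are made up for this example, and the
25 ns cycle time is an assumption taken from the 40 MHz figure in the
subject line, not a measured number for any real part.

```python
def loop_time(cycles, cycle_time_ns):
    """Wall-clock time of one loop iteration, in nanoseconds."""
    return cycles * cycle_time_ns


def break_even_clock_gain(base_cycles, extra_cycles):
    """Fractional clock-rate increase needed to hide the extra cycles.

    Solves (base + extra) / (f * (1 + g)) == base / f for g.
    """
    return extra_cycles / base_cycles


def break_even_cycle_time_cut(base_cycles, extra_cycles):
    """Fractional cycle-time reduction needed to hide the extra cycles."""
    return extra_cycles / (base_cycles + extra_cycles)


if __name__ == "__main__":
    base, extra = 10, 1      # the 10 cycle loop plus one added no-op
    cycle_ns = 25.0          # assumed: 40 MHz clock, for illustration only

    gain = break_even_clock_gain(base, extra)
    cut = break_even_cycle_time_cut(base, extra)
    print(f"clock rate must rise by at least {gain:.1%}")
    print(f"cycle time must shrink by at least {cut:.1%}")

    # Sanity check: 11 cycles at the shorter cycle time should take
    # the same time as 10 cycles at the original cycle time.
    with_bypass = loop_time(base, cycle_ns)
    without_bypass = loop_time(base + extra, cycle_ns * (1 - cut))
    print(f"with bypassing:    {with_bypass:.1f} ns")
    print(f"without bypassing: {without_bypass:.1f} ns")
```

Note that "10% faster" means a 10% higher clock rate, which works out to
a cycle time only about 9.1% shorter (1/11).  Anything less than that and
the added no-op costs more than the simpler datapath saves.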