Path: utzoo!mnetor!uunet!steinmetz!sungoddess!oconnor
From: oconnor@sungoddess.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: RPM-40 [really forwarding]
Message-ID: <9799@steinmetz.steinmetz.UUCP>
Date: 5 Mar 88 03:28:02 GMT
References: <9758@steinmetz.steinmetz.UUCP>
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: oconnor%sungod@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 48

An article by tim@amdcad.UUCP (Tim Olson) says:
] In article <9758@steinmetz.steinmetz.UUCP> sungoddess!oconnor@steinmetz.UUCP writes:
] | IMHO, a pipelined processor should run as fast as its ALU
] | lets it. ...
] |
] | Even a simple bypass path adds to this delay. It means
] | that whatever the setup and delay times of this path,
] | it must be added to the basic machine cycle time, IF
] | that cycle time is determined by the ALU, as it SHOULD BE (IMHO).
] | This is LESS of a penalty than adding a register access,
] | but still a penalty. So is it a win?
]
] It depends upon how often alu forwarding occurs (see below). If it is
] frequent, it is much better to slow the pipeline by the small amount of
] time it takes to forward the result, rather than stalling a whole cycle.
] [... example deleted ...]

So far I agree, but there's more ...

How often forwarding is needed is only PART of the story. The other
part is how often you could "fill" the delay from forwarding.

] Here are some numbers from the Am29000 simulator running a small "nroff"
]
]       instructions executed:                       89435
]       instructions requiring alu forwarding:       41420 (46%)
]       instructions forwarding from load buffer:    13669 (15%)

But if I can fill 90%, say, of the one-cycle latency delays with
a reorganizer, then I only incur a penalty of about 5% (the unfilled
10% of the 46%), which means, for RPM40, that a bypass path is
justified only if it incurs a penalty of 1.2 nanoseconds or less.
If I can fill 80% of the latencies, the stall penalty is about 10%,
and a bypass that inflicts a penalty on the basic cycle time of
2.5 nanoseconds or less is a win.

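The break-even arithmetic above can be sketched as follows. This is only a rough model, not anything taken from the RPM40 design: it assumes a 25 ns basic cycle (inferred from the 40 MIPS figure at one instruction per cycle), assumes every unfilled forwarding slot costs exactly one stall cycle, and ignores the load-buffer forwarding cases. The function names are made up for illustration.

```python
def stall_fraction(forward_frac, fill_frac):
    """Fraction of instructions that still stall one cycle with no bypass.

    forward_frac: fraction of instructions needing an ALU result forwarded.
    fill_frac:    fraction of those delay slots the reorganizer can fill.
    """
    return forward_frac * (1.0 - fill_frac)


def break_even_bypass_ns(cycle_ns, forward_frac, fill_frac):
    """Largest cycle-time increase (ns) a bypass can add and still break even.

    Without a bypass: time per instruction = cycle_ns * (1 + stall_fraction).
    With a bypass:    time per instruction = cycle_ns + bypass_ns.
    Setting the two equal gives bypass_ns = cycle_ns * stall_fraction.
    """
    return cycle_ns * stall_fraction(forward_frac, fill_frac)


CYCLE_NS = 25.0               # assumption: 40 MIPS, one instruction per cycle
FORWARD_FRAC = 41420 / 89435  # ALU forwarding count from the quoted simulator run

if __name__ == "__main__":
    for fill in (0.9, 0.8):
        limit = break_even_bypass_ns(CYCLE_NS, FORWARD_FRAC, fill)
        print(f"fill {fill:.0%}: bypass wins if it adds less than {limit:.2f} ns")
```

With the unrounded 46% figure this gives limits of about 1.16 ns and 2.32 ns; rounding the forwarding fraction up to one half reproduces the 1.2 ns and 2.5 ns figures quoted above (1.25 ns before rounding).
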
So not only do we need data like you've provided, we need to know
how often we can reorganize the delay away. Unfortunately, I don't
really have good data for either of these factors.

] I haven't seen published studies on dynamic forwarding frequencies --
] does anyone know of such papers?
]       -- Tim Olson

I, too, would be VERY interested in any such works.
--
Dennis O'Connor   oconnor%sungod@steinmetz.UUCP   ARPA: OCONNORDM@ge-crd.arpa
  (-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)