Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!beta!hc!ames!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: What should be in hardware but isn'
Message-ID: <18502@amdcad.AMD.COM>
Date: Fri, 2-Oct-87 13:38:14 EDT
Article-I.D.: amdcad.18502
Posted: Fri Oct  2 13:38:14 1987
Date-Received: Tue, 6-Oct-87 04:56:48 EDT
References: <581@l.cc.purdue.edu> <28200048@ccvaxa> <340@oracle.UUCP>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Organization: Advanced Micro Devices
Lines: 28

In article <340@oracle.UUCP> bradbury@oracle.UUCP (Robert Bradbury) writes:
| So from my experience the cost of mapping CISC functions into CISC instructions
| can be quite a large part of the code generator of a compiler.  Do the RISC
| people have any measures of how much work goes into a RISC code generator
| for things like DIV/MUL, STRCPY/MEMCPY or BRANCH scheduling?  (Some of the code
| published for the AMD 29000 indicates these aren't afternoon efforts :-).)

In our "development" C compiler, div/mul and strcpy/memcpy are simply calls
to runtime routines, so they cost nothing in the code generator.  I didn't
write the code generator, but the delayed-branch scheduling code in the
optimizer is very small.

| Have we gotten to the point where we can estimate the hardware development
| costs of branch destination caching vs. the software development costs
| of branch scheduling and trade them off against each other?

The two aren't mutually exclusive (the Am29000 implements both).
Delayed branches allow execution of instructions following the branch
which are already in the pipeline, while the Branch Target Cache reduces
or eliminates the latency involved in starting a new instruction stream.

Perhaps you mean the tradeoff between delayed branches and branch
prediction?

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)