Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!gatech!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: AM29000 Booleans
Message-ID: <16627@amdcad.AMD.COM>
Date: Sat, 9-May-87 14:09:03 EDT
Article-I.D.: amdcad.16627
Posted: Sat May 9 14:09:03 1987
Date-Received: Sun, 10-May-87 05:20:50 EDT
References: <1270@aw.sei.cmu.edu> <138@neptune.AMD.COM> <3540@spool.WISC.EDU> <16587@amdcad.AMD.COM> <1512@drivax.UUCP>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 50

In article <1512@drivax.UUCP> socha@drivax.UUCP (Henri J. Socha (x6251)) writes:
+-----
| Well, to add some fuel to the fire about how a machine's performance can
| depend on the smarts in the compiler used, I hand modified the example code
| given in the referenced article.  The changed code is shown below.
| The changes were limited to taking better advantage of the delayed branching.
| I only re-arranged and removed some code.
	... changed code
+-----
| The savings were:
|	nop
|	jmp $17		<-- remember those delayed branches!
|	nop
|	jmp $19
+-----

The optimizations you performed were valid.  Our internal compiler is
conservative in the instructions it selects for placement after a
delayed branch -- it just didn't recognize the case you found.  The
standard "commercial compiler" for the Am29000 should be able to
perform this, though.

+-----
| BTW I like the fact that non-arithmetic instructions DO NOT change (affect)
| the condition code (status) register.  This can develop other optimizations.
+-----

Actually, the only condition codes used by the processor are the carry
(C) and the divide flag (DF).  The others are there for "completeness"
and to enhance performance in emulations.

	... discussion of multiply taking 32 steps
+-----
| Now, I can understand the advantages of a RISC processor but this is going
| too far.  Should I put 32 instructions in the processing stream each time
| I need to multiply two numbers?  Should I use a subroutine?
| Seems to me a perfect time/space tradeoff decision.  But, what are the costs?
+-----

Yes, it is a time/space tradeoff.  Note, however, that the Am29000 has
a relatively low overhead cost associated with a subroutine call.  The
C runtime library code we use for simulations has a multiply subroutine
that is a variant of the one shown in the user's manual.  It performs a
quick check at the beginning to see how many steps are really required,
then performs only that many steps.  This has been shown to reduce the
entire cost of a runtime multiply (including procedure-call overhead)
to around 25 cycles (when used on a range of multiply-intensive code).

-- 
	Tim Olson
	Advanced Micro Devices
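
The "check how many steps are really required, then do only that many"
idea described above can be sketched in portable C.  This is only an
illustration of the technique, not the AMD runtime routine or the one in
the user's manual: the names steps_needed and mul_early_out are
hypothetical, the operand swap is an added assumption, and the loop that
counts bits stands in for whatever cheaper check the real code uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of shift-and-add steps the multiplier
 * needs, i.e. the position of its highest set bit (0 for zero). */
unsigned steps_needed(uint32_t multiplier)
{
    unsigned steps = 0;
    while (multiplier != 0) {
        multiplier >>= 1;
        steps++;
    }
    return steps;
}

/* Shift-and-add multiply returning the low 32 bits of a * b.
 * Instead of always running 32 steps, it runs only as many as the
 * smaller operand requires. */
uint32_t mul_early_out(uint32_t a, uint32_t b)
{
    /* Assumption: use the smaller operand as the multiplier so the
     * step count is as low as possible. */
    if (a < b) {
        uint32_t tmp = a;
        a = b;
        b = tmp;
    }

    unsigned steps = steps_needed(b);   /* the "quick check" */
    assert(steps <= 32);

    uint32_t product = 0;
    for (unsigned i = 0; i < steps; i++) {
        if (b & 1u)
            product += a;   /* add the shifted multiplicand */
        a <<= 1;            /* unsigned shift wraps, as intended */
        b >>= 1;
    }
    return product;
}
```

With small multipliers, which are common in array indexing and similar
code, the loop runs only a handful of steps; mul_early_out(100000, 3),
for example, takes 2 steps rather than 32.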