Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!sunybcs!boulder!hao!noao!mcdsun!nud!rover!mph
From: mph@rover.UUCP (Mark Huth)
Newsgroups: comp.sys.atari.st,comp.sys.misc,comp.sys.amiga
Subject: Re: Weighty instructions
Message-ID: <557@rover.UUCP>
Date: Thu, 15-Oct-87 12:36:18 EDT
Article-I.D.: rover.557
Posted: Thu Oct 15 12:36:18 1987
Date-Received: Sat, 17-Oct-87 11:12:30 EDT
References: <1138@water.waterloo.edu> <2452@cbmvax.UUCP> <7422@e.ms.uky.edu> <90@piring.cwi.nl>
Reply-To: mph@rover.UUCP (Mark Huth)
Organization: Motorola Microcomputer Division, Tempe, Az.
Lines: 69
Xref: mnetor comp.sys.atari.st:5708 comp.sys.misc:933 comp.sys.amiga:9477

In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least

It seems to me that if one programs in C, then the C compiler is part of
the environment. It is true that the compiler distorts the raw machine
power to some extent, but unless you really are an assembly guru (and not
just think you are, since the CISC machines are quite complex and have
timings that are no longer obvious due to caches and pipelines), you
cannot generate code that fully utilizes a given architecture's power.
Therefore, the high-level language benchmarks are very useful.

We are able to improve the Dhrystone ratings of our systems by as much as
33% by improving the compiler. This is really good news, as all programs
get a considerable performance gain simply by being recompiled as better
compilers become available.

A couple of comments about RISC -

Usually RISC is indicative of a design philosophy which uses little or no
microcode. Most instructions are 2 or 3 address register to register
instructions, with memory accesses limited to a few simple addressing
modes of a load or store instruction.
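As a rough illustration of the load/store style just described, here is a
minimal sketch in C. The opcode names, the register and memory sizes, and
the helpers run and add_in_memory are all hypothetical and not taken from
any real instruction set. The only point is that arithmetic touches
registers alone, so a memory-to-memory c = a + b becomes four instructions.

```c
#include <assert.h>

/* Hypothetical three-address load/store machine (illustration only). */
enum op { OP_LOAD, OP_STORE, OP_ADD };

struct insn {
    enum op op;
    int rd;   /* destination register (source register for OP_STORE) */
    int ra;   /* first source register, or memory address for LOAD/STORE */
    int rb;   /* second source register (used by OP_ADD only) */
};

#define NREGS 32
#define NMEM  16

static int regs[NREGS];
static int mem[NMEM];

/* Execute n instructions; only OP_LOAD and OP_STORE ever touch mem[]. */
static void run(const struct insn *prog, int n)
{
    for (int i = 0; i < n; i++) {
        const struct insn *p = &prog[i];
        assert(p->rd >= 0 && p->rd < NREGS);
        switch (p->op) {
        case OP_LOAD:
            assert(p->ra >= 0 && p->ra < NMEM);
            regs[p->rd] = mem[p->ra];
            break;
        case OP_STORE:
            assert(p->ra >= 0 && p->ra < NMEM);
            mem[p->ra] = regs[p->rd];
            break;
        case OP_ADD:
            regs[p->rd] = regs[p->ra] + regs[p->rb];
            break;
        }
    }
}

/* c = a + b, with a, b and c living at memory addresses 0, 1 and 2. */
static int add_in_memory(int a, int b)
{
    static const struct insn prog[] = {
        { OP_LOAD,  1, 0, 0 },  /* R1 <- mem[0]      */
        { OP_LOAD,  2, 1, 0 },  /* R2 <- mem[1]      */
        { OP_ADD,   3, 1, 2 },  /* R3 <- R1 + R2     */
        { OP_STORE, 3, 2, 0 },  /* mem[2] <- R3      */
    };
    mem[0] = a;
    mem[1] = b;
    run(prog, (int)(sizeof prog / sizeof prog[0]));
    return mem[2];
}
```

Calling add_in_memory(2, 3) runs the four-instruction program and
returns 5.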
Because the instructions are simple, they can all be organized to use the
same pipeline length. Often pipeline interlocks are left for the compiler
to worry about. As a result, once the pipe is full, RISC will complete one
instruction per clock.

Normally, the instruction after a branch is executed whether the branch is
taken or not, leading to a significant performance improvement (by keeping
the pipe full), provided the compiler can find an instruction that is
useful on both paths. This appears to be possible about 90% of the time.
The other 10% is a nop - which is no loss, as the pipe would have
otherwise been disrupted anyway.

It is argued that the simpler instruction sets give the compiler a better
shot at optimization during code generation than trying to find exactly
the right CISC instruction for a particular purpose. In essence, the
compiler works at a level similar to the microcode level of a CISC
architecture. Complex addressing modes are generated by multiple simple
instructions. For example, a CISC compiler generates

        MOVE.L  ([ptr],offset),D0

to load a value given by the C statements

        register int D0;
        struct TMP *ptr;
        D0 = ptr -> offset;

while the RISC machine might need to do

        LOAD    #ptr,R20        Load immediate (use value from instruction stream)
        LOAD    (R20),R24       Load indirect (use address in R20)
        LOAD    #offset,R21
        ADD     R24,R21,R22     Add R24 and R21, leaving value in R22
        LOAD    (R22),R0        Now get actual value (previous stuff was address)

Of course, due to the (normally) large register set of the RISC machine,
the constants and variables may already be in the registers, considerably
reducing the number of instructions needed. The compiler is supposed to
make this choice.

Also, since RISC often requires more instructions to accomplish its task,
it is common to find RISC machines belonging to the Harvard class
(separate instruction and data memory streams).

Mark Huth
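
For readers who think in C, the RISC sequence in the post amounts to
ordinary pointer arithmetic done one step at a time. The following is a
minimal sketch under stated assumptions: the post gives no layout for
struct TMP, so the two-member layout here is hypothetical, the member
named field stands in for what the post calls offset, and the function
name load_field_stepwise is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the post only names the type struct TMP. */
struct TMP {
    int first;
    int field;   /* stands in for the member the post calls "offset" */
};

/*
 * Fetch ptr->field in the same steps as the five-instruction sequence.
 * ptr_location is the address of the pointer variable itself.
 */
static int load_field_stepwise(struct TMP **ptr_location)
{
    struct TMP **r20 = ptr_location;           /* load immediate: address of ptr */
    char *r24 = (char *)*r20;                  /* load indirect: value of ptr    */
    size_t r21 = offsetof(struct TMP, field);  /* load immediate: member offset  */
    char *r22 = r24 + r21;                     /* add: pointer value + offset    */
    return *(int *)r22;                        /* final load: the member itself  */
}
```

Given struct TMP t = { 7, 42 } and struct TMP *p = &t, the call
load_field_stepwise(&p) yields the same value as p->field. If p were
already held in a register, the first two steps would disappear, which is
the saving the post attributes to a large register set.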