Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!sunybcs!boulder!hao!noao!mcdsun!nud!rover!mph
From: mph@rover.UUCP (Mark Huth)
Newsgroups: comp.sys.atari.st,comp.sys.misc,comp.sys.amiga
Subject: Re: Weighty instructions
Message-ID: <557@rover.UUCP>
Date: Thu, 15-Oct-87 12:36:18 EDT
Article-I.D.: rover.557
Posted: Thu Oct 15 12:36:18 1987
Date-Received: Sat, 17-Oct-87 11:12:30 EDT
References: <1138@water.waterloo.edu> <2452@cbmvax.UUCP> <7422@e.ms.uky.edu> <90@piring.cwi.nl>
Reply-To: mph@rover.UUCP (Mark Huth)
Organization: Motorola Microcomputer Division, Tempe, Az.
Lines: 69
Xref: mnetor comp.sys.atari.st:5708 comp.sys.misc:933 comp.sys.amiga:9477

In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least

It seems to me that if one programs in C, then the C compiler is part
of the environment.  It is true that the compiler distorts the raw
machine power to some extent, but unless you are an assembly guru (and
not just someone who thinks so, since the CISC machines are quite
complex and have timings that are no longer obvious due to caches and
pipelines), you cannot generate code that fully utilizes a given
architecture's power.  Therefore, the high-level language benchmarks
are very useful.

We are able to improve the Dhrystone ratings of our systems by as much
as 33% by improving the compiler.  This is real good news, as all
programs get some considerable performance gain by recompilation as
better compilers become available.

A couple of comments about RISC - 

Usually RISC is indicative of a design philosophy which uses little or
no microcode.  Most instructions are 2 or 3 address register to
register instructions, with memory accesses limited to a few simple
addressing modes of a load or store instruction.  Because the
instructions are simple, they can all be organized to need the same
length pipeline.  Often pipeline interlocks are left for the compiler
to worry about.  As a result, once the pipe is full, a RISC machine
can complete one instruction per clock.  Normally, the instruction
after a branch is executed whether the branch is taken or not, leading
to a significant performance improvement (by keeping the pipe full)
provided the compiler can find a useful instruction to execute
regardless of whether the branch is taken or not.  This appears to be
possible about 90% of the time.  The other 10% is a nop - which is no
loss, as the pipe would have otherwise been disrupted anyway.
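
The arithmetic behind the delay-slot argument can be sketched in C.
This is only a rough model under stated assumptions: a single delay
slot, one wasted cycle per unfilled slot, and a made-up branch
frequency.  The function names are hypothetical and the only figure
taken from the text above is the 90% fill rate.

```c
/* Rough cost model for a single branch delay slot (illustrative only). */

/* Wasted cycles per branch when the compiler fills the delay slot with
 * useful work a fraction fill_rate of the time; an unfilled slot holds
 * a nop and wastes one cycle (assumption). */
double wasted_cycles_per_branch(double fill_rate)
{
    return 1.0 - fill_rate;
}

/* Average cycles per useful instruction, assuming one cycle per
 * instruction with a full pipe, plus the wasted cycles contributed by
 * the fraction of instructions that are branches. */
double cycles_per_useful_instruction(double branch_fraction,
                                     double wasted_per_branch)
{
    return 1.0 + branch_fraction * wasted_per_branch;
}
```

Calling wasted_cycles_per_branch(0.0) models a machine that always
loses the cycle after a branch, and wasted_cycles_per_branch(0.9)
models the 90% case, so the two can be compared for any assumed branch
frequency.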

It is argued that the simpler instruction sets allow the compiler a
better shot at optimization during code generation than trying to find
exactly the right CISC instruction for a particular purpose.  In
essence, the compiler works at a level similar to the microcode level
of a CISC architecture.  Complex addressing modes are generated by
multiple simple instructions.

For example, a CISC compiler can generate MOVE.L ([ptr],offset),D0 to
load the value given by the C statements

register int D0;
struct TMP *ptr;

D0 = ptr -> offset;

while the RISC machine might need to do

LOAD #ptr,R20       Load immediate (use value from instruction stream)
LOAD (R20),R24      Load indirect (use address in R20)
LOAD #offset,R21    Load immediate (the field offset)
ADD R24,R21,R22     Add R24 and R21, leaving value in R22
LOAD (R22),R0       Now get actual value (previous stuff was address)

Of course, due to the (normally) large register set of the RISC
machine, the constants and variables may already be in the registers,
considerably reducing the number of instructions needed.  The compiler
is supposed to make this choice.
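
The same decomposition can be written out in C, which may make the
individual steps easier to follow.  This is a sketch only: the layout
of struct TMP and both function names are hypothetical (the example
above does not define them), and a real compiler is free to generate
something quite different.

```c
#include <stddef.h>

/* Hypothetical layout; the original example does not define struct TMP. */
struct TMP {
    int pad;
    int offset;     /* the field called "offset" in the example */
};

/* One-step form: what a memory-indirect addressing mode does at once.
 * ptr_cell is the address of the pointer variable in memory. */
int load_one_step(struct TMP *const *ptr_cell)
{
    return (*ptr_cell)->offset;
}

/* Step-by-step form: each statement is one simple load or add. */
int load_step_by_step(struct TMP *const *ptr_cell)
{
    /* load the pointer value stored at the given address */
    const char *base = (const char *)*ptr_cell;
    /* the field offset is a compile-time constant */
    size_t field_offset = offsetof(struct TMP, offset);
    /* add base and offset to form the final address */
    const char *address = base + field_offset;
    /* load the actual value from that address */
    return *(const int *)address;
}
```

Both functions return the same value for the same pointer; the second
simply exposes the intermediate address arithmetic that the complex
addressing mode hides.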

Since RISC often requires more instructions to accomplish its task,
and therefore more instruction-fetch bandwidth, it is common to find
RISC machines belonging to the Harvard class (separate instruction and
data memory streams).

Mark Huth