Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!unmvax!pprg.unm.edu!hc!lanl!jlg
From: jlg@lanl.gov (Jim Giles)
Newsgroups: comp.arch
Subject: Re: Compiling - RISC vs. CISC
Message-ID: <13980@lanl.gov>
Date: 11 Jul 89 06:02:07 GMT
References: <2190@oakhill.UUCP>
Organization: Los Alamos National Laboratory
Lines: 84

From article <2190@oakhill.UUCP>, by davet@oakhill.UUCP (David Trissel):
> In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
>>For a RISC machine, the only hard part of the "back end" is register
>>allocation.  
> 
> What about the required pairing of registers for double wide operations
> such as floating-point or shifting? 

In what way does a machine which requires register pairing qualify as
a RISC?  If an instruction requires 2 operands, they should be allowed
to be any two general purpose registers.  Furthermore, you are assuming
that floating point is larger than other intrinsic types.  The best RISCs
are those which only have _one_ data size.  (By the way, my model of a
reasonable RISC would be a Cray-1 instruction set without vectors.  This
is certainly RISCy - all data is 64 bits, all operations are reg to reg,
only one memory addressing mode, etc.)
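The pairing objection can be made concrete with a small sketch.  It
assumes a hypothetical 8-register file in which a double-wide value must
occupy an aligned even/odd pair (one common form of the constraint); the
function names are made up for illustration and are not from either post.

```c
#include <assert.h>
#include <stdbool.h>

enum { NREGS = 8 };  /* hypothetical register file size; must be even */

/* Double-wide values that fit if ANY two free registers may hold one. */
static int free_any_two(const bool free_reg[NREGS]) {
    int n = 0;
    for (int r = 0; r < NREGS; r++)
        if (free_reg[r]) n++;
    return n / 2;
}

/* Double-wide values that fit if each must sit in an aligned even/odd pair. */
static int free_aligned_pairs(const bool free_reg[NREGS]) {
    assert(NREGS % 2 == 0);  /* every even register needs an odd partner */
    int n = 0;
    for (int r = 0; r < NREGS; r += 2)
        if (free_reg[r] && free_reg[r + 1]) n++;
    return n;
}
```

With registers 1, 2, 5 and 6 free, free_any_two() finds room for two
double-wide values while free_aligned_pairs() finds none, so a pairing
rule can force a spill even though enough registers are free.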

> [...]
>>Instruction selection is fairly simple since there is
>>generally only one way to perform each intermediate code operation.
> 
> This is a strange statement. Since in general terms CISC instruction sets are 
> supersets of RISC models then why are the "extra" available CISC 
> instructions mandated to be used by a CISC compiler? Indeed, one of the
> arguments for RISC is the elimination of "unused" instructions from the
> instruction set. Although this may bring up important architectural
> differences between RISC and CISC it has no bearing on the complexity
> of a compiler.

This is _really_ a strange statement.  Since the supposed advantage of
CISC is the richer instruction set, not using that set means not taking
advantage of the machine.  I've heard CISC designers claim that individual
instructions can be allowed to be slower than they could be in order to
provide the additional instructions.  If you are not using those
extra instructions, you might as well have a RISC which provides
only the instructions you _do_ use.  The hardware designer could
then spend more time making those work faster instead of making sure
that the unused instructions work.

So, this issue _does_ have a bearing on the complexity of the compiler.
If you are not willing to provide the sophisticated compilers required
to adequately use a machine, you have wasted money (read: design effort,
chip space, etc.) on the hardware.

>>On a pipelined machine, code ordering comes into play (at least if
>>you want optimized code).  This complicates matters, since a different
>>code ordering imposes different register allocation constraints.
>>For this reason, optimizing can be difficult, even on a RISC machine.
> 
> But as CISC implementations become more advanced the applicability of code
> reordering is starting to surface there as well.

_EXACTLY_!!!!!   All the optimizations required on a RISC are also
required on a CISC.  CISC just adds more complexity to the mix.
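To make the quoted point about ordering and register allocation concrete,
here is a minimal sketch that counts how many values are live at once for
two orderings of the same computation, (x0*x1)+(x2*x3).  The instruction
encoding, value numbering and max_pressure() are hypothetical, and the
count is only a lower bound on registers needed, assuming no spilling.

```c
#include <assert.h>

#define NVALS 7   /* values v0..v6 in the example */
#define NONE (-1) /* marks an absent operand or a never-used value */

/* One three-address instruction: dst = op(src1, src2); loads have no sources. */
typedef struct {
    int dst, src1, src2;
} Insn;

/* Largest number of values live at once between instructions: a lower
   bound on the registers this ordering needs without spilling. */
static int max_pressure(const Insn *code, int n) {
    int def[NVALS], last[NVALS];
    for (int v = 0; v < NVALS; v++)
        def[v] = last[v] = NONE;
    for (int i = 0; i < n; i++) {
        assert(code[i].dst >= 0 && code[i].dst < NVALS);
        def[code[i].dst] = i;
        if (code[i].src1 != NONE) last[code[i].src1] = i;  /* later uses overwrite */
        if (code[i].src2 != NONE) last[code[i].src2] = i;
    }
    for (int v = 0; v < NVALS; v++)     /* the final result stays live out */
        if (def[v] != NONE && last[v] == NONE)
            last[v] = n;
    int best = 0;
    for (int i = 0; i < n; i++) {       /* the point just after instruction i */
        int live = 0;
        for (int v = 0; v < NVALS; v++)
            if (def[v] != NONE && def[v] <= i && i < last[v])
                live++;
        if (live > best)
            best = live;
    }
    return best;
}

enum { NCODE = 7 };

/* (x0*x1) + (x2*x3): v0..v3 are loads, v4 = v0*v1, v5 = v2*v3, v6 = v4+v5. */
static const Insn loads_first[NCODE] = {   /* hoist every load early */
    {0, NONE, NONE}, {1, NONE, NONE}, {2, NONE, NONE}, {3, NONE, NONE},
    {4, 0, 1}, {5, 2, 3}, {6, 4, 5}
};
static const Insn interleaved[NCODE] = {   /* load each pair just before use */
    {0, NONE, NONE}, {1, NONE, NONE}, {4, 0, 1},
    {2, NONE, NONE}, {3, NONE, NONE}, {5, 2, 3}, {6, 4, 5}
};
```

For these orderings max_pressure() returns 4 and 3 respectively: hoisting
all four loads, as a scheduler might to cover load latency, costs one more
register than the interleaved order.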

> [... example with C: *p++ ...]
>    mov.l  (%an),%dn
>    add.l  &4,%an
> or the faster
>    mov.l  (%an)+,%dn
> The 68K requires a routine in the compiler peephole optimizer to "discover" 
> and implement this optimization. But the result is a single 16-bit instruction
> which (I think) executes in a single clock on the MC68040.

Exactly my point.  There are actually several other possibilities
for instruction selection in this case.  For example, p may already
be resident in a register.  The use of the data may require it to
end up in a register.  The further use of p may require it to be
left in a register.  Etc.  Only a pretty sophisticated compiler
can determine which instructions to use in each context.  By contrast,
there is only _one_ instruction sequence which will work on most RISC
machines:  load p, load data, store data, increment p, store p.
The 'peephole' optimizer need only discover the redundant loads
and stores to fit this sequence into context.  The instruction
scheduler can reorder the last four of these freely, subject only to
their data dependencies (the data must be loaded before it is stored,
and p must be incremented before it is stored).
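The five-step RISC sequence can be sketched in C, modelling memory as a
toy word-addressed array and registers as local variables.  The memory
layout, copy_postinc() and the `reordered` flag are hypothetical.  The
second order also ties back to register allocation: moving the increment
ahead of the data load needs a second register for p.

```c
#include <assert.h>
#include <stdint.h>

/* Toy word-addressed memory.  p itself is stored at P_ADDR; the variable d
   receiving the data is at DST_ADDR.  Word addressing is an assumption: a
   byte-addressed machine would add 4, not 1, to p. */
enum { MEM_WORDS = 16, P_ADDR = 0, DST_ADDR = 1 };

/* d = *p++, with both p and d kept in memory, written as:
   load p, load data, store data, increment p, store p.
   Locals r1..r3 stand in for machine registers. */
static void copy_postinc(int32_t mem[MEM_WORDS], int reordered) {
    int32_t r1 = mem[P_ADDR];            /* load p */
    assert(r1 >= 0 && r1 < MEM_WORDS);   /* keep the toy access in bounds */
    if (!reordered) {
        int32_t r2 = mem[r1];            /* load data */
        mem[DST_ADDR] = r2;              /* store data */
        r1 = r1 + 1;                     /* increment p, reusing r1 */
        mem[P_ADDR] = r1;                /* store p */
    } else {
        /* Increment first: the old p is still needed for the data load,
           so the new p must live in a separate register, r3. */
        int32_t r3 = r1 + 1;             /* increment p */
        int32_t r2 = mem[r1];            /* load data */
        mem[P_ADDR] = r3;                /* store p */
        mem[DST_ADDR] = r2;              /* store data */
    }
}
```

Both orders leave the same final memory, because each respects the two
dependency chains (load data before store data, increment before store p).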

Now, clearly 5 instructions may take longer than the 1 in your 68K example.
But, RISC machines are easier to pipeline, easier to speed up the clock
for, easier to provide staged functional units for, etc.  I don't
know of any CISC machines with 'hardwired' instruction sets.  Microcoding
slows the machine down, but is typically the only way to fit
a CISC on a chip.  All this means that 5 instructions on a RISC
may be _faster_ than one on a CISC.
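The trade-off can be stated with the usual execution-time identity: time
equals instruction count N times average cycles per instruction (CPI) times
cycle time t.  Using R for the 5-instruction RISC sequence and C for the
single CISC instruction, and assuming no measured figures:

```latex
T = N \cdot \mathrm{CPI} \cdot t_{\mathrm{cycle}}
\qquad\Longrightarrow\qquad
T_R < T_C \iff 5\,\mathrm{CPI}_R\,t_R < 1 \cdot \mathrm{CPI}_C\,t_C
```

In other words, the RISC sequence wins whenever pipelining and a faster
clock shrink its per-instruction cost to less than a fifth of the CISC
instruction's cost.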